Cluster Node Unable to Maintain Cluster Membership

My cluster logs are very similar to the above thread... was it ever addressed?
[SV] Already protecting connection with message security level 'sign'
[FTI] Stream already exists to node: false
[Channel IP to another cluster node member] Close()
GracefuleClose(1226) because of channel to remote endpoint another cluster node
~ is closed
The Cluster service stops and generates:
The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server serverName$. The target name used was serverName. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Ensure that the target SPN is only registered on the account used by the server.
Roderick Lyons

Hi Roderick Lyons,
Could you give us the exact URL of the "above thread"? I am not sure which thread you mean.
Please provide more information about your environment, such as the Windows Server edition of the DCs and of the cluster nodes.
If you have a mixed environment of Windows Server 2003 and 2012 R2 DCs, please restart your cluster node and then continue monitoring.
Related articles:
It turns out that weird things can happen when you mix Windows Server 2003 and Windows Server 2012 R2 domain controllers
http://blogs.technet.com/b/askds/archive/2014/07/23/it-turns-out-that-weird-things-can-happen-when-you-mix-windows-server-2003-and-windows-server-2012-r2-domain-controllers.aspx
Can't log on after changing machine account password in mixed Windows Server 2012 R2 and Windows Server 2003 environment
http://support.microsoft.com/kb/2989971
Given the current error, another possibility is that cluster validation was never run before the cluster was created. Please run the cluster validation first, then post any warning or error information it reports.
If the above does not work, please consider rebooting your PDC outside production hours.
More information:
Kerberos Service Principal Name on Wrong Account
https://support.microsoft.com/kb/2706695?wa=wsignin1.0
Fixing the Security-Kerberos / 4 error
http://blogs.technet.com/b/dcaro/archive/2013/07/04/fixing-the-security-kerberos-4-error.aspx
Service Principal Names (SPNs) SetSPN Syntax (Setspn.exe)
http://social.technet.microsoft.com/wiki/contents/articles/717.service-principal-names-spns-setspn-syntax-setspn-exe.aspx
I’m glad to be of help to you!
Please remember to mark the replies as answers if they help and unmark them if they provide no help. If you have feedback for TechNet Support, contact [email protected]
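
For anyone who wants to check the "SPN registered on more than one account" cause described in the error text, here is a minimal sketch in Python. It assumes you have already exported (account, SPN) pairs from the directory by some means; the sample data and the function name find_duplicate_spns are hypothetical, and this is not a replacement for setspn -X, only an illustration of the check.

```python
from collections import defaultdict


def find_duplicate_spns(registrations):
    """Map each SPN registered on more than one account to those accounts.

    `registrations` is an iterable of (account, spn) pairs. SPNs and
    account names are compared case-insensitively.
    """
    owners = defaultdict(set)
    for account, spn in registrations:
        owners[spn.lower()].add(account.lower())
    return {spn: sorted(accounts)
            for spn, accounts in owners.items()
            if len(accounts) > 1}


# Hypothetical sample data, not taken from any real directory.
sample = [
    ("NODE1$", "HOST/node1.example.local"),
    ("svc-sql", "MSSQLSvc/sql1.example.local:1433"),
    ("NODE2$", "MSSQLSvc/sql1.example.local:1433"),
]

for spn, accounts in find_duplicate_spns(sample).items():
    print(spn, "is registered on", ", ".join(accounts))
```

Any SPN that shows up here with two or more accounts is a candidate for the KRB_AP_ERR_MODIFIED error above.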

Similar Messages

How to fix ? please advise: In Adobe LiveCycle ES2, JBOSS(4.2.1.GA) node unable to join cluster after restart.

Hi Team,
We are using Adobe LiveCycle ES2 with JBoss (4.2.1.GA) on Windows.
We face an issue every time we restart JBoss: the node comes up after the restart but is unable to join the cluster.
We get the following error in the JBoss server.log:
2014-07-18 00:25:37,206 WARN [org.jgroups.protocols.pbcast.GMS] join(10.183.100.39:61469) sent to 10.183.100.39:64118 timed out, retrying
2014-07-18 00:25:44,206 WARN [org.jgroups.protocols.pbcast.GMS] join(10.183.100.39:61469) sent to 10.183.100.39:64118 timed out, retrying
2014-07-18 00:25:51,206 WARN [org.jgroups.protocols.pbcast.GMS] join(10.183.100.39:61469) sent to 10.183.100.39:64118 timed out, retrying
2014-07-18 00:25:58,207 WARN [org.jgroups.protocols.pbcast.GMS] join(10.183.100.39:61469) sent to 10.183.100.39:64118 timed out, retrying
2014-07-18 00:26:05,207 WARN [org.jgroups.protocols.pbcast.GMS] join(10.183.100.39:61469) sent to 10.183.100.39:64118 timed out, retrying
Could you please advise on this?
Thanks.
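
A small Python sketch for summarizing warnings like the ones above, in case it helps narrow things down. It only assumes the log line layout shown in this post; the function name summarize_join_timeouts and the "same_host" check are my own illustration, not part of JGroups or JBoss.

```python
import re
from datetime import datetime

# Matches the GMS "join ... timed out" warning in the layout shown in the post.
JOIN_TIMEOUT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN "
    r"\[org\.jgroups\.protocols\.pbcast\.GMS\] "
    r"join\((?P<joiner>[\d.]+):\d+\) sent to "
    r"(?P<target>[\d.]+):\d+ timed out, retrying$"
)


def summarize_join_timeouts(lines):
    """Count join retries, the time they span, and whether the joiner and
    the target always share one IP address."""
    events = []
    for line in lines:
        match = JOIN_TIMEOUT.match(line.strip())
        if match:
            stamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
            events.append((stamp, match["joiner"], match["target"]))
    if not events:
        return {"retries": 0, "span_seconds": 0.0, "same_host": False}
    span = (events[-1][0] - events[0][0]).total_seconds()
    same_host = all(joiner == target for _, joiner, target in events)
    return {"retries": len(events), "span_seconds": span, "same_host": same_host}


log = """\
2014-07-18 00:25:37,206 WARN [org.jgroups.protocols.pbcast.GMS] join(10.183.100.39:61469) sent to 10.183.100.39:64118 timed out, retrying
2014-07-18 00:25:44,206 WARN [org.jgroups.protocols.pbcast.GMS] join(10.183.100.39:61469) sent to 10.183.100.39:64118 timed out, retrying
2014-07-18 00:25:51,206 WARN [org.jgroups.protocols.pbcast.GMS] join(10.183.100.39:61469) sent to 10.183.100.39:64118 timed out, retrying
2014-07-18 00:25:58,207 WARN [org.jgroups.protocols.pbcast.GMS] join(10.183.100.39:61469) sent to 10.183.100.39:64118 timed out, retrying
2014-07-18 00:26:05,207 WARN [org.jgroups.protocols.pbcast.GMS] join(10.183.100.39:61469) sent to 10.183.100.39:64118 timed out, retrying
""".splitlines()

print(summarize_join_timeouts(log))
```

In the lines above, the joining address and the address being contacted are on the same IP, which may or may not be significant; it could be worth checking which process owns the second port.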

My apologies for the wall of text. After I made my original post, I thought maybe it would be better to go back and put it in a pastebin instead. I was not able to edit that post once I sent it.
In regards to your question, the permissions on the /Library/LaunchAgents/com.adobe.AAM.Updater-1.0.plist file are "read and write" for system, wheel and everyone.

Cluster nodes discover peers outside cluster domain

All, cross-posted from the ColdFusion Server Administration
forum:
I've run into an issue with CFMX7 clustering on a subnet with
multicast disabled. In our configuration, we have two physical
Windows Server 2003 Enterprise Edition servers hosting nine
ColdFusion MX 7 Enterprise clusters. Each server hosts one of two
instances in a cluster. i.e.:
server1 [1.2.3.4] - instance1-1 <- cluster1 -> server2 [1.2.3.5] - instance1-2
server1 [1.2.3.4] - instance2-1 <- cluster2 -> server2 [1.2.3.5] - instance2-2
server1 [1.2.3.4] - instance3-1 <- cluster3 -> server2 [1.2.3.5] - instance3-2
server1 [1.2.3.4] - instance4-1 <- cluster4 -> server2 [1.2.3.5] - instance4-2
server1 [1.2.3.4] - instance5-1 <- cluster5 -> server2 [1.2.3.5] - instance5-2
server1 [1.2.3.4] - instance6-1 <- cluster6 -> server2 [1.2.3.5] - instance6-2
server1 [1.2.3.4] - instance7-1 <- cluster7 -> server2 [1.2.3.5] - instance7-2
server1 [1.2.3.4] - instance8-1 <- cluster8 -> server2 [1.2.3.5] - instance8-2
server1 [1.2.3.4] - instance9-1 <- cluster9 -> server2 [1.2.3.5] - instance9-2
My first step in enabling peer discovery was to add the
unicastPeer attribute to the ClusterManager service under each
instance.
e.g. jrun.xml on instance1-1:
<service class="jrunx.cluster.ClusterManager" name="ClusterManager">
  <attribute name="bindToJNDI">true</attribute>
  <attribute name="enabled">true</attribute>
  <attribute name="clusterDomain">cluster1</attribute>
  <attribute name="unicastPeer">1.2.3.5</attribute>
  <service class="jrunx.cluster.ClusterDeployerService" name="ClusterDeployerService">
    <attribute name="deployDirectory">{jrun.server.rootdir}/SERVER-INF/cluster</attribute>
    <attribute name="deactivated">false</attribute>
  </service>
</service>
e.g. jrun.xml on instance1-2:
<service class="jrunx.cluster.ClusterManager" name="ClusterManager">
  <attribute name="bindToJNDI">true</attribute>
  <attribute name="enabled">true</attribute>
  <attribute name="clusterDomain">cluster1</attribute>
  <attribute name="unicastPeer">1.2.3.4</attribute>
  <service class="jrunx.cluster.ClusterDeployerService" name="ClusterDeployerService">
    <attribute name="deployDirectory">{jrun.server.rootdir}/SERVER-INF/cluster</attribute>
    <attribute name="deactivated">false</attribute>
  </service>
</service>
. . . and so on for each instance and cluster. This is where
the problem begins. When I start the instances, every instance
discovers every other instance as a cluster peer, regardless of
cluster domain.
Another forum user suggested using host:port, where port is
the JNDI listening port. That doesn't work. Using the Jini
listening port, however, does work, e.g.:
<attribute name="unicastPeer">1.2.3.4:4160</attribute>
That presents another problem. The Jini listening port
defaults to 4160. If 4160 is taken, a port is chosen at random.
I can't find documentation on setting a static Jini listening
port, if that's even the correct action to take.
Thoughts?
From what I can tell, the version of Reggie (the Jini lookup
service) shipped with JRun only supports setting the unicast
listening port programmatically. Reggie is started by
jrunx.cluster.ClusterManager.init--actually, the private method
startLookupService--and JRun doesn't appear to ever call Reggie's
setUnicastPeer method.
Assuming we can't tweak Reggie, I guess a more appropriate
question is how do we get JRun's RMI service (?) to honor
groups/domains in a call to getPeers? I'll cross-post to the JRun
forums and investigate JRun Updater 6.
Trev

. . . and it appears I'm exposing my ignorance of Jini in
general. :-)
If I now understand the Jini discovery process correctly, a
multicast request includes one or more service IDs and one or more
groups. The registrar will respond if and only if its service ID is
not in the request and its group memberships exactly match one or
more of the groups in the request.
A unicast request includes nothing more than the protocol
version, and the registrar will respond as if a valid multicast
request had been received.
In both cases, the response packet includes a marshalled copy
of the ServiceRegistrar object and the names of all groups of which
the registrar is a member.
Without looking at more of JRun, I'm guessing that in some,
if not all cases, either JRun's discovery implementation assumes
that any response from a unicast query is valid, regardless of the
service IDs or group names received, or the logic that sorts out the
response isn't 100% correct.
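
To make the difference concrete, here is a rough Python model of the rules as I described them above. It is only a sketch of my understanding, not Jini or JRun code; all function names are made up, and the ports other than 4160 in the sample are hypothetical.

```python
def registrar_answers_multicast(registrar_id, registrar_groups,
                                known_ids, requested_groups):
    """Multicast: answer only if our service ID is not already listed in
    the request and we share at least one group with it."""
    if registrar_id in known_ids:
        return False
    return bool(set(registrar_groups) & set(requested_groups))


def registrar_answers_unicast():
    """Unicast: the request carries no groups, so the registrar always
    answers."""
    return True


def peers_in_domain(responses, cluster_domain):
    """Client-side filter that a unicast caller would have to apply itself:
    keep only registrars whose advertised groups include our domain."""
    return [host for host, groups in responses if cluster_domain in groups]


# Hypothetical unicast responses: (registrar address, groups it reports).
responses = [
    ("1.2.3.5:4160", ["cluster1"]),
    ("1.2.3.5:4161", ["cluster2"]),
    ("1.2.3.5:4162", ["cluster3"]),
]

print(peers_in_domain(responses, "cluster1"))
```

If the caller skips the last step and trusts every unicast response, every instance ends up as a peer of every other instance, which matches what I'm seeing.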

Cluster node networking

I have a five-node Windows Server 2008 R2 Hyper-V cluster. I put one node into maintenance mode and all VMs migrated to other hosts. For testing, I pulled the LAN cables from that node one at a time (one out, waited a little, put it back, pulled the second, and so on), plugging each one right back in.
After that I had a lot of cluster errors and some VMs restarted.
I have put nodes into maintenance mode many times and restarted or shut them down, and never had any cluster problems. Why did I have problems now when I pulled out the LAN cables?

Hi antesl,
The failover behavior occurs because the cluster has detected the failure of a cluster resource or node, such as a network or storage failure. Please refer to the following related articles to confirm that your cluster has no single point of failure in its configuration.
Failover Cluster
http://msdn.microsoft.com/en-us/library/ff650328.aspx
Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Failover Cluster
http://technet.microsoft.com/zh-cn/library/cc770620(v=ws.10).aspx
How a Server Cluster Works
http://technet.microsoft.com/en-us/library/cc738051(v=ws.10).aspx
HYPER-V 2008 R2 SP1 Best Practices (In Easy Checklist Form)
http://blogs.technet.com/b/askpfeplat/archive/2012/11/19/hyper-v-2008-r2-sp1-best-practices-in-easy-checklist-form.aspx

OES2 SP2a cluster node freeze

Hi all.
I have a 3-node cluster based on OES2 SP2a, fully patched. There are a couple of resources: Master_IP and an NSS volume.
The cluster is virtualized on ESXi 4.1, fully patched, and vmware-tools are installed and up to date.
If I do an "rcnetwork stop" on a node, it stays without network for about 20 seconds and then freezes. It does not reboot; it only freezes. The resource is migrated correctly, but the server remains hung.
This behaviour is the same on a server with a cluster resource on it and on a server with no cluster resource on it. It always hangs.
The correct behaviour should be a reboot, shouldn't it?
Any hints?
Thanks in advance.

The node does not reboot because ....
9.11 Preventing a Cluster Node Reboot after a Node Shutdown
If LAN connectivity is lost between a cluster node and the other nodes in the cluster, it is possible that the lost node will be automatically shut down by the other cluster nodes. This is normal cluster operating behavior, and it prevents the lost node from trying to load cluster resources because it cannot detect the other cluster nodes. By default, cluster nodes are configured to reboot after an automatic shutdown.
On certain occasions, you might want to prevent a downed cluster node from rebooting so you can troubleshoot problems.
Section 9.11.1, OES 2 SP2 with Patches and Later
Section 9.11.2, OES 2 SP2 Release Version and Earlier
9.11.1 OES 2 SP2 with Patches and Later
Beginning in the OES 2 SP2 Maintenance Patch for May 2010, the Novell Cluster Services reboot behavior conforms to the kernel panic setting for the Linux operating system. By default the kernel panic setting is set for no reboot after a node shutdown.
You can set the kernel panic behavior in the /etc/sysctl.conf file by adding a kernel.panic command line. Set the value to 0 for no reboot after a node shutdown. Set the value to a positive integer value to indicate that the server should be rebooted after waiting the specified number of seconds. For information about the Linux sysctl, see the Linux man pages on sysctl and sysctl.conf.
1. As the root user, open the /etc/sysctl.conf file in a text editor.
2. If the kernel.panic token is not present, add it.
kernel.panic = 0
3. Set the kernel.panic value to 0 or to a positive integer value, depending on the desired behavior.
No Reboot: To prevent an automatic cluster reboot after a node shutdown, set the kernel.panic token to 0. This allows the administrator to determine what caused the kernel panic condition before manually rebooting the server. This is the recommended setting.
kernel.panic = 0
Reboot: To allow a cluster node to reboot automatically after a node shutdown, set the kernel.panic token to a positive integer value that represents the seconds to delay the reboot.
kernel.panic = <seconds>
For example, to wait 1 minute (60 seconds) before rebooting the server, specify the following:
kernel.panic = 60
4. Save your changes.
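
If you need to apply the steps above on several nodes, the edit can be scripted. The following is only a sketch in Python that works on the text of a sysctl.conf file; the function name set_kernel_panic is made up for illustration, and reading and writing the real /etc/sysctl.conf (as root) is left out on purpose.

```python
import re

# An active (uncommented) kernel.panic line; commented lines are left alone.
_PANIC_LINE = re.compile(r"^[ \t]*kernel\.panic[ \t]*=.*$", re.MULTILINE)


def set_kernel_panic(conf_text, seconds):
    """Return conf_text with kernel.panic set to `seconds`.

    0 means no reboot after a node shutdown; a positive value is the
    delay in seconds before the reboot.
    """
    if seconds < 0:
        raise ValueError("kernel.panic must be 0 or a positive integer")
    new_line = f"kernel.panic = {seconds}"
    if _PANIC_LINE.search(conf_text):
        # Token already present: replace its value.
        return _PANIC_LINE.sub(new_line, conf_text)
    # Token not present: append it on its own line.
    separator = "" if conf_text == "" or conf_text.endswith("\n") else "\n"
    return conf_text + separator + new_line + "\n"


print(set_kernel_panic("net.ipv4.ip_forward = 0\n", 60), end="")
```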
9.11.2 OES 2 SP2 Release Version and Earlier
In the OES 2 SP2 release version and earlier, you can modify the /opt/novell/ncs/bin/ldncs file for the cluster so that the server does not automatically reboot after a shutdown.
1. Open the /opt/novell/ncs/bin/ldncs file in a text editor.
2. Find the following line:
echo -n $TOLERANCE > /proc/sys/kernel/panic
3. Replace $TOLERANCE with a value of 0 to cause the server to not automatically reboot after a shutdown.
4. After editing the ldncs file, you must reboot the server to cause the change to take effect.

Unable to failover the services in active-active cluster node

Hi,
I am applying the SP2 patch for SQL Server 2008 R2 in an active-active cluster. We have 3 services in the cluster: node 1 is the preferred owner of 2 of them and node 2 is the preferred owner of 1. When I try to move the service from node 2 to node 1, I get the errors below:
DCOM was unable to communicate with the computer XXXXXXXXX using any of the configured protocols.
The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server XXXXXXXXX. The target name used was RPCSS/XXXXXX. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server name is not fully qualified, and the target domain (XXXXXX) is different from the client domain (XXXXXXX), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
The Cluster service failed to bring clustered service or application 'CHCROCHC045' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
Cluster resource 'SQL Server (CHCROCHC045)' in clustered service or application 'CHCROCHC045' failed.
Any input to resolve this issue is appreciated, as I cannot proceed with patching.
BR
PGR

Hi PGR,
As the issue is more closely related to Windows Server, I recommend that you post it in the Windows Server forums for better support.
In addition, below are some articles about troubleshooting the error "DCOM was unable to communicate with the computer XXXXXXXXX using any of the configured protocols" for your reference.
Event ID 10009 — COM Remote Service Availability
How to troubleshoot DCOM 10009 error logged in system event?
Thanks,
Lydia Zhang
TechNet Community Support

Unable to create cluster, hangs on forming cluster

Hi all,
I am trying to create a 2-node cluster on two x64 Windows Server 2008 Enterprise edition servers. I am running the setup from the failover cluster MMC and it seems to run OK right up to the point where the snap-in says creating cluster. Then it seems to hang on "forming cluster" and a message pops up saying "The operation is taking longer than expected". A counter comes up and when it hits 2 minutes the wizard cancels and another message comes up: "Unable to successfully cleanup".
The validation runs successfully before I start trying to create the cluster. The hardware involved is an HP EVA 6000 and two Dell 2950s.
I have included the report generated by the create cluster wizard below and the error from the event log on one of the machines (the error is the same on both machines).
Is there anything I can do to give me a better indication of what is happening, so I can resolve this issue or does anyone have any suggestions for me?
Thanks in advance.
Anthony
Create Cluster Log
==================
Beginning to configure the cluster <cluster>.
Initializing Cluster <cluster>.
Validating cluster state on node <Node1>
Searching the domain for computer object 'cluster'.
Creating a new computer object for 'cluster' in the domain.
Configuring computer object 'cluster' as cluster name object.
Validating installation of the Network FT Driver on node <Node1>
Validating installation of the Cluster Disk Driver on node <Node1>
Configuring Cluster Service on node <Node1>
Validating installation of the Network FT Driver on node <Node2>
Validating installation of the Cluster Disk Driver on node <Node2>
Configuring Cluster Service on node <Node2>
Waiting for notification that Cluster service on node <Node2>
Forming cluster '<cluster>'.
Unable to successfully cleanup.
To troubleshoot cluster creation problems, run the Validate a Configuration wizard on the servers you want to cluster.
Event Log
=========
Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          29/08/2008 19:43:14
Event ID:      1570
Task Category: None
Level:         Critical
Keywords:
User:          SYSTEM
Computer:      <NODE 2>
Description:
Node 'NODE2' failed to establish a communication session while joining the cluster. This was due to an authentication failure. Please verify that the nodes are running compatible versions of the cluster service software.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
    <Provider Name="Microsoft-Windows-FailoverClustering" Guid="{baf908ea-3421-4ca9-9b84-6689b8c6f85f}" />
    <EventID>1570</EventID>
    <Version>0</Version>
    <Level>1</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2008-08-29T18:43:14.294Z" />
    <EventRecordID>4481</EventRecordID>
    <Correlation />
    <Execution ProcessID="2412" ThreadID="3416" />
    <Channel>System</Channel>
    <Computer>NODE2</Computer>
    <Security UserID="S-1-5-18" />
</System>
<EventData>
    <Data Name="NodeName">node2</Data>
</EventData>
</Event>
====
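
In case anyone wants to pull the key fields out of exported events like the one above in bulk, here is a minimal Python sketch using only the standard library. The function name summarize_event is made up, and the sample is a trimmed copy of the event XML from this post.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Windows event XML shown above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def summarize_event(xml_text):
    """Extract provider, event ID, level, computer and named data values."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    data = {item.get("Name"): item.text
            for item in root.findall("e:EventData/e:Data", NS)}
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "level": int(system.find("e:Level", NS).text),
        "computer": system.find("e:Computer", NS).text,
        "data": data,
    }


# Trimmed copy of the event from the post.
sample_event = """\
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
    <Provider Name="Microsoft-Windows-FailoverClustering" Guid="{baf908ea-3421-4ca9-9b84-6689b8c6f85f}" />
    <EventID>1570</EventID>
    <Level>1</Level>
    <Computer>NODE2</Computer>
</System>
<EventData>
    <Data Name="NodeName">node2</Data>
</EventData>
</Event>
"""

print(summarize_event(sample_event))
```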
I have also since tried creating the cluster with the firewall disabled, with no success.
I have tried creating the cluster from the other node and this did not work either.
I tried creating a cluster with just a single node and this did create a cluster. I could not join the other node and the network name resource did not come online either. The below is from the event logs.
Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          01/09/2008 12:42:44
Event ID:      1207
Task Category: Network Name Resource
Level:         Error
Keywords:
User:          SYSTEM
Computer:      Node1.Domain
Description:
Cluster network name resource 'Cluster Name' cannot be brought online. The computer object associated with the resource could not be updated in domain 'Domain' for the following reason:
Unable to obtain the Primary Cluster Name Identity token.
The text for the associated error code is: An attempt has been made to operate on an impersonation token by a thread that is not currently impersonating a client.
The cluster identity 'CLUSTER$' may lack permissions required to update the object. Please work with your domain administrator to ensure that the cluster identity can update computer objects in the domain.

I am having the exact same issue... but these are on freshly created virtual machines... no group policy or anything...
I am 100% unable to create a Virtual Windows server 2012 failover cluster using two virtual fiber channel adapters to connect to the shared storage.
I've tried using GUI and powershell, I've tried adding all available storage, or not adding it, I've tried renaming the server and changing all the IP addresses....
To reproduce:
1. Create two identical Server 2012 virtual machines
(My config: 4 CPUs, 4 GB-8 GB dynamic memory, 40 GB HDD, two network cards (one for private, one for mgmt), two fiber cards, one connected to each vSAN.)
2. Update both VMs with current Windows updates
3. Add the Failover Clustering role, reboot, and try to create the cluster.
The cluster passed all validation tests perfectly, but then it gets to "forming cluster" and times out =/
Any assistance would be greatly appreciated.

SCVMM losing connection to cluster nodes

Hey guys'n girls, I hope this is the right forum for this question. I already opened a ticket at MS support as well because it's impacting our production environment indirectly, but even after a week there's been no contact. Losing faith in MS support there.
The problem we're having is that in SCVMM a host enters the 'needs attention' state, with a WinRM error 0x80338126. I guess it has something to do with the network or with Kerberos, and I've found some info on it, but I still haven't been able to solve it. Do you guys have any ideas?
Problem summary:
We are seeing an issue on our new hyper-v platform. The platform should have been in production last week, but this issue is delaying our project as we can't seem to get it stable.
The problem we are experiencing is that SCVMM loses the connection to some of the Hyper-V nodes, not to one specific node. Last week it happened to two nodes, and today it happened to another node. I see issues with WinRM, and I suspect it has something to do with Kerberos. See the bottom of this post for background details and software versions.
The host gets the status 'needs attention', and if you look at the status of the machine, WinRM gives an error. The error is:
Error (2916)
VMM is unable to complete the request. The connection to the agent cc1-hyp-10.domaincloud1.local was lost.
WinRM: URL: [http://cc1-hyp-10.domaincloud1.local:5985], Verb: [ENUMERATE], Resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_Service], Filter: [select * from Win32_Service where Name="WinRM"]
Unknown error (0x80338126)
Recommended Action
Ensure that the Windows Remote Management (WinRM) service and the VMM agent are installed and running and that a firewall is not blocking HTTP/HTTPS traffic. Ensure that VMM server is able to communicate with cc1-hyp-10.domaincloud1.local over WinRM by successfully running the following command:
winrm id -r:cc1-hyp-10.domaincloud1.local
This problem can also be caused by a Windows Management Instrumentation (WMI) service crash. If the server is running Windows Server 2008 R2, ensure that KB 982293 (http://support.microsoft.com/kb/982293) is installed on it.
If the error persists, restart cc1-hyp-10.domaincloud1.local and then try the operation again. Refer to http://support.microsoft.com/kb/2742275 for more details.
Doing a simple test from the VMM server to the problematic cluster node shows this error:
PS C:\> hostname
CC1-VMM-01
PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
WSManFault
    Message = WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this computer. By default, the WinRM firewall exception for public profiles limits access to remote computers within the same local subnet.
Error number: -2144108250 0x80338126
WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this computer. By default, the WinRM firewall exception for public profiles limits access to remote computers within the same local subnet.
I CAN connect from other hosts to this problematic cluster node:
PS C:\> hostname
CC1-HYP-16
PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
IdentifyResponse
    ProtocolVersion = http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
    ProductVendor = Microsoft Corporation
    ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
    SecurityProfiles
        SecurityProfileName = http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
And I can connect from the vmm server to all other cluster nodes:
PS C:\> hostname
CC1-VMM-01
PS C:\> winrm id -r:cc1-hyp-11.domaincloud1.local
IdentifyResponse
    ProtocolVersion = http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
    ProductVendor = Microsoft Corporation
    ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
    SecurityProfiles
        SecurityProfileName = http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
So at this point only the test from the cc1-vmm-01 to cc1-hyp-10 seems to be problematic.
I followed the steps on the page https://support.microsoft.com/kb/2742275 (which is referred to above). I tried the VMMCA, but I can't really get it working the way I want, or it seems to give outdated recommendations.
I tried checking for duplicate SPNs by running setspn -x on the affected machines. No results (although I do not understand what an SPN is or how it works). I rebuilt the performance counters.
I tried setting 'sc config winrm type= own' as described in [http://blinditandnetworkadmin.blogspot.nl/2012/08/kb-how-to-troubleshoot-needs-attention.html].
If I reboot this cc1-hyp-10 machine, it will start working perfectly again. However, then I can't troubleshoot the issue, and it will happen again.
I want this problem to be solved, so vmm never loses connection to the hypervisors it's managing again!
Background information:
We've set up a platform with Hyper-V to run a VM workload. The platform consists of the following hardware:
2 Dell R620's with 32GB of RAM, running hyper-v to virtualize the cloud management layer (DC's, VMM, SQL). These machines are called cc1-hyp-01 and cc1-hyp-02. They run the management vm's like cc1-dc-01/02, cc1-sql-01, cc1-vmm-01, etc. The names are self-explanatory.
The VMM machine is NOT clustered.
8 Dell M620 blades with 320GB of RAM, running hyper-v to virtualize the customer workload. The machines are called cc1-hyp-10 through cc1-hyp-17. They are in a cluster.
2 Equallogic units form a SAN (premium storage), and we have a Dell R515 running iscsi target (budget storage).
We have Dell Force10 switches and Cisco C3750X switches to connect everything together (mostly 10GB links).
All hosts run Windows Server 2012 R2 Datacenter edition. The VMM server runs System Center Virtual Machine Manager 2012 R2.
All the latest Windows updates are installed on every host. There are no firewalls between any host (vmm and hypervisors) at this level. Windows firewalls are all disabled. No antivirus software is installed, no symantec software is installed.
The only non-standard software that is installed is the Dell Host Integration Tools 4.7.1, Dell Openmanage Server Administrator, and some small stuff like 7-zip, bginfo, net-snap, etc.
The SCVMM service is running under the domain account DOMAINCLOUD1\scvmm. This account is in the local administrators group of each cluster node.
On top of this cloud layer we're running the tenant layer with a lot of vm's for a specific customer (although they are all off now).

I think I found the culprit. After an hour of analyzing Wireshark dumps, I found that the VMM server had jumbo frames enabled on the management interface to the hosts, while the underlying infrastructure does not. Now my winrm commands work again.
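
For anyone chasing a similar MTU mismatch, the arithmetic behind a "don't fragment" ping test can be sketched in a few lines of Python. The MTU values 1500 and 9000 are assumptions for illustration (the actual sizes are not stated above), and the header sizes assume plain IPv4 without options.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fits_path(payload, path_mtu):
    """True if an echo request with this payload fits the path MTU."""
    return payload + IPV4_HEADER + ICMP_HEADER <= path_mtu


standard = max_icmp_payload(1500)  # assumed standard Ethernet MTU
jumbo = max_icmp_payload(9000)     # assumed jumbo frame MTU

print(standard, fits_path(standard, 1500))
print(jumbo, fits_path(jumbo, 1500))
```

On Windows, a ping with the don't-fragment flag and an explicit size (ping -f -l <size> <host>) can then be used to see which payload sizes actually get through between the VMM server and a host.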

Error while getting cluster node subtree

Hi,
We are on SP15.
The console logs show the following error
log generation timestamp : 2006_01_17_at_17_14_05
java.rmi.RemoteException: Error while getting cluster node subtree of :name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster=""; nested exception is:
     com.sap.engine.services.jmx.exception.MBeanServerClusterException: Exception during invocation of remote MBeanServer method, target node: 2053400
     at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImpl.getClusterNodeSubTree(ConvenienceEngineAdministratorImpl.java:242)
     at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImplp4_Skel.dispatch(ConvenienceEngineAdministratorImplp4_Skel.java:99)
     at com.sap.engine.services.rmi_p4.DispatchImpl._runInternal(DispatchImpl.java:304)
     at com.sap.engine.services.rmi_p4.DispatchImpl._run(DispatchImpl.java:193)
     at com.sap.engine.services.rmi_p4.server.P4SessionProcessor.request(P4SessionProcessor.java:122)
     at com.sap.engine.core.service630.context.cluster.session.ApplicationSessionMessageListener.process(ApplicationSessionMessageListener.java:33)
     at com.sap.engine.core.cluster.impl6.session.MessageRunner.run(MessageRunner.java:41)
     at com.sap.engine.core.thread.impl3.ActionObject.run(ActionObject.java:37)
     at java.security.AccessController.doPrivileged(Native Method)
     at com.sap.engine.core.thread.impl3.SingleThread.execute(SingleThread.java:100)
     at com.sap.engine.core.thread.impl3.SingleThread.run(SingleThread.java:170)
Caused by: com.sap.engine.services.jmx.exception.MBeanServerClusterException: Exception during invocation of remote MBeanServer method, target node: 2053400
     at com.sap.engine.services.jmx.ClusterInterceptor.invoke(ClusterInterceptor.java:816)
     at com.sap.pj.jmx.server.interceptor.MBeanServerInterceptorChain.invoke(MBeanServerInterceptorChain.java:330)
     at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImpl.getClusterNodeSubTree(ConvenienceEngineAdministratorImpl.java:239)
     ... 10 more
Caused by: com.sap.engine.services.jmx.exception.JmxConnectorException: Unable to de-serialize request parameters, message [ JMX request (java) v1.0 len: 345 | src: cluster target-node: 2053400 req: invoke params-number: 4 params-bytes: 0 | :name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster="" null null null ]
     at com.sap.engine.services.jmx.MBeanServerConnectionImpl.invokeMbsInternal(MBeanServerConnectionImpl.java:680)
     at com.sap.engine.services.jmx.MBeanServerConnectionImpl.invoke(MBeanServerConnectionImpl.java:467)
     at com.sap.engine.services.jmx.MBeanServerConnectionSecurityWrapper.invoke(MBeanServerConnectionSecurityWrapper.java:221)
     at com.sap.engine.services.jmx.ClusterInterceptor.invoke(ClusterInterceptor.java:813)
     ... 12 more
Caused by: javax.management.InstanceNotFoundException: MBean with name com.sap.default:name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster=XD1 not found in repository
     at com.sap.pj.jmx.server.MBeanServerImpl.getClassLoaderFor(MBeanServerImpl.java:1408)
     at com.sap.pj.jmx.server.interceptor.MBeanServerWrapperInterceptor.getClassLoaderFor(MBeanServerWrapperInterceptor.java:455)
     at com.sap.engine.services.jmx.CompletionInterceptor.getClassLoaderFor(CompletionInterceptor.java:567)
     at com.sap.pj.jmx.server.interceptor.BasicMBeanServerInterceptor.getClassLoaderFor(BasicMBeanServerInterceptor.java:438)
     at com.sap.jmx.provider.ProviderInterceptor.getClassLoaderFor(ProviderInterceptor.java:330)
     at com.sap.engine.services.jmx.RedirectInterceptor.getClassLoaderFor(RedirectInterceptor.java:501)
     at com.sap.pj.jmx.server.interceptor.MBeanServerInterceptorChain.getClassLoaderFor(MBeanServerInterceptorChain.java:443)
     at com.sap.engine.services.jmx.RequestMessage.readParams(RequestMessage.java:523)
     at com.sap.engine.services.jmx.RequestMessage.getParams(RequestMessage.java:578)
     at com.sap.engine.services.jmx.MBeanServerInvoker.invokeMbs(MBeanServerInvoker.java:106)
     at com.sap.engine.services.jmx.JmxServiceConnectorServer.receiveWait(JmxServiceConnectorServer.java:173)
     at com.sap.engine.core.service630.context.cluster.message.MessageListenerWrapper.process(MessageListenerWrapper.java:81)
     at com.sap.engine.core.cluster.impl6.ms.MSListenerThread.run(MSListenerThread.java:47)
     at com.sap.engine.frame.core.thread.Task.run(Task.java:64)
     at com.sap.engine.core.thread.impl6.SingleThread.execute(SingleThread.java:78)
     at com.sap.engine.core.thread.impl6.SingleThread.run(SingleThread.java:148)
java.lang.NullPointerException
     at com.sap.engine.services.adminadapter.gui.ClusterView.addGlobalDispatcherServiceProperties(ClusterView.java:455)
     at com.sap.engine.services.adminadapter.gui.ClusterView.createGlobalTrees(ClusterView.java:508)
     at com.sap.engine.services.adminadapter.gui.ClusterView.access$1200(ClusterView.java:29)
     at com.sap.engine.services.adminadapter.gui.ClusterView$4.run(ClusterView.java:420)
java.rmi.RemoteException: Error while getting cluster node subtree of :name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster=""; nested exception is:
     com.sap.engine.services.jmx.exception.MBeanServerClusterException: Exception during invocation of remote MBeanServer method, target node: 2053400
     at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImpl.getClusterNodeSubTree(ConvenienceEngineAdministratorImpl.java:242)
     at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImplp4_Skel.dispatch(ConvenienceEngineAdministratorImplp4_Skel.java:99)
     at com.sap.engine.services.rmi_p4.DispatchImpl._runInternal(DispatchImpl.java:304)
     at com.sap.engine.services.rmi_p4.DispatchImpl._run(DispatchImpl.java:193)
     at com.sap.engine.services.rmi_p4.server.P4SessionProcessor.request(P4SessionProcessor.java:122)
     at com.sap.engine.core.service630.context.cluster.session.ApplicationSessionMessageListener.process(ApplicationSessionMessageListener.java:33)
     at com.sap.engine.core.cluster.impl6.session.MessageRunner.run(MessageRunner.java:41)
     at com.sap.engine.core.thread.impl3.ActionObject.run(ActionObject.java:37)
     at java.security.AccessController.doPrivileged(Native Method)
     at com.sap.engine.core.thread.impl3.SingleThread.execute(SingleThread.java:100)
     at com.sap.engine.core.thread.impl3.SingleThread.run(SingleThread.java:170)
Caused by: com.sap.engine.services.jmx.exception.MBeanServerClusterException: Exception during invocation of remote MBeanServer method, target node: 2053400
     at com.sap.engine.services.jmx.ClusterInterceptor.invoke(ClusterInterceptor.java:816)
     at com.sap.pj.jmx.server.interceptor.MBeanServerInterceptorChain.invoke(MBeanServerInterceptorChain.java:330)
     at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImpl.getClusterNodeSubTree(ConvenienceEngineAdministratorImpl.java:239)
     ... 10 more
Caused by: com.sap.engine.services.jmx.exception.JmxConnectorException: Unable to de-serialize request parameters, message [ JMX request (java) v1.0 len: 345 | src: cluster target-node: 2053400 req: invoke params-number: 4 params-bytes: 0 | :name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster="" null null null ]
     at com.sap.engine.services.jmx.MBeanServerConnectionImpl.invokeMbsInternal(MBeanServerConnectionImpl.java:680)
     at com.sap.engine.services.jmx.MBeanServerConnectionImpl.invoke(MBeanServerConnectionImpl.java:467)
     at com.sap.engine.services.jmx.MBeanServerConnectionSecurityWrapper.invoke(MBeanServerConnectionSecurityWrapper.java:221)
     at com.sap.engine.services.jmx.ClusterInterceptor.invoke(ClusterInterceptor.java:813)
     ... 12 more
Caused by: javax.management.InstanceNotFoundException: MBean with name com.sap.default:name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster=XD1 not found in repository
     at com.sap.pj.jmx.server.MBeanServerImpl.getClassLoaderFor(MBeanServerImpl.java:1408)
     at com.sap.pj.jmx.server.interceptor.MBeanServerWrapperInterceptor.getClassLoaderFor(MBeanServerWrapperInterceptor.java:455)
     at com.sap.engine.services.jmx.CompletionInterceptor.getClassLoaderFor(CompletionInterceptor.java:567)
     at com.sap.pj.jmx.server.interceptor.BasicMBeanServerInterceptor.getClassLoaderFor(BasicMBeanServerInterceptor.java:438)
     at com.sap.jmx.provider.ProviderInterceptor.getClassLoaderFor(ProviderInterceptor.java:330)
     at com.sap.engine.services.jmx.RedirectInterceptor.getClassLoaderFor(RedirectInterceptor.java:501)
     at com.sap.pj.jmx.server.interceptor.MBeanServerInterceptorChain.getClassLoaderFor(MBeanServerInterceptorChain.java:443)
     at com.sap.engine.services.jmx.RequestMessage.readParams(RequestMessage.java:523)
     at com.sap.engine.services.jmx.RequestMessage.getParams(RequestMessage.java:578)
     at com.sap.engine.services.jmx.MBeanServerInvoker.invokeMbs(MBeanServerInvoker.java:106)
     at com.sap.engine.services.jmx.JmxServiceConnectorServer.receiveWait(JmxServiceConnectorServer.java:173)
     at com.sap.engine.core.service630.context.cluster.message.MessageListenerWrapper.process(MessageListenerWrapper.java:81)
     at com.sap.engine.core.cluster.impl6.ms.MSListenerThread.run(MSListenerThread.java:47)
     at com.sap.engine.frame.core.thread.Task.run(Task.java:64)
     at com.sap.engine.core.thread.impl6.SingleThread.execute(SingleThread.java:78)
     at com.sap.engine.core.thread.impl6.SingleThread.run(SingleThread.java:148)
java.lang.NullPointerException
     at com.sap.engine.services.adminadapter.gui.ClusterView.addGlobalDispatcherServiceProperties(ClusterView.java:455)
     at com.sap.engine.services.adminadapter.gui.ClusterView.createGlobalTrees(ClusterView.java:508)
     at com.sap.engine.services.adminadapter.gui.ClusterView.access$1200(ClusterView.java:29)
     at com.sap.engine.services.adminadapter.gui.ClusterView$4.run(ClusterView.java:420)
Any clue what this is?
Regards

Got the same error:
+ /usr/java14_64/bin/java -showversion -Duser.language=en -DP4ClassLoad=P4Connection -Dp4Cache=clean -jar go.jar
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2)
Classic VM (build 1.4.2, J2RE 1.4.2 IBM AIX 5L for PowerPC (64 bit JVM) build caix64142ifx-20061222 (ifix 113727: SR7 + 112603) (JIT enabled: jitc))
java.lang.NullPointerException
        at com.sap.engine.services.adminadapter.gui.ClusterView$4.run(ClusterView.java:405)
Need some help!
Bernard

Question about cluster node NodeWeight property

Hi,
I have a three-node (A/B/C) Windows Server 2008 R2 SP1 cluster, testCluster, with KB2494036 installed on all three nodes. Suppose node A is the active node.
I set node C's NodeWeight property to 0 and left node A and node B at the default (NodeWeight=1). I also added a shared disk Q as the cluster quorum disk.
What I want to know: if node C and node B go down, does testCluster go down because of lost quorum, or does it stay up?
At first I thought testCluster should stay up, because the cluster still has 2 votes (node A and the quorum disk): node B is down and node C does not vote. But in testing, testCluster went down with loss of quorum.
Does anybody know the reason? Thanks.

Hello mark.gao,
Let me check that I understand your steps correctly. If you created the cluster with three nodes, the quorum model would initially be "Node Majority", with three votes, one per node.
Then the vote was removed from node "C" and a disk was added as a witness for the cluster quorum. At this point two of the three original votes remain under "Node Majority".
Question:
Did you at some point change the quorum model to "Node and Disk Majority"?
This may be the issue: if you are still on "Node Majority", then when nodes "B" and "C" are down there is only one vote, from node "A", so there is no quorum to keep the service online.
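The vote arithmetic behind both readings of this scenario can be sketched as follows. This is a minimal sketch, assuming the usual strict-majority rule (more than half of all configured votes must be present). The function name has_quorum and the two vote tables are illustrative only and are not part of any Windows API.

```python
def has_quorum(votes, online):
    """Return True if the online voters hold a strict majority of all votes.

    votes  -- dict mapping voter name to its configured vote (0 or 1)
    online -- set of voter names that are currently reachable
    """
    total = sum(votes.values())
    present = sum(v for name, v in votes.items() if name in online)
    return present > total / 2


# Assumption: under Node Majority only nodes vote, so disk Q is not counted.
# Node C has NodeWeight=0.
node_majority = {"A": 1, "B": 1, "C": 0}

# Assumption: under Node and Disk Majority the witness disk Q adds one vote.
node_and_disk_majority = {"A": 1, "B": 1, "C": 0, "Q": 1}

# Nodes B and C are down; node A (and disk Q, where it votes) remain.
print(has_quorum(node_majority, {"A"}))                # 1 of 2 votes present
print(has_quorum(node_and_disk_majority, {"A", "Q"}))  # 2 of 3 votes present
```

Under these assumptions the first call reports no quorum and the second reports quorum, which matches the suspicion that the cluster is still running in "Node Majority".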
Windows Server 2012 adds the option to configure a dynamic quorum:
Dynamic quorum management
In Windows Server 2012, as an advanced quorum configuration option, you can choose to enable dynamic quorum management by cluster. When this option is enabled, the cluster dynamically manages the vote assignment to nodes, based on the state of each node. Votes are automatically removed from nodes that leave active cluster membership, and a vote is automatically assigned when a node rejoins the cluster. By default, dynamic quorum management is enabled.
Note
With dynamic quorum management, the cluster quorum majority is determined by the set of nodes that are active members of the cluster at any time. This is an important distinction from the cluster quorum in Windows Server 2008 R2, where the quorum majority is fixed, based on the initial cluster configuration.
With dynamic quorum management, it is also possible for a cluster to run on the last surviving cluster node. By dynamically adjusting the quorum majority requirement, the cluster can sustain sequential node shutdowns to a single node.
The cluster-assigned dynamic vote of a node can be verified with the DynamicWeight common property of the cluster node by using the Get-ClusterNode Windows PowerShell cmdlet. A value of 0 indicates that the node does not have a quorum vote. A value of 1 indicates that the node has a quorum vote.
The vote assignment for all cluster nodes can be verified by using the Validate Cluster Quorum validation test.
Additional considerations
Dynamic quorum management does not allow the cluster to sustain a simultaneous failure of a majority of voting members. To continue running, the cluster must always have a quorum majority at the time of a node shutdown or failure.
If you have explicitly removed the vote of a node, the cluster cannot dynamically add or remove that vote.
Configure and Manage the Quorum in a Windows Server 2012 Failover Cluster
https://technet.microsoft.com/en-us/library/jj612870.aspx#BKMK_dynamic
Hope this info helps you reach your goal. :D
5ALU2 !

Node does not join cluster upon reboot

Hi Guys,
I have two servers [Sun Fire X4170] clustered together using Solaris Cluster 3.3 for Oracle Database. They are connected to shared storage, a Dell EqualLogic array [iSCSI]. Lately I have run into a weird problem: both nodes come up fine and join the cluster when rebooted together; however, when I reboot only one of the nodes, it does not join the cluster and shows the errors below.
This happens on either node [if I reboot only one node at a time]. But if I reboot both nodes at the same time, they successfully join the cluster and everything runs fine.
Below is the output from the node that I rebooted and that did not join the cluster. The other node is running fine with all the services.
To get out of this situation, I have to reboot both nodes together.
# dmesg output #
Apr 23 17:37:03 srvhqon11 ixgbe: [ID 611667 kern.info] NOTICE: ixgbe2: link down
Apr 23 17:37:12 srvhqon11 iscsi: [ID 933263 kern.notice] NOTICE: iscsi connection(5) unable to connect to target SENDTARGETS_DISCOVERY
Apr 23 17:37:12 srvhqon11 iscsi: [ID 114404 kern.notice] NOTICE: iscsi discovery failure - SendTargets (010.010.017.104)
Apr 23 17:37:13 srvhqon11 iscsi: [ID 240218 kern.notice] NOTICE: iscsi session(9) iqn.2001-05.com.equallogic:0-8a0906-96cf73708-ef30000005e50a1b-sblprdbk online
Apr 23 17:37:13 srvhqon11 scsi: [ID 583861 kern.info] sd11 at scsi_vhci0: unit-address g6090a0887073cf961b0ae505000030ef: g6090a0887073cf961b0ae505000030ef
Apr 23 17:37:13 srvhqon11 genunix: [ID 936769 kern.info] sd11 is /scsi_vhci/disk@g6090a0887073cf961b0ae505000030ef
Apr 23 17:37:13 srvhqon11 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Apr 23 17:37:13 srvhqon11 /scsi_vhci/disk@g6090a0887073cf961b0ae505000030ef (sd11): Command failed to complete (3) on path iscsi0/[email protected]:0-8a0906-96cf73708-ef30000005e50a1b-sblprdbk0001,0
Apr 23 17:46:54 srvhqon11 svc.startd[11]: [ID 122153 daemon.warning] svc:/network/iscsi/initiator:default: Method or service exit timed out. Killing contract 41.
Apr 23 17:46:54 srvhqon11 svc.startd[11]: [ID 636263 daemon.warning] svc:/network/iscsi/initiator:default: Method "/lib/svc/method/iscsid start" failed due to signal KILL.
Apr 23 17:46:54 srvhqon11 svc.startd[11]: [ID 748625 daemon.error] network/iscsi/initiator:default failed repeatedly: transitioned to maintenance (see 'svcs -xv' for details)
Apr 24 14:50:16 srvhqon11 svc.startd[11]: [ID 694882 daemon.notice] instance svc:/system/console-login:default exited with status 1
root@srvhqon11 # svcs -xv
svc:/system/cluster/loaddid:default (Oracle Solaris Cluster loaddid)
State: offline since Tue Apr 23 17:46:54 2013
Reason: Start method is running.
See: http://sun.com/msg/SMF-8000-C4
See: /var/svc/log/system-cluster-loaddid:default.log
Impact: 49 dependent services are not running:
svc:/system/cluster/bootcluster:default
svc:/system/cluster/cl_execd:default
svc:/system/cluster/zc_cmd_log_replay:default
svc:/system/cluster/sc_zc_member:default
svc:/system/cluster/sc_rtreg_server:default
svc:/system/cluster/sc_ifconfig_server:default
svc:/system/cluster/initdid:default
svc:/system/cluster/globaldevices:default
svc:/system/cluster/gdevsync:default
svc:/milestone/multi-user:default
svc:/system/boot-config:default
svc:/system/cluster/cl-svc-enable:default
svc:/milestone/multi-user-server:default
svc:/application/autoreg:default
svc:/system/basicreg:default
svc:/system/zones:default
svc:/system/cluster/sc_zones:default
svc:/system/cluster/scprivipd:default
svc:/system/cluster/cl-svc-cluster-milestone:default
svc:/system/cluster/sc_svtag:default
svc:/system/cluster/sckeysync:default
svc:/system/cluster/rpc-fed:default
svc:/system/cluster/rgm-starter:default
svc:/application/management/common-agent-container-1:default
svc:/system/cluster/scsymon-srv:default
svc:/system/cluster/sc_syncsa_server:default
svc:/system/cluster/scslmclean:default
svc:/system/cluster/cznetd:default
svc:/system/cluster/scdpm:default
svc:/system/cluster/rpc-pmf:default
svc:/system/cluster/pnm:default
svc:/system/cluster/sc_pnm_proxy_server:default
svc:/system/cluster/cl-event:default
svc:/system/cluster/cl-eventlog:default
svc:/system/cluster/cl-ccra:default
svc:/system/cluster/ql_upgrade:default
svc:/system/cluster/mountgfs:default
svc:/system/cluster/clusterdata:default
svc:/system/cluster/ql_rgm:default
svc:/system/cluster/scqdm:default
svc:/application/stosreg:default
svc:/application/sthwreg:default
svc:/application/graphical-login/cde-login:default
svc:/application/cde-printinfo:default
svc:/system/cluster/scvxinstall:default
svc:/system/cluster/sc_failfast:default
svc:/system/cluster/clexecd:default
svc:/system/cluster/sc_pmmd:default
svc:/system/cluster/clevent_listenerd:default
svc:/application/print/server:default (LP print server)
State: disabled since Tue Apr 23 17:36:44 2013
Reason: Disabled by an administrator.
See: http://sun.com/msg/SMF-8000-05
See: man -M /usr/share/man -s 1M lpsched
Impact: 2 dependent services are not running:
svc:/application/print/rfc1179:default
svc:/application/print/ipp-listener:default
svc:/network/iscsi/initiator:default (?)
State: maintenance since Tue Apr 23 17:46:54 2013
Reason: Restarting too quickly.
See: http://sun.com/msg/SMF-8000-L5
See: /var/svc/log/network-iscsi-initiator:default.log
Impact: This service is not running.
######## Cluster Status from working node ############
root@srvhqon10 # cluster status
=== Cluster Nodes ===
--- Node Status ---
Node Name       Status
srvhqon10       Online
srvhqon11       Offline
=== Cluster Transport Paths ===
Endpoint1        Endpoint2        Status
srvhqon10:igb3   srvhqon11:igb3   faulted
srvhqon10:igb2   srvhqon11:igb2   faulted
=== Cluster Quorum ===
--- Quorum Votes Summary from (latest node reconfiguration) ---
Needed   Present   Possible
2        2         3
--- Quorum Votes by Node (current status) ---
Node Name   Present   Possible   Status
srvhqon10   1         1          Online
srvhqon11   0         1          Offline
--- Quorum Votes by Device (current status) ---
Device Name   Present   Possible   Status
d2            1         1          Online
=== Cluster Device Groups ===
--- Device Group Status ---
Device Group Name   Primary   Secondary   Status
--- Spare, Inactive, and In Transition Nodes ---
Device Group Name   Spare Nodes   Inactive Nodes   In Transistion Nodes
--- Multi-owner Device Group Status ---
Device Group Name   Node Name   Status
=== Cluster Resource Groups ===
Group Name   Node Name   Suspended   State
ora-rg       srvhqon10   No          Online
             srvhqon11   No          Offline
nfs-rg       srvhqon10   No          Online
             srvhqon11   No          Offline
backup-rg    srvhqon10   No          Online
             srvhqon11   No          Offline
=== Cluster Resources ===
Resource Name   Node Name   State     Status Message
ora-listener    srvhqon10   Online    Online
                srvhqon11   Offline   Offline
ora-server      srvhqon10   Online    Online
                srvhqon11   Offline   Offline
ora-stor        srvhqon10   Online    Online
                srvhqon11   Offline   Offline
ora-lh          srvhqon10   Online    Online - LogicalHostname online.
                srvhqon11   Offline   Offline
nfs-rs          srvhqon10   Online    Online - Service is online.
                srvhqon11   Offline   Offline
nfs-stor-rs     srvhqon10   Online    Online
                srvhqon11   Offline   Offline
nfs-lh-rs       srvhqon10   Online    Online - LogicalHostname online.
                srvhqon11   Offline   Offline
backup-stor     srvhqon10   Online    Online
                srvhqon11   Offline   Offline
cluster: (C383355) No response from daemon on node "srvhqon11".
=== Cluster DID Devices ===
Device Instance    Node        Status
/dev/did/rdsk/d1   srvhqon10   Ok
/dev/did/rdsk/d2   srvhqon10   Ok
                   srvhqon11   Unknown
/dev/did/rdsk/d3   srvhqon10   Ok
                   srvhqon11   Unknown
/dev/did/rdsk/d4   srvhqon10   Ok
/dev/did/rdsk/d5   srvhqon10   Fail
                   srvhqon11   Unknown
/dev/did/rdsk/d6   srvhqon11   Unknown
/dev/did/rdsk/d7   srvhqon11   Unknown
/dev/did/rdsk/d8   srvhqon10   Ok
                   srvhqon11   Unknown
/dev/did/rdsk/d9   srvhqon10   Ok
                   srvhqon11   Unknown
=== Zone Clusters ===
--- Zone Cluster Status ---
Name   Node Name   Zone HostName   Status   Zone Status
Regards.

Check whether your global devices are mounted properly:
#cat /etc/mnttab | grep -i global
Check that the proper entries are present on both systems:
#cat /etc/vfstab | grep -i global
Post the output for the quorum devices:
#scstat -q
or
#clquorum list -v
Also check why your iSCSI initiator service is going offline unexpectedly:
#vi /var/svc/log/network-iscsi-initiator:default.log

JNDI Lookup for multiple server instances with multiple cluster nodes

Hi Experts,
I need help with retrieving log files for multiple server instances, each with multiple cluster nodes. The system is NetWeaver 7.01.
There are 3 server instances, each with 3 cluster nodes.
EJB session beans are deployed on them to retrieve the log information for each server node.
The session bean has this method:
public List getServers() {
List servers = new ArrayList();
ClassLoader saveLoader = Thread.currentThread().getContextClassLoader();
try {
   Properties prop = new Properties();
   prop.setProperty(Context.INITIAL_CONTEXT_FACTORY, "com.sap.engine.services.jndi.InitialContextFactoryImpl");
   prop.put(Context.SECURITY_AUTHENTICATION, "none");
   // Use the admin adapter's class loader for the lookup; restored in finally.
   Thread.currentThread().setContextClassLoader((com.sap.engine.services.adminadapter.interfaces.RemoteAdminInterface.class).getClassLoader());
   InitialContext mInitialContext = new InitialContext(prop);
   RemoteAdminInterface rai = (RemoteAdminInterface) mInitialContext.lookup("adminadapter");
   ClusterAdministrator cadm = rai.getClusterAdministrator();
   ConvenienceEngineAdministrator cea = rai.getConvenienceEngineAdministrator();
   int nodeId[] = cea.getClusterNodeIds();
   int dispatcherId = 0;
   String dispatcherIP = null;
   String p4Port = null;
   for (int i = 0; i < nodeId.length; i++) {
    // Only nodes of type 1 are collected (the variable names suggest these are dispatcher nodes).
    if (cea.getClusterNodeType(nodeId[i]) != 1)
     continue;
    Properties dispatcherProp = cadm.getNodeInfo(nodeId[i]);
    dispatcherIP = dispatcherProp.getProperty("Host", "localhost");
    p4Port = cea.getServiceProperty(nodeId[i], "p4", "port");
    // One entry per collected node: host, P4 port, and a slot filled in later by the reader.
    String[] loc = new String[3];
    loc[0] = dispatcherIP;
    loc[1] = p4Port;
    loc[2] = null;
    servers.add(loc);
   }
   mInitialContext.close();
} catch (NamingException e) {
   // Swallowed: a failed lookup leaves the list empty or partial.
} catch (RemoteException e) {
   // Swallowed: see above.
} finally {
   Thread.currentThread().setContextClassLoader(saveLoader);
}
return servers;
}
The retrieved server information is then used in another class:
public void run() {
ReadLogsSession readLogsSession;
int total = servers.size();
for (Iterator iter = servers.iterator(); iter.hasNext();) {
   // Note: iter.next() is only reached while keepAlive is true.
   if (keepAlive) {
    try {
     Thread.sleep(500);
    } catch (InterruptedException e) {
     status = status + e.getMessage();
     System.err.println("LogReader Thread Exception" + e.toString());
     e.printStackTrace();
    }
    String[] serverLocs = (String[]) iter.next();
    searchFilter.setDetails("[" + serverLocs[1] + "]");
    // Connect to the host and P4 port collected by getServers().
    Properties prop = new Properties();
    prop.put(Context.INITIAL_CONTEXT_FACTORY, "com.sap.engine.services.jndi.InitialContextFactoryImpl");
    prop.put(Context.PROVIDER_URL, serverLocs[0] + ":" + serverLocs[1]);
    System.err.println("LogReader run [" + serverLocs[0] + ":" + serverLocs[1] + "]");
    status = " Reading :[" + serverLocs[0] + ":" + serverLocs[1] + "] servers :[" + currentIndex + "/" + total + " ] ";
    prop.put("force_remote", "true");
    prop.put(Context.SECURITY_AUTHENTICATION, "none");
    try {
     Context ctx = new InitialContext(prop);
     Object ob = ctx.lookup("com.xom.sia.ReadLogsSession");
     ReadLogsSessionHome readLogsSessionHome = (ReadLogsSessionHome) PortableRemoteObject.narrow(ob, ReadLogsSessionHome.class);
     status = status + "Found ReadLogsSessionHome ["+readLogsSessionHome+"]";
     readLogsSession = readLogsSessionHome.create();
     if(readLogsSession!=null){
      status = status + " Created ["+readLogsSession+"]";
      List l = readLogsSession.getAuditLogs(searchFilter);
      // Record how many log records this server returned.
      serverLocs[2] = String.valueOf(l.size());
      status = status + serverLocs[2];
      allRecords.addAll(l);
     }else{
      status = status + " unable to create readLogsSession ";
     }
     ctx.close();
    } catch (NamingException e) {
     status = status + e.getMessage();
     System.err.println(e.getMessage());
     e.printStackTrace();
    } catch (CreateException e) {
     status = status + e.getMessage();
     System.err.println(e.getMessage());
     e.printStackTrace();
    } catch (IOException e) {
     status = status + e.getMessage();
     System.err.println(e.getMessage());
     e.printStackTrace();
    } catch (Exception e) {
     status = status + e.getMessage();
     System.err.println(e.getMessage());
     e.printStackTrace();
    }
   }
   currentIndex++;
}
jobComplete = true;
}
The application works for multiple server instances with a single cluster node each, but not in an environment with multiple cluster nodes per instance.
Does anybody know what should be changed to handle more cluster nodes?
Thanks,
Gergely

Thanks for the response.
I was afraid that it would be something like that, although I was hoping for something closer to the application pools we use with IIS to isolate sites and limit the impact one badly behaving site can have on another.
mmr
"Ian Skinner" <[email protected]> wrote in message
news:fe5u5v$pue$[email protected]..
> Run CF with one instance. Look at your processes and see how much memory the "JRun" process is using, multiply this by number of other CF instances.
>
> You are most likely going to end up implementing a "handful" of instances versus "dozens" of instances on all but the beefiest of servers.
>
> This can be affected by how much memory each instance uses. An application that puts major amounts of data into persistent scopes such as application and|or session will have a larger footprint than a leaner application that does not put much data into memory and|or leave it there for a very long time.
>
> I know the first time we made use of CF in its multi-home flavor, we went a bit overboard and created way too many. After nearly bringing a moderate server to its knees, we consolidated until we had three or four or so IIRC. A couple dedicated to each of our largest and most critical applications and a couple general instances that ran many smaller applications each.
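The sizing rule quoted above (measure one instance, then multiply) can be written down as a small calculation. This is only a sketch: the function names and every number below are hypothetical, and real instances will not all use the same amount of memory.

```python
def estimate_total_memory_mb(per_instance_mb, instance_count):
    """Rough total memory if every instance uses about the same amount."""
    return per_instance_mb * instance_count


def max_instances(available_mb, per_instance_mb, reserve_mb=0):
    """How many instances fit after keeping reserve_mb free for the OS and web server."""
    if per_instance_mb <= 0:
        raise ValueError("per_instance_mb must be positive")
    usable = available_mb - reserve_mb
    return max(usable // per_instance_mb, 0)


# Hypothetical figures: one JRun process measured at 512 MB on a 4096 MB server,
# with 1024 MB kept free for everything else.
print(estimate_total_memory_mb(512, 12))            # memory needed for a dozen instances
print(max_instances(4096, 512, reserve_mb=1024))    # instances that fit on this server
```

With these made-up figures a dozen instances would need 6144 MB, while only 6 fit on the server, which is the "handful versus dozens" point made in the quoted reply.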

O2cb_ctl : unable to load cluster configuration file while RAC setup.

OCFS2: unable to load cluster configuration file
Hi,
I installed OCFS2 successfully (all 3 RPMs installed without errors).
I am getting an error while running ocfs2console to do the cluster configuration.
I tried to do this manually using the command
"o2cb_ctl -C -n NODENAME -t node -a number=NODENUM -a ip_address=IPADDR -a ip_port=IPPORT -a cluster=CLUSTERNAME "
but got the error
"o2cb_ctl: Unable to access cluster service while creating node Could not add node "
Then I edited the cluster configuration file manually and copied it to the other nodes using scp. Then I tried the o2cb init script:
# /etc/init.d/o2cb offline ocfs2
# /etc/init.d/o2cb unload
# /etc/init.d/o2cb configure
Configuring the O2CB driver.
Next, I configure /etc/ocfs2/cluster.conf:
cluster:
node_count = 2
name = oracle
node:
ip_port = 7777
ip_address = 140.187.222.222
number = 1
name = ocvmrh2053
cluster = oracle
node:
ip_port = 7777
ip_address = 140.187.222.222
number = 2
name = ocvmrh2051
cluster = oracle
Next, I load the ocfs2 modules with the command o2cb load; all messages are OK.
Then I try to bring the ocfs2 cluster online (with o2cb online oracle), which prints this error message:
Starting cluster oracle: Failed
o2cb_ctl: Unable to load cluster configuration file "/etc/ocfs2/cluster.conf"
Stopping cluster oracle: Failed
o2cb_ctl: Unable to load cluster configuration file "/etc/ocfs2/cluster.conf"
I think both errors are due to the same issue. Please reply if anyone has an idea.
Thanks.
Can anyone please help me resolve this?
Message was edited by:
user596035

Please provide the output of this command from all OCFS2 nodes:
ls -l /etc/ocfs2/cluster.conf
Regards,
Martin
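Besides file permissions, the layout of cluster.conf itself is worth checking. The sketch below assumes the layout rules described in the OCFS2 documentation (stanza headers start in the first column and end with a colon, parameters start with a tab, a blank line separates stanzas) and also flags a repeated ip_address, as appears in the listing above, although that may simply be the result of anonymising the post. The function name check_cluster_conf and the sample addresses are hypothetical.

```python
def check_cluster_conf(text):
    """Return a list of human-readable problems found in cluster.conf text."""
    problems = []
    addresses = {}          # ip_address value -> line number where first seen
    previous_blank = True   # the start of the file counts as a stanza boundary
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            previous_blank = True
            continue
        if line.startswith("\t"):
            key, sep, value = line.strip().partition("=")
            if not sep:
                problems.append(f"line {number}: expected 'key = value'")
            elif key.strip() == "ip_address":
                first = addresses.setdefault(value.strip(), number)
                if first != number:
                    problems.append(
                        f"line {number}: ip_address {value.strip()} already used on line {first}"
                    )
        elif line.rstrip().endswith(":") and not line[0].isspace():
            if not previous_blank:
                problems.append(f"line {number}: stanza header not preceded by a blank line")
        else:
            problems.append(
                f"line {number}: expected a stanza header in column one or a tab-indented parameter"
            )
        previous_blank = False
    return problems


# Hypothetical, well-formed two-node file (addresses are documentation examples).
sample = (
    "cluster:\n"
    "\tnode_count = 2\n"
    "\tname = oracle\n"
    "\n"
    "node:\n"
    "\tip_port = 7777\n"
    "\tip_address = 192.0.2.10\n"
    "\tnumber = 1\n"
    "\tname = ocvmrh2053\n"
    "\tcluster = oracle\n"
    "\n"
    "node:\n"
    "\tip_port = 7777\n"
    "\tip_address = 192.0.2.11\n"
    "\tnumber = 2\n"
    "\tname = ocvmrh2051\n"
    "\tcluster = oracle\n"
)
print(check_cluster_conf(sample))
```

Running the same function on the real file from each node (for example with open("/etc/ocfs2/cluster.conf").read()) lists the lines that break these layout assumptions.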

Iscsi target on cluster node

Hi
I'm trying to install a cluster in a lab environment. I have two physical servers and would like to use them as cluster nodes. On one of these nodes I would like to install the iSCSI target server and use it to share disks to the cluster itself. Is this possible?
I ask because I did all the configuration, but after installing the cluster the iSCSI target server doesn't work anymore.
Thanks

Bad news: you cannot do it with Microsoft's built-in solutions, because you need physical shared storage to cluster the Microsoft iSCSI target. Something like on Robert Smit's blog here:
Clustering Microsoft iSCSI Target
https://robertsmit.wordpress.com/2012/06/26/clustering-iscsi-target-on-windows-2012-step-by-step/
...or here:
MSFT iSCSI Target in HA
https://technet.microsoft.com/en-us/library/gg232621(v=ws.10).aspx
...or a very detailed walkthrough here:
MSFT iSCSI Target in High Availability Mode
https://techontip.wordpress.com/2011/05/03/microsoft-iscsi-target-cluster-building-walkthrough/
Good news: you can take a third-party solution from various companies (below) and create HA iSCSI volumes on just a pair of nodes. See:
StarWind Virtual SAN
http://www.starwindsoftware.com/starwind-virtual-san-free
(this setup is FREE of charge, you just need to be an MCT, MVP or MCP to obtain your free 2-node key)
SteelEye also has a similar one here:
SteelEye #SANLess Clusters
http://us.sios.com/products/datakeeper-cluster/
DataCore SANsymphony-V
http://www.datacore.com/products/SANsymphony-V.aspx
You can also spawn VMs running FreeBSD/HAST or Linux/DRBD to build a very similar setup yourself. (Two-node setups should be Active-Passive to avoid split brain; the Windows solutions above all maintain their own pacemaker and heartbeats to run Active-Active on just a pair of nodes.)
Good luck and happy clustering :)
StarWind Virtual SAN clusters Hyper-V without SAS, Fibre Channel, SMB 3.0 or iSCSI, uses Ethernet to mirror internally mounted SATA disks between hosts.

Common memory place across the cluster nodes

Hi All,
I am a WebSphere Application Server v6.1 user. I am running an application that uses a HashMap to store common information as key-value pairs. The application works fine in a single-server environment but fails in a cluster environment. This happens because the HashMap contents are not available to the other cluster nodes, which run in different JVMs.
Could anybody suggest a good design in which I can use a common place to store the HashMap information, such as a queue, a database, or any common memory area that is available across the cluster nodes? I am not really familiar with the memory facilities offered by WebSphere. (A central database is my least preferred option, as the application makes several calls to the database, resulting in a deadlock and 100% CPU utilization.)
Also, values are added to the HashMap dynamically, so the shared store should allow me to add values at runtime.
Please suggest any other approach, or any links I can refer to, to achieve this.
Thanks in advance
-Sandeep
Message was edited by:
km-sandeep

For a similar scenario we maintain a version flag in the DB, based on which we reload the HashMap. I'm also interested in finding a design without a DB.
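The version-flag approach mentioned in this reply can be sketched roughly as follows. This is a minimal, single-threaded illustration written in Python rather than Java; the class name VersionedMapCache and the two callables (stand-ins for "read the version flag" and "read all rows" queries) are hypothetical, and the in-memory store below only simulates a database.

```python
class VersionedMapCache:
    """Local copy of a shared map, reloaded only when a version flag changes."""

    def __init__(self, read_version, read_all):
        self._read_version = read_version  # cheap call returning the current version
        self._read_all = read_all          # expensive call returning all key-value pairs
        self._version = None
        self._data = {}

    def get(self, key, default=None):
        current = self._read_version()
        if current != self._version:
            # The shared data changed since the last reload: refresh the local copy.
            self._data = dict(self._read_all())
            self._version = current
        return self._data.get(key, default)


# Simulated shared store; in a cluster each node would hold its own cache instance.
store = {"version": 1, "rows": {"colour": "red"}}
cache = VersionedMapCache(lambda: store["version"], lambda: store["rows"])

print(cache.get("colour"))          # first access loads the map
store["rows"] = {"colour": "blue"}  # data changed, but the version flag was not bumped
print(cache.get("colour"))          # still served from the local copy
store["version"] = 2                # writer bumps the flag after changing the data
print(cache.get("colour"))          # reload picks up the new value
```

Each lookup still costs one small version query, so this trades the full reload for a cheap check; whoever writes to the shared data has to bump the flag, otherwise the other nodes keep serving stale values.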
