Novell Cluster Services on VMware

I am currently running a 3-node cluster (OES2 SP2 Linux) connected to an EMC SAN. This cluster was commissioned, fully patched, etc. 250 days ago. It's been running smoothly and hasn't missed a beat :-) The nodes are bare-metal installs.
Now I've been tasked with investigating the possibility of virtualizing the nodes using VMware (and making use of vMotion).
This will give us the ability to have 6 virtual nodes rather than 3 physical nodes.
What is the general feeling of the community with regard to:
1. Virtualizing cluster nodes
2. How complex the setup is on VMware
3. Whether I could do a rolling migration from physical to virtual
Many thanks in advance for any comments, tips, advice, etc.

On 11.03.2011 14:06, laurabuckley wrote:
>
> Now I've been tasked with investigating the possibility of virtualizing
> the nodes using VMWare (and making use of VMotion).
>
> This will give us the ability to have 6 virtual nodes rather than 3
> physical nodes.
>
> What is the general feeling of the community with regards to:
>
> 1. Virtualizing cluster nodes
> 2. How complex is the setup on VMWare
> 3. Could I do a rolling migration from physical to virtual
There has been a lot of talk about this, and there are many things to consider. Here are at least a few:
If you have (or buy) VMware HA and vMotion, do you even need Novell
Cluster Services anymore?
If you want vMotion, you need to use VMDKs for storage, which might be an
issue for large volumes (or might not).
For large storage you could use RDM LUNs directly from the SAN. But then you
cannot use vMotion or run several nodes of the same cluster on one
physical server.
We are running multiple three-node OES2 NCS clusters on three VMware
servers, using RDM LUNs from an EMC SAN. We did a rolling migration from
physical NetWare clusters using the same LUNs.
-sk

Similar Messages

  • CFMX7 - Linux - Novell Cluster Services

    I have a 2 node cluster set up with Suse Linux and Novell
    Cluster Services. There is a cluster resource of apache that is set
    up in the cluster, as well as a shared volume as a cluster
    resource. The apache running as the cluster resource is using the
    shared volume for the web root home. Furthermore, apache is running
    individually on each node. Can anyone offer any opinions on setting
    up CFMX7 in this environment? We are not worried about load
    balancing, this is strictly needed for failover. My concerns are
    which install config I could be using, Individual Server,
    MultiServer, or J2EE? Can we use 1 CFADMIN for both nodes to
    minimize the overhead of keeping the configs in sync? During the
    install, can we use the cluster resource instance of apache to
    install to? My concern here is when we install it on the second
    node and point to the shared volume for the web root home, it's
    going to have a problem since it already exists? Any input is
    appreciated. Thanks.

  • Novell Cluster Services - help shape the roadmap

    Dear Community Members,
    We have an NCS survey running for two weeks. Take a look and give us your feedback. The survey link is at the bottom of the blog post.
    Important Notice
    Thanks,
    Glen

  • The Cluster Service function call 'ClusterResourceControl' failed with error code '1008(An attempt was made to reference a token that does not exist.)' while verifying the file path. Verify that your failover cluster is configured properly.

    I am experiencing this error in one of our cluster environments. Can anyone help me with this issue?
    The Cluster Service function call 'ClusterResourceControl' failed with error code '1008(An attempt was made to reference a token that does not exist.)' while verifying the file path. Verify that your failover cluster is configured properly.
    Thanks,
    Venu S.

    Hi Venu S,
    Based on my research, you might be encountering a known issue. Please try the hotfix in this KB:
    http://support.microsoft.com/kb/928385
    Meanwhile, since there is little information about this issue so far, please provide the following before we investigate further:
    The version of Windows Server you are using
    The result of SELECT @@VERSION
    The scenario when you get this error
    If anything is unclear, please let me know.
    Regards,
    Tom Li

  • SAP Cluster service issue

    Here is the description of the PRD cluster scenario (Windows 2008 + Oracle).
    We have 2 nodes:
    1. host-erpn01 (has ASCS, Database instance, Enqueue and Dialog
    Instance installed)
    2. host-erp02 (has Central Instance, Dialog Instance and Enqueue installed)
    When we move the "SAP SID" service from one node to another using the Failover Cluster Management tool, it fails, and we have to manually bring the "SAP SID cluster service" and "SAP SID cluster instance" online.
    Both the service and the instance came online after manual selection; however, after some time the SAP instances hosted on node 1 show a red cross in the MMC console on node 2 and report "cannot connect to sap service dcom interface error 800706BA".
    We replaced sapstartsrv.exe in the CI executable directory with the copy from the working directory of the ASCS instance.
    Now disp+work is stopped for the CI instance. Also, in the CI instance executable directory we can see five files named sapstartsrv, i.e.
    sapstartsrv.exe.new, sapstartsrv.exe.tmp, sapstartsrv.new, sapstartsrv.pdb and the actual sapstartsrv.exe file.
    Here is sapstartsrv.log from the CI work directory on node 2:
    trc file: "sapstartsrv.log", trc level: 0, release: "701"
    pid        1968
    Mon Oct 11 15:55:33 2010
    SAP HA Trace: Build in SAP Microsoft Cluster library '701, patch 32, changelist 1046543' initialized
    Initializing SAPControl Webservice
    SapSSLInit failed => https support disabled
    Starting WebService Named Pipe thread
    Starting WebService thread
    Webservice named pipe thread started, listening on port
    .\pipe\sapcontrol_01
    Webservice thread started, listening on port 50113
    GCCIA\csrvadmin is starting SAP System at 2010/10/11 16:09:07
    SAP HA Trace: FindClusterResource: SAP resource not found [sapwinha.cpp, line 334]
    SAP HA Trace: SAP_HA_FindSAPInstance returns: SAP_HA_NOT_CLUSTERED [sapwinha.cpp, line 907]"
    or you can view other logs from the work directory dump at
    http://s000.tinyupload.com/index.php?file_id=45384422007535688902
    Now when we try to start the SAPSID_00 service manually, it gives the error "The SAPSID_00 service failed to start due to the following error: The system cannot find the path specified."
    Please advise.
    Regards

    Hi Sunil,
    On node 1 there is no listener.trc in the /oracle_home/network/trace folder. Here is the listener.log file in case it is helpful.
    TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 10:37:37
    Copyright (c) 1991, 2007, Oracle.  All rights reserved.
    System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
    Log messages written to D:\oracle\GCP\102\network\log\listener.log
    Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
    Trace level is currently 0
    Started with pid=3116
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCP.WORLDipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCPipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=gccia-erpn01.gccia.com.sa)(PORT=1527)))
    Listener completed notification to CRS on start
    TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
    TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 11:59:37
    Copyright (c) 1991, 2007, Oracle.  All rights reserved.
    System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
    Log messages written to D:\oracle\GCP\102\network\log\listener.log
    Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
    Trace level is currently 0
    Started with pid=5036
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCP.WORLDipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCPipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
    Listener completed notification to CRS on start
    TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
    10-OCT-2010 12:00:31 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=60592)) * establish * GCP * 0
    10-OCT-2010 12:00:31 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=60593)) * establish * GCP * 0
    10-OCT-2010 12:00:31 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=60594)) * establish * GCP * 0
    10-OCT-2010 12:00:31 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=60595)) * establish * GCP * 0
    10-OCT-2010 12:00:31 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=60596)) * establish * GCP * 0
    10-OCT-2010 13:01:19 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61336)) * establish * GCP * 0
    10-OCT-2010 13:01:37 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61340)) * establish * GCP * 0
    10-OCT-2010 13:01:37 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61341)) * establish * GCP * 0
    10-OCT-2010 13:01:37 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61342)) * establish * GCP * 0
    10-OCT-2010 13:01:37 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61343)) * establish * GCP * 0
    10-OCT-2010 13:01:37 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61344)) * establish * GCP * 0
    10-OCT-2010 13:08:27 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61485)) * establish * GCP * 0
    10-OCT-2010 13:08:42 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61489)) * establish * GCP * 0
    10-OCT-2010 13:08:42 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61490)) * establish * GCP * 0
    10-OCT-2010 13:08:42 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61491)) * establish * GCP * 0
    10-OCT-2010 13:08:42 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61492)) * establish * GCP * 0
    10-OCT-2010 13:08:42 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61493)) * establish * GCP * 0
    TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 13:09:57
    Copyright (c) 1991, 2007, Oracle.  All rights reserved.
    System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
    Log messages written to D:\oracle\GCP\102\network\log\listener.log
    Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
    Trace level is currently 0
    Started with pid=2336
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCP.WORLDipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCPipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
    Listener completed notification to CRS on start
    TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
    TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 13:14:34
    Copyright (c) 1991, 2007, Oracle.  All rights reserved.
    System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
    Log messages written to D:\oracle\GCP\102\network\log\listener.log
    Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
    Trace level is currently 0
    Started with pid=4948
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCP.WORLDipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCPipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
    Listener completed notification to CRS on start
    TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
    TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 13:38:12
    Copyright (c) 1991, 2007, Oracle.  All rights reserved.
    System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
    Log messages written to D:\oracle\GCP\102\network\log\listener.log
    Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
    Trace level is currently 0
    Started with pid=2456
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCP.WORLDipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCPipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
    Listener completed notification to CRS on start
    TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
    TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 14:03:35
    Copyright (c) 1991, 2007, Oracle.  All rights reserved.
    System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
    Log messages written to D:\oracle\GCP\102\network\log\listener.log
    Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
    Trace level is currently 0
    Started with pid=2756
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCP.WORLDipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCPipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
    Listener completed notification to CRS on start
    TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
    TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 14:10:42
    Copyright (c) 1991, 2007, Oracle.  All rights reserved.
    System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
    Log messages written to D:\oracle\GCP\102\network\log\listener.log
    Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
    Trace level is currently 0
    Started with pid=4812
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCP.WORLDipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=
    .\pipe\GCPipc)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
    Listener completed notification to CRS on start
    TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
    TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 11-OCT-2010 09:34:05
    Copyright (c) 1991, 2007, Oracle.  All rights reserved.
    System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
    Log messages written to D:\oracle\GCP\102\network\log\listener.log
    Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
    Trace level is currently 0
    Started with pid=1920
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
    Listener completed notification to CRS on start
    TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
    TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 11-OCT-2010 21:12:29
    Copyright (c) 1991, 2007, Oracle.  All rights reserved.
    System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
    Log messages written to D:\oracle\GCP\102\network\log\listener.log
    Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
    Trace level is currently 0
    Started with pid=1952
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
    Listener completed notification to CRS on start
    TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE

  • The Cluster service is shutting down because quorum was lost

    Hi, we recently experienced the above issue and after looking for explanations I haven't been able to find any satisfying answers when other people have posted this issue.
    Our problem is as follows:
    2 node 2008R2 cluster running SQL 2012
    Each node is a HP BL460c running in a HP C7000 Blade Chassis.
    We were updating the flexfabric cards on one of the chassis.  The other chassis had been patched the previous week with no problems. 
    During the update process the flexfabric cards, which hold the Ethernet and FC connections, reboot so before work had begun all active cluster services had been failed over to the node in the chassis not being worked on.  However despite this the cluster
    service shut down on this one particular cluster.  All other clusters running across these 2 chassis continued to run as expected.
    As other people have posted before we saw the following errors in the system log.
    1564: File share witness resource 'File Share Witness' failed to arbitrate for the file share
    1069: Cluster resource 'File Share Witness' in clustered service or application 'Cluster Group' failed.
    1172: The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
    Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected
    such as hubs, switches, or bridges.
    However, we can't understand what could cause this to happen when the service is running on the node in the chassis not being updated, especially when the same update was performed the week before with no issues. How can both nodes lose connectivity
    to the File Share Witness at the same time?
    Cluster Validation tests run fine and don't highlight any issues.  The file share witness is accessible from both servers.
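
    As a rough illustration of the quorum arithmetic behind event 1172, here is a minimal Python sketch. It assumes a node-and-file-share-majority model in which each of the two nodes and the witness holds one vote (the thread mentions the witness but does not state the quorum mode). `has_quorum` is a hypothetical helper written for this example, not a Windows API.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Majority rule: strictly more than half of all configured votes must be reachable."""
    return 2 * votes_present > total_votes


# Assumption: node A + node B + file share witness = 3 votes.
TOTAL_VOTES = 3

# Both nodes can see each other but the witness is unreachable: 2 of 3 votes.
print(has_quorum(2, TOTAL_VOTES))  # True

# One node is cut off from its partner and also fails to arbitrate for the witness: 1 of 3 votes.
print(has_quorum(1, TOTAL_VOTES))  # False
```

    Under that assumption, a node that loses contact with its partner can only stay up if it wins the witness vote, so a failed arbitration (event 1564) at the same moment would be enough to stop the Cluster service on the surviving node. Whether that is what happened here is not established in the thread.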

    Hi,
    Please confirm that you have installed the recommended hotfixes and updates for Windows Server 2008 R2 SP1 failover clusters, especially the following hotfixes.
    The network location profile changes from "Domain" to "Public" in Windows 7 or in Windows Server 2008 R2
    http://support.microsoft.com/kb/2524478/EN-US
    A hotfix is available that adds two new cluster control codes to help you determine which cluster node is blocking a GUM update in Windows Server 2008 R2 and Windows Server 2012
    http://support.microsoft.com/kb/2779069/EN-US
    Hope this helps.

  • Cluster services UNKNOWN state

    Hi,
    I have a two-node cluster database and have some questions.
    If cluster services go into the UNKNOWN state on the first node, will existing connections fail over to the second node?
    Will new connections still try to connect to the first node?

    user2017273 wrote:
    > Hi,
    > I am having two node cluster database. I have some doubt
    Quit doubting and TEST it for yourself. Also, actually reading the documentation will help.
    > If cluster services will go UNKNOWN state in first node existing connection will failover to second node?
    Maybe...
    > New connections will try to connect first node?
    If nodex is down, any connection attempt should go to the remaining nodes.

  • Error in coherence-- stopping cluster service.

    I found the following error in one of my Coherence server log files. Can someone explain what it means?
    Coherence Logger@9272718 3.4.2/411 ERROR 2009-06-01 16:08:31.396/1217.130 Oracle Coherence GE 3.4.2/411 <Error> (thread=Cluster, member=3): Received cluster heartbeat from the senior Member(Id=7, Timestamp=2009-04-24 12:29:25.802, Address=xx.xxx.xx.xxx:8093, MachineId=55400, Location=machine:server72,process:11324, Role=WeblogicServer) that does not contain this Member(Id=3, Timestamp=2009-06-01 15:48:09.18, Address=xx.xxx.xxx.xx:8091, MachineId=47428, Location=site:ops.company.org,machine:cohserverbox1,process:14401, Role=CoherenceServer); stopping cluster service.
    Thanks Much

    Hi,
    This error essentially means what it says: the process received a cluster heartbeat that did not include the process as a member of the cluster. The process therefore stops its cluster service and will attempt to join the cluster again when appropriate. There are a few reasons the senior member may not have included the process in its heartbeat. Based on the timestamps and roles, I would first want to confirm the intent to cluster these processes. If the intent is not to cluster these processes, I would adjust their configurations appropriately (e.g. use a distinct port) to form separate clusters. If the intent is to cluster these processes and the error (with the timestamp spread) reproduces, I would want to examine the network topology and look for reasons the members are being dropped from the cluster.
    Regards,
    Harv

  • Configure the ADMIN and CLUSTER service connections to be SSL

    Can you configure the ADMIN and CLUSTER service connections to be SSL
    rather than tcp?
    I was wondering about the present or future ability to secure other
    connection services with SSL. Can you now or are there future plans
    to configure the ADMIN and CLUSTER service connections to be SSL
    rather than tcp? I suppose I should add the PORTMAPPER to that list.
    My primary interest is for an SSLCLUSTER service in the case where
    two brokers are connected over a non-trusted network. It may
    not be too difficult to secure all the services the same way, but
    perhaps that is on the TODO list.
    A related question is if there are plans to add SSL with client
    authentication as a stronger authentication mechanism than 'simple'
    username and password. I believe you could get the username from
    the client certificate's DN and continue to use the same LDAP user
    repository for access control. I think this is similar to the way
    that BEA's Weblogic server does it.
    Finally should it be possible to deploy the HTTP tunnel servlet to
    a webserver (such as iPlanet Web Server) configured to do SSL with
    client authentication as a work-around to get stronger authentication
    with the current release of the product? Or am I perhaps missing some
    obvious and important detail? :) I guess I would like to know it's been
    done already or is at least possible before I try and do it myself.

    3 scenarios involving SSL are:
    1: JMS client <------- SSL -------> iMQ broker
    2: iMQ admin <------- SSL -------> iMQ broker
    3: iMQ broker <------- SSL -------> iMQ broker (i.e clusters)
    (1) is currently supported in iMQ 2.0
    (2) and (3) are not supported in iMQ 2.0. There are no concrete plans yet to support
    them in the near future, but we'll definitely consider doing so if we
    hear a lot of demand for it.
    ]A related question is if there are plans to add SSL with client
    ]authentication as a stronger authentication mechanism than 'simple'
    ]username and password. I believe you could get the username from
    ]the client certificate's DN and continue to use the same LDAP user
    ]repository for access control. I think this is similar to the way
    ]that BEA's Weblogic server does it.
    This is on our todo list, but due to other more pressing issues we
    have not been able to address it. We will continue to keep it
    on our potential list of new features.
    Sorry if I sound pretty wishy-washy in my responses above, but the fact
    is that the things you mentioned above had to take a backseat
    to other more critical features. That and the usual time/resource
    constraints caused them not to be implemented.
    ]Finally should it be possible to deploy the HTTP tunnel servlet to
    ]a webserver (such as iPlanet Web Server) configured to do SSL with
    ]client authentication as a work-around to get stronger authentication
    ]with the current release of the product? Or am I perhaps missing some
    ]obvious and important detail? :) I guess I would like to know it's been
    ]done already or is at least possible before I try and do it myself.
    Yes, this should be possible (although I don't believe we've tried it here).
    The client authentication here is really only between the JMS client and the
    web server (not between the tunnel servlet and the iMQ broker) and should
    be similar in setup to any other java application talking to iPlanet Web
    Server.

  • Why virtual interfaces added to ManagementOS not visible to Cluster service?

    Hello All,
    I'm starting this new thread since the previous one was answered by our friend Udo. My problem, in short, is the following. A diagram will be enough to explain what I'm trying to achieve. I've set up this lab to learn Hyper-V clustering with 2 nodes, running Hyper-V
    Server 2012. Both nodes have 3x physical NICs; 1 in each node is dedicated to managing the node. The other two are used to create a NIC team. On top of that NIC team, a virtual switch is created with -AllowManagementOS $False. Next I created and added the following virtual interfaces to the host partition, and plugged them into the virtual switch created on top of the teamed interface. These virtual interfaces should serve the purpose of the various networks available.
    For the SAN I'm running a Linux VM which has an iSCSI target server, and the clustering service has no problem with that. All tests pass OK.
    The problem is that those virtual interfaces, once added to the hosts, do not appear as available networks
    to the cluster service; instead it only shows the management NIC as the available network to leverage.
    This is making it difficult to understand how to set up a cluster of 2x Hyper-V Server nodes. Can someone help please?
    Regards,
    Shahzad.

    Shahzad,
    I've read this thread a couple of times and I don't think I'm clear on the exact question you're asking.
    When the clustering service goes out to look for "Networks", what it does is scan the IP addresses on each node. Every time it finds an IP in a unique subnet, that subnet is listed as a network. It can't see virtual switches and doesn't care about
    virtual vs. teamed vs. physical adapters or anything like that. It's just looking at IP addresses. This is why I'm confused when you say, "it won't show virtual interfaces available as networks". "Networks" in this context are IP subnets.
    I'm not aware of any context where a singular interface would be treated like a network.
    If you've got virtual adapters attached to the management operating system
    and have assigned IPs to them, the cluster should have discovered those networks. If you have multiple adapters on the same node using IPs in the same subnet, that network will only appear once and the cluster service will only use
    one adapter from that subnet on that node. The one it picked will be visible on the "Network Connections" tab at the bottom of Failover Cluster Manager when you're on the Networks section.
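
    To make the "networks are IP subnets" point concrete, here is a small Python sketch that groups interface addresses by subnet, roughly the way the post describes discovery working. It is only an illustration of the idea, not how the cluster service is implemented. `group_by_subnet` is a hypothetical helper, and the addresses are a subset of those listed for this lab elsewhere in the thread.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(node_addresses):
    """Map each distinct IP subnet to the (node, address) pairs that fall inside it."""
    networks = defaultdict(list)
    for node, cidrs in node_addresses.items():
        for cidr in cidrs:
            iface = ipaddress.ip_interface(cidr)  # accepts host/prefix notation
            networks[iface.network].append((node, str(iface.ip)))
    return dict(networks)


# Management, cluster and SAN addresses as given in the thread.
nodes = {
    "Node1": ["131.107.0.50/24", "10.0.0.50/24", "10.1.1.50/8"],
    "Node2": ["131.107.0.150/24", "10.0.0.150/24", "10.1.1.150/8"],
}

networks = group_by_subnet(nodes)
for network, members in networks.items():
    print(network, members)

# Report subnets whose address ranges overlap with one another.
subnets = list(networks)
overlaps = [(a, b) for i, a in enumerate(subnets) for b in subnets[i + 1:] if a.overlaps(b)]
print("overlapping subnets:", overlaps)
```

    With these masks the /24 cluster subnet lies inside the /8 SAN subnet, so the two ranges overlap. The thread does not say whether that contributes to the problem.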
    Eric Siron Altaro Hyper-V Blog
    I am an independent blog contributor, not an Altaro employee. I am solely responsible for the content of my posts.
    "Every relationship you have is in worse shape than you think."
    Hello Eric and friends,
    Eric, I appreciate your interest in the issue, and yes, I agree with you when you said: "When the clustering service goes out to look for "Networks",
    what it does is scan the IP addresses on each node. Every time it finds an IP in a unique subnet, that subnet is listed as a network. It can't see virtual switches and doesn't care about virtual vs. teamed vs. physical adapters or anything like that. It's
    just looking at IP addresses. This is why I'm confused when you say, "it won't show virtual interfaces available as networks". "Networks" in this context are IP subnets. I'm not aware of any context where a singular interface would be treated
    like a network."
    By networks I meant subnets. Let me explain what I've configured so far:
    Node 1 & Node 2 are each installed with 3x NICs. All 3 NICs per node are plugged into the same switch.
    Node1:  131.107.0.50/24
    Node2:  131.107.0.150/24
    A Core Domain controller VM running on Node 1:   131.107.0.200/24 
    A JUMPBOX (WS 2012 R2 Std.) VM running on Node 1: 131.107.0.100/24
    A Linux SAN VM running on Node 2: 10.1.1.100/8 
    I planned to configure the following networks:
    (1) Cluster traffic:  10.0.0.50/24     (IP given to the virtual interface for Cluster traffic in Node1)
         Cluster traffic:  10.0.0.150/24   (IP given to the virtual interface for Cluster traffic in Node2)
    (2) SAN traffic:      10.1.1.50/8      (IP given to the virtual interface for SAN traffic in Node1)
         SAN traffic:      10.1.1.150/8    (IP given to the virtual interface for SAN traffic in Node2)
    Note: The cluster service has no problem accessing the SAN VM (10.1.1.100) over this network; it validates the SAN settings and comes back OK. This indicates that the virtual interface is working fine.
    (3) Migration traffic:   172.168.0.50/8     (IP given to the virtual interface for Migration traffic in Node1)
         Migration traffic:   172.168.0.150/8    (IP given to the virtual interface for Migration traffic in Node2)
    All these networks (virtual interfaces) are made available through two virtual switches that are configured identically on both Node1 and Node2.
    Now, after finishing the cluster validation steps (which all come back OK), when the Create Cluster wizard starts it shows only one network: that of the physical Layer 2 switch, 131.107.0.0/24.
    I wonder why it won't show the other networks (10.0.0.0/8, 10.1.1.0/8 and 172.168.0.0/8).
    Regards,
    Shahzad
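
A side note on the addressing plan in the post above: with the prefix lengths as written, some of the planned subnets contain one another, which may be worth ruling out. The Python sketch below only does the subnet arithmetic with the standard `ipaddress` module, using the Node1 addresses from the post; the labels and function name are made up, and it makes no claim about how the cluster service treats overlapping subnets.

```python
import ipaddress
from itertools import combinations

# Node1 addresses exactly as listed in the post above (labels are illustrative)
planned = {
    "Management": "131.107.0.50/24",
    "Cluster": "10.0.0.50/24",
    "SAN": "10.1.1.50/8",
    "Migration": "172.168.0.50/8",
}

def overlapping_pairs(plan):
    """Return pairs of labels whose subnets overlap."""
    nets = {label: ipaddress.ip_interface(cidr).network for label, cidr in plan.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]

# The subnet each address actually belongs to under its stated mask
subnets = {label: str(ipaddress.ip_interface(c).network) for label, c in planned.items()}
print(subnets)
print(overlapping_pairs(planned))
```

Under a /8 mask, 10.1.1.50 belongs to 10.0.0.0/8, which also contains 10.0.0.0/24, and 172.168.0.50 belongs to 172.0.0.0/8.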

  • DAG issue - Unable to start cluster service

    Hello,
    Let me briefly describe my environment.
    - 2 Sites
    - 1 DAG, 4 Servers
    - 2 Servers each site
    Situation:
    I have just updated the OS on all the servers in the DAG and updated all the Exchange versions to SP3 RU3. One of the servers in the DAG/cluster is down/unavailable. I found that the cluster service on that server is disabled and cannot be started.
    Errors:
    Event ID:
    - 7024 - The Cluster Service service terminated with service-specific error. The system cannot find the file specified.
    - 7031 -
    - in to ensure that this machine is a member of a cluster. If you intend to add this machine to an existing cluster use the Add Node Wizard. Alternatively, if this machine has been configured as a member of a cluster, it will be necessary to restore the
    missing configuration data that is necessary for the Cluster Service to identify that it is a member of a cluster. Perform a System State Restore of this machine in order to restore the configuration data.
    Please help meee....

    Hi,
    From the error you provided above, it seems that a node (the one server in the DAG you mentioned) no longer belongs to the existing cluster. I recommend rejoining this node to the cluster: first clear its cluster configuration with the
    cluster node nodename /forcecleanup command, then add the node back and restart the cluster service.
    Best regards,
    Belinda
    Belinda Ma
    TechNet Community Support

  • How do I restart Cluster services?

    Can someone tell me how to restart Cluster Services?
    Name            Type         Target  State    Host
    ora....DB1.srv  application  ONLINE  OFFLINE
    ora....MSDB.cs  application  ONLINE  OFFLINE
    ora....B1.inst  application  ONLINE  ONLINE   fms-db1
    ora....B2.inst  application  ONLINE  ONLINE   fms-db2
    ora.FMSDB.db    application  ONLINE  ONLINE   fms-db2
    ora....B1.lsnr  application  ONLINE  ONLINE   fms-db1
    ora....db1.gsd  application  ONLINE  OFFLINE
    ora....db1.ons  application  ONLINE  ONLINE   fms-db1
    ora....db1.vip  application  ONLINE  ONLINE   fms-db1
    ora....B2.lsnr  application  ONLINE  ONLINE   fms-db2
    ora....db2.gsd  application  ONLINE  OFFLINE
    ora....db2.ons  application  ONLINE  ONLINE   fms-db2
    ora....db2.vip  application  ONLINE  ONLINE   fms-db2
    ????

    What do you mean by Cluster Service?
    If you mean Oracle Cluster:
    1. You must be the root user.
    2. Use the crsctl command line:
    ./crsctl stop crs
    ./crsctl start crs
    Your database and listener are registered in Oracle Cluster, so they will go down as well.
    If you mean the database service, you can use the srvctl command line to help you.
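
Before and after a restart it can help to see which resources are targeted ONLINE but not actually running. The following is a small Python sketch that picks those rows out of a pasted status listing; it assumes the whitespace-separated Name/Type/Target/State/Host layout shown in the question, and the function name is made up for illustration.

```python
def offline_resources(listing):
    """Return names of resources whose target is ONLINE but whose state is not."""
    offline = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Expected columns: Name Type Target State [Host]; skip the header row
        if len(fields) < 4 or fields[0] == "Name":
            continue
        name, _type, target, state = fields[:4]
        if target == "ONLINE" and state != "ONLINE":
            offline.append(name)
    return offline

# A few rows copied from the listing in the question
sample = """
Name Type Target State Host
ora....DB1.srv application ONLINE OFFLINE
ora....B1.inst application ONLINE ONLINE fms-db1
ora....db1.gsd application ONLINE OFFLINE
ora....db1.vip application ONLINE ONLINE fms-db1
"""
print(offline_resources(sample))
```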

  • Cluster Service 1146 & 1230 event id

    Dear Team,
    I am facing a cluster problem on Server 2012 R2: it is showing error event IDs 1146 & 1230.
    I am not able to start my cluster service and my production is totally down. Please help.
    Here is the log at this link:
    https://onedrive.live.com/redir?resid=4A228E11EF76B735!193&authkey=!AKCOUxUeE4FEu8A&ithint=file%2ctxt
    Ravi Tandon
    8400414038

    Hi,
    The log is incomplete. The error 1146 or 1230 is not included in the log file you uploaded.
    According to my search result, error 1146 & 1230 could be caused by dll crash issue. You can search in your local log file to see if you can find such entry:
    Error server.domain.com 1230 Microsoft-Windows-FailoverClustering   Cluster resource 'AA_BBBB' (resource type '', DLL 'XXXXX.dll') either crashed or deadlocked. 
    If so, search for the DLL file to see if you can find any detailed information. Sometimes it belongs to a third-party application, and you can try uninstalling it to see the result. Or, if it belongs to a role or service, you can try to repair or reinstall it.
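As a rough way to run that search over an exported log, here is a Python sketch that extracts the resource name and DLL from entries shaped like the sample quoted above. The regular expression is modelled only on that one sample line, so real messages may need a different pattern; the function and variable names are hypothetical.

```python
import re

# Pattern modelled on the sample entry quoted above; real messages may differ
EVENT_1230 = re.compile(
    r"Cluster resource '(?P<resource>[^']*)' "
    r"\(resource type '(?P<rtype>[^']*)', DLL '(?P<dll>[^']*)'\)"
)

def find_resource_dlls(log_text):
    """Return (resource, dll) pairs for each matching entry in log_text."""
    return [(m.group("resource"), m.group("dll")) for m in EVENT_1230.finditer(log_text)]

# The placeholder entry from the reply above, used as test input
sample = (
    "Error server.domain.com 1230 Microsoft-Windows-FailoverClustering   "
    "Cluster resource 'AA_BBBB' (resource type '', DLL 'XXXXX.dll') "
    "either crashed or deadlocked."
)
print(find_resource_dlls(sample))
```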
    And as Tim said, analyzing logs on the TechNet forum is a little difficult, as log files are large and almost all log files contain company information. You can submit a case to Microsoft for a more efficient response.

  • Cluster Service Monitoring - Is There An Alert When a Volume is Available?

    We've seen some alerts that show a shared volume is no longer available. They look something like this:
    Alert: Shared Volume IO is paused
    Source: Cluster Service
    Path: Host.domain.com
    Last modified by: System
    Last modified time: 2/14/2011 7:16:10 AM
    Alert description: Cluster Shared Volume 'Volume4' ('Exchange Mail Data') is no longer available on this node because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.
    We're wondering if there is a way to generate an alert that tells us the volume is available again.
    Orange County District Attorney

    Hi,
    Based on my research, this monitor is based on the Cluster Shared Volume related Events:
    Event Log Rules
    http://technet.microsoft.com/en-us/library/dd491018.aspx
    Please also see the Events listed:
    Cluster Shared Volume Functionality
    http://technet.microsoft.com/en-us/library/ee830309(WS.10).aspx
    However, I could not find an event that means “cluster shared volume is available again”; therefore, I suspect this cannot be monitored from the event log.
    In addition, I noticed that the status of a cluster shared volume can be queried with a PowerShell cmdlet. I hope this gives you some hints:
    Get-ClusterSharedVolume
    http://technet.microsoft.com/en-us/library/ee460981.aspx
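One possible way to turn such a query into an "available again" alert is to poll periodically and compare consecutive results. The Python sketch below shows only that comparison step; how the states are collected is left out, and the function name, the snapshot dictionaries, and the "Online"/"Offline" state strings are assumptions for illustration.

```python
def recovered_volumes(previous, current, healthy="Online"):
    """Return names of volumes that were not healthy before but are now.

    Volumes missing from `previous` are treated as already healthy,
    so a newly added volume does not trigger a recovery alert.
    """
    return sorted(
        name
        for name, state in current.items()
        if state == healthy and previous.get(name, healthy) != healthy
    )

# Hypothetical snapshots from two consecutive polls (volume name -> state)
before = {"Volume4": "Offline", "Volume5": "Online"}
after = {"Volume4": "Online", "Volume5": "Online"}
print(recovered_volumes(before, after))
```

Each name returned would be a candidate for a "volume is available again" notification.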
    Thanks.
    Nicholas Li - MSFT

  • Windows could not start the Cluster Service on Local computer. For more information, review the System Event Log. If this is a non-Microsoft service, contact the service vendor, and refer to service-specific error code 2.

    Dear Technet,
    Windows could not start the Cluster Service on Local computer. For more information, review the System Event Log. If this is a non-Microsoft service, contact the service vendor, and refer to service-specific error code 2.
    My cluster suddenly disappeared, so I tried to restart the cluster service. When I try to restart the service, the error above comes up.
    I even tried to remove the cluster through PowerShell, but that failed because the cluster service is not running.
    Please help. Thank you.
    Regards
    Shamil

    Hi,
    Could you confirm which account you use when you start the cluster service? The Cluster service is a service that requires a domain user account.
    The server cluster Setup program changes the local security policy for this account by granting a set of user rights to the account. Additionally, this account is made a member
    of the local Administrators group.
    If one or more of these user rights are missing, the Cluster service may stop immediately during startup or later, depending on when the Cluster service requires the particular
    user right.
    Hope this helps.

Maybe you are looking for