Switching resource group in 2 node cluster fails

Hi,
I configured a 2-node cluster to provide high availability for my Oracle DB 9.2.0.7.
I created a resource group named oracleha-rg,
and then created the following resources in it:
oraclelh-rs for the logical hostname
hastp-rs for the HA storage resource
oracle-server-rs for the Oracle server resource
and listener-rs for the listener
Whenever I try to switch the resource group between nodes, it gives me the following in dmesg:
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hafoip_stop> for resource <oraclelh-rs>, resource group <oracleha-rg>, node <DB1>, timeout <300> seconds
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource oraclelh-rs status on node DB1 change to R_FM_UNKNOWN
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource oraclelh-rs status msg on node DB1 change to <Stopping>
Feb  6 16:17:49 DB1 ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 010.050.033.009:0, remote = 000.000.000.000:0, start = -2, end = 6
Feb  6 16:17:49 DB1 ip: [ID 302654 kern.notice] TCP_IOC_ABORT_CONN: aborted 0 connection
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource oraclelh-rs status on node DB1 change to R_FM_OFFLINE
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource oraclelh-rs status msg on node DB1 change to <LogicalHostname offline.>
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hafoip_stop> completed successfully for resource <oraclelh-rs>, resource group <oracleha-rg>, node <DB1>, time used: 0% of timeout <300 seconds>
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource oraclelh-rs state on node DB1 change to R_OFFLINE
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <hastp-rs>, resource group <oracleha-rg>, node <DB1>, timeout <1800> seconds
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource hastp-rs status on node DB1 change to R_FM_UNKNOWN
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource hastp-rs status msg on node DB1 change to <Stopping>
Feb  6 16:17:49 DB1 SC[,SUNW.HAStoragePlus:8,oracleha-rg,hastp-rs,hastorageplus_postnet_stop]: [ID 843127 daemon.warning] Extension properties FilesystemMountPoints and GlobalDevicePaths and Zpools are empty.
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_postnet_stop> completed successfully for resource <hastp-rs>, resource group <oracleha-rg>, node <DB1>, time used: 0% of timeout <1800 seconds>
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource hastp-rs state on node DB1 change to R_OFFLINE
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource hastp-rs status on node DB1 change to R_FM_OFFLINE
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource hastp-rs status msg on node DB1 change to <>
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group oracleha-rg state on node DB1 change to RG_OFFLINE_START_FAILED
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group oracleha-rg state on node DB1 change to RG_OFFLINE
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 447451 daemon.notice] Not attempting to start resource group <oracleha-rg> on node <DB1> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 447451 daemon.notice] Not attempting to start resource group <oracleha-rg> on node <DB2> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds
Feb  6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 674214 daemon.notice] rebalance: no primary node is currently found for resource group <oracleha-rg>.
Feb  6 16:19:08 DB1 Cluster.RGM.global.rgmd: [ID 603096 daemon.notice] resource hastp-rs disabled.
Feb  6 16:19:17 DB1 Cluster.RGM.global.rgmd: [ID 603096 daemon.notice] resource oraclelh-rs disabled.
Feb  6 16:19:22 DB1 Cluster.RGM.global.rgmd: [ID 603096 daemon.notice] resource oracle-rs disabled.
Feb  6 16:19:27 DB1 Cluster.RGM.global.rgmd: [ID 603096 daemon.notice] resource listener-rs disabled.
Feb  6 16:19:51 DB1 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group oracleha-rg state on node DB1 change to RG_OFF_PENDING_METHODS
Feb  6 16:19:51 DB1 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group oracleha-rg state on node DB2 change to RG_OFF_PENDING_METHODS
Feb  6 16:19:51 DB1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/oracle_listener_fini> for resource <listener-rs>, resource group <oracleha-rg>, node <DB1>, timeout <30> seconds
Feb  6 16:19:51 DB1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/oracle_listener_fini> completed successfully for resource <listener-rs>, resource group <oracleha-rg>, node <DB1>, time used: 0% of timeout <30 seconds>
Feb  6 16:19:51 DB1 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group oracleha-rg state on node DB1 change to RG_OFFLINE
Feb  6 16:19:51 DB1 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group oracleha-rg state on node DB2 change to RG_OFFLINE
and the resource group fails to switch.
Any help, please?
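
With this much rgmd output it can help to pull out only the lines that signal a problem. A minimal sketch, assuming a POSIX awk and a log in the syslog format shown above; the function name `rgm_errors` is hypothetical, and the default path /var/adm/messages is an assumption:

```shell
#!/bin/sh
# rgm_errors (hypothetical helper): print only the Cluster.RGM lines that
# indicate trouble in a syslog-format file.
# Assumption: the log lives in /var/adm/messages unless a path is given.
rgm_errors() {
    logfile="${1:-/var/adm/messages}"
    # Keep daemon.error entries, START_FAILED state changes, and the notices
    # that explain why a start was not attempted.
    awk '/Cluster\.RGM/ && (/daemon\.error/ || /START_FAILED/ || /Not attempting to start/)' "$logfile"
}
```

Run against the log above, this would leave the RG_OFFLINE_START_FAILED transition and the two "Not attempting to start" notices. The 3600 seconds in those notices appears to correspond to the resource group's Pingpong_interval property.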

Hi,
this forum is for Oracle Clusterware, not Solaris Cluster. You probably should close this thread and open your question in the corresponding Solaris Cluster forum, to get help.
Regards
Sebastian

Similar Messages

  • Rs-ora:resource group failed to start on chosen node; it may end up failing

    I have configured a two-node failover cluster environment using Netra A/D 1000 storage. When I try to deploy the Oracle server application, it throws the following error:
    rs-ora: resource group failed to start on chosen node; it may end up failing over to other node(s)
    I created a metaset and added one raw DID disk to it.
    I created a logical hostname resource and an HAStoragePlus resource. Then I brought the resource group online using the following command:
    # clrg online -emM rg-ora
    Then I created the Oracle server resource using the following command:
    #clrs create -g rg-ora -t SUNW.oracle_server -p ORACLE_HOME=/global/oracle/product/10.2.0/db_1 -p ORACLE_SID=infra -p Alert_log_file=/global/oracle/product/10.2.0/db_1/admin/infra/bdump/alert_infra.log -p Connect_string=sysdba/dbadmin1@infra -p Resource_dependencies=rs-ora-has rs-ora
    node1 - Validation failed. ORACLE_HOME /global/oracle/product/10.2.0/db_1 does not exist
    node1 - ALERT_LOG_FILE /global/oracle/product/10.2.0/db_1/admin/infra/bdump/alert_infra.log doesn't exist
    node1 - PARAMETER_FILE: /global/oracle/product/10.2.0/db_1/dbs/initinfra.ora nor server PARAMETER_FILE: /global/oracle/product/10.2.0/db_1/dbs/spfileinfra.ora exists
    node1 - This resource depends on a HAStoragePlus resouce that is not online on this node. Ignoring validation errors.
    rs-ora: resource group failed to start on chosen node; it may end up failing over to other node(s)
    The status of the Oracle resource is as follows:
    Resource Name   Node Name   State          Status Message
    rs-ora          node1       Start failed   Faulted
    I am using Solaris 10 update 6 (patch level Generic_137137-09), Oracle version 10.2.0, and Sun Cluster 3.2 update 1. Below are the vfstab entries and /var/adm/messages from both nodes.
    Node1#grep ora /etc/vfstab
    /dev/md/oradg/dsk/d300 /dev/md/oradg/rdsk/d300 /global/oracle ufs 5 no logging
    Node2#grep ora /etc/vfstab
    /dev/md/oradg/dsk/d300 /dev/md/oradg/rdsk/d300 /global/oracle ufs 5 no logging
    Node1#more /var/adm/messages
    Oct 17 05:19:17 node1 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_prenet_start> for resource <ha-
    host-1>, resource group <rg-ora>, node <node1>, timeout <300> seconds
    Oct 17 05:19:17 node1 Cluster.RGM.rgmd: [ID 751138 daemon.notice] 47 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/
    lib/rgm/rt/hafoip/hafoip_prenet_start>:tag=<rg-ora.ha-host-1.10>: Calling security_clnt_connect(..., host=<node1>, sec_typ
    e {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Oct 17 05:19:17 node1 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_prenet_start> completed successfully for
    resource <ha-host-1>, resource group <rg-ora>, node <node1>, time used: 0% of timeout <300 seconds>
    Oct 17 05:19:17 node1 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_prenet_start> for resour
    ce <rs-ora-has>, resource group <rg-ora>, node <node1>, timeout <1800> seconds
    Oct 17 05:19:17 node1 Cluster.RGM.rgmd: [ID 751138 daemon.notice] 47 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/
    lib/rgm/rt/hastorageplus/hastorageplus_prenet_start>:tag=<rg-ora.rs-ora-has.10>: Calling security_clnt_connect(..., host=<tes
    tlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Oct 17 05:19:18 node1 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<2>:cmd=<null>:tag=<rg-
    ora.rs-ora-has.10>: Calling security_clnt_connect(..., host=<node1>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<0>, ...)
    Oct 17 05:19:18 node1 Cluster.RGM.rgmd: [ID 316625 daemon.notice] Timeout monitoring on method tag <rg-ora.rs-ora-has.10>
    has been suspended.
    Oct 17 05:19:20 node1 Cluster.Framework: [ID 801593 daemon.notice] stdout: becoming primary for oradg
    Oct 17 05:19:21 node1 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<3>:cmd=<null>:tag=<rg-
    ora.rs-ora-has.10>: Calling security_clnt_connect(..., host=<node1>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<0>, ...)
    Oct 17 05:19:21 node1 Cluster.RGM.rgmd: [ID 316625 daemon.notice] Timeout monitoring on method tag <rg-ora.rs-ora-has.10>
    has been resumed.
    Oct 17 05:19:25 node1 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hastorageplus_prenet_start> completed successful
    ly for resource <rs-ora-has>, resource group <rg-ora>, node <node1>, time used: 0% of timeout <1800 seconds>
    Oct 17 05:19:25 node1 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_start> for resource <ha-host-1>
    , resource group <rg-ora>, node <node1>, timeout <500> seconds
    Oct 17 05:19:25 node1 Cluster.RGM.rgmd: [ID 751138 daemon.notice] 47 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/
    lib/rgm/rt/hafoip/hafoip_start>:tag=<rg-ora.ha-host-1.0>: Calling security_clnt_connect(..., host=<node1>, sec_type {0:WEA
    K, 1:STRONG, 2:DES} =<1>, ...)
    Oct 17 05:19:25 node1 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_start> completed successfully for resourc
    e <ha-host-1>, resource group <rg-ora>, node <node1>, time used: 0% of timeout <500 seconds>
    Oct 17 05:19:25 node1 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_monitor_start> for resource <ha
    -host-1>, resource group <rg-ora>, node <node1>, timeout <300> seconds
    Oct 17 05:19:25 node1 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_start> for resource <rs-
    ora-has>, resource group <rg-ora>, node <node1>, timeout <90> seconds
    Oct 17 05:19:25 node1 Cluster.RGM.rgmd: [ID 510020 daemon.notice] 46 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/
    lib/rgm/rt/hafoip/hafoip_monitor_start>:tag=<rg-ora.ha-host-1.7>: Calling security_clnt_connect(..., host=<node1>, sec_typ
    e {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Oct 17 05:19:25 node1 Cluster.RGM.rgmd: [ID 751138 daemon.notice] 47 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/
    lib/rgm/rt/hastorageplus/hastorageplus_start>:tag=<rg-ora.rs-ora-has.0>: Calling security_clnt_connect(..., host=<node1>,
    sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Oct 17 05:19:25 node1 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_monitor_start> completed successfully for
    resource <ha-host-1>, resource group <rg-ora>, node <node1>, time used: 0% of timeout <300 seconds>
    Oct 17 05:19:25 node1 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hastorageplus_start> completed successfully for
    resource <rs-ora-has>, resource group <rg-ora>, node <node1>, time used: 0% of timeout <90 seconds>
    Oct 17 05:19:25 node1 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_monitor_start> for resou
    rce <rs-ora-has>, resource group <rg-ora>, node <node1>, timeout <90> seconds
    Oct 17 05:19:25 node1 Cluster.RGM.rgmd: [ID 751138 daemon.notice] 47 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/
    lib/rgm/rt/hastorageplus/hastorageplus_monitor_start>:tag=<rg-ora.rs-ora-has.7>: Calling security_clnt_connect(..., host=<tes
    tlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Oct 17 05:19:25 node1 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hastorageplus_monitor_start> completed successfu
    lly for resource <rs-ora-has>, resource group <rg-ora>, node <node1>, time used: 0% of timeout <90 seconds>
    Oct 17 05:19:38 node1 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <bin/oracle_server_validate> for resour
    ce <rs-ora>, resource group <rg-ora>, node <node1>, timeout <120> seconds

    Oct 17 05:19:38 node1 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<1>:cmd=</opt/SUNWscor/
    oracle_server/bin/oracle_server_validate>:tag=<rg-ora.rs-ora.2>: Calling security_clnt_connect(..., host=<node1>, sec_type
    {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Oct 17 05:19:38 node1 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <bin/oracle_server_validate> completed successful
    ly for resource <rs-ora>, resource group <rg-ora>, node <node1>, time used: 0% of timeout <120 seconds>
    Oct 17 05:19:38 node1 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <bin/oracle_server_init> for resource <
    rs-ora>, resource group <rg-ora>, node <node1>, timeout <30> seconds
    Oct 17 05:19:38 node1 Cluster.RGM.rgmd: [ID 751138 daemon.notice] 47 fe_rpc_command: cmd_type(enum):<1>:cmd=</opt/SUNWscor
    /oracle_server/bin/oracle_server_init>:tag=<rg-ora.rs-ora.4>: Calling security_clnt_connect(..., host=<node1>, sec_type {0
    :WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Oct 17 05:19:38 node1 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <bin/oracle_server_init> completed successfully f
    or resource <rs-ora>, resource group <rg-ora>, node <node1>, time used: 0% of timeout <30 seconds>
    Oct 17 05:19:38 node1 Cluster.CCR: [ID 973933 daemon.notice] resource rs-ora added.
    Oct 17 05:19:39 node1 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <bin/oracle_server_start> for resource
    <rs-ora>, resource group <rg-ora>, node <node1>, timeout <600> seconds
    Oct 17 05:19:39 node1 Cluster.RGM.rgmd: [ID 751138 daemon.notice] 47 fe_rpc_command: cmd_type(enum):<1>:cmd=</opt/SUNWscor
    /oracle_server/bin/oracle_server_start>:tag=<rg-ora.rs-ora.0>: Calling security_clnt_connect(..., host=<node1>, sec_type {
    0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Oct 17 05:19:48 node1 SC[SUNWscor.oracle_server.start]:rg-ora:rs-ora: [ID 876834 daemon.error] Could not start server
    Oct 17 05:19:48 node1 Cluster.RGM.rgmd: [ID 938318 daemon.error] Method <bin/oracle_server_start> failed on resource <rs-o
    ra> in resource group <rg-ora> [exit code <1>, time used: 1% of timeout <600 seconds>]
    Node2# more /var/adm/messages
    Oct 14 20:20:04 node2 Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group rg-ora state on node node2 change to RG_PENDIN
    G_OFFLINE
    Oct 14 20:20:04 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource rs-ora-has state on node node2 change to R_MON_STOPP
    ING
    Oct 14 20:20:04 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource ha-host-1 state on node node2 change to R_MON_STOPPI
    NG
    Oct 14 20:20:04 node2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_monitor_stop> for resource <ha-host
    -1>, resource group <rg-ora>, node <node2>, timeout <300> seconds
    Oct 14 20:20:04 node2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_monitor_stop> for resource <
    rs-ora-has>, resource group <rg-ora>, node <node2>, timeout <90> seconds
    Oct 14 20:20:04 node2 Cluster.RGM.rgmd: [ID 268902 daemon.notice] 45 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/
    rgm/rt/hafoip/hafoip_monitor_stop>:tag=<rg-ora.ha-host-1.8>: Calling security_clnt_connect(..., host=<node2>, sec_type {0:WEAK
    , 1:STRONG, 2:DES} =<1>, ...)
    Oct 14 20:20:04 node2 Cluster.RGM.rgmd: [ID 510020 daemon.notice] 46 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/
    rgm/rt/hastorageplus/hastorageplus_monitor_stop>:tag=<rg-ora.rs-ora-has.8>: Calling security_clnt_connect(..., host=<node2>, s
    ec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hastorageplus_monitor_stop> completed successfully f
    or resource <rs-ora-has>, resource group <rg-ora>, node <node2>, time used: 0% of timeout <90 seconds>
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource rs-ora-has state on node node2 change to R_ONLINE_UN
    MON
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource rs-ora-has state on node node2 change to R_STOPPING
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource rs-ora-has status on node node2 change to R_FM_UNKNO
    WN
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource rs-ora-has status msg on node node2 change to <Stopp
    ing>
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_stop> for resource <rs-ora-h
    as>, resource group <rg-ora>, node <node2>, timeout <1800> seconds
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 510020 daemon.notice] 46 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/
    rgm/rt/hastorageplus/hastorageplus_stop>:tag=<rg-ora.rs-ora-has.1>: Calling security_clnt_connect(..., host=<node2>, sec_type
    {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_monitor_stop> completed successfully for reso
    urce <ha-host-1>, resource group <rg-ora>, node <node2>, time used: 0% of timeout <300 seconds>
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource ha-host-1 state on node node2 change to R_ONLINE_UNM
    ON
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hastorageplus_stop> completed successfully for resou
    rce <rs-ora-has>, resource group <rg-ora>, node <node2>, time used: 0% of timeout <1800 seconds>
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource rs-ora-has state on node node2 change to R_STOPPED
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource ha-host-1 state on node node2 change to R_STOPPING
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_stop> for resource <ha-host-1>, res
    ource group <rg-ora>, node <node2>, timeout <300> seconds
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource ha-host-1 status on node node2 change to R_FM_UNKNOW
    N
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource ha-host-1 status msg on node node2 change to <Stoppi
    ng>
    Oct 14 20:20:05 node2 Cluster.RGM.rgmd: [ID 510020 daemon.notice] 46 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/
    rgm/rt/hafoip/hafoip_stop>:tag=<rg-ora.ha-host-1.1>: Calling security_clnt_connect(..., host=<node2>, sec_type {0:WEAK, 1:STRO
    NG, 2:DES} =<1>, ...)
    Oct 14 20:20:06 node2 ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 192.168.032.244:0, remote = 000.000.000.000:0, s
    tart = -2, end = 6
    Oct 14 20:20:06 node2 ip: [ID 302654 kern.notice] TCP_IOC_ABORT_CONN: aborted 0 connection
    Oct 14 20:20:06 node2 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource ha-host-1 status on node node2 change to R_FM_OFFLIN
    E
    Oct 14 20:20:06 node2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource ha-host-1 status msg on node node2 change to <Logica
    lHostname offline.>
    Oct 14 20:20:06 node2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_stop> completed successfully for resource <ha
    -host-1>, resource group <rg-ora>, node <node2>, time used: 0% of timeout <300 seconds>
    Oct 14 20:20:06 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource ha-host-1 state on node node2 change to R_OFFLINE
    Oct 14 20:20:06 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource rs-ora-has state on node node2 change to R_POSTNET_S
    TOPPING

  • Can not start messaging server resource group in cluster 3.2

    Hi all,
    Please help with the following issue.
    I am not able to start the resource group (msg-rg); the error is as follows:
    ms1@root# clrg online -M -e msg-rg
    clrg: (C748634) Resource group msg-rg failed to start on chosen node and might fail over to other node(s)
    clrg: (C135343) No primary node could be found for resource group msg-rg; it remains offline
    scstat output (some lines removed for brevity):
    -- Device Group Servers --
    Device Group Primary Secondary
    Device group servers: SJMS ms1 ms2
    -- Device Group Status --
    Device Group Status
    Device group status: SJMS Online
    -- Resource Groups and Resources --
    Group Name Resources
    Resources: msg-rg mail msg-hasp-rs msg-rs
    -- Resources --
    Resource Name Node Name State Status Message
    Resource: mail ms1 Offline Offline - LogicalHostname offline.
    Resource: mail ms2 Offline Offline - LogicalHostname offline.
    Resource: msg-hasp-rs ms1 Offline Offline
    Resource: msg-hasp-rs ms2 Offline Offline
    Resource: msg-rs ms1 Offline Offline - Stop Succeeded
    Resource: msg-rs ms2 Offline Offline - Stop Succeeded
    The following is from /var/adm/messages (some lines removed for brevity):
    Sep 26 12:25:19 ms1 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <ims_svc_start> for resource <msg-rs>, resou
    rce group <msg-rg>, node <ms1>, timeout <300> seconds
    Sep 26 12:25:19 ms1 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource msg-rs status on node ms1 change to R_FM_UNKNOWN
    Sep 26 12:25:19 ms1 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource msg-rs status msg on node ms1 change to <Starting>
    Sep 26 12:25:19 ms1 Cluster.RGM.rgmd: [ID 751138 daemon.notice] 47 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/r
    gm/rt/hafoip/hafoip_monitor_start>:tag=<msg-rg.mail.7>: Calling security_clnt_connect(..., host=<ms1>, sec_type {0:WEAK, 1:ST
    RONG, 2:DES} =<1>, ...)
    Sep 26 12:25:19 ms1 Cluster.RGM.rgmd: [ID 268902 daemon.notice] 45 fe_rpc_command: cmd_type(enum):<1>:cmd=</opt/sun/comms/msg
    scha/bin/imssvc_start>:tag=<msg-rg.msg-rs.0>: Calling security_clnt_connect(..., host=<ms1>, sec_type {0:WEAK, 1:STRONG, 2:
    DES} =<1>, ...)
    Sep 26 12:25:19 ms1 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_monitor_start> completed successfully for reso
    urce <mail>, resource group <msg-rg>, node <ms1>, time used: 0% of timeout <300 seconds>
    Sep 26 12:25:19 ms1 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource mail state on node ms1 change to R_ONLINE
    Sep 26 12:26:53 ms1 Cluster.PMF.pmfd: [ID 887656 daemon.notice] Process: tag="msg-rg,msg-rs,1.svc", cmd="/bin/sh -c /opt/sun/
    comms/messaging64/bin/start-msg watcher", Failed to stay up.
    Sep 26 12:26:55 ms1 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource msg-rs status on node ms1 change to R_FM_ONLINE
    Sep 26 12:26:55 ms1 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource msg-rs status msg on node ms1 change to <Start succe
    eded.>
    Sep 26 12:26:55 ms1 Cluster.PMF.pmfd: [ID 819736 daemon.notice] PMF is restarting process that died: tag=msg-rg,msg-rs,1.svc,
    cmd_path=/bin/sh -c /opt/sun/comms/messaging64/bin/start-msg watcher, max_retries=0, num_retries=0
    Sep 26 12:27:25 ms1 SC[SUNW.ims:7.0,msg-rg,msg-rs,ims_svc_start]: [ID 141062 daemon.error] Failed to connect to host 192.168.
    0.250 and port 27442: Connection refused.
    Sep 26 12:29:55 ms1 last message repeated 6 times
    Sep 26 12:30:26 ms1 Cluster.RGM.rgmd: [ID 764140 daemon.error] Method <ims_svc_start> on resource <msg-rs>, resource group <m
    sg-rg>, node <ms1>: Timeout.
    Sep 26 12:30:26 ms1 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource msg-rs state on node ms1 change to R_START_FAILED
    Sep 26 12:30:26 ms1 Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group msg-rg state on node ms1 change to RG_PENDING_
    OFF_START_FAILED
    Sep 26 12:30:26 ms1 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource msg-rs status on node ms1 change to R_FM_FAULTED
    Sep 26 12:30:26 ms1 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource msg-rs state on node ms1 change to R_STOPPING

    I found my mistake: it was in the /etc/hosts entry. I am pasting the details here for anyone who runs into the same problem or makes the same mistake.
    The entry should be in the following format:
    192.168.0.250 mail.test.com mail msg-l
    Create the logical hostname as follows:
    clrslh create -g msg-rg msg-l
    Notice qfe0:1:
    # ifconfig -a
    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
    inet 127.0.0.1 netmask ff000000
    eri0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    inet 192.168.0.240 netmask ffffff00 broadcast 192.168.0.255
    groupname sc_ipmp0
    ether 0:3:ba:29:8a:ac
    eri0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
    inet 192.168.0.242 netmask ffffff00 broadcast 192.168.0.255
    qfe0: flags=9040842<BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
    inet 192.168.0.243 netmask ffffff00 broadcast 192.168.0.255
    groupname sc_ipmp0
    ether 0:3:ba:22:d4:36
    qfe0:1: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 3
    inet 192.168.0.250 netmask ffffff00 broadcast 192.168.0.255
    qfe2: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 5
    inet 172.16.0.129 netmask ffffff80 broadcast 172.16.0.255
    ether 0:3:ba:22:d4:38
    qfe3: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 4
    inet 172.16.1.1 netmask ffffff80 broadcast 172.16.1.127
    ether 0:3:ba:22:d4:39
    clprivnet0: flags=1009843<UP,BROADCAST,RUNNING,MULTICAST,MULTI_BCAST,PRIVATE,IPv4> mtu 1500 index 6
    inet 172.16.4.1 netmask fffffe00 broadcast 172.16.5.255
    ether 0:0:0:0:0:1
    Now I am able to plumb the logical hostname IP. The messaging resource group is able to switch over between nodes and go online (before creating the messaging server resource, msg-rs).
    After creating the messaging server resource, I use the following command to start the messaging resource group:
    ms1@root # clrg online -eM msg-rg
    I used the following command to create the messaging resource (msg-rs):
    clrs create -g msg-rg -t SUNW.ims -x IMS_serverroot=/opt/sun/comms/messaging64 -y Resource_dependencies=msg-l,msg-hasp-rs msg-rs
    But I am still having a problem starting the resource group after adding msg-rs.
    Please advise where I went wrong.
    Thanks.
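
Since the fix described in this post came down to the /etc/hosts entry, a quick check that the logical hostname is present in the hosts file on each node may save time. A minimal sketch, assuming a POSIX awk (on Solaris 10 that may mean nawk or /usr/xpg4/bin/awk); the helper name `hosts_has_name` is hypothetical:

```shell
#!/bin/sh
# hosts_has_name (hypothetical helper): succeed if NAME appears as a hostname
# or alias (any field after the address) in a hosts-format file.
# Assumption: the file defaults to /etc/hosts.
hosts_has_name() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }                                    # drop comments
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }  # skip the address field
        END { exit !found }
    ' "$file"
}
```

For example, `hosts_has_name msg-l && echo "msg-l is listed"` could be run on both ms1 and ms2 before creating the logical hostname resource.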

  • Problem switching on resource group

    I am in the process of setting up a new two-node cluster. I have Sun Cluster 3.2 installed on a pair of recently patched Solaris 10 T1000 servers.
    I ran the command "scswitch -z -h mw3 -g ensemble-rg" and it just hung; it has neither completed nor timed out. I tried to stop it with "scswitch -k -Q -g ensemble-rg" on the mw3 server, but that has not completed either.
    I tried to run "clresourcegroup online +" and "clresourcegroup offline -v +" and got the same message:
    clresourcegroup: (C667636) ensemble-rg: resource group is undergoing a reconfiguration, try again later
    What do I need to do to get the resource group and hosts completed?
    Thank you,
    Tom.

    Hello Tim,
    At this point there is nothing in the ensemble-rg. The logical host ens-perf has its own IP address, separate from the two nodes in the cluster, and no other machine on the network answers on that IP address.
    What can I look at to show me what might be wrong with the cluster configuration?
    Here is what is in /var/adm/messages for today:
    Nov 20 09:02:47 mw3 Cluster.CCR: [ID 499775 daemon.notice] resource group ensemble-rg added.
    Nov 20 09:03:27 mw3 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_validate> for resource <ens-perf>, resource group <ensemble-rg>, node <mw3>, timeout <300> seconds
    Nov 20 09:03:27 mw3 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hafoip/hafoip_validate>:tag=<ensemble-rg.ens-perf.2>: Calling security_clnt_connect(..., host=<mw3>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Nov 20 09:03:27 mw3 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_validate> completed successfully for resource <ens-perf>, resource group <ensemble-rg>, node <mw3>, time used: 0% of timeout <300 seconds>
    Nov 20 09:03:28 mw3 Cluster.CCR: [ID 973933 daemon.notice] resource ens-perf added.
    The scswitch command that hung was started at 09:04:51 according to ps.
    The scswitch and clresourcegroup commands are still hung.
    I was creating the resource group to install the application on the cluster nodes - so I don't have any application logs to check at this point because this is a brand new cluster that I am setting up to test.
    Here is the output for clrg status:
    clrg status
    === Cluster Resource Groups ===
    Group Name     Node Name   Suspended   Status
    ensemble-rg    mw4         No          Offline
                   mw3         No          Pending online
    I only have one resource group
    From an scstat:
    -- Device Group Servers --
    Device Group Primary Secondary
    Device group servers: ensemble mw3 mw4
    Device group servers: journal mw3 mw4
    Device group servers: wij mw3 mw4
    -- Device Group Status --
    Device Group Status
    Device group status: ensemble Online
    Device group status: journal Online
    Device group status: wij Online
    -- Multi-owner Device Groups --
    Device Group Online Status
    -- Resource Groups and Resources --
    Group Name Resources
    Resources: ensemble-rg ens-perf
    Thank you,
    Tom.

  • How to unregister Resource Group LDom without Stop it.

    Hello, I created a resource group and an LDom resource in a test environment (Sun Cluster 4.2). Now I would like to remove this resource group and LDom resource without stopping the LDom itself.
    If I follow the Oracle documentation, the LDom ends up in the inactive state.
    clrs disable LDM-sovxxxxx
    clrs delete LDM-sovxxxxx
    root@ddom14:/etc/cluster/ccr/global# ldm ls
    NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
    primary          active     -n-cv-  UART    4     4G       1.8%  1.8%  105d
    sovgvacluster01p active     -n----  5000    16    10G      0.1%  0.1%  19d 2h 13m
    sovgvacluster02p active    -n----  5001    8     10G      0.1%  0.1%  5d 19h 16m
    root@ddom14:/etc/cluster/ccr/global# clresource disable LDM-sovgvacluster02
    root@ddom14:/etc/cluster/ccr/global# clrs show
    === Resources ===                             
    Resource:                                       LDM-sovgvacluster01p
      Type:                                            SUNW.ldom:4
      Type_version:                                    4
      Group:                                           sovgvacluster01p
      R_description:                                  
      Resource_project_name:                           default
      Enabled{ddom14}:                                 True
      Enabled{ddom24}:                                 True
      Monitored{ddom14}:                               True
      Monitored{ddom24}:                               True
    Resource:                                       LDM-sovgvacluster02p
      Type:                                            SUNW.ldom:4
      Type_version:                                    4
      Group:                                           sovgvacluster02p
      R_description:                                  
      Resource_project_name:                           default
      Enabled{ddom14}:                                 False
      Enabled{ddom24}:                                 False
      Monitored{ddom14}:                               True
      Monitored{ddom24}:                               True
    root@ddom14:/etc/cluster/ccr/global# ldm ls
    NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
    primary          active     -n-cv-  UART    4     4G       3.2%  3.2%  105d
    sovgvacluster01p active     -n----  5000    16    10G      0.1%  0.1%  19d 2h 15m
    sovgvacluster02p inactive   ------          8     10G
    Is there any way to do that?
    Thanks for your help,
    Willy

    Hi Willy,
    When the LDom is configured in a SUNW.ldom resource, it is under the control of the RGM (Resource Group Manager) of the Solaris Cluster software.
    And yes, if you disable the SUNW.ldom resource or delete it with ‘delete -F’, the LDom goes down, which is expected behavior.
    I don’t know what you are trying to achieve, but maybe a ‘quiesce’ or ‘suspend’ of the resource group could help?
    quiesce:
    This command stops a resource group from continuously switching from one node or zone to another node or zone if a START or STOP method fails.
    Use the -k option to kill methods that are running on behalf of resources in the affected resource groups. If you do not specify the -k option, methods are allowed to continue running until they exit or exceed their configured timeout.
    suspend:
    To prevent the resource group from coming online automatically, use the suspend subcommand to suspend the automatic recovery actions of the resource group. To resume automatic recovery actions, use the resume subcommand.
    More details in the man page of clrg.
    Hth,
      Juergen
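
    The two subcommands described in this reply could be tried roughly as follows. This is a dry-run sketch that only prints the commands rather than running them; the function name print_rg_hold_cmds is made up for illustration, and the group name is taken from the clrs show output in the question. Neither subcommand removes the resource from the cluster; they only hold back the RGM's automatic actions.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints clrg commands, does not execute them.
print_rg_hold_cmds() {
    rg="$1"   # resource group name, e.g. sovgvacluster02p
    # Stop the group from bouncing between nodes; -k also kills running methods.
    echo "clrg quiesce -k $rg"
    # Suspend automatic recovery actions for the group.
    echo "clrg suspend $rg"
    # Later, resume automatic recovery actions.
    echo "clrg resume $rg"
}
```

    Calling print_rg_hold_cmds sovgvacluster02p lists the three commands so they can be reviewed against the clrg man page before anything is run on a live cluster.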

  • Testing ha-nfs in two node cluster (cannot statvfs /global/nfs: I/O error )

    Hi all,
    I am testing HA-NFS (failover) on a two-node cluster. I have a Sun Fire V240, an E250, and Netra st A1000/D1000 storage. I have installed Solaris 10 update 6 and the cluster packages on both nodes.
    I have created one global file system (/dev/did/dsk/d4s7) and mounted it as /global/nfs. This file system is accessible from both nodes. I have configured HA-NFS according to the document Sun Cluster Data Service for NFS Guide for Solaris, using the command-line interface.
    The logical host responds to ping from the NFS client, and I have mounted the file system there using the logical hostname. For testing purposes I took one machine down. After this step the file system gives an I/O error (on both server and client), and when I run the df command it shows:
    df: cannot statvfs /global/nfs: I/O error.
    I configured it with the following commands.
    #clnode status
    # mkdir -p /global/nfs
    # clresourcegroup create -n test1,test2 -p Pathprefix=/global/nfs rg-nfs
    I have added the logical hostname and IP address to /etc/hosts.
    I have commented out the hosts and rpc lines in /etc/nsswitch.conf.
    # clreslogicalhostname create -g rg-nfs -h ha-host-1 -N sc_ipmp0@test1,sc_ipmp0@test2 ha-host-1
    # mkdir /global/nfs/SUNW.nfs
    I created one file called dfstab.user-home in /global/nfs/SUNW.nfs; that file contains the following line:
    share -F nfs -o rw /global/nfs
    # clresourcetype register SUNW.nfs
    # clresource create -g rg-nfs -t SUNW.nfs user-home
    # clresourcegroup online -M rg-nfs
    Where did I go wrong? Can anyone point me to documentation on this?
    Any help..?
    Thanks in advance.
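
    The dfstab step described in this post can be sketched as a small shell function. It assumes, as in the post, that the file is named dfstab.<resource-name> and lives under <Pathprefix>/SUNW.nfs; the function name write_dfstab and its argument order are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write dfstab.<resource> under <pathprefix>/SUNW.nfs.
write_dfstab() {
    pathprefix="$1"   # e.g. /global/nfs (the Pathprefix of rg-nfs)
    resource="$2"     # e.g. user-home
    share_path="$3"   # directory to export, e.g. /global/nfs
    mkdir -p "$pathprefix/SUNW.nfs" || return 1
    # Use a plain ASCII hyphen before "o"; a typographic dash is not an option flag.
    printf 'share -F nfs -o rw %s\n' "$share_path" \
        > "$pathprefix/SUNW.nfs/dfstab.$resource"
}
```

    For the configuration in this post the call would be write_dfstab /global/nfs user-home /global/nfs, run on the node that currently has /global/nfs mounted.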

    test1#  tail -20 /var/adm/messages
    Feb 28 22:28:54 testlab5 Cluster.SMF.DR: [ID 344672 daemon.error] Unable to open door descriptor /var/run/rgmd_receptionist_door
    Feb 28 22:28:54 testlab5 Cluster.SMF.DR: [ID 801855 daemon.error]
    Feb 28 22:28:54 testlab5 Error in scha_cluster_get
    Feb 28 22:28:54 testlab5 Cluster.scdpmd: [ID 489913 daemon.notice] The state of the path to device: /dev/did/rdsk/d5s0 has changed to OK
    Feb 28 22:28:54 testlab5 Cluster.scdpmd: [ID 489913 daemon.notice] The state of the path to device: /dev/did/rdsk/d6s0 has changed to OK
    Feb 28 22:28:58 testlab5 svc.startd[8]: [ID 652011 daemon.warning] svc:/system/cluster/scsymon-srv:default: Method "/usr/cluster/lib/svc/method/svc_scsymon_srv start" failed with exit status 96.
    Feb 28 22:28:58 testlab5 svc.startd[8]: [ID 748625 daemon.error] system/cluster/scsymon-srv:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 537175 daemon.notice] CMM: Node e250 (nodeid: 1, incarnation #: 1235752006) has become reachable.
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 525628 daemon.notice] CMM: Cluster has reached quorum.
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 377347 daemon.notice] CMM: Node e250 (nodeid = 1) is up; new incarnation number = 1235752006.
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 377347 daemon.notice] CMM: Node testlab5 (nodeid = 2) is up; new incarnation number = 1235840337.
    Feb 28 22:37:15 testlab5 Cluster.CCR: [ID 499775 daemon.notice] resource group rg-nfs added.
    Feb 28 22:39:05 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<5>:cmd=<null>:tag=<>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Feb 28 22:39:05 testlab5 Cluster.CCR: [ID 491081 daemon.notice] resource ha-host-1 removed.
    Feb 28 22:39:17 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<5>:cmd=<null>:tag=<>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Feb 28 22:39:17 testlab5 Cluster.CCR: [ID 254131 daemon.notice] resource group nfs-rg removed.
    Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_validate> for resource <ha-host-1>, resource group <rg-nfs>, node <testlab5>, timeout <300> seconds
    Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hafoip/hafoip_validate>:tag=<rg-nfs.ha-host-1.2>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_validate> completed successfully for resource <ha-host-1>, resource group <rg-nfs>, node <testlab5>, time used: 0% of timeout <300 seconds>
    Feb 28 22:39:30 testlab5 Cluster.CCR: [ID 973933 daemon.notice] resource ha-host-1 added.

  • Creation of diskset in Two node cluster

    Hi All ,
    I have created one diskset on Solaris 9 using SVM in a two-node cluster.
    After creating the diskset, I mounted it on the primary node at the mount point /test. However, the file system is being mounted on both nodes.
    I created this diskset for failover purposes: if one node goes down, the other node should take over.
    My idea is to create a failover resource (a diskset resource) in the two-node cluster.
    Below are the steps I used to create the diskset.
    root@host2# /usr/cluster/bin/scdidadm -L d8
    8        host1:/dev/rdsk/c1t9d0   /dev/did/rdsk/d8
    8        host2:/dev/rdsk/c1t9d0   /dev/did/rdsk/d8
    metaset -s diskset -a -h host2 host1
    metaset -s diskset -a -m host2 host1
    metaset -s diskset -a /dev/did/rdsk/d8
    metainit -s diskset d40 1 1 /dev/did/dsk/d8s0
    newfs /dev/md/diskset/rdsk/d40
    mount /dev/md/diskset/dsk/d40 /test
    root@host2# metaset -s diskset
    Set name = diskset, Set number = 1
    Host                Owner
      host2                  Yes
      host1
    Mediator Host(s)    Aliases
      host2
      host1
    Drive Dbase
    d8   Yes
    Please let me know how to mount the diskset on only one node.
    If I am wrong, please correct me.
    Regards,
    R. Rajesh Kannan.

    The file system will only mount on both (all) nodes if you mount it globally, i.e. with the global flag, or if there is an entry in /etc/vfstab that has the global option.
    Given your output, I would guess you have a global mount for /test defined in /etc/vfstab.
    Regards,
    Tim
    ---
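
    To check the guess in this reply, the mount options column of /etc/vfstab can be inspected. A minimal sketch, assuming the standard seven-column Solaris vfstab layout (mount point in column 3, mount options in column 7); has_global_mount is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the vfstab entry for a mount point
# lists "global" among its comma-separated mount options.
has_global_mount() {
    mount_point="$1"
    vfstab="${2:-/etc/vfstab}"   # second argument lets you test against a copy
    awk -v mp="$mount_point" '
        /^[ \t]*#/ { next }      # skip comment lines
        $3 == mp {
            n = split($7, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "global") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$vfstab"
}
```

    For example, has_global_mount /test && echo "/test is mounted globally" reports whether the entry for /test carries the global option.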

  • 2 node Sun Cluster 3.2, resource groups not failing over.

    Hello,
    I am currently running two V490s connected to a Sun StorageTek 6540 array. After I attempted to install the latest OS patches, the cluster seems nearly destroyed. I backed out the patches, and right now only one node can process the resource groups properly. The other node appears to take over the Veritas disk groups but will not mount them automatically. I have been working on this for over a month and have learned a lot and fixed a lot of other issues that came up, but the cluster is just not working properly. Here is some output.
    bash-3.00# clresourcegroup switch -n coins01 DataWatch-rg
    clresourcegroup: (C776397) Request failed because node coins01 is not a potential primary for resource group DataWatch-rg. Ensure that when a zone is intended, it is explicitly specified by using the node:zonename format.
    bash-3.00# clresourcegroup switch -z zcoins01 -n coins01 DataWatch-rg
    clresourcegroup: (C298182) Cannot use node coins01:zcoins01 because it is not currently in the cluster membership.
    clresourcegroup: (C916474) Request failed because none of the specified nodes are usable.
    bash-3.00# clresource status
    === Cluster Resources ===
    Resource Name       Node Name          State     Status Message
    ftp-rs              coins01:zftp01     Offline   Offline
                        coins02:zftp01     Offline   Offline - LogicalHostname offline.
    xprcoins            coins01:zcoins01   Offline   Offline
                        coins02:zcoins01   Offline   Offline - LogicalHostname offline.
    xprcoins-rs         coins01:zcoins01   Offline   Offline
                        coins02:zcoins01   Offline   Offline - LogicalHostname offline.
    DataWatch-hasp-rs   coins01:zcoins01   Offline   Offline
                        coins02:zcoins01   Offline   Offline
    BDSarchive-res      coins01:zcoins01   Offline   Offline
                        coins02:zcoins01   Offline   Offline
    I am really at a loss here. Any help appreciated.
    Thanks

    My advice is to open a service call, provided you have a service contract with Oracle. There is much more information required to understand that specific configuration and to analyse the various log files. This is beyond what can be done in this forum.
    From your description I can guess that you want to failover a resource group between non-global zones. And it looks like the zone coins01:zcoins01 is reported to not be in cluster membership.
    Obviously node coins01 needs to be a cluster member. If it is reported as online and has joined the cluster, then you need to verify if the zone zcoins01 is really properly up and running.
    Specifically, you need to verify that it reached the multi-user milestone and that all cluster-related SMF services are running correctly (i.e. check "svcs -x" in the non-global zone).
    You mention Veritas diskgroups. Note that VxVM diskgroups are handled at the global cluster level (i.e. in the global zone). The VxVM diskgroup is not imported for a non-global zone. However, with SUNW.HAStoragePlus you can ensure that file systems on top of VxVM diskgroups can be mounted into a non-global zone. But again, more information would be required to see how you configured things and why they don't work as you expect.
    Regards
    Thorsten

  • How to change the primary node for a resource group. Solaris cluster. 3.2

    I have searched for hours to try to find this answer.
    I want to change the primary node of a resource group.
    example.
    log-rg is configured to run on node1.this.com, so clrg status lists node1.this.com first.
    But we run it on node2.this.com.
    After a reboot, log-rg comes up on node1 again, and we have to switch it by hand to run
    on node2.
    I want it to know that it should always try to first run on node1, but still failover to node2 if the situation arises.
    scswitch -z -g log-rg -h node2 (and all the fully qualified versions of this command)
    would not work.
    How can I change the primary node for log-rg (logZ) from node1 to node2?
    thanks!

    Hi.
    Show the current configuration of the RG:
    clrg show -v log-rg
    To change the order of the Nodelist you can run:
    clrg remove-node -n node1 log-rg
    clrg add-node -n node1 log-rg
    But these commands are more disruptive, and adding the node back to this RG may be a problem.
    I don't know why validation of resource log-tiv in resource group log-rg on node1 failed.
    I would need to know more about the configuration, resource type, etc.
    It may be better to create a small test RG and try moving it and changing its resources.
    The Nodelist names the candidate nodes for running the group's resources, but at any given moment the RG can run on any node from this list.
    Sun Cluster documentation:
    http://download.oracle.com/docs/cd/E19787-01/820-7360/fxjbo/index.html
    Typical tasks:
    http://download.oracle.com/docs/cd/E19787-01/820-7359/z40002701009474/index.html
    Adding or Removing a Node to or From a Resource Group
    http://download.oracle.com/docs/cd/E19787-01/820-7359/z400043a1055200/index.html
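
    The remove-node/add-node sequence in this reply is meant to move a node to the end of the Nodelist. The sketch below only simulates that reordering on a comma-separated list, so the expected result can be compared with the output of clrg show -v afterwards. It assumes that add-node appends the node to the end of the list, and demote_node is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: simulate removing a node from a Nodelist and adding it back.
# Assumption: the re-added node ends up last in the list.
demote_node() {
    node="$1"       # node to remove and re-add, e.g. node1
    nodelist="$2"   # current Nodelist, e.g. node1,node2
    # Keep every entry except the exact node name, preserving order.
    rest=$(printf '%s\n' "$nodelist" | tr ',' '\n' \
        | grep -v -x -F -e "$node" | paste -s -d, -)
    if [ -n "$rest" ]; then
        printf '%s,%s\n' "$rest" "$node"
    else
        printf '%s\n' "$node"
    fi
}
```

    With the names from the question, demote_node node1 node1,node2 prints node2,node1, which is the order the poster is after.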

  • Cluster resource 'SQL Server' in Resource Group 'MSSQL' failed.

    Hi All,
    Last week we faced a problem on our SQL Server 2005 cluster.
    The SQL cluster went down with the issue below.
    Event 1069 : Cluster resource 'SQL Server' in Resource Group 'MSSQL' failed.  
    Event 19019 : [sqsrvres] CheckServiceAlive: Service is dead
    [sqsrvres] OnlineThread: service stopped while waiting for QP.
    [sqsrvres] OnlineThread: Error 1 bringing resource online
    Could anyone kindly provide a resolution for the issue above?

    I have checked in event viewer Application error side error:  
    Event 19019 : [sqsrvres]
    CheckServiceAlive: Service is dead
    [sqsrvres] OnlineThread: service stopped while waiting for QP.
    [sqsrvres] OnlineThread: Error 1 bringing resource online
    System error :
    Event 1069 : Cluster resource 'SQL Server' in Resource
    Group 'MSSQL' failed.
    Before this, there were no errors in the event viewer.

  • Can Resource Group Virtual IP be same as that of any of the Cluster Node?

    Hi,
    Can Resource Group Virtual IP Address be same as the IP Address of any of the Cluster Nodes?
    As in if my Cluster node (node-1) has IP Address, say 172.23.28.218, then can I configure a Resource Group with the same IP Address, i.e. 172.23.28.218?
    Thanks,
    Chaitanya

    Chaitanya,
    Short answer - no. The physical nodes have IP addresses which are fixed and unique. A resource group can have 0 or more logical hosts (IP addresses) associated with it, one per subnet, that are also unique.
    Regards,
    Tim
    ---

  • Cluster resource SAPCCM4X.00' in Resource Group 'SAP ABC' failed

    Hallo.
    I installed SAPCCM4X following the guide at http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/f0bcedaa-dfb1-2d10-b3a9-c140aff84dc2?quicklink=index&overridelayout=true
    It is registered successfully, but it fails every 5 minutes.
    I see in the event viewer
    Cluster resource SAPCCM4X.00' in Resource Group 'SAP ABC' failed.
    What could I check?
    Thanks for your help .
    Mario

    Hallo.
    I followed the note.
    The problem occurs when CCMS checks the status, every 5 minutes.
    The registration was successful.
    I set ccms/enable_agent = -1
    I set AgentLocalHost virtualhostname
    But I have the same problem.
    I don't know what to check.

  • Why can't I switch the resource group online?

    I used the scsetup command to set up the Oracle RAC data service.
    Then I tried to switch the resource group online:
    scswitch -Z -g rac-framework-rg
    #scstat -g
    -- Resource Groups and Resources --
                Group Name         Resources
    Resources:  nfs-rg             cluster1-nfs nfs-stor nfs-res
    Resources:  rac-framework-rg   rac_framework rac_udlm rac_cvm
    -- Resource Groups --
                Group Name         Node Name   State            Suspended
    Group:      nfs-rg             sysb        Online           No
    Group:      nfs-rg             sysc        Offline          No
    Group:      rac-framework-rg   sysc        Online faulted   No
    Group:      rac-framework-rg   sysb        Online faulted   No
    -- Resources --
                Resource Name      Node Name   State          Status Message
    Resource:   cluster1-nfs       sysb        Online         Online - LogicalHostname online.
    Resource:   cluster1-nfs       sysc        Offline        Offline
    Resource:   nfs-stor           sysb        Online         Online
    Resource:   nfs-stor           sysc        Offline        Offline
    Resource:   nfs-res            sysb        Online         Online - Service is online.
    Resource:   nfs-res            sysc        Offline        Offline
    Resource:   rac_framework      sysc        Start failed   Faulted - Error in previous reconfiguration.
    Resource:   rac_framework      sysb        Start failed   Faulted - Error in previous reconfiguration.
    Resource:   rac_udlm           sysc        Offline        Offline
    Resource:   rac_udlm           sysb        Offline        Offline
    Resource:   rac_cvm            sysc        Offline        Offline
    Resource:   rac_cvm            sysb        Offline        Offline
    Thanks!

    The reason for this is that it allows the admin to diagnose why it failed previously without going into a loop.
    The comment in the shell script says:
    # SCMSGS
    # @explanation
    # Error was detected during previous reconfiguration of the
    # RAC framework component. Error is indicated in the message.
    # As a result of error, the ucmmd daemon was stopped and node
    # was rebooted.
    # On node reboot, the ucmmd daemon was not started on the node
    # to allow investigation of the problem.
    # RAC framework is not running on this node. Oracle parallel
    # server/ Real Application Clusters database instances will
    # not be able to start on this node.
    # @user_action
    # Review logs and messages in /var/adm/messages and
    # /var/cluster/ucmm/ucmm_reconf.log. Resolve the problem that
    # resulted in reconfiguration error. Reboot the node to start
    # RAC framework on the node.
    # Refer to the documentation of Sun Cluster support for Oracle
    # Parallel Server/ Real Application Clusters. If problem
    # persists, contact your Sun service representative.
    This should give you some idea of where the problem lies.
    Regards,
    Tim
    ---
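
    Before digging into the logs named in that comment, it can help to list which resources are in the Start failed state and on which nodes. A minimal sketch that filters scstat -g output in the format shown in the question; list_start_failed is a made-up name, and the field positions assume that resource and node names contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: read `scstat -g` output on stdin and print "resource node"
# for every resource line whose state is "Start failed".
list_start_failed() {
    awk '$1 == "Resource:" && $4 == "Start" && $5 == "failed" { print $2, $3 }'
}
```

    It would be used as scstat -g | list_start_failed; each node it reports is one where /var/adm/messages and /var/cluster/ucmm/ucmm_reconf.log are worth reviewing.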

  • SQL Server Failover Cluster switch fails because the passive node automatically reassigns the drive letter

    I switched the SQL Server resource group to the standby node. When the disk resource was brought online on the passive node, an exception occurred: the disk resource that SQL Server depends on originally had the drive letter 'K:', but when the disk came online it was automatically assigned a new drive letter, 'H:', so the SQL Server resource could not be brought online. After I manually changed the drive letter back to 'K:' on the passive node, it worked. So my question is: why did it not keep the original drive letter, and why did it assign a new one? What could cause this? A mount point? Some log output follows:
    00001cbc.000004e0::2015/03/12-14:41:11.377 WARN  [RES] Physical Disk <FltLowestPrice_K>: OnlineThread: Failed to set volguid \??\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}. Error: 183.
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk <FltLowestPrice_K>: Found 2 mount points for device \Device\Harddisk8\Partition2
    00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: PNP: Update volume exit, status 1168
    00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: PNP: Updating volume \\?\STORAGE#Volume#{1a8ddb8e-fe43-11e2-b7c5-6c3be5a5cdca}#0000000008100000#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}
    00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: PNP: Update volume exit, status 5023
    00001cbc.000004e0::2015/03/12-14:41:11.377 ERR   [RES] Physical Disk: Failed to get volname for drive H:\, status 2
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk <FltLowestPrice_K>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk8\Partition2\ has FS type NTFS
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: Volume \\?\GLOBALROOT\Device\Harddisk8\Partition2\ has FS type NTFS
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: MountPoint H:\ points to volume \\?\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}\

    Sounds like you have a cluster hive that is out of date or bad, or some registry settings which are incorrect. You'll want to have this question transferred to the Windows forum, as that's really what you're asking about.
    -Sean
    The views, opinions, and posts do not reflect those of my company and are solely my own. No warranty, service, or results are expressed or implied.

  • Failover Cluster Core Resources question on a Windows 2008R2 three node cluster

    We have a three-node Windows 2008 R2 cluster with SQL Server 2008 R2 as a clustered resource. There are three resource groups in this cluster: 1) Available Storage, 2) Cluster Group, 3) SQL Server. The Available Storage and SQL Server resource groups reside on one node while the Cluster Group resides on another. The only resources in the Cluster Group are the cluster name and IP. I'd like to fail over the Cluster Group so that it is on the same node as everything else.
    I'm not sure what the implications of doing this are. Failing over the Cluster Group shouldn't have any impact on the SQL Server resource group, correct? Or would there be an interruption to SQL because of the failover of the Cluster Group? It's a critical application; I'm trying to gather information for a change request, and I know I'm going to be asked whether this impacts the production database and everybody using it.
    Thanks
    RG

    No, that should not impact anything.  The cluster group is completely separate from the SQL group.
    . : | : . : | : . tim
