Resource Failover on Sun Cluster
Hi:
I am new to Solaris Cluster (I have worked with VCS for 4 years) and I am evaluating SC as an alternative to VCS.
I am testing on a two-node cluster (SF V880, 4 CPUs, 16 GB RAM). I have created a failover resource group with two resources:
- A logical hostname
- A HAStoragePlus resource (5 file systems)
I have enabled the monitoring and managing of the resource group. In order to test the switch of the resource group I have executed:
clresourcegroup switch -n xxxx app1_rg, and it works fine.
If I reboot the server with the resource group online, the resource group is relocated to the other member of the cluster.
I have found a problem (I suppose it will be a configuration error) when I try to force a failure in the resources. For example, if I unmount all file systems of the HAStoragePlus resource, the cluster doesn't detect this failure (the same happens when I unplumb the network interface).
Could somebody help me with this?
Thanks in advance (sorry for my bad English)
Hi,
It is not a configuration error, but a matter of expectations. The HAStoragePlus resource does not monitor the FS status, so the behaviour is as expected. This is not much of a problem, because an application probe will detect that the underlying FS is gone anyway. But because many people have expressed the desire for FS monitoring, there are discussions underway to implement it. It is not available right now, though.
The network resource is different. Unplumbing is not a valid way to inject a network error. The logical hostname monitors the status of the underlying IPMP group, and unplumbing does not change that. If you want to test a network error, you have to physically remove the cables.
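As a quick way to see where the group lands after such a cable-pull test, a small parser over the `clrg status` output can help. This is only a sketch; the column layout is assumed from Sun Cluster 3.2 output and may differ on your release.

```shell
# rg_online_node: print the node whose Status column reads "Online".
# Reads `clrg status <rg>` style output on stdin; the column layout
# is an assumption based on Sun Cluster 3.2.
rg_online_node() {
    awk '$NF == "Online" { print $(NF - 2); exit }'
}

# Assumed usage after pulling the cables on the active node:
#   clrg status app1_rg | rg_online_node
```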
Cheers
Detlef
Similar Messages
-
Failed to create resource - Error in Sun cluster 3.2
Hi All,
I have a 2-node cluster in place. When I try to create a resource, I get the following error.
Can anybody tell me why? I have Sun Cluster 3.2 on Solaris 10.
I have created zpool called testpool.
clrs create -g test-rg -t SUNW.HAStoragePlus -p Zpools=testpool hasp-testpool-res
clrs: sun011:test011z - : no error
clrs: (C189917) VALIDATE on resource hasp-testpool-res, resource group test-rg, exited with non-zero exit status.
clrs: (C720144) Validation of resource hasp-testpool-res in resource group test-rg on node sun011:test011z failed.
clrs: (C891200) Failed to create resource "hasp-testpool-res".
Regards
Kumar
Thorsten,
testpool was created on one of the cluster nodes, and the underlying storage is accessible from both nodes. But while it is imported on one node it cannot be accessed from the other; to access it there we need to export the pool and import it on the other node.
The storage LUNs allocated to testpool are accessible from all nodes in the cluster, and I am able to import and export testpool from any node.
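As a sketch (using the pool name `testpool` from above), one way to sanity-check from each node whether the pool is currently imported there:

```shell
# pool_imported: succeed if the named pool shows up in the list of
# pools currently imported on this node.
pool_imported() {
    zpool list -H -o name 2>/dev/null | grep -qx "$1"
}

# Assumed manual flow when moving the pool by hand:
#   zpool export testpool      # on the node that currently holds it
#   zpool import testpool      # on the other node
#   pool_imported testpool && echo "testpool is imported here"
```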
Regards
Kumar -
Any experience with NFS failover in Sun Cluster?
Hello,
I am planning to install dual-node Sun Cluster for NFS failover configuration. The SAN storage is shared between nodes via Fibre Channel. The NFS shares will be manually assigned to nodes and should fail over / takeback between nodes.
Is this setup well tested? How do the NFS clients survive the failover (without "stale NFS file handle" errors)? Does it work smoothly for Solaris, Linux, and FreeBSD clients?
Please share your experience.
TIA,
-- Leon
My 3-year-old Linux installation on my laptop, which is my NFS client most of the time, uses UDP by default (kernel 2.4.19).
Anyway the key is that the NFS client, or better, the RPC implementation on the client is intelligent enough to detect a failed TCP connection and tries to reestablish it with the same IP address. Now once the cluster has failed over the logical IP the reconnect will be successful and NFS traffic continues as if nothing bad had happened. This only(!) works if the NFS mount was done with the "hard" option. Only this makes the client retry the connection.
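For reference, a hard TCP mount from a Solaris client might look like this. The server name `clusterlh` and the share path are placeholders, not from the original post:

```shell
# "hard" makes the client retry RPC over the failed TCP connection
# until the logical IP comes back up on the other node.
mount -F nfs -o hard,intr,proto=tcp clusterlh:/export/data /mnt/data

# Equivalent /etc/vfstab entry:
# clusterlh:/export/data  -  /mnt/data  nfs  -  yes  hard,intr,proto=tcp
```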
Other "dumb" TCP based applications might not retry and thus would need manual intervention.
Regarding UFS or PxFS, it does not make a difference. NFS does not know the difference. It shares a mount point.
Hope that helped. -
Sharing resources among resource groups in Sun Cluster 3.1
Hi all,
Is it possible to share a resource among resource groups? For example:
lh: resource of type Logical Hostname =lh-res
/orahome: Oracle binaries and configuration files = orahome-res
/oradata1: Data for instance 1 = oradata1-res
/oradata2: Data for instance 2 = oradata2-res
rg1 ( resource group for Oracle instance 1) ora1-rg = lh + orahome-res + oradata1-res
rg2 (resource group for Oracle instance 2) ora2-rg = lh + orahome-res + oradata2-res
Thanks,
Enrique
Hi Enrique,
if lh represents the same address and the same resource name, then the answer is no, it is not possible: one resource can belong to only one resource group.
If it did work and both RGs were running on different nodes, you would create duplicate IP address errors, which cannot be your intent.
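If the goal is simply one address per Oracle instance, a sketch with two separate logical hostname resources might look like this (SC 3.1 syntax; lh1, lh2, and the resource names are hypothetical):

```shell
# Each RG gets its own logical hostname resource with its own address,
# so the groups can run on different nodes without an address clash.
scrgadm -a -L -g ora1-rg -l lh1 -j ora1-lh-rs
scrgadm -a -L -g ora2-rg -l lh2 -j ora2-lh-rs
```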
Which behavior do you want to achieve?
Detlef -
SUN CLUSTER RESOURCE FOR LEGATO CLIENT (LGTO.CLNT) in Oracle database
hi everyone
I am trying to create an LGTO.clnt resource in the oracle-rg resource group in Sun Cluster 3.2 with the following commands:
clresource create -g resource_group_name -t LGTO.clnt \
-x clientname=virtual_hostname -x owned_paths=pathname_1,
pathname_2[,...] resource_name
I just need to know what the value of the Owned_Paths variable in the above command should be,
or what path it is referring to ($ORACLE_HOME, the global devices path, etc.)?
Hello,
The Owned_Paths parameter lists the paths (or mountpoints) the Legato client will be able to back up.
To configure a Legato client in the NetWorker console (and have it managed as a cluster client) you need to declare in Owned_Paths the paths you want to save.
The saveset paths can be directories under the Owned_Paths.
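Putting that together, a sketch with hypothetical mountpoints (the paths, group, and resource names here are illustrative only):

```shell
# Owned_Paths lists the mountpoints the Legato client may back up;
# individual savesets can be directories below these paths.
clresource create -g oracle-rg -t LGTO.clnt \
    -x clientname=ora-lh \
    -x owned_paths=/oradata,/orahome \
    lgto-clnt-rs
```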
Regards
Pablo Villanueva. -
Sun Cluster: Graph resources and resource groups dependencies
Hi,
Is there anything like the scfdot (http://opensolaris.org/os/community/smf/scfdot/) to graph resource dependencies in Sun Cluster?
Regards,
Ciro
Solaris 10 8/07 s10s_u4wos_12b SPARC
+ scha_resource_get -O TYPE -R lh-billapp-rs
+ echo SUNW.LogicalHostname:2
+ [ -z sa-billapp-rs ]
+ NETRS=sa-billapp-rs lh-billapp-rs
+ [ true = true -a ! -z sa-billapp-rs lh-billapp-rs ]
cluster2dot.ksh[193]: test: syntax error
+ + tr -s \n
+ scha_resource_get -O RESOURCE_DEPENDENCIES -R sa-billapp-rs
DEP=
+ [ true = true -a ! -z sa-billapp-rs lh-billapp-rs ]
cluster2dot.ksh[193]: test: syntax error
+ + tr -s \n
+ scha_resource_get -O RESOURCE_DEPENDENCIES -R lh-billapp-rs
DEP=
+ [ != ]
+ echo \t\t"lh-billapp-rs";
+ 1>> /tmp/clu-dom3-resources.dot
+ + tr -s \n
+ scha_resource_get -O RESOURCE_DEPENDENCIES_WEAK -R lh-billapp-rs
DEP_WEAK= -
Sun Cluster 3.2 without share storage. (Sun StorageTek Availability Suite)
Hi all.
I have a two-node Sun Cluster.
I have configured and installed AVS on these nodes (AVS Remote Mirror replication).
AVS is working fine, but I don't understand how to integrate it into the cluster.
What I did:
Created remote mirror with AVS.
v210-node1# sndradm -P
/dev/rdsk/c1t1d0s1 -> v210-node0:/dev/rdsk/c1t1d0s1
autosync: on, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, group: AVS_TEST_GRP, state: replicating
v210-node1#
v210-node0# sndradm -P
/dev/rdsk/c1t1d0s1 <- v210-node1:/dev/rdsk/c1t1d0s1
autosync: on, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, group: AVS_TEST_GRP, state: replicating
v210-node0#
Created a resource group in Sun Cluster:
v210-node0# clrg status avs_test_rg
=== Cluster Resource Groups ===
Group Name Node Name Suspended Status
avs_test_rg v210-node0 No Offline
v210-node1 No Online
v210-node0#
Created a SUNW.HAStoragePlus resource with the AVS device:
v210-node0# cat /etc/vfstab | grep avs
/dev/global/dsk/d11s1 /dev/global/rdsk/d11s1 /zones/avs_test ufs 2 no logging
v210-node0#
v210-node0# clrs show avs_test_hastorageplus_rs
=== Resources ===
Resource: avs_test_hastorageplus_rs
Type: SUNW.HAStoragePlus:6
Type_version: 6
Group: avs_test_rg
R_description:
Resource_project_name: default
Enabled{v210-node0}: True
Enabled{v210-node1}: True
Monitored{v210-node0}: True
Monitored{v210-node1}: True
v210-node0#
By default everything works fine.
But when I try to switch the RG to the second node, I have a problem.
v210-node0# clrs status avs_test_hastorageplus_rs
=== Cluster Resources ===
Resource Name Node Name State Status Message
avs_test_hastorageplus_rs v210-node0 Offline Offline
v210-node1 Online Online
v210-node0#
v210-node0# clrg switch -n v210-node0 avs_test_rg
clrg: (C748634) Resource group avs_test_rg failed to start on chosen node and might fail over to other node(s)
v210-node0#
If I change the state to logging, everything works:
v210-node0# sndradm -C local -l
Put Remote Mirror into logging mode? (Y/N) [N]: Y
v210-node0# clrg switch -n v210-node0 avs_test_rg
v210-node0# clrs status avs_test_hastorageplus_rs
=== Cluster Resources ===
Resource Name Node Name State Status Message
avs_test_hastorageplus_rs v210-node0 Online Online
v210-node1 Offline Offline
v210-node0#
How can I do this without creating an SC agent for it?
Anatoly S. Zimin
Normally you use AVS to replicate data from one Solaris Cluster to another. Can you clarify whether you are replicating to another cluster or trying to do it between a single cluster's nodes? If it is the latter, then this is not something that Sun officially supports (IIRC); rather, it is something that has been developed in the open-source community. As such it will not be documented in the main Sun Cluster documentation set. Furthermore, support and questions for it should be directed to the author of the module.
Regards,
Tim
--- -
Sun Cluster 3.1 Failover Resource without Logical Hostname
Maybe it sounds strange, but I need to create a failover service without any network resource in use (or at least with a dependency on a logical hostname created in a different resource group).
Does anybody know how to do that?
Well, you don't really NEED a LogicalHostname in an RG. So I guess I am not understanding
the question.
Is there an application agent which demands to have a network resource in the RG? Sometimes
the VALIDATE method of such agents refuses to work if there is no network resource in
the RG.
If so, tell us a bit more about the application. Is this GDS based and generated by
the Sun Cluster Agent Builder? The Agent Builder has a "non Network Aware" option; if you
select that while building your app, it ought to work without a network resource in the RG.
But maybe I should back up and ask the more basic question of exactly what is REQUIRING
you to create a LogicalHostname?
HTH,
-ashu -
Rename Sun Cluster Resource Group
Hi All,
We have a 2-node Sun Cluster 3.0 running on 2 x V440 servers. We want to change the resource group name.
Can I use the "scrgadm -c -g RG_NAME -h nodelist -y property" command to change the resource group name? Can I do this online while the cluster is running, or do I need to bring the cluster into maintenance mode? Any help would be appreciated.
Thanks.
You cannot rename a resource group in that way in Sun Cluster.
You have two options:
-Recreate the resource group with the new name
-Use an unsupported procedure to change the name in the CCR. This requires downtime of both nodes and as it is unsupported I am not going to describe it here. If that is what you want to do, please log a call with Sun.
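A sketch of the first option using the SC 3.0 command set (the group, resource, and node names are placeholders; capture the existing configuration before deleting anything):

```shell
scrgadm -pvv > /var/tmp/rg-config.txt   # record the current settings
scswitch -F -g old-rg                   # take the group offline
scswitch -n -j old-rs                   # disable each resource in it
scrgadm -r -j old-rs                    # remove each resource
scrgadm -r -g old-rg                    # remove the group itself
scrgadm -a -g new-rg -h node1,node2     # recreate under the new name
# ...then re-add the resources into new-rg and bring it online with
# scswitch -Z -g new-rg
```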
If you think renaming resource groups is a useful feature may I also ask you to contact your Sun Service representative so that they can take proper action to log an RFE for the feature. -
Wrong hostname setting after Sun Cluster failover
Hi Gurus,
our PI system has been set up to fail over in a Sun cluster with a virtual hostname s280m (primary host s280, secondary host s281).
The Basis team set up the system profiles to use the virtual hostname, and I did all the steps in SAP Note 1052984 "Process Integration 7.1 High Availability" (my PI is 7.11).
Now I believe I have substituted "s280m" in every spot where "s280" previously existed, but when I start the system on the DR box (s281), the Java stack throws errors when starting. Both the SCS01 and DVEBMGS00 work directories contain a file called dev_sldregs with the following error:
Mon Apr 04 11:55:22 2011 Parsing XML document.
Mon Apr 04 11:55:22 2011 Supplier Name: BCControlInstance
Mon Apr 04 11:55:22 2011 Supplier Version: 1.0
Mon Apr 04 11:55:22 2011 Supplier Vendor:
Mon Apr 04 11:55:22 2011 CIM Model Version: 1.5.29
Mon Apr 04 11:55:22 2011 Using destination file '/usr/sap/XP1/SYS/global/slddest.cfg'.
Mon Apr 04 11:55:22 2011 Use binary key file '/usr/sap/XP1/SYS/global/slddest.cfg.key' for data decryption
Mon Apr 04 11:55:22 2011 Use encryted destination file '/usr/sap/XP1/SYS/global/slddest.cfg' as data source
Mon Apr 04 11:55:22 2011 HTTP trace: false
Mon Apr 04 11:55:22 2011 Data trace: false
Mon Apr 04 11:55:22 2011 Using destination file '/usr/sap/XP1/SYS/global/slddest.cfg'.
Mon Apr 04 11:55:22 2011 Use binary key file '/usr/sap/XP1/SYS/global/slddest.cfg.key' for data decryption
Mon Apr 04 11:55:22 2011 Use encryted destination file '/usr/sap/XP1/SYS/global/slddest.cfg' as data source
Mon Apr 04 11:55:22 2011 ******************************
Mon Apr 04 11:55:22 2011 *** Start SLD Registration ***
Mon Apr 04 11:55:22 2011 ******************************
Mon Apr 04 11:55:22 2011 HTTP open timeout = 420 sec
Mon Apr 04 11:55:22 2011 HTTP send timeout = 420 sec
Mon Apr 04 11:55:22 2011 HTTP response timeout = 420 sec
Mon Apr 04 11:55:22 2011 Used URL: http://s280:50000/sld/ds
Mon Apr 04 11:55:22 2011 HTTP open status: false - NI RC=0
Mon Apr 04 11:55:22 2011 Failed to open HTTP connection!
Mon Apr 04 11:55:22 2011 ****************************
Mon Apr 04 11:55:22 2011 *** End SLD Registration ***
Mon Apr 04 11:55:22 2011 ****************************
Notice it is using the wrong hostname (s280 instead of s280m). Where did I forget to change the hostname? Any ideas?
thanks in advance,
Peter
Please note that the PI system is transparent about the failover system used.
When you configure the parameters against the mentioned note, this means that in case one of the nodes is down, the load will be sent to another system under the same Web Dispatcher/Load Balancer.
When using the Solaris failover solution, it covers the whole environment, including the web dispatcher, database and all nodes.
Therefore, please check the configuration as per the page below, which talks specifically about the Solaris failover solution for SAP usage:
http://wikis.sun.com/display/SunCluster/InstallingandConfiguringSunClusterHAfor+SAP -
What are typical failover times for application X on Sun Cluster
Our company does not yet have any hands-on experience with clustering anything on Solaris, although we do with Veritas and Microsoft. My experience with MS is that it is as close to seamless (instantaneous) as possible. The Veritas clustering takes a little longer to activate the standbys. A new application we are bringing in house soon runs on Sun Cluster (it is some BEA Tuxedo/WebLogic/Oracle monster). They claim the time it takes to flip from the active node to the standby node is ~30 minutes. This seems a bit insane to us, since they are calling this "HA". Is this type of failover time typical in Sun land? Thanks for any numbers or references.
This is a hard question to answer because it depends on the cluster agent/application.
On one hand you may have a simple Sun Cluster application that fails over in seconds because it has to do a limited amount of work (umount here, mount there, plumb network interface, etc) to actually failover.
On the other hand these operations may, depending on the application, take longer than another application due to the very nature of that application.
An Apache web server failover may take 10-15 seconds but an Oracle failover may take longer. There are many variables that control what happens from the time that a node failure is detected to the time that an application appears on another cluster node.
If the failover time is 30 minutes, I would ask your vendor why that is exactly.
Not in a confrontational way, but as an 'I don't get how this is high availability' question, since the assumption is that up to 30 minutes could elapse from the time your application goes down to it coming back on another node.
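To put an actual number on it, a small wrapper like this can time a manual switchover (a sketch; the `clrg switch` invocation shown is an assumed example):

```shell
# elapsed: run any command and print the wall-clock seconds it took.
elapsed() {
    _start=$(date +%s)
    "$@"
    _end=$(date +%s)
    echo $((_end - _start))
}

# Assumed usage:
#   elapsed clrg switch -n nodeB app1-rg
```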
A better solution might be a different application vendor (I know, I know) or a scalable application that can run on more than one cluster node at a time.
The logic of the scalable approach is that if a failover takes 30 minutes or so to complete, failover becomes an expensive operation, so I would rather have my application use multiple nodes at once than eat a 30-minute failover when one node dies in a two-node cluster:
serverA > 30 minute failover > serverB
seems to be less desirable than
serverA, serverB, serverC, etc concurrently providing access to the application so that failover only happens when we get down to a handful of nodes
Either one is probably more desirable than having an application outage(?) -
Failover Zones / Containers with Sun Cluster Geographic Edition and AVS
Hi everyone,
Is the following solution supported/certified by Oracle/Sun? I did find some docs saying it is but cannot find concrete technical information yet...
* Two sites with a 2-node cluster in each site
* 2x Failover containers/zones that are part of the two protection groups (1x group for SAP, other group for 3rd party application)
* Sun Cluster 3.2 and Geographic Edition 3.2 with Availability Suite for SYNC/ASYNC replication over TCP/IP between the two sites
The Zones and their application need to be able to failover between the two sites.
Thanks!
Wim Olivier
Fritz,
Obviously, my colleagues and I, in the Geo Cluster group build and test Geo clusters all the time :-)
We have certainly built and tested Oracle (non-RAC) configurations on AVS. One issue you do have, unfortunately, is that of zones plus AVS (see my Blueprint for more details: http://wikis.sun.com/display/BluePrints/Using+Solaris+Cluster+and+Sun+Cluster+Geographic+Edition). Consequently, you can't build the configuration you described. The alternative is to sacrifice zones for now and wait for the fixes to RG affinities (no idea on the schedule for this feature) or find another way to do this, probably hand-crafted.
If you follow the OHAC pages (http://www.opensolaris.org/os/community/ha-clusters/) and look at the endorsed projects you'll see that there is a Script Based Plug-in on the way (for OHACGE) that I'm writing. So, if you are interested in playing with OHACGE source or the SCXGE binaries, you might see that appear at some point. Of course, these aren't supported solutions though.
Regards,
Tim
--- -
Close/shutdown the Sun Cluster Package/resource Group
Hi,
I have a Sun Cluster system.
I want to know what script runs when Sun Cluster shuts down the resource group "app-gcota-rg", as I may need to modify it. Where can I find this information on the system?
In which directory and log file?
Any suggestions?
Resource Groups --
Group Name Node Name State
Group: ora_gcota_rg ytgcota-1 Online
Group: ora_gcota_rg ytgcota-2 Offline
Group: app-gcota-rg ytgcota-1 Online
Group: app-gcota-rg ytgcota-2 Offline
Hi,
You would first find out which resources belong to app-gcota-rg.
Do a "clrs list -g app-gcota-rg". Then find out which of the resources is the one dealing with your application, and determine its resource type:
"clrs show -v <resource-name> | fgrep Type". If it is a standard type like HA Oracle, it is an extremely bad idea to hack the scripts, as you'll lose support. If the type is SUNWgds, the scripts to start, stop and monitor the application are user supplied. You can find their pathnames using:
"clrs show -v <resource-name>| fgrep _command". This should display full pathnames.
Regards
Hartmut -
QFS Meta data resource on sun cluster failed
Hi,
I'm trying to configure QFS in a cluster environment, and I hit an error while configuring the metadata resource. I tried different types of QFS; none of them worked.
[root @ n1u331]
~ # scrgadm -a -j mds -g qfs-mds-rg -t SUNW.qfs:5 -x QFSFileSystem=/sharedqfs
n1u332 - shqfs: Invalid priority (0) for server n1u332FS shqfs: validate_node() failed.
(C189917) VALIDATE on resource mds, resource group qfs-mds-rg, exited with non-zero exit status.
(C720144) Validation of resource mds in resource group qfs-mds-rg on node n1u332 failed.
[root @ n1u331]
~ # scrgadm -a -j mds -g qfs-mds-rg -t SUNW.qfs:5 -x QFSFileSystem=/global/haqfs
n1u332 - Mount point /global/haqfs does not have the 'shared' option set.
(C189917) VALIDATE on resource mds, resource group qfs-mds-rg, exited with non-zero exit status.
(C720144) Validation of resource mds in resource group qfs-mds-rg on node n1u332 failed.
[root @ n1u331]
~ # scrgadm -a -j mds -g qfs-mds-rg -t SUNW.qfs:5 -x QFSFileSystem=/global/hasharedqfs
n1u332 - has: No /dsk/ string (nodev) in device.Inappropriate path in FS has device component: nodev.FS has: validate_qfsdevs() failed.
(C189917) VALIDATE on resource mds, resource group qfs-mds-rg, exited with non-zero exit status.
(C720144) Validation of resource mds in resource group qfs-mds-rg on node n1u332 failed.
any QFS expert here?
Hi,
Yes, we have 5.2; here is the wiki link: [http://wikis.sun.com/display/SAMQFSDocs52/Home|http://wikis.sun.com/display/SAMQFSDocs52/Home]
I have added the file system through the web console, and it's mounted and working fine.
After creating the file system I tried to put it under Sun Cluster's management, but it asked for a metadata resource, and while creating the metadata resource I got the errors mentioned above.
I need to use the QFS file system in a non-RAC environment, just mounting and using the file system. I could mount it on two machines in shared and highly available mode; in both cases, writes on the second node are about 3 times slower than on the node that hosts the metadata server, while read speed is the same. Could you please let me know whether it's the same in your environment? If so, what do you think the reason is? I see both sides write to the storage directly, so why is it so slow on one node?
regards, -
Sun cluster resource group online but faulted
Hi,
Recently our storage admin deleted volume d9 from the oradg disk set by mistake. We created a new volume d29 and restored the data to it, but we forgot to remove d9 from the disk set. After a reboot, the orarg resource group failed to go online with a faulted status because the ora-stor resource is faulted.