Solaris Cluster 3.3u2 configuration - Solaris LDoms

Hi,
I have two Sun Blade T5-1B servers, and I created 4 LDoms on each blade. I also installed Solaris Cluster 3.3u2, and it all completed successfully. I performed several resource fail-overs between the nodes, and everything seems to work as expected. However, I have an issue when testing public network failure: I removed both physical public network cables from node1, but the resource group did not fail over to node2. I am a little confused, as it should fail over to node2. Please let me know where I missed something to get a proper fail-over.
Please advise.
regards,   

Hi M10vir,
this means you are running SC3.3u2 in guest domains and pulling the cables from the primary domain?
If so, this would mean that the IPMP group used in the guest domain needs to see a link-down event before Solaris Cluster will fail over configured logical hostname resources. How are the network interfaces configured, and what is the status of the network interfaces when the cables are removed? Furthermore, do the domains recognize that the network is gone, and do the relevant messages appear in /var/adm/messages?
Thanks,
  Juergen
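For reference, a quick way to answer those questions from inside the guest domain while the cables are pulled (a minimal sketch, assuming Solaris 10 in the guest; interface and group names will vary):
# dladm show-dev                 (do the vnet links report down?)
# ifconfig -a                    (look for the FAILED flag on the IPMP group members)
# scstat -i                      (the cluster's view of the IPMP groups)
# tail -50 /var/adm/messages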

Similar Messages

  • Configure Solaris cluster to failover guest domain when NICs were down

    Hi,
I am running Solaris 11 as the control domain on 2 nodes clustered with Solaris Cluster 4. There is a Solaris 10 guest domain which is managed via Solaris Cluster in failover mode.
    2 virtual switches connected to 2 different network switches are presented to the guest domain. I would like to use link based IPMP to facilitate HA for the network connections. I understand that in this case the IPMP can only be configured within the guest domain. Now the question is how do I configure it in such a way that the guest domain fails over to the second cluster node (standby control domain) if both network interfaces are down? Thanks.

The Solaris Cluster 4.1 Installation and Concepts Guides are available at:
    http://docs.oracle.com/cd/E29086_01/index.html
    Thanks.
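For reference, link-based IPMP inside a Solaris 10 guest domain can be set up by putting both vnet interfaces into one IPMP group with no test addresses (a minimal sketch; the interface names, address, and group name here are assumptions for illustration):
/etc/hostname.vnet0:  192.168.10.21 netmask 255.255.255.0 group ipmp0 up
/etc/hostname.vnet1:  group ipmp0 up
Note that whether a vnet reports link-down when a physical cable is pulled depends on physical link state propagation from the service domain (the vnet linkprop=phys-state setting).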

  • Solaris Cluster  4.1 Quorum Configuration Best Follows metaset OR ZFS?

    If you want to use a quorum device - in contrast to a quorum server - then you'll need a LUN to configure your quorum device on.
    It does not matter whether this LUN will be used later as a zpool or as an SVM metaset.
There is one exception that should be mentioned in the docs: if the LUN used for the quorum device is later used as a disk for a zpool, and that disk gets a new EFI label, then, I think, the quorum information can get overwritten. So be careful in this specific situation and consult the docs before doing so.
    Hartmut
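For reference, a minimal sketch of configuring such a LUN as a quorum device (assuming the shared LUN shows up as DID device d5; check the mapping first):
# cldevice list -v
# clquorum add d5
# clquorum status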

  • Oracle ASM Configuration on Solaris Cluster - Oracle 11.2.0.3

    Hi,
    I want some clarifications!
I need to set up an active-passive cluster on the Solaris 10 SPARC operating system; the HA software is Solaris Cluster, and the database is Oracle 11.2.0.3.
1) I understand "Single instance Oracle ASM is not supported with Oracle 11g release 2", so we need to go for clustered ASM - is it required to use the RAC framework in this case?
2) When I use the RAC framework, do I need to have a license for RAC?
I am new to Oracle; any help is appreciated.
    Regards,
    Shashank

  • Grid installation: root.sh failed on the first node on Solaris cluster 4.1

    Hi all,
I'm trying to install Grid Infrastructure (11.2.0.3.0) on a 2-node cluster (OSC 4.1).
When I run root.sh on the first node, I get the output below:
    xha239080-root-5.11# root.sh
    Performing root user operation for Oracle 11g
    The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME= /Grid/CRShome
    Enter the full pathname of the local bin directory: [/usr/local/bin]:
    /usr/local/bin is read only. Continue without copy (y/n) or retry (r)? [y]:
    Warning: /usr/local/bin is read only. No files will be copied.
    Creating /var/opt/oracle/oratab file...
    Entries will be added to the /var/opt/oracle/oratab file as needed by
    Database Configuration Assistant when a database is created
    Finished running generic part of root script.
    Now product-specific root actions will be performed.
    Using configuration parameter file: /Grid/CRShome/crs/install/crsconfig_params
    Creating trace directory
    User ignored Prerequisites during installation
    OLR initialization - successful
    root wallet
    root wallet cert
    root cert export
    peer wallet
    profile reader wallet
    pa wallet
    peer wallet keys
    pa wallet keys
    peer cert request
    pa cert request
    peer cert
    pa cert
    peer root cert TP
    profile reader root cert TP
    pa root cert TP
    peer pa cert TP
    pa peer cert TP
    profile reader pa cert TP
    profile reader peer cert TP
    peer user cert
    pa user cert
    Adding Clusterware entries to inittab
    CRS-2672: Attempting to start 'ora.mdnsd' on 'xha239080'
    CRS-2676: Start of 'ora.mdnsd' on 'xha239080' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 'xha239080'
    CRS-2676: Start of 'ora.gpnpd' on 'xha239080' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xha239080'
    CRS-2672: Attempting to start 'ora.gipcd' on 'xha239080'
    CRS-2676: Start of 'ora.cssdmonitor' on 'xha239080' succeeded
    CRS-2676: Start of 'ora.gipcd' on 'xha239080' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'xha239080'
    CRS-2672: Attempting to start 'ora.diskmon' on 'xha239080'
    CRS-2676: Start of 'ora.diskmon' on 'xha239080' succeeded
    CRS-2676: Start of 'ora.cssd' on 'xha239080' succeeded
    ASM created and started successfully.
    Disk Group DATA created successfully.
    clscfg: -install mode specified
    Successfully accumulated necessary OCR keys.
    Creating OCR keys for user 'root', privgrp 'root'..
    Operation successful.
    CRS-4256: Updating the profile
    Successful addition of voting disk 9cdb938773bc4f16bf332edac499fd06.
    Successful addition of voting disk 842907db11f74f59bf65247138d6e8f5.
    Successful addition of voting disk 748852d2a5c84f72bfcd50d60f65654d.
    Successfully replaced voting disk group with +DATA.
    CRS-4256: Updating the profile
    CRS-4266: Voting file(s) successfully replaced
    ## STATE File Universal Id File Name Disk group
    1. ONLINE 9cdb938773bc4f16bf332edac499fd06 (/dev/did/rdsk/d10s6) [DATA]
    2. ONLINE 842907db11f74f59bf65247138d6e8f5 (/dev/did/rdsk/d8s6) [DATA]
    3. ONLINE 748852d2a5c84f72bfcd50d60f65654d (/dev/did/rdsk/d9s6) [DATA]
    Located 3 voting disk(s).
    Start of resource "ora.cssd" failed
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xha239080'
    CRS-2672: Attempting to start 'ora.gipcd' on 'xha239080'
    CRS-2676: Start of 'ora.cssdmonitor' on 'xha239080' succeeded
    CRS-2676: Start of 'ora.gipcd' on 'xha239080' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'xha239080'
    CRS-2672: Attempting to start 'ora.diskmon' on 'xha239080'
    CRS-2676: Start of 'ora.diskmon' on 'xha239080' succeeded
    CRS-2674: Start of 'ora.cssd' on 'xha239080' failed
    CRS-2679: Attempting to clean 'ora.cssd' on 'xha239080'
    CRS-2681: Clean of 'ora.cssd' on 'xha239080' succeeded
    CRS-2673: Attempting to stop 'ora.gipcd' on 'xha239080'
    CRS-2677: Stop of 'ora.gipcd' on 'xha239080' succeeded
    CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'xha239080'
    CRS-2677: Stop of 'ora.cssdmonitor' on 'xha239080' succeeded
    CRS-5804: Communication error with agent process
    CRS-4000: Command Start failed, or completed with errors.
    Failed to start Oracle Grid Infrastructure stack
    Failed to start Cluster Synchorinisation Service in clustered mode at /Grid/CRShome/crs/install/crsconfig_lib.pm line 1211.
    /Grid/CRShome/perl/bin/perl -I/Grid/CRShome/perl/lib -I/Grid/CRShome/crs/install /Grid/CRShome/crs/install/rootcrs.pl execution failed
    xha239080-root-5.11# history
Checking ocssd.log, I see the following:
    2013-09-16 18:46:24.238: [    CSSD][1]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (clustered) mode with uniqueness value 1379371584
    2013-09-16 18:46:24.239: [    CSSD][1]clssscmain: Environment is production
    2013-09-16 18:46:24.239: [    CSSD][1]clssscmain: Core file size limit extended
    2013-09-16 18:46:24.248: [    CSSD][1]clssscmain: GIPCHA down 1
    2013-09-16 18:46:24.249: [    CSSD][1]clssscGetParameterOLR: OLR fetch for parameter logsize (8) failed with rc 21
    2013-09-16 18:46:24.250: [    CSSD][1]clssscExtendLimits: The current soft limit for file descriptors is 65536, hard limit is 65536
    2013-09-16 18:46:24.250: [    CSSD][1]clssscExtendLimits: The current soft limit for locked memory is 4294967293, hard limit is 4294967293
    2013-09-16 18:46:24.250: [    CSSD][1]clssscGetParameterOLR: OLR fetch for parameter priority (15) failed with rc 21
    2013-09-16 18:46:24.250: [    CSSD][1]clssscSetPrivEnv: Setting priority to 4
    2013-09-16 18:46:24.253: [    CSSD][1]clssscSetPrivEnv: unable to set priority to 4
    2013-09-16 18:46:24.253: [    CSSD][1]SLOS: cat=-2, opn=scls_mem_lockdown, dep=11, loc=mlockall
    unable to lock memory
    2013-09-16 18:46:24.253: [    CSSD][1](:CSSSC00011:)clssscExit: A fatal error occurred during initialization
Does anyone have any idea what is going on and how I can fix it?

    Hi,
Solaris has several issues with DISM, e.g.:
Solaris 10 and Solaris 11 Shared Memory Locking May Fail (Doc ID 1590151.1)
It sounds like Solaris Cluster has a similar bug. A "workaround" is to reboot the (cluster) zone, which "fixes" the mlock error. This bug was introduced with updates in September, at least in our environment (Solaris 11.1). Prior to that I did not have the issue, and now I have to restart the entire zone whenever I stop CRS.
With 11.2.0.3 the root.sh script can be rerun without prior cleanup, so you should be able to continue the installation at that point after the reboot. After root.sh completes, some configuration assistants need to be run to complete the installation. You need to execute these manually since you lose your OUI session.
    Kind Regards
    Thomas
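For reference, a minimal sketch of the recovery sequence described above (the Grid home path is taken from the log; the configToolAllCommands response file is an assumption you must supply yourself):
# reboot                  (clears the failed memory-lock state)
# /Grid/CRShome/root.sh   (can be rerun in 11.2.0.3 without cleanup)
$ /Grid/CRShome/cfgtoollogs/configToolAllCommands RESPONSE_FILE=/path/to/cfgrsp.properties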

  • HOWTO: Create 2-node Solaris Cluster 4.1/Solaris 11.1(x64) using VirtualBox

I did this on VirtualBox 4.1 on Windows 7 and VirtualBox 4.2 on Linux x64. Basic prerequisites are: 40GB disk space, 8GB RAM, and a 64-bit-guest-capable VirtualBox.
    Please read all the descriptive messages/prompts shown by 'scinstall' and 'clsetup' before answering.
    0) Download from OTN
    - Solaris 11.1 Live Media for x86(~966 MB)
    - Complete Solaris 11.1 IPS Repository Image (total 7GB)
    - Oracle Solaris Cluster 4.1 IPS Repository image (~73MB)
    1) Run VirtualBox Console, create VM1 : 3GB RAM, 30GB HDD
    2) The new VM1 has 1 NIC, add 2 more NICs (total 3). Setting the NIC to any type should be okay, 'VirtualBox Host Only Adapter' worked fine for me.
    3) Start VM1, point the "Select start-up disk" to the Solaris 11.1 Live Media ISO.
    4) Select "Oracle Solaris 11.1" in the GRUB menu. Select Keyboard layout and Language.
    VM1 will boot and the Solaris 11.1 Live Desktop screen will appear.
    5) Click <Install Oracle Solaris> from the desktop, supply necessary inputs.
    Default Disk Discovery (iSCSI not needed) and Disk Selection are fine.
    Disable the "Support Registration" connection info
    6) The alternate user created during the install has root privileges (sudo). Set appropriate VM1 name
    7) When the VM has to be rebooted after the installation is complete, make sure the Solaris 11.1 Live ISO is ejected or else the VM will again boot from the Live CD.
    8) Repeat steps 1-6, create VM2 and install Solaris.
9) FTP (secure) the Solaris 11.1 IPS Repository and Solaris Cluster 4.1 IPS images onto both VMs, e.g. under /home/user1/
10) We need to set up both packages: the Solaris 11.1 Repository and Solaris Cluster 4.1
    11) All commands now to be run as root
    12) By default the 'solaris' repository is of type online (pkg.oracle.com), that needs to be updated to the local ISO we downloaded :-
    +$ sudo sh+
    +# lofiadm -a /home/user1/sol-11_1-repo-full.iso+
    +//output : /dev/lofi/N+
    +# mount -F hsfs /dev/lofi/N /mnt+
    +# pkg set-publisher -G '*' -M '*' -g /mnt/repo solaris+
    13) Setup the ha-cluster package :-
    +# lofiadm -a /home/user1/osc-4_1-ga-repo-full.iso+
    +//output : /dev/lofi/N+
    +# mkdir /mnt2+
+# mount -F hsfs /dev/lofi/N /mnt2+
    +# pkg set-publisher -g file:///mnt2/repo ha-cluster+
    14) Verify both packages are fine :-
    +# pkg publisher+
    PUBLISHER                   TYPE     STATUS P LOCATION
    solaris                     origin   online F file:///mnt/repo/
    ha-cluster                  origin   online F file:///mnt2/repo/
    15) Install the complete SC4.1 package by installing 'ha-cluster-full'
    +# pkg install ha-cluster-full+
Repeat steps 12-15 on VM2, so that both VMs have the OS and SC4.1 installed.
    16) By default the 3 NICs are in the "Automatic" profile and have DHCP configured. We need to activate the Fixed profile and put the 3 NICs into it. Only 1 interface, the public interface, needs to be
    configured. The other 2 are for the cluster interconnect and will be automatically configured by scinstall. Execute the following commands :-
    +# netadm enable -p ncp defaultfixed+
    +//verify+
    +# netadm list -p ncp defaultfixed+
    +#Configure the public-interface+
    +#Verify none of the interfaces are listed, add all the 3+
    +# ipadm show-if+
+// run dladm show-phys or dladm show-link to check the interface names : must be net0/net1/net2+
    +# ipadm create-ip net0+
    +# ipadm create-ip net1+
    +# ipadm create-ip net2+
    +# ipadm show-if+
    +//select proper IP and configure the public interface. I have used 192.168.56.171 & 172+
    +# ipadm create-addr -T static -a 192.168.56.171/24 net0/publicip+
    +#IP plumbed, restart+
    +# ipadm down-addr -t net0/publicip+
    +# ipadm up-addr -t net0/publicip+
    +//Verify publicip is fine by pinging the host+
    +# ping 192.168.56.1+
    +//Verify, net0 should be up, net1/net2 should be down+
    +# ipadm+
    17) Repeat step 16 on VM2
    18) Verify both VMs can ping each other using the public IP. Add entries to each other's /etc/hosts
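For reference, the /etc/hosts entries on each VM would look like this (names from step 19 below, addresses from step 16):
192.168.56.171  solvm1
192.168.56.172  solvm2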
    Now we are ready to run scinstall and create/configure the 2-node cluster
    19)
    +# cd /usr/cluster/bin+
    +# ./scinstall+
    select 1) Create a new cluster ...
    select 1) Create a new cluster
    select 2) Custom in "Typical or Custom Mode"
    Enter cluster name : mycluster1 (e.g)
    Add the 2 nodes : solvm1 & solvm2 and press <ctrl-d>
    Accept default "No" for <Do you need to use DES authentication>"
    Accept default "Yes" for <Should this cluster use at least two private networks>
    Enter "No" for <Does this two-node cluster use switches>
    Select "1)net1" for "Select the first cluster transport adapter"
If there is a warning about unexpected traffic on "net1", ignore it
    Enter "net1" when it asks corresponding adapter on "solvm2"
    Select "2)net2" for "Select the second cluster transport adapter"
    Enter "net2" when it asks corresponding adapter on "solvm2"
    Select "Yes" for "Is it okay to accept the default network address"
    Select "Yes" for "Is it okay to accept the default network netmask"Now the IP addresses 172.16.0.0 will be plumbed in the 2 private interfaces
    Select "yes" for "Do you want to turn off global fencing"
    (These are SATA serial disks, so no fencing)
    Enter "Yes" for "Do you want to disable automatic quorum device selection"
    (we will add quorum disks later)
    Enter "Yes" for "Proceed with cluster creation"
    Select "No" for "Interrupt cluster creation for cluster check errors"
    The second node will be configured and 2nd node rebooted
The first node will be configured and rebooted.
After both nodes have rebooted, verify the cluster has been created and both nodes joined.
    On both nodes :-
    +# cd /usr/cluster/bin+
    +# ./clnode status+
    +//should show both nodes Online.+
At this point there are no quorum disks, so one of the nodes will be designated the quorum vote. That node's VM has to be up for the other node to come up and the cluster to be formed.
    To check the current quorum status, run :-
    +# ./clquorum show+
    +//one of the nodes will have 1 vote and other 0(zero).+
    20)
    Now the cluster is in 'Installation Mode' and we need to add a quorum disk.
    Shutdown both the nodes as we will be adding shared disks to both of them
    21)
    Create 2 VirtualBox HDDs (VDI Files) on the host, 1 for quorum and 1 for shared filesystem. I have used a size of 1 GB for each :-
    *$ vboxmanage createhd --filename /scratch/myimages/sc41cluster/sdisk1.vdi --size 1024 --format VDI --variant Fixed*
    *0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%*
    *Disk image created. UUID: 899147b9-d21f-4495-ad55-f9cf1ae46cc3*
    *$ vboxmanage createhd --filename /scratch/myimages/sc41cluster/sdisk2.vdi --size 1024 --format VDI --variant Fixed*
    *0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%*
    *Disk image created. UUID: 899147b9-d22f-4495-ad55-f9cf15346caf*
    22)
    Attach these disks to both the VMs as shared type
    *$ vboxmanage storageattach solvm1 --storagectl "SATA" --port 1 --device 0 --type hdd --medium /scratch/myimages/sc41cluster/sdisk1.vdi --mtype shareable*
    *$ vboxmanage storageattach solvm1 --storagectl "SATA" --port 2 --device 0 --type hdd --medium /scratch/myimages/sc41cluster/sdisk2.vdi --mtype shareable*
    *$ vboxmanage storageattach solvm2 --storagectl "SATA" --port 1 --device 0 --type hdd --medium /scratch/myimages/sc41cluster/sdisk1.vdi --mtype shareable*
    *$ vboxmanage storageattach solvm2 --storagectl "SATA" --port 2 --device 0 --type hdd --medium /scratch/myimages/sc41cluster/sdisk2.vdi --mtype shareable*
    The disks are attached to SATA ports 1 & 2 of each VM. On my VirtualBox on Linux, the controller type is "SATA", whereas on Windows it is "SATA Controller".
    The "--mtype shareable' parameter is important
    23)
    Mark both disks as shared :-
    *$ vboxmanage modifyhd /scratch/myimages/sc41cluster/sdisk1.vdi --type shareable*
    *$ vboxmanage modifyhd /scratch/myimages/sc41cluster/sdisk2.vdi --type shareable*
    24) Start both VMs. We need to format the 2 shared disks
    25) From VM1, run format. In my case, the 2 new shared disks show up as 'c7t1d0' and 'c7t2d0'.
    +# format+
    select disk 1 (c7t1d0)
[disk formatted]
    FORMAT MENU
    fdisk
    Type 'y' to accept default partition
    partition
    0
    <enter>
    <enter>
    1
    995mb
    print
    label
    <yes>
    quit
quit
26) Repeat step 25) for the 2nd disk (c7t2d0)
    27) Make sure the shared disks can be used for quorum :-
    On VM1
    +# ./cldevice refresh+
    +# ./cldevice show+
    On VM2
    +# ./cldevice refresh+
    +# ./cldevice show+
    The shared disks should have the same DID (d2,d3,d4 etc). Note down the DID that you are going to use for quorum (e.g d2)
    By default, global fencing is enabled for these disks. We need to turn it off for all disks as these are SATA disks :-
    +# cldevice set -p default_fencing=nofencing-noscrub d1+
    +# cldevice set -p default_fencing=nofencing-noscrub d2+
    +# cldevice set -p default_fencing=nofencing-noscrub d3+
    +# cldevice set -p default_fencing=nofencing-noscrub d4+
28) It is better to do one more reboot of both VMs; otherwise you may get an error when adding the quorum disk (I did)
    29) Run clsetup to add quorum disk and to complete cluster configuration :-
    +# ./clsetup+
    === Initial Cluster Setup ===
    Enter 'Yes' for "Do you want to continue"
    Enter 'Yes' for "Do you want add any quorum devices"
    Select '1) Directly Attached Shared Disk' for the type of device
    Enter 'Yes' for "Is it okay to continue"
    Enter 'd2' (or 'd3') for 'Which global device do you want to use'
    Enter 'Yes' for "Is it okay to proceed with the update"
    The command 'clquorum add d2' is run
    Enter 'No' for "Do you want to add another quorum device"
    Enter 'Yes' for "Is it okay to reset "installmode"?"Cluster initialization is complete.!!!
    30) Run 'clquorum status' to confirm both nodes and the quorum disk have 1 vote each
    31) Run other cluster commands to explore!
    I will cover Data services and shared file system in another post. Basically the other shared disk
    can be used to create a UFS filesystem and mount it on all nodes.
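For reference, a minimal sketch of that last step, assuming the second shared disk is DID device d3 and slice 0 is the large partition created in step 25 :-
+# newfs /dev/global/rdsk/d3s0+
+# mkdir -p /global/fs1+ (on both nodes)
Then add this line to /etc/vfstab on both nodes:
/dev/global/dsk/d3s0 /dev/global/rdsk/d3s0 /global/fs1 ufs 2 yes global,logging
+# mount /global/fs1+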

The Solaris Cluster 4.1 Installation and Concepts Guides are available at:
    http://docs.oracle.com/cd/E29086_01/index.html
    Thanks.

  • Solaris cluster 3.2 Sparc

    Hi folks
    First things first. I may not have great knowledge about Solaris clusters, so please be merciful :)
    Here it is what I have:
    - 2 x Netra T1 AC200 each with 1GB Ram, 2x18GB disks, 500 MHZ Sparc Cpu, 4 port ethernet card
    - 1 array netra d130 3x36 GB
    -- cable et all, switches , you name it
So, I set up the OS, all OK. I set up the cluster, and all SEEMS to be OK.
But when I define my resources and stuff like that, all goes fine, except when I try to bring the resource group online.
On another configuration I tested the shared logical hostname and it works fine.
    Group Name Resources
    Resources: ingresc nodec ingresr
    -- Resource Groups --
    Group Name Node Name State Suspended
    Group: ingresc node2 Unmanaged No
    Group: ingresc node1 Unmanaged No
    -- Resources --
    Resource Name Node Name State Status Message
    Resource: nodec node2 Offline Offline
    Resource: nodec node1 Offline Offline
    Resource: ingresr node2 Offline Offline
    Resource: ingresr node1 Offline Offline
    scswitch: (C969069) Request failed because resource group ingresc is in ERROR_STOP_FAILED state and requires operator attention
    Now, in /var/adm/messsages I spotted this :
    Mar 6 17:09:03 node2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_stop> for resource <nodec>, resource group <IngresNCG>, node <node2>, timeout <300> seconds
    Mar 6 17:09:03 node2 Cluster.RGM.rgmd: [ID 510020 daemon.notice] 46 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hafoip/hafoip_stop>:tag=<IngresNCG.nodec.1>: Calling security_clnt_connect(..., host=<node2>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    A little bit of research points in the direction of a bug (see CR 6565601)
    Here it is what I see as my options:
1 - Reinstall the Solaris OS, but not Solaris Cluster 3.2; instead use Solaris Express 10/07 or 2/08. But will this combination work? Or will it work only in the combination of Solaris Cluster Express and Solaris Express Developer Edition? If the latter, which versions will work together?
2 - Beg for a Solaris Cluster 3.2 patch, although in my humble opinion this should be free, since it looks to me that once you write your own stuff you run into the bug, and after all it is education
    Any ideas, help, greatly appreciated
    Many thanks
    Armand

Although the names are different, since I used two setups, this is the relevant part of /var/adm/messages.
It looks to me like the Ingres resource is failing:
    Mar  6 17:08:03 node2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_prenet_start> for resource <nodec>, resource group <IngresNCG>, node <node2>, timeout <300> seconds
    Mar  6 17:08:03 node2 Cluster.RGM.rgmd: [ID 510020 daemon.notice] 46 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hafoip/hafoip_prenet_start>:tag=<IngresNCG.nodec.10>: Calling security_clnt_connect(..., host=<node2>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Mar  6 17:08:05 node2 svc.startd[8]: [ID 652011 daemon.warning] svc:/system/cluster/scsymon-srv:default: Method "/usr/cluster/lib/svc/method/svc_scsymon_srv start" failed with exit status 96.
    Mar  6 17:08:05 node2 svc.startd[8]: [ID 748625 daemon.error] system/cluster/scsymon-srv:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
    Mar  6 17:08:09 node2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_prenet_start> completed successfully for resource <nodec>, resource group <IngresNCG>, node <node2>, time used: 1% of timeout <300 seconds>
    Mar  6 17:08:09 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource nodec state on node node2 change to R_PRENET_STARTED
    Mar  6 17:08:09 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource nodec state on node node2 change to R_STARTING
    Mar  6 17:08:09 node2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_start> for resource <nodec>, resource group <IngresNCG>, node <node2>, timeout <500> seconds
    Mar  6 17:08:09 node2 Cluster.RGM.rgmd: [ID 510020 daemon.notice] 46 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hafoip/hafoip_start>:tag=<IngresNCG.nodec.0>: Calling security_clnt_connect(..., host=<node2>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Mar  6 17:08:11 node2 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource nodec status on node node2 change to R_FM_ONLINE
    Mar  6 17:08:11 node2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource nodec status msg on node node2 change to <LogicalHostname online.>
    Mar  6 17:08:11 node2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_start> completed successfully for resource <nodec>, resource group <IngresNCG>, node <node2>, time used: 0% of timeout <500 seconds>
    Mar  6 17:08:11 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource nodec state on node node2 change to R_JUST_STARTED
    Mar  6 17:08:11 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource nodec state on node node2 change to R_ONLINE_UNMON
    Mar  6 17:08:11 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource IngresNCR state on node node2 change to R_STARTING
    Mar  6 17:08:11 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource nodec state on node node2 change to R_MON_STARTING
    Mar  6 17:08:11 node2 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource IngresNCR status on node node2 change to R_FM_UNKNOWN
    Mar  6 17:08:11 node2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource IngresNCR status msg on node node2 change to <Starting>
    Mar  6 17:08:11 node2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <bin/ingres_server_start> for resource <IngresNCR>, resource group <IngresNCG>, node <node2>, timeout <300> seconds
    Mar  6 17:08:11 node2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_monitor_start> for resource <nodec>, resource group <IngresNCG>, node <node2>, timeout <300> seconds
    Mar  6 17:08:11 node2 Cluster.RGM.rgmd: [ID 510020 daemon.notice] 46 fe_rpc_command: cmd_type(enum):<1>:cmd=</global/disk2s0/ing_nc_1/ingresclu/bin/ingres_server_start>:tag=<IngresNCG.IngresNCR.0>: Calling security_clnt_connect(..., host=<node2>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Mar  6 17:08:11 node2 Cluster.RGM.rgmd: [ID 268902 daemon.notice] 45 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hafoip/hafoip_monitor_start>:tag=<IngresNCG.nodec.7>: Calling security_clnt_connect(..., host=<node2>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Mar  6 17:08:12 node2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_monitor_start> completed successfully for resource <nodec>, resource group <IngresNCG>, node <node2>, time used: 0% of timeout <300 seconds>
    Mar  6 17:08:12 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource nodec state on node node2 change to R_ONLINE
    Mar  6 17:08:13 node2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource IngresNCR status msg on node node2 change to <Bringing Ingres DBMS server online.>
    Mar  6 17:08:30 node2 sendmail[534]: [ID 702911 mail.alert] unable to qualify my own domain name (node2) -- using short name
    Mar  6 17:08:30 node2 sendmail[535]: [ID 702911 mail.alert] unable to qualify my own domain name (node2) -- using short name
    Mar  6 17:08:31 node2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource IngresNCR status msg on node node2 change to <Bringing Ingres DBMS server offline.>
    Mar  6 17:08:45 node2 SC[Ingres.ingres_server,IngresNCG,IngresNCR,stop]: [ID 147958 daemon.error] ERROR : HA-Ingres failed to stop.
    Mar  6 17:08:46 node2 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource IngresNCR status on node node2 change to R_FM_FAULTED
    Mar  6 17:08:46 node2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource IngresNCR status msg on node node2 change to <Ingres DBMS server faulted.>
    Mar  6 17:08:46 node2 SC[Ingres.ingres_server,IngresNCG,IngresNCR,start]: [ID 335575 daemon.error] ERROR : Stop method failed for the HA-Ingres data service.
    Mar  6 17:08:46 node2 Cluster.RGM.rgmd: [ID 938318 daemon.error] Method <bin/ingres_server_start> failed on resource <IngresNCR> in resource group <IngresNCG> [exit code <1>, time used: 11% of timeout <300 seconds>]
    Mar  6 17:08:46 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource IngresNCR state on node node2 change to R_START_FAILED
    Mar  6 17:08:46 node2 Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group IngresNCG state on node node2 change to RG_PENDING_OFF_START_FAILED
    Mar  6 17:08:46 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource IngresNCR state on node node2 change to R_STOPPING
    Mar  6 17:08:46 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource nodec state on node node2 change to R_MON_STOPPING
    Mar  6 17:08:46 node2 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource IngresNCR status on node node2 change to R_FM_UNKNOWN
    Mar  6 17:08:46 node2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource IngresNCR status msg on node node2 change to <Stopping>
    Mar  6 17:08:46 node2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <bin/ingres_server_stop> for resource <IngresNCR>, resource group <IngresNCG>, node <node2>, timeout <300> seconds
    Mar  6 17:08:46 node2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_monitor_stop> for resource <nodec>, resource group <IngresNCG>, node <node2>, timeout <300> seconds
    Mar  6 17:08:46 node2 Cluster.RGM.rgmd: [ID 510020 daemon.notice] 46 fe_rpc_command: cmd_type(enum):<1>:cmd=</global/disk2s0/ing_nc_1/ingresclu/bin/ingres_server_stop>:tag=<IngresNCG.IngresNCR.1>: Calling security_clnt_connect(..., host=<node2>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Mar  6 17:08:46 node2 Cluster.RGM.rgmd: [ID 268902 daemon.notice] 45 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hafoip/hafoip_monitor_stop>:tag=<IngresNCG.nodec.8>: Calling security_clnt_connect(..., host=<node2>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Mar  6 17:08:47 node2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource IngresNCR status msg on node node2 change to <Bringing Ingres DBMS server offline.>
    Mar  6 17:08:48 node2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_monitor_stop> completed successfully for resource <nodec>, resource group <IngresNCG>, node <node2>, time used: 0% of timeout <300 seconds>
    Mar  6 17:08:48 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource nodec state on node node2 change to R_ONLINE_UNMON
    Mar  6 17:09:00 node2 SC[Ingres.ingres_server,IngresNCG,IngresNCR,stop]: [ID 147958 daemon.error] ERROR : HA-Ingres failed to stop.
    Mar  6 17:09:02 node2 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource IngresNCR status on node node2 change to R_FM_FAULTED
    Mar  6 17:09:02 node2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource IngresNCR status msg on node node2 change to <Ingres DBMS server faulted.>
    Mar  6 17:09:03 node2 Cluster.RGM.rgmd: [ID 938318 daemon.error] Method <bin/ingres_server_stop> failed on resource <IngresNCR> in resource group <IngresNCG> [exit code <2>, time used: 5% of timeout <300 seconds>]
    Mar  6 17:09:03 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource IngresNCR state on node node2 change to R_STOP_FAILED
    Mar  6 17:09:03 node2 Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group IngresNCG state on node node2 change to RG_PENDING_OFF_STOP_FAILED
    Mar  6 17:09:03 node2 Cluster.RGM.rgmd: [ID 424774 daemon.error] Resource group <IngresNCG> requires operator attention due to STOP failure
    Mar  6 17:09:03 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource nodec state on node node2 change to R_STOPPING
    Mar  6 17:09:03 node2 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource nodec status on node node2 change to R_FM_UNKNOWN
    Mar  6 17:09:03 node2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource nodec status msg on node node2 change to <Stopping>
    Mar  6 17:09:03 node2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_stop> for resource <nodec>, resource group <IngresNCG>, node <node2>, timeout <300> seconds
    Mar  6 17:09:03 node2 Cluster.RGM.rgmd: [ID 510020 daemon.notice] 46 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hafoip/hafoip_stop>:tag=<IngresNCG.nodec.1>: Calling security_clnt_connect(..., host=<node2>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Mar  6 17:09:04 node2 ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 192.168.005.085:0, remote = 000.000.000.000:0, start = -2, end = 6
    Mar  6 17:09:04 node2 ip: [ID 302654 kern.notice] TCP_IOC_ABORT_CONN: aborted 0 connection
    Mar  6 17:09:04 node2 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource nodec status on node node2 change to R_FM_OFFLINE
    Mar  6 17:09:04 node2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource nodec status msg on node node2 change to <LogicalHostname offline.>
    Mar  6 17:09:04 node2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_stop> completed successfully for resource <nodec>, resource group <IngresNCG>, node <node2>, time used: 0% of timeout <300 seconds>
    Mar  6 17:09:04 node2 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource nodec state on node node2 change to R_OFFLINE
    Mar  6 17:09:04 node2 Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group IngresNCG state on node node2 change to RG_ERROR_STOP_FAILED
    Mar  6 17:09:04 node2 Cluster.RGM.rgmd: [ID 424774 daemon.error] Resource group <IngresNCG> requires operator attention due to STOP failure
    Mar  6 17:09:04 node2 Cluster.RGM.rgmd: [ID 663692 daemon.error] failback attempt failed on resource group <IngresNCG> with error <resource group in ERROR_STOP_FAILED state requires operator attention>
Mar  6 17:09:10 node2 java[1652]: [ID 807473 user.error] pkcs11_softtoken: Keystore version failure.
Thank you
    Armand
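For reference, once the underlying Ingres stop failure is fixed, the ERROR_STOP_FAILED state has to be cleared manually before the group can be brought online again (a minimal sketch; the resource, group, and node names are taken from the status output above):
# clresource clear -f STOP_FAILED -n node2 ingresr
# clresourcegroup online -M ingresc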

  • FYI: Solaris Cluster 3.2 01/09 (aka u2) has now been released

    FYI:
    Solaris Cluster 3.2 01/09 (aka u2) has now been released.
    The new Solaris Cluster 3.2 1/09 update adds innovative business continuity capabilities for Oracle RAC in Solaris virtualized environments, enables increased high availability with new monitoring capabilities and configuration advisors, supports more types of storage solutions and provides simplified configurations.
    For additional details go to [http://sun.com/cluster]
    Download the software from here
    Solaris Cluster engineering will blog about it here
    You can find the documentation here
    Enjoy.
    Tim
    ---

From what I have read, Sun Cluster 3.2 will not be supported with RAC 11g R2 until the September time frame.
The functionality will be released via a cluster core patch.
Currently Sun Cluster looks in /etc/init.d/init.crs to start CRS (11g R1). With 11g R2 the start script for GRID is now /etc/init.d/ohasd.
SC does support HA 11g R2, but not RAC at this point in time.

  • DAA Installation in Solaris cluster Environment

    Hello,
We are integrating our PRD system with Solution Manager 7.1.
Our system is in a Solaris Cluster where the DB runs on one node and SAP on another.
Right now I have installed the DAA on Node A, where SAP is installed. In the managed system setup in Solution Manager I can see both nodes, and I can assign the DAA for the SAP ABAP system.
My doubt is: do I have to install the DAA again on Node B and repeat the same managed system steps for Node B in Solman?
    I am not able to find any relevant blog or notes related to this scenario.
    It would be great if someone can guide me through the process.
    Thanks
    Raghu

Hi,
OK, so as Divyanshu said, I can just continue the installation on Node B with the same DAA SID.
But if I go the way Amar said, I have to uninstall the current installation and start a fresh installation on both nodes without mentioning the logical hostname.
Amar,
I have already gone through that link, but now the problem is that I have to convince the client to uninstall the DAA which is already installed. Getting permission and redoing the activities is a bit complicated.
So I will probably go with Divyanshu's method: install the DAA on Node B with the same SID and continue the same configuration in Solman.
I will update with the result once I finish.
    Thank you
    Regards
    Raghu

  • Solaris cluster, Oracle10g RAC

I just want to understand which is the more favorable, most popular combination used by Sun customers for Oracle 10g RAC:
    1) Solaris cluster + VERITAS Storage Foundation
    2) Solaris cluster + QFS

Please refer to http://docs.sun.com/app/docs/doc/820-2574/fmnyo?a=view for the supported options for Oracle RAC data storage. This is important because, if you stray outside these, you will not be on a jointly Sun/Oracle supported configuration.
Therefore, if you want to put Oracle RAC (tablespace) data files on a cluster file system, you must use shared QFS, as that is the only supported option open to you. Furthermore, you can only run sQFS on top of SVM/Oban or h/w RAID; we do not support running it on VxVM/CVM.
    Regards,
    Tim
    ---

  • Solaris cluster 3.2 with zfs failover filesystem failed. How can I recover?

    Hi all,
I have just installed and configured Solaris Cluster 3.2U3, using ZFS for both the root file system and the shared storage file system.
The cluster was operating fine. Today, I can no longer see the zpool for the shared storage, although I can still see the storage volume in the output of the format command.
So all my resources have changed to offline status, and my application has failed.
How can I recover this cluster?
Is there anybody who can help me? :(

Have you used a SUNW.HAStoragePlus (HASP) resource to control your zpool? If not, the zpool probably needs importing. That is what the HASP resource would do for you. You would also need a dependency from your application on the HASP resource to ensure that your application does not try to start up before the storage is available.
    Regards,
    Tim
    ---
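For reference, a minimal sketch of the HASP setup Tim describes (the resource group, zpool, and resource names here are assumptions for illustration):
# clresourcetype register SUNW.HAStoragePlus
# clresourcegroup create app-rg
# clresource create -g app-rg -t SUNW.HAStoragePlus -p Zpools=mypool hasp-rs
# clresource create -g app-rg -t <your-app-type> -p Resource_dependencies=hasp-rs app-rs
With the zpool under HASP control, it is imported automatically on whichever node app-rg is brought online.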

  • Timefinder mirror  BCV issues after installing Solaris Cluster 3.2

I just installed Solaris Cluster 3.2, and since then my BCVs will no longer mount. I have a case open with EMC already; I just thought I would check and see if anyone here has ever run into the same issue and found a fix. The BCV job runs just fine until it comes time to mount the copy under an alternate location. This fails, which I initially assumed was because SC needs the disk group to be registered. However, I have BCVs running on other servers that use SC 3.2 as well, and they work just fine. If anyone has any past experience with this, please let me know how you overcame the problem.
    TIA...

Did a rebuild with current patches, did not run JASS or an in-house hardening script on the nodes, and I get the same result.
    I think I might have a bug issue with the bge driver or something.
    On the first node before running scinstall all nic lights are lit.
    After that first node reboots from scinstall the nic lights stay lit until right before the CMM messages begin appearing on the console. At that point the interconnect lights go out and stay out physically.
    No errors were detected with sccheck or in the install logs for the cluster.
    I tried a rebuild of the cluster nodes using a switch (ProCurve 2626) for the interconnects rather than an ethernet cable or cross-over cable.
    I have a hme interface in my V210 and V240 and I am going to use that for one of the interconnects to see if it matters.
Basically, at this point it is definitely not something physical (bad cable, bad switch port, etc.) but something in the cluster configuration from scinstall that is not digging the interconnects and is keeping the cluster nodes from conversing.
Since the cluster isn't working anyway, I can do a clintr enable node:port,switch@port and see that the ports and switch ports show as enabled, but clintr status does not show an active interconnect, and the physical ports are not lit.
    I do see references to bgeX/0 unregistered in /var/adm/messages but I haven't found information as to what this means or what to do about it exactly yet.
    Closest thing so far is this:
    http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6453203

  • Solaris Cluster and Global Devices for sapmnt

    Hi,
IHAC that is considering using global devices for /usr/sapmnt in an SAP environment, since they need all SAP nodes to see the same sapmnt area. Is this a recommended approach?
    Generally, we are working with S11.1/Solaris Cluster 4.1 to implement Netweaver 7.2 and ERP 6.0.
    I appreciate your comments.
    Regards, Rafael.

    Hi Rafael,
/usr/sapmnt is a file system, so I assume that when you ask about usage of global devices you really mean to ask about using a global file system (i.e. a UFS file system with the mount option global)?
    If that is true, then the answer is yes.
    The data service guide for SAP NetWeaver (http://docs.oracle.com/cd/E29086_01/html/E29440/installconfig-10.html#installconfig-34) does mention in the section "configuration considerations":
    "The SAP enqueue server and SAP replica server run on different cluster nodes. Therefore, the SAP application files (binary files, configuration files, and parameter files) can be installed either on the global file system or on the local file system. However, the application files for each of these applications must be accessible at all times from the nodes on which these applications are running."
    And the deployment example in the appendix (http://docs.oracle.com/cd/E29086_01/html/E29440/gmlbt.html#scrolltoc) makes use of a global mounted /sapstore file system.
    Regards
                 Thorsten
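For reference, a globally mounted UFS file system of the kind described above is just a vfstab entry carrying the global option, identical on every node (a sketch; the DID device number is an assumption):
/dev/global/dsk/d5s0 /dev/global/rdsk/d5s0 /usr/sapmnt ufs 2 yes global,logging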

  • Solaris Cluster Private Link Failure

    Hi,
I have configured Solaris Cluster 3.3 and added two back-to-back interconnect cables.
Sun Cluster is working fine, but the private links fail: I cannot ping clusternode2-priv and clusternode1-priv from each other, and some commands fail.
    ~ # ping clusternode2-priv
    no answer from clusternode2-priv
    ~ # metaset -s nfsds -a -h t1u331 t1u332
    metaset: 172.16.4.1: metad client create: RPC: Rpcbind failure
    ~ # scstat
    -- Cluster Nodes --
    Node name Status
    Cluster node: n1u332 Online
    Cluster node: n1u331 Online
    -- Cluster Transport Paths --
    Endpoint Endpoint Status
    Transport path:   n1u332:nxge2           n1u331:nxge2           Path online
    Transport path:   n1u332:nxge1           n1u331:nxge1           Path online
    -- Quorum Summary from latest node reconfiguration --
    Quorum votes possible: 3
    Quorum votes needed: 2
    Quorum votes present: 3
    -- Quorum Votes by Node (current status) --
    Node Name Present Possible Status
    Node votes: n1u332 1 1 Online
    Node votes: n1u331 1 1 Online
    -- Quorum Votes by Device (current status) --
    Device Name Present Possible Status
    Device votes: /dev/did/rdsk/d4s2 1 1 Online
    -- Device Group Servers --
    Device Group Primary Secondary
    -- Device Group Status --
    Device Group Status
    -- Multi-owner Device Groups --
    Device Group Online Status
    -- Resource Groups and Resources --
    Group Name Resources
    -- Resource Groups --
    Group Name Node Name State Suspended
    -- Resources --
    Resource Name Node Name State Status Message
    -- IPMP Groups --
    Node Name Group Status Adapter Status
    [root @ n1u332]
    ~ # ifconfig -a
    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
    inet 127.0.0.1 netmask ff000000
    e1000g0: flags=1000802<BROADCAST,MULTICAST,IPv4> mtu 1500 index 2
    inet 0.0.0.0 netmask 0
    ether 0:15:17:e3:a4:e8
    vsw0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
    inet 10.131.58.76 netmask ffffff00 broadcast 10.131.58.255
    groupname ipmp-grp
    ether 0:14:4f:f9:1:bd
    vsw0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
    inet 10.131.58.75 netmask ffffff00 broadcast 10.131.58.255
    vsw1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 4
    inet 10.131.58.77 netmask ffffff00 broadcast 10.131.58.255
    groupname ipmp-grp
    ether 0:14:4f:fb:44:4
    nxge1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 7
    inet 172.16.0.129 netmask ffffff80 broadcast 172.16.0.255
    ether 0:14:4f:a0:81:d9
    nxge2: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 6
    inet 172.16.1.1 netmask ffffff80 broadcast 172.16.1.127
    ether 0:14:4f:a0:81:da
    clprivnet0: flags=1009843<UP,BROADCAST,RUNNING,MULTICAST,MULTI_BCAST,PRIVATE,IPv4> mtu 1500 index 8
    inet 172.16.4.1 netmask fffffe00 broadcast 172.16.5.255
    ether 0:0:0:0:0:1
    [root @ n1u332]
    ~ # dladm show-dev
    vsw0 link: up speed: 1000 Mbps duplex: full
    vsw1 link: up speed: 1000 Mbps duplex: full
    e1000g0 link: down speed: 0 Mbps duplex: half
    e1000g1 link: up speed: 1000 Mbps duplex: full
    e1000g2 link: unknown speed: 0 Mbps duplex: half
    e1000g3 link: unknown speed: 0 Mbps duplex: half
    nxge0 link: up speed: 100 Mbps duplex: full
    nxge1 link: up speed: 1000 Mbps duplex: full
    nxge2 link: up speed: 1000 Mbps duplex: full
    nxge3 link: up speed: 100 Mbps duplex: full
    e1000g4 link: unknown speed: 0 Mbps duplex: half
    e1000g5 link: up speed: 1000 Mbps duplex: full
    clprivnet0              link: unknown   speed: 0     Mbps       duplex: unknown

If your private interconnect had really failed, then one or other of the cluster nodes would have panicked. I think it is more likely that either you have changed the nsswitch.conf entry for hosts so that it does not include 'cluster' first (although I would have expected that to result in an unresolved host name), or you have hardened your machine in some way with ipfilter or security settings.
    Has it ever worked?
    Tim
    ---
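For reference, the hosts lookup Tim refers to should list cluster first in /etc/nsswitch.conf on both nodes, along these lines:
hosts: cluster files dns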

  • Hardware recommendations for learning Solaris Cluster on Sparc (at home)

    On a low budget, I'd like to put together a Solaris Cluster on Sparc (at home). At "work" in the next year we will be implementing a Solaris Cluster to run Tomcat and a custom CORBA server. (These apps will be migrated from very old hardware and VCS) The CORBA server is a Sparc binary, hence the need for Sparc. I'd like my home-office cluster to be similar in function to what I have at work. At work we have (2) T5120 Servers and a 2540 (2500-M2) Array waiting. From looking at the Solaris Cluster docs, it looks like you use a 2540 in a Direct-Connect configuration. We will be going to Solaris Cluster training eventually, but not soon. In the meantime, I'd like to keep/gain some skills/experience.
    Potential (cheap) Home Cluster:
    (2) SunFire V245 or (2) T1000 or (2) something_cheap
    connected to
    (1) Storedge D2 or (1) Storedge S1
My main desire is for the interconnects and failover on this home cluster to behave the same way as the T5120s with the 2540 array. For example, if I yank (or replace) an HD, I'd like it to give very similar messages to what I will face at work in the future. I'd like the creation of ZFS pools etc. to work similarly. I'd like the SCSI cards (HBAs or whatever) and cabling to be cheap.
Any recommendations on hardware? Servers? Arrays? SCSI cards/cabling?
    Thanks,
    Scott

    I settled on:
    (2) Sunfire V210
    Storedge 3120
    Connected by VHDCI
    All used equipment at a cheap price. Should be a great little testbed.
