NFS Cluster

Hi gurus,
Is it supported to configure more than one resource group that uses the NFS resource?
That is, to have more than one instance of the NFS service.
Thanks in advance,
AB

Yes, it is absolutely supported, and doing so allows you to balance NFS services across the cluster. However, there is one rule you must obey: you can only share a given NFS mount point from one cluster node at any one time.
So, suppose you had a set of home directories that you wanted to share from /failover/export/home. Rather than sharing that single directory from both cluster nodes (in two resource groups), you would break the shares up into two pieces, for example /failover/export/home/a_to_m and /failover/export/home/n_to_z. That way, you can share one from one node and the other from the other node, using two separate resource groups (see the sketch below).
One other rule that Solaris Cluster users need to be aware of: You cannot share an HA-NFS mount point from within the cluster to another cluster node. If you do, you risk deadlocks.
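To make that two-group layout concrete, here is a minimal sketch in Sun Cluster 3.2 command syntax. All group, resource and hostname names below are hypothetical, each logical hostname must exist in /etc/hosts, each group is assumed to manage its own failover file system, and each group needs its own dfstab.<resource> file under its Pathprefix directory:
# Register the resource types once per cluster
# clresourcetype register SUNW.HAStoragePlus
# clresourcetype register SUNW.nfs
# Group A, preferred on node1, sharing the a_to_m directory
# clresourcegroup create -n node1,node2 -p Pathprefix=/failover/nfs-a-admin nfs-a-rg
# clreslogicalhostname create -g nfs-a-rg nfs-a-lh
# clresource create -g nfs-a-rg -t SUNW.HAStoragePlus \
#     -p FilesystemMountPoints=/failover/export/home/a_to_m nfs-a-stor
# /failover/nfs-a-admin/SUNW.nfs/dfstab.nfs-a-res contains:
#     share -F nfs -o rw /failover/export/home/a_to_m
# clresource create -g nfs-a-rg -t SUNW.nfs -p Resource_dependencies=nfs-a-stor nfs-a-res
# clresourcegroup online -M nfs-a-rg
# Group B is built the same way with -n node2,node1, its own Pathprefix, logical
# hostname and the n_to_z mount point, so each node normally serves one share.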
Hope that helps,
Tim
---

Similar Messages

  • NFS cluster node crashed

    Hi all, we have a 2-node cluster running Solaris 10 11/06 and Sun Cluster 3.2.
    Recently, we were asked to NFS-mount, on node 1 of the cluster, a directory from an external Linux host (i.e. node 1 of the cluster is the NFS client; the Linux server is the NFS server).
    A few days later, early on a Sunday morning, the Linux server developed a high load and was very slow to log into. Around the same time, node 1 of the cluster rebooted. Was this reboot of node 1 a coincidence? I'm not sure.
    Has anyone got ideas/suggestions about this situation (e.g. did the slow response of the Linux NFS server cause node 1 of the cluster to reboot? Is the external NFS mount a bad idea)?
    Stewart

    Hi,
    your assumption sounds very unreasonable. But without any hard facts like
    - the panic string
    - contents of /var/adm/messages at time of crash
    - configuration information
    - etc.
    it is impossible to tell.
    Regards
    Hartmut

  • Set up Samba on an NFS cluster

    Hi,
    I have a two-node cluster running HA-NFS on top of several UFS and ZFS file systems. Now I'd like to have Samba share those file systems using the same logical hostname. In this case, do I need to create a separate Samba RG as the user guide suggests, or can I just use the same RG as the NFS and only create a Samba resource through samba_register?
    Thanks,

    Thanks, Neil, for spending time on this issue over the weekend. I should have thought about these two things when I did the testing on Friday.
    I just corrected the config file and re-ran the test. It still failed. I notice it complains about the faultmonitor user as well as the RUN_NMBD variable.
    For the first complaint, I did verify the fmuser through the smbclient command, as the manual suggests.
    # hostname
    test2
    # /usr/sfw/sbin/smbd -s /global/ufs/fs1/samba/test10/lib/smb.conf -D
    # smbclient -s /global/ufs/fs1/samba/test10/lib/smb.conf -N -L test10
    Anonymous login successful
    Domain=[WINTEST] OS=[Unix] Server=[Samba 3.0.28]
    Sharename Type Comment
    testshare Disk
    IPC$ IPC IPC Service (Samba 3.0.28)
    Anonymous login successful
    Domain=[WINTEST] OS=[Unix] Server=[Samba 3.0.28]
    Server Comment
    Workgroup Master
    # smbclient -s /global/ufs/fs1/samba/test10/lib/smb.conf '//test10/scmondir' -U test10/fmuser%samba -c 'pwd;exit'
    Domain=[ANSYS] OS=[Unix] Server=[Samba 3.0.28]
    Current directory is \\test10\scmondir\
    # pkill -TERM smbd
    For the RUN_NMBD, I recalled somebody posted on this forum but his solution was not recognized.
    Start_command output,
    # ksh -x /opt/SUNWscsmb/samba/bin/start_samba -R 'samba-rs' -G 'nfs-rg' -X 'smbd nmbd' -B '/usr/sfw/bin' -S '/usr/sfw/sbin' -C '/global/ufs/fs1/samba/test10' \
    -L '/global/ufs/fs1/samba/test10/logs' -U test10/fmuser%samba -M 'scmondir' -P '/usr/sfw/lib' -H test10/bin/pwd
    2> /dev/null
    PWD=/var/adm
    + + basename /opt/SUNWscsmb/samba/bin/start_samba
    MYNAME=start_samba
    + + /usr/bin/awk -F_ {print $1}
    + /usr/bin/echo start_samba
    parm1=start
    + + /usr/bin/awk -F_ {print $2}
    + /usr/bin/echo start_samba
    parm2=samba
    + /opt/SUNWscsmb/bin/control_samba -R samba-rs -G nfs-rg -X smbd nmbd -B /usr/sfw/bin -S /usr/sfw/sbin -C /global/ufs/fs1/samba/test10 -L /global/ufs/fs1/samba/test10/logs -U test10/fmuser%samba -M scmondir -P /usr/sfw/lib -H test10 start samba
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + validate_common
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + rc=0
    + [ ! -d /usr/sfw/bin ]
    + debug_message Validate - samba bin directory /usr/sfw/bin exists
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + [ ! -d /usr/sfw/sbin ]
    + debug_message Validate - samba sbin directory /usr/sfw/sbin exists
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + [ ! -d /global/ufs/fs1/samba/test10 ]
    + debug_message Validate - samba configuration directory /global/ufs/fs1/samba/test10 exists
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + [ ! -f /global/ufs/fs1/samba/test10/lib/smb.conf ]
    + debug_message Validate - smbconf /global/ufs/fs1/samba/test10/lib/smb.conf exists
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + [ ! -x /usr/sfw/bin/nmblookup ]
    + debug_message Validate - nmblookup /usr/sfw/bin/nmblookup exists and is executable
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + /usr/sfw/bin/nmblookup -h
    + 1> /dev/null 2>& 1
    + [ 1 -eq 0 ]
    + + /usr/bin/awk {print $2}
    + /usr/sfw/bin/nmblookup -V
    VERSION=3.0.28
    + + /usr/bin/cut -d. -f1
    + /usr/bin/echo 3.0.28
    SAMBA_VERSION=3
    + + /usr/bin/cut -d. -f2
    + /usr/bin/echo 3.0.28
    SAMBA_RELEASE=0
    + + /usr/bin/cut -d. -f3
    + /usr/bin/echo 3.0.28
    SAMBA_UPDATE=28
    + debug_message Validate - Samba version <3.0.28> is being used
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + + stripfunc 3
    SAMBA_VERSION=3
    + + stripfunc 0
    SAMBA_RELEASE=0
    + + stripfunc 28
    SAMBA_UPDATE=28
    + rc_validate_version=0
    + [ -z 3.0.28 ]
    + [ 3 -lt 2 ]
    + [ 3 -eq 2 -a 0 -le 2 -a 28 -lt 2 ]
    + [ 0 -gt 0 ]
    + debug_message Function: validate_common - End
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + return 0
    + rc1=0
    + validate_samba
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + rc=0
    + [ ! -d /global/ufs/fs1/samba/test10/logs ]
    + debug_message Validate - Samba log directory /global/ufs/fs1/samba/test10/logs exists
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + [ ! -x /usr/sfw/sbin/smbd ]
    + debug_message Validate - smbd /usr/sfw/sbin/smbd exists and is executable
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + [ ! -x /usr/sfw/sbin/nmbd ]
    + debug_message Validate - nmbd /usr/sfw/sbin/nmbd exists and is executable
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + /usr/bin/grep \[ /global/ufs/fs1/samba/test10/lib/smb.conf
    + /usr/bin/cut -d[ -f2
    + /usr/bin/grep scmondir
    + /usr/bin/cut -d] -f1
    + [ -z scmondir ]
    + debug_message Validate - Faultmonitor resource scmondir exists
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + [ ! -x /usr/sfw/bin/smbclient ]
    + debug_message Validate - smbclient /usr/sfw/bin/smbclient exists and is executable
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + + /usr/bin/echo test10/fmuser%samba
    + /usr/bin/cut -d% -f1
    + /usr/bin/awk BEGIN { FS="\\" } {print $NF}
    USER=test10/fmuser
    + /usr/bin/getent passwd test10/fmuser
    + [ -z  ]
    + syslog_tag
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + scds_syslog -p daemon.error -t SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs -m Validate - Couldn't retrieve faultmonitor-user <%s> from the nameservice test10/fmuser
    + rc=1
    + + /usr/bin/tr -s [:lower:] [:upper:]
    + /usr/bin/echo YES
    Bad string
    RUN_NMBD=
    + [  = YES -o  = NO ]
    + syslog_tag
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + scds_syslog -p daemon.error -t SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs -m Validate - RUN_NMBD=%s is invalid - specify YES or NO
    + rc=1
    + debug_message Function: validate_samba - End
    + print SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs
    + return 1
    + rc2=1
    + rc=1
    + [ 0 -eq 0 -a 1 -eq 0 ]
    + [ 1 -eq 0 ]
    + rc=1
    + exit 1
    + rc=1
    + exit 1
    Messages file output,
    # tail -f /var/adm/messages
    Nov 21 11:14:28 test2 Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group nfs-rg state on node test3 change to RG_OFFLINE
    Nov 21 15:52:57 test2 syslogd: going down on signal 15
    Nov 21 15:53:32 test2 SC[SUNW.nfs:3.2,nfs-rg,nfs-ufs-fs2-rs,nfs_probe]: [ID 903370 daemon.debug] Command share -F nfs -o sec=sys,rw /localfs/ufs/fs2/data > /var/run/.hanfs/.run.out.2024 2>&1 failed to run: share -F nfs -o sec=sys,rw /localfs/ufs/fs2/data > /var/run/.hanfs/.run.out.2024 2>&1 exited with status 0.
    Nov 21 15:53:32 test2 SC[SUNW.nfs:3.2,nfs-rg,nfs-zfs-rs,nfs_probe]: [ID 903370 daemon.debug] Command share -F nfs -o sec=sys,rw /CLSzpool/export > /var/run/.hanfs/.run.out.2026 2>&1 failed to run: share -F nfs -o sec=sys,rw /CLSzpool/export > /var/run/.hanfs/.run.out.2026 2>&1 exited with status 0.
    Nov 21 15:56:18 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 327132 daemon.error] Validate - Couldn't retrieve faultmonitor-user <test10/fmuser> from the nameservice
    Nov 21 15:56:18 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 287111 daemon.error] Validate - RUN_NMBD= is invalid - specify YES or NO
    Nov 23 04:20:40 test2 cl_eventlogd[1156]: [ID 848580 daemon.info] Restarting on signal 1.
    Nov 23 04:20:40 test2 last message repeated 2 times
    Nov 23 04:21:43 test2 SC[SUNW.nfs:3.2,nfs-rg,nfs-ufs-fs2-rs,nfs_probe]: [ID 903370 daemon.debug] Command share -F nfs -o sec=sys,rw /localfs/ufs/fs2/data > /var/run/.hanfs/.run.out.2024 2>&1 failed to run: share -F nfs -o sec=sys,rw /localfs/ufs/fs2/data > /var/run/.hanfs/.run.out.2024 2>&1 exited with status 0.
    Nov 23 04:21:43 test2 SC[SUNW.nfs:3.2,nfs-rg,nfs-zfs-rs,nfs_probe]: [ID 903370 daemon.debug] Command share -F nfs -o sec=sys,rw /CLSzpool/export > /var/run/.hanfs/.run.out.2026 2>&1 failed to run: share -F nfs -o sec=sys,rw /CLSzpool/export > /var/run/.hanfs/.run.out.2026 2>&1 exited with status 0.
    Nov 24 09:11:50 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Function: validate_common - Begin
    Nov 24 09:11:50 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Method: control_samba - Begin
    Nov 24 09:11:50 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Validate - samba bin directory /usr/sfw/bin exists
    Nov 24 09:11:50 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Validate - samba sbin directory /usr/sfw/sbin exists
    Nov 24 09:11:50 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Validate - samba configuration directory /global/ufs/fs1/samba/test10 exists
    Nov 24 09:11:50 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Validate - smbconf /global/ufs/fs1/samba/test10/lib/smb.conf exists
    Nov 24 09:11:50 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Validate - nmblookup /usr/sfw/bin/nmblookup exists and is executable
    Nov 24 09:11:51 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Validate - Samba version <3.0.28> is being used
    Nov 24 09:11:51 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Function: validate_common - End
    Nov 24 09:11:51 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Function: validate_samba - Begin
    Nov 24 09:11:51 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Validate - Samba log directory /global/ufs/fs1/samba/test10/logs exists
    Nov 24 09:11:51 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Validate - smbd /usr/sfw/sbin/smbd exists and is executable
    Nov 24 09:11:51 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Validate - nmbd /usr/sfw/sbin/nmbd exists and is executable
    Nov 24 09:11:51 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Validate - Faultmonitor resource scmondir exists
    Nov 24 09:11:51 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Validate - smbclient /usr/sfw/bin/smbclient exists and is executable
    Nov 24 09:11:51 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 327132 daemon.error] Validate - Couldn't retrieve faultmonitor-user <test10/fmuser> from the nameservice
    Nov 24 09:11:51 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 287111 daemon.error] Validate - RUN_NMBD= is invalid - specify YES or NO
    Nov 24 09:11:51 test2 SC[SUNWscsmb.samba.start]:nfs-rg:samba-rs: [ID 702911 daemon.debug] Function: validate_samba - End
    Thanks again,
    Jon

  • Adding Samba to an existing NFS-Cluster

    Hello!
    Currently we are running a cluster consisting of two nodes which provide two NFS servers (one for each node). Both nodes are connected to a 6140 storage array. If one of the nodes fails, the zpool is mounted on the other node, and all works fine. The NFS servers are started in the global zone. Here is the config of the current cluster:
    # clrs status
    === Cluster Resources ===
    Resource Name        Node Name            State     Status Message
    fileserv01            clnode01            Online    Online - LogicalHostname online.
                          clnode02            Offline   Offline
    fileserv01-hastp-rs   clnode01            Online    Online
                          clnode02            Offline   Offline
    fileserv01-nfs-rs     clnode01            Online    Online - Service is online.
                          clnode02            Offline   Offline
    fileserv02            clnode02            Online    Online - LogicalHostname online.
                          clnode01            Offline   Offline - LogicalHostname offline.
    fileserv02-hastp-rs   clnode02            Online    Online
                          clnode01            Offline   Offline
    fileserv02-nfs-rs     clnode02            Online    Online - Service is online.
                          clnode01            Offline   Offline - Completed successfully.
    Now we want to add two Samba servers which additionally share the same zpools. From what I read in the "Sun Cluster Data Service for Samba Guide for Solaris OS", this can only be done using zones, because I cannot run two Samba services on the same host.
    At this point I am stuck. We have to fail over both the NFS and Samba services together if a failure occurs (because they share the same data), so they have to be in the same resource group. But this will not work because the list of nodes for each service is different ("clnode01, clnode02" vs. "clnode01:zone1, clnode02:zone2").
    I can mount the zpool from the global zone using "mount -F lofs", but I think I need a real dependency for the storage.
    My question is: How can we accomplish this setup?

    Hi,
    You can run multiple Samba services on the same host or global zone in your case. For example, suppose you had smb1 and smb2, you could run smb1 within your fileserv01 RG and smb2 within your fileserv02 RG.
    The only restriction is if you also require winbind, as only one instance of winbind is possible per global or non-global zone. So if you deploy smb1 and smb2 as above and winbind is also required, you would also need a scalable RG to ensure that winbindd gets started on each node.
    Please see http://docs.sun.com/app/docs/doc/819-3063/gdewn?a=view, in particular "Restrictions for multiple Samba instances that require Winbind"; also note the "note" within that section, which indicates that "... you may also use global as the zone name ...".
    Alternatively, please post the restriction from the docs that you are referring to.
    Regards
    Neil

  • Recommendations for Mail/Calendar cluster?

    Hi all, at the Uni where I work, we currently have a 2-node (2 x V480) Sun Cluster running iPlanet Messaging Server 5.2. We are looking at upgrading the mail system to Sun Java Messaging Server 6.3, and possibly Calendar, later this year. What do people recommend we upgrade the hardware to? E.g. 2 x T2000 to run Mail and Calendar, or maybe 2 x T2000 for mail and a T1000 for Calendar?
    Any suggestions would be greatly appreciated.
    NB: This mail/calendar cluster may also serve as an NFS cluster too.
    Stewart

    Hi,
    > Hi all, at the Uni where I work, we currently have a 2-node (2 x V480) Sun Cluster running iPlanet Messaging Server 5.2. We are looking at upgrading the mail system to Sun Java Messaging Server 6.3, and possibly Calendar, later this year. What do people recommend we upgrade the hardware to? E.g. 2 x T2000 to run Mail and Calendar, or maybe 2 x T2000 for mail and a T1000 for Calendar?
    How many users are you looking to service?
    What kind of backend disk architecture are you going to use?
    Where is the directory service going?
    Are you planning on using delegated admin/schema 2 for provisioning?
    Is the cluster in an active/active or active/passive configuration?
    Do you have a load-balancer to balance traffic between the nodes?
    > Any suggestions would be greatly appreciated.
    A few suggestions to make:
    -> Try to establish a Frontend-Backend configuration and zone the applications, e.g.
    Frontend:
    -> Calendar Frontend
    -> UWC
    -> Messaging MTA & MMP
    Backend:
    -> Calendar database
    -> Messaging store
    Regards,
    Shane.

  • Quorum disk question

    What is the best practice for quorum disk assignment in a dual-node cluster?
    1. Is there any benefit to having a dedicated quorum disk, and if so, what size should it be?
    2. The manual says: "Quorum devices can contain user data". Does that mean they can contain the NFS shared data in the NFS cluster? Is it a problem if the quorum device in this case is under volume manager (SVM or VxVM) control?
    TIA

    Best practice is to use a disk that is actively used within the cluster as a quorum disk. This means that because data is frequently read from and written to the disk, any problems with the disk will be highlighted very quickly. That way a new QD can be nominated before the old disk fails and causes the entire cluster to fail if one node then goes down. (This would happen because the remaining node would not be able to gain majority).
    A QD can be any shared disk with data on it, under SVM or VxVM control or just on its own.
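    For completeness, in Sun Cluster 3.2 syntax nominating or retiring a quorum device is a single command per device; a minimal sketch, assuming a shared DID device d4 (hypothetical):
    # clquorum add d4
    # clquorum status
    # clquorum remove d4
    The first command nominates the shared disk as a quorum device, the second verifies vote counts and device health, and the third retires the device later once a replacement has been nominated.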

  • Moving Server Pool Storage/Image to another location

    I have a server pool that is using an NFS mount (at 192.168.0.5) as its 12G of cluster storage. This server pool has 1 Oracle VM server with 2 VM guests that I simply cannot lose.
    I need to move that NFS mount to another server at 192.168.0.31.
    I know I can simply create another server pool, but once I do that, I believe I need another Oracle VM server (which I do not have).
    Or can you move the entire Oracle VM server along with the VMs to the other pool seamlessly?
    What is the best way to go here?

    This is my environment right now.
    One physical host with a 3 TB internal SATA drive, with Oracle VM Server 3.0.1 installed.
    Another host with RHEL 5.5 on it, where I installed Oracle VM Manager 3.0; on this RHEL 5.5 host I also created an NFS export (192.168.0.5, with about 30G of possible space).
    Created a Server Pool with cluster storage of the NFS mount above.
    Discovered the OVM Server, and added it to the Server pool.
    Created a repository on the 3 TB drive in that server that was detected by OVM manager. Created a few virtual disks.
    Created a few VNICs.
    Created 2 virtual machines using 3 different virtual disks that are attached to the one VM server. All working as expected so far.
    Now I want to move the NFS cluster storage to a new host. (192.168.0.31 - that does not get rebooted as much)
    Created a new server pool using the new cluster storage.
    I have taken down both VM guests, and migrated them to the "unassigned Virtual Machines" folder.
    I un-presented the large 3 TB repo from the Oracle VM server.
    I then tried to remove the VM server from the server pool, and received this warning:
    Job Construction Phase
    begin()
    com.oracle.ovm.mgr.api.exception.RuleException: OVMRU_000036E Cannot remove server: ovs1.advantagedata.com, from pool: ADI. There are still OCFS2 file systems: [fs_OVM_repo], in the pool
    Tue Sep 27 10:22:48 EDT 2011
    at com.oracle.ovm.mgr.rules.modules.api.virtual.ClusterRules.removeServerPre(ClusterRules.java:144)
    at com.oracle.ovm.mgr.api.job.JobEngine.invokeMethod(JobEngine.java:634)
    at com.oracle.ovm.mgr.api.job.JobEngine.invokeMethod(JobEngine.java:598)
    at com.oracle.ovm.mgr.rules.RulesEngine.runRules(RulesEngine.java:184)
    at com.oracle.ovm.mgr.rules.RulesEngine.preProcess(RulesEngine.java:136)
    at com.oracle.ovm.mgr.model.ModelEngine.preValidate(ModelEngine.java:513)
    at com.oracle.ovm.mgr.model.ModelEngine.access$200(ModelEngine.java:59)
    at com.oracle.ovm.mgr.model.ModelEngine$3.notify(ModelEngine.java:321)
    at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:207)
    at com.oracle.odof.core.storage.Transaction.invokeMethod(Transaction.java:764)
    at com.oracle.odof.command.InvokeMethodCommand.process(InvokeMethodCommand.java:100)
    at com.oracle.odof.core.BasicWork.processCommand(BasicWork.java:81)
    at com.oracle.odof.core.storage.Transaction.processCommand(Transaction.java:467)
    at com.oracle.odof.core.TransactionManager.processTransactionWork(TransactionManager.java:650)
    at com.oracle.odof.core.TransactionManager.processCommand(TransactionManager.java:755)
    at com.oracle.odof.core.WorkflowManager.processCommand(WorkflowManager.java:395)
    at com.oracle.odof.core.WorkflowManager.processWork(WorkflowManager.java:453)
    at com.oracle.odof.io.AbstractClient.run(AbstractClient.java:42)
    at java.lang.Thread.run(Thread.java:662)
    Job Aborted from server, cleaning up client.
    Is this not possible?

  • Testing HA-NFS in a two-node cluster (cannot statvfs /global/nfs: I/O error)

    Hi all,
    I am testing HA-NFS (failover) on a two-node cluster. I have a Sun Fire V240, an E250 and Netra st A1000/D1000 storage. I have installed Solaris 10 update 6 and the cluster packages on both nodes.
    I have created one global file system (/dev/did/dsk/d4s7) and mounted it as /global/nfs. This file system is accessible from both nodes. I have configured HA-NFS according to the document Sun Cluster Data Service for NFS Guide for Solaris OS, using the command line interface.
    The logical host pings from the NFS client, and I have mounted the share there using the logical hostname. For testing purposes I took one machine down. After this step the file system gives an I/O error (on both server and client), and when I run the df command it shows:
    df: cannot statvfs /global/nfs: I/O error.
    I have configured with following commands.
    #clnode status
    # mkdir -p /global/nfs
    # clresourcegroup create -n test1,test2 -p Pathprefix=/global/nfs rg-nfs
    I have added the logical hostname and IP address to /etc/hosts.
    I have commented out the hosts and rpc lines in /etc/nsswitch.conf.
    # clreslogicalhostname create -g rg-nfs -h ha-host-1 -N sc_ipmp0@test1,sc_ipmp0@test2 ha-host-1
    # mkdir /global/nfs/SUNW.nfs
    I created one file called dfstab.user-home in /global/nfs/SUNW.nfs, and that file contains the following line:
    share -F nfs -o rw /global/nfs
    # clresourcetype register SUNW.nfs
    # clresource create -g rg-nfs -t SUNW.nfs ; user-home
    # clresourcegroup online -M rg-nfs
    Where did I go wrong? Can anyone provide a document on this?
    Any help?
    Thanks in advance.

    test1#  tail -20 /var/adm/messages
    Feb 28 22:28:54 testlab5 Cluster.SMF.DR: [ID 344672 daemon.error] Unable to open door descriptor /var/run/rgmd_receptionist_door
    Feb 28 22:28:54 testlab5 Cluster.SMF.DR: [ID 801855 daemon.error]
    Feb 28 22:28:54 testlab5 Error in scha_cluster_get
    Feb 28 22:28:54 testlab5 Cluster.scdpmd: [ID 489913 daemon.notice] The state of the path to device: /dev/did/rdsk/d5s0 has changed to OK
    Feb 28 22:28:54 testlab5 Cluster.scdpmd: [ID 489913 daemon.notice] The state of the path to device: /dev/did/rdsk/d6s0 has changed to OK
    Feb 28 22:28:58 testlab5 svc.startd[8]: [ID 652011 daemon.warning] svc:/system/cluster/scsymon-srv:default: Method "/usr/cluster/lib/svc/method/svc_scsymon_srv start" failed with exit status 96.
    Feb 28 22:28:58 testlab5 svc.startd[8]: [ID 748625 daemon.error] system/cluster/scsymon-srv:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 537175 daemon.notice] CMM: Node e250 (nodeid: 1, incarnation #: 1235752006) has become reachable.
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 525628 daemon.notice] CMM: Cluster has reached quorum.
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 377347 daemon.notice] CMM: Node e250 (nodeid = 1) is up; new incarnation number = 1235752006.
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 377347 daemon.notice] CMM: Node testlab5 (nodeid = 2) is up; new incarnation number = 1235840337.
    Feb 28 22:37:15 testlab5 Cluster.CCR: [ID 499775 daemon.notice] resource group rg-nfs added.
    Feb 28 22:39:05 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<5>:cmd=<null>:tag=<>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Feb 28 22:39:05 testlab5 Cluster.CCR: [ID 491081 daemon.notice] resource ha-host-1 removed.
    Feb 28 22:39:17 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<5>:cmd=<null>:tag=<>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Feb 28 22:39:17 testlab5 Cluster.CCR: [ID 254131 daemon.notice] resource group nfs-rg removed.
    Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_validate> for resource <ha-host-1>, resource group <rg-nfs>, node <testlab5>, timeout <300> seconds
    Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hafoip/hafoip_validate>:tag=<rg-nfs.ha-host-1.2>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_validate> completed successfully for resource <ha-host-1>, resource group <rg-nfs>, node <testlab5>, time used: 0% of timeout <300 seconds>
    Feb 28 22:39:30 testlab5 Cluster.CCR: [ID 973933 daemon.notice] resource ha-host-1 added.

  • Node Manager in a cluster on NFS

    Hi,
    I will shortly set up a cluster. The two unix servers are on an NFS file system. So, both sides of the cluster will have access to the WebLogic installation and the domain.
    So I guess there will be two instances of Node Manager, one on each unix server. And I guess each Node Manager corresponds to one 'Machine' configured in the console. For example, the Node Manager on 'unix server 1' should be 'machine 1'.
    Does anyone know how each Node Manager 'knows' which machine it is? That is, when I start Node Manager on server 1, how will it know to start the WebLogic instances configured to run on 'machine 1'? I do not see a start parameter saying something like...
    'startNodeManager -machine Machine1'
    Any clues appreciated!
    thanks,
    David.

    The nodemanager runs locally on a specific machine. You couple a nodemanager to a certain machine, not a machine to a nodemanager.
    The nodemanager has a listen address, which is the address of the machine.
    You also configure certain WebLogic servers to belong to a machine. Through this configuration the nodemanager knows which servers to monitor, i.e. the servers belonging to that machine.
    I also see that you are using NFS. If you are planning on migration, for example JTA, you should be a little wary: distributed file systems such as NFS typically do not provide the necessary semantics to guarantee the integrity and content of transaction logs. NFS historically has provided no support for synchronous writes, and has also suffered from file locking issues. Some NFS implementations have matured in recent years; you should check that yours guarantees that a write operation will not return until the data is safely stored on disk.
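    As a small illustration of that coupling, each unix server's Node Manager is pinned to its own listen address in nodemanager.properties, and the Machine definition in the console points at the same address and port. A sketch with hypothetical hostnames (the default location of nodemanager.properties varies between WebLogic releases):
    # nodemanager.properties on unix server 1
    ListenAddress=unixserver1.example.com
    ListenPort=5556
    In the console, Machine1 -> Node Manager then uses that same listen address and port, and the WebLogic servers targeted to Machine1 are the ones this Node Manager starts and monitors.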

  • Failing to create HA NFS storage on a shared 3310 HW RAID, Cluster 3.2

    Hi,
    I'm working on testing clustering on a couple of V240s, running identical Solaris 10 10/08 and Sun Cluster 3.2. In trying things I may have messed up the cluster, so I may want to back out the cluster and start over. Is that possible, or do I need to install Solaris fresh?
    But first, the problem. I have the array connected to both machines and working. I mount 1 LUN on /global/nfs using the device /dev/did/dsk/d4s0. Then I ran these commands:
    # clrt register SUNW.nfs
    # clrt register SUNW.HAStoragePlus
    # clrt list -v
    Resource Type Node List
    SUNW.LogicalHostname:2 <All>
    SUNW.SharedAddress:2 <All>
    SUNW.nfs:3.2 <All>
    SUNW.HAStoragePlus:6 <All>
    # clrg create -n stnv240a,stnv240b -p PathPrefix=/global/nfs/admin nfs-rg
    I enabled them just now so:
    # clrg status
    === Cluster Resource Groups ===
    Group Name Node Name Suspended Status
    nfs-rg stnv240a No Online
    stnv240b No Offline
    Then:
    # clrslh create -g nfs-rg cluster
    # clrslh status
    === Cluster Resources ===
    Resource Name Node Name State Status Message
    cluster stnv240a Online Online - LogicalHostname online.
    stnv240b Offline Offline
    I'm guessing that 'b' is offline because it's the backup.
    Finally, I get:
    # clrs create -t HAStoragePlus -g nfs-rg -p AffinityOn=true -p FilesystemMountPoints=/global/nfs nfs-stor
    clrs: stnv240b - Invalid global device path /dev/did/dsk/d4s0 detected.
    clrs: (C189917) VALIDATE on resource nfs-stor, resource group nfs-rg, exited with non-zero exit status.
    clrs: (C720144) Validation of resource nfs-stor in resource group nfs-rg on node stnv240b failed.
    clrs: (C891200) Failed to create resource "nfs-stor".
    On stnv240a:
    # df -h /global/nfs
    Filesystem size used avail capacity Mounted on
    /dev/did/dsk/d4s0 49G 20G 29G 41% /global/nfs
    and on stnv240b:
    # df -h /global/nfs
    Filesystem size used avail capacity Mounted on
    /dev/did/dsk/d4s0 49G 20G 29G 41% /global/nfs
    Any help? Like I said, this is a test setup. I've started over once. So I can start over if I did something irreversible.

    I still have the issue. I reinstalled from scratch and installed the cluster. Then I did the following:
    $ vi /etc/default/nfs
    GRACE_PERIOD=10
    $ ls /global//nfs
    $ mount /global/nfs
    $ df -h
    Filesystem size used avail capacity Mounted on
    /dev/global/dsk/d4s0 49G 20G 29G 41% /global/nfs
    $ clrt register SUNW.nfs
    $ clrt register SUNW.HAStoragePlus
    $ clrt list -v
    Resource Type Node List
    SUNW.LogicalHostname:2 <All>
    SUNW.SharedAddress:2 <All>
    SUNW.nfs:3.2 <All>
    SUNW.HAStoragePlus:6 <All>
    $ clrg create -n stnv240a,stnv240b -p PathPrefix=/global/nfs/admin nfs-rg
    $ clrslh create -g nfs-rg patience
    clrslh: IP Address 204.155.141.146 is already plumbed at host: stnv240b
    $ grep cluster /etc/hosts
    204.155.141.140 stnv240a stnv240a.mns.qintra.com # global - cluster
    204.155.141.141 cluster cluster.mns.qintra.com # cluster virtual address
    204.155.141.146 stnv240b stnv240b.mns.qintra.com patience patience.mns.qintra.com # global v240 - cluster test
    $ clrslh create -g nfs-rg cluster
    $ clrs create -t HAStoragePlus -g nfs-rg -p AffinityOn=true -p FilesystemMountPoints=/global/nfs nfs-stor
    clrs: stnv240b - Failed to analyze the device special file associated with file system mount point /global/nfs: No such file or directory.
    clrs: (C189917) VALIDATE on resource nfs-stor, resource group nfs-rg, exited with non-zero exit status.
    clrs: (C720144) Validation of resource nfs-stor in resource group nfs-rg on node stnv240b failed.
    clrs: (C891200) Failed to create resource "nfs-stor".
    Now, on the second machine (stnv240b), /dev/global does not exist, but the file system mounts anyway. I guess that's cluster magic?
    $ cat /etc/vfstab
    /dev/global/dsk/d4s0 /dev/global/dsk/d4s0 /global/nfs ufs 1 yes global
    $ df -h /global/nfs
    Filesystem size used avail capacity Mounted on
    /dev/global/dsk/d4s0 49G 20G 29G 41% /global/nfs
    $ ls -l /dev/global
    /dev/global: No such file or directory
    I followed the other thread and ran devfsadm and scgdevs.
    One other thing I notice: both nodes mount my global devices file system on node@1:
    /dev/md/dsk/d6 723M 3.5M 662M 1% /global/.devices/node@1
    /dev/md/dsk/d6 723M 3.5M 662M 1% /global/.devices/node@1
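    For anyone else hitting this, the device-rescan steps referred to above amount to roughly the following on the node where /dev/global is missing (a sketch only; devfsadm and scgdevs are the commands already mentioned in this thread, and cldevice is the Sun Cluster 3.2 command for listing DID devices):
    # devfsadm
    # scgdevs
    # cldevice list -v
    # ls -l /dev/global /dev/did/dsk
    devfsadm rebuilds the /devices and /dev entries, scgdevs populates the global devices namespace, and cldevice list -v lets you confirm that d4 maps to the same shared LUN on both nodes.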

  • Parameters of NFS in Solaris 10 and Oracle Linux 6 with ZFS Storage 7420 in cluster without database

    Hello,
    I have a ZFS 7420 in a cluster, plus Solaris 10 and Oracle Linux 6 hosts without a database, and I need to mount NFS shares on these OSes. I do not know which parameters are best for this.
    Which are the best parameters to mount an NFS share on Solaris 10 or Oracle Linux 6?
    Thanks
    Best regards.

    Hi Pascal,
    My question is because when we mount NFS shares on some servers, for example Exadata Database Machine or SuperCluster, for best performance we need to mount these shares with specific parameters, for example:
    Exadata
    192.168.36.200:/export/dbname/backup1 /zfssa/dbname/backup1 nfs rw,bg,hard,nointr,rsize=131072,wsize=1048576,tcp,nfsvers=3,timeo=600 0 0
    Super Cluster
    sscsn1-stor:/export/ssc-shares/share1      -       /export/share1     nfs     -       yes     rw,bg,hard,nointr,rsize=131072,wsize=131072,proto=tcp,vers=3
    Now, my network is 10GbE.
    What happens with normal servers running only the OS (Solaris and Linux)?
    Which parameters do I need to use for best performance, or are specific parameters not necessary?
    Thanks.
    Best regards.
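    For what it is worth, applying the same option style to a plain Solaris 10 or Oracle Linux 6 client would look roughly like the lines below. This is only a sketch: the hostname and paths are hypothetical, and the rsize/wsize/timeo values are copied from the Exadata/SuperCluster examples above rather than being a tuned recommendation for a 10GbE network.
    Solaris 10 /etc/vfstab entry:
    zfs7420-head:/export/share1  -  /export/share1  nfs  -  yes  rw,bg,hard,nointr,rsize=131072,wsize=131072,proto=tcp,vers=3
    Oracle Linux 6 /etc/fstab entry:
    zfs7420-head:/export/share1  /zfssa/share1  nfs  rw,bg,hard,nointr,rsize=131072,wsize=131072,tcp,nfsvers=3,timeo=600  0 0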

  • Zone cluster and NFS

    Hiya folks.
    The setup is 2 global nodes running 3.3 and a zone cluster set up across them, with an NFS share from a NetApp filer that can be mounted on both global zones.
    I'm not aware of a way to present this NFS share to the zone cluster.
    This is a failover cluster setup and there won't be any parallel I/O from the other cluster node.
    I've heard the path I should follow is a loopback file system. I seek your advice. Thanks in advance.
    Cheers
    osp

    Hi,
    I had been confused by the docs and needed confirmation before replying.
    You have to issue the clnas command from the global zone but can use the -Z <zoneclustername> option to work in the zonecluster itself. E.g.
    # clnas add -t netapp -p userid=nasadmin -f <passwd-file> -Z <zc> <appliance>
    # clnas add-dir -Z <zc> -d <dir> <appliance>
    Your proposal (it must be clnas, not clns):
    clns -t netapp -u nasadmin -f /home/nasadmin/passwd.txt -Z zc1 netapp_nfs_vfiler1
    clns add-dir -d /nfs_share1 netapp_nfs_vfiler1
    is not quite correct.
    > A few concerns here: should the -u user and the password be vfiler users, or are they unix users?
    This is the vfiler user!
    > Where does the share get presented to on the zone cluster?
    Good question. Just give it a try.
    Let us know whether that worked.
    Hartmut

  • Any experience with NFS failover in Sun Cluster?

    Hello,
    I am planning to install dual-node Sun Cluster for NFS failover configuration. The SAN storage is shared between nodes via Fibre Channel. The NFS shares will be manually assigned to nodes and should fail over / takeback between nodes.
    Is this setup well tested? How do the NFS clients survive the failover (without "stale NFS handle" errors)? Does it work smoothly for Solaris, Linux and FreeBSD clients?
    Please share your experience.
    TIA,
    -- Leon

    My 3-year-old Linux installation on my laptop, which is my NFS client most of the time, uses UDP by default (kernel 2.4.19).
    Anyway the key is that the NFS client, or better, the RPC implementation on the client is intelligent enough to detect a failed TCP connection and tries to reestablish it with the same IP address. Now once the cluster has failed over the logical IP the reconnect will be successful and NFS traffic continues as if nothing bad had happened. This only(!) works if the NFS mount was done with the "hard" option. Only this makes the client retry the connection.
    Other "dumb" TCP based applications might not retry and thus would need manual intervention.
    Regarding UFS or PxFS, it does not make a difference. NFS does not know the difference. It shares a mount point.
    Hope that helped.
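    For reference, a failover-friendly client mount along the lines described above might look like this (a sketch only; the logical hostname and paths are hypothetical):
    Solaris client:
    # mount -F nfs -o hard,intr,proto=tcp ha-logical-host:/global/nfs /mnt/nfs
    Linux client /etc/fstab entry:
    ha-logical-host:/global/nfs  /mnt/nfs  nfs  hard,intr,tcp  0 0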

  • Global NFS Mount & Cluster

    Dear All
    My development server is in the LAN environment, and the other systems, QAS and PRD, are in the SZ2. For transport management configuration we need to do global NFS mounting, but as per my company policy there is a security issue.
    The second issue is that if we mount /usr/sap/trans as a global mount and it is also part of the NFS share, then cluster startup will fail. Please suggest whether the above directory should be part of the cluster or not.
    Regards
    Vimal Pathak

    Tiffany wrote:
    > We need to store information (objects) that are global to a cluster.
    The only way you can do this is to store the information in a database.
    > It's my understanding that anything stored in the servlet context is visible to all servers,
    No. This is not true.
    > but it resides on a network drive. Wouldn't each read of this servlet context info involve a directory read hit with all its implied performance degradation?
    How about WebLogic Workspaces? Is this information replicated across clusters? Does it live on a network drive as a file as well?
    > Hoping someone can help us out here.
    Workspaces are not replicated.
    > Thanks for any help,
    > Tiffany
    Cheers
    - Prasad

  • Local NFS / LDAP on cluster nodes

    Hi,
    I have a 2-node cluster (3.2 1/09) on Solaris 10 U8, providing NFS (/home) and LDAP for clients. I would like to configure LDAP and NFS clients on each cluster node, so they share user information with the rest of the machines.
    I assume the right way to do this is to configure the cluster nodes the same as other clients, using the HA Logical Hostnames for the LDAP and NFS server; this way, there's always a working LDAP and NFS server for each node. However, what happens if both nodes reboot at once (for example, power failure)? As the first node boots, there is no working LDAP or NFS server, because it hasn't been started yet. Will this cause the boot to fail and require manual intervention, or will the cluster boot without NFS and LDAP clients enabled, allowing me to fix it later?

    Thanks. In that case, is it safe to configure the NFS-exported filesystem as a global mount, and symlink e.g. "/home" -> "/global/home", so home directories are accessible via the normal path on both nodes? (I understand global filesystems have worse performance, but this would just be for administrators logging in with their LDAP accounts.)
    For LDAP, my concern is that if svc:/network/ldap/client:default fails during startup (because no LDAP server is running yet), it might prevent the cluster services from starting, even though all names required by cluster are available from /etc.
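    If you do go the global-mount plus symlink route, a minimal sketch (the device path is hypothetical and follows the vfstab format used elsewhere in this thread; note that on Solaris /home is normally an autofs mount point, so the auto_home entry in /etc/auto_master would have to be removed or commented out first):
    /etc/vfstab entry on both nodes (note the "global" mount option):
    /dev/global/dsk/d5s0  /dev/global/rdsk/d5s0  /global/home  ufs  2  yes  logging,global
    Then on each node:
    # ln -s /global/home /home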
