IPMP Failure

Hello buddies,
We are facing a severe issue with an IPMP configuration that had been working fine until the last patch installation (the Feb patch). We have the virtual IP 192.168.1.1 and two physical IPs, 192.168.1.2 and 192.168.1.3, bound to ce0 and ce1 respectively. Now to the issue: all interfaces in the IPMP group go down (about three times a day), with in.mpathd complaining that the default router is not pingable by the server through either physical link. I am quite sure the switch that both links connect to is working fine, without any outage. The following are the logs:
Mar 15 19:54:13 serv1 in.mpathd[2095]: [ID 594170 daemon.error] NIC failure detected on ce1 of group ipmp-pub
Mar 15 19:54:13 serv1 in.mpathd[2095]: [ID 594170 daemon.error] NIC failure detected on ce1 of group ipmp-pub
Mar 15 19:54:13 serv1 in.mpathd[2095]: [ID 594170 daemon.error] NIC failure detected on ce1 of group ipmp-pub
Mar 15 19:54:13 serv1 in.mpathd[2095]: [ID 832587 daemon.error] Successfully failed over from NIC ce1 to NIC ce0
Mar 15 19:54:13 serv1 in.mpathd[2095]: [ID 832587 daemon.error] Successfully failed over from NIC ce1 to NIC ce0
Mar 15 19:54:13 serv1 in.mpathd[2095]: [ID 832587 daemon.error] Successfully failed over from NIC ce1 to NIC ce0
Mar 15 19:54:39 serv1 in.mpathd[2095]: [ID 168056 daemon.error] All Interfaces in group ipmp-pub have failed
Mar 15 19:54:39 serv1 in.mpathd[2095]: [ID 168056 daemon.error] All Interfaces in group ipmp-pub have failed
Mar 15 19:54:39 serv1 in.mpathd[2095]: [ID 168056 daemon.error] All Interfaces in group ipmp-pub have failed
Mar 15 20:33:32 serv1 in.mpathd[2095]: [ID 620804 daemon.error] Successfully failed back to NIC ce1
Mar 15 20:33:32 serv1 in.mpathd[2095]: [ID 620804 daemon.error] Successfully failed back to NIC ce1
Mar 15 20:33:32 serv1 in.mpathd[2095]: [ID 620804 daemon.error] Successfully failed back to NIC ce1
Mar 15 20:33:32 serv1 in.mpathd[2095]: [ID 299542 daemon.error] NIC repair detected on ce1 of group ipmp-pub
Mar 15 20:33:32 serv1 in.mpathd[2095]: [ID 299542 daemon.error] NIC repair detected on ce1 of group ipmp-pub
Mar 15 20:33:32 serv1 in.mpathd[2095]: [ID 299542 daemon.error] NIC repair detected on ce1 of group ipmp-pub
Mar 15 20:33:32 serv1 in.mpathd[2095]: [ID 237757 daemon.error] At least 1 interface (ce1) of group ipmp-pub has repaired
Any help would be greatly appreciated.
Thanks,
Muhammed Afsal K.S
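
Since in.mpathd here depends entirely on the default router answering its probes, one avenue worth testing is to give it additional probe targets via static host routes, so a flaky or ICMP-rate-limited router does not take the whole group down. A minimal sketch; the two target addresses are placeholders for stable, always-on hosts on your 192.168.1.0 subnet:

route add -host 192.168.1.253 192.168.1.253 -static   # explicit probe target 1 (placeholder)
route add -host 192.168.1.254 192.168.1.254 -static   # explicit probe target 2 (placeholder)

in.mpathd prefers static host routes on the local subnet over the default router when picking probe targets, so this also helps tell whether the router itself is the problem.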


Similar Messages

  • IPMP failures on bge Interface

    We've been testing IPMP on Solaris SPARC hosts that also have the Apani IPSec Agent installed. It works fine on older hosts that have 'qfe' and 'le' interfaces, but our V210s and T1000s with 'bge' interfaces have a problem. If we configure an IPMP group to use, say, bge0 and bge1 (with bge0 as the primary interface), it works fine. Disconnecting bge0 causes a failover to bge1, also fine. Disconnecting bge1 causes the following errors:
    Nov 2 10:32:29 cs22 in.mpathd[146]: NIC failure detected on bge1 of group test
    Nov 2 10:32:29 cs22 in.mpathd[146]: Successfully failed over from NIC bge1 to NIC bge0
    Nov 2 10:32:37 cs2 in.mpathd[146]: All Interfaces in group test have failed
    All interfaces fail, even though bge0 is still connected and was active before disconnecting bge1. The system recovers once bge0 is reconnected. The two interfaces are physically connected to the same switch, and the hostname.bgeX files are:
    -------- hostname.bge0
    cs22 netmask + broadcast + group test up \
    addif cs21 deprecated -failover netmask + broadcast + up
    -------- hostname.bge1
    sp12 netmask + broadcast + group test up \
    addif sp16 deprecated -failover netmask + broadcast + up
    Any help would be appreciated, thanks in advance.
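
    One way to see whether probes are actually leaving and being answered on each interface is to watch the ICMP probe traffic with snoop, once while the group is healthy and again with bge1 unplugged (a quick sketch):
    snoop -r -d bge0 icmp
    snoop -r -d bge1 icmp
    If bge0 shows requests going out but no replies once bge1 is unplugged, the problem is probe reachability rather than the bge0 link itself.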

    Hello again,
    When gathering data for the previous reply, I also noticed that the default route had not been set. We usually do specify that, so I added it to the configuration. The host had already found the correct router (63.192.77.9), and specifying it did not change the problem symptoms anyway. Here's the other requested info:
    -> netstat -rn
    Routing Table: IPv4
      Destination           Gateway           Flags  Ref   Use   Interface
    63.192.77.0          63.192.77.12         U         1      5  bge1
    63.192.77.0          63.192.77.22         U         1      1  bge0
    63.192.77.0          63.192.77.22         U         1      0  bge0:1
    63.192.77.0          63.192.77.12         U         1      0  bge1:1
    224.0.0.0            63.192.77.22         U         1      0  bge0
    default              63.192.77.9          UG        1      0 
    127.0.0.1            127.0.0.1            UH        7     93  lo0
    -> routeadm
                  Configuration   Current              Current
                         Option   Configuration        System State
                IPv4 forwarding   disabled             disabled
                   IPv4 routing   default (disabled)   disabled
                IPv6 forwarding   disabled             disabled
                   IPv6 routing   disabled             disabled
            IPv4 routing daemon   "/usr/sbin/in.routed"
       IPv4 routing daemon args   ""
       IPv4 routing daemon stop   "kill -TERM `cat /var/tmp/in.routed.pid`"
            IPv6 routing daemon   "/usr/lib/inet/in.ripngd"
       IPv6 routing daemon args   "-s"
       IPv6 routing daemon stop   "kill -TERM `cat /var/tmp/in.ripngd.pid`"
    -> arp -an
    Net to Media Table: IPv4
    Device   IP Address               Mask      Flags   Phys Addr
    bge1   63.192.77.1          255.255.255.255       00:03:ba:c0:77:75
    bge0   63.192.77.9          255.255.255.255       00:16:46:f1:b5:c2
    bge1   63.192.77.9          255.255.255.255       00:16:46:f1:b5:c2
    bge1   63.192.77.186        255.255.255.255       00:c0:4f:60:6a:ab
    bge0   63.192.77.186        255.255.255.255       00:c0:4f:60:6a:ab
    bge1   63.192.77.191        255.255.255.255       00:0c:f1:bf:1d:01
    bge0   63.192.77.191        255.255.255.255       00:0c:f1:bf:1d:01
    bge1   63.192.77.169        255.255.255.255       00:0c:f1:bf:1c:92
    bge0   63.192.77.169        255.255.255.255       00:0c:f1:bf:1c:92
    bge1   63.192.77.175        255.255.255.255       00:c0:4f:60:68:64
    bge0   63.192.77.175        255.255.255.255       00:c0:4f:60:68:64
    bge1   63.192.77.144        255.255.255.255       00:c0:4f:60:68:94
    bge0   63.192.77.144        255.255.255.255       00:c0:4f:60:68:94
    bge1   63.192.77.150        255.255.255.255       00:c0:4f:60:6a:70
    bge0   63.192.77.150        255.255.255.255       00:c0:4f:60:6a:70
    bge0   63.192.77.130        255.255.255.255       00:0c:f1:bf:1d:1f
    bge1   63.192.77.130        255.255.255.255       00:0c:f1:bf:1d:1f
    bge1   63.192.77.128        255.255.255.255       00:0c:f1:bf:1c:65
    bge0   63.192.77.128        255.255.255.255       00:0c:f1:bf:1c:65
    bge1   63.192.77.242        255.255.255.255       00:0d:56:0b:eb:2a
    bge0   63.192.77.242        255.255.255.255       00:0d:56:0b:eb:2a
    bge1   63.192.77.243        255.255.255.255       00:0f:1f:91:c1:9b
    bge0   63.192.77.243        255.255.255.255       00:0f:1f:91:c1:9b
    bge1   63.192.77.240        255.255.255.255       00:13:72:17:cb:13
    bge0   63.192.77.240        255.255.255.255       00:13:72:17:cb:13
    bge1   63.192.77.247        255.255.255.255       00:c0:4f:60:6a:e6
    bge0   63.192.77.247        255.255.255.255       00:c0:4f:60:6a:e6
    bge1   63.192.77.224        255.255.255.255       00:09:6b:2e:61:dd
    bge0   63.192.77.224        255.255.255.255       00:09:6b:2e:61:dd
    bge1   63.192.77.225        255.255.255.255       00:11:11:c4:9c:eb
    bge0   63.192.77.225        255.255.255.255       00:11:11:c4:9c:eb
    bge1   63.192.77.236        255.255.255.255       00:03:ba:eb:17:6d
    bge0   63.192.77.236        255.255.255.255       00:03:ba:eb:17:6d
    bge1   63.192.77.210        255.255.255.255       00:11:11:b1:2b:6e
    bge0   63.192.77.210        255.255.255.255       00:11:11:b1:2b:6e
    bge1   63.192.77.222        255.255.255.255       00:30:6e:08:ed:3a
    bge0   63.192.77.222        255.255.255.255       00:30:6e:08:ed:3a
    bge1   63.192.77.193        255.255.255.255       00:13:72:23:32:aa
    bge0   63.192.77.193        255.255.255.255       00:13:72:23:32:aa
    bge1   63.192.77.207        255.255.255.255       00:0c:f1:b6:26:aa
    bge0   63.192.77.207        255.255.255.255       00:0c:f1:b6:26:aa
    bge1   63.192.77.204        255.255.255.255       00:c0:4f:60:68:5b
    bge0   63.192.77.204        255.255.255.255       00:c0:4f:60:68:5b
    bge1   63.192.77.48         255.255.255.255       00:0a:95:99:e4:40
    bge0   63.192.77.48         255.255.255.255       00:0a:95:99:e4:40
    bge0   63.192.77.49         255.255.255.255       00:03:93:90:52:f6
    bge1   63.192.77.61         255.255.255.255       00:c0:4f:60:6a:75
    bge0   63.192.77.61         255.255.255.255       00:c0:4f:60:6a:75
    bge1   63.192.77.35         255.255.255.255       00:30:6e:49:41:50
    bge0   63.192.77.35         255.255.255.255       00:30:6e:49:41:50
    bge1   63.192.77.36         255.255.255.255       00:16:35:3e:7d:0a
    bge0   63.192.77.36         255.255.255.255       00:16:35:3e:7d:0a
    bge0   63.192.77.42         255.255.255.255       00:11:11:c4:9d:05
    bge1   63.192.77.42         255.255.255.255       00:11:11:c4:9d:05
    bge1   63.192.77.40         255.255.255.255       00:0c:f1:bf:1f:8d
    bge0   63.192.77.40         255.255.255.255       00:0c:f1:bf:1f:8d
    bge1   63.192.77.41         255.255.255.255       00:0c:f1:bf:1d:10
    bge0   63.192.77.41         255.255.255.255       00:0c:f1:bf:1d:10
    bge0   63.192.77.19         255.255.255.255       08:00:20:f0:ea:e4
    bge1   63.192.77.19         255.255.255.255       08:00:20:f0:ea:e4
    bge1   63.192.77.16         255.255.255.255 SP    00:14:4f:2a:9b:83
    bge0   63.192.77.22         255.255.255.255 SP    00:14:4f:2a:9b:82
    bge0   63.192.77.23         255.255.255.255       00:09:6b:3e:2b:82
    bge1   63.192.77.23         255.255.255.255       00:09:6b:3e:2b:82
    bge0   63.192.77.21         255.255.255.255 SP    00:14:4f:2a:9b:82
    bge1   63.192.77.29         255.255.255.255       00:09:6b:2e:46:51
    bge0   63.192.77.29         255.255.255.255       00:09:6b:2e:46:51
    bge0   63.192.77.1          255.255.255.255       00:03:ba:c0:77:75
    bge1   63.192.77.12         255.255.255.255 SP    00:14:4f:2a:9b:83
    bge0   63.192.77.115        255.255.255.255       00:0c:f1:bf:1c:e6
    bge1   63.192.77.115        255.255.255.255       00:0c:f1:bf:1c:e6
    bge1   63.192.77.122        255.255.255.255       00:10:83:f9:34:d4
    bge0   63.192.77.122        255.255.255.255       00:10:83:f9:34:d4
    bge1   63.192.77.125        255.255.255.255       00:0f:1f:91:bf:7d
    bge0   63.192.77.125        255.255.255.255       00:0f:1f:91:bf:7d
    bge1   63.192.77.99         255.255.255.255       00:0c:f1:bf:1a:52
    bge0   63.192.77.99         255.255.255.255       00:0c:f1:bf:1a:52
    bge1   63.192.77.100        255.255.255.255       00:0c:f1:b6:26:b4
    bge0   63.192.77.100        255.255.255.255       00:0c:f1:b6:26:b4
    bge1   63.192.77.101        255.255.255.255       00:0c:f1:bf:1c:fe
    bge0   63.192.77.101        255.255.255.255       00:0c:f1:bf:1c:fe
    bge1   63.192.77.107        255.255.255.255       00:0d:56:14:48:4d
    bge0   63.192.77.107        255.255.255.255       00:0d:56:14:48:4d
    bge1   63.192.77.110        255.255.255.255       00:c0:4f:60:6a:44
    bge0   63.192.77.110        255.255.255.255       00:c0:4f:60:6a:44
    bge1   63.192.77.108        255.255.255.255       00:14:bf:31:ec:e2
    bge0   63.192.77.108        255.255.255.255       00:14:bf:31:ec:e2
    bge0   63.192.77.80         255.255.255.255       00:16:cb:a6:5e:3d
    bge1   63.192.77.80         255.255.255.255       00:16:cb:a6:5e:3d
    bge1   63.192.77.92         255.255.255.255       00:40:63:d3:8c:46
    bge0   63.192.77.92         255.255.255.255       00:40:63:d3:8c:46
    bge1   63.192.77.68         255.255.255.255       00:0c:f1:b6:27:10
    bge0   63.192.77.68         255.255.255.255       00:0c:f1:b6:27:10
    bge1   63.192.77.69         255.255.255.255       00:13:72:17:ca:4a
    bge0   63.192.77.69         255.255.255.255       00:13:72:17:ca:4a
    bge1   63.192.77.73         255.255.255.255       00:03:93:d1:db:cc
    bge0   63.192.77.73         255.255.255.255       00:03:93:d1:db:cc
    bge1   63.192.77.77         255.255.255.255       00:30:65:a8:22:bc
    bge0   63.192.77.77         255.255.255.255       00:30:65:a8:22:bc
    bge1   224.0.0.0            240.0.0.0       SM    01:00:5e:00:00:00
    bge0   224.0.0.0            240.0.0.0       SM    01:00:5e:00:00:00
    -> ps -aef
         UID   PID  PPID   C    STIME TTY         TIME CMD
        root     0     0   0 15:11:12 ?           0:11 sched
        root     1     0   0 15:11:13 ?           0:00 /sbin/init
        root     2     0   0 15:11:13 ?           0:00 pageout
        root     3     0   0 15:11:13 ?           0:00 fsflush
      daemon   196     1   0 15:11:37 ?           0:00 /usr/sbin/rpcbind
        root     7     1   0 15:11:15 ?           0:10 /lib/svc/bin/svc.startd
        root     9     1   0 15:11:16 ?           0:16 /lib/svc/bin/svc.configd
        root   256     1   0 15:11:40 ?           0:00 /usr/sbin/cron
        root   335     1   0 15:11:49 ?           0:00 /usr/sbin/syslogd
        root   113     1   0 15:11:33 ?           0:00 /usr/sbin/nscd -S passwd,yes
        root   726   691   0 15:16:16 pts/1       0:00 ps -aef
      daemon   201     1   0 15:11:37 ?           0:00 /usr/lib/nfs/statd
        root   200     1   0 15:11:37 ?           0:00 /usr/sbin/keyserv
        root   192     1   0 15:11:36 ?           0:01 /opt/apani/uagent/nlagent
      daemon    86     1   0 15:11:26 ?           0:00 /usr/lib/crypto/kcfd
        root   152     1   0 15:11:35 ?           0:00 /usr/lib/inet/in.mpathd -a
        root   212     7   0 15:11:38 ?           0:00 /usr/lib/saf/sac -t 300
        root    89     1   0 15:11:26 ?           0:00 /usr/lib/picl/picld
      daemon   247     1   0 15:11:40 ?           0:00 /usr/lib/nfs/nfs4cbd
        root   102     1   0 15:11:28 ?           0:00 /usr/lib/power/powerd
        root    98     1   0 15:11:27 ?           0:00 /usr/lib/sysevent/syseventd
        root   215     1   0 15:11:38 ?           0:00 /usr/sbin/nis_cachemgr
      daemon   214     1   0 15:11:38 ?           0:00 /usr/lib/nfs/lockd
        root   213     1   0 15:11:38 ?           0:00 /usr/lib/utmpd
        root   217     7   0 15:11:38 console     0:00 -sh
        root   223   192   0 15:11:39 ?           0:00 inm -p9165
        root   222   212   0 15:11:39 ?           0:00 /usr/lib/saf/ttymon
      daemon   255     1   0 15:11:40 ?           0:00 /usr/lib/nfs/nfsmapid
        root   399   397   0 15:11:52 ?           0:00 /usr/sadm/lib/smc/bin/smcboot
        root   252     1   0 15:11:40 ?           0:04 /usr/lib/inet/inetd start
        root   398   397   0 15:11:52 ?           0:00 /usr/sadm/lib/smc/bin/smcboot
        root   317     1   0 15:11:48 ?           0:00 /usr/lib/autofs/automountd
        root   359     1   0 15:11:50 ?           0:00 /usr/lib/sendmail -bd -q15m
        root   448   447   0 15:11:53 ?           0:00 /usr/lib/locale/ja/wnn/jserver_m
        root   351     1   0 15:11:50 ?           0:02 /usr/lib/fm/fmd/fmd
        root   674   252   0 15:12:14 ?           0:00 /usr/sbin/in.telnetd
        root   347     1   0 15:11:50 ?           0:00 /usr/lib/ssh/sshd
       smmsp   360     1   0 15:11:50 ?           0:00 /usr/lib/sendmail -Ac -q15m
        root   461     1   0 15:11:53 ?           0:00 /usr/lib/locale/ja/atokserver/atokmngdaemon
        root   397     1   0 15:11:52 ?           0:00 /usr/sadm/lib/smc/bin/smcboot
        root   468   459   0 15:11:53 ?           0:00 htt_server -port 9010 -syslog -message_locale C
        root   441     1   0 15:11:53 ?           0:00 /usr/lib/locale/ja/wnn/dpkeyserv
        root   447     1   0 15:11:53 ?           0:00 /usr/lib/locale/ja/wnn/jserver
        root   459     1   0 15:11:53 ?           0:00 /usr/lib/im/htt -port 9010 -syslog -message_locale C
        root   512     1   0 15:11:55 ?           0:00 /usr/lib/snmp/snmpdx -y -c /etc/snmp/conf
        root   520     1   0 15:11:56 ?           0:00 /usr/lib/dmi/dmispd
        root   528     1   0 15:11:56 ?           0:00 /usr/sbin/vold
        root   521     1   0 15:11:56 ?           0:00 /usr/lib/dmi/snmpXdmid -s cstoc77022
        root   511     1   0 15:11:55 ?           0:00 /usr/dt/bin/dtlogin -daemon
        root   691   677   0 15:12:18 pts/1       0:00 bash
        root   677   674   0 15:12:14 pts/1       0:00 -sh
        root   585     1   0 15:11:57 ?           0:00 /usr/sfw/sbin/snmpd

  • Cluster node reboots after network failure

    hi all,
    Sun Cluster 3.1 8/05 with 2 nodes (E2900) was working fine, without any errors in sccheck.
    Yesterday one node rebooted due to a network failure; the errors in the messages file are:
    Jan 17 08:00:36 PRD in.mpathd[221]: [ID 594170 daemon.error] NIC failure detected on ce0 of group sc_ipmp0
    Jan 17 08:00:36 PRD Cluster.PNM: [ID 890413 daemon.notice] sc_ipmp0: state transition from OK to DOWN.
    Jan 17 08:00:47 PRD Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource PROD status on node PRD change to R_FM_DEGRADED
    Jan 17 08:00:47 PRD Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource PROD status msg on node PRD change to <IPMP Failure.>
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group CFS state on node PRD change to RG_PENDING_OFFLINE
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_MON_STOPPING
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_monitor_stop> for resource <PROD>, resource group <CFS>, timeout <300> seconds
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hafoip_monitor_stop> completed successfully for resource <PROD>, resource group <CFS>, time used: 0% of timeout <300 seconds>
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_ONLINE_UNMON
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_STOPPING
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_stop> for resource <PROD>, resource group <CFS>, timeout <300> seconds
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource PROD status on node PRD change to R_FM_UNKNOWN
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource PROD status msg on node PRD change to <Stopping>
    Jan 17 08:00:51 PRD ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 172.016.005.025:0, remote = 000.000.000.000:0, start = -2, end = 6
    Jan 17 08:00:51 PRD ip: [ID 302654 kern.notice] TCP_IOC_ABORT_CONN: aborted 53 connections
    What could be the reason for the rebooting?
    Is there any way to avoid this and have only a failover?
    rgds
    Message was edited by:
    suj

    What is in that resource group? The cause is probably something with Failover_mode=HARD set. Check the manual reference section for this. The option would be to set the Failover_mode=SOFT.
    Tim
    ---
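
    For reference, a sketch of checking and changing that property with the Sun Cluster 3.1 scrgadm command, assuming the resource is really named PROD as in the logs above:
    scrgadm -pvv -j PROD | grep Failover_mode    # show the current setting
    scrgadm -c -j PROD -y Failover_mode=SOFT     # request failover instead of a reboot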

  • NAS ZFS 7420c

    Hi All,
    I'm confused about the clustering feature on the NAS ZFS 7420c. When will the clustering feature fail over on the NAS ZFS 7420c? Which resources are monitored by the clustron?
    Our customer is concerned about network failures. Below are the scenarios:
    - data link failure
    - IPMP failure
    - LACP failure
    - network port failure
    - network card failure
    - network cable failure
    - network switch failure (not redundant)
    Will the NAS ZFS 7420c fail over to the other node or not in the above scenarios? Could you please advise on these scenarios with respect to the clustering feature of the ZFS 7420c?
    Thank you
    Sombut J.

    Hi.
    An administrator can manually switch a service from one node to the other.
    Every node monitors that all of its services are running normally and restarts a service when it dies. That is not a cluster feature; it is a feature of a single node.
    NetApp can fail over from one node to the other, but that does not happen automatically when one node loses some network connections.
    So it looks the same as the ZFS storage.
    To deal with network problems you should configure link aggregation or IPMP. That protects against losing some network links without requiring a cluster failover.
    Regards.

  • Resource R_FM_DEGRADED

    When a resource goes into R_FM_DEGRADED, can it still be used by users, or is it in a state where it cannot be used?

    Hi,
    did you expect anything other than "it depends"?
    In general, degraded means that there is a partial problem. E.g. logical IP addresses can go into FM_DEGRADED when there is an IPMP failure. The IP address is still there, and you can talk to it locally, but connectivity to the outside world is impossible.
    It is similar with replication in the SC Geo environment: if the fault monitor finds that replication does not work, e.g. due to a broken link between the two sites, the resource goes into the degraded state. You can still use it, but it lacks some of its functionality.
    Hartmut
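
    A quick way to watch these states on a live cluster (a sketch; the exact command depends on the Sun Cluster release):
    scstat -g                # 3.1/3.2: resource and resource group states
    clresource status        # 3.2 and later: per-resource status, including degraded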

  • Creation of IPMP Group failure

    Hi All,
    I used the following commands to create an IPMP group for the 10 GbE interfaces, but it seems to fail:
    root@ebsprdb1 # ipadm create-ipmp ipmp1
    root@ebsprdb1 # ipadm create-ip net16
    root@ebsprdb1 # ipadm create-ip net33
    root@ebsprdb1 # ipadm add-ipmp -i net16 -i net33 ipmp1
    root@ebsprdb1 # ipadm create-addr -T static -a ebsprdb1-data/16 ipmp1/data1
    root@ebsprdb1 # ipadm create-addr -T static -a ebsprdb1-vsw2-test1/16 net16/test
    root@ebsprdb1 # ipadm create-addr -T static -a ebsprdb1-vsw3-test1/16 net33/test
    root@ebsprdb1 # cat /etc/hosts
    # Copyright 2009 Sun Microsystems, Inc. All rights reserved.
    # Use is subject to license terms.
    # Internet host table
    ::1 localhost
    127.0.0.1 localhost loghost
    172.16.4.51 ebsprdb1.oocep.com ebsprdb1
    172.16.4.52 ebsprdb1-vsw0-test1.oocep.com ebsprdb1-vsw0-test1
    172.16.4.53 ebsprdb1-vsw1-test1.oocep.com ebsprdb1-vsw1-test1
    10.0.0.130 ebsprdb1-data
    10.0.0.131 ebsprdb1-vsw2-test1
    10.0.0.132 ebsprdb1-vsw3-test1
    root@ebsprdb1 # ipadm
    NAME          CLASS/TYPE  STATE    UNDER   ADDR
    ipmp0         ipmp        ok       --      --
    ipmp0/data1   static      ok       --      172.16.4.51/24
    ipmp1         ipmp        ok       --      --
    ipmp1/data1   static      ok       --      10.0.0.130/16
    lo0           loopback    ok       --      --
    lo0/v4        static      ok       --      127.0.0.1/8
    lo0/v6        static      ok       --      ::1/128
    net14         ip          ok       ipmp0   --
    net14/test    static      ok       --      172.16.4.52/24
    net16         ip          failed   ipmp1   --
    net16/test    static      ok       --      10.0.0.131/16
    net25         ip          ok       --      --
    net25/v4      static      ok       --      169.254.182.77/24
    net29         ip          ok       ipmp0   --
    net29/test    static      ok       --      172.16.4.53/24
    net33         ip          ok       ipmp1   --
    net33/test    static      ok       --      10.0.0.132/16
    As soon as I add the net16 device to the IPMP group, there is a failure.
    Can anyone please help?
    Regards.
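
    Since this is the Solaris 11 style ipadm stack, ipmpstat can usually show why net16 is marked failed (a sketch):
    ipmpstat -i      # per-interface state and failure flags within the group
    ipmpstat -tn     # probe targets chosen for each test address
    ipmpstat -pn     # live probe traffic; Ctrl-C to stop
    If -tn lists no targets for net16's test address, the failure is probe-based: 10.0.0.131 cannot find anything to probe on its subnet.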

    Julius,
    Thanks for the update. As you suggested, I have inserted one auth group entry in the TBRG table against the FI object with SE16.
    Now how do we maintain the view V_TBRG? Is it from SE11? If yes, then I should do this step from an ABAP login.
    But what I have heard is that this activity is purely handled by the Basis people.
    Please suggest.
    Rgds,
    Durga.

  • Oracle 10g CRS autorecovery from network failures - Solaris with IPMP

    Hi all,
    Just wondering if anyone has experience with a setup similar to mine. Let me first apologise for the lengthy introduction that follows >.<
    A quick run-down of my implementation: Sun SPARC Solaris 10, Oracle CRS, ASM and RAC database patched to version 10.2.0.4 respectively, no third-party cluster software used for a 2-node cluster. Additionally, the SAN storage is attached directly with fiber cable to both servers, and the CRS files (OCR, voting disks) are always visible to the servers, there is no switch/hub between the server and the storage. There is IPMP configured for both the public and interconnect network devices. When performing the usual failover tests for IPMP, both the OS logs and the CRS logs show a failure detected, and a failover to the surviving network interface (on both the public and the private network devices).
    For the private interconnect, when both of the network devices are disabled (by manually disconnecting the network cables), this results in the 2nd node rebooting, and the CRS process starting, but unable to synchronize with the 1st node (which is running fine the whole time). Further, when I look at the CRS logs, it is able to correctly identify all the OCR files and voting disks. When the network connectivity is restored, both the OS and CRS logs reflect this connection has been repaired. However, the CRS logs at this point still state that node 1 (which is running fine) is down, and the 2nd node attempts to join the cluster as the master node. When I manually run the 'crsctl stop crs' and 'crsctl start crs' commands, this results in a message stating that the node is going to be rebooted to ensure cluster integrity, and the 2nd node reboots, starts the CRS daemons again at startup, and joins the cluster normally.
    For the public network, when the 2nd node is manually disconnected, the VIP is seen to not failover, and any attempts to connect to this node via the VIP result in a timeout. When connectivity is restored, as expected the OS and CRS logs acknowledge the recovery, and the VIP for node 2 automatically fails over, but the listener goes down as well. Using the 'srvctl start listener' command brings it up again, and everything is fine. During this whole process, the database instance runs fine on both nodes.
    From the case studies above, I can see that the network failures are detected by the Oracle Clusterware, and a simple command run once the failure is repaired restores full functionality to the RAC database. However, is there any way to automate this recovery for the 2 cases stated above, so that there is no need for manual intervention by the DBAs? I was able to test case 2 (public network) with the Oracle document 805969.1 (VIP does not relocate back to the original node after public network problem is resolved); is there a similar workaround for the interconnect?
    Any and all pointers would be appreciated, and again, sorry for the lengthy post.
    Edited by: NS Selvam on 16-Dec-2009 20:36
    changed some minor typos
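
    For the record, the manual recovery steps described above map to these 10.2 commands (a sketch; the node and listener names are placeholders):
    crsctl check crs                    # verify the CRS/CSS/EVM daemons are healthy
    crs_stat -t                         # 10g: tabular state of VIPs and other resources
    srvctl start listener -n node2      # restart the listener once the VIP is back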

    hi
    I've given the shell script; I just need to run it, and I usually get output like:
    [root@rac-1 Desktop]# sh iscsi-corntab.sh
    Logging in to [iface: default, target: iqn.2010-02-23.de.sayantan-chakraborty:storage.disk1.amiens.sys1.xyz, portal: 192.168.181.10,3260]
    Login to [iface: default, target: iqn.2010-02-23.de.sayantan-chakraborty:storage.disk1.amiens.sys1.xyz, portal: 192.168.181.10,3260]: successful
    The script contains:
    iscsiadm -m node -T iqn.2010-02-23.de.sayantan-chakraborty:storage.disk1.amiens.sys1.xyz -p 192.168.181.10 -l
    iscsiadm -m node -T iqn.2010-02-23.de.sayantan-chakraborty:storage.disk1.amiens.sys1.xyz -p 192.168.181.10 --op update -n node.startup -v automatic
    (cd /dev/disk/by-path; ls -l *sayantan-chakraborty* | awk '{FS=" "; print $9 " " $10 " " $11}')
    [root@rac-1 Desktop]# (cd /dev/disk/by-path; ls -l *sayantan-chakraborty* | awk '{FS=" "; print $9 " " $10 " " $11}')
    ip-192.168.181.10:3260-iscsi-iqn.2010-02-23.de.sayantan-chakraborty:storage.disk1.amiens.sys1.xyz-lun-1 -> ../../sdc
    [root@rac-1 Desktop]#
    Can you post the output of ls /dev/iscsi? You may get something like this:
    [root@rac-1 Desktop]# ls /dev/iscsi
    xyz
    [root@rac-1 Desktop]#

  • IPMP failover Failure

    I have one question about IPMP under Solaris 9 9/04 SPARC 64-bit.
    My OS: with EIS 3.1.1 patches
    Clusterware: Sun Cluster 3.1u4 with EIS 3.1.1 patches
    My IPMP group contains two NICs: ce0 & ce3.
    Both NICs are linked to a Cisco 4506.
    The IPMP configuration files are as follows:
    */etc/hostname.ce0*
    lamp-test2 netmask + broadcast + group ipmp1 deprecated -failover up
    */etc/hostname.ce3*
    lamp netmask + broadcast + group ipmp1 up \
    addif lamp-test1 netmask + broadcast + deprecated -failover up
    I am always using the default in.mpathd configuration file.
    But once I pull out a ceN NIC's cable, my IPMP group complains:
    Mar 18 18:06:34 lamp in.mpathd[2770]: [ID 215189 daemon.error] The link has gone down on ce0
    Mar 18 18:06:34 lamp in.mpathd[2770]: [ID 594170 daemon.error] NIC failure detected on ce0 of group ipmp1
    Mar 18 18:06:34 lamp ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '00:03:ba:b0:5d:54' trying to be our address 192.168.217.020!
    Mar 18 18:06:34 lamp in.mpathd[2770]: [ID 832587 daemon.error] Successfully failed over from NIC ge0 to NIC ce0
    Mar 18 18:06:34 lamp ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '00:03:ba:b0:5d:54' trying to be our address 192.168.217.020!
    Why does the Solaris OS report a hardware address conflict?
    I'm sure these IPMP configuration files work fine with a Cisco 2950 and a D-Link mini switch.
    By the way, there are no duplicate MACs in the LAN.
    Should I modify some Cisco parameters?
    Your advice would be much appreciated!

    lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
         inet 127.0.0.1 netmask ff000000
    ce0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
         inet 192.168.217.6 netmask ffffff00 broadcast 192.168.217.255
         groupname ipmp1
         ether 0:3:ba:b0:5d:54
    ce3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
         inet 192.168.217.20 netmask ffffff00 broadcast 192.168.217.255
         groupname ipmp1
         ether 0:3:ba:95:5d:6e
    ce3:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 4
         inet 192.168.217.4 netmask ffffff00 broadcast 192.168.217.255
    Generally speaking:
    When I switch the float IP from ce0 to ce3, IPMP says the ce0 MAC is "trying to be our address ...", then the ce0 test IP fails and the float IP doesn't fail over.
    When I switch the float IP from ce3 to ce0, IPMP says the ce3 MAC is "trying to be our address ...", then the ce0 test IP fails and the float IP doesn't fail over.
    My view is that the floating NIC's MAC and address information may be cached in the Cisco device's RAM and not released in time.
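
    If the stale-ARP theory is right, it should be visible in the ARP tables on both sides of the link. A sketch (the clear command assumes Cisco IOS; check the 4506 documentation):
    arp -an | grep 192.168.217      # Solaris: which MAC the host has cached per address
    arp -d 192.168.217.20           # delete a single suspect entry locally
    clear arp-cache                 # on the switch, in IOS exec mode: flush its ARP table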

  • NODEAPPS FAILURE FOR IPMP ON SOLARIS LOCAL CONTAINER

    Hi everyone,
    I want to set up a two-node RAC (Oracle 11.2.0.2) in a Solaris container (3.3).
    Though it was certified recently, I am facing issues while running root.sh on node 1 and node 2:
    /u01/app/11.2.0/grid/bin/srvctl start nodeapps -n cbq2-svu-istdb-n1 ... failed
    FirstNode configuration failed at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 8388.
    /u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed
    Can you please help me fix it?
    FYI, my cluvfy precheck was successful:
    ./runcluvfy.sh stage -pre crsinst -n cbq2-svu-istdb-n1,cbq2-svu-istdb-n2 -verbose
    Pre-check for cluster services setup was successful.
    Thanks
    Surendran
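
    A common recovery path in 11.2 before rerunning root.sh is to deconfigure the failed stack first (a sketch; it does not address the underlying container issue, only the rerun):
    cd /u01/app/11.2.0/grid/crs/install
    /u01/app/11.2.0/grid/perl/bin/perl rootcrs.pl -deconfig -force
    /u01/app/11.2.0/grid/root.sh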

    Thanks for the notice, been waiting for that one!

  • SUN Cluster 3.2, Solaris 10, Corrupted IPMP group on one node.

    Hello folks,
    I recently made a network change on nodename2 to add some resilience to IPMP (adding a second interface but still using a single IP address).
    After a reboot, I cannot keep this host from rebooting. For the one minute that it stays up, I do get the following result from scstat that seems to suggest a problem with the IPMP configuration. I rolled back my IPMP change, but it still doesn't seem to register the IPMP group in scstat.
    nodename2|/#scstat
    -- Cluster Nodes --
    Node name Status
    Cluster node: nodename1 Online
    Cluster node: nodename2 Online
    -- Cluster Transport Paths --
    Endpoint Endpoint Status
    Transport path: nodename1:bge3 nodename2:bge3 Path online
    -- Quorum Summary from latest node reconfiguration --
    Quorum votes possible: 3
    Quorum votes needed: 2
    Quorum votes present: 3
    -- Quorum Votes by Node (current status) --
    Node Name Present Possible Status
    Node votes: nodename1 1 1 Online
    Node votes: nodename2 1 1 Online
    -- Quorum Votes by Device (current status) --
    Device Name Present Possible Status
    Device votes: /dev/did/rdsk/d3s2 0 1 Offline
    -- Device Group Servers --
    Device Group Primary Secondary
    Device group servers: jms-ds nodename1 nodename2
    -- Device Group Status --
    Device Group Status
    Device group status: jms-ds Online
    -- Multi-owner Device Groups --
    Device Group Online Status
    -- IPMP Groups --
    Node Name Group Status Adapter Status
    scstat:  unexpected error.
    I did manage to run scstat on nodename1 while nodename2 was still up between reboots, here is that result (it does not show any IPMP group(s) on nodename2)
    nodename1|/#scstat
    -- Cluster Nodes --
    Node name Status
    Cluster node: nodename1 Online
    Cluster node: nodename2 Online
    -- Cluster Transport Paths --
    Endpoint Endpoint Status
    Transport path: nodename1:bge3 nodename2:bge3 faulted
    -- Quorum Summary from latest node reconfiguration --
    Quorum votes possible: 3
    Quorum votes needed: 2
    Quorum votes present: 3
    -- Quorum Votes by Node (current status) --
    Node Name Present Possible Status
    Node votes: nodename1 1 1 Online
    Node votes: nodename2 1 1 Online
    -- Quorum Votes by Device (current status) --
    Device Name Present Possible Status
    Device votes: /dev/did/rdsk/d3s2 1 1 Online
    -- Device Group Servers --
    Device Group Primary Secondary
    Device group servers: jms-ds nodename1 -
    -- Device Group Status --
    Device Group Status
    Device group status: jms-ds Degraded
    -- Multi-owner Device Groups --
    Device Group Online Status
    -- IPMP Groups --
    Node Name Group Status Adapter Status
    IPMP Group: nodename1 sc_ipmp1 Online bge2 Online
    IPMP Group: nodename1 sc_ipmp0 Online bge0 Online
    -- IPMP Groups in Zones --
    Zone Name Group Status Adapter Status
    I believe that I should be able to delete the IPMP group for the second node from the cluster and re-add it, but I'm not sure how to go about doing this. I welcome your comments or thoughts on what I can try before rebuilding this node from scratch.
    -AG
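
    On Solaris 10, IPMP membership is just the group setting on each interface, so re-creating the group on nodename2 can be as simple as the following (a sketch; the interface and group names are guesses based on nodename1's output):
    ifconfig bge0 group ""           # take the interface out of its current group
    ifconfig bge0 group sc_ipmp0     # put it back; scstat -i should then list it again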

    I was able to restart both sides of the cluster. Now both sides are online, but neither side can access the shared disk.
    Lots of warnings. I will keep poking....
    Rebooting with command: boot
    Boot device: /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/disk@0,0:a File and args:
    SunOS Release 5.10 Version Generic_141444-09 64-bit
    Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    Hardware watchdog enabled
    Hostname: nodename2
    Jul 21 10:00:16 in.mpathd[221]: No test address configured on interface ce3; disabling probe-based failure detection on it
    Jul 21 10:00:16 in.mpathd[221]: No test address configured on interface bge0; disabling probe-based failure detection on it
    /usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],0:c,raw" - No such file or directory.
    Warning: Path node loaded - "../../devices/iscsi/[email protected],0:c,raw".
    /usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],1:c,raw" - No such file or directory.
    Warning: Path node loaded - "../../devices/iscsi/[email protected],1:c,raw".
    Booting as part of a cluster
    NOTICE: CMM: Node nodename1 (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node nodename2 (nodeid = 2) with votecount = 1 added.
    WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
    NOTICE: clcomm: Adapter bge3 constructed
    NOTICE: CMM: Node nodename2: attempting to join cluster.
    NOTICE: CMM: Node nodename1 (nodeid: 1, incarnation #: 1279727883) has become reachable.
    NOTICE: clcomm: Path nodename2:bge3 - nodename1:bge3 online
    WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
    NOTICE: CMM: Cluster has reached quorum.
    NOTICE: CMM: Node nodename1 (nodeid = 1) is up; new incarnation number = 1279727883.
    NOTICE: CMM: Node nodename2 (nodeid = 2) is up; new incarnation number = 1279728026.
    NOTICE: CMM: Cluster members: nodename1 nodename2.
    NOTICE: CMM: node reconfiguration #3 completed.
    NOTICE: CMM: Node nodename2: joined cluster.
    NOTICE: CCR: Waiting for repository synchronization to finish.
    WARNING: CCR: Invalid CCR table : dcs_service_9 cluster global.
    WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
    ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
    ==> WARNING: DCS: Error looking up services table
    ==> WARNING: DCS: Error initializing service 9 from file
    /usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],0:c,raw" - No such file or directory.
    Warning: Path node loaded - "../../devices/iscsi/[email protected],0:c,raw".
    /usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],1:c,raw" - No such file or directory.
    Warning: Path node loaded - "../../devices/iscsi/[email protected],1:c,raw".
    /dev/md/rdsk/d22 is clean
    Reading ZFS config: done.
    NOTICE: iscsi session(6) iqn.1994-12.com.promise.iscsiarray2 online
    nodename2 console login: obtaining access to all attached disks
    starting NetWorker daemons:
    Rebooting with command: boot
    Boot device: /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/disk@0,0:a File and args:
    SunOS Release 5.10 Version Generic_141444-09 64-bit
    Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    Hardware watchdog enabled
    Hostname: nodename1
    /usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],0:c,raw" - No such file or directory.
    Warning: Path node loaded - "../../devices/iscsi/[email protected],0:c,raw".
    /usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],1:c,raw" - No such file or directory.
    Warning: Path node loaded - "../../devices/iscsi/[email protected],1:c,raw".
    Booting as part of a cluster
    NOTICE: CMM: Node nodename1 (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node nodename2 (nodeid = 2) with votecount = 1 added.
    WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
    NOTICE: clcomm: Adapter bge3 constructed
    NOTICE: CMM: Node nodename1: attempting to join cluster.
    NOTICE: bge3: link up 1000Mbps Full-Duplex
    NOTICE: clcomm: Path nodename1:bge3 - nodename2:bge3 errors during initiation
    WARNING: Path nodename1:bge3 - nodename2:bge3 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
    NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
    NOTICE: bge3: link down
    NOTICE: bge3: link up 1000Mbps Full-Duplex
    NOTICE: CMM: Node nodename2 (nodeid: 2, incarnation #: 1279728026) has become reachable.
    NOTICE: clcomm: Path nodename1:bge3 - nodename2:bge3 online
    WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
    NOTICE: CMM: Cluster has reached quorum.
    NOTICE: CMM: Node nodename1 (nodeid = 1) is up; new incarnation number = 1279727883.
    NOTICE: CMM: Node nodename2 (nodeid = 2) is up; new incarnation number = 1279728026.
    NOTICE: CMM: Cluster members: nodename1 nodename2.
    NOTICE: CMM: node reconfiguration #3 completed.
    NOTICE: CMM: Node nodename1: joined cluster.
    WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
    ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
    /usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],0:c,raw" - No such file or directory.
    Warning: Path node loaded - "../../devices/iscsi/[email protected],0:c,raw".
    /usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],1:c,raw" - No such file or directory.
    Warning: Path node loaded - "../../devices/iscsi/[email protected],1:c,raw".
    /dev/md/rdsk/d26 is clean
    Reading ZFS config: done.
    NOTICE: iscsi session(6) iqn.1994-12.com.promise.iscsiarray2 online
    nodename1 console login: obtaining access to all attached disks
    starting NetWorker daemons:
    nsrexecd
    mount: /dev/md/jms-ds/dsk/d100 is already mounted or /opt/esbshares is busy

  • Solaris 10 IPMP and NetApp NFS v4 ACL

    Okay, here is my issue. I have one T5220 that has a failed NIC. IPMP is set up for active-standby, and the NIC fails over on boot. I can reach the system through said interface and send traffic out the failed-to NIC (ssh to another server and run last, and I see the 10.255.249.196 address). However, the NFS ACL I have is limited to the shared IP address of the IPMP group (10.255.249.196), as that is what the server should see. However, it appears that the NFS server is seeing the test IP (10.255.249.197) of the "failed to" NIC. I added 10.255.249.197 to the NFS ACL and all is fine. ifconfig output:
    e1000g1: flags=219040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED,CoS> mtu 1500 index 3
    inet 10.255.249.198 netmask ffffff00 broadcast 10.255.249.255
    groupname prvdmz
    ether 0:21:28:24:3:1f
    nxge1: flags=209040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,CoS> mtu 1500 index 6
    inet 10.255.249.197 netmask ffffff00 broadcast 10.255.249.255
    groupname prvdmz
    ether 0:21:28:d:a4:6f
    nxge1:1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 6
    inet 10.255.249.196 netmask ffffff00 broadcast 10.255.249.255
    netstat -rn output:
    10.255.249.0 10.255.249.196 U 1 1333 nxge1:1
    10.255.249.0 10.255.249.197 U 1 0 e1000g1
    10.255.249.0 10.255.249.197 U 1 0 nxge1
    DNS sets the host name of the system to 10.255.249.196. But if I leave the ACL as is, with the one IP address, and wait about 10 minutes after a boot, then I am able to mount the NFS share with the ACL only containing 10.255.249.196.
    Here are my hosts and hostname.<interface> files.
    bash-3.00# cat /etc/hosts
    # Internet host table
    ::1 localhost
    127.0.0.1 localhost
    10.255.249.196 mymcresprod2-pv
    10.255.249.197 mymcresprod2-pv_nxge1
    10.255.249.198 mymcresprod2-pv_e1000g1
    bash-3.00# cat /etc/hostname.e1000g1
    mymcresprod2-pv_e1000g1 netmask 255.255.255.0 broadcast + deprecated -failover up group prvdmz
    addif mymcresprod2-pv netmask 255.255.255.0 broadcast + up
    bash-3.00# cat /etc/hostname.nxge1
    mymcresprod2-pv_nxge1 netmask 255.255.255.0 broadcast + deprecated -failover up group prvdmz
    bash-3.00# more /etc/default/mpathd
    #pragma ident "@(#)mpathd.dfl 1.2 00/07/17 SMI"
    # Time taken by mpathd to detect a NIC failure in ms. The minimum time
    # that can be specified is 100 ms.
    FAILURE_DETECTION_TIME=10000
    # Failback is enabled by default. To disable failback turn off this option
    FAILBACK=yes
    # By default only interfaces configured as part of multipathing groups
    # are tracked. Turn off this option to track all network interfaces
    # on the system
    TRACK_INTERFACES_ONLY_WITH_GROUPS=yes
    I think the IPMP configuration is fine, but I could be wrong. Any ideas on this? I mean, I can add the test IP address to the ACL if need be, but that just seems to be a band-aid. Or am I completely nuts and it should work this way?
    Thanks,
    Ben
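
    One way to confirm which source address the NFS traffic really uses is to snoop the active interface during a mount attempt (a sketch using the addresses above):
    snoop -r -d nxge1 host 10.255.249.196 or host 10.255.249.197
    If the requests leave with .197 as the source right after boot and switch to .196 later, that would match the ten-minute delay you are seeing.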

    Following up on my post...
    The moment I started to add more NFS shares, logins slowed down big time.
    The only way out was to fully open a hole on the server for every client...
    I was able to lock down the Linux server somewhat, to fixed ports, and only open up those (111, 2049, 656, 32766-32769). But on the Solaris server, I can't seem to figure this out...
    Any one ?
    TIA...

  • Best practices for IPMP and LDoms?

    Having read the Oracle VM Server for SPARC 2.0 Administration Guide, it seems to imply that it might be possible to configure IPMP in the control domain (i.e. between the virtual switch interfaces), eliminating the necessity to configure IPMP on each individual guest domain. Specifically, it says:
    Configuring and Using IPMP in the Service Domain
    IPMP can be configured in the service domain by configuring virtual switch interfaces into a group. The following diagram shows two virtual switch instances (vsw0 and vsw1) that are bound to two different physical devices. The two virtual switch interfaces can then be plumbed and configured into an IPMP group. In the event of a physical link failure, the virtual switch device that is bound to that physical device detects the link failure. Then, the virtual switch device sends notification of this link event to the IP layer in the service domain, which results in failover to the other virtual switch device in the IPMP group.
    Unfortunately, when I configure IPMP in this way -- that is, with vsw0 and vsw1 in an IPMP group -- it doesn't appear to do what I'm looking for. I can ping the service domain, but not the guest domains which rely on that virtual switch.
    So, this is my question: is it possible to configure IPMP in the service domain and eliminate the need to configure IPMP in the guest domain, and if so, how? Or is it always necessary to share both virtual switches with the guest domain and setup IPMP with the guest domain, as in the example in the documentation?
    Thanks,
    Patrick Narkinsky

    I'm not 100% sure, but I think that the documentation means that you can configure an IPMP group in the service domain for the vsw interfaces that are used by the service domain for its networking:
    "The two virtual switch interfaces can then be plumbed and configured into an IPMP group." So, for example, there would be vsw0 and vsw1 that are plumbed, and an IP address would be assigned to the group. This does not mean the vsws that you are using for guest domain traffic.
    If you would like transparent multipathed networking (like MPxIO multipathing for SAN disks) for the virtual switches that provide networking for guest domains, I guess you could use aggregates (see the dladm command). The problem is, or at least has been, that the interfaces configured in an aggregate must be connected to the same switch, so there would be a SPOF. I'm not sure if modern network switch OSes have some kind of capability to spread an aggregate (in the Cisco world, an EtherChannel) across two or more physical switches.
    So IMHO you have to configure the IPMP group in the guest domains.
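
    For what it's worth, a sketch of that aggregate approach (the device and vsw names are placeholders; check that your LDoms version supports an aggregation as a vsw backend):
    dladm create-aggr -d nxge0 -d nxge1 1     # key 1 creates aggr1 over two NICs
    dladm show-aggr                           # verify the aggregate and its ports
    ldm add-vsw net-dev=aggr1 primary-vsw0 primary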

  • Set up IPMP Solaris 10 -- two interfaces, one IP

    I have been tasked with setting up failure-detection IPMP on a T5120. I have been reading all the Sun documentation on setting up IPMP, but cannot find exactly what I am looking for. I have one IP address and two connected NICs, and my task is to set up IPMP so that if e1000g0 fails, e1000g1 will take over. Is this possible, and if so, how?

    If you have two interfaces and only want link-based failure detection, just put a "group $YOUR_GROUPNAME" statement in each /etc/hostname.$INTERFACE file.
    Say you have the two interfaces e1000g0 and e1000g1, your hostname is MyHostname, and your group is MyGroup. You would do the following:
    Put "MyHostname group MyGroup" in /etc/hostname.e1000g0.
    Put "group MyGroup" in /etc/hostname.e1000g1.
    Either reboot the machine or manually configure IPMP:
    ifconfig e1000g0 group MyGroup
    ifconfig e1000g1 plumb group MyGroup up
    In /var/adm/messages there should be a notice that no test address was given and that IPMP will operate in link-failure detection mode only.
    Please note that officially you should create an IPMP instance first by issuing something like ifconfig $MyIPMP-Instance group MyGroup, but that step can be left out, as IPMP instances are created implicitly.
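
    To test the failover without pulling cables, Solaris also ships if_mpadm, which offlines and restores an interface in an IPMP group (a quick sketch):
    if_mpadm -d e1000g0     # offline e1000g0; its addresses move to e1000g1
    if_mpadm -r e1000g0     # bring it back online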

  • Can I use Tuxedo 8.1 BRIDGE with IPMP ?

    Hello All,
    My purpose is to prevent the Tuxedo MP partition problem.
    At first I tried to configure netgroups (one with heartbeat and another with LAN), but after we terminate the heartbeat, Tuxedo takes much more time than the OS to detect the network failure and fail over to the LAN.
    I don't know why; if you know, or have experience with this, please help me.
    My second idea is to use the BRIDGE process with IPMP, but I'm not sure it is a good idea. Can anyone advise me?
    Thank you
    Tep.

    Hi,
    Can you explain exactly what problem you are trying to solve? When you say you configure one netgroup with heartbeat and one with LAN, where are you setting these options?
    Regards,
    Todd Little
    Oracle Tuxedo Chief Architect

  • Replacing network adapter from IPMP group (Sun cluster 3.3)

    Hello!
    I need to change the network devices in an IPMP group from ge0, ge1, ge2 to ce5, ce6, ce7.
    Can I do this procedure online? Something like:
    Create the files adding the new devices to the IPMP groups: /etc/hostname.ce5, .ce6, .ce7
    Unmonitor the resource group
    Unplumb the old devices and plumb up the new devices
    # scstat -i
    -- IPMP Groups --
    Node Name Group Status Adapter Status
    IPMP Group: node0 ipmp0 Online ge1 Online
    IPMP Group: node0 ipmp0 Online ge0 Online
    IPMP Group: node0 ipmp1 Online ce2 Online
    IPMP Group: node0 ipmp1 Online ce0 Online
    IPMP Group: node1 ipmp0 Online ge1 Online
    IPMP Group: node1 ipmp0 Online ge0 Online
    IPMP Group: node1 ipmp1 Online ce2 Online
    IPMP Group: node1 ipmp1 Online ce0 Online
    /etc/hostname.ge0
    n0-testge0 netmask + broadcast + group ipmp0 deprecated -failover up
    addif node0 netmask + broadcast + up
    /etc/hostname.ge1
    n0-testge1 netmask + broadcast + group ipmp0 deprecated -failover up
    /etc/hostname.ge2
    backupn0 mtu 1500
    # ifconfig -a
    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
    inet 127.0.0.1 netmask ff000000
    ce0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
    inet 172.19.1.25 netmask ffffff00 broadcast 172.19.1.255
    groupname ipmp1
    ether 0:14:4f:23:1d:9
    ce0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    inet 172.19.1.10 netmask ffffff00 broadcast 172.19.1.255
    ce1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 9
    inet 172.16.0.129 netmask ffffff80 broadcast 172.16.0.255
    ether 0:14:4f:23:1d:a
    ce2: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
    inet 172.19.1.26 netmask ffffff00 broadcast 172.19.1.255
    groupname ipmp1
    ether 0:14:4f:26:a4:83
    ce2:1: flags=1001040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,FIXEDMTU> mtu 1500 index 3
    inet 172.19.1.23 netmask ffffff00 broadcast 172.19.1.255
    ce4: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 8
    inet 172.16.1.1 netmask ffffff80 broadcast 172.16.1.127
    ether 0:14:4f:42:7f:28
    dman0: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 4
    inet 192.168.103.6 netmask ffffffe0 broadcast 192.168.103.31
    ether 0:0:be:aa:1c:58
    ge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 5
    inet 10.1.0.25 netmask ffffff00 broadcast 10.1.0.255
    groupname ipmp0
    ether 8:0:20:e6:61:a7
    ge0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5
    inet 10.1.0.10 netmask ffffff00 broadcast 10.1.0.255
    ge1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 6
    inet 10.1.0.26 netmask ffffff00 broadcast 10.1.0.255
    groupname ipmp0
    ether 0:3:ba:c:74:62
    ge1:1: flags=1001040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,FIXEDMTU> mtu 1500 index 6
    inet 10.1.0.23 netmask ffffff00 broadcast 10.1.0.255
    ge2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 7
    inet 10.1.2.10 netmask ffffff00 broadcast 10.1.2.255
    ether 8:0:20:b5:25:88
    clprivnet0: flags=1009843<UP,BROADCAST,RUNNING,MULTICAST,MULTI_BCAST,PRIVATE,IPv4> mtu 1500 index 10
    inet 172.16.4.1 netmask fffffe00 broadcast 172.16.5.255
    ether 0:0:0:0:0:1
    Thanks in advance!

    You should be able to replace adapters in an IPMP group one-by-one without affecting the cluster operation.
    BUT: You must make sure that the status of the new adapter in the IPMP group gets back to normal, before you start replacing the next adapter.
    Solaris Cluster only reacts to IPMP group failures, not to failures of individual NICs.
    Note that IPMP is only used for the public network; cluster interconnects are not configured using IPMP. Nevertheless, the same technique can be applied to replace adapters in the cluster interconnect. You need to use the clintr command (IIRC) to replace individual NICs. Again, make sure that all the NICs of the interconnect are healthy before you continue replacing the next adapter.
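
    Putting that together, a per-adapter sketch (n0-testce5 is a hypothetical test-address hostname you would add to /etc/hosts first; repeat for each adapter, verifying health in between):
    ifconfig ce5 plumb
    ifconfig ce5 n0-testce5 netmask + broadcast + group ipmp0 deprecated -failover up
    scstat -i                    # wait until the new adapter shows Online
    ifconfig ge1 down
    ifconfig ge1 unplumb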
