IPMP Configuration
hi
I need to know how to configure IPMP on Solaris 10 in order to overcome the issue that cross cables are not supported.
My environment looks as follows:
no third-party cluster.
the cluster environment is built using raw partitions s5 and s6 with external redundancy.
the operating system is Solaris 10 SPARC.
the RDBMS is a 10gR2 RAC database.
we plan to use ASM for database files on raw devices, as we don't have any other option on Solaris in our environment.
Please let me know how to configure IPMP for this environment.
We are using 2 nodes (V490 servers), and a 3100 server for storage.
hi,
I already described my environment very clearly.
Now I would like to know how to set up IPMP as the cluster interconnect.
I went through MetaLink doc 368464.1, but it's not very clear to me.
This setup has to be done before the Clusterware installation, am I right?
If so, how do I do it?
Kindly revert back to me.
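For orientation, a probe-based IPMP pair for a private interconnect on Solaris 10 is usually described by /etc/hostname.* files along these lines. This is a sketch only: the interface names ce0/ce1, the group name, and the node1-priv* hostnames are placeholders, not taken from this environment.

```shell
# /etc/hostname.ce0 -- data address plus a test address on the first NIC
node1-priv netmask + broadcast + group priv_ipmp up \
addif node1-priv-test0 netmask + broadcast + deprecated -failover up

# /etc/hostname.ce1 -- test address only on the second NIC
node1-priv-test1 netmask + broadcast + group priv_ipmp deprecated -failover standby up
```

The data address fails over between the two NICs; each test address is pinned to its NIC with deprecated -failover, so in.mpathd can probe both links.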
Similar Messages
-
scstat freezes when it tries to print the IPMP configuration
Hi All,
Some of my clusters show a strange behavior sometimes.
scstat freezes at the point of showing the IPMP configuration, and when I want to create a LogicalHostname resource (VIP), the command freezes and I see nothing in the log file....
Has anyone seen something like this?
Thanks a lot.
Check whether the heartbeat network is the same as the public network.
Install the Sun Cluster core patch; then it works. -
IPMP Configuration problem in SUN 1290 Server
I am facing a problem with an IPMP configuration.
Server a has the following configuration:
ce2: 97.241.213.152 - physical IP
ce6: 97.241.213.163 - physical IP
ce2:1: 97.241.213.162 - virtual IP
ce2: flags=1100843<UP,BROADCAST,RUNNING,MULTICAST,ROUTER,IPv4> mtu 1500 index 3
inet 97.241.213.152 netmask ffffff80 broadcast 97.241.213.255
groupname ipmp1
ether 0:14:4f:95:c8:38
ce2:1: flags=9140843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,ROUTER,IPv4,NOFAILOVER> mtu 1500 index 3
inet 97.241.213.162 netmask ffffff80 broadcast 97.241.213.255
ce6: flags=9140843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,ROUTER,IPv4,NOFAILOVER> mtu 1500 index 7
inet 97.241.213.163 netmask ffffff80 broadcast 97.241.213.255
groupname ipmp1
ether 0:14:4f:95:c8:38
IPMP is working fine on this server.
Server b is configured the same way (port ce2 had a problem there, so we shifted to port ce1):
ce1: 97.241.213.154 - physical IP
ce6: 97.241.213.165 - physical IP
ce1:1: 97.241.213.154 - virtual IP
ce1: flags=19100842<BROADCAST,RUNNING,MULTICAST,ROUTER,IPv4,NOFAILOVER,FAILED> mtu 0 index 3
inet 0.0.0.0 netmask 0
groupname ipmp1
ether 0:14:4f:95:c8:60
ce1:1: flags=19140843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,ROUTER,IPv4,NOFAILOVER,FAILED> mtu 1500 index 3
inet 97.241.213.164 netmask ffffff80 broadcast 97.241.213.255
ce6: flags=9140843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,ROUTER,IPv4,NOFAILOVER> mtu 1500 index 7
inet 97.241.213.165 netmask ffffff80 broadcast 97.241.213.255
groupname ipmp1
ether 0:14:4f:95:c8:60
ce6:1: flags=1100843<UP,BROADCAST,RUNNING,MULTICAST,ROUTER,IPv4> mtu 1500 index 7
inet 97.241.213.154 netmask ffffff80 broadcast 97.241.213.255
I know there is some mistake in the above configuration of server b, but I'm not able to find it. Kindly provide the proper configuration, with the commands as well.
I have also attached the hostname.ce1 contents for server b.
######## hostname.ce1#############
serverb-DCN netmask + broadcast + group ipmp1 up addif serverb-ce2-test netmask + broadcast + deprecated -failover up
######## hostname.ce6#############
serverb-ce6-test netmask + broadcast + group ipmp1 deprecated -failover up
For the first system:
ifconfig ce2 97.241.213.152 deprecated -failover group ipmp1 up
ifconfig ce6 97.241.213.163 deprecated -failover group ipmp1 up
ifconfig ce2 addif 97.241.213.162 up
For the second system (server b):
ifconfig ce1 97.241.213.154 deprecated -failover group ipmp1 up
ifconfig ce6 97.241.213.165 deprecated -failover group ipmp1 up
ifconfig ce1 addif 97.241.213.164 up
Make sure you assigned the deprecated and -failover flags to your physical IPs (the test IPs), not the virtual IP.
And check your network connections as well.
ce2 and ce6 on server a should be in the same subnet.
ce1 and ce6 on server b should be in the same subnet too.
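Following that reply's pattern through, server b's boot-time files might look like the sketch below. The serverb-* hostnames are placeholders for /etc/hosts entries; per the reply, the physical addresses carry deprecated -failover as test addresses and the virtual address is the plain data address:

```shell
# /etc/hostname.ce1 -- test address on the physical NIC, data address via addif
# (serverb-ce1-test = 97.241.213.154, serverb-data = 97.241.213.164; names are placeholders)
serverb-ce1-test netmask + broadcast + group ipmp1 deprecated -failover up \
addif serverb-data netmask + broadcast + up

# /etc/hostname.ce6 -- test address only (serverb-ce6-test = 97.241.213.165)
serverb-ce6-test netmask + broadcast + group ipmp1 deprecated -failover up
```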
Edited by: Alicia.tang on Aug 5, 2008 5:05 PM -
IPMP configuration and zones - how to?
Hello all,
So, I've been thrown in at the deep end and have been given a brand new M4000 to configure to host two zones. I have little zone experience, and my last Solaris exposure was version 7!
Anyway, enough of the woe. This M4000 has two quad-port NICs, so I'm going to configure two ports per subnet using IPMP, and on top of the IPMP link I will configure two IPv4 addresses and give one to each zone.
My question is: how can this best be accomplished with regard to giving each zone a different address on the IPMP link?
IP addresses available: 10.221.91.2 (for zone1) and 10.221.91.3 (for zone2).
So far, in the global zone I have
ipadm create-ip net2 <-----port 0 of NIC1
ipadm create-ip net6 <-----port 0 of NIC2
ipadm create-ipmp -i net2,net6 ipmp0
ipadm create-addr -T static -a 10.221.91.2/24 ipmp0/zone1
ipadm create-addr -T static -a 10.221.91.3/24 ipmp0/zone2
the output of ipmpstat -i and ipmpstat -a is all good. I can ping the addresses from external hosts.
So, how do I now assign each address to the correct zone? I assume I'm using shared-ip?
in the zonecfg, do I simply (as per [this documentation|http://docs.oracle.com/cd/E23824_01/html/821-1460/z.admin.task-54.html#z.admin.task-60] ):
zonecfg:zone1> add net
zonecfg:zone1:net> set address=10.221.91.2
zonecfg:zone1:net> set physical=net2
zonecfg:zone1:net> end
And what if I have many addresses to configure per interface... for example, zone1 and zone2 will also require 6 addresses on another subnet (221.206.29.0)... so how would that look in the zonecfg?
Is IPMP the correct way to be doing this? The client wants resilience above all, but these network connections come out of different switches, so LACP/trunking is probably out of the question.
Many thanks for your thoughts... please let me know if you want more information
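For what it's worth, with shared-ip each extra address is just another net resource in zonecfg, one "add net" block per address. A sketch only: physical=ipmp0 assumes the addresses ride on the IPMP group, and 221.206.29.10 is a made-up address in the second subnet:

```shell
# Hypothetical zonecfg session for zone1; repeat the "add net" block per address.
zonecfg -z zone1 <<'EOF'
add net
set address=10.221.91.2
set physical=ipmp0
end
add net
set address=221.206.29.10
set physical=ipmp0
end
EOF
```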
Solaris11 is a different beast altogether.
Edited by: 913229 on 08-Feb-2012 08:03
added link to the Solaris IPMP and zones doc
Thanks for the reply....
It still didn't work... but you pointed me in the right direction. I had to remove the addresses I had configured on ipmp0 and instead put them in the zonecfg. Makes sense really. Below I have detailed my steps as per your recommendation...
I had configured the zone as minimally as I could:
zonepath=/zones/zone1
ip-type=shared
net:
address: 10.221.91.2
physical=ipmp0
but after it is installed, I try and boot it and I get:
zone 'zone1': ipmp0:2: could not bring network interface up: address in use by zone 'global: Cannot assign the requested address
So, I changed the ip-type to exclusive and I got:
WARNING: skipping network interface 'ipmp0' which is used in the global zone.
zone 'zone1': failed to add network device
which was a bit of a shame.
So, finally, I removed the addresses from ipmp0
ipadm delete-addr ipmp0/zone1
ipadm delete-addr ipmp0/zone2
and set the address in zonecfg together with the physical=ipmp0 as per your recommendation and it seems to be working.
So, am I correct in taking away from this that when using IPMP with shared-ip zones, you don't set the address in the global zone, but put it in the zone config, and everyone is happy?
I think this was the only way to achieve multiple IP addresses on one interface but spread over two ports?
Lastly, why oh why is the gateway address in netstat -rn coming up as the address of the host?
Anyway, thanks for your help.
;) -
IPMP configuration not permanent
Hello
I've configured two interfaces to be in an IPMP group and fail over to each other in case of failure; when I test it, it's fine.
/etc/hostname.bge0
bscs-bl netmask + broadcast + \
group production up \
addif bscs-bl-net0 deprecated netmask + broadcast + -failover up
::::::::::::::
/etc/hostname.nxge0
bscs-bl-net1 netmask + broadcast + group production deprecated up
The issue is that when I reboot the system, the first interface shows as not configured; the bge0 IP is displayed as 0.0.0.0.
I suspected that the file contents were not correct, but when I run the same commands in a shell, bge0 comes up configured in ifconfig -a.
thanks for your support
BR
HEBA
Hello,
There are two ways of configuring IPMP. The first, called probe-based, uses test addresses, as in your case. With the second, called link-based, you don't need test addresses.
Simply put this line in the configuration file (/etc/hostname.<interface>) of each interface that you want to be part of the IPMP group:
solaris10 group mpath1 up
Then reboot. After the reboot, test the IPMP functionality by using the command if_mpadm -d <interface> to cause a failover and if_mpadm -r <interface> to cause a failback.
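Spelled out, that failover drill looks like the following (ce0 is a hypothetical group member; run as root on the box under test):

```shell
# Detach ce0 from its IPMP group: its addresses migrate to the surviving NIC.
if_mpadm -d ce0
ifconfig -a          # the data address should now sit on the other group member
# Reattach ce0: the addresses fail back.
if_mpadm -r ce0
ifconfig -a
```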
Regards,
Rei -
Netmask and IPMP configuration question...
Hello all,
I have two questions.
Question 1.
I changed the IP address on a Solaris 10 server.
The IP address was 122.232.243.122 and the netmask was 255.255.255.0.
The /etc/netmasks file had "122.232.243.122 255.255.255.0" and it worked fine.
Today I got a new IP address and netmask: 122.232.233.122 and 255.255.255.240.
Now, what should I put in the /etc/netmasks file?
I updated it with "122.242.244.0 255.255.255.240" but it does not work.
Question 2.
IPMP needs 3 IP addresses (primary + secondary + virtual).
Usually, the hostname maps to the primary IP address.
For example
########## /etc/hosts ############
127.0.0.1 localhost
122.232.244.122 sis01 sis01. loghost
122.232.244.123 sis02
122.232.244.124 sys03
########## /etc/hostname.ce0 #######
sis01 group production netmask + broadcast + up
addif sis03 netmask + broadcast + deprecated -failover up
########## /etc/hostname.ce1 ########
sys02 netmask + broadcast + group production deprecated -failover standby up
In this case, the hostname is sis01, the primary IP ends in 122, the secondary in 124, and the virtual in 123.
But what I want is: hostname sis01, primary IP ending in 123, secondary in 124, and virtual in 122.
Is it possible?
sun929 wrote:
I changed ip address on Solaris 10 server.
The ip address was 122.232.243.122 and netmask was 255.255.255.0.
the /etc/netmasks file was "122.232.243.122 255.255.255.0" and it worked fine.
I would have expected:
122.232.243.0 255.255.255.0
122.232.243.0 is the first address in the /24 that contains 122.232.243.122.
Today. I got new ip address and netmask that is 122.232.233.122 and 255.255.255.240.
Now, what should I put in the /etc/netmasks file?
I updated it with "122.242.244.0 255.255.255.240" but it does not work.
122.232.233.122 is not in the /28 that starts at 122.242.244.0. The first address of the /28 it is actually in is 122.232.233.112. So:
122.232.233.112 255.255.255.240
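The arithmetic behind that entry can be checked mechanically: AND each octet of the address with the corresponding mask octet to get the network address that /etc/netmasks wants. A throwaway shell helper for illustration (network_of is made up, not a Solaris tool):

```shell
# network_of ADDR MASK -> prints the network address for the netmasks entry.
network_of() {
  IFS=. read -r a1 a2 a3 a4 <<EOF
$1
EOF
  IFS=. read -r m1 m2 m3 m4 <<EOF
$2
EOF
  # Bitwise-AND octet by octet.
  printf '%d.%d.%d.%d\n' $((a1 & m1)) $((a2 & m2)) $((a3 & m3)) $((a4 & m4))
}

network_of 122.232.243.122 255.255.255.0     # -> 122.232.243.0   (the old /24 entry)
network_of 122.232.233.122 255.255.255.240   # -> 122.232.233.112 (the new /28 entry)
```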
Question 2.
IPMP needs 3 IP addresses (primary + secondary + virtual).
Usually, the hostname maps to the primary IP address.
For example
########## /etc/hosts ############
127.0.0.1 localhost
122.232.244.122 sis01 sis01. loghost
122.232.244.123 sis02
122.232.244.124 sys03
########## /etc/hostname.ce0 #######
sis01 group production netmask + broadcast + up
addif sis03 netmask + broadcast + deprecated -failover up
########## /etc/hostname.ce1 ########
sys02 netmask + broadcast + group production deprecated -failover standby up
In this case, the hostname is sis01, the primary IP ends in 122, the secondary in 124, and the virtual in 123.
But what I want is: hostname sis01, primary IP ending in 123, secondary in 124, and virtual in 122.
Is it possible?
I believe so. I think it was even recommended back when a bug caused the current method to have problems. I'm not sure whether doing so causes any problems.
Darren -
Configuring IPMP with several zones
I am trying to configure IPMP with 2 zones on Solaris 10, but it seems that an interface has failed.
ce1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,Failed> mtu 1500 index 3
inet 10.177.6.91 netmask ffffff00 broadcast 10.177.6.255
groupname zone1
ether 0:14:4f:24:87:c1
ce1:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
zone v08k39-zone1
inet 10.177.6.90 netmask ffffff00 broadcast 10.177.6.255
ce5: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 6
inet 10.177.6.92 netmask ffffff00 broadcast 10.177.6.255
groupname zone1
ether 0:14:4f:24:87:c5
root@sys # cat hostname.ce1
sys-zone1-test1 deprecated -failover netmask + broadcast + group zone1 up
root@sys # cat hostname.ce5
sys-zone1-test2 deprecated -failover netmask + broadcast + group zone1 up
Is the IPMP configuration correct, or can I do away with link checking...
TIA
177499-02 wrote:
Hi, can you tell me how to configure IPMP on Solaris 10?
Following are the steps which I have followed.
root@sun4401:/> cat /etc/hostname.ce0
sun4401 netmask + broadcast + group ipmp0 up
root@sun4401:/> cat /etc/hostname.ce1
group ipmp0 up
The hosts file entries are also proper; I don't know where the problem is.
And after that I rebooted.
Thanks
gopal
You're missing the second configuration line on your "primary" interface:
addif IPMP-hostname netmask + broadcast + up -
Traceroute does not work properly with IPMP
I know this question has been asked many times, but I didn't find any solution for it. Traceroute does not work as intended with IPMP configured.
I have the following interfaces configured on my system:
nxge2: flags=201000842<BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 4
inet 0.0.0.0 netmask 0
groupname test
ether 0:14:4f:c0:1c:a
nxge2:1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 4
inet 10.63.20.30 netmask fffffe00 broadcast 10.63.21.255
nxge2:2: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 4
inet 10.63.20.29 netmask fffffe00 broadcast 10.63.21.255
nxge3: flags=201000842<BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 5
inet 0.0.0.0 netmask 0
groupname test
ether 0:14:4f:c0:1c:b
Now if I do:
#traceroute 10.63.20.29
traceroute: Warning: Multiple interfaces found; using 10.63.20.30 @ nxge2:1
traceroute to 10.63.20.29 (10.63.20.29), 30 hops max, 40 byte packets
1 10.63.20.29 (10.63.20.29) 0.124 ms 0.036 ms 0.033 ms
it uses the nxge2:1 interface instead of nxge2:2, and if I disable IPMP on nxge2 it works as expected:
# traceroute 10.63.20.29
traceroute: Warning: Multiple interfaces found; using 10.63.20.29 @ nxge2:2
traceroute to 10.63.20.29 (10.63.20.29), 30 hops max, 40 byte packets
1 10.63.20.29 (10.63.20.29) 0.120 ms 0.034 ms 0.030 ms
Why is it so, and how can I make it work with IPMP? -
Oracle 10g CRS autorecovery from network failures - Solaris with IPMP
Hi all,
Just wondering if anyone has experience with a setup similar to mine. Let me first apologise for the lengthy introduction that follows >.<
A quick run-down of my implementation: Sun SPARC Solaris 10, Oracle CRS, ASM and RAC database patched to version 10.2.0.4 respectively, no third-party cluster software used for a 2-node cluster. Additionally, the SAN storage is attached directly with fiber cable to both servers, and the CRS files (OCR, voting disks) are always visible to the servers, there is no switch/hub between the server and the storage. There is IPMP configured for both the public and interconnect network devices. When performing the usual failover tests for IPMP, both the OS logs and the CRS logs show a failure detected, and a failover to the surviving network interface (on both the public and the private network devices).
For the private interconnect, when both of the network devices are disabled (by manually disconnecting the network cables), this results in the 2nd node rebooting, and the CRS process starting, but unable to synchronize with the 1st node (which is running fine the whole time). Further, when I look at the CRS logs, it is able to correctly identify all the OCR files and voting disks. When the network connectivity is restored, both the OS and CRS logs reflect this connection has been repaired. However, the CRS logs at this point still state that node 1 (which is running fine) is down, and the 2nd node attempts to join the cluster as the master node. When I manually run the 'crsctl stop crs' and 'crsctl start crs' commands, this results in a message stating that the node is going to be rebooted to ensure cluster integrity, and the 2nd node reboots, starts the CRS daemons again at startup, and joins the cluster normally.
For the public network, when the 2nd node is manually disconnected, the VIP is seen to not failover, and any attempts to connect to this node via the VIP result in a timeout. When connectivity is restored, as expected the OS and CRS logs acknowledge the recovery, and the VIP for node 2 automatically fails over, but the listener goes down as well. Using the 'srvctl start listener' command brings it up again, and everything is fine. During this whole process, the database instance runs fine on both nodes.
From the case studies above, I can see that the network failures are detected by the Oracle Clusterware, and a simple command run once this failure is repaired restores full functionality to the RAC database. However, is there anyway to automate this recovery, for the 2 cases stated above, so that there is no need for manual intervention by the DBAs? I was able to test case 2 (public network) with the Oracle document 805969.1 (VIP does not relocate back to the original node after public network problem is resolved), is there a similar workaround for the interconnect?
Any and all pointers would be appreciated, and again, sorry for the lengthy post.
Edited by: NS Selvam on 16-Dec-2009 20:36
changed some minor typos
hi
I have given the shell script. I just run it, and I usually get output like:
[root@rac-1 Desktop]# sh iscsi-corntab.sh
Logging in to [iface: default, target: iqn.2010-02-23.de.sayantan-chakraborty:storage.disk1.amiens.sys1.xyz, portal: 192.168.181.10,3260]
Login to [iface: default, target: iqn.2010-02-23.de.sayantan-chakraborty:storage.disk1.amiens.sys1.xyz, portal: 192.168.181.10,3260]: successful
The script contains:
iscsiadm -m node -T iqn.2010-02-23.de.sayantan-chakraborty:storage.disk1.amiens.sys1.xyz -p 192.168.181.10 -l
iscsiadm -m node -T iqn.2010-02-23.de.sayantan-chakraborty:storage.disk1.amiens.sys1.xyz -p 192.168.181.10 --op update -n node.startup -v automatic
(cd /dev/disk/by-path; ls -l *sayantan-chakraborty* | awk '{FS=" "; print $9 " " $10 " " $11}')
[root@rac-1 Desktop]# (cd /dev/disk/by-path; ls -l *sayantan-chakraborty* | awk '{FS=" "; print $9 " " $10 " " $11}')
ip-192.168.181.10:3260-iscsi-iqn.2010-02-23.de.sayantan-chakraborty:storage.disk1.amiens.sys1.xyz-lun-1 -> ../../sdc
[root@rac-1 Desktop]#
Can you post the output of ls /dev/iscsi? You may get something like this:
[root@rac-1 Desktop]# ls /dev/iscsi
xyz
[root@rac-1 Desktop]# -
I have one question about IPMP under Solaris 9 9/04 SPARC 64-bit
My OS: Solaris 9 9/04 with EIS 3.1.1 patches
Clusterware: Sun Cluster 3.1u4 with EIS 3.1.1 patches
My IPMP group contains two NICs: ce0 & ce3.
The two NICs are linked to a Cisco 4506.
The IPMP configuration files are as follows:
/etc/hostname.ce0
lamp-test2 netmask + broadcast + group ipmp1 deprecated -failover up
/etc/hostname.ce3
lamp netmask + broadcast + group ipmp1 up \
addif lamp-test1 netmask + broadcast + deprecated -failover up
I am always using the default in.mpathd configuration file.
But once I pull out a NIC's cable, my IPMP group complains:
Mar 18 18:06:34 lamp in.mpathd[2770]: [ID 215189 daemon.error] The link has gone down on ce0
Mar 18 18:06:34 lamp in.mpathd[2770]: [ID 594170 daemon.error] NIC failure detected on ce0 of group ipmp1
Mar 18 18:06:34 lamp ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '00:03:ba:b0:5d:54' trying to be our address 192.168.217.020!
Mar 18 18:06:34 lamp in.mpathd[2770]: [ID 832587 daemon.error] Successfully failed over from NIC ge0 to NIC ce0
Mar 18 18:06:34 lamp ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '00:03:ba:b0:5d:54' trying to be our address 192.168.217.020!
Why does the Solaris OS report a hardware address conflict?
I'm sure these IPMP configuration files work fine with a Cisco 2950 and a D-Link mini switch.
By the way, there are no duplicate MACs in the LAN.
Should I modify some Cisco parameters?
Your advice is much appreciated!!!
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
ce0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
inet 192.168.217.6 netmask ffffff00 broadcast 192.168.217.255
groupname ipmp1
ether 0:3:ba:b0:5d:54
ce3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
inet 192.168.217.20 netmask ffffff00 broadcast 192.168.217.255
groupname ipmp1
ether 0:3:ba:95:5d:6e
ce3:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 4
inet 192.168.217.4 netmask ffffff00 broadcast 192.168.217.255
Generally speaking:
When I switch the float IP from ce0 to ce3, IPMP says the ce0 MAC is "trying to be our address ....", then the ce0 test IP fails, and the float IP doesn't fail over.
When I switch the float IP from ce3 to ce0, IPMP says the ce3 MAC is "trying to be our address ....", then the ce3 test IP fails, and the float IP doesn't fail over.
In my view, the float NIC's MAC and address information may be cached in the Cisco device's RAM and not released in time, I think. -
SUN Cluster 3.2, Solaris 10, Corrupted IPMP group on one node.
Hello folks,
I recently made a network change on nodename2 to add some resilience to IPMP (adding a second interface but still using a single IP address).
After a reboot, I cannot keep this host from rebooting. For the one minute that it stays up, I get the following result from scstat, which seems to suggest a problem with the IPMP configuration. I rolled back my IPMP change, but it still doesn't register the IPMP group in scstat.
nodename2|/#scstat
-- Cluster Nodes --
Node name Status
Cluster node: nodename1 Online
Cluster node: nodename2 Online
-- Cluster Transport Paths --
Endpoint Endpoint Status
Transport path: nodename1:bge3 nodename2:bge3 Path online
-- Quorum Summary from latest node reconfiguration --
Quorum votes possible: 3
Quorum votes needed: 2
Quorum votes present: 3
-- Quorum Votes by Node (current status) --
Node Name Present Possible Status
Node votes: nodename1 1 1 Online
Node votes: nodename2 1 1 Online
-- Quorum Votes by Device (current status) --
Device Name Present Possible Status
Device votes: /dev/did/rdsk/d3s2 0 1 Offline
-- Device Group Servers --
Device Group Primary Secondary
Device group servers: jms-ds nodename1 nodename2
-- Device Group Status --
Device Group Status
Device group status: jms-ds Online
-- Multi-owner Device Groups --
Device Group Online Status
-- IPMP Groups --
Node Name Group Status Adapter Status
scstat: unexpected error.
I did manage to run scstat on nodename1 while nodename2 was still up between reboots, here is that result (it does not show any IPMP group(s) on nodename2)
nodename1|/#scstat
-- Cluster Nodes --
Node name Status
Cluster node: nodename1 Online
Cluster node: nodename2 Online
-- Cluster Transport Paths --
Endpoint Endpoint Status
Transport path: nodename1:bge3 nodename2:bge3 faulted
-- Quorum Summary from latest node reconfiguration --
Quorum votes possible: 3
Quorum votes needed: 2
Quorum votes present: 3
-- Quorum Votes by Node (current status) --
Node Name Present Possible Status
Node votes: nodename1 1 1 Online
Node votes: nodename2 1 1 Online
-- Quorum Votes by Device (current status) --
Device Name Present Possible Status
Device votes: /dev/did/rdsk/d3s2 1 1 Online
-- Device Group Servers --
Device Group Primary Secondary
Device group servers: jms-ds nodename1 -
-- Device Group Status --
Device Group Status
Device group status: jms-ds Degraded
-- Multi-owner Device Groups --
Device Group Online Status
-- IPMP Groups --
Node Name Group Status Adapter Status
IPMP Group: nodename1 sc_ipmp1 Online bge2 Online
IPMP Group: nodename1 sc_ipmp0 Online bge0 Online
-- IPMP Groups in Zones --
Zone Name Group Status Adapter Status
I believe that I should be able to delete the IPMP group for the second node from the cluster and re-add it, but I'm not sure how to go about doing it. I welcome your comments or thoughts on what I can try before rebuilding this node from scratch.
-AG
I was able to restart both sides of the cluster. Now both sides are online, but neither side can access the shared disk.
Lots of warnings. I will keep poking....
Rebooting with command: boot
Boot device: /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/disk@0,0:a File and args:
SunOS Release 5.10 Version Generic_141444-09 64-bit
Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hardware watchdog enabled
Hostname: nodename2
Jul 21 10:00:16 in.mpathd[221]: No test address configured on interface ce3; disabling probe-based failure detection on it
Jul 21 10:00:16 in.mpathd[221]: No test address configured on interface bge0; disabling probe-based failure detection on it
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],0:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],0:c,raw".
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],1:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],1:c,raw".
Booting as part of a cluster
NOTICE: CMM: Node nodename1 (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node nodename2 (nodeid = 2) with votecount = 1 added.
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
NOTICE: clcomm: Adapter bge3 constructed
NOTICE: CMM: Node nodename2: attempting to join cluster.
NOTICE: CMM: Node nodename1 (nodeid: 1, incarnation #: 1279727883) has become reachable.
NOTICE: clcomm: Path nodename2:bge3 - nodename1:bge3 online
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
NOTICE: CMM: Cluster has reached quorum.
NOTICE: CMM: Node nodename1 (nodeid = 1) is up; new incarnation number = 1279727883.
NOTICE: CMM: Node nodename2 (nodeid = 2) is up; new incarnation number = 1279728026.
NOTICE: CMM: Cluster members: nodename1 nodename2.
NOTICE: CMM: node reconfiguration #3 completed.
NOTICE: CMM: Node nodename2: joined cluster.
NOTICE: CCR: Waiting for repository synchronization to finish.
WARNING: CCR: Invalid CCR table : dcs_service_9 cluster global.
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
==> WARNING: DCS: Error looking up services table
==> WARNING: DCS: Error initializing service 9 from file
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],0:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],0:c,raw".
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],1:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],1:c,raw".
/dev/md/rdsk/d22 is clean
Reading ZFS config: done.
NOTICE: iscsi session(6) iqn.1994-12.com.promise.iscsiarray2 online
nodename2 console login: obtaining access to all attached disks
starting NetWorker daemons:
Rebooting with command: boot
Boot device: /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/disk@0,0:a File and args:
SunOS Release 5.10 Version Generic_141444-09 64-bit
Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hardware watchdog enabled
Hostname: nodename1
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],0:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],0:c,raw".
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],1:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],1:c,raw".
Booting as part of a cluster
NOTICE: CMM: Node nodename1 (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node nodename2 (nodeid = 2) with votecount = 1 added.
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
NOTICE: clcomm: Adapter bge3 constructed
NOTICE: CMM: Node nodename1: attempting to join cluster.
NOTICE: bge3: link up 1000Mbps Full-Duplex
NOTICE: clcomm: Path nodename1:bge3 - nodename2:bge3 errors during initiation
WARNING: Path nodename1:bge3 - nodename2:bge3 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
NOTICE: bge3: link down
NOTICE: bge3: link up 1000Mbps Full-Duplex
NOTICE: CMM: Node nodename2 (nodeid: 2, incarnation #: 1279728026) has become reachable.
NOTICE: clcomm: Path nodename1:bge3 - nodename2:bge3 online
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
NOTICE: CMM: Cluster has reached quorum.
NOTICE: CMM: Node nodename1 (nodeid = 1) is up; new incarnation number = 1279727883.
NOTICE: CMM: Node nodename2 (nodeid = 2) is up; new incarnation number = 1279728026.
NOTICE: CMM: Cluster members: nodename1 nodename2.
NOTICE: CMM: node reconfiguration #3 completed.
NOTICE: CMM: Node nodename1: joined cluster.
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],0:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],0:c,raw".
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],1:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],1:c,raw".
/dev/md/rdsk/d26 is clean
Reading ZFS config: done.
NOTICE: iscsi session(6) iqn.1994-12.com.promise.iscsiarray2 online
nodename1 console login: obtaining access to all attached disks
starting NetWorker daemons:
nsrexecd
mount: /dev/md/jms-ds/dsk/d100 is already mounted or /opt/esbshares is busy -
A V890 machine is connected to Cisco switches, which link to a Nortel firewall. After configuring IPMP, a PC can telnet and FTP to the Sun machine without any problem, but the host is showing some errors in its messages file. The IPMP configuration was done the same way on all the Sun machines.
V890 log :
May 12 04:33:30 xyz in.mpathd[38]: [ID 168056 daemon.error] All Interfaces in group test3 have failed
May 12 04:33:57 xyz in.mpathd[38]: [ID 237757 daemon.error] At least 1 interface (eri0) of group test3 has repaired
May 12 04:33:57 xyz in.mpathd[38]: [ID 299542 daemon.error] NIC repair detected on eri0 of group test3
May 12 04:33:58 xyz in.mpathd[38]: [ID 299542 daemon.error] NIC repair detected on ce0 of group test3
May 12 04:33:58 xyz in.mpathd[38]: [ID 620804 daemon.error] Successfully failed back to NIC ce0
His network card configuration is as follows:
xyz@node-1 # ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
ce0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 192.168.1.207 netmask ffffff00 broadcast 192.168.1.255
groupname test3
ether 0:3:ba:da:9:9b
ce0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
inet 192.168.1.123 netmask ffffff00 broadcast 192.168.1.255
eri0: flags=69040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 1500 index 3
inet 192.168.1.124 netmask ffffff00 broadcast 192.168.1.255
groupname test3
ether 0:3:ba:ce:91:4d
xyz@node-1 #
The other Sun servers show no errors...
Has anyone encountered a situation like this and can help us? Thank you in advance.
Regards
Britto Sidhan
Maybe:
1. You have a script calling ndd that changes the NIC speed.
2. Your Cisco switch is changing the speed.
3. Someone is changing the NIC hardware configuration during the day.
4. Someone is testing IPMP by removing and reinserting the network cables.
What is exactly the contents of /etc/hostname.* files?
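For comparison, a typical Solaris 10 probe-based active/standby pair looks something like this (the hostnames, interface names, and group name here are only examples, and the # lines are annotations rather than part of the files):

```shell
# /etc/hostname.ce0 -- primary: deprecated test address, then the data address
myhost-test-ce0 netmask + broadcast + deprecated -failover group test3 up
addif myhost netmask + broadcast + up

# /etc/hostname.eri0 -- standby: test address only, marked standby
myhost-test-eri0 netmask + broadcast + deprecated -failover standby group test3 up
```

If your files differ substantially from this shape, that would be the first place to look.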
Cheers,
Andreas
-
Solaris 10 IPMP and NetApp NFS v4 ACL
Okay, here is my issue. I have one T5220 with a failed NIC. IPMP is set up active-standby and the NIC fails over on boot. I can reach the system through that interface and send traffic out over the failed-over NIC (ssh to another server and run last, and I see the 10.255.249.196 address). However, the NFS ACL I have limits access to the shared IP address of the IPMP group (10.255.249.196), which is what the NFS server should see. Yet it appears the NFS server is seeing the test IP (10.255.249.197) of the NIC that was failed over to. I added 10.255.249.197 to the NFS ACL and all is fine. ifconfig output:
e1000g1: flags=219040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED,CoS> mtu 1500 index 3
inet 10.255.249.198 netmask ffffff00 broadcast 10.255.249.255
groupname prvdmz
ether 0:21:28:24:3:1f
nxge1: flags=209040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,CoS> mtu 1500 index 6
inet 10.255.249.197 netmask ffffff00 broadcast 10.255.249.255
groupname prvdmz
ether 0:21:28:d:a4:6f
nxge1:1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 6
inet 10.255.249.196 netmask ffffff00 broadcast 10.255.249.255
netstat -rn output:
10.255.249.0 10.255.249.196 U 1 1333 nxge1:1
10.255.249.0 10.255.249.197 U 1 0 e1000g1
10.255.249.0 10.255.249.197 U 1 0 nxge1
DNS sets the host name of the system to 10.255.249.196. But if I leave the ACL as is, with the one IP address, and wait about 10 minutes after a boot, then I am able to mount the NFS share with the ACL containing only 10.255.249.196.
Here are my /etc/hosts and hostname.<interface> files.
bash-3.00# cat /etc/hosts
# Internet host table
::1 localhost
127.0.0.1 localhost
10.255.249.196 mymcresprod2-pv
10.255.249.197 mymcresprod2-pv_nxge1
10.255.249.198 mymcresprod2-pv_e1000g1
bash-3.00# cat /etc/hostname.e1000g1
mymcresprod2-pv_e1000g1 netmask 255.255.255.0 broadcast + deprecated -failover up group prvdmz
addif mymcresprod2-pv netmask 255.255.255.0 broadcast + up
bash-3.00# cat /etc/hostname.nxge1
mymcresprod2-pv_nxge1 netmask 255.255.255.0 broadcast + deprecated -failover up group prvdmz
bash-3.00# more /etc/default/mpathd
#pragma ident "@(#)mpathd.dfl 1.2 00/07/17 SMI"
# Time taken by mpathd to detect a NIC failure in ms. The minimum time
# that can be specified is 100 ms.
FAILURE_DETECTION_TIME=10000
# Failback is enabled by default. To disable failback turn off this option
FAILBACK=yes
# By default only interfaces configured as part of multipathing groups
# are tracked. Turn off this option to track all network interfaces
# on the system
TRACK_INTERFACES_ONLY_WITH_GROUPS=yes
I think the IPMP configuration is fine, but I could be wrong. Any ideas on this? I could add the test IP address to the ACL if need be, but that just seems to be a band-aid. Or am I completely nuts and it should work this way?
Thanks,
Ben
Following up on my post...
The moment I started to add more NFS shares, logging in slowed down dramatically.
The only way out was to fully open a hole on the server for every client...
I was able to lock the Linux server down somewhat to fixed ports and open up only those (111, 2049, 656, 32766-32769), but on the Solaris server I can't seem to figure this out...
Anyone?
TIA...
-
Does SunMC monitor IPMP configurations at all? I am asking about servers that have IPMP set up but do not have Sun Cluster installed. I ask this thinking, naively, that the Sun Cluster SunMC module monitors the IPMP config for cluster nodes.
Just an update on this in case anyone is interested.
When the agent core dumps, it has by default a core-file rlimit of zero, which is pretty useless if you ask me. To get a core dump, you have to follow these steps.
In a Bourne shell:
stop your agent
# ESDEBUG=true
# export ESDEBUG
then start your agent.
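Put together, the sequence looks something like this (the es-stop/es-start paths are from a default SunMC install and may differ on your system; the ulimit line is what lifts the zero core-file limit mentioned above):

```shell
# Bourne shell, as root: stop the agent, enable debug, allow cores, restart
/opt/SUNWsymon/sbin/es-stop -a
ESDEBUG=true
export ESDEBUG
ulimit -c unlimited
/opt/SUNWsymon/sbin/es-start -a
```

This is only a sketch of the steps above, not a tested procedure.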
This works, as I finally got a core dump after many frustrating months. When doing a pstack on the core, I found some typical libc calls and then a number of ut calls, which suggests that the SunRay software is causing the core dump. My systems are large and I normally have 80 to 100 people on each one, so perhaps there is a limit hit when the agent does something like utwho -c.
So this is now going to Sun and hopefully I will get some resolution on this.
Regards
Stephen
-
Hello everyone. I have a dedicated network with two servers running Solaris 9, with SunRay server software installed on both. Both servers are connected by two NICs with IPMP configured; there are no other devices in the network. My trouble is this: if either server goes down (reboot, halt, etc.), the other server stops answering network requests, with this message in its log: 'in.mpathd[236]: [ID 168056 daemon.error] All Interfaces in group test have failed'. Everything returns to normal when the rebooted server is back online. I get these log messages:
Jul 4 11:59:10 sf210-contourK in.mpathd[236]: [ID 299542 daemon.error] NIC repair detected on bge3 of group test
Jul 4 11:59:10 sf210-contourK in.mpathd[236]: [ID 237757 daemon.error] At least 1 interface (bge3) of group test has repaired
Jul 4 11:59:10 sf210-contourK in.mpathd[236]: [ID 620804 daemon.error] Successfully failed back to NIC bge2
Jul 4 11:59:10 sf210-contourK in.mpathd[236]: [ID 299542 daemon.error] NIC repair detected on bge2 of group test
snoop shows that each server is probing the opposite server's test addresses:
(10.99.0.1-2-3 is one server, 10.99.10.1-2-3 the second server)
10.99.0.2 -> 10.99.10.3 ICMP Echo request (ID: 60421 Sequence number: 5594) 10.99.10.3 -> 10.99.0.2 ICMP Echo reply (ID: 60421 Sequence number: 5594)
10.99.10.1 -> 10.99.0.3 ICMP Echo request (ID: 65027 Sequence number: 5595) 10.99.0.3 -> 10.99.10.1 ICMP Echo reply (ID: 65027 Sequence number: 5595)
10.99.0.1 -> 10.99.10.3 ICMP Echo request (ID: 60420 Sequence number: 5595) 10.99.10.3 -> 10.99.0.1 ICMP Echo reply (ID: 60420 Sequence number: 5595)
It looks like this is the reason for the total network failure. Is there any solution for a problem like this?
Hello,
IPMP can be configured in two different modes:
- probe-based, using ICMP ping requests to detect failure
- link-based, using the interface link state (MII)
It seems that you are using the first mode. Unfortunately, since you have only two nodes in your network, the ICMP probes will fail as soon as one server goes down.
Adding a third server/station is one solution.
The other is to use the link-based (MII) mode: in that case, the status is based on physical link connectivity; for example, if your link is connected to a switch, a link failure is detected when the cable is unplugged.
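For what it's worth, on Solaris 10 the link-based mode is selected simply by configuring the group without any test addresses; with no -failover addresses present, in.mpathd relies on the link state alone. A minimal sketch (interface names, hostname, and group name are examples):

```shell
# /etc/hostname.bge2 -- data address, no test address
myhost netmask + broadcast + group test up

# /etc/hostname.bge3 -- second interface in the group, no address required
group test up
```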
Hope this helps.