Cluster Interconnect dropped packets.

Hi,
We have a 4-node RAC cluster (10.2.0.3) that is seeing some reboot issues that appear to be network related. The network statistics show dropped packets on the interconnect (bond1, eth2). Is this normal behavior due to using UDP?
$ netstat -i
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0 1500 0 387000915 0 0 0 377153910 0 0 0 BMmRU
bond1 1500 0 942586399 0 2450416 0 884471536 0 0 0 BMmRU
eth0 1500 0 386954905 0 0 0 377153910 0 0 0 BMsRU
eth1 1500 0 46010 0 0 0 0 0 0 0 BMsRU
eth2 1500 0 942583215 0 2450416 0 884471536 0 0 0 BMsRU
eth3 1500 0 3184 0 0 0 0 0 0 0 BMsRU
lo 16436 0 1048410 0 0 0 1048410 0 0 0 LRU
Thanks

Hi,
To diagnose the reboot issues, refer to *Troubleshooting 10g and 11.1 Clusterware Reboots [ID 265769.1]*.
Also monitor your lost blocks: *gc lost blocks diagnostics [ID 563566.1]*.
I had an issue which turned out to be network card related (gc lost blocks): http://www.asanga-pradeep.blogspot.com/2011/05/gathering-stats-for-gc-lost-blocks.html
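A quick way to tell whether the RX-DRP counter is still climbing (rather than being an old accumulated figure) is to sample it twice; a minimal sketch, assuming eth2 as the interconnect NIC and a 60-second window, with a gv$sysstat query for the matching "gc ... lost" statistics on the database side:
IF=eth2                                                          # interconnect slave NIC from the netstat output above
before=$(netstat -i | awk -v ifc="$IF" '$1 == ifc {print $6}')   # RX-DRP column
sleep 60
after=$(netstat -i | awk -v ifc="$IF" '$1 == ifc {print $6}')
echo "RX-DRP increase on $IF over 60s: $((after - before))"
sqlplus -s / as sysdba <<'EOF'
select inst_id, name, value from gv$sysstat where name like 'gc%lost%';
EOF
If the delta keeps growing while the database reports lost blocks, the problem is live rather than historical.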

Similar Messages

  • Aggregates, VLANs, Jumbo Frames and cluster interconnect opinions

    Hi All,
    I'm reviewing my options for a new cluster configuration and would like the opinions of people with more expertise than myself out there.
    What I have in mind is as follows:
    2 x X4170 servers with 8 NICs in each.
    On each X4170 I was going to configure 2 aggregates with 3 NICs in each aggregate, as follows:
    igb0 device in aggr1
    igb1 device in aggr1
    igb2 device in aggr1
    igb3 stand-alone device for iSCSI network
    e1000g0 device in aggr2
    e1000g1 device in aggr2
    e1000g2 device in aggr2
    e1000g3 stand-alone device for the iSCSI network
    Now, on top of these aggregates, I was planning on creating VLAN interfaces which will allow me to connect to our two "public" network segments and to carry the cluster heartbeat network.
    I was then going to configure the VLANs in an IPMP group for failover. I know there are some questions around that configuration, in the sense that IPMP will not detect a failure if a NIC goes offline within the aggregate, but I could monitor that in a different manner.
    At this point, my questions are:
    [1] Are VLANs, on top of aggregates, supported within Solaris Cluster? I've not seen anything in the documentation to say that they are, or are not for that matter. I do see that VLANs are supported, including support for cluster interconnects over VLANs.
    Now, with the standalone interfaces I want to enable jumbo frames, but I've noticed that the igb.conf file has a global setting for all NIC ports, whereas I can enable it for a single NIC port in the e1000g.conf kernel driver file. My questions are as follows:
    [2] What is the general feeling about mixing MTU sizes on the same LAN/VLAN? I've seen some comments that it is not a good idea, and some say that it doesn't cause a problem.
    [3] If the underlying NICs, igb0-2 (aggr1) for example, have a 9k MTU enabled, I can force the MTU size (1500) for "normal" networks on the VLAN interfaces pointing to my "public" network and cluster interconnect VLAN. Does anyone have experience of this causing any issues?
    Thanks in advance for all comments/suggestions.

    For 1), the question is really "Do I need to enable jumbo frames if I don't want to use them (neither on the public nor on the private network)?" - the answer is no.
    For 2), each cluster needs to have its own separate set of VLANs.
    Greets
    Thorsten
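    A minimal sketch of the Solaris 10 commands this maps to (the device names, the aggregation key, VLAN ID 100 and the test address are assumptions, not a verified configuration):
    # build aggr1 from the three igb ports (key 1 -> interface aggr1)
    dladm create-aggr -d igb0 -d igb1 -d igb2 1
    # a VLAN on top of an aggregation uses the encoded PPA (VID*1000 + key),
    # so VLAN 100 over aggr1 is plumbed as aggr100001
    ifconfig aggr100001 plumb 192.168.100.11 netmask 255.255.255.0 up
    # jumbo frames for the standalone e1000g3 only can be set per instance via
    # MaxFrameSize in /kernel/drv/e1000g.conf, as noted in the question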

  • IPFC (ip over fc) cluster interconnect

    Hello!
    Is it possible to create a cluster interconnect with the IPFC (IP over FC) driver (for example, as a reserve channel)?
    What problems may arise?

    Hi,
    technically Sun Cluster works fine with only a single interconnect, but that used to be unsupported. The mandatory requirement to have 2 dedicated interconnects was lifted a couple of months ago, although it is still a best practice and a recommendation to use 2 independent interconnects.
    The possible consequences of only having one NIC port have been mentioned in the previous post.
    Regards
    Hartmut

  • LDOM SUN Cluster Interconnect failure

    I am building a test Sun Cluster on Solaris 10 under LDoms 1.3.
    In my environment I have a T5120. I have set up two guest OS domains, installed the Sun Cluster software, and when I executed scinstall it failed.
    Node 2 comes up, but node 1 throws the following messages:
    Boot device: /virtual-devices@100/channel-devices@200/disk@0:a File and args:
    SunOS Release 5.10 Version Generic_139555-08 64-bit
    Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    Hostname: test1
    Configuring devices.
    Loading smf(5) service descriptions: 37/37
    /usr/cluster/bin/scdidadm: Could not load DID instance list.
    /usr/cluster/bin/scdidadm: Cannot open /etc/cluster/ccr/did_instances.
    Booting as part of a cluster
    NOTICE: CMM: Node test2 (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node test1 (nodeid = 2) with votecount = 0 added.
    NOTICE: clcomm: Adapter vnet2 constructed
    NOTICE: clcomm: Adapter vnet1 constructed
    NOTICE: CMM: Node test1: attempting to join cluster.
    NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
    NOTICE: clcomm: Path test1:vnet1 - test2:vnet1 errors during initiation
    NOTICE: clcomm: Path test1:vnet2 - test2:vnet2 errors during initiation
    WARNING: Path test1:vnet1 - test2:vnet1 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    WARNING: Path test1:vnet2 - test2:vnet2 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    clcomm: Path test1:vnet2 - test2:vnet2 errors during initiation
    I created the virtual switches and vnets on the primary domain like this:
    ldm add-vsw mode=sc cluster-vsw0 primary
    ldm add-vsw mode=sc cluster-vsw1 primary
    ldm add-vnet vnet2 cluster-vsw0 test1
    ldm add-vnet vnet3 cluster-vsw1 test1
    ldm add-vnet vnet2 cluster-vsw0 test2
    ldm add-vnet vnet3 cluster-vsw1 test2
    Primary domain:
    bash-3.00# dladm show-dev
    vsw0 link: up speed: 1000 Mbps duplex: full
    vsw1 link: up speed: 0 Mbps duplex: unknown
    vsw2 link: up speed: 0 Mbps duplex: unknown
    e1000g0 link: up speed: 1000 Mbps duplex: full
    e1000g1 link: down speed: 0 Mbps duplex: half
    e1000g2 link: down speed: 0 Mbps duplex: half
    e1000g3 link: up speed: 1000 Mbps duplex: full
    bash-3.00# dladm show-link
    vsw0 type: non-vlan mtu: 1500 device: vsw0
    vsw1 type: non-vlan mtu: 1500 device: vsw1
    vsw2 type: non-vlan mtu: 1500 device: vsw2
    e1000g0 type: non-vlan mtu: 1500 device: e1000g0
    e1000g1 type: non-vlan mtu: 1500 device: e1000g1
    e1000g2 type: non-vlan mtu: 1500 device: e1000g2
    e1000g3 type: non-vlan mtu: 1500 device: e1000g3
    bash-3.00#
    Node 1:
    -bash-3.00# dladm show-link
    vnet0 type: non-vlan mtu: 1500 device: vnet0
    vnet1 type: non-vlan mtu: 1500 device: vnet1
    vnet2 type: non-vlan mtu: 1500 device: vnet2
    -bash-3.00# dladm show-dev
    vnet0 link: unknown speed: 0 Mbps duplex: unknown
    vnet1 link: unknown speed: 0 Mbps duplex: unknown
    vnet2 link: unknown speed: 0 Mbps duplex: unknown
    -bash-3.00#
    Node 2:
    -bash-3.00# dladm show-link
    vnet0 type: non-vlan mtu: 1500 device: vnet0
    vnet1 type: non-vlan mtu: 1500 device: vnet1
    vnet2 type: non-vlan mtu: 1500 device: vnet2
    -bash-3.00#
    -bash-3.00#
    -bash-3.00# dladm show-dev
    vnet0 link: unknown speed: 0 Mbps duplex: unknown
    vnet1 link: unknown speed: 0 Mbps duplex: unknown
    vnet2 link: unknown speed: 0 Mbps duplex: unknown
    -bash-3.00#
    And this is the configuration I gave while setting up scinstall:
    Cluster Transport Adapters and Cables <<<You must identify the two cluster transport adapters which attach
    this node to the private cluster interconnect.
    For node "test1",
    What is the name of the first cluster transport adapter [vnet1]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    All transport adapters support the "dlpi" transport type. Ethernet
    and Infiniband adapters are supported only with the "dlpi" transport;
    however, other adapter types may support other types of transport.
    For node "test1",
    Is "vnet1" an Ethernet adapter (yes/no) [yes]?
    Is "vnet1" an Infiniband adapter (yes/no) [yes]? no
    For node "test1",
    What is the name of the second cluster transport adapter [vnet3]? vnet2
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test1",
    Name of the switch to which "vnet2" is connected [switch2]?
    For node "test1",
    Use the default port name for the "vnet2" connection (yes/no) [yes]?
    For node "test2",
    What is the name of the first cluster transport adapter [vnet1]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test2",
    Name of the switch to which "vnet1" is connected [switch1]?
    For node "test2",
    Use the default port name for the "vnet1" connection (yes/no) [yes]?
    For node "test2",
    What is the name of the second cluster transport adapter [vnet2]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test2",
    Name of the switch to which "vnet2" is connected [switch2]?
    For node "test2",
    Use the default port name for the "vnet2" connection (yes/no) [yes]?
    I have set up the configuration like this:
    ldm list -l nodename
    Node 1:
    NETWORK
    NAME SERVICE ID DEVICE MAC MODE PVID VID MTU LINKPROP
    vnet1 primary-vsw0@primary 0 network@0 00:14:4f:f9:61:63 1 1500
    vnet2 cluster-vsw0@primary 1 network@1 00:14:4f:f8:87:27 1 1500
    vnet3 cluster-vsw1@primary 2 network@2 00:14:4f:f8:f0:db 1 1500
    ldm list -l nodename
    Node 2:
    NETWORK
    NAME SERVICE ID DEVICE MAC MODE PVID VID MTU LINKPROP
    vnet1 primary-vsw0@primary 0 network@0 00:14:4f:f9:a1:68 1 1500
    vnet2 cluster-vsw0@primary 1 network@1 00:14:4f:f9:3e:3d 1 1500
    vnet3 cluster-vsw1@primary 2 network@2 00:14:4f:fb:03:83 1 1500
    ldm list-services
    VSW
    NAME LDOM MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    primary-vsw0 primary 00:14:4f:f9:25:5e e1000g0 0 switch@0 1 1 1500 on
    cluster-vsw0 primary 00:14:4f:fb:db:cb 1 switch@1 1 1 1500 sc on
    cluster-vsw1 primary 00:14:4f:fa:c1:58 2 switch@2 1 1 1500 sc on
    ldm list-bindings primary
    VSW
    NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    primary-vsw0 00:14:4f:f9:25:5e e1000g0 0 switch@0 1 1 1500 on
    PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
    vnet1@gitserver 00:14:4f:f8:c0:5f 1 1500
    vnet1@racc2 00:14:4f:f8:2e:37 1 1500
    vnet1@test1 00:14:4f:f9:61:63 1 1500
    vnet1@test2 00:14:4f:f9:a1:68 1 1500
    NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    cluster-vsw0 00:14:4f:fb:db:cb 1 switch@1 1 1 1500 sc on
    PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
    vnet2@test1 00:14:4f:f8:87:27 1 1500
    vnet2@test2 00:14:4f:f9:3e:3d 1 1500
    NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    cluster-vsw1 00:14:4f:fa:c1:58 2 switch@2 1 1 1500 sc on
    PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
    vnet3@test1 00:14:4f:f8:f0:db 1 1500
    vnet3@test2 00:14:4f:fb:03:83 1 1500
    Any ideas, team? I believe the cluster interconnect adapters were not set up successfully.
    I need any guidance or clue on how to correct the private interconnect for clustering between the two guest LDoms.

    You don't have to stick to the default IPs or subnet. You can change to whatever IPs you need, whatever subnet mask you need, and even change the private hostnames.
    You can do all this during install or even after install.
    Read the cluster install doc at docs.sun.com
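    When a join fails like this, it usually pays to confirm which guest vnet instance actually sits on each mode=sc switch, and that both paths pass traffic, before rerunning scinstall; a minimal sketch (domain and interface names from the post above, the test addresses are made up):
    # on the primary domain: list each guest's vnets, their MACs and switches
    ldm list -o network test1
    ldm list -o network test2
    # inside each guest: match those MACs against the local vnet instances
    dladm show-dev
    # plumb the two interconnect candidates with throwaway addresses and ping
    # the peer on each path before letting scinstall claim them
    ifconfig vnet1 plumb 192.168.111.1 netmask 255.255.255.0 up
    ifconfig vnet2 plumb 192.168.112.1 netmask 255.255.255.0 up
    ping 192.168.111.2
    ping 192.168.112.2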

  • Cluster interconnect on LDOM

    Hi,
    We want to set up Solaris Cluster in an LDoms environment.
    We have:
    - Primary domain
    - Alternate domain (Service Domain)
    So we want to set up the cluster interconnect from the primary domain and the service domain, with a configuration like the one below:
    example:
    ldm add-vsw net-dev=net3 mode=sc private-vsw1 primary
    ldm add-vsw net-dev=net7 mode=sc private-vsw2 alternate
    ldm add-vnet private-net1 mode=hybrid private-vsw1 ldg1
    ldm add-vnet private-net2 mode=hybrid private-vsw2 ldg1
    Is the configuration above supported?
    If there is any documentation about this, please point me to it.
    Thanks,

    Hi rachfebrianto,
    yes, the commands look good. The minimum requirement to use hybrid I/O is Solaris Cluster 3.2u3, but I guess you are running 3.3 or 4.1 anyway.
    The mode=sc is a requirement on the vsw for Solaris Cluster interconnect (private network).
    And it is supported to add mode=hybrid to guest LDom for the Solaris Cluster interconnect.
    There is no special documentation for Solaris Cluster because it uses what is available in the
    Oracle VM Server for SPARC 3.1 Administration Guide
    Using NIU Hybrid I/O
    How to Configure a Virtual Switch With an NIU Network Device
    How to Enable or Disable Hybrid Mode
    Hth,
      Juergen
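    As a quick check after running those commands, a minimal sketch of verifying the result (domain and switch names taken from the example above):
    ldm list-services            # MODE column for private-vsw1/private-vsw2 should read "sc"
    ldm list -o network ldg1     # private-net1/private-net2 should be bound to those switches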

  • Bad cluster interconnections

    Hi,
    I've an Oracle Database 11g Release 11.1.0.6.0 - 64bit Production With the Real Application Clusters option.
    While running some checks, I noticed something wrong with the cluster interconnect configuration.
    This is the oifcfg getif output:
    eth0  10.81.10.0  global  public
    eth1  172.16.100.0  global  cluster_interconnect
    This is the same on both nodes, and it seems right, as 10.81.10.x is the public network and 172.16.100.x is the private network.
    But if I query the gv$cluster_interconnects, I get:
    SQL> select * from gv$cluster_interconnects;
    INST_ID NAME IP_ADDRESS IS_ SOURCE
    2 bond0 10.81.10.40 NO OS dependent software
    1 bond0 10.81.10.30 NO OS dependent software
    It seems the cluster interconnect is on the public network.
    Another piece of information that supports this is the traffic I can see on the network interfaces (using iptraf):
    NODE 1:
    lo: 629.80 kb/s
    eth0: 29983.60 kb/s
    eth1: 2.20 kb/s
    eth2: 0 kb/s
    eth3: 0 kb/s
    NODE 2:
    lo: 1420.60 kb/s
    eth0: 18149.60 kb/s
    eth1: 2.20 kb/s
    eth2: 0 kb/s
    eth3: 0 kb/s
    This is the bond configuration (the configuration is the same on both nodes):
    [node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
    [node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth1
    DEVICE=eth1
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond1
    SLAVE=yes
    [node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth2
    DEVICE=eth2
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond1
    SLAVE=yes
    [node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth3
    DEVICE=eth3
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
    Why is the oifcfg getif output different from the gv$cluster_interconnects view?
    Any suggestions on how to configure the interconnect correctly?
    Thanks in advance.
    Samuel

    As soon as I reboot the database I'll check it out.
    In the meantime, I took a snapshot of the database activity during the last hour (during which we suffered a little) and extracted the top 15 wait events (based on total wait time in seconds):
    Event / Total Wait Time (s)
    enq: TX - index contention     945
    gc current block 2-way     845
    log file sync     769
    latch: shared pool     729
    gc cr block busy     703
    buffer busy waits     536
    buffer deadlock     444
    gc current grant busy     415
    SQL*Net message from client     338,421
    latch free     316
    gc buffer busy release     242
    latch: cache buffers chains     203
    library cache: mutex X     181
    library cache load lock     133
    gc current grant 2-way     102
    Could some of those depend on that bad interconnect configuration?
    And these are the top 15 wait events based on % DB time:
    Event / % DB time
    db file sequential read     15,3
    library cache pin     13,72
    gc buffer busy acquire     7,16
    gc cr block 2-way     4,19
    library cache lock     2,64
    gc current block busy     2,59
    enq: TX - index contention     2,29
    gc current block 2-way     2,04
    log file sync     1,86
    latch: shared pool     1,76
    gc cr block busy     1,7
    buffer busy waits     1,3
    buffer deadlock     1,08
    gc current grant busy     1
    Thanks in advance
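    If the instances keep picking bond0, one common workaround is to pin the private addresses explicitly through the cluster_interconnects parameter; a minimal sketch, assuming 172.16.100.11/.12 as the node-private addresses and hypothetical SID names, followed by a rolling restart:
    sqlplus -s / as sysdba <<'EOF'
    -- one private address per instance; the SIDs below are placeholders
    alter system set cluster_interconnects='172.16.100.11' scope=spfile sid='ORCL1';
    alter system set cluster_interconnects='172.16.100.12' scope=spfile sid='ORCL2';
    EOF
    # restart the instances one at a time, then re-check gv$cluster_interconnects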

  • Cluster Interconnect information

    Hi,
    We have a two node cluster (10.2.0.3) running on top of Solaris and Veritas SFRAC.
    The cluster is working fine. I would like to get more information about the private cluster interconnect used by the Oracle Clusterware but the two places I could think of have shown nothing.
    SQL> show parameter cluster_interconnects;
    NAME TYPE VALUE
    cluster_interconnects string
    SQL> select * from GV$CLUSTER_INTERCONNECTS;
    no rows selected
    I wasn't expecting to see anything in the cluster_interconnects parameter but thought there would be something in the dictionary view.
    I'd be grateful if anyone could shed some light on this.
    Where can I get information about the currently configured private interconnect?
    I've yet to check the OCR, does anyone know which key/values are relevant, if any?
    Thanks
    user234564

    Try this:
    1. $ORA_CRS_HOME/bin/oifcfg getif
    eth0 1xx.xxx.x.0 global public
    eth1 192.168.0.0 global cluster_interconnect
    2. V$CONFIGURED_INTERCONNECTS;
    3. X$KSXPIA;
    HTH
    Thanks
    Chandra Pabba
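    A minimal sketch of those checks from sqlplus (run as SYSDBA, since X$ tables are only visible to SYS):
    sqlplus -s / as sysdba <<'EOF'
    set linesize 150
    select * from v$configured_interconnects;
    select * from x$ksxpia;
    EOF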

  • Cluster interconnect using listening port 8059 + 8060

    Hello,
    I have 4 Tomcat instances running in a zone, which are fronted by an Apache via JkMount directives.
    One of the Tomcats runs on port 8050, with a listener set up on 8059.
    I was wondering why this was the only port Apache wasn't picking up. A tail -f of catalina.out shows that port 8059 was busy, so Tomcat tries to bind the listener to 8060... also busy, so it finally binds to 8061.
    Ports 8059 and 8060 are being used by the cluster interconnects, as shown in the netstat output below:
    *.*                  *.*                  0      0 49152      0 IDLE
    localhost.5999       *.*                  0      0 49152      0 LISTEN
    *.scqsd              *.*                  0      0 49152      0 LISTEN
    *.scqsd              *.*                  0      0 49152      0 LISTEN
    *.8059               *.*                  0      0 49152      0 LISTEN
    *.8060               *.*                  0      0 49152      0 LISTEN
    *.*                  *.*                  0      0 49152      0 IDLE
    *.sunrpc             *.*                  0      0 49152      0 LISTEN
    *.*                  *.*                  0      0 49152      0 IDLE
    localhost.5987       *.*                  0      0 49152      0 LISTEN
    localhost.898        *.*                  0      0 49152      0 LISTEN
    localhost.32781      *.*                  0      0 49152      0 LISTEN
    localhost.5988       *.*                  0      0 49152      0 LISTEN
    localhost.32782      *.*                  0      0 49152      0 LISTEN
    *.ssh                *.*                  0      0 49152      0 LISTEN
    *.32783              *.*                  0      0 49152      0 LISTEN
    *.32784              *.*                  0      0 49152      0 LISTEN
    *.sccheckd           *.*                  0      0 49152      0 LISTEN
    *.32785              *.*                  0      0 49152      0 LISTEN
    *.servicetag         *.*                  0      0 49152      0 LISTEN
    localhost.smtp       *.*                  0      0 49152      0 LISTEN
    localhost.submission *.*                  0      0 49152      0 LISTEN
    *.32798              *.*                  0      0 49152      0 LISTEN
    *.pnmd               *.*                  0      0 49152      0 LISTEN
    *.32811              *.*                  0      0 49152      0 BOUND
    localhost.6788       *.*                  0      0 49152      0 LISTEN
    localhost.6789       *.*                  0      0 49152      0 LISTEN
    scmars.ssh           161.228.79.36.54693  65180 51 49640      0 ESTABLISHED
    localhost.32793      *.*                  0      0 49152      0 LISTEN
    172.16.1.1.35136     172.16.1.2.8059      49640  0 49640      0 ESTABLISHED
    172.16.1.1.35137     172.16.1.2.8060      49640  0 49640      0 ESTABLISHED
    172.16.1.1.35138     172.16.1.2.8060      49640  0 49640      0 ESTABLISHED
    172.16.1.1.35139     172.16.1.2.8060      49640  0 49640      0 ESTABLISHED
    172.16.1.1.35140     172.16.1.2.8060      49640  0 49640      0 ESTABLISHED
    172.16.0.129.35141   172.16.0.130.8059    49640  0 49640      0 ESTABLISHED
    172.16.0.129.35142   172.16.0.130.8060    49640  0 49640      0 ESTABLISHED
    172.16.0.129.35143   172.16.0.130.8060    49640  0 49640      0 ESTABLISHED
    172.16.0.129.35144   172.16.0.130.8060    49640  0 49640      0 ESTABLISHED
    172.16.0.129.35145   172.16.0.130.8060    49640  0 49640      0 ESTABLISHED
    My question is: how can I modify the ports used by the cluster interconnects, as I would like to keep port 8059 as the Tomcat listener port?
    Any help is appreciated.
    Thanks!

    Hi,
    unfortunately the ports used by Sun Cluster are hard-wired, so you must change your Tomcat port.
    Sorry for the bad news
    Detlef
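    Since the cluster side cannot move, the change has to be made in Tomcat's server.xml; a minimal sketch, where the instance path and the replacement port 8069 are assumptions:
    TOMCAT_CONF=/opt/tomcat1/conf/server.xml            # hypothetical instance path
    grep -n 'port="80[56][0-9]"' "$TOMCAT_CONF"         # find listeners/connectors on the clashing ports
    cp "$TOMCAT_CONF" "$TOMCAT_CONF.bak"
    sed 's/port="8059"/port="8069"/' "$TOMCAT_CONF.bak" > "$TOMCAT_CONF"
    # restart that Tomcat instance and confirm with: netstat -an | grep 8069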

  • Oracle10g RAC Cluster Interconnect issues

    Hello Everybody,
    Just a brief overview of what I am currently doing: I have installed an Oracle 10g RAC database on a cluster of two Windows 2000 AS nodes. These two nodes access an external SCSI hard disk, and I have used Oracle Cluster File System.
    Currently I am facing some performance issues when it comes to balancing the workload across both nodes (a single-instance load is faster than a parallel load using two database instances).
    I feel the performance issues could be due to IPC using the public Ethernet IP instead of the private interconnect.
    (During a parallel load, a large number of packets are sent over the public IP and not the private interconnect.)
    How can I be sure that the private interconnect is used for transferring cluster traffic and not the public IP? (Oracle states that for an Oracle 10g RAC database, the private IP should be used for the heartbeat as well as for transferring cluster traffic.)
    Thanks in advance,
    Regards,
    Salil

    You find the answers here:
    RAC: Frequently Asked Questions
    Doc ID: NOTE:220970.1
    At the very least, a crossover-cable interconnect is completely unsupported.
    Werner
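    One way to see which address the instance is actually using for cluster IPC is an oradebug dump; a minimal sketch, run as SYSDBA on each node (the trace file lands in user_dump_dest):
    sqlplus -s / as sysdba <<'EOF'
    oradebug setmypid
    oradebug ipc
    oradebug tracefile_name
    EOF
    # the trace file named above records the interface/IP the instance bound for
    # cluster traffic; it should be the private address, not the public one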

  • Cluster interconnect

    We have a 3-node RAC cluster, version 10.2.0.3. Our sysadmin is gearing up to change the
    1 Gigabit interconnect to a 10 Gigabit interconnect. I am just trying to find out whether there is anything we need to be prepared for, from the database or cluster point of view.
    Thanks

    riyaj wrote:
    "But, if the protocol is not RDS, then the path becomes udp -> ip -> IPoIB -> HCA. Clearly, there is an additional layer, IPoIB. Considering that most latency is at the software layer, not in the hardware layer, I am not sure an additional layer will improve the latency."
    Perhaps when one compares 10GigE with 10Gb IB... but that would be comparing new Ethernet technology with older IB technology. QDR (40Gb) has pretty much been the standard for IB for some years now.
    Originally we compared 1GigE with 10Gb IB, as 10GigE was not available. IPoIB was a lot faster on SDR IB than 1GigE.
    When 10GigE was released, it was pretty expensive (not sure if this is still the case). A 10Gb Ethernet port was more than 1.5x the cost of 40Gb IB port.
    IB also supports a direct socket (or something) for IP applications. As I understand, this simplifies the call interface and allows socket calls to be made with less latency (surpassing that of the socket interface of a standard IP stack on Ethernet). We never looked at this ourselves as our Interconnect using IB was pretty robust and performant using standard IPoIB.
    "Further, InfiniBand has data center implications. In huge companies, this is a problem: a separate InfiniBand architecture is needed to support an InfiniBand network, which is not exactly a mundane task. With 10Gb NIC cards, existing network infrastructure can be used as long as the switch supports the 10Gb traffic."
    True... but I see that more as resistance to new technology, and even the network vendor used (do not have to name names, do I?) will specifically slam IB technology as they do not supply IB kit. A pure profit and territory issue.
    All resistance I've ever seen and responded to with IB versus Ethernet has been pretty much unwarranted - to the extent of seeing RACs being built using a 100Mb Ethernet interconnect because IB was a "foreign" technology and equated to evil/do not use/complex/unstable/etc.
    Another issue to keep in mind is that IB is a fabric layer. SRP scales and performs better than using fibre channel technology and protocols. So IB is not only suited as Interconnect, but also as the storage fabric layer. (Exadata pretty much proved that point).
    Some months ago OFED announced that the SRP specs have been made available to Ethernet vendors to implement (as there is nothing equivalent on Ethernet). Unfortunately, a 10Gig Ethernet SRP implementation will still lack in comparison with a QDR IB SRP implementation.
    "Third point, the skill set on the system admin side needs further adjustment to support InfiniBand hardware effectively."
    Important point. But it is not that difficult for a sysadmin to acquire the basic set of skills to manage IB from an O/S perspective. Likewise, it is not that difficult for a network engineer to acquire the basic skills for managing the switch and fabric layer.
    The one issue that I think is the single biggest negative in terms of using IB is getting a stable OFED driver stack running in the kernel. Some of the older versions were not that stable. However, later versions have improved considerably and the current version seems pretty robust. Oh yeah - this is specifically for SRP. IPoIB, bonding and so on have always worked pretty well. RDMA and SRP were not always that stable with the v1.3 drivers and earlier.

  • Solaris Cluster Interconnect over Layer 3 Link

    Is it possible to connect two Solaris Cluster nodes over a pure layer 3 (TCP/UDP) network connection over a distance of about 10 km?

    The problem with having just single node clusters is that, effectively, any failure on the primary site will require invocation of the DR process rather than just a local fail-over. Remember that Sun Cluster and Sun Cluster Geographic Edition are aimed at different problems: high availability and disaster recovery respectively. You seem to be trying to combine the two and this is a bad thing IMHO. (See the Blueprint http://www.sun.com/blueprints/0406/819-5783.html)
    I'm assuming that you have some leased bandwidth between these sites and hence the pure layer 3 networking.
    What do you actually want to achieve: HA or DR? If it's both, you will probably have to make compromises either in cost or expectations.
    Regards,
    Tim
    ---

  • Who has to solve?

    Who needs to solve a load-balancing issue across two RAC instances - is it a DBA-level or an architect-level task?
    And who decides the value of sga_target - the architect or the DBA?

    1. RAC DBA is responsible.
    Oracle RAC job role best practices
    There is a perpetual conflict between systems administrators (SAs), who traditionally manage servers and disks, and the RAC DBAs who are responsible for managing the RAC database. There are also clearly defined job roles for network administrators, who are especially challenged in a RAC database environment to manage the cluster interconnect and packet shipping between servers.
    If your DBA is going to be held responsible for the performance of the RAC database, then it’s only fair that he be given root access to the servers and disk storage subsystem. However, not every DBA will have the required computer science skills to manage a complex server and SAN environment, so each shop makes this decision on a case-by-case basis.
    Source:http://searchoracle.techtarget.com/news/2240016536/Understanding-Oracle-Real-Application-Clusters-RAC-best-practices
    2. sga_target: since it is related to the SGA, it is solely the DBA's responsibility. If there is a need to increase RAM, or if CPU usage or I/O issues turn out to be OS problems, then the SA is responsible.
    Regards
    Girish Sharma

  • Gig Ethernet V/S  SCI as Cluster Private Interconnect for Oracle RAC

    Hello Gurus
    Can anyone please confirm whether it's possible to configure 2 or more Gigabit Ethernet interconnects (Sun Cluster 3.1 private interconnects) on an E6900 cluster?
    It's for a high-availability requirement of Oracle 9i RAC. I need to know:
    1) Can I use gigabit ethernet as the private cluster interconnect for deploying Oracle RAC on an E6900?
    2) What is the recommended private cluster interconnect for Oracle RAC? Gigabit ethernet, or SCI with RSM?
    3) How about scenarios where one has, say, 3 x gigabit ethernet versus 2 x SCI as the cluster's private interconnects?
    4) How does the interconnect traffic get distributed amongst the multiple gigabit ethernet interconnects (for Oracle RAC), and does anything need to be done at the Oracle RAC level for Oracle to recognise that there are multiple interconnect cards and start utilizing all of the gigabit ethernet interfaces for transferring packets?
    5) What would happen to Oracle RAC if one of the gigabit ethernet private interconnects fails?
    I have tried searching for this info but could not locate any doc that precisely clarifies these doubts.
    Thanks for the patience.
    Regards,
    Nilesh

    Answers inline...
    Tim
    "Can anyone please confirm whether it's possible to configure 2 or more Gigabit Ethernet interconnects (Sun Cluster 3.1 private interconnects) on an E6900 cluster?"
    Yes, absolutely. You can configure up to 6 NICs for the private networks. Traffic is automatically striped across them if you specify clprivnet0 to Oracle RAC (9i or 10g). That covers TCP connections and UDP messages.
    "1) Can I use gigabit ethernet as the private cluster interconnect for deploying Oracle RAC on an E6900?"
    Yes, definitely.
    "2) What is the recommended private cluster interconnect for Oracle RAC? Gigabit ethernet, or SCI with RSM?"
    SCI is, or is in the process of being, EOL'ed. Gigabit is usually sufficient. Longer term you may want to consider InfiniBand or 10 Gigabit ethernet with RDS.
    "3) How about scenarios where one has, say, 3 x gigabit ethernet versus 2 x SCI as the cluster's private interconnects?"
    I would still go for 3 x GbE because it is usually cheaper and will probably work just as well. The latency and bandwidth differences are often masked by the performance of the software higher up the stack. In short, unless you have tuned the heck out of your application and just about everything else, don't worry too much about the difference between GbE and SCI.
    "4) How does the interconnect traffic get distributed amongst the multiple gigabit ethernet interconnects, and does anything need to be done at the Oracle RAC level for Oracle to recognise and use all of the interfaces?"
    You don't need to do anything at the Oracle level. That's the beauty of using Oracle RAC with Sun Cluster as opposed to RAC on its own. The striping takes place automatically and transparently behind the scenes.
    "5) What would happen to Oracle RAC if one of the gigabit ethernet private interconnects fails?"
    It's completely transparent. Oracle will never see the failure.
    "Have tried searching for this info but could not locate any doc that precisely clarifies these doubts..."
    This is all covered in a paper that I have just completed and which should be published after Christmas. Unfortunately, I cannot give out the paper yet.
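    For reference, the striped private interface that Sun Cluster presents to RAC can be inspected directly on each node; a short sketch (the address shown is assigned by the cluster, not something to configure by hand):
    ifconfig clprivnet0    # the pseudo-interface that stripes across the private NICs
    # this is the address RAC ends up using for the interconnect when clprivnet0
    # is specified, as described in the answer above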

  • Cluster Private Interconnect

    Hi,
    Does the Global Cache Service work only when the cluster private interconnect is configured? I am not seeing any data in v$cache_transfer. The cluster_interconnects parameter is blank and the V$CLUSTER_INTERCONNECTS view is missing. Please let me know.
    Thanks,
    Madhav

    Hi,
    If you want to use a specific interconnect IP, you can set it in the cluster_interconnects parameter.
    If it is blank, you are using the single default interconnect, so there is no need to worry that it is blank.
    Rgds

  • Cisco ISE 1.3 MAB authentication - switch dropping packets

    Hello All,
    I have a C3560 switch running C3560-IPSERVICESK9-M, Version 12.2(55)SE9, RELEASE SOFTWARE (fc1),
    and ISE version 1.3.
    MAB authentication is working perfectly at the ISE end, but looking at the switch end I can see the switch dropping packets on some ports,
    while other ports are working perfectly.
    The same switch configuration works perfectly on another switch without any issue.
    Switch configuration below for your suggestions:
    aaa new-model
    aaa authentication fail-message ^C
    **** Either ACS or ISE is DOWN / Use ur LOCAL CREDENTIALS / Thank You ****
    ^C
    aaa authentication login CONSOLE local
    aaa authentication login ACS group tacacs+ group radius local
    aaa authentication dot1x default group radius
    aaa authorization config-commands
    aaa authorization commands 0 default group tacacs+ local
    aaa authorization commands 1 default group tacacs+ local
    aaa authorization commands 15 default group tacacs+ local
    aaa authorization network default group radius
    aaa accounting dot1x default start-stop group radius
    aaa accounting exec default start-stop group tacacs+
    aaa accounting commands 0 default start-stop group tacacs+
    aaa accounting commands 15 default start-stop group tacacs+
    aaa accounting network default start-stop group tacacs+
    aaa accounting connection default start-stop group tacacs+
    aaa accounting system default start-stop group tacacs+ group radius
    aaa server radius dynamic-author
     client 172.16.95.x server-key 7 02050D480809
     client 172.16.95.x server-key 7 14141B180F0B
    aaa session-id common
    clock timezone IST 5 30
    system mtu routing 1500
    ip routing
    no ip domain-lookup
    ip domain-name EVS.com
    ip device tracking
    epm logging
    dot1x system-auth-control
    interface FastEthernet0/1
     switchport access vlan x
     switchport mode access
     switchport voice vlan x
     authentication event fail action next-method
     authentication host-mode multi-auth
     authentication order mab dot1x
     authentication priority mab dot1x
     authentication port-control auto
     authentication violation restrict
     mab
     snmp trap mac-notification change added
     snmp trap mac-notification change removed
     dot1x pae authenticator
     dot1x timeout tx-period 10
     spanning-tree portfast
    ip tacacs source-interface Vlan10
    ip radius source-interface Vlan10 vrf default
    logging trap critical
    logging origin-id ip
    logging 172.16.5.95
    logging host 172.16.95.x transport udp port 20514
    logging host 172.16.95.x transport udp port 20514
    snmp-server group SNMP-Group v3 auth read EVS-view notify *tv.FFFFFFFF.FFFFFFFF.FFFFFFFF.FFFFFFFF7F access 15
    snmp-server view EVS-view internet included
    snmp-server community S1n2M3p4$ RO
    snmp-server community cisco RO
    snmp-server trap-source Vlan10
    snmp-server source-interface informs Vlan10
    snmp-server enable traps snmp authentication linkdown linkup coldstart warmstart
    snmp-server enable traps tty
    snmp-server enable traps cluster
    snmp-server enable traps entity
    snmp-server enable traps cpu threshold
    snmp-server enable traps vtp
    snmp-server enable traps vlancreate
    snmp-server enable traps vlandelete
    snmp-server enable traps flash insertion removal
    snmp-server enable traps port-security
    snmp-server enable traps envmon fan shutdown supply temperature status
    snmp-server enable traps config-copy
    snmp-server enable traps config
    snmp-server enable traps bridge newroot topologychange
    snmp-server enable traps stpx inconsistency root-inconsistency loop-inconsistency
    snmp-server enable traps syslog
    snmp-server enable traps mac-notification change move threshold
    snmp-server enable traps vlan-membership
    snmp-server host 172.16.95.x version 2c cisco
    snmp-server host 172.16.95.x version 2c cisco
    snmp-server host 172.16.5.x version 3 auth evsnetadmin
    tacacs-server host 172.16.5.x key 7 0538571873651D1D4D26421A4F
    tacacs-server directed-request
    tacacs-server key 7 107D580E573E411F58277F2360
    tacacs-server administration
    radius-server attribute 6 on-for-login-auth
    radius-server attribute 25 access-request include
    radius-server host 172.16.95.y auth-port 1812 acct-port 1813 key 7 060506324F41
    radius-server host 172.16.95.x auth-port 1812 acct-port 1813 key 7 110A1016141D
    radius-server host 172.16.95.y auth-port 1645 acct-port 1646 key 7 110A1016141D
    radius-server host 172.16.95.x auth-port 1645 acct-port 1646 key 7 070C285F4D06
    radius-server timeout 2
    radius-server key 7 060506324F41
    radius-server vsa send accounting
    radius-server vsa send authentication
    line con 0
     exec-timeout 5 0
     privilege level 15
     logging synchronous
     login authentication CONSOLE
    line vty 0 4
     access-class telnet_access in
     exec-timeout 0 0
     logging synchronous
     login authentication ACS
     transport input ssh

     24423  ISE has not been able to confirm previous successful machine authentication  
    Judging by that line and what your policy says, it appears that your authentication was rejected because your machine was not authenticated prior to this connection.
    The first thing to check is whether MAR has been enabled on the identity source; the second is whether your machine is set to send a certificate for authentication. There are other things you can look at, but I'd do those two first.
    Log off and on, or reboot, and then see if you at least get a failed machine auth on the Operations > Authentications page, and we can go from there.
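    On the switch itself, comparing a working port with a failing one also narrows things down; a minimal sketch of exec commands on this IOS release (the interface is an example):
    show authentication sessions interface FastEthernet0/1
    show authentication sessions
    show aaa servers
    debug radius authentication     ! keep this brief, then turn it off with "undebug all"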
