Bad cluster interconnect configuration

Hi,
I have an Oracle Database 11g Release 11.1.0.6.0 - 64bit Production with the Real Application Clusters option.
While running some checks, I noticed something wrong with the cluster interconnect configuration.
This is the oifcfg getif output:
eth0  10.81.10.0  global  public
eth1  172.16.100.0  global  cluster_interconnect
This is the same on both nodes, and it looks correct: 10.81.10.x is the public network and 172.16.100.x is the private network.
But if I query the gv$cluster_interconnects, I get:
SQL> select * from gv$cluster_interconnects;
INST_ID NAME IP_ADDRESS IS_ SOURCE
2 bond0 10.81.10.40 NO OS dependent software
1 bond0 10.81.10.30 NO OS dependent software
It seems the cluster interconnect is running over the public network.
Another piece of information supporting this is the traffic I can see on the network interfaces (using iptraf):
NODE 1:
lo: 629.80 kb/s
eth0: 29983.60 kb/s
eth1: 2.20 kb/s
eth2: 0 kb/s
eth3: 0 kb/s
NODE 2:
lo: 1420.60 kb/s
eth0: 18149.60 kb/s
eth1: 2.20 kb/s
eth2: 0 kb/s
eth3: 0 kb/s
This is the bond configuration (the configuration is the same on both nodes):
[node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
[node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=yes
[node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=yes
[node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth3
DEVICE=eth3
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
Why is the oifcfg getif output different from what the gv$cluster_interconnects view shows?
Any suggestions on how to configure the interconnect correctly?
Thanks in advance.
Samuel
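A minimal sketch of how this is usually cross-checked and, if necessary, pinned down - not from the original post; the 172.16.100.x addresses and the orcl1/orcl2 SID names below are hypothetical placeholders, and the cluster_interconnects parameter only takes effect after an instance restart:
-- Check what each instance has actually picked, and what it knows about:
select inst_id, name, ip_address, is_public, source
  from gv$cluster_interconnects;
select inst_id, name, ip_address, is_public, source
  from gv$configured_interconnects;
-- If needed, pin the private address per instance (hypothetical values):
alter system set cluster_interconnects = '172.16.100.30' scope=spfile sid='orcl1';
alter system set cluster_interconnects = '172.16.100.40' scope=spfile sid='orcl2';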

As soon as I reboot the database I'll check it out.
In the meantime, I took a snapshot of the database activity during the last hour (during which we suffered a little) and extracted the top 15 wait events, ranked by total wait time in seconds:
Event / Total Wait Time (s)
enq: TX - index contention     945
gc current block 2-way     845
log file sync     769
latch: shared pool     729
gc cr block busy     703
buffer busy waits     536
buffer deadlock     444
gc current grant busy     415
SQL*Net message from client     338,421
latch free     316
gc buffer busy release     242
latch: cache buffers chains     203
library cache: mutex X     181
library cache load lock     133
gc current grant 2-way     102
Could some of those depend on that bad interconnect configuration?
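A minimal way to check this, not from the original post: if the private network is genuinely the problem, the interconnect-related statistics usually show it. The statistic names below are the 10g/11g ones:
-- Per-instance interconnect health indicators; a non-trivial 'gc blocks lost'
-- value usually points at the private network (or at traffic on the wrong network).
select inst_id, name, value
  from gv$sysstat
 where name in ('gc blocks lost',
                'gc blocks corrupt',
                'gc cr blocks received',
                'gc cr block receive time')
 order by inst_id, name;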
And these are the top 15 wait events based on % DB time:
Event / % DB time
db file sequential read     15.3
library cache pin     13.72
gc buffer busy acquire     7.16
gc cr block 2-way     4.19
library cache lock     2.64
gc current block busy     2.59
enq: TX - index contention     2.29
gc current block 2-way     2.04
log file sync     1.86
latch: shared pool     1.76
gc cr block busy     1.7
buffer busy waits     1.3
buffer deadlock     1.08
gc current grant busy     1
Thanks in advance
Edited by: Samuel Rabini on Jan 11, 2011 4:51 PM
Edited by: Samuel Rabini on Jan 11, 2011 4:53 PM

Similar Messages

  • Cluster interconnect using listening port 8059 + 8060

    Hello,
    I have 4 Tomcat instances running in a zone, which are being redirected to an Apache via jkMounts.
    One of the Tomcats is running on port 8050 with a listener set up on 8059.
    I was wondering why this was the only port Apache wasn't picking up. A tail -f of catalina.out shows that port 8059 was busy, so it tries to bind the listener to 8060... also busy, so it finally binds to 8061.
    Ports 8059 and 8060 are being used by the cluster interconnect, as shown in the netstat output below:
    *.* *.* 0 0 49152 0 IDLE
    localhost.5999 *.* 0 0 49152 0 LISTEN
    *.scqsd *.* 0 0 49152 0 LISTEN
    *.scqsd *.* 0 0 49152 0 LISTEN
    *.8059 *.* 0 0 49152 0 LISTEN
    *.8060 *.* 0 0 49152 0 LISTEN
    *.* *.* 0 0 49152 0 IDLE
    *.sunrpc *.* 0 0 49152 0 LISTEN
    *.* *.* 0 0 49152 0 IDLE
    localhost.5987 *.* 0 0 49152 0 LISTEN
    localhost.898 *.* 0 0 49152 0 LISTEN
    localhost.32781 *.* 0 0 49152 0 LISTEN
    localhost.5988 *.* 0 0 49152 0 LISTEN
    localhost.32782 *.* 0 0 49152 0 LISTEN
    *.ssh *.* 0 0 49152 0 LISTEN
    *.32783 *.* 0 0 49152 0 LISTEN
    *.32784 *.* 0 0 49152 0 LISTEN
    *.sccheckd *.* 0 0 49152 0 LISTEN
    *.32785 *.* 0 0 49152 0 LISTEN
    *.servicetag *.* 0 0 49152 0 LISTEN
    localhost.smtp *.* 0 0 49152 0 LISTEN
    localhost.submission *.* 0 0 49152 0 LISTEN
    *.32798 *.* 0 0 49152 0 LISTEN
    *.pnmd *.* 0 0 49152 0 LISTEN
    *.32811 *.* 0 0 49152 0 BOUND
    localhost.6788 *.* 0 0 49152 0 LISTEN
    localhost.6789 *.* 0 0 49152 0 LISTEN
    scmars.ssh 161.228.79.36.54693 65180 51 49640 0 ESTABLISHED
    localhost.32793 *.* 0 0 49152 0 LISTEN
    172.16.1.1.35136 172.16.1.2.8059 49640 0 49640 0 ESTABLISHED
    172.16.1.1.35137 172.16.1.2.8060 49640 0 49640 0 ESTABLISHED
    172.16.1.1.35138 172.16.1.2.8060 49640 0 49640 0 ESTABLISHED
    172.16.1.1.35139 172.16.1.2.8060 49640 0 49640 0 ESTABLISHED
    172.16.1.1.35140 172.16.1.2.8060 49640 0 49640 0 ESTABLISHED
    172.16.0.129.35141 172.16.0.130.8059 49640 0 49640 0 ESTABLISHED
    172.16.0.129.35142 172.16.0.130.8060 49640 0 49640 0 ESTABLISHED
    172.16.0.129.35143 172.16.0.130.8060 49640 0 49640 0 ESTABLISHED
    172.16.0.129.35144 172.16.0.130.8060 49640 0 49640 0 ESTABLISHED
    172.16.0.129.35145 172.16.0.130.8060 49640 0 49640 0 ESTABLISHED
    My question is: how can I modify the ports being used by the cluster interconnect, as I would like to keep port 8059 as the Tomcat listener port?
    Any help is appreciated.
    Thanks!

    Hi,
    unfortunately the ports used by Sun Cluster are hard-wired, so you must change your Tomcat port.
    Sorry for the bad news.
    Detlef

  • Aggregates, VLAN's, Jumbo-Frames and cluster interconnect opinions

    Hi All,
    I'm reviewing my options for a new cluster configuration and would like the opinions of people with more expertise than myself out there.
    What I have in mind as follows:
    2 x X4170 servers with 8 x NIC's in each.
    On each 4170 I was going to configure 2 aggregates with 3 nics in each aggregate as follows
    igb0 device in aggr1
    igb1 device in aggr1
    igb2 device in aggr1
    igb3 stand-alone device for iSCSI network
    e1000g0 device in aggr2
    e1000g1 device in aggr2
    e1000g2 device in aggr2
    e1000g3 stand-alone device for the iSCSI network
    Now, on top of these aggregates, I was planning on creating VLAN interfaces which will allow me to connect to our two "public" network segments and for the cluster heartbeat network.
    I was then going to configure the VLANs in an IPMP group for failover. I know there are some questions around that configuration, in the sense that IPMP will not detect a failure if a NIC goes offline within the aggregate, but I could monitor that in a different manner.
    At this point, my questions are:
    [1] Are VLANs on top of aggregates supported within Solaris Cluster? I've not seen anything in the documentation to mention that it is, or is not for that matter. I do see that VLANs are supported, including support for cluster interconnects over VLANs.
    Now, on the standalone interface I want to enable jumbo frames, but I've noticed that the igb.conf file has a global setting for all NIC ports, whereas I can enable it for a single NIC port in the e1000g.conf kernel driver. My questions are as follows:
    [2] What is the general feeling about mixing MTU sizes on the same LAN/VLAN? I've seen some comments that this is not a good idea, and some say that it doesn't cause a problem.
    [3] If the underlying NICs, igb0-2 (aggr1) for example, have a 9k MTU enabled, I can force the MTU size (1500) for "normal" networks on the VLAN interfaces pointing to my "public" networks and the cluster interconnect VLAN. Does anyone have experience of this causing any issues?
    Thanks in advance for all comments/suggestions.

    For 1) the question is really "Do I need to enable jumbo frames if I don't want to use them (on neither the public nor the private network)?" - the answer is no.
    For 2) each cluster needs to have its own separate set of VLANs.
    Greets
    Thorsten

  • IPFC (ip over fc) cluster interconnect

    Hello!
    Is it possible to create a cluster interconnect with the IPFC (IP over FC) driver (for example, as a reserve channel)?
    What problems may arise?

    Hi,
    technically Sun Cluster works fine with only a single interconnect, but this used to be unsupported. The mandatory requirement for 2 dedicated interconnects was lifted a couple of months ago, although it is still a best practice and a recommendation to use 2 independent interconnects.
    The possible consequences of only having one NIC port have been mentioned in the previous post.
    Regards
    Hartmut

  • LDOM SUN Cluster Interconnect failure

    I am building a test Sun Cluster on Solaris 10 with LDoms 1.3.
    In my environment I have a T5120. I have set up two guest domains with some configuration and installed the Sun Cluster software, but when I executed scinstall it failed.
    Node 2 came up, but node 1 throws the following messages:
    Boot device: /virtual-devices@100/channel-devices@200/disk@0:a File and args:
    SunOS Release 5.10 Version Generic_139555-08 64-bit
    Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    Hostname: test1
    Configuring devices.
    Loading smf(5) service descriptions: 37/37
    /usr/cluster/bin/scdidadm: Could not load DID instance list.
    /usr/cluster/bin/scdidadm: Cannot open /etc/cluster/ccr/did_instances.
    Booting as part of a cluster
    NOTICE: CMM: Node test2 (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node test1 (nodeid = 2) with votecount = 0 added.
    NOTICE: clcomm: Adapter vnet2 constructed
    NOTICE: clcomm: Adapter vnet1 constructed
    NOTICE: CMM: Node test1: attempting to join cluster.
    NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
    NOTICE: clcomm: Path test1:vnet1 - test2:vnet1 errors during initiation
    NOTICE: clcomm: Path test1:vnet2 - test2:vnet2 errors during initiation
    WARNING: Path test1:vnet1 - test2:vnet1 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    WARNING: Path test1:vnet2 - test2:vnet2 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    clcomm: Path test1:vnet2 - test2:vnet2 errors during initiation
    I created the virtual switches and vnets on the primary domain like this:
    532 ldm add-vsw mode=sc cluster-vsw0 primary
    533 ldm add-vsw mode=sc cluster-vsw1 primary
    535 ldm add-vnet vnet2 cluster-vsw0 test1
    536 ldm add-vnet vnet3 cluster-vsw1 test1
    540 ldm add-vnet vnet2 cluster-vsw0 test2
    541 ldm add-vnet vnet3 cluster-vsw1 test2
    Primary domain:
    bash-3.00# dladm show-dev
    vsw0 link: up speed: 1000 Mbps duplex: full
    vsw1 link: up speed: 0 Mbps duplex: unknown
    vsw2 link: up speed: 0 Mbps duplex: unknown
    e1000g0 link: up speed: 1000 Mbps duplex: full
    e1000g1 link: down speed: 0 Mbps duplex: half
    e1000g2 link: down speed: 0 Mbps duplex: half
    e1000g3 link: up speed: 1000 Mbps duplex: full
    bash-3.00# dladm show-link
    vsw0 type: non-vlan mtu: 1500 device: vsw0
    vsw1 type: non-vlan mtu: 1500 device: vsw1
    vsw2 type: non-vlan mtu: 1500 device: vsw2
    e1000g0 type: non-vlan mtu: 1500 device: e1000g0
    e1000g1 type: non-vlan mtu: 1500 device: e1000g1
    e1000g2 type: non-vlan mtu: 1500 device: e1000g2
    e1000g3 type: non-vlan mtu: 1500 device: e1000g3
    bash-3.00#
    Node 1:
    -bash-3.00# dladm show-link
    vnet0 type: non-vlan mtu: 1500 device: vnet0
    vnet1 type: non-vlan mtu: 1500 device: vnet1
    vnet2 type: non-vlan mtu: 1500 device: vnet2
    -bash-3.00# dladm show-dev
    vnet0 link: unknown speed: 0 Mbps duplex: unknown
    vnet1 link: unknown speed: 0 Mbps duplex: unknown
    vnet2 link: unknown speed: 0 Mbps duplex: unknown
    -bash-3.00#
    Node 2:
    -bash-3.00# dladm show-link
    vnet0 type: non-vlan mtu: 1500 device: vnet0
    vnet1 type: non-vlan mtu: 1500 device: vnet1
    vnet2 type: non-vlan mtu: 1500 device: vnet2
    -bash-3.00#
    -bash-3.00#
    -bash-3.00# dladm show-dev
    vnet0 link: unknown speed: 0 Mbps duplex: unknown
    vnet1 link: unknown speed: 0 Mbps duplex: unknown
    vnet2 link: unknown speed: 0 Mbps duplex: unknown
    -bash-3.00#
    And this is the configuration I gave while running scinstall:
    >>> Cluster Transport Adapters and Cables <<<
    You must identify the two cluster transport adapters which attach this node to the private cluster interconnect.
    For node "test1",
    What is the name of the first cluster transport adapter [vnet1]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    All transport adapters support the "dlpi" transport type. Ethernet
    and Infiniband adapters are supported only with the "dlpi" transport;
    however, other adapter types may support other types of transport.
    For node "test1",
    Is "vnet1" an Ethernet adapter (yes/no) [yes]?
    Is "vnet1" an Infiniband adapter (yes/no) [yes]? no
    For node "test1",
    What is the name of the second cluster transport adapter [vnet3]? vnet2
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test1",
    Name of the switch to which "vnet2" is connected [switch2]?
    For node "test1",
    Use the default port name for the "vnet2" connection (yes/no) [yes]?
    For node "test2",
    What is the name of the first cluster transport adapter [vnet1]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test2",
    Name of the switch to which "vnet1" is connected [switch1]?
    For node "test2",
    Use the default port name for the "vnet1" connection (yes/no) [yes]?
    For node "test2",
    What is the name of the second cluster transport adapter [vnet2]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test2",
    Name of the switch to which "vnet2" is connected [switch2]?
    For node "test2",
    Use the default port name for the "vnet2" connection (yes/no) [yes]?
    I have set up the configuration like this:
    ldm list -l nodename
    Node 1:
    NETWORK
    NAME SERVICE ID DEVICE MAC MODE PVID VID MTU LINKPROP
    vnet1 primary-vsw0@primary 0 network@0 00:14:4f:f9:61:63 1 1500
    vnet2 cluster-vsw0@primary 1 network@1 00:14:4f:f8:87:27 1 1500
    vnet3 cluster-vsw1@primary 2 network@2 00:14:4f:f8:f0:db 1 1500
    ldm list -l nodename
    Node 2:
    NETWORK
    NAME SERVICE ID DEVICE MAC MODE PVID VID MTU LINKPROP
    vnet1 primary-vsw0@primary 0 network@0 00:14:4f:f9:a1:68 1 1500
    vnet2 cluster-vsw0@primary 1 network@1 00:14:4f:f9:3e:3d 1 1500
    vnet3 cluster-vsw1@primary 2 network@2 00:14:4f:fb:03:83 1 1500
    ldm list-services
    VSW
    NAME LDOM MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    primary-vsw0 primary 00:14:4f:f9:25:5e e1000g0 0 switch@0 1 1 1500 on
    cluster-vsw0 primary 00:14:4f:fb:db:cb 1 switch@1 1 1 1500 sc on
    cluster-vsw1 primary 00:14:4f:fa:c1:58 2 switch@2 1 1 1500 sc on
    ldm list-bindings primary
    VSW
    NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    primary-vsw0 00:14:4f:f9:25:5e e1000g0 0 switch@0 1 1 1500 on
    PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
    vnet1@gitserver 00:14:4f:f8:c0:5f 1 1500
    vnet1@racc2 00:14:4f:f8:2e:37 1 1500
    vnet1@test1 00:14:4f:f9:61:63 1 1500
    vnet1@test2 00:14:4f:f9:a1:68 1 1500
    NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    cluster-vsw0 00:14:4f:fb:db:cb 1 switch@1 1 1 1500 sc on
    PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
    vnet2@test1 00:14:4f:f8:87:27 1 1500
    vnet2@test2 00:14:4f:f9:3e:3d 1 1500
    NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    cluster-vsw1 00:14:4f:fa:c1:58 2 switch@2 1 1 1500 sc on
    PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
    vnet3@test1 00:14:4f:f8:f0:db 1 1500
    vnet3@test2 00:14:4f:fb:03:83 1 1500
    Any ideas, team? I believe the cluster interconnect adapters were not brought up successfully.
    I need guidance, or any clue, on how to correct the private interconnect for clustering the two guest LDoms.

    You don't have to stick to the default IPs or subnet. You can change to whatever IPs you need, whatever subnet mask you need, and even change the private names.
    You can do all this during install or even after install.
    Read the cluster install doc at docs.sun.com

  • Cluster interconnect on LDOM

    Hi,
    We want to setup Solaris cluster on LDOM environment.
    We have:
    - Primary domain
    - Alternate domain (Service Domain)
    So we want to setup the cluster interconnect from primary domain and service domain, like below configuration:
    example:
    ldm add-vsw net-dev=net3 mode=sc private-vsw1 primary
    ldm add-vsw net-dev=net7 mode=sc private-vsw2 alternate
    ldm add-vnet private-net1 mode=hybrid private-vsw1 ldg1
    ldm add-vnet private-net2 mode=hybrid private-vsw2 ldg1
    Is the configuration above supported?
    If there is any documentation about this, please refer me.
    Thanks,

    Hi rachfebrianto,
    yes, the commands look good. The minimum requirement to use hybrid I/O is Solaris Cluster 3.2u3, but I guess you are running 3.3 or 4.1 anyway.
    The mode=sc is a requirement on the vsw for Solaris Cluster interconnect (private network).
    And it is supported to add mode=hybrid to the guest LDom for the Solaris Cluster interconnect.
    There is no special documentation for Solaris Cluster, because it uses what is available in the
    Oracle VM Server for SPARC 3.1 Administration Guide:
    Using NIU Hybrid I/O
    How to Configure a Virtual Switch With an NIU Network Device
    How to Enable or Disable Hybrid Mode
    Hth,
      Juergen

  • Cluster Interconnect information

    Hi,
    We have a two node cluster (10.2.0.3) running on top of Solaris and Veritas SFRAC.
    The cluster is working fine. I would like to get more information about the private cluster interconnect used by the Oracle Clusterware, but the two places I could think of show nothing.
    SQL> show parameter cluster_interconnects;
    NAME TYPE VALUE
    cluster_interconnects string
    SQL> select * from GV$CLUSTER_INTERCONNECTS;
    no rows selected
    I wasn't expecting to see anything in the cluster_interconnects parameter, but I thought there would be something in the dictionary view.
    I'd be grateful if anyone could shed some light on this.
    Where can I get information about the currently configured private interconnect?
    I've yet to check the OCR, does anyone know which key/values are relevant, if any?
    Thanks
    user234564

    Try this:
    1. $ORA_CRS_HOME/bin/oifcfg getif
    eth0 1xx.xxx.x.0 global public
    eth1 192.168.0.0 global cluster_interconnect
    2. V$CONFIGURED_INTERCONNECTS;
    3. X$KSXPIA;
    HTH
    Thanks
    Chandra Pabba
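    Expanding on points 2 and 3 above, a minimal sketch (not from the original reply): V$CONFIGURED_INTERCONNECTS lists every interconnect the instance knows about, and the SOURCE column shows where each definition came from (OCR/oifcfg, the cluster_interconnects parameter, or OS-dependent software). The X$ fixed table has to be queried as SYS.
    -- List the interconnects each instance has registered:
    select inst_id, name, ip_address, is_public, source
      from gv$configured_interconnects;
    -- Underlying fixed table (run as SYS):
    select * from x$ksxpia;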

  • Oracle10g RAC Cluster Interconnect issues

    Hello Everybody,
    Just a brief overview of what I am currently doing: I have installed an Oracle 10g RAC database on a cluster of two Windows 2000 AS nodes. These two nodes access an external SCSI hard disk. I have used Oracle Cluster File System.
    Currently I am facing some performance issues when it comes to balancing the workload across both nodes (a single-instance database load is faster than a parallel load using two database instances).
    I suspect the performance issues could be due to IPC using the public Ethernet IP instead of the private interconnect.
    (During a parallel load a large number of data packets are sent over the public IP and not the private interconnect.)
    How can I be sure that the private interconnect is used for transferring cluster traffic and not the public IP? (Oracle states that for an Oracle 10g RAC database, the private IP should be used for the heartbeat as well as for transferring cluster traffic.)
    Thanks in advance,
    Regards,
    Salil

    You will find the answers here:
    RAC: Frequently Asked Questions
    Doc ID: NOTE:220970.1
    At the very least, a crossover-cable interconnect is completely unsupported.
    Werner

  • Cluster interconnect

    We have a 3-node RAC cluster, version 10.2.0.3. The sysadmin is gearing up to change the
    1 Gb interconnect to a 10 Gb interconnect. I'm just trying to find out if there is anything we need to prepare for, from the database/cluster point of view.
    Thanks

    riyaj wrote:
    But, if the protocol is not RDS, then the path becomes udp -> ip -> IPoIB -> HCA. Clearly, there is an additional layer, IPoIB. Considering that most latency is at the software layer, not in the hardware layer, I am not sure an additional layer will improve the latency.
    Perhaps when one compares 10GigE with 10Gb IB... but that would be comparing new Ethernet technology with older IB technology. QDR (40Gb) has been pretty much the standard for IB for some years now.
    Originally we compared 1GigE with 10Gb IB, as 10GigE was not available. IPoIB was a lot faster on SDR IB than 1GigE.
    When 10GigE was released, it was pretty expensive (not sure if this is still the case). A 10Gb Ethernet port was more than 1.5x the cost of a 40Gb IB port.
    IB also supports a direct socket interface (or something like it) for IP applications. As I understand it, this simplifies the call interface and allows socket calls to be made with less latency (surpassing that of the socket interface of a standard IP stack on Ethernet). We never looked at this ourselves, as our interconnect using IB was pretty robust and performant using standard IPoIB.
    Further, InfiniBand has data center implications. In huge companies, this is a problem: a separate InfiniBand architecture is needed to support an InfiniBand network, which is not exactly a mundane task. With 10Gb NIC cards, existing network infrastructure can be used as long as the switch supports the 10Gb traffic.
    True... but I see that more as resistance to new technology, and even the network vendor used (do not have to name names, do I?) will specifically slam IB technology as they do not supply IB kit. A pure profit and territory issue.
    All the resistance I've ever seen and responded to with IB versus Ethernet has been pretty much unwarranted - to the extent of seeing RACs being built using a 100Mb Ethernet interconnect because IB was a "foreign" technology and equated to evil/do not use/complex/unstable/etc.
    Another issue to keep in mind is that IB is a fabric layer. SRP scales and performs better than fibre channel technology and protocols. So IB is not only suited as the interconnect, but also as the storage fabric layer. (Exadata pretty much proved that point.)
    Some months ago OFED announced that the SRP specs have been made available to Ethernet vendors to implement (as there is nothing equivalent on Ethernet). Unfortunately, a 10GigE SRP implementation will still fall short in comparison with a QDR IB SRP implementation.
    Third point, the skill set on the system admin side needs further adjustment to support InfiniBand hardware effectively.
    Important point. But it is not that difficult as a sysadmin to acquire the basic set of skills to manage IB from an O/S perspective. Likewise, it is not that difficult for a network engineer to acquire the basic skills for managing the switch and fabric layer.
    The one issue that I think is the single biggest negative in terms of using IB is getting a stable OFED driver stack running in the kernel. Some of the older versions were not that stable. However, later versions have improved considerably and the current version seems pretty robust. Oh yeah - this is specifically using SRP. IPoIB and bonding and so on have always worked pretty well. RDMA and SRP were not always that stable with the v1.3 drivers and earlier.

  • Cluster Interconnect droped packets.

    Hi,
    We have a 4-node RAC cluster on 10.2.0.3 that is seeing some reboot issues that seem to be network related. The network statistics show dropped packets across the interconnect (bond1, eth2). Is this normal behavior due to using UDP?
    $ netstat -i
    Kernel Interface table
    Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
    bond0 1500 0 387000915 0 0 0 377153910 0 0 0 BMmRU
    bond1 1500 0 942586399 0 2450416 0 884471536 0 0 0 BMmRU
    eth0 1500 0 386954905 0 0 0 377153910 0 0 0 BMsRU
    eth1 1500 0 46010 0 0 0 0 0 0 0 BMsRU
    eth2 1500 0 942583215 0 2450416 0 884471536 0 0 0 BMsRU
    eth3 1500 0 3184 0 0 0 0 0 0 0 BMsRU
    lo 16436 0 1048410 0 0 0 1048410 0 0 0 LRU
    Thanks

    Hi,
    To diagnose the reboot issues refer to Troubleshooting 10g and 11.1 Clusterware Reboots [ID 265769.1].
    Also monitor your lost blocks: gc lost blocks diagnostics [ID 563566.1].
    I had an issue which turned out to be network-card related (gc lost blocks): http://www.asanga-pradeep.blogspot.com/2011/05/gathering-stats-for-gc-lost-blocks.html

  • Solaris Cluster Interconnect over Layer 3 Link

    Is it possible to connect two Solaris Cluster nodes over a pure layer 3 (TCP/UDP) network connection over a distance of about 10 km?

    The problem with having just single node clusters is that, effectively, any failure on the primary site will require invocation of the DR process rather than just a local fail-over. Remember that Sun Cluster and Sun Cluster Geographic Edition are aimed at different problems: high availability and disaster recovery respectively. You seem to be trying to combine the two and this is a bad thing IMHO. (See the Blueprint http://www.sun.com/blueprints/0406/819-5783.html)
    I'm assuming that you have some leased bandwidth between these sites and hence the pure layer 3 networking.
    What do you actually want to achieve: HA or DR? If it's both, you will probably have to make compromises either in cost or expectations.
    Regards,
    Tim
    ---

  • Gig Ethernet V/S  SCI as Cluster Private Interconnect for Oracle RAC

    Hello Gurus
    Can anyone please confirm whether it's possible to configure 2 or more Gigabit Ethernet interconnects (Sun Cluster 3.1 private interconnects) on an E6900 cluster?
    It's for a high-availability requirement of Oracle 9i RAC. I need to know:
    1) Can I use Gigabit Ethernet as the private cluster interconnect for deploying Oracle RAC on an E6900?
    2) What is the recommended private cluster interconnect for Oracle RAC? Gigabit Ethernet or SCI with RSM?
    3) How about scenarios where one has, say, 3 x Gigabit Ethernet vs. 2 x SCI as the cluster's private interconnects?
    4) How does the interconnect traffic get distributed amongst the multiple Gigabit Ethernet interconnects (for Oracle RAC), and does anything need to be done at the Oracle RAC level for Oracle to recognise that there are multiple interconnect cards and start utilizing all of the Gigabit Ethernet interfaces for transferring packets?
    5) What would happen to Oracle RAC if one of the Gigabit Ethernet private interconnects fails?
    I have tried searching for this info but could not locate any doc that precisely clarifies these doubts.
    Thanks for the patience.
    Regards,
    Nilesh

    Answers inline...
    Tim
    Can anyone please confirm whether it's possible to configure 2 or more Gigabit Ethernet interconnects (Sun Cluster 3.1 private interconnects) on an E6900 cluster?
    Yes, absolutely. You can configure up to 6 NICs for the private networks. Traffic is automatically striped across them if you specify clprivnet0 to Oracle RAC (9i or 10g). That covers both TCP connections and UDP messages.
    1) Can I use Gigabit Ethernet as the private cluster interconnect for deploying Oracle RAC on an E6900?
    Yes, definitely.
    2) What is the recommended private cluster interconnect for Oracle RAC? Gigabit Ethernet or SCI with RSM?
    SCI is, or is in the process of being, EOL'ed. Gigabit is usually sufficient. Longer term you may want to consider InfiniBand or 10 Gigabit Ethernet with RDS.
    3) How about scenarios where one has, say, 3 x Gigabit Ethernet vs. 2 x SCI as the cluster's private interconnects?
    I would still go for 3 x GbE because it is usually cheaper and will probably work just as well. The latency and bandwidth differences are often masked by the performance of the software higher up the stack. In short, unless you have tuned the heck out of your application and just about everything else, don't worry too much about the difference between GbE and SCI.
    4) How does the interconnect traffic get distributed amongst the multiple Gigabit Ethernet interconnects (for Oracle RAC), and does anything need to be done at the Oracle RAC level for Oracle to recognise and use all of the interfaces?
    You don't need to do anything at the Oracle level. That's the beauty of using Oracle RAC with Sun Cluster as opposed to RAC on its own. The striping takes place automatically and transparently behind the scenes.
    5) What would happen to Oracle RAC if one of the Gigabit Ethernet private interconnects fails?
    It's completely transparent. Oracle will never see the failure.
    I have tried searching for this info but could not locate any doc that precisely clarifies these doubts.
    This is all covered in a paper that I have just completed; it should be published after Christmas. Unfortunately, I cannot give out the paper yet.

  • Cluster Private Interconnect

    Hi,
    Does the Global Cache Service work only when a cluster private interconnect is configured? I am not seeing any data in v$cache_transfer. The cluster_interconnects parameter is blank and the V$CLUSTER_INTERCONNECTS view is missing. Please let me know.
    Thanks,
    Madhav

    Hi,
    If you want to use a specific interconnect IP then you can set it in the cluster_interconnects parameter.
    If it is blank then you are using the single default interconnect, so there is no need to worry that it is blank.
    Regards
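    A quick way to confirm that the Global Cache Service is actually shipping blocks, independent of v$cache_transfer, is a minimal sketch like the one below (not from the original reply; it assumes 10g-style statistic names - in 9i the equivalents are the 'global cache ...' statistics):
    -- Non-zero values here mean Cache Fusion block transfers are happening:
    select inst_id, name, value
      from gv$sysstat
     where name in ('gc cr blocks received',
                    'gc current blocks received',
                    'gc cr blocks served',
                    'gc current blocks served')
     order by inst_id, name;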

  • Interconnect speed on dbconsole

    Hi,
    We are seeing a Private Interconnect Transfer Rate (MB/sec) between 0.023 MB/sec and 0.45 MB/sec. We see this on DB Console (DB Console -> Cluster -> Interconnects).
    Can it be due to low load on the system, or is the interconnect itself slow?
    Is there a way to load test the interconnect?
    This is 11.2.0.3 on AIX 7.1, with paired Gigabit Ethernet as the private interconnect.
    For both nodes the following gives about 2 ms for AVG CR BLOCK RECEIVE TIME (ms):
    select b1.inst_id,
           b2.value "GCS CR BLOCKS RECEIVED",
           b1.value "GCS CR BLOCK RECEIVE TIME",
           ((b1.value / b2.value) * 10) "AVG CR BLOCK RECEIVE TIME (ms)"
      from gv$sysstat b1, gv$sysstat b2
     where (b1.name = 'global cache cr block receive time' and
            b2.name = 'global cache cr blocks received' and b1.inst_id = b2.inst_id)
        or (b1.name = 'gc cr block receive time' and
            b2.name = 'gc cr blocks received' and b1.inst_id = b2.inst_id)
    /
    Also please see the following:
    /home/oracle> hostname
    Hostname for Node1
    /home/oracle> netstat -p udp
    udp:
            148643921 datagrams received
            0 incomplete headers
            0 bad data length fields
            0 bad checksums
            63270 dropped due to no socket
            30286 broadcast/multicast datagrams dropped due to no socket
            0 socket buffer overflows
            148550365 delivered
            146920807 datagrams output
    /home/oracle> hostname
    Hostname for Node 2
    /home/oracle> netstat -p udp
    udp:
            74069717 datagrams received
            0 incomplete headers
            0 bad data length fields
            0 bad checksums
            36390 dropped due to no socket
            4852 broadcast/multicast datagrams dropped due to no socket
            0 socket buffer overflows
            74028475 delivered
            75133092 datagrams output
    /home/oracle>
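    A rough SQL-side cross-check of interconnect volume (not exactly what DB Console computes, and not from the original post): the counters are cumulative since instance startup, so take two samples and divide the delta by the elapsed seconds to approximate MB/sec.
    -- Approximate interconnect receive volume per instance, in MB:
    select s.inst_id,
           round(sum(s.value) * max(to_number(p.value)) / 1024 / 1024) as approx_mb_received
      from gv$sysstat s
      join gv$parameter p
        on p.inst_id = s.inst_id
       and p.name = 'db_block_size'
     where s.name in ('gc cr blocks received', 'gc current blocks received')
     group by s.inst_id;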

    The approach I would take is to ask "do you actually have a problem?" No end user has ever telephoned the help desk to say "the interconnect transfer rate is too low". If they aren't complaining, there is no problem.
    That having been said, I've just looked at the avg global cache CR receive time on three systems, and your 2ms is not brilliant but is in the range:
    0.8
    1.2
    2.6
    John Watson
    Oracle Certified Master DBA
    http://skillbuilders.com

  • Cluster doesn't have operational quorum - one node of the cluster couldn't reach quorum

    Hi All,
    (I posted this issue in another forum but I thought this one is more suitable; sorry to post it again.)
    I'd like to ask for help regarding a problem that has me blocked.
    Unfortunately my investigation was not good enough.
    Here is a brief description:
    I have a V440 cluster and one of the nodes could not boot up
    successfully; the startup log tells me:
    cluster doesn't have operational quorum yet; waiting for quorum.
    What I did was:
    1./ booting both nodes at the same time - didn't work
    2./ shutdown good node and only started up bad node - didn't work
    3./ replaced cluster interconnect cables with working cables - didn't
    work
    Please take a look at the startup log and let me know what I should do
    further.
    Your help will be really appreciated. Thanks,
    ++++++++++
    Rebooting with command: boot
    Boot device: disk0 File and args:
    SunOS Release 5.9 Version Generic_122300-28 64-bit
    Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    /pci@1d,700000/pci@1/scsi@4 (qus0):
    initiator SCSI ID now 6
    /pci@1d,700000/pci@1/scsi@5 (qus1):
    initiator SCSI ID now 6
    /pci@1d,700000/pci@2/scsi@4 (qus2):
    initiator SCSI ID now 6
    /pci@1d,700000/pci@2/scsi@5 (qus3):
    initiator SCSI ID now 6
    Hardware watchdog enabled
    SC unretrieved msg SEP 16 09:21:50 2008 UTC [Host System has Reset]
    Starting VxVM restore daemon...
    VxVM starting in boot mode...
    NOTICE: vxvm:vxdmp: added disk array OTHER_DISKS, datype = OTHER_DISKS
    NOTICE: vxvm:vxdmp: disabled path 32/0x298 belonging to the dmpnode
    232/0x8
    NOTICE: vxvm:vxdmp: disabled dmpnode 232/0x8
    configuring IPv4 interfaces: ce1 ce2 ce5 ce6.
    Hostname: kadikoy
    VxVM starting special volumes ( swapvol rootvol var oracle
    rootdisk_25vol )...
    Booting as part of a cluster
    NOTICE: CMM: Node taksim (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node kadikoy (nodeid = 2) with votecount = 1 added.
    NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d4s2) added; votecount =
    1, bitmask of nodes with configured paths = 0x3.
    NOTICE: clcomm: Adapter ce8 constructed
    NOTICE: clcomm: Path kadikoy:ce8 - taksim:ce8 being constructed
    NOTICE: clcomm: Adapter ce4 constructed
    NOTICE: clcomm: Path kadikoy:ce4 - taksim:ce4 being constructed
    NOTICE: CMM: Node kadikoy: attempting to join cluster.
    NOTICE: clcomm: Path kadikoy:ce8 - taksim:ce8 errors during initiation
    NOTICE: clcomm: Path kadikoy:ce4 - taksim:ce4 errors during initiation
    WARNING: Path kadikoy:ce8 - taksim:ce8 initiation encountered errors,
    errno = 62. Remote node may be down or unreachable through this path.
    WARNING: Path kadikoy:ce4 - taksim:ce4 initiation encountered errors,
    errno = 62. Remote node may be down or unreachable through this path.
    NOTICE: CMM: Quorum device 1 (gdevname /dev/did/rdsk/d4s2) can not be
    acquired by the current cluster members. This quorum device is held by
    node 1.
    NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for
    quorum.
    NOTICE: vxvm:vxdmp: enabled path 32/0x298 belonging to the dmpnode
    232/0x8
    NOTICE: vxvm:vxdmp: enabled dmpnode 232/0x8
    ++++++++++

    Tim.Read wrote:
    The key messages are:
    NOTICE: clcomm: Path kadikoy:ce8 - taksim:ce8 errors during initiation
    NOTICE: clcomm: Path kadikoy:ce4 - taksim:ce4 errors during initiation
    WARNING: Path kadikoy:ce8 - taksim:ce8 initiation encountered errors,
    errno = 62. Remote node may be down or unreachable through this path.
    WARNING: Path kadikoy:ce4 - taksim:ce4 initiation encountered errors,
    So the heartbeat networks between the two machines aren't working. Without that, the existing node won't allow this node to join the cluster and then release its reservations.
    Regards,
    Tim
    ---
    Hi,
    thanks, but the interconnects are connected point-to-point and work fine on other clusters, so how can I investigate the heartbeat networks?
    On the other side, as you said, the bad node cannot boot up.
    regards,
    Halit
