Cluster interconnect on LDOM

Hi,
We want to set up Solaris Cluster in an LDOM environment.
We have:
- Primary domain
- Alternate domain (Service Domain)
So we want to set up the cluster interconnect using both the primary domain and the service domain, with a configuration like the one below:
example:
ldm add-vsw net-dev=net3 mode=sc private-vsw1 primary
ldm add-vsw net-dev=net7 mode=sc private-vsw2 alternate
ldm add-vnet private-net1 mode=hybrid private-vsw1 ldg1
ldm add-vnet private-net2 mode=hybrid private-vsw2 ldg1
Is the configuration above supported?
If there is any documentation about this, please point me to it.
Thanks,

Hi rachfebrianto,
yes, the commands look good. The minimum requirement to use hybrid I/O is Solaris Cluster 3.2u3, but I guess you are running 3.3 or 4.1 anyway.
mode=sc is a requirement on the vsw for the Solaris Cluster interconnect (private network).
And it is supported to add mode=hybrid to the guest LDom vnets used for the Solaris Cluster interconnect.
There is no special documentation for Solaris Cluster, because it uses what is available in the
Oracle VM Server for SPARC 3.1 Administration Guide:
Using NIU Hybrid I/O
How to Configure a Virtual Switch With an NIU Network Device
How to Enable or Disable Hybrid Mode
Hth,
  Juergen
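For reference, here is the sequence from the question again with a couple of verification steps added. This is only a sketch: the commands are copied from the post above, and the exact columns in the ldm output vary by Oracle VM Server for SPARC release.
ldm add-vsw net-dev=net3 mode=sc private-vsw1 primary
ldm add-vsw net-dev=net7 mode=sc private-vsw2 alternate
ldm add-vnet private-net1 mode=hybrid private-vsw1 ldg1
ldm add-vnet private-net2 mode=hybrid private-vsw2 ldg1
# verify that both vsws report MODE "sc" and that the vnets are bound to ldg1
ldm list-services primary
ldm list-services alternate
ldm list -l ldg1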

Similar Messages

  • LDOM SUN Cluster Interconnect failure

I am building a test Sun Cluster on Solaris 10 with LDoms 1.3.
In my environment I have a T5120. I have set up two guest OSes, installed the Sun Cluster software, and when I executed scinstall, it failed.
Node 2 came up, but node 1 throws the following messages:
    Boot device: /virtual-devices@100/channel-devices@200/disk@0:a File and args:
    SunOS Release 5.10 Version Generic_139555-08 64-bit
    Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    Hostname: test1
    Configuring devices.
    Loading smf(5) service descriptions: 37/37
    /usr/cluster/bin/scdidadm: Could not load DID instance list.
    /usr/cluster/bin/scdidadm: Cannot open /etc/cluster/ccr/did_instances.
    Booting as part of a cluster
    NOTICE: CMM: Node test2 (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node test1 (nodeid = 2) with votecount = 0 added.
    NOTICE: clcomm: Adapter vnet2 constructed
    NOTICE: clcomm: Adapter vnet1 constructed
    NOTICE: CMM: Node test1: attempting to join cluster.
    NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
    NOTICE: clcomm: Path test1:vnet1 - test2:vnet1 errors during initiation
    NOTICE: clcomm: Path test1:vnet2 - test2:vnet2 errors during initiation
    WARNING: Path test1:vnet1 - test2:vnet1 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    WARNING: Path test1:vnet2 - test2:vnet2 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    clcomm: Path test1:vnet2 - test2:vnet2 errors during initiation
I created the virtual switches and vnets on the primary domain like this:
ldm add-vsw mode=sc cluster-vsw0 primary
ldm add-vsw mode=sc cluster-vsw1 primary
ldm add-vnet vnet2 cluster-vsw0 test1
ldm add-vnet vnet3 cluster-vsw1 test1
ldm add-vnet vnet2 cluster-vsw0 test2
ldm add-vnet vnet3 cluster-vsw1 test2
Primary domain:
    bash-3.00# dladm show-dev
    vsw0 link: up speed: 1000 Mbps duplex: full
    vsw1 link: up speed: 0 Mbps duplex: unknown
    vsw2 link: up speed: 0 Mbps duplex: unknown
    e1000g0 link: up speed: 1000 Mbps duplex: full
    e1000g1 link: down speed: 0 Mbps duplex: half
    e1000g2 link: down speed: 0 Mbps duplex: half
    e1000g3 link: up speed: 1000 Mbps duplex: full
    bash-3.00# dladm show-link
    vsw0 type: non-vlan mtu: 1500 device: vsw0
    vsw1 type: non-vlan mtu: 1500 device: vsw1
    vsw2 type: non-vlan mtu: 1500 device: vsw2
    e1000g0 type: non-vlan mtu: 1500 device: e1000g0
    e1000g1 type: non-vlan mtu: 1500 device: e1000g1
    e1000g2 type: non-vlan mtu: 1500 device: e1000g2
    e1000g3 type: non-vlan mtu: 1500 device: e1000g3
    bash-3.00#
Node 1:
    -bash-3.00# dladm show-link
    vnet0 type: non-vlan mtu: 1500 device: vnet0
    vnet1 type: non-vlan mtu: 1500 device: vnet1
    vnet2 type: non-vlan mtu: 1500 device: vnet2
    -bash-3.00# dladm show-dev
    vnet0 link: unknown speed: 0 Mbps duplex: unknown
    vnet1 link: unknown speed: 0 Mbps duplex: unknown
    vnet2 link: unknown speed: 0 Mbps duplex: unknown
    -bash-3.00#
Node 2:
    -bash-3.00# dladm show-link
    vnet0 type: non-vlan mtu: 1500 device: vnet0
    vnet1 type: non-vlan mtu: 1500 device: vnet1
    vnet2 type: non-vlan mtu: 1500 device: vnet2
    -bash-3.00#
    -bash-3.00#
    -bash-3.00# dladm show-dev
    vnet0 link: unknown speed: 0 Mbps duplex: unknown
    vnet1 link: unknown speed: 0 Mbps duplex: unknown
    vnet2 link: unknown speed: 0 Mbps duplex: unknown
    -bash-3.00#
And this is the configuration I gave while running scinstall:
    Cluster Transport Adapters and Cables <<<You must identify the two cluster transport adapters which attach
    this node to the private cluster interconnect.
    For node "test1",
    What is the name of the first cluster transport adapter [vnet1]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    All transport adapters support the "dlpi" transport type. Ethernet
    and Infiniband adapters are supported only with the "dlpi" transport;
    however, other adapter types may support other types of transport.
    For node "test1",
    Is "vnet1" an Ethernet adapter (yes/no) [yes]?
    Is "vnet1" an Infiniband adapter (yes/no) [yes]? no
    For node "test1",
    What is the name of the second cluster transport adapter [vnet3]? vnet2
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test1",
    Name of the switch to which "vnet2" is connected [switch2]?
    For node "test1",
    Use the default port name for the "vnet2" connection (yes/no) [yes]?
    For node "test2",
    What is the name of the first cluster transport adapter [vnet1]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test2",
    Name of the switch to which "vnet1" is connected [switch1]?
    For node "test2",
    Use the default port name for the "vnet1" connection (yes/no) [yes]?
    For node "test2",
    What is the name of the second cluster transport adapter [vnet2]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test2",
    Name of the switch to which "vnet2" is connected [switch2]?
    For node "test2",
    Use the default port name for the "vnet2" connection (yes/no) [yes]?
I have set up the configuration like this:
    ldm list -l nodename
Node 1:
    NETWORK
    NAME SERVICE ID DEVICE MAC MODE PVID VID MTU LINKPROP
    vnet1 primary-vsw0@primary 0 network@0 00:14:4f:f9:61:63 1 1500
    vnet2 cluster-vsw0@primary 1 network@1 00:14:4f:f8:87:27 1 1500
    vnet3 cluster-vsw1@primary 2 network@2 00:14:4f:f8:f0:db 1 1500
    ldm list -l nodename
Node 2:
    NETWORK
    NAME SERVICE ID DEVICE MAC MODE PVID VID MTU LINKPROP
    vnet1 primary-vsw0@primary 0 network@0 00:14:4f:f9:a1:68 1 1500
    vnet2 cluster-vsw0@primary 1 network@1 00:14:4f:f9:3e:3d 1 1500
    vnet3 cluster-vsw1@primary 2 network@2 00:14:4f:fb:03:83 1 1500
    ldm list-services
    VSW
    NAME LDOM MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    primary-vsw0 primary 00:14:4f:f9:25:5e e1000g0 0 switch@0 1 1 1500 on
    cluster-vsw0 primary 00:14:4f:fb:db:cb 1 switch@1 1 1 1500 sc on
    cluster-vsw1 primary 00:14:4f:fa:c1:58 2 switch@2 1 1 1500 sc on
    ldm list-bindings primary
    VSW
    NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    primary-vsw0 00:14:4f:f9:25:5e e1000g0 0 switch@0 1 1 1500 on
    PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
    vnet1@gitserver 00:14:4f:f8:c0:5f 1 1500
    vnet1@racc2 00:14:4f:f8:2e:37 1 1500
    vnet1@test1 00:14:4f:f9:61:63 1 1500
    vnet1@test2 00:14:4f:f9:a1:68 1 1500
    NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    cluster-vsw0 00:14:4f:fb:db:cb 1 switch@1 1 1 1500 sc on
    PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
    vnet2@test1 00:14:4f:f8:87:27 1 1500
    vnet2@test2 00:14:4f:f9:3e:3d 1 1500
    NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    cluster-vsw1 00:14:4f:fa:c1:58 2 switch@2 1 1 1500 sc on
    PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
    vnet3@test1 00:14:4f:f8:f0:db 1 1500
    vnet3@test2 00:14:4f:fb:03:83 1 1500
Any ideas, team? I believe the cluster interconnect adapters were not configured successfully.
I need guidance, or any clue, on how to correct the private interconnect for clustering in two guest LDoms.

You don't have to stick to the default IPs or subnet. You can change to whatever IPs and subnet mask you need, and even change the private hostnames.
You can do all this during the install or even after the install.
Read the cluster install docs at docs.sun.com.
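A minimal sketch of changing the private network settings after installation, assuming Solaris Cluster 3.2 or later with all nodes booted in noncluster mode (the address and netmask below are illustrative, not values from this thread):
# run on one node while the whole cluster is in noncluster mode
cluster set-netprops -p private_netaddr=172.16.0.0 -p private_netmask=255.255.240.0
# review the result once the nodes are back in cluster mode
cluster show-netprops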

  • Interconnects on LDOMs with SC3.3

    Hello @all,
a question about how to set up interconnects with LDoms and SC 3.3:
We have two T4-1 servers and want to set up 2 LDoms on each (ldom1 and ldom2).
Each pair of LDoms should become a cluster:
Cluster1: ldom1 on the 1st T4-1 and ldom1 on the 2nd T4-1
Cluster2: ldom2 on the 1st T4-1 and ldom2 on the 2nd T4-1
But we have only 2 network ports for the interconnects on each T4-1, and each cluster should work with two interconnects.
    So in the end the two clusters have to share these two network ports for their two interconnects.
    Can anyone advise me how to setup these two network ports so that each cluster is able to work with two interconnects?
    Cluster software should be installed in the LDOMs and not in the control domains!
    Like here:
    http://docs.oracle.com/cd/E18728_01/html/821-2682/baccigea.html
    Figure 2-11 SPARC: Clusters Span Two Different Hosts
    Thanks!
    Heinz

    The answer should be in the docs somewhere. Actually, you configure the two network links in the control domain and present them as vnets to the guests.
You then configure the first cluster to use these two links for the private interconnect. You should accept the default network address, but not the default netmask. Instead you should use the built-in functionality to limit the address space used by one cluster: limit the number of nodes, of interconnects and of zone clusters. scinstall will then tell you how much of the address space, starting with the default address, is used by this cluster.
You then install the second cluster and do not accept the default interconnect address; instead, offset it by the network space used by the first cluster. Then you'll be fine.
To explain by example: for the first cluster you choose 10.0.0.0 with netmask 255.255.0.0 for the interconnect. For the second you choose 10.1.0.0 with the same netmask. It is then guaranteed that the interconnect traffic does not interfere.
    Hope that helps
    Hartmut
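To make the example concrete, here is a sketch of the resulting address plan and a way to check it after installation (the ranges follow from the example above; the cluster show-netprops command is assumed to be available in SC 3.3):
# Cluster1 private network: 10.0.0.0  netmask 255.255.0.0  ->  10.0.0.0 - 10.0.255.255
# Cluster2 private network: 10.1.0.0  netmask 255.255.0.0  ->  10.1.0.0 - 10.1.255.255
# on any node of each cluster, once it is installed:
cluster show-netprops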

  • Aggregates, VLAN's, Jumbo-Frames and cluster interconnect opinions

    Hi All,
    I'm reviewing my options for a new cluster configuration and would like the opinions of people with more expertise than myself out there.
    What I have in mind as follows:
    2 x X4170 servers with 8 x NIC's in each.
On each X4170 I was going to configure 2 aggregates with 3 NICs in each aggregate, as follows:
    igb0 device in aggr1
    igb1 device in aggr1
    igb2 device in aggr1
    igb3 stand-alone device for iSCSI network
    e1000g0 device in aggr2
    e1000g1 device in aggr2
e1000g2 device in aggr2
e1000g3 stand-alone device for iSCSI network
    Now, on top of these aggregates, I was planning on creating VLAN interfaces which will allow me to connect to our two "public" network segments and for the cluster heartbeat network.
I was then going to configure the VLANs in an IPMP group for failover. I know there are some questions around that configuration, in the sense that IPMP will not detect a failure if a NIC goes offline in the aggregate, but I could monitor that in a different manner.
    At this point, my questions are:
[1] Are VLANs on top of aggregates supported within Solaris Cluster? I've not seen anything in the documentation to say that they are, or are not for that matter. I see that VLANs are supported, including support for cluster interconnects over VLANs.
Now, with the stand-alone interfaces I want to enable jumbo frames, but I've noticed that the igb.conf file has a global setting for all NIC ports, whereas I can enable it for a single NIC port in the e1000g.conf driver configuration. My questions are as follows:
[2] What is the general feeling about mixing MTU sizes on the same LAN/VLAN? I've seen some comments that this is not a good idea, and some say that it doesn't cause a problem.
[3] If the underlying NICs, igb0-2 (aggr1) for example, have a 9k MTU enabled, I can force the MTU size (1500) for "normal" networks on the VLAN interfaces pointing to my "public" network and cluster interconnect VLAN. Does anyone have experience of this causing any issues?
    Thanks in advance for all comments/suggestions.

For 1) the question is really "Do I need to enable jumbo frames if I don't want to use them (neither on the public nor the private network)?" - the answer is no.
For 2) each cluster needs to have its own separate set of VLANs.
    Greets
    Thorsten
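For question [1], a rough sketch of the layering being discussed, using classic Solaris 10 dladm/ifconfig syntax (device names are from the post; the aggregation key 1, VLAN ID 123 and IP address are made up for illustration):
# three igb ports into link aggregation key 1 (aggr1)
dladm create-aggr -d igb0 -d igb1 -d igb2 1
# a VLAN interface on top of the aggregate: PPA = VLAN_ID * 1000 + aggregation key,
# so VLAN 123 on aggr1 is plumbed as aggr123001
ifconfig aggr123001 plumb 192.168.10.11 netmask 255.255.255.0 up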

  • IPFC (ip over fc) cluster interconnect

    Hello!
Is it possible to create a cluster interconnect with the IPFC (IP over FC) driver (for example, as a reserve channel)?
    What problems may arise?

    Hi,
technically Sun Cluster works fine with only a single interconnect, but it used to be unsupported. The mandatory requirement to have 2 dedicated interconnects was lifted a couple of months ago, although it is still a best practice and a recommendation to use 2 independent interconnects.
    The possible consequences of only having one NIC port have been mentioned in the previous post.
    Regards
    Hartmut

  • Bad cluster interconnections

    Hi,
    I've an Oracle Database 11g Release 11.1.0.6.0 - 64bit Production With the Real Application Clusters option.
While executing some checks, I noticed something wrong with the cluster interconnects.
    This is the oifcfg getif output:
    eth0  10.81.10.0  global  public
    eth1  172.16.100.0  global  cluster_interconnect
This is the same on both nodes, and it seems to be right, as 10.81.10.x is the public network and 172.16.100.x is the private network.
    But if I query the gv$cluster_interconnects, I get:
    SQL> select * from gv$cluster_interconnects;
    INST_ID NAME IP_ADDRESS IS_ SOURCE
    2 bond0 10.81.10.40 NO OS dependent software
    1 bond0 10.81.10.30 NO OS dependent software
It seems the cluster interconnect is on the public network.
Another piece of information that supports this is the traffic I can see on the network interfaces (using iptraf):
    NODE 1:
    lo: 629.80 kb/s
    eth0: 29983.60 kb/s
    eth1: 2.20 kb/s
    eth2: 0 kb/s
    eth3: 0 kb/s
    NODE 2:
    lo: 1420.60 kb/s
    eth0: 18149.60 kb/s
    eth1: 2.20 kb/s
    eth2: 0 kb/s
    eth3: 0 kb/s
    This is the bond configuration (the configuration is the same on both nodes):
[node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
[node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth1
    DEVICE=eth1
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond1
    SLAVE=yes
[node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth2
    DEVICE=eth2
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond1
    SLAVE=yes
[node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth3
    DEVICE=eth3
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
Why is the oifcfg getif output different from the gv$cluster_interconnects view?
Any suggestions on how to configure the interconnects correctly?
    Thanks in advance.
    Samuel

    As soon as I reboot the database I'll check it out.
In the meantime, I snapshotted the database activity during the last hour (during which we suffered a little) and extracted these top 15 wait events (based on total wait time in seconds):
    Event / Total Wait Time (s)
    enq: TX - index contention     945
    gc current block 2-way     845
    log file sync     769
    latch: shared pool     729
    gc cr block busy     703
    buffer busy waits     536
    buffer deadlock     444
    gc current grant busy     415
    SQL*Net message from client     338,421
    latch free     316
    gc buffer busy release     242
    latch: cache buffers chains     203
    library cache: mutex X     181
    library cache load lock     133
    gc current grant 2-way     102
Could some of those depend on that bad interconnect configuration?
And these are the top 15 wait events based on % DB time:
    Event / % DB time
    db file sequential read     15,3
    library cache pin     13,72
    gc buffer busy acquire     7,16
    gc cr block 2-way     4,19
    library cache lock     2,64
    gc current block busy     2,59
    enq: TX - index contention     2,29
    gc current block 2-way     2,04
    log file sync     1,86
    latch: shared pool     1,76
    gc cr block busy     1,7
    buffer busy waits     1,3
    buffer deadlock     1,08
    gc current grant busy     1
    Thanks in advance
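If it turns out the instances really are using bond0, one commonly discussed remedy is to pin them to the private network with the cluster_interconnects parameter. This is only a hedged sketch: the SIDs and private IPs below are illustrative, not from this thread, and the change requires an instance restart.
sqlplus -s / as sysdba <<'EOF'
ALTER SYSTEM SET cluster_interconnects='172.16.100.30' SCOPE=spfile SID='orcl1';
ALTER SYSTEM SET cluster_interconnects='172.16.100.40' SCOPE=spfile SID='orcl2';
EOF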

  • Cluster Interconnect information

    Hi,
    We have a two node cluster (10.2.0.3) running on top of Solaris and Veritas SFRAC.
    The cluster is working fine. I would like to get more information about the private cluster interconnect used by the Oracle Clusterware but the two places I could think of have shown nothing.
    SQL> show parameter cluster_interconnects;
    NAME TYPE VALUE
    cluster_interconnects string
    SQL> select * from GV$CLUSTER_INTERCONNECTS;
    no rows selected
    I wasn't expecting to see anything in the cluster_interconnects parameter but thought there would be something in the dictionary view.
    I'd be grateful if anyone could shed some light on this.
    Where can I get information about the currently configured private interconnect?
I've yet to check the OCR; does anyone know which keys/values are relevant, if any?
    Thanks
    user234564

    Try this:
    1. $ORA_CRS_HOME/bin/oifcfg getif
    eth0 1xx.xxx.x.0 global public
    eth1 192.168.0.0 global cluster_interconnect
    2. V$CONFIGURED_INTERCONNECTS;
    3. X$KSXPIA;
    HTH
    Thanks
    Chandra Pabba
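Putting Chandra's three checks together as one quick script (just a sketch; x$ksxpia requires a SYSDBA connection and its exact columns vary by version):
$ORA_CRS_HOME/bin/oifcfg getif
sqlplus -s / as sysdba <<'EOF'
SELECT * FROM v$configured_interconnects;
SELECT * FROM x$ksxpia;
EOF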

  • Cluster interconnect using listening port 8059 + 8060

    Hello,
I have 4 Tomcat instances running in a zone, which are fronted by an Apache via JkMount directives.
One of the Tomcats is running on port 8050 with a listener set up on 8059.
I was wondering why this was the only one Apache wasn't picking up. A tail -f of catalina.out shows that port 8059 was busy, so it tries to bind the listener to 8060... also busy, so it finally binds to 8061.
    Ports 8059 and 8060 are being used by the cluster interconnects as shown in netstat below:
*.* *.* 0 0 49152 0 IDLE
localhost.5999 *.* 0 0 49152 0 LISTEN
*.scqsd *.* 0 0 49152 0 LISTEN
*.scqsd *.* 0 0 49152 0 LISTEN
*.8059 *.* 0 0 49152 0 LISTEN
*.8060 *.* 0 0 49152 0 LISTEN
*.* *.* 0 0 49152 0 IDLE
*.sunrpc *.* 0 0 49152 0 LISTEN
*.* *.* 0 0 49152 0 IDLE
localhost.5987 *.* 0 0 49152 0 LISTEN
localhost.898 *.* 0 0 49152 0 LISTEN
localhost.32781 *.* 0 0 49152 0 LISTEN
localhost.5988 *.* 0 0 49152 0 LISTEN
localhost.32782 *.* 0 0 49152 0 LISTEN
*.ssh *.* 0 0 49152 0 LISTEN
*.32783 *.* 0 0 49152 0 LISTEN
*.32784 *.* 0 0 49152 0 LISTEN
*.sccheckd *.* 0 0 49152 0 LISTEN
*.32785 *.* 0 0 49152 0 LISTEN
*.servicetag *.* 0 0 49152 0 LISTEN
localhost.smtp *.* 0 0 49152 0 LISTEN
localhost.submission *.* 0 0 49152 0 LISTEN
*.32798 *.* 0 0 49152 0 LISTEN
*.pnmd *.* 0 0 49152 0 LISTEN
*.32811 *.* 0 0 49152 0 BOUND
localhost.6788 *.* 0 0 49152 0 LISTEN
localhost.6789 *.* 0 0 49152 0 LISTEN
scmars.ssh 161.228.79.36.54693 65180 51 49640 0 ESTABLISHED
localhost.32793 *.* 0 0 49152 0 LISTEN
172.16.1.1.35136 172.16.1.2.8059 49640 0 49640 0 ESTABLISHED
172.16.1.1.35137 172.16.1.2.8060 49640 0 49640 0 ESTABLISHED
172.16.1.1.35138 172.16.1.2.8060 49640 0 49640 0 ESTABLISHED
172.16.1.1.35139 172.16.1.2.8060 49640 0 49640 0 ESTABLISHED
172.16.1.1.35140 172.16.1.2.8060 49640 0 49640 0 ESTABLISHED
172.16.0.129.35141 172.16.0.130.8059 49640 0 49640 0 ESTABLISHED
172.16.0.129.35142 172.16.0.130.8060 49640 0 49640 0 ESTABLISHED
172.16.0.129.35143 172.16.0.130.8060 49640 0 49640 0 ESTABLISHED
172.16.0.129.35144 172.16.0.130.8060 49640 0 49640 0 ESTABLISHED
172.16.0.129.35145 172.16.0.130.8060 49640 0 49640 0 ESTABLISHED
My question is: how can I modify the ports used by the cluster interconnect, as I would like to keep port 8059 as the Tomcat listener port?
    Any help is appreciated.
    Thanks!

    Hi,
unfortunately the ports used by Sun Cluster are hard-wired, so you must change your Tomcat port.
    Sorry for the bad news
    Detlef

  • Oracle10g RAC Cluster Interconnect issues

    Hello Everybody,
Just a brief overview of what I am currently doing: I have installed an Oracle 10g RAC database on a cluster of two Windows 2000 AS nodes. These two nodes access an external SCSI hard disk, and I have used Oracle Cluster File System.
Currently I am facing some performance issues when it comes to balancing the workload across both nodes (a single-instance database load is faster than a parallel load using two database instances).
I feel the performance issues could be due to IPC using the public Ethernet IP instead of the private interconnect
(during a parallel load, a large number of packets is sent over the public IP and not the private interconnect).
How can I be sure that the private interconnect, and not the public IP, is used for transferring cluster traffic? (Oracle states that for an Oracle 10g RAC database, the private IP should be used for the heartbeat as well as for transferring cluster traffic.)
    Thanks in advance,
    Regards,
    Salil

You'll find the answers here:
RAC: Frequently Asked Questions
Doc ID: NOTE:220970.1
At the very least, a crossover-cable interconnect is completely unsupported.
    Werner

  • Cluster interconnect

We have a 3-node RAC cluster on 10.2.0.3. The sysadmin is gearing up to change the 1 Gb interconnect to a 10 Gb interconnect. I'm just trying to find out if there is anything we need to be prepared for, from the database or cluster point of view.
    Thanks

riyaj wrote:
But, if the protocol is not RDS, then the path becomes udp -> ip -> IPoIB -> HCA. Clearly, there is an additional layer, IPoIB. Considering that most latency is at the software layer level, not in the hardware layer, I am not sure an additional layer will improve the latency.
Perhaps when one compares 10GigE with 10Gb IB... but that would be comparing new Ethernet technology with older IB technology. QDR (40Gb) has pretty much been the standard for IB for some years now.
Originally we compared 1GigE with 10Gb IB, as 10GigE was not available. IPoIB was a lot faster on SDR IB than 1GigE.
When 10GigE was released, it was pretty expensive (not sure if this is still the case). A 10Gb Ethernet port was more than 1.5x the cost of a 40Gb IB port.
IB also supports a direct socket (or something similar) for IP applications. As I understand it, this simplifies the call interface and allows socket calls to be made with less latency (surpassing that of the socket interface of a standard IP stack on Ethernet). We never looked at this ourselves, as our interconnect using IB was pretty robust and performant using standard IPoIB.
riyaj wrote:
Further, InfiniBand has data center implications. In huge companies, this is a problem: a separate InfiniBand architecture is needed to support an InfiniBand network, which is not exactly a mundane task. With 10Gb NIC cards, existing network infrastructure can be used as long as the switch supports the 10Gb traffic.
True... but I see that more as resistance to new technology, and even the network vendor used (do not have to name names, do I?) will specifically slam IB technology as they do not supply IB kit. A pure profit and territory issue.
All the resistance I've ever seen and responded to on IB versus Ethernet has been pretty much unwarranted - to the extent of seeing RACs being built using a 100Mb Ethernet interconnect because IB was a "foreign" technology and equated to evil/do not use/complex/unstable/etc.
Another issue to keep in mind is that IB is a fabric layer. SRP scales and performs better than fibre channel technology and protocols. So IB is not only suited as the interconnect, but also as the storage fabric layer. (Exadata pretty much proved that point.)
Some months ago OFED announced that the SRP specs have been made available to Ethernet vendors to implement (as there is nothing equivalent on Ethernet). Unfortunately, a 10Gig Ethernet SRP implementation will still lack in comparison with a QDR IB SRP implementation.
riyaj wrote:
Third point, the skill set on the system admin side needs further adjustment to support InfiniBand hardware effectively.
Important point. But it is not that difficult for a sysadmin to acquire the basic set of skills to manage IB from an o/s perspective. Likewise, it is not that difficult for a network engineer to acquire the basic skills for managing the switch and fabric layer.
The one issue that I think is the single biggest negative of using IB is getting a stable OFED driver stack running in the kernel. Some of the older versions were not that stable. However, later versions have improved considerably and the current version seems pretty robust. Oh yeah - this is specifically using SRP. IPoIB, bonding and so on have always worked pretty well. RDMA and SRP were not always that stable with the v1.3 drivers and earlier.

  • Cluster Interconnect droped packets.

    Hi,
We have a 4-node RAC cluster on 10.2.0.3 that is seeing some reboot issues that seem to be network related. The network statistics show dropped packets across the interconnect (bond1, eth2). Is this normal behavior due to using UDP?
    $ netstat -i
    Kernel Interface table
    Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
    bond0 1500 0 387000915 0 0 0 377153910 0 0 0 BMmRU
    bond1 1500 0 942586399 0 2450416 0 884471536 0 0 0 BMmRU
    eth0 1500 0 386954905 0 0 0 377153910 0 0 0 BMsRU
    eth1 1500 0 46010 0 0 0 0 0 0 0 BMsRU
    eth2 1500 0 942583215 0 2450416 0 884471536 0 0 0 BMsRU
    eth3 1500 0 3184 0 0 0 0 0 0 0 BMsRU
    lo 16436 0 1048410 0 0 0 1048410 0 0 0 LRU
    Thanks

    Hi,
To diagnose the reboot issues, refer to Troubleshooting 10g and 11.1 Clusterware Reboots [ID 265769.1].
Also monitor your lost blocks: gc lost blocks diagnostics [ID 563566.1].
I had an issue which turned out to be network-card related (gc lost blocks): http://www.asanga-pradeep.blogspot.com/2011/05/gathering-stats-for-gc-lost-blocks.html
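As a quick sanity check on the figures in the question, the interconnect drop rate can be computed straight from the netstat -i counters; for bond1 it works out to roughly 0.26%, which is small but non-zero and worth correlating with the lost-blocks statistics from the notes above:
# RX-DRP / RX-OK for bond1, taken from the netstat -i output in the question
echo "scale=6; 2450416 / 942586399 * 100" | bc
# -> about 0.26 (percent)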

  • Soalris Cluster Interconnect over Layer 3 Link

    Is it possible to connect two Solaris Cluster nodes over a pure layer 3 (TCP/UDP) network connection over a distance of about 10 km?

    The problem with having just single node clusters is that, effectively, any failure on the primary site will require invocation of the DR process rather than just a local fail-over. Remember that Sun Cluster and Sun Cluster Geographic Edition are aimed at different problems: high availability and disaster recovery respectively. You seem to be trying to combine the two and this is a bad thing IMHO. (See the Blueprint http://www.sun.com/blueprints/0406/819-5783.html)
    I'm assuming that you have some leased bandwidth between these sites and hence the pure layer 3 networking.
    What do you actually want to achieve: HA or DR? If it's both, you will probably have to make compromises either in cost or expectations.
    Regards,
    Tim
    ---

  • Gig Ethernet V/S  SCI as Cluster Private Interconnect for Oracle RAC

    Hello Gurus
Can anyone please confirm if it's possible to configure 2 or more Gigabit Ethernet interconnects (Sun Cluster 3.1 private interconnects) on an E6900 cluster?
It's for a high-availability requirement of Oracle 9i RAC. I need to know:
1) Can I use Gigabit Ethernet as the private cluster interconnect for deploying Oracle RAC on an E6900?
2) What is the recommended private cluster interconnect for Oracle RAC? Gigabit Ethernet or SCI with RSM?
3) How about scenarios where one can have, say, 3 x Gigabit Ethernet vs. 2 x SCI as the cluster's private interconnects?
4) How does the interconnect traffic get distributed among the multiple Gigabit Ethernet interconnects (for Oracle RAC), and is anything required at the Oracle RAC level to make Oracle recognise that there are multiple interconnect cards and start utilizing all of the Gigabit Ethernet interfaces for transferring packets?
5) What would happen to Oracle RAC if one of the Gigabit Ethernet private interconnects fails?
I have tried searching for this info but could not locate any doc that precisely clarifies these doubts.
    thanks for the patience
    Regards,
    Nilesh

    Answers inline...
    Tim
Can anyone please confirm if it's possible to configure 2 or more Gigabit Ethernet interconnects (Sun Cluster 3.1 private interconnects) on an E6900 cluster?
Yes, absolutely. You can configure up to 6 NICs for the private networks. Traffic is automatically striped across them if you specify clprivnet0 to Oracle RAC (9i or 10g). That covers both TCP connections and UDP messages.
1) Can I use Gigabit Ethernet as the private cluster interconnect for deploying Oracle RAC on an E6900?
Yes, definitely.
2) What is the recommended private cluster interconnect for Oracle RAC? Gigabit Ethernet or SCI with RSM?
SCI is, or is in the process of being, EOL'ed. Gigabit is usually sufficient. Longer term you may want to consider InfiniBand or 10 Gigabit Ethernet with RDS.
3) How about scenarios where one can have, say, 3 x Gigabit Ethernet vs. 2 x SCI as the cluster's private interconnects?
I would still go for 3 x GbE because it is usually cheaper and will probably work just as well. The latency and bandwidth differences are often masked by the performance of the software higher up the stack. In short, unless you have tuned the heck out of your application and just about everything else, don't worry too much about the difference between GbE and SCI.
4) How does the interconnect traffic get distributed among the multiple Gigabit Ethernet interconnects, and is anything required at the Oracle RAC level to make Oracle use all of the interfaces?
You don't need to do anything at the Oracle level. That's the beauty of using Oracle RAC with Sun Cluster as opposed to RAC on its own. The striping takes place automatically and transparently behind the scenes.
5) What would happen to Oracle RAC if one of the Gigabit Ethernet private interconnects fails?
It's completely transparent. Oracle will never see the failure.
I have tried searching for this info but could not locate any doc that precisely clarifies these doubts.
This is all covered in a paper that I have just completed and should be published after Christmas. Unfortunately, I cannot give out the paper yet.
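A small sketch of checking the striped private interface Tim mentions, from inside a cluster node (assumes Sun Cluster 3.1 command names; output formats vary by release):
# the single striped interface that Oracle RAC should be pointed at
ifconfig clprivnet0
# the underlying private interconnect paths it stripes across
scstat -W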

  • Cluster Private Interconnect

    Hi,
Does the Global Cache Service work only when the cluster private interconnect is configured? I am not seeing any data in v$cache_transfer. The cluster_interconnects parameter is blank and the V$CLUSTER_INTERCONNECTS view is missing. Please let me know.
    Thanks,
    Madhav

Hi,
If you want to use a specific interconnect IP, you can set it in the cluster_interconnects parameter.
If it is blank, then you are using the single default interconnect, so there is no need to worry that it is blank.
Rgds

  • Kernel patch 138888 breaks cluster

I am trying to set up a cluster between guest LDoms on two physical T5140 servers. Without kernel patch 138888-03 everything works fine. When I install the patch, before or after the cluster is created, the two LDoms are no longer able to communicate over the interconnects, reporting that one or the other node is unreachable.
    Does anyone know why the patch causes that problem and if there is a fix? I'm leaving the patch off for now.
    Thanks.

    [PSARC 2009/069 802.1Q tag mode link property|http://mail.opensolaris.org/pipermail/opensolaris-arc/2009-February/013817.html]
