Cluster Interconnect dropped packets.

Hi,
We have a 4-node RAC cluster (10.2.0.3) that is seeing some reboot issues that appear to be network related. The network statistics show dropped packets on the interconnect (bond1, eth2). Is this normal behavior due to using UDP?
$ netstat -i
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0 1500 0 387000915 0 0 0 377153910 0 0 0 BMmRU
bond1 1500 0 942586399 0 2450416 0 884471536 0 0 0 BMmRU
eth0 1500 0 386954905 0 0 0 377153910 0 0 0 BMsRU
eth1 1500 0 46010 0 0 0 0 0 0 0 BMsRU
eth2 1500 0 942583215 0 2450416 0 884471536 0 0 0 BMsRU
eth3 1500 0 3184 0 0 0 0 0 0 0 BMsRU
lo 16436 0 1048410 0 0 0 1048410 0 0 0 LRU
Thanks

Hi,
To diagnose the reboot issues, refer to *Troubleshooting 10g and 11.1 Clusterware Reboots [ID 265769.1]*.
Also monitor your lost blocks: *gc lost blocks diagnostics [ID 563566.1]*.
I had an issue which turned out to be network card related (gc lost blocks): http://www.asanga-pradeep.blogspot.com/2011/05/gathering-stats-for-gc-lost-blocks.html
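A quick way to tell whether the RX-DRP counter is still climbing (rather than being an old accumulated figure) is to sample it twice; a minimal sketch, assuming eth2 as the interconnect NIC and a 60-second window, with a gv$sysstat query for the matching "gc ... lost" statistics on the database side:
IF=eth2                                                          # interconnect slave NIC from the netstat output above
before=$(netstat -i | awk -v ifc="$IF" '$1 == ifc {print $6}')   # RX-DRP column
sleep 60
after=$(netstat -i | awk -v ifc="$IF" '$1 == ifc {print $6}')
echo "RX-DRP increase on $IF over 60s: $((after - before))"
sqlplus -s / as sysdba <<'EOF'
select inst_id, name, value from gv$sysstat where name like 'gc%lost%';
EOF
If the delta keeps growing while the database reports lost blocks, the problem is live rather than historical.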

Similar Messages

  • Aggregates, VLANs, Jumbo Frames and cluster interconnect opinions

    Hi All,
    I'm reviewing my options for a new cluster configuration and would like the opinions of people with more expertise than myself out there.
    What I have in mind is as follows:
    2 x X4170 servers with 8 NICs in each.
    On each X4170 I was going to configure 2 aggregates with 3 NICs in each aggregate, as follows:
    igb0 device in aggr1
    igb1 device in aggr1
    igb2 device in aggr1
    igb3 stand-alone device for iSCSI network
    e1000g0 device in aggr2
    e1000g1 device in aggr2
    e1000g2 device in aggr2
    e1000g3 stand-alone device for the iSCSI network
    Now, on top of these aggregates, I was planning on creating VLAN interfaces which will allow me to connect to our two "public" network segments and to carry the cluster heartbeat network.
    I was then going to configure the VLANs in an IPMP group for failover. I know there are some questions around that configuration, in the sense that IPMP will not detect a failure if a NIC goes offline within the aggregate, but I could monitor that in a different manner.
    At this point, my questions are:
    [1] Are VLANs, on top of aggregates, supported within Solaris Cluster? I've not seen anything in the documentation to say that they are, or are not for that matter. I do see that VLANs are supported, including support for cluster interconnects over VLANs.
    Now, with the standalone interfaces I want to enable jumbo frames, but I've noticed that the igb.conf file has a global setting for all NIC ports, whereas I can enable it for a single NIC port in the e1000g.conf kernel driver file. My questions are as follows:
    [2] What is the general feeling about mixing MTU sizes on the same LAN/VLAN? I've seen some comments that it is not a good idea, and some say that it doesn't cause a problem.
    [3] If the underlying NICs, igb0-2 (aggr1) for example, have a 9k MTU enabled, I can force the MTU size (1500) for "normal" networks on the VLAN interfaces pointing to my "public" network and cluster interconnect VLAN. Does anyone have experience of this causing any issues?
    Thanks in advance for all comments/suggestions.

    For 1), the question is really "Do I need to enable jumbo frames if I don't want to use them (neither on the public nor on the private network)?" - the answer is no.
    For 2), each cluster needs to have its own separate set of VLANs.
    Greets
    Thorsten
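    A minimal sketch of the Solaris 10 commands this maps to (the device names, the aggregation key, VLAN ID 100 and the test address are assumptions, not a verified configuration):
    # build aggr1 from the three igb ports (key 1 -> interface aggr1)
    dladm create-aggr -d igb0 -d igb1 -d igb2 1
    # a VLAN on top of an aggregation uses the encoded PPA (VID*1000 + key),
    # so VLAN 100 over aggr1 is plumbed as aggr100001
    ifconfig aggr100001 plumb 192.168.100.11 netmask 255.255.255.0 up
    # jumbo frames for the standalone e1000g3 only can be set per instance via
    # MaxFrameSize in /kernel/drv/e1000g.conf, as noted in the question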

  • IPFC (ip over fc) cluster interconnect

    Hello!
    Is it possible to create a cluster interconnect with the IPFC (IP over FC) driver (for example, as a reserve channel)?
    What problems may arise?

    Hi,
    technically Sun Cluster works fine with only a single interconnect, but that used to be unsupported. The mandatory requirement to have 2 dedicated interconnects was lifted a couple of months ago, although it is still a best practice and a recommendation to use 2 independent interconnects.
    The possible consequences of only having one NIC port have been mentioned in the previous post.
    Regards
    Hartmut

  • LDOM SUN Cluster Interconnect failure

    I am building a test Sun Cluster on Solaris 10 under LDoms 1.3.
    In my environment I have a T5120. I have set up two guest OS domains, installed the Sun Cluster software, and when I executed scinstall it failed.
    Node 2 comes up, but node 1 throws the following messages:
    Boot device: /virtual-devices@100/channel-devices@200/disk@0:a File and args:
    SunOS Release 5.10 Version Generic_139555-08 64-bit
    Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    Hostname: test1
    Configuring devices.
    Loading smf(5) service descriptions: 37/37
    /usr/cluster/bin/scdidadm: Could not load DID instance list.
    /usr/cluster/bin/scdidadm: Cannot open /etc/cluster/ccr/did_instances.
    Booting as part of a cluster
    NOTICE: CMM: Node test2 (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node test1 (nodeid = 2) with votecount = 0 added.
    NOTICE: clcomm: Adapter vnet2 constructed
    NOTICE: clcomm: Adapter vnet1 constructed
    NOTICE: CMM: Node test1: attempting to join cluster.
    NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
    NOTICE: clcomm: Path test1:vnet1 - test2:vnet1 errors during initiation
    NOTICE: clcomm: Path test1:vnet2 - test2:vnet2 errors during initiation
    WARNING: Path test1:vnet1 - test2:vnet1 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    WARNING: Path test1:vnet2 - test2:vnet2 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    clcomm: Path test1:vnet2 - test2:vnet2 errors during initiation
    I created the virtual switches and vnets on the primary domain like this:
    ldm add-vsw mode=sc cluster-vsw0 primary
    ldm add-vsw mode=sc cluster-vsw1 primary
    ldm add-vnet vnet2 cluster-vsw0 test1
    ldm add-vnet vnet3 cluster-vsw1 test1
    ldm add-vnet vnet2 cluster-vsw0 test2
    ldm add-vnet vnet3 cluster-vsw1 test2
    Primary domain:
    bash-3.00# dladm show-dev
    vsw0 link: up speed: 1000 Mbps duplex: full
    vsw1 link: up speed: 0 Mbps duplex: unknown
    vsw2 link: up speed: 0 Mbps duplex: unknown
    e1000g0 link: up speed: 1000 Mbps duplex: full
    e1000g1 link: down speed: 0 Mbps duplex: half
    e1000g2 link: down speed: 0 Mbps duplex: half
    e1000g3 link: up speed: 1000 Mbps duplex: full
    bash-3.00# dladm show-link
    vsw0 type: non-vlan mtu: 1500 device: vsw0
    vsw1 type: non-vlan mtu: 1500 device: vsw1
    vsw2 type: non-vlan mtu: 1500 device: vsw2
    e1000g0 type: non-vlan mtu: 1500 device: e1000g0
    e1000g1 type: non-vlan mtu: 1500 device: e1000g1
    e1000g2 type: non-vlan mtu: 1500 device: e1000g2
    e1000g3 type: non-vlan mtu: 1500 device: e1000g3
    bash-3.00#
    Node 1:
    -bash-3.00# dladm show-link
    vnet0 type: non-vlan mtu: 1500 device: vnet0
    vnet1 type: non-vlan mtu: 1500 device: vnet1
    vnet2 type: non-vlan mtu: 1500 device: vnet2
    -bash-3.00# dladm show-dev
    vnet0 link: unknown speed: 0 Mbps duplex: unknown
    vnet1 link: unknown speed: 0 Mbps duplex: unknown
    vnet2 link: unknown speed: 0 Mbps duplex: unknown
    -bash-3.00#
    Node 2:
    -bash-3.00# dladm show-link
    vnet0 type: non-vlan mtu: 1500 device: vnet0
    vnet1 type: non-vlan mtu: 1500 device: vnet1
    vnet2 type: non-vlan mtu: 1500 device: vnet2
    -bash-3.00#
    -bash-3.00#
    -bash-3.00# dladm show-dev
    vnet0 link: unknown speed: 0 Mbps duplex: unknown
    vnet1 link: unknown speed: 0 Mbps duplex: unknown
    vnet2 link: unknown speed: 0 Mbps duplex: unknown
    -bash-3.00#
    And this is the configuration I gave while setting up scinstall:
    Cluster Transport Adapters and Cables <<<You must identify the two cluster transport adapters which attach
    this node to the private cluster interconnect.
    For node "test1",
    What is the name of the first cluster transport adapter [vnet1]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    All transport adapters support the "dlpi" transport type. Ethernet
    and Infiniband adapters are supported only with the "dlpi" transport;
    however, other adapter types may support other types of transport.
    For node "test1",
    Is "vnet1" an Ethernet adapter (yes/no) [yes]?
    Is "vnet1" an Infiniband adapter (yes/no) [yes]? no
    For node "test1",
    What is the name of the second cluster transport adapter [vnet3]? vnet2
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test1",
    Name of the switch to which "vnet2" is connected [switch2]?
    For node "test1",
    Use the default port name for the "vnet2" connection (yes/no) [yes]?
    For node "test2",
    What is the name of the first cluster transport adapter [vnet1]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test2",
    Name of the switch to which "vnet1" is connected [switch1]?
    For node "test2",
    Use the default port name for the "vnet1" connection (yes/no) [yes]?
    For node "test2",
    What is the name of the second cluster transport adapter [vnet2]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test2",
    Name of the switch to which "vnet2" is connected [switch2]?
    For node "test2",
    Use the default port name for the "vnet2" connection (yes/no) [yes]?
    I have set up the configuration like this:
    ldm list -l nodename
    Node 1:
    NETWORK
    NAME SERVICE ID DEVICE MAC MODE PVID VID MTU LINKPROP
    vnet1 primary-vsw0@primary 0 network@0 00:14:4f:f9:61:63 1 1500
    vnet2 cluster-vsw0@primary 1 network@1 00:14:4f:f8:87:27 1 1500
    vnet3 cluster-vsw1@primary 2 network@2 00:14:4f:f8:f0:db 1 1500
    ldm list -l nodename
    Node 2:
    NETWORK
    NAME SERVICE ID DEVICE MAC MODE PVID VID MTU LINKPROP
    vnet1 primary-vsw0@primary 0 network@0 00:14:4f:f9:a1:68 1 1500
    vnet2 cluster-vsw0@primary 1 network@1 00:14:4f:f9:3e:3d 1 1500
    vnet3 cluster-vsw1@primary 2 network@2 00:14:4f:fb:03:83 1 1500
    ldm list-services
    VSW
    NAME LDOM MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    primary-vsw0 primary 00:14:4f:f9:25:5e e1000g0 0 switch@0 1 1 1500 on
    cluster-vsw0 primary 00:14:4f:fb:db:cb 1 switch@1 1 1 1500 sc on
    cluster-vsw1 primary 00:14:4f:fa:c1:58 2 switch@2 1 1 1500 sc on
    ldm list-bindings primary
    VSW
    NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    primary-vsw0 00:14:4f:f9:25:5e e1000g0 0 switch@0 1 1 1500 on
    PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
    vnet1@gitserver 00:14:4f:f8:c0:5f 1 1500
    vnet1@racc2 00:14:4f:f8:2e:37 1 1500
    vnet1@test1 00:14:4f:f9:61:63 1 1500
    vnet1@test2 00:14:4f:f9:a1:68 1 1500
    NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    cluster-vsw0 00:14:4f:fb:db:cb 1 switch@1 1 1 1500 sc on
    PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
    vnet2@test1 00:14:4f:f8:87:27 1 1500
    vnet2@test2 00:14:4f:f9:3e:3d 1 1500
    NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
    cluster-vsw1 00:14:4f:fa:c1:58 2 switch@2 1 1 1500 sc on
    PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
    vnet3@test1 00:14:4f:f8:f0:db 1 1500
    vnet3@test2 00:14:4f:fb:03:83 1 1500
    Any ideas, team? I believe the cluster interconnect adapters were not set up successfully.
    I need any guidance or clue on how to correct the private interconnect for clustering between the two guest LDoms.

    You don't have to stick to the default IPs or subnet. You can change to whatever IPs you need, whatever subnet mask you need, and even change the private hostnames.
    You can do all this during install or even after install.
    Read the cluster install doc at docs.sun.com
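    When a join fails like this, it usually pays to confirm which guest vnet instance actually sits on each mode=sc switch, and that both paths pass traffic, before rerunning scinstall; a minimal sketch (domain and interface names from the post above, the test addresses are made up):
    # on the primary domain: list each guest's vnets, their MACs and switches
    ldm list -o network test1
    ldm list -o network test2
    # inside each guest: match those MACs against the local vnet instances
    dladm show-dev
    # plumb the two interconnect candidates with throwaway addresses and ping
    # the peer on each path before letting scinstall claim them
    ifconfig vnet1 plumb 192.168.111.1 netmask 255.255.255.0 up
    ifconfig vnet2 plumb 192.168.112.1 netmask 255.255.255.0 up
    ping 192.168.111.2
    ping 192.168.112.2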

  • Cluster interconnect on LDOM

    Hi,
    We want to set up Solaris Cluster in an LDoms environment.
    We have:
    - Primary domain
    - Alternate domain (Service Domain)
    So we want to set up the cluster interconnect from the primary domain and the service domain, with a configuration like the one below:
    example:
    ldm add-vsw net-dev=net3 mode=sc private-vsw1 primary
    ldm add-vsw net-dev=net7 mode=sc private-vsw2 alternate
    ldm add-vnet private-net1 mode=hybrid private-vsw1 ldg1
    ldm add-vnet private-net2 mode=hybrid private-vsw2 ldg1
    Is the configuration above supported?
    If there is any documentation about this, please point me to it.
    Thanks,

    Hi rachfebrianto,
    yes, the commands look good. The minimum requirement to use hybrid I/O is Solaris Cluster 3.2u3, but I guess you are running 3.3 or 4.1 anyway.
    The mode=sc is a requirement on the vsw for Solaris Cluster interconnect (private network).
    And it is supported to add mode=hybrid to guest LDom for the Solaris Cluster interconnect.
    There is no special documentation for Solaris Cluster because it uses what is available in the
    Oracle VM Server for SPARC 3.1 Administration Guide
    Using NIU Hybrid I/O
    How to Configure a Virtual Switch With an NIU Network Device
    How to Enable or Disable Hybrid Mode
    Hth,
      Juergen
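    As a quick check after running those commands, a minimal sketch of verifying the result (domain and switch names taken from the example above):
    ldm list-services            # MODE column for private-vsw1/private-vsw2 should read "sc"
    ldm list -o network ldg1     # private-net1/private-net2 should be bound to those switches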

  • Bad cluster interconnections

    Hi,
    I've an Oracle Database 11g Release 11.1.0.6.0 - 64bit Production With the Real Application Clusters option.
    While running some checks, I noticed something wrong with the cluster interconnect configuration.
    This is the oifcfg getif output:
    eth0  10.81.10.0  global  public
    eth1  172.16.100.0  global  cluster_interconnect
    This is the same on both nodes, and it seems right, as 10.81.10.x is the public network and 172.16.100.x is the private network.
    But if I query the gv$cluster_interconnects, I get:
    SQL> select * from gv$cluster_interconnects;
    INST_ID NAME IP_ADDRESS IS_ SOURCE
    2 bond0 10.81.10.40 NO OS dependent software
    1 bond0 10.81.10.30 NO OS dependent software
    It seems the cluster interconnect is on the public network.
    Another piece of information that supports this is the traffic I can see on the network interfaces (using iptraf):
    NODE 1:
    lo: 629.80 kb/s
    eth0: 29983.60 kb/s
    eth1: 2.20 kb/s
    eth2: 0 kb/s
    eth3: 0 kb/s
    NODE 2:
    lo: 1420.60 kb/s
    eth0: 18149.60 kb/s
    eth1: 2.20 kb/s
    eth2: 0 kb/s
    eth3: 0 kb/s
    This is the bond configuration (the configuration is the same on both nodes):
    [node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
    [node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth1
    DEVICE=eth1
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond1
    SLAVE=yes
    [node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth2
    DEVICE=eth2
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond1
    SLAVE=yes
    [node01 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth3
    DEVICE=eth3
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
    Why is the oifcfg getif output different from the gv$cluster_interconnects view?
    Any suggestions on how to configure the interconnect correctly?
    Thanks in advance.
    Samuel

    As soon as I reboot the database I'll check it out.
    In the meantime, I took a snapshot of the database activity during the last hour (during which we suffered a little) and extracted the top 15 wait events (based on total wait time in seconds):
    Event / Total Wait Time (s)
    enq: TX - index contention     945
    gc current block 2-way     845
    log file sync     769
    latch: shared pool     729
    gc cr block busy     703
    buffer busy waits     536
    buffer deadlock     444
    gc current grant busy     415
    SQL*Net message from client     338,421
    latch free     316
    gc buffer busy release     242
    latch: cache buffers chains     203
    library cache: mutex X     181
    library cache load lock     133
    gc current grant 2-way     102
    Could some of those depend on that bad interconnect configuration?
    And these are the top 15 wait events based on % DB time:
    Event / % DB time
    db file sequential read     15,3
    library cache pin     13,72
    gc buffer busy acquire     7,16
    gc cr block 2-way     4,19
    library cache lock     2,64
    gc current block busy     2,59
    enq: TX - index contention     2,29
    gc current block 2-way     2,04
    log file sync     1,86
    latch: shared pool     1,76
    gc cr block busy     1,7
    buffer busy waits     1,3
    buffer deadlock     1,08
    gc current grant busy     1
    Thanks in advance
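    If the instances keep picking bond0, one common workaround is to pin the private addresses explicitly through the cluster_interconnects parameter; a minimal sketch, assuming 172.16.100.11/.12 as the node-private addresses and hypothetical SID names, followed by a rolling restart:
    sqlplus -s / as sysdba <<'EOF'
    -- one private address per instance; the SIDs below are placeholders
    alter system set cluster_interconnects='172.16.100.11' scope=spfile sid='ORCL1';
    alter system set cluster_interconnects='172.16.100.12' scope=spfile sid='ORCL2';
    EOF
    # restart the instances one at a time, then re-check gv$cluster_interconnects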

  • Cluster Interconnect information

    Hi,
    We have a two node cluster (10.2.0.3) running on top of Solaris and Veritas SFRAC.
    The cluster is working fine. I would like to get more information about the private cluster interconnect used by the Oracle Clusterware but the two places I could think of have shown nothing.
    SQL> show parameter cluster_interconnects;
    NAME TYPE VALUE
    cluster_interconnects string
    SQL> select * from GV$CLUSTER_INTERCONNECTS;
    no rows selected
    I wasn't expecting to see anything in the cluster_interconnects parameter but thought there would be something in the dictionary view.
    I'd be grateful if anyone could shed some light on this.
    Where can I get information about the currently configured private interconnect?
    I've yet to check the OCR, does anyone know which key/values are relevant, if any?
    Thanks
    user234564

    Try this:
    1. $ORA_CRS_HOME/bin/oifcfg getif
    eth0 1xx.xxx.x.0 global public
    eth1 192.168.0.0 global cluster_interconnect
    2. V$CONFIGURED_INTERCONNECTS;
    3. X$KSXPIA;
    HTH
    Thanks
    Chandra Pabba
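    A minimal sketch of those checks from sqlplus (run as SYSDBA, since X$ tables are only visible to SYS):
    sqlplus -s / as sysdba <<'EOF'
    set linesize 150
    select * from v$configured_interconnects;
    select * from x$ksxpia;
    EOF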

  • Cluster interconnect using listening port 8059 + 8060

    Hello,
    I have 4 Tomcat instances running in a zone, which are fronted by an Apache via JkMount directives.
    One of the Tomcats runs on port 8050, with a listener set up on 8059.
    I was wondering why this was the only port Apache wasn't picking up. A tail -f of catalina.out shows that port 8059 was busy, so Tomcat tries to bind the listener to 8060... also busy, so it finally binds to 8061.
    Ports 8059 and 8060 are being used by the cluster interconnects, as shown in the netstat output below:
    *.*                  *.*                  0      0 49152      0 IDLE
    localhost.5999       *.*                  0      0 49152      0 LISTEN
    *.scqsd              *.*                  0      0 49152      0 LISTEN
    *.scqsd              *.*                  0      0 49152      0 LISTEN
    *.8059               *.*                  0      0 49152      0 LISTEN
    *.8060               *.*                  0      0 49152      0 LISTEN
    *.*                  *.*                  0      0 49152      0 IDLE
    *.sunrpc             *.*                  0      0 49152      0 LISTEN
    *.*                  *.*                  0      0 49152      0 IDLE
    localhost.5987       *.*                  0      0 49152      0 LISTEN
    localhost.898        *.*                  0      0 49152      0 LISTEN
    localhost.32781      *.*                  0      0 49152      0 LISTEN
    localhost.5988       *.*                  0      0 49152      0 LISTEN
    localhost.32782      *.*                  0      0 49152      0 LISTEN
    *.ssh                *.*                  0      0 49152      0 LISTEN
    *.32783              *.*                  0      0 49152      0 LISTEN
    *.32784              *.*                  0      0 49152      0 LISTEN
    *.sccheckd           *.*                  0      0 49152      0 LISTEN
    *.32785              *.*                  0      0 49152      0 LISTEN
    *.servicetag         *.*                  0      0 49152      0 LISTEN
    localhost.smtp       *.*                  0      0 49152      0 LISTEN
    localhost.submission *.*                  0      0 49152      0 LISTEN
    *.32798              *.*                  0      0 49152      0 LISTEN
    *.pnmd               *.*                  0      0 49152      0 LISTEN
    *.32811              *.*                  0      0 49152      0 BOUND
    localhost.6788       *.*                  0      0 49152      0 LISTEN
    localhost.6789       *.*                  0      0 49152      0 LISTEN
    scmars.ssh           161.228.79.36.54693  65180 51 49640      0 ESTABLISHED
    localhost.32793      *.*                  0      0 49152      0 LISTEN
    172.16.1.1.35136     172.16.1.2.8059      49640  0 49640      0 ESTABLISHED
    172.16.1.1.35137     172.16.1.2.8060      49640  0 49640      0 ESTABLISHED
    172.16.1.1.35138     172.16.1.2.8060      49640  0 49640      0 ESTABLISHED
    172.16.1.1.35139     172.16.1.2.8060      49640  0 49640      0 ESTABLISHED
    172.16.1.1.35140     172.16.1.2.8060      49640  0 49640      0 ESTABLISHED
    172.16.0.129.35141   172.16.0.130.8059    49640  0 49640      0 ESTABLISHED
    172.16.0.129.35142   172.16.0.130.8060    49640  0 49640      0 ESTABLISHED
    172.16.0.129.35143   172.16.0.130.8060    49640  0 49640      0 ESTABLISHED
    172.16.0.129.35144   172.16.0.130.8060    49640  0 49640      0 ESTABLISHED
    172.16.0.129.35145   172.16.0.130.8060    49640  0 49640      0 ESTABLISHED
    My question is: how can I modify the ports used by the cluster interconnects, as I would like to keep port 8059 as the Tomcat listener port?
    Any help is appreciated.
    Thanks!

    Hi,
    unfortunately the ports used by Sun Cluster are hard-wired, so you must change your Tomcat port.
    Sorry for the bad news
    Detlef
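    Since the cluster side cannot move, the change has to be made in Tomcat's server.xml; a minimal sketch, where the instance path and the replacement port 8069 are assumptions:
    TOMCAT_CONF=/opt/tomcat1/conf/server.xml            # hypothetical instance path
    grep -n 'port="80[56][0-9]"' "$TOMCAT_CONF"         # find listeners/connectors on the clashing ports
    cp "$TOMCAT_CONF" "$TOMCAT_CONF.bak"
    sed 's/port="8059"/port="8069"/' "$TOMCAT_CONF.bak" > "$TOMCAT_CONF"
    # restart that Tomcat instance and confirm with: netstat -an | grep 8069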

  • Oracle10g RAC Cluster Interconnect issues

    Hello Everybody,
    Just a brief overview of what I am currently doing: I have installed an Oracle 10g RAC database on a cluster of two Windows 2000 AS nodes. These two nodes access an external SCSI hard disk, and I have used Oracle Cluster File System.
    Currently I am facing some performance issues when it comes to balancing the workload across both nodes (a single-instance load is faster than a parallel load using two database instances).
    I feel the performance issues could be due to IPC using the public Ethernet IP instead of the private interconnect.
    (During a parallel load, a large number of packets are sent over the public IP and not the private interconnect.)
    How can I be sure that the private interconnect is used for transferring cluster traffic and not the public IP? (Oracle states that for an Oracle 10g RAC database, the private IP should be used for the heartbeat as well as for transferring cluster traffic.)
    Thanks in advance,
    Regards,
    Salil

    You find the answers here:
    RAC: Frequently Asked Questions
    Doc ID: NOTE:220970.1
    At the very least, a crossover-cable interconnect is completely unsupported.
    Werner
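    One way to see which address the instance is actually using for cluster IPC is an oradebug dump; a minimal sketch, run as SYSDBA on each node (the trace file lands in user_dump_dest):
    sqlplus -s / as sysdba <<'EOF'
    oradebug setmypid
    oradebug ipc
    oradebug tracefile_name
    EOF
    # the trace file named above records the interface/IP the instance bound for
    # cluster traffic; it should be the private address, not the public one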

  • Cluster interconnect

    We have a 3-node RAC cluster, version 10.2.0.3. Our sysadmin is gearing up to change the
    1 Gigabit interconnect to a 10 Gigabit interconnect. I am just trying to find out whether there is anything we need to be prepared for, from the database or cluster point of view.
    Thanks

    riyaj wrote:
    "But, if the protocol is not RDS, then the path becomes udp -> ip -> IPoIB -> HCA. Clearly, there is an additional layer, IPoIB. Considering that most latency is at the software layer, not in the hardware layer, I am not sure an additional layer will improve the latency."
    Perhaps when one compares 10GigE with 10Gb IB... but that would be comparing new Ethernet technology with older IB technology. QDR (40Gb) has pretty much been the standard for IB for some years now.
    Originally we compared 1GigE with 10Gb IB, as 10GigE was not available. IPoIB was a lot faster on SDR IB than 1GigE.
    When 10GigE was released, it was pretty expensive (not sure if this is still the case). A 10Gb Ethernet port was more than 1.5x the cost of 40Gb IB port.
    IB also supports a direct socket (or something) for IP applications. As I understand, this simplifies the call interface and allows socket calls to be made with less latency (surpassing that of the socket interface of a standard IP stack on Ethernet). We never looked at this ourselves as our Interconnect using IB was pretty robust and performant using standard IPoIB.
    "Further, InfiniBand has data center implications. In huge companies, this is a problem: a separate InfiniBand architecture is needed to support an InfiniBand network, which is not exactly a mundane task. With 10Gb NIC cards, existing network infrastructure can be used as long as the switch supports the 10Gb traffic."
    True... but I see that more as resistance to new technology, and even the network vendor used (do not have to name names, do I?) will specifically slam IB technology as they do not supply IB kit. A pure profit and territory issue.
    All resistance I've ever seen and responded to with IB versus Ethernet has been pretty much unwarranted - to the extent of seeing RACs being built using a 100Mb Ethernet interconnect because IB was a "foreign" technology and equated to evil/do not use/complex/unstable/etc.
    Another issue to keep in mind is that IB is a fabric layer. SRP scales and performs better than using fibre channel technology and protocols. So IB is not only suited as Interconnect, but also as the storage fabric layer. (Exadata pretty much proved that point).
    Some months ago OFED announced that the SRP specs have been made available to Ethernet vendors to implement (as there is nothing equivalent on Ethernet). Unfortunately, a 10Gig Ethernet SRP implementation will still lack in comparison with a QDR IB SRP implementation.
    "Third point, the skill set on the system admin side needs further adjustment to support InfiniBand hardware effectively."
    Important point. But it is not that difficult for a sysadmin to acquire the basic set of skills to manage IB from an O/S perspective. Likewise, it is not that difficult for a network engineer to acquire the basic skills for managing the switch and fabric layer.
    The one issue that I think is the single biggest negative in terms of using IB is getting a stable OFED driver stack running in the kernel. Some of the older versions were not that stable. However, later versions have improved considerably and the current version seems pretty robust. Oh yeah - this is specifically for SRP. IPoIB, bonding and so on have always worked pretty well. RDMA and SRP were not always that stable with the v1.3 drivers and earlier.

  • Solaris Cluster Interconnect over Layer 3 Link

    Is it possible to connect two Solaris Cluster nodes over a pure layer 3 (TCP/UDP) network connection over a distance of about 10 km?

    The problem with having just single node clusters is that, effectively, any failure on the primary site will require invocation of the DR process rather than just a local fail-over. Remember that Sun Cluster and Sun Cluster Geographic Edition are aimed at different problems: high availability and disaster recovery respectively. You seem to be trying to combine the two and this is a bad thing IMHO. (See the Blueprint http://www.sun.com/blueprints/0406/819-5783.html)
    I'm assuming that you have some leased bandwidth between these sites and hence the pure layer 3 networking.
    What do you actually want to achieve: HA or DR? If it's both, you will probably have to make compromises either in cost or expectations.
    Regards,
    Tim
    ---

  • Who has to solve?

    Who needs to solve a load-balancing issue across two RAC instances - is it a DBA-level or an architect-level task?
    And who decides the value of sga_target - the architect or the DBA?

    1. RAC DBA is responsible.
    Oracle RAC job role best practices
    There is a perpetual conflict between systems administrators (SAs), who traditionally manage servers and disks, and the RAC DBAs who are responsible for managing the RAC database. There are also clearly defined job roles for network administrators, who are especially challenged in a RAC database environment to manage the cluster interconnect and packet shipping between servers.
    If your DBA is going to be held responsible for the performance of the RAC database, then it’s only fair that he be given root access to the servers and disk storage subsystem. However, not every DBA will have the required computer science skills to manage a complex server and SAN environment, so each shop makes this decision on a case-by-case basis.
    Source:http://searchoracle.techtarget.com/news/2240016536/Understanding-Oracle-Real-Application-Clusters-RAC-best-practices
    2. sga_target: since it is related to the SGA, it is solely the DBA's responsibility. If there is a need to increase RAM, or if CPU usage or I/O issues turn out to be OS problems, then the SA is responsible.
    Regards
    Girish Sharma

  • Gig Ethernet V/S  SCI as Cluster Private Interconnect for Oracle RAC

    Hello Gurus
    Can anyone please confirm whether it's possible to configure 2 or more Gigabit Ethernet interconnects (Sun Cluster 3.1 private interconnects) on an E6900 cluster?
    It's for a high-availability requirement of Oracle 9i RAC. I need to know:
    1) Can I use gigabit ethernet as the private cluster interconnect for deploying Oracle RAC on an E6900?
    2) What is the recommended private cluster interconnect for Oracle RAC? Gigabit ethernet, or SCI with RSM?
    3) How about scenarios where one has, say, 3 x gigabit ethernet versus 2 x SCI as the cluster's private interconnects?
    4) How does the interconnect traffic get distributed amongst the multiple gigabit ethernet interconnects (for Oracle RAC), and does anything need to be done at the Oracle RAC level for Oracle to recognise that there are multiple interconnect cards and start utilizing all of the gigabit ethernet interfaces for transferring packets?
    5) What would happen to Oracle RAC if one of the gigabit ethernet private interconnects fails?
    I have tried searching for this info but could not locate any doc that precisely clarifies these doubts.
    Thanks for the patience.
    Regards,
    Nilesh

    Answers inline...
    Tim
    "Can anyone please confirm whether it's possible to configure 2 or more Gigabit Ethernet interconnects (Sun Cluster 3.1 private interconnects) on an E6900 cluster?"
    Yes, absolutely. You can configure up to 6 NICs for the private networks. Traffic is automatically striped across them if you specify clprivnet0 to Oracle RAC (9i or 10g). That covers TCP connections and UDP messages.
    "1) Can I use gigabit ethernet as the private cluster interconnect for deploying Oracle RAC on an E6900?"
    Yes, definitely.
    "2) What is the recommended private cluster interconnect for Oracle RAC? Gigabit ethernet, or SCI with RSM?"
    SCI is, or is in the process of being, EOL'ed. Gigabit is usually sufficient. Longer term you may want to consider InfiniBand or 10 Gigabit ethernet with RDS.
    "3) How about scenarios where one has, say, 3 x gigabit ethernet versus 2 x SCI as the cluster's private interconnects?"
    I would still go for 3 x GbE because it is usually cheaper and will probably work just as well. The latency and bandwidth differences are often masked by the performance of the software higher up the stack. In short, unless you have tuned the heck out of your application and just about everything else, don't worry too much about the difference between GbE and SCI.
    "4) How does the interconnect traffic get distributed amongst the multiple gigabit ethernet interconnects, and does anything need to be done at the Oracle RAC level for Oracle to recognise and use all of the interfaces?"
    You don't need to do anything at the Oracle level. That's the beauty of using Oracle RAC with Sun Cluster as opposed to RAC on its own. The striping takes place automatically and transparently behind the scenes.
    "5) What would happen to Oracle RAC if one of the gigabit ethernet private interconnects fails?"
    It's completely transparent. Oracle will never see the failure.
    "Have tried searching for this info but could not locate any doc that precisely clarifies these doubts..."
    This is all covered in a paper that I have just completed and which should be published after Christmas. Unfortunately, I cannot give out the paper yet.
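    For reference, the striped private interface that Sun Cluster presents to RAC can be inspected directly on each node; a short sketch (the address shown is assigned by the cluster, not something to configure by hand):
    ifconfig clprivnet0    # the pseudo-interface that stripes across the private NICs
    # this is the address RAC ends up using for the interconnect when clprivnet0
    # is specified, as described in the answer above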

  • Cluster Private Interconnect

    Hi,
    Does the Global Cache Service work only when the cluster private interconnect is configured? I am not seeing any data in v$cache_transfer. The cluster_interconnects parameter is blank and the V$CLUSTER_INTERCONNECTS view is missing. Please let me know.
    Thanks,
    Madhav

    Hi,
    If you want to use a specific interconnect IP, you can set it in the cluster_interconnects parameter.
    If it is blank, you are using the single default interconnect, so there is no need to worry that it is blank.
    Rgds

  • Cisco ISE 1.3 MAB authentication - switch dropping packets

    Hello All,
    I have a C3560 switch running C3560-IPSERVICESK9-M, Version 12.2(55)SE9, RELEASE SOFTWARE (fc1),
    and ISE version 1.3.
    MAB authentication is working perfectly at the ISE end, but looking at the switch end I can see the switch dropping packets on some ports,
    while other ports are working perfectly.
    The same switch configuration works perfectly on another switch without any issue.
    Switch configuration below for your suggestions:
    aaa new-model
    aaa authentication fail-message ^C
    **** Either ACS or ISE is DOWN / Use ur LOCAL CREDENTIALS / Thank You ****
    ^C
    aaa authentication login CONSOLE local
    aaa authentication login ACS group tacacs+ group radius local
    aaa authentication dot1x default group radius
    aaa authorization config-commands
    aaa authorization commands 0 default group tacacs+ local
    aaa authorization commands 1 default group tacacs+ local
    aaa authorization commands 15 default group tacacs+ local
    aaa authorization network default group radius
    aaa accounting dot1x default start-stop group radius
    aaa accounting exec default start-stop group tacacs+
    aaa accounting commands 0 default start-stop group tacacs+
    aaa accounting commands 15 default start-stop group tacacs+
    aaa accounting network default start-stop group tacacs+
    aaa accounting connection default start-stop group tacacs+
    aaa accounting system default start-stop group tacacs+ group radius
    aaa server radius dynamic-author
     client 172.16.95.x server-key 7 02050D480809
     client 172.16.95.x server-key 7 14141B180F0B
    aaa session-id common
    clock timezone IST 5 30
    system mtu routing 1500
    ip routing
    no ip domain-lookup
    ip domain-name EVS.com
    ip device tracking
    epm logging
    dot1x system-auth-control
    interface FastEthernet0/1
     switchport access vlan x
     switchport mode access
     switchport voice vlan x
     authentication event fail action next-method
     authentication host-mode multi-auth
     authentication order mab dot1x
     authentication priority mab dot1x
     authentication port-control auto
     authentication violation restrict
     mab
     snmp trap mac-notification change added
     snmp trap mac-notification change removed
     dot1x pae authenticator
     dot1x timeout tx-period 10
     spanning-tree portfast
    ip tacacs source-interface Vlan10
    ip radius source-interface Vlan10 vrf default
    logging trap critical
    logging origin-id ip
    logging 172.16.5.95
    logging host 172.16.95.x transport udp port 20514
    logging host 172.16.95.x transport udp port 20514
    snmp-server group SNMP-Group v3 auth read EVS-view notify *tv.FFFFFFFF.FFFFFFFF.FFFFFFFF.FFFFFFFF7F access 15
    snmp-server view EVS-view internet included
    snmp-server community S1n2M3p4$ RO
    snmp-server community cisco RO
    snmp-server trap-source Vlan10
    snmp-server source-interface informs Vlan10
    snmp-server enable traps snmp authentication linkdown linkup coldstart warmstart
    snmp-server enable traps tty
    snmp-server enable traps cluster
    snmp-server enable traps entity
    snmp-server enable traps cpu threshold
    snmp-server enable traps vtp
    snmp-server enable traps vlancreate
    snmp-server enable traps vlandelete
    snmp-server enable traps flash insertion removal
    snmp-server enable traps port-security
    snmp-server enable traps envmon fan shutdown supply temperature status
    snmp-server enable traps config-copy
    snmp-server enable traps config
    snmp-server enable traps bridge newroot topologychange
    snmp-server enable traps stpx inconsistency root-inconsistency loop-inconsistency
    snmp-server enable traps syslog
    snmp-server enable traps mac-notification change move threshold
    snmp-server enable traps vlan-membership
    snmp-server host 172.16.95.x version 2c cisco
    snmp-server host 172.16.95.x version 2c cisco
    snmp-server host 172.16.5.x version 3 auth evsnetadmin
    tacacs-server host 172.16.5.x key 7 0538571873651D1D4D26421A4F
    tacacs-server directed-request
    tacacs-server key 7 107D580E573E411F58277F2360
    tacacs-server administration
    radius-server attribute 6 on-for-login-auth
    radius-server attribute 25 access-request include
    radius-server host 172.16.95.y auth-port 1812 acct-port 1813 key 7 060506324F41
    radius-server host 172.16.95.x auth-port 1812 acct-port 1813 key 7 110A1016141D
    radius-server host 172.16.95.y auth-port 1645 acct-port 1646 key 7 110A1016141D
    radius-server host 172.16.95.x auth-port 1645 acct-port 1646 key 7 070C285F4D06
    radius-server timeout 2
    radius-server key 7 060506324F41
    radius-server vsa send accounting
    radius-server vsa send authentication
    line con 0
     exec-timeout 5 0
     privilege level 15
     logging synchronous
     login authentication CONSOLE
    line vty 0 4
     access-class telnet_access in
     exec-timeout 0 0
     logging synchronous
     login authentication ACS
     transport input ssh

     24423  ISE has not been able to confirm previous successful machine authentication  
    Judging by that line and what your policy says, it appears that your authentication was rejected because your machine was not authenticated prior to this connection.
    The first thing to check is whether MAR has been enabled on the identity source; the second is whether your machine is set to send a certificate for authentication. There are other things you can look at, but I'd do those two first.
    Log off and on, or reboot, and then see if you at least get a failed machine auth on the Operations > Authentications page, and we can go from there.
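    On the switch itself, comparing a working port with a failing one also narrows things down; a minimal sketch of exec commands on this IOS release (the interface is an example):
    show authentication sessions interface FastEthernet0/1
    show authentication sessions
    show aaa servers
    debug radius authentication     ! keep this brief, then turn it off with "undebug all"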
