Gigabit Ethernet vs. SCI as Cluster Private Interconnect for Oracle RAC

Hello Gurus,
Can anyone please confirm whether it's possible to configure 2 or more Gigabit Ethernet interconnects (Sun Cluster 3.1 private interconnects) on an E6900 cluster?
It's for a high-availability requirement of Oracle 9i RAC. I need to know:
1) Can I use Gigabit Ethernet as the private cluster interconnect for deploying Oracle RAC on an E6900?
2) What is the recommended private cluster interconnect for Oracle RAC: Gigabit Ethernet or SCI with RSM?
3) How about scenarios where one has, say, 3 x Gigabit Ethernet vs. 2 x SCI as the cluster's private interconnects?
4) How does the interconnect traffic get distributed among the multiple Gigabit Ethernet interconnects (for Oracle RAC), and does anything need to be done at the Oracle RAC level so that Oracle recognises the multiple interconnect cards and uses all of the Gigabit Ethernet interfaces for transferring packets?
5) What would happen to Oracle RAC if one of the Gigabit Ethernet private interconnects fails?
I have tried searching for this information but could not locate any document that precisely clarifies these doubts.
Thanks for your patience.
Regards,
Nilesh

Answers inline...
Tim
> Can anyone please confirm whether it's possible to configure 2 or more Gigabit Ethernet interconnects (Sun Cluster 3.1 private interconnects) on an E6900 cluster?
Yes, absolutely. You can configure up to 6 NICs for the private networks. Traffic is automatically striped across them if you specify clprivnet0 to Oracle RAC (9i or 10g). This applies to both TCP connections and UDP messages.
> It's for a high-availability requirement of Oracle 9i RAC. I need to know:
> 1) Can I use Gigabit Ethernet as the private cluster interconnect for deploying Oracle RAC on an E6900?
Yes, definitely.
> 2) What is the recommended private cluster interconnect for Oracle RAC: Gigabit Ethernet or SCI with RSM?
SCI is, or is in the process of being, EOL'ed. Gigabit is usually sufficient. Longer term you may want to consider Infiniband or 10 Gigabit Ethernet with RDS.
> 3) How about scenarios where one has, say, 3 x Gigabit Ethernet vs. 2 x SCI as the cluster's private interconnects?
I would still go for 3 x GbE because it is usually cheaper and will probably work just as well. The latency and bandwidth differences are often masked by the performance of the software higher up the stack. In short, unless you have tuned the heck out of your application and just about everything else, don't worry too much about the difference between GbE and SCI.
> 4) How does the interconnect traffic get distributed among the multiple Gigabit Ethernet interconnects (for Oracle RAC), and does anything need to be done at the Oracle RAC level so that Oracle recognises the multiple interconnect cards and uses all of the Gigabit Ethernet interfaces for transferring packets?
You don't need to do anything at the Oracle level. That's the beauty of using Oracle RAC with Sun Cluster as opposed to RAC on its own. The striping takes place automatically and transparently behind the scenes.
> 5) What would happen to Oracle RAC if one of the Gigabit Ethernet private interconnects fails?
It's completely transparent. Oracle will never see the failure.
> I have tried searching for this information but could not locate any document that precisely clarifies these doubts.
This is all covered in a paper that I have just completed and that should be published after Christmas. Unfortunately, I cannot give out the paper yet.
> Thanks for your patience.
> Regards,
> Nilesh
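
To make the striping and transparent failover described in Tim's answers more concrete, here is a small toy model in Python. It is only an illustrative sketch, not Sun Cluster's actual algorithm: the class name, the link names (ge0, ge1, ge2) and the round-robin policy are all assumptions made for the example. The idea it shows is that the caller sends through one logical interface (the role clprivnet0 plays) and never needs to know which physical link carried a message or whether one has failed.

```python
class StripedInterconnect:
    """Toy model: spread messages round-robin over healthy private links.

    Hypothetical illustration only; real cluster software decides this
    inside the kernel, and its policy may differ from plain round-robin.
    """

    def __init__(self, links):
        if not links:
            raise ValueError("at least one link is required")
        self.links = list(links)
        self.healthy = set(self.links)
        self._next = 0  # index of the next link to try

    def fail(self, link):
        """Mark a link as failed; later sends skip it."""
        self.healthy.discard(link)

    def restore(self, link):
        """Mark a known link as usable again."""
        if link in self.links:
            self.healthy.add(link)

    def send(self, message):
        """Pick the next healthy link for this message and return its name."""
        for _ in range(len(self.links)):
            link = self.links[self._next % len(self.links)]
            self._next += 1
            if link in self.healthy:
                return link
        raise RuntimeError("no healthy private interconnect left")


if __name__ == "__main__":
    ic = StripedInterconnect(["ge0", "ge1", "ge2"])  # hypothetical NIC names
    print([ic.send(n) for n in range(4)])
    ic.fail("ge1")  # the caller's code does not change after a failure
    print([ic.send(n) for n in range(4)])
```

In this sketch the sender only sees an error when every link is down, which mirrors the claim that a single interconnect failure is invisible to Oracle.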

Similar Messages

  • Copper cable / GigE Copper Interface as Private Interconnect for Oracle RAC

    Hello Gurus
    Can someone confirm whether copper cables (Cat5/RJ45) can be used for Gigabit Ethernet, i.e. the private interconnects, when deploying Oracle RAC 9.x or 10gR2 on Solaris 9/10?
    I am planning to use 2 x GigE interfaces (one port each from X4445 Quad Port Ethernet adapters) and to connect them using copper cables. All the documents I came across refer to fiber cables for private interconnects connecting GigE interfaces, so I am getting a bit confused.
    I would appreciate it if someone could throw some light on this.
    Regards,
    Nilesh Naik
    Thanks

    Cat5/RJ45 can be used for Gigabit Ethernet private interconnects for Oracle RAC. I would recommend trunking the two or more interconnects for redundancy. The X4445 adapters are compatible with the Sun Trunking 1.3 software (http://www.sun.com/products/networking/ethernet/suntrunking/). If your NIC drivers support the Nemo framework (bge, e1000g, xge, nge, rge, ixgb), you can use the Solaris 10 trunking tool, dladm.
    We have a couple of Sun T2000 servers and are using the onboard GigE ports for the Oracle 10gR2 RAC interconnects. We upgraded the onboard NIC drivers to e1000g and used the Solaris 10 trunking software. The next update of Solaris will have the e1000g drivers as the default for the Sun T2000 servers.

  • Oracle Cluster private Interconnect

    What are the different speeds and technologies with which we can configure the Oracle private interconnect for RAC 11g?

    ghd wrote:
    > What are the different speeds and technologies with which we can configure the Oracle private interconnect for RAC 11g?
    The recommended technology (looking at what Oracle's Database Machine uses) is QDR (Quad Data Rate, 40 Gb/s) Infiniband, using RDS (Reliable Datagram Sockets). According to Oracle testing, this provides 50% faster cache-to-cache block throughput with 50% less CPU time, compared with using UDP as the RAC interconnect wire protocol.
    Oracle presented these results to the Infiniband/OFED members in a presentation called "Oracle's Next-Generation Interconnect Protocol" (PDF).
    The Infiniband roadmap shows that NDR (Next Data Rate) will scale to 320 Gb/s.
    There is absolutely nothing I have seen from the Ethernet vendors that shows GigE matching Infiniband.
    In the Top 500 list of the biggest and fastest 500 clusters on this planet, Infiniband has a 41.8% share, compared with the 41.4% share of GigE.
    Compare this to 2005 (when we first got Infiniband for RAC). Back then Infiniband had a 3.2% share and GigE had a 42.8% share. So there has been incredible growth in the use of Infiniband as an interconnect, unlike GigE, which has been stagnant and is now in 2nd place as a Top 500 interconnect family.
    What is needed for using Infiniband for Oracle RAC? An HCA (Host Channel Adapter) card for each RAC server (high-speed PCI cards, dual port). An Infiniband switch (2 ports needed per RAC server). And cables, of course. All of these are sold by most server hardware vendors. Costs are quite comparable to 10 Gb/s Ethernet (and even cheaper) in my experience.

  • Cluster Private Interconnect

    Hi,
    Does Global Cache Services work only when Cluster Private Interconnect is configured? I am not seeing any data in v$cache_transfer. cluster_interconnects parameter is blank. V$CLUSTER_INTERCONNECTS view is missing. Please let me know.
    Thanks,
    Madhav

    Hi,
    If you want to use a specific interconnect IP, you can add it in the cluster_interconnects parameter.
    If it is blank, you are using the single default interconnect, so there is no need to worry about it being blank.
    Rds

  • Encountered ora-29701 during Sun Cluster for Oracle RAC 9.2.0.7 startup (UR

    Hi all,
    Need some help from all out there
    In our Sun Cluster 3.1 Data Service for Oracle RAC 9.2.0.7 (Solaris 9) configuration, my team had encountered
    ora-29701 *Unable to connect to Cluster Manager*
    during the startup of the Oracle RAC database instances on the Oracle RAC Server resources.
    We tried the attached workaround from Oracle. The workaround works well the first time, but it no longer works once the server is rebooted.
    Kindly let me know whether anyone has encountered the same problem and been able to resolve it. Thanks.
    Bug No. 4262155
    Filed 25-MAR-2005 Updated 11-APR-2005
    Product Oracle Server - Enterprise Edition Product Version 9.2.0.6.0
    Platform Linux x86
    Platform Version 2.4.21-9.0.1
    Database Version 9.2.0.6.0
    Affects Platforms Port-Specific
    Severity Severe Loss of Service
    Status Not a Bug. To Filer
    Base Bug N/A
    Fixed in Product Version No Data
    Problem statement:
    ORA-29701 DURING DATABASE CREATION AFTER APPLYING 9.2.0.6 PATCHSET
    *** 03/25/05 07:32 am ***
    TAR:
    PROBLEM:
    Customer applied 9.2.0.6 patchset over 9.2.0.4 patchset.
    While creating the database, customer receives following error:
         ORA-29701: unable to connect to Cluster Manager
    However, if customer goes from 9.2.0.4 -> 9.2.0.5 -> 9.2.0.6, the problem does not occur.
    DIAGNOSTIC ANALYSIS:
    It seems that the problem is with libskgxn9.so shared library.
    For 9.2.0.4 -> 9.2.0.5 -> 9.2.0.6, the install log shows the following:
    installActions2005-03-22_03-44-42PM.log:,
    [libskgxn9.so->%ORACLE_HOME%/lib/libskgxn9.so 7933 plats=1=>[46]langs=1=> en,fr,ar,bn,pt_BR,bg,fr_CA,ca,hr,cs,da,nl,ar_EG,en_GB,et,fi,de,el,iw,hu,is,in, it,ja,ko,es,lv,lt,ms,es_MX,no,pl,pt,ro,ru,zh_CN,sk,sl,es_ES,sv,th,zh_TW, tr,uk,vi]]
    installActions2005-03-22_04-13-03PM.log:, [libcmdll.so ->%ORACLE_HOME%/lib/libskgxn9.so 64274 plats=1=>[46] langs=-554696704=>[en]]
    For 9.2.0.4 -> 9.2.0.6, install log shows:
    installActions2005-03-22_04-13-03PM.log:, [libcmdll.so ->%ORACLE_HOME%/lib/libskgxn9.so 64274 plats=1=>[46] langs=-554696704=>[en]] does not exist.
    This means that while patching from 9.2.0.4 -> 9.2.0.5, Installer copies the libcmdll.so library into libskgxn9.so, while patching from 9.2.0.4 -> 9.2.0.6 does not.
    ORACM is located in /app/oracle/ORACM which is different than ORACLE_HOME in customer's environment.
    WORKAROUND:
    Customer is using the following workaround:
    cd $ORACLE_HOME/rdbms/lib
    make -f ins_rdbms.mk rac_on ioracle ipc_udp
    RELATED BUGS:
    Bug 4169291

    Check if following MOS note helps.
    Series of ORA-7445 Errors After Applying 9.2.0.7.0 Patchset to 9.2.0.6.0 Database (Doc ID 373375.1)

  • Why do we use reverse proxy for Oracle RAC Cluster setup

    Hello All,
    I got this question lately: "Why do we use a reverse proxy for an Oracle RAC cluster setup?" I know we use a reverse proxy at the middleware level for multiple security reasons.
    Thanks..

    "why do we use reverse proxy for Oracle RAC Cluster setup".
    I wouldn't. I wouldn't use a proxy of any sort for the Cluster Interconnect for sure.
    Cheers,
    Brian

  • Linux set up for Oracle RAC (real application cluster)

    Hi Guys,
    I am working as an Oracle DBA.
    I am very curious to know the basics of RAC setup at the OS level.
    Can anyone provide useful guidelines for this?
    Although I know all the steps at the OS level, I have never done the setup that precedes an Oracle RAC installation.
    I want to increase my knowledge on things like:
    --how we share storage.
    --how we set up the network (private & virtual IP) and how we can check that the NICs are working.
    --and other required things.
    I will appreciate your help, and also if someone wants to share personal experience.
    Thanks in advance.

    [email protected] wrote:
    > Want to increase knowledge on things like:
    Here are very basic answers to very complex questions, from a pure Linux perspective running an Open Source stack and an untainted kernel.
    > --how we share storage.
    Using multipath. This should ship with most 2.6 kernels. The kernel sees the shared storage LUNs as SCSI devices, and multipath does the rest (and ASM can directly use a multipath device).
    On the physical layer, a typical setup (on a RAC node) uses an HBA PCI card that runs fibre connections into a SAN switch. You can also use Infiniband (IB) as the I/O layer (as Oracle's Exadata database machine does). In this case the servers use HCA PCI cards and run IB cables into the switch, and the storage array likewise runs an IB cable into the switch.
    > --how we set up network (private & virtual IP) and how can check working of NIC's.
    That depends on the architecture chosen as the Interconnect. Typical choices are GigE or Infiniband (IB). Oracle's Exadata database machine (RAC) uses IB, as already mentioned (and it is also our preferred Interconnect technology).
    With IB you would use the OFED driver stack and have a range of ib.. commands available. These can be used to configure IP over IB (IPoIB) for use as an IP-based Interconnect, to bond NICs, to check a port's status, and so on.
    > --and other required things.
    As both Daniel and Hans indicated, you are asking quite complex questions that would require a manual (if not several) to be written in response. So it is best to refer to the manuals and OTN material available.
    Also, if you and your company are serious about using RAC, then you should make use of Oracle's RAC Assurance group to assist you. They will provide you with starter kit information for the o/s selected. They will check every single configuration parameter afterwards and deliver a comprehensive report on what's wrong, what works and what doesn't. With recommended changes that need to be done.

  • How to setup private network in oracle rac

    Hi all,
    I am trying to set up a 2-node Oracle RAC,
    and I am now stuck at setting up the private network.
    How do I set up the private network, and what do I have to do for that?
    Please help by providing a step-by-step process.

    The loop is nothing but a network cable connecting two nodes on the same port with private IP addresses (something like 10.0.0.1 and 10.0.0.2) that are not accessible by any other machine in the network (except the 2 nodes, obviously).
    Note that cross over cables are not supported for the Cluster Interconnect. And cross over cables limit the cluster to only 2 nodes, which may not be enough for many RAC deployments.
    Cheers,
    Brian
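
As a quick sanity check on the private addresses mentioned above, the following Python sketch uses the standard-library ipaddress module to confirm that two interconnect addresses are well formed, fall in a private range, and sit in the same subnet. The function name and the /24 prefix length are assumptions chosen for illustration; use whatever netmask your private network really has.

```python
import ipaddress


def check_private_pair(addr_a, addr_b, prefix_len=24):
    """Check two interconnect addresses (prefix_len is an assumed netmask).

    Raises ValueError if either string is not a valid IP address.
    """
    a = ipaddress.ip_address(addr_a)
    b = ipaddress.ip_address(addr_b)
    # strict=False lets us derive the network from a host address
    net = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return {
        "both_private": a.is_private and b.is_private,
        "same_subnet": b in net,
        "distinct": a != b,
    }


if __name__ == "__main__":
    print(check_private_pair("10.0.0.1", "10.0.0.2"))
    try:
        check_private_pair("10.0.01", "10.0.0.2")  # malformed address
    except ValueError as exc:
        print("rejected:", exc)
```

A malformed address such as 10.0.01 is rejected with a ValueError before any subnet comparison is made.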

  • Veritas required for Oracle RAC on Sun Cluster v3?

    Hi,
    We are planning a 2 node Oracle 9i RAC cluster on Sun Cluster 3.
    Can you please explain these 2 questions?
    1)
    If we have a hardware disk array RAID controller with LUNs etc, then why do we need to have Veritas Volume Manager (VxVM) if all the LUNS are configured at a hardware level?
    2)
    Do we need to have VxFS? All our Oracle database files will be on raw partitions.
    Thanks,
    Steve

    > We are planning a 2 node Oracle 9i RAC cluster on Sun Cluster 3.
    Good. This is a popular configuration.
    > Can you please explain these 2 questions?
    > 1) If we have a hardware disk array RAID controller with LUNs etc, then why do we need to have Veritas Volume Manager (VxVM) if all the LUNS are configured at a hardware level?
    VxVM is not required to run RAC. VxVM has an option (separately
    licensable) which is specifically designed for OPS/RAC. But if
    you have a highly reliable, multi-pathed, hardware RAID platform,
    you are not required to have VxVM.
    > 2) Do we need to have VxFS? All our Oracle database files will be on raw partitions.
    No.
    IMHO, simplicity is a good philosophy. Adding more software
    and layers into a highly available design will tend to reduce
    the availability. So, if you are going for maximum availability,
    you will want to avoid over-complicating the design. KISS.
    In the case of RAC, or Oracle in general, many people do use
    raw and Oracle has the ability to manage data in raw devices
    pretty well. Oracle 10g further improves along these lines.
    A tenet in the design of highly available systems is to keep
    the data management as close to the application as possible.
    Oracle, and especially 10g, are following this tenet. The only
    danger here is that they could try to get too clever, and end up
    following policies which are suboptimal as the underlying
    technologies change. But even in this case, the policy is
    coming from the application rather than the supporting platform.
    -- richard

  • Veritas Cluster 6 + Solaris 11 + Oracle RAC 11g2 = OCR trouble

    I am trying a new version of Veritas Cluster and Solaris.
    I experienced trouble with clustered VxFS for the OCR file.
    The Grid installer refuses VxFS with the message that the storage type is not supported.
    After installation I tried to add a new OCR file on VxFS and got these messages in crsd.log:
    ==============
    2013-01-14 11:05:38.898: [  OCROSD][26]utstoragetypecommon: Oracle Cluster Registry does not support the storage type configured. OCR can be configured on: ASM, NFS, Character Device, VxFS
    2013-01-14 11:05:38.898: [  OCROSD][26]utdvch:-1: New location /app/oracle/ocrvote2/2.ocr configured is not valid storage type. Return code [37].
    2013-01-14 11:05:38.898: [  OCRRAW][26]propriodvch: Error [8] returned device check for [app/oracle/ocrvote2/2.ocr]
    2013-01-14 11:05:38.898: [  OCRRAW][26]dev_replace: master could not verify the new disk (8)
    The file system is mounted on both nodes with the option mincache=direct.
    What can be the reason for this error?

    Trouble was solved by VCS patch 6.0.3
    Version 6.0.1 does not support Solaris 11.1

  • NICs for Private Interconnect redundancy

    DB/Grid version : 11.2.0.2
    Platform : AIX 6.1
    We are going to install a 2-node RAC on AIX (that thing which is almost as good as Solaris).
    Our primary private interconnect is:
    ### Primary Private Interconnect
    169.21.204.1      scnuprd186-privt1.mvtrs.net  scnuprd186-privt1
    169.21.204.4      scnuprd187-privt1.mvtrs.net  scnuprd187-privt1
    For cluster interconnect redundancy, the Unix team has attached an extra NIC to each node, with an extra Gigabit Ethernet switch for these NICs.
    ### Redundant Private Interconnect attached to the server
    169.21.204.2      scnuprd186-privt2.mvtrs.net  scnuprd186-privt2  # Node1's newly attached redundant NIC
    169.21.204.5      scnuprd187-privt2.mvtrs.net  scnuprd187-privt2  # Node2's newly attached redundant NIC
    (Example borrowed from citizen2's post.)
    Apparently I have 2 ways to implement cluster interconnect redundancy:
    Option 1. NIC bonding at OS level
    Option 2. Let the grid software do it
    Question 1. Which is better: Option 1 or 2?
    Question 2.
    Regarding Option 2.
    From googling and OTN, I gather that during grid installation you just provide 169.21.204.0 for the cluster interconnect and grid will identify the redundant NIC and switch. And if something goes wrong with the primary interconnect setup (shown above), grid will automatically re-route interconnect traffic using the redundant NIC setup. Is this correct?
    Question 3.
    My colleague tells me that for the redundant (Gigabit) switch, unless I configure some multicasting (AIX specific), I could get errors during installation. He doesn't clearly recall what it was. Has anyone faced a multicasting-related issue with this?

    Hi,
    My recommendation is to use AIX EtherChannel.
    AIX EtherChannel is much more powerful and stable compared with HAIP.
    See how to set up AIX EtherChannel on 10 Gigabit Ethernet interfaces:
    http://levipereira.wordpress.com/2011/01/26/setting-up-ibm-power-systems-10-gigabit-ethernet-ports-and-aix-6-1-etherchannel-for-oracle-rac-private-interconnectivity/
    If you choose to use HAIP, I recommend you read these notes and find all the notes about HAIP bugs on AIX:
    11gR2 Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip [ID 1210883.1]
    ASM Crashes as HAIP Does not Failover When Two or More Private Network Fails [ID 1323995.1]
    About multicasting, read this:
    Grid Infrastructure 11.2.0.2 Installation or Upgrade may fail due to Multicasting Requirement [ID 1212703.1]
    Regards,
    Levi Pereira

  • Oracle RAC Private Connection fail. what is the preferable node?

    Hello everyone! I need your help with Oracle RAC.
    I would like to know, when the private network in Oracle RAC goes down, which node in the cluster (a cluster of 2 nodes) is preferred to take over the cluster.
    Can I set or get a failover policy that points to one node in the cluster?
    Thank you in advance.

    Hi,
    This is based on my testing environment, where I use VMware to build the Oracle RAC.
    Because the ifdown command reacts differently, I now use the VMware "disconnect" feature on the network interface to simulate pulling the link out of the network interface.
    Please have a look at the actions I have taken, and correct me if I am wrong.
    I have two nodes: node1 "rac1" and node2 "rac2".
    1. I start CRS on node2 "rac2" so that node2 has the role of writing to the OCR.
    2. After node2 has started completely, I start CRS on node1 "rac1" to join the cluster.
    3. I disconnect the network link of the private network on node1, and I use ethtool to check whether a link is detected:
    [root@rac1 ~]# ethtool eth1 | grep Link
    Link detected: no
    [root@rac1 ~]# ifconfig eth1
    eth1 Link encap:Ethernet HWaddr 00:0C:29:6A:73:20
    inet addr:192.168.2.231 Bcast:192.168.2.255 Mask:255.255.255.0
    UP BROADCAST MULTICAST MTU:1500 Metric:1
    RX packets:2658671 errors:0 dropped:0 overruns:0 frame:0
    TX packets:2069398 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:1615768514 (1.5 GiB) TX bytes:985464556 (939.8 MiB)
    4. After some seconds, I checked the log file
    [cssd(9517)]CRS-1612:Network communication with node rac2 (2) missing for 50% of timeout interval. Removal of this node from cluster in 14.620 seconds
    2013-02-12 04:12:30.419
    [cssd(9517)]CRS-1611:Network communication with node rac2 (2) missing for 75% of timeout interval. Removal of this node from cluster in 6.610 seconds
    2013-02-12 04:12:34.436
    [cssd(9517)]CRS-1610:Network communication with node rac2 (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.590 seconds
    2013-02-12 04:12:37.036
    [cssd(9517)]CRS-1607:Node rac2 is being evicted in cluster incarnation 251972986; details at (:CSSNM00007:) in /u01/app/11.2.0/grid/log/rac1/cssd/ocssd.log.
    2013-02-12 04:12:39.136
    [cssd(9517)]CRS-1625:Node rac2, number 2, was manually shut down
    2013-02-12 04:12:39.140
    [cssd(9517)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 .
    2013-02-12 04:12:39.157
    [crsd(9957)]CRS-5504:Node down event reported for node 'rac2'.
    2013-02-12 04:12:45.519
    [crsd(9957)]CRS-2773:Server 'rac2' has been removed from pool 'Generic'.
    2013-02-12 04:12:45.519
    [crsd(9957)]CRS-2773:Server 'rac2' has been removed from pool 'ora.oradb'
    5. I checked the resource status of the cluster and saw that node1 "rac1" is the surviving node.
    Please help me to analyze it.
    Thanks,
    Edited by: 985243 on Feb 12, 2013 1:53 AM
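
When analyzing a log like the one above, it can help to pull the countdown out of the CRS-1612/1611/1610 warnings programmatically. The Python sketch below is one hedged way to do that: the regular expression is written against the message wording shown in this post only and is an assumption about the format, so it may need adjusting for other Clusterware versions. The function name is hypothetical.

```python
import re

# Pattern modelled on the CRS-1612/1611/1610 lines quoted in this post.
WARNING = re.compile(
    r"CRS-161[0-2]:Network communication with node (?P<node>\S+) "
    r"\(\d+\) missing for (?P<pct>\d+)% of timeout interval\.\s+"
    r"Removal of this node from cluster in (?P<secs>[\d.]+) seconds"
)


def parse_css_warnings(lines):
    """Return (node, percent_elapsed, seconds_remaining) per warning line."""
    found = []
    for line in lines:
        match = WARNING.search(line)
        if match:
            found.append(
                (match["node"], int(match["pct"]), float(match["secs"]))
            )
    return found


if __name__ == "__main__":
    sample = [
        "[cssd(9517)]CRS-1612:Network communication with node rac2 (2) "
        "missing for 50% of timeout interval. Removal of this node from "
        "cluster in 14.620 seconds",
        "[cssd(9517)]CRS-1601:CSSD Reconfiguration complete. "
        "Active nodes are rac1 .",
    ]
    for node, pct, secs in parse_css_warnings(sample):
        print(f"{node}: {pct}% of timeout elapsed, {secs}s until eviction")
```

Lines that are not heartbeat warnings, such as the CRS-1601 reconfiguration message, are simply skipped.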

  • Need procedure to change ip address on private interconnect in 11.2.0.3

    Could someone please send me the procedure to change the ip address of the private interconnect in 11gr2 rac (11.2.0.3)
    The interconnect has been configured using the default HAIP resource during installation of a 2 node cluster on the aix 6.1 platform. I have searched metalink but cannot find a doc with the procedure to make the ip address change.
    The sys admins gave us an ip address on the wrong subnet so now we have to change the ip address of the en1 interface.
    If anyone has steps in terms of shutting down the clusterware and correct order to make changes this would be very much appreciated.
    Thanks.

    Thanks, I have seen this one also, but I was hoping to see some official documentation from Oracle on this topic. I searched Metalink and there is a doc called
    "Grid infrastructure everything you need to know", but it does not speak to this configuration change, or even to how to disable the clusterware in the event that you need to perform maintenance and do not want the clusterware to come online automatically.
    I love Google too, but if there is any official documentation on this topic, I would really appreciate knowing where it can be found.
    Thanks.

  • Can I use virtual Servers in private cloud for RAC

    Hello to all,
    We are going to install Oracle RAC on two servers.
    But our hardware administrator tells us: "I will allocate two virtual servers in our private cloud, not two physical (real) servers."
    Do you think it's practical and reasonable to use virtual servers for Oracle RAC in a production environment?
    Which one is better for RAC: physical servers or virtual servers?
    Please give your reasons.
    Thanks

    Using virtual machines is officially supported for RAC only in a few cases, which can be found here:
    http://www.oracle.com/technetwork/database/virtualizationmatrix-172995.html
    Make sure that you meet these requirements in your private cloud. Some cases, like VMware, are still somewhat supported despite not being on the list.
    Besides this, you should make sure that your 2 virtual machines run on different hardware servers in the cloud; otherwise you lose most of the RAC high-availability advantage if both virtual servers happen to be running on the same hardware during a crash.
    Virtual servers are used in production environments, but you will have to take greater care with many aspects of RAC compared to physical hardware; e.g. something like VMware "live migration" can kill a RAC node due to a timeout.
    I would prefer hardware for RAC anytime over virtual servers, and spare myself the hassle of dealing with all the possible issues arising from virtualization.
    And check Oracle's licensing policy...
    Running an Enterprise Edition RAC on e.g. a large VMware cluster is insanely expensive: you pay for every CPU core the RAC COULD run on, i.e. the entire cluster!
    If you must use virtual hardware but don't want to, and need an argument against it, use the license issue.
    Regards
    Thomas
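
To put rough numbers on Thomas's licensing point, here is a back-of-the-envelope Python sketch. Everything in it is an assumption made for illustration: the host and core counts are invented, the 0.5 core factor is a placeholder, and rounding up is assumed. Check Oracle's current licensing documents for the real rules before relying on any such figure.

```python
import math


def licensable_processors(cores_per_host, core_factor):
    """Processor licenses for a set of hosts (hypothetical simple model).

    cores_per_host: physical core count of each host the VMs could run on.
    core_factor: per-core multiplier; a placeholder value, not a quoted rate.
    """
    return math.ceil(sum(cores_per_host) * core_factor)


if __name__ == "__main__":
    factor = 0.5  # assumed placeholder
    two_dedicated_hosts = [16, 16]  # invented example hardware
    whole_vm_cluster = [16] * 8  # invented 8-host virtualization cluster
    print("dedicated:", licensable_processors(two_dedicated_hosts, factor))
    print("whole cluster:", licensable_processors(whole_vm_cluster, factor))
```

With these made-up figures, counting the whole virtualization cluster needs four times as many processor licenses as two dedicated hosts, which is the effect described above.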

  • During the installation of grid infra(cluster) for Oracle 11.2 RAC one.

    Good Day All, and thanks in advance…
    During the installation of grid infrastructure (cluster) for Oracle 11.2 RAC One Node on AIX 6.1 (PROD), with ASM in use, I am getting the errors below when executing ./root.sh.
    Upon investigation, I found note 1068212.1 on the Oracle support site (see below for details). I might be hitting unpublished bug 8670579. I also logged a Severity 2 SR with Oracle Support to get the bug/patch fix, but no one has attended to the call.
    This might be a configuration issue or something else; if you have experienced the same issue, please assist. (If you need more log files, please feel free to ask.)
    I ran the Cluster Verify Check – all passed.
    Many Thanks
    Ezekiel Filane
    /u01/app/11.2.0/grid#./root.sh
    Running Oracle 11g root.sh script...
    The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME= /u01/app/11.2.0/grid
    Enter the full pathname of the local bin directory: [usr/local/bin]:
    The file "dbhome" already exists in /usr/local/bin. Overwrite it? (y/n) [n]:
    The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n) [n]:
    The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n) [n]:
    Creating /etc/oratab file...
    Entries will be added to the /etc/oratab file as needed by
    Database Configuration Assistant when a database is created
    Finished running generic part of root.sh script.
    Now product-specific root actions will be performed.
    2010-10-19 10:33:11: Parsing the host name
    2010-10-19 10:33:11: Checking for super user privileges
    2010-10-19 10:33:11: User has super user privileges
    Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
    Creating trace directory
    User grid has the required capabilities to run CSSD in realtime mode
    LOCAL ADD MODE
    Creating OCR keys for user 'root', privgrp 'system'..
    Operation successful.
    root wallet
    root wallet cert
    root cert export
    peer wallet
    profile reader wallet
    pa wallet
    peer wallet keys
    pa wallet keys
    peer cert request
    pa cert request
    peer cert
    pa cert
    peer root cert TP
    profile reader root cert TP
    pa root cert TP
    peer pa cert TP
    pa peer cert TP
    profile reader pa cert TP
    profile reader peer cert TP
    peer user cert
    pa user cert
    Adding daemon to inittab
    CRS-4123: Oracle High Availability Services has been started.
    ohasd is starting
    CRS-2672: Attempting to start 'ora.gipcd' on 'csgipm'
    CRS-2672: Attempting to start 'ora.mdnsd' on 'csgipm'
    CRS-2676: Start of 'ora.gipcd' on 'csgipm' succeeded
    CRS-2676: Start of 'ora.mdnsd' on 'csgipm' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 'csgipm'
    CRS-2676: Start of 'ora.gpnpd' on 'csgipm' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'csgipm'
    CRS-2676: Start of 'ora.cssdmonitor' on 'csgipm' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'csgipm'
    CRS-2672: Attempting to start 'ora.diskmon' on 'csgipm'
    CRS-2676: Start of 'ora.diskmon' on 'csgipm' succeeded
    CRS-2676: Start of 'ora.cssd' on 'csgipm' succeeded
    CRS-2672: Attempting to start 'ora.ctssd' on 'csgipm'
    Start action for daemon aborted
    CRS-2674: Start of 'ora.ctssd' on 'csgipm' failed
    CRS-2679: Attempting to clean 'ora.ctssd' on 'csgipm'
    CRS-2681: Clean of 'ora.ctssd' on 'csgipm' succeeded
    CRS-4000: Command Start failed, or completed with errors.
    Command return code of 1 (256) from command: /u01/app/11.2.0/grid/bin/crsctl start resource ora.ctssd -init
    Start of resource "ora.ctssd -init" failed
    Clusterware exclusive mode start of resource ora.ctssd failed
    CRS-2500: Cannot stop resource 'ora.crsd' as it is not running
    CRS-4000: Command Stop failed, or completed with errors.
    Command return code of 1 (256) from command: /u01/app/11.2.0/grid/bin/crsctl stop resource ora.crsd -init
    Stop of resource "ora.crsd -init" failed
    Failed to stop CRSD
    CRS-2500: Cannot stop resource 'ora.asm' as it is not running
    CRS-4000: Command Stop failed, or completed with errors.
    Command return code of 1 (256) from command: /u01/app/11.2.0/grid/bin/crsctl stop resource ora.asm -init
    Stop of resource "ora.asm -init" failed
    Failed to stop ASM
    CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'csgipm'
    CRS-2677: Stop of 'ora.cssdmonitor' on 'csgipm' succeeded
    CRS-2673: Attempting to stop 'ora.cssd' on 'csgipm'
    CRS-2677: Stop of 'ora.cssd' on 'csgipm' succeeded
    CRS-2673: Attempting to stop 'ora.gpnpd' on 'csgipm'
    CRS-2677: Stop of 'ora.gpnpd' on 'csgipm' succeeded
    CRS-2673: Attempting to stop 'ora.gipcd' on 'csgipm'
    CRS-2677: Stop of 'ora.gipcd' on 'csgipm' succeeded
    CRS-2673: Attempting to stop 'ora.mdnsd' on 'csgipm'
    CRS-2677: Stop of 'ora.mdnsd' on 'csgipm' succeeded
    Initial cluster configuration failed. See /u01/app/11.2.0/grid/cfgtoollogs/crsconfig/rootcrs_csgipm.log for details
    csgipm:/u01/app/11.2.0/grid#ps -ef | grep pmon
    root 6160492 3932160 0 10:54:13 pts/2 0:00 grep pmon
    more /u01/app/11.2.0/grid/log/csgipm/client/ocrconfig_5767204.log
    csgipm:/usr/sbin#more /u01/app/11.2.0/grid/log/csgipm/client/ocrconfig_5767204.log
    2010-10-19 10:33:14.435: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 4
    2010-10-19 10:33:14.435: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 5
    2010-10-19 10:33:14.435: [  OCRRAW][1]propriogid:1_1: Failed to read the whole bootblock. Assumes invalid format.
    2010-10-19 10:33:14.435: [  OCRRAW][1]proprioini: all disks are not OCR/OLR formatted
    2010-10-19 10:33:14.435: [  OCRRAW][1]proprinit: Could not open raw device
    2010-10-19 10:33:14.442: [ default][1]a_init:7!: Backend init unsuccessful : [26]
    2010-10-19 10:33:14.461: [ OCRCONF][1]Exporting OCR data to [OCRUPGRADEFILE]
    2010-10-19 10:33:14.461: [  OCRAPI][1]a_init:7!: Backend init unsuccessful : [33]
    2010-10-19 10:33:14.461: [ OCRCONF][1]There was no previous version of OCR. error:[PROCL-33: Oracle Local Registry is not configured]
    2010-10-19 10:33:14.461: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 0
    2010-10-19 10:33:14.461: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 1
    2010-10-19 10:33:14.462: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 2
    2010-10-19 10:33:14.462: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 3
    2010-10-19 10:33:14.462: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 4
    2010-10-19 10:33:14.462: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 5
    2010-10-19 10:33:14.462: [  OCRRAW][1]propriogid:1_1: Failed to read the whole bootblock. Assumes invalid format.
    2010-10-19 10:33:14.462: [  OCRRAW][1]proprioini: all disks are not OCR/OLR formatted
    2010-10-19 10:33:14.462: [  OCRRAW][1]proprinit: Could not open raw device
    2010-10-19 10:33:14.462: [ default][1]a_init:7!: Backend init unsuccessful : [26]
    2010-10-19 10:33:14.462: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 0
    2010-10-19 10:33:14.463: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 1
    2010-10-19 10:33:14.463: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 2
    2010-10-19 10:33:14.463: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 3
    2010-10-19 10:33:14.463: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 4
    2010-10-19 10:33:14.463: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 5
    2010-10-19 10:33:14.463: [  OCRRAW][1]propriogid:1_1: Failed to read the whole bootblock. Assumes invalid format.
    2010-10-19 10:33:14.463: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 0
    2010-10-19 10:33:14.463: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 1
    2010-10-19 10:33:14.463: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 2
    2010-10-19 10:33:14.463: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 3
    2010-10-19 10:33:14.463: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 4
    2010-10-19 10:33:14.463: [  OCROSD][1]utread:3: Problem reading buffer 104ef000 buflen 4096 retval 0 phy_offset 102400 retry 5
    2010-10-19 10:33:14.483: [  OCRRAW][1]ibctx: Failed to read the whole bootblock. Assumes invalid format.
    2010-10-19 10:33:14.483: [  OCRRAW][1]proprinit:problem reading the bootblock or superbloc 22
    2010-10-19 10:33:14.483: [  OCROSD][1]utread:3: Problem reading buffer 104fe000 buflen 4096 retval 0 phy_offset 102400 retry 0
    2010-10-19 10:33:14.483: [  OCROSD][1]utread:3: Problem reading buffer 104fe000 buflen 4096 retval 0 phy_offset 102400 retry 1
    2010-10-19 10:33:14.483: [  OCROSD][1]utread:3: Problem reading buffer 104fe000 buflen 4096 retval 0 phy_offset 102400 retry 2
    2010-10-19 10:33:14.484: [  OCROSD][1]utread:3: Problem reading buffer 104fe000 buflen 4096 retval 0 phy_offset 102400 retry 3
    2010-10-19 10:33:14.484: [  OCROSD][1]utread:3: Problem reading buffer 104fe000 buflen 4096 retval 0 phy_offset 102400 retry 4
    2010-10-19 10:33:14.484: [  OCROSD][1]utread:3: Problem reading buffer 104fe000 buflen 4096 retval 0 phy_offset 102400 retry 5
    2010-10-19 10:33:14.484: [  OCRRAW][1]propriogid:1_1: Failed to read the whole bootblock. Assumes invalid format.
    2010-10-19 10:33:14.541: [  OCRAPI][1]a_init:6a: Backend init successful
    2010-10-19 10:33:14.646: [ OCRCONF][1]Initialized DATABASE keys
    2010-10-19 10:33:14.650: [ OCRCONF][1]Exiting [status=success]...

    Hi,
    We are also trying to install 11.2.0.2 Grid Infrastructure for Oracle RAC One Node on AIX 6.1. We ran a proof of concept (POC) in our lab environment and, after much struggle, got it working. We are now building 4 clusters in the production environment, and the first cluster installation failed while running root.sh on node2. We already have a Sev1 ticket open with Oracle Support but have not heard back.
    Here is the root.sh output from node2. The two node names are p01dou416 and p01dou417.
    CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node p01dou416, number 1, and is terminating
    An active cluster was found during exclusive startup, restarting to join the cluster
    Failed to start Oracle Clusterware stack
    Failed to start Cluster Synchorinisation Service in clustered mode at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 1020.
    /u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed
    [root@P01DOU417] /u01/app/11.2.0/grid #
    LOG output: /u01/app/11.2.0/grid/cfgtoollogs/crsconfig/rootcrs_p01dou417.log
    2010-11-13 17:22:14: Successfully started requested Oracle stack daemons
    2010-11-13 17:22:14: Starting CSS in clustered mode
    2010-11-13 17:22:14: Executing cmd: /u01/app/11.2.0/grid/bin/crsctl start resource ora.cssd -init
    2010-11-13 17:32:28: Command output:
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'p01dou417'
    CRS-2672: Attempting to start 'ora.gipcd' on 'p01dou417'
    CRS-2676: Start of 'ora.cssdmonitor' on 'p01dou417' succeeded
    CRS-2676: Start of 'ora.gipcd' on 'p01dou417' succeeded
    CRS-2679: Attempting to clean 'ora.cssd' on 'p01dou417'
    CRS-2681: Clean of 'ora.cssd' on 'p01dou417' succeeded
    CRS-2673: Attempting to stop 'ora.diskmon' on 'p01dou417'
    CRS-2677: Stop of 'ora.diskmon' on 'p01dou417' succeeded
    CRS-2673: Attempting to stop 'ora.gipcd' on 'p01dou417'
    CRS-2677: Stop of 'ora.gipcd' on 'p01dou417' succeeded
    CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'p01dou417'
    CRS-2677: Stop of 'ora.cssdmonitor' on 'p01dou417' succeeded
    CRS-5804: Communication error with agent process
    CRS-4000: Command Start failed, or completed with errors.
    End Command output
    2010-11-13 17:32:28: Executing cmd: /u01/app/11.2.0/grid/bin/crsctl check css
    2010-11-13 17:32:28: Command output:
    CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
    End Command output
    2010-11-13 17:32:28: Checking the status of css
    2010-11-13 17:32:33: Executing cmd: /u01/app/11.2.0/grid/bin/crsctl check css
    2010-11-13 17:32:33: Command output:
    CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
    End Command output
    2010-11-13 17:32:33: Checking the status of css
    2010-11-13 17:32:38: CRS-2672: Attempting to start 'ora.cssdmonitor' on 'p01dou417'
    2010-11-13 17:32:38: CRS-2672: Attempting to start 'ora.gipcd' on 'p01dou417'
    2010-11-13 17:32:38: CRS-2676: Start of 'ora.cssdmonitor' on 'p01dou417' succeeded
    2010-11-13 17:32:38: CRS-2676: Start of 'ora.gipcd' on 'p01dou417' succeeded
    2010-11-13 17:32:38: CRS-2672: Attempting to start 'ora.cssd' on 'p01dou417'
    2010-11-13 17:32:38: CRS-2672: Attempting to start 'ora.diskmon' on 'p01dou417'
    2010-11-13 17:32:38: CRS-2676: Start of 'ora.diskmon' on 'p01dou417' succeeded
    2010-11-13 17:32:38: CRS-2674: Start of 'ora.cssd' on 'p01dou417' failed
    2010-11-13 17:32:38: CRS-2679: Attempting to clean 'ora.cssd' on 'p01dou417'
    2010-11-13 17:32:38: CRS-2681: Clean of 'ora.cssd' on 'p01dou417' succeeded
    2010-11-13 17:32:38: CRS-2673: Attempting to stop 'ora.diskmon' on 'p01dou417'
    2010-11-13 17:32:38: CRS-2677: Stop of 'ora.diskmon' on 'p01dou417' succeeded
    2010-11-13 17:32:38: CRS-2673: Attempting to stop 'ora.gipcd' on 'p01dou417'
    2010-11-13 17:32:38: CRS-2677: Stop of 'ora.gipcd' on 'p01dou417' succeeded
    2010-11-13 17:32:38: CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'p01dou417'
    2010-11-13 17:32:38: CRS-2677: Stop of 'ora.cssdmonitor' on 'p01dou417' succeeded
    2010-11-13 17:32:38: CRS-5804: Communication error with agent process
    2010-11-13 17:32:38: CRS-4000: Command Start failed, or completed with errors.
    2010-11-13 17:32:38: Failed to start Oracle Clusterware stack
    2010-11-13 17:32:38: ###### Begin DIE Stack Trace ######
    2010-11-13 17:32:38: Package File Line Calling
    2010-11-13 17:32:38: --------------- -------------------- ---- ----------
    2010-11-13 17:32:38: 1: main rootcrs.pl 324 crsconfig_lib::dietrap
    2010-11-13 17:32:38: 2: crsconfig_lib crsconfig_lib.pm 1020 main::__ANON__
    2010-11-13 17:32:38: 3: crsconfig_lib crsconfig_lib.pm 997 crsconfig_lib::start_cluster
    2010-11-13 17:32:38: 4: main rootcrs.pl 697 crsconfig_lib::perform_start_cluster
    2010-11-13 17:32:38: ####### End DIE Stack Trace #######
    2010-11-13 17:32:38: 'ROOTCRS_STACK' checkpoint has failed
    Any help on this is appreciated.
    Edited by: user12019257 on Nov 17, 2010 1:26 PM
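
For long rootcrs logs like the one above, it can help to reduce the output to the message codes and the resources that actually failed before digging further. The following is only a rough sketch in Python, not an Oracle tool: the function name `summarize_crs_log` and both regular expressions are assumptions based purely on the line formats visible in this thread, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches Clusterware message codes such as CRS-2674 or CRS-4000
# (pattern assumed from the log lines quoted in this thread).
CRS_CODE = re.compile(r"\b(CRS-\d{4})\b")

# Matches lines like: Start of 'ora.cssd' on 'p01dou417' failed
FAILED_ACTION = re.compile(r"(Start|Stop|Clean) of '([^']+)' on '([^']+)' failed")


def summarize_crs_log(lines):
    """Return (Counter of CRS codes, list of (action, resource, node) failures)."""
    codes = Counter()
    failures = []
    for line in lines:
        codes.update(CRS_CODE.findall(line))
        match = FAILED_ACTION.search(line)
        if match:
            failures.append(match.groups())
    return codes, failures


# Sample lines taken from the rootcrs_p01dou417.log excerpt above.
sample = [
    "2010-11-13 17:32:38: CRS-2676: Start of 'ora.diskmon' on 'p01dou417' succeeded",
    "2010-11-13 17:32:38: CRS-2674: Start of 'ora.cssd' on 'p01dou417' failed",
    "2010-11-13 17:32:38: CRS-5804: Communication error with agent process",
    "2010-11-13 17:32:38: CRS-4000: Command Start failed, or completed with errors.",
]

codes, failures = summarize_crs_log(sample)
print(failures)
print(sorted(codes.items()))
```

To run it against a real log, pass an open file object instead of `sample`, for example `summarize_crs_log(open(path))`, where `path` points to your own rootcrs log. In the excerpt above the only resource reported as failed is 'ora.cssd'; the other CRS-2673/CRS-2677 lines are the cleanup that follows that failure.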
