Oracle RAC interconnect performance in Grid Control

Hi All,
We have an Oracle 10g RAC database and we manage it through Grid Control.
We are not able to see the <Performance> and <Interconnects> tabs on the RAC cluster page. Do you know why?
I logged in as sysman, went to <Targets> at the top right, and on the left I could see the database list. I selected the RAC database name and clicked it. At the top left corner I see a link like the one below; if I click this hyperlink (DBname) it takes me to the cluster page, but the two tabbed panes <Performance> and <Interconnects> are not enabled there. Can anyone please help me find out how to check this information in Grid Control?
Cluster: DBNAME >
Thanks in advance

First click on the target of type Cluster Database; that will take you to the overall Cluster Database: <your cluster database name> page. At the top left of that page you will see a hyperlink named Cluster: <cluster name>. Click on this cluster name hyperlink and it will take you to the Cluster page, where the <Performance> and <Interconnects> tabs are enabled.
-Harish Kumar Kalra

Similar Messages

  • Oracle RAC Interconnect, PowerVM VLANs, and the Limit of 20

    Hello,
    Our company has a requirement to build a multitude of Oracle RAC clusters on AIX using Power VM on 770s and 795 hardware.
    We presently have 802.1q trunking configured on our Virtual I/O Servers, and have currently consumed 12 of 20 allowed VLANs for a virtual ethernet adapter. We have read the Oracle RAC FAQ on Oracle Metalink and it seems to discourage sharing these interconnect VLANs between different clusters. This puts us in a scalability bind: IBM limits VLANs to 20, and Oracle says there is a one-to-one relationship between VLANs, subnets, and RAC clusters. We must assume we have a fixed number of network interfaces available and that we absolutely have to leverage virtualized network hardware in order to build these environments. "Add more network adapters to VIO" isn't an acceptable solution for us.
    Does anyone know if Oracle can afford any flexibility which would allow us to host multiple Oracle RAC interconnects on the same 802.1q trunk VLAN? We will independently guarantee the bandwidth, latency, and redundancy requirements are met for proper Oracle RAC performance, however we don't want a design "flaw" to cause us supportability issues in the future.
    We'd like it very much if we could have a bunch of two-node clusters all sharing the same private interconnect. For example:
    Cluster 1, node 1: 192.168.16.2 / 255.255.255.0 / VLAN 16
    Cluster 1, node 2: 192.168.16.3 / 255.255.255.0 / VLAN 16
    Cluster 2, node 1: 192.168.16.4 / 255.255.255.0 / VLAN 16
    Cluster 2, node 2: 192.168.16.5 / 255.255.255.0 / VLAN 16
    Cluster 3, node 1: 192.168.16.6 / 255.255.255.0 / VLAN 16
    Cluster 3, node 2: 192.168.16.7 / 255.255.255.0 / VLAN 16
    Cluster 4, node 1: 192.168.16.8 / 255.255.255.0 / VLAN 16
    Cluster 4, node 2: 192.168.16.9 / 255.255.255.0 / VLAN 16
    etc.
    Whereas the concern is that Oracle Corp will only support us if we do this:
    Cluster 1, node 1: 192.168.16.2 / 255.255.255.0 / VLAN 16
    Cluster 1, node 2: 192.168.16.3 / 255.255.255.0 / VLAN 16
    Cluster 2, node 1: 192.168.17.2 / 255.255.255.0 / VLAN 17
    Cluster 2, node 2: 192.168.17.3 / 255.255.255.0 / VLAN 17
    Cluster 3, node 1: 192.168.18.2 / 255.255.255.0 / VLAN 18
    Cluster 3, node 2: 192.168.18.3 / 255.255.255.0 / VLAN 18
    Cluster 4, node 1: 192.168.19.2 / 255.255.255.0 / VLAN 19
    Cluster 4, node 2: 192.168.19.3 / 255.255.255.0 / VLAN 19
    Which eats one VLAN per RAC cluster.

    Thank you for your answer!!
    I think I roughly understand the argument behind a 2-node RAC and a 3-node or greater RAC. We, unfortunately, were provided with two physical pieces of hardware to virtualize to support production (and two more to support non-production) and as a result we really have no place to host a third RAC node without placing it within the same "failure domain" (I hate that term) as one of the other nodes.
    My role is primarily as a system engineer, and, generally speaking, our main goals are eliminating single points of failure. We may be misusing 2-node RACs to eliminate single points of failure since it seems to violate the real intentions behind RAC, which is used more appropriately to scale wide to many nodes. Unfortunately, we've scaled out to only two nodes, and opted to scale these two nodes up, making them huge with many CPUs and lots of memory.
    Other options, notably the active-passive failover clustering we have with HACMP or PowerHA on the AIX / IBM Power platform, are unattractive because the standby node does no useful work yet must consume CPU and memory resources so that it is prepared for a failover of the primary node. We use HACMP / PowerHA with Oracle and it works nicely; however Oracle RAC, even in a two-node configuration, drives load on both nodes, unlike an active-passive clustering technology.
    All that aside, I am posing the question to both IBM and our Oracle DBAs (who will ask Oracle Support). Typically the answers we get vary widely depending on the experience and skill level of the support personnel we reach on both the Oracle and IBM sides... so on a suggestion from a colleague (Hi Kevin!) I posted here. I'm concerned that the answer from Oracle Support will unthinkingly be "you can't do that, my script says to tell you the absolute most rigid interpretation of the support document" while all the time the same document talks of the use of NFS and/or iSCSI storage *eye roll*
    We have a massive deployment of Oracle EBS and honestly the interconnect doesn't even touch 100mbit speeds even though the configuration has been checked multiple times by Oracle and IBM and with the knowledge that Oracle EBS is supposed to heavily leverage RAC. I haven't met a single person who doesn't look at our environment and suggest jumbo frames. It's a joke at this point... comments like "OMG YOU DON'T HAVE JUMBO FRAMES" and/or "OMG YOU'RE NOT USING INFINIBAND WHATTA NOOB" are commonplace when new DBAs are hired. I maintain that the utilization numbers don't support this.
    I can tell you that we have 8Gb fiber channel storage and 10Gb network connectivity. I would sooner assume a bottleneck in the storage infrastructure. But alas, I digress.
    Mainly I'm looking for a real-world answer to this question. Aside from violating every last recommendation and making Oracle support folk gently weep at the suggestion, are there any issues with sharing interconnects between RAC environments that will prevent it from functioning and/or reduce its stability?
    We have rapid spanning tree configured, as far as I know, and our network folks have tuned the timers razor thin. We have Nexus 5k and Nexus 7k network infrastructure. The typical issues you'd find with standard spanning tree really don't affect us because our network people are just that damn good.
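    On the utilization point, one way to put a rough number on "the interconnect doesn't even touch 100mbit" is to estimate block traffic from GV$SYSSTAT. A minimal sketch, assuming 10g/11g statistic names and ignoring the roughly 200 bytes of overhead per GCS/GES message:
    -- Approximate interconnect block traffic per instance since startup
    SELECT s.inst_id,
           SUM(s.value) * MAX(TO_NUMBER(p.value))           AS block_bytes_total,
           ROUND(SUM(s.value) * MAX(TO_NUMBER(p.value)) /
                 ((SYSDATE - MAX(i.startup_time)) * 86400)) AS approx_bytes_per_sec
      FROM gv$sysstat s, gv$parameter p, gv$instance i
     WHERE s.name IN ('gc cr blocks received', 'gc current blocks received',
                      'gc cr blocks served',   'gc current blocks served')
       AND p.name = 'db_block_size'
       AND p.inst_id = s.inst_id
       AND i.inst_id = s.inst_id
     GROUP BY s.inst_id
     ORDER BY s.inst_id;
    Dividing a since-startup total by uptime smooths out peaks, so sampling this twice a few minutes apart and taking the delta gives a more honest picture than the long-run average.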

  • RAC Interconnect performance

    Hi,
    We are facing RAC Interconnect performance problems.
    Oracle Version: Oracle 9i RAC (9.2.0.7)
    Operating system: SunOS 5.8
    SQL> SELECT b1.inst_id, b2.value "RECEIVED",
                b1.value "RECEIVE TIME",
                ((b1.value / b2.value) * 10) "AVG RECEIVE TIME (ms)"
           FROM gv$sysstat b1, gv$sysstat b2
          WHERE b1.name = 'global cache cr block receive time'
            AND b2.name = 'global cache cr blocks received'
            AND b1.inst_id = b2.inst_id;
       INST_ID   RECEIVED RECEIVE TIME AVG RECEIVE TIME (ms)
             1     323849       172359            5.32220263
             2     675806        94537            1.39887778
    After a database restart the average receive time gradually increases for Instance 1, while Instance 2 stays about the same.
    Application performance degrades, and restarting the database solves the issue. This is a critical application and we cannot take frequent downtime for restarts.
    What specific points should I check in order to improve interconnect performance?
    Thanks
    Dilip Patel.
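    For completeness, the same calculation can be run for current blocks, and "global cache blocks lost" is worth watching between restarts as well; a sketch, assuming the 9i statistic names used above:
    -- Average receive time for current blocks (time is in centiseconds, hence * 10)
    SELECT b1.inst_id, b2.value "CUR RECEIVED", b1.value "CUR RECEIVE TIME",
           (b1.value / b2.value) * 10 "AVG CUR RECEIVE TIME (ms)"
      FROM gv$sysstat b1, gv$sysstat b2
     WHERE b1.name = 'global cache current block receive time'
       AND b2.name = 'global cache current blocks received'
       AND b1.inst_id = b2.inst_id;
    -- A steadily growing blocks-lost figure usually points at the interconnect itself
    SELECT inst_id, value "BLOCKS LOST"
      FROM gv$sysstat
     WHERE name = 'global cache blocks lost';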

    Hi,
    Configurations:
    Node: 1
    Hardware Model: Sun-Fire-V890
    OS: SunOS 5.8
    Release: Generic_117350-53
    CPU: 16 sparcv9 cpu(s) running at 1200 MHz
    Memory: 40.0GB
    Node: 2
    Hardware Model: Sun-Fire-V890
    OS: SunOS 5.8
    Release: Generic_117350-53
    CPU: 16 sparcv9 cpu(s) running at 1200 MHz
    Memory: 40.0GB
    CPU utilization on Node 1 never exceeds 40%.
    CPU utilization on Node 2 is between 20% and 30%.
    Application load is higher on Node 1 than on Node 2.
    I can see the wait event "global cache cr request" in the top 5 wait events on most of the Statspack reports. Application performance degrades a few days after a database restart. No major changes have been made to the application recently.
    Statspack report for Node 1:
    DB Name         DB Id    Instance     Inst Num Release     Cluster Host
    XXXX          2753907139 xxxx1               1 9.2.0.7.0   YES    xxxxx
                  Snap Id     Snap Time      Sessions Curs/Sess Comment
    Begin Snap:     61688 17-Feb-09 09:10:06      253     299.4
      End Snap:     61698 17-Feb-09 10:10:06      285     271.6
       Elapsed:               60.00 (mins)
    Cache Sizes (end)
    ~~~~~~~~~~~~~~~~~
                   Buffer Cache:     2,048M      Std Block Size:          8K
               Shared Pool Size:       384M          Log Buffer:      2,048K
    Load Profile
    ~~~~~~~~~~~~                            Per Second       Per Transaction
                      Redo size:            102,034.92              4,824.60
                  Logical reads:             60,920.35              2,880.55
                  Block changes:                986.07                 46.63
                 Physical reads:              1,981.12                 93.67
                Physical writes:                 28.30                  1.34
                     User calls:              2,651.63                125.38
                         Parses:                500.89                 23.68
                    Hard parses:                 21.44                  1.01
                          Sorts:                 66.91                  3.16
                         Logons:                  3.69                  0.17
                       Executes:                553.34                 26.16
                   Transactions:                 21.15
      % Blocks changed per Read:    1.62    Recursive Call %:     22.21
    Rollback per transaction %:    2.90       Rows per Sort:      7.44
    Instance Efficiency Percentages (Target 100%)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                Buffer Nowait %:   99.99       Redo NoWait %:    100.00
                Buffer  Hit   %:   96.75    In-memory Sort %:    100.00
                Library Hit   %:   98.30        Soft Parse %:     95.72
             Execute to Parse %:    9.48         Latch Hit %:     99.37
    Parse CPU to Parse Elapsd %:   90.03     % Non-Parse CPU:     92.97
    Shared Pool Statistics        Begin   End
                 Memory Usage %:   94.23   94.93
        % SQL with executions>1:   74.96   74.66
      % Memory for SQL w/exec>1:   82.93   72.26
    Top 5 Timed Events
    ~~~~~~~~~~~~~~~~~~                                                     % Total
    Event                                               Waits    Time (s) Ela Time
    db file sequential read                         1,080,532      13,191    40.94
    CPU time                                                       10,183    31.60
    db file scattered read                            456,075       3,977    12.34
    wait for unread message on broadcast channel        4,195       2,770     8.60
    global cache cr request                         1,633,056         873     2.71
    Cluster Statistics for DB: EPIP  Instance: epip1  Snaps: 61688 -61698
    Global Cache Service - Workload Characteristics
    Ave global cache get time (ms):                            0.8
    Ave global cache convert time (ms):                        1.1
    Ave build time for CR block (ms):                          0.1
    Ave flush time for CR block (ms):                          0.2
    Ave send time for CR block (ms):                           0.3
    Ave time to process CR block request (ms):                 0.6
    Ave receive time for CR block (ms):                        4.4
    Ave pin time for current block (ms):                       0.2
    Ave flush time for current block (ms):                     0.0
    Ave send time for current block (ms):                      0.3
    Ave time to process current block request (ms):            0.5
    Ave receive time for current block (ms):                   2.6
    Global cache hit ratio:                                    3.9
    Ratio of current block defers:                             0.0
    % of messages sent for buffer gets:                        3.7
    % of remote buffer gets:                                   0.3
    Ratio of I/O for coherence:                                1.1
    Ratio of local vs remote work:                            10.9
    Ratio of fusion vs physical writes:                        0.0
    Global Enqueue Service Statistics
    Ave global lock get time (ms):                             0.1
    Ave global lock convert time (ms):                         0.0
    Ratio of global lock gets vs global lock releases:         1.0
    GCS and GES Messaging statistics
    Ave message sent queue time (ms):                          0.4
    Ave message sent queue time on ksxp (ms):                  1.8
    Ave message received queue time (ms):                      0.2
    Ave GCS message process time (ms):                         0.1
    Ave GES message process time (ms):                         0.0
    % of direct sent messages:                                 8.0
    % of indirect sent messages:                              49.4
    % of flow controlled messages:                            42.6
    GES Statistics for DB: EPIP  Instance: epip1  Snaps: 61688 -61698
    Statistic                                    Total   per Second    per Trans
    dynamically allocated gcs resourc                0          0.0          0.0
    dynamically allocated gcs shadows                0          0.0          0.0
    flow control messages received                   0          0.0          0.0
    flow control messages sent                       0          0.0          0.0
    gcs ast xid                                      0          0.0          0.0
    gcs blocked converts                         2,830          0.8          0.0
    gcs blocked cr converts                      7,677          2.1          0.1
    gcs compatible basts                             5          0.0          0.0
    gcs compatible cr basts (global)               142          0.0          0.0
    gcs compatible cr basts (local)            142,678         39.6          1.9
    gcs cr basts to PIs                              0          0.0          0.0
    gcs cr serve without current lock                0          0.0          0.0
    gcs error msgs                                   0          0.0          0.0
    gcs flush pi msgs                              798          0.2          0.0
    gcs forward cr to pinged instance                0          0.0          0.0
    gcs immediate (compatible) conver            9,296          2.6          0.1
    gcs immediate (null) converts               52,460         14.6          0.7
    gcs immediate cr (compatible) con          752,507        209.0          9.9
    gcs immediate cr (null) converts         4,047,959      1,124.4         53.2
    gcs msgs process time(ms)                  153,618         42.7          2.0
    gcs msgs received                        2,287,640        635.5         30.0
    gcs out-of-order msgs                            0          0.0          0.0
    gcs pings refused                           70,099         19.5          0.9
    gcs queued converts                              0          0.0          0.0
    gcs recovery claim msgs                          0          0.0          0.0
    gcs refuse xid                                   1          0.0          0.0
    gcs retry convert request                        0          0.0          0.0
    gcs side channel msgs actual                40,400         11.2          0.5
    gcs side channel msgs logical            4,039,700      1,122.1         53.1
    gcs write notification msgs                     46          0.0          0.0
    gcs write request msgs                         972          0.3          0.0
    gcs writes refused                               4          0.0          0.0
    ges msgs process time(ms)                    2,713          0.8          0.0
    ges msgs received                           73,687         20.5          1.0
    global posts dropped                             0          0.0          0.0
    global posts queue time                          0          0.0          0.0
    global posts queued                              0          0.0          0.0
    global posts requested                           0          0.0          0.0
    global posts sent                                0          0.0          0.0
    implicit batch messages received           288,801         80.2          3.8
    implicit batch messages sent               622,610        172.9          8.2
    lmd msg send time(ms)                        2,148          0.6          0.0
    lms(s) msg send time(ms)                         1          0.0          0.0
    messages flow controlled                 3,473,393        964.8         45.6
    messages received actual                   765,292        212.6         10.1
    messages received logical                2,360,972        655.8         31.0
    messages sent directly                     654,760        181.9          8.6
    messages sent indirectly                 4,027,924      1,118.9         52.9
    msgs causing lmd to send msgs               33,481          9.3          0.4
    msgs causing lms(s) to send msgs            13,220          3.7          0.2
    msgs received queue time (ms)              379,304        105.4          5.0
    msgs received queued                     2,359,723        655.5         31.0
    msgs sent queue time (ms)                1,514,305        420.6         19.9
    msgs sent queue time on ksxp (ms)        4,349,174      1,208.1         57.1
    msgs sent queued                         4,032,426      1,120.1         53.0
    msgs sent queued on ksxp                 2,415,381        670.9         31.7
    GES Statistics for DB: EPIP  Instance: epip1  Snaps: 61688 -61698
    Statistic                                    Total   per Second    per Trans
    process batch messages received            278,174         77.3          3.7
    process batch messages sent                913,611        253.8         12.0
    Wait Events for DB: EPIP  Instance: epip1  Snaps: 61688 -61698
    -> s  - second
    -> cs - centisecond -     100th of a second
    -> ms - millisecond -    1000th of a second
    -> us - microsecond - 1000000th of a second
    -> ordered by wait time desc, waits desc (idle events last)
                                                                       Avg
                                                         Total Wait   wait    Waits
    Event                               Waits   Timeouts   Time (s)   (ms)     /txn
    db file sequential read         1,080,532          0     13,191     12     14.2
    db file scattered read            456,075          0      3,977      9      6.0
    wait for unread message on b        4,195      1,838      2,770    660      0.1
    global cache cr request         1,633,056      8,417        873      1     21.4
    db file parallel write              8,243          0        260     32      0.1
    buffer busy waits                  16,811          0        168     10      0.2
    log file parallel write           187,783          0        158      1      2.5
    log file sync                      75,143          0        147      2      1.0
    buffer busy global CR               9,713          0        102     10      0.1
    global cache open x                31,157      1,230         50      2      0.4
    enqueue                            58,261         14         45      1      0.8
    latch free                         33,398      7,610         44      1      0.4
    direct path read (lob)              9,925          0         36      4      0.1
    library cache pin                   8,777          1         34      4      0.1
    SQL*Net break/reset to clien       82,982          0         32      0      1.1
    log file sequential read              409          0         31     75      0.0
    log switch/archive                      3          3         29   9770      0.0
    SQL*Net more data to client       201,538          0         16      0      2.6
    global cache open s                 8,585        342         14      2      0.1
    global cache s to x                11,098        148         11      1      0.1
    control file sequential read        6,845          0          8      1      0.1
    db file parallel read               1,569          0          7      4      0.0
    log file switch completion             35          0          7    194      0.0
    row cache lock                     15,780          0          6      0      0.2
    process startup                        69          0          6     82      0.0
    global cache null to x              1,759         48          6      3      0.0
    direct path write (lob)               685          0          5      7      0.0
    DFS lock handle                     8,713          0          3      0      0.1
    control file parallel write         1,350          0          2      2      0.0
    wait for master scn                 1,194          0          1      1      0.0
    CGS wait for IPC msg               30,830     30,715          1      0      0.4
    global cache busy                      14          1          1     75      0.0
    ksxr poll remote instances         30,997     12,692          1      0      0.4
    direct path read                      752          0          0      1      0.0
    switch logfile command                  3          0          0    148      0.0
    log file single write                  24          0          0     13      0.0
    library cache lock                    668          0          0      0      0.0
    KJC: Wait for msg sends to c        1,161          0          0      0      0.0
    buffer busy global cache               26          0          0      6      0.0
    IPC send completion sync              261        260          0      0      0.0
    PX Deq: reap credit                 3,477      3,440          0      0      0.0
    LGWR wait for redo copy             1,751          0          0      0      0.0
    async disk IO                       1,059          0          0      0      0.0
    direct path write                     298          0          0      0      0.0
    slave TJ process wait                   1          1          0     18      0.0
    PX Deq: Execute Reply                   3          1          0      3      0.0
    PX Deq: Join ACK                        8          4          0      1      0.0
    global cache null to s                  8          0          0      1      0.0
    ges inquiry response                   16          0          0      0      0.0
    Wait Events for DB: EPIP  Instance: epip1  Snaps: 61688 -61698
    -> s  - second
    -> cs - centisecond -     100th of a second
    -> ms - millisecond -    1000th of a second
    -> us - microsecond - 1000000th of a second
    -> ordered by wait time desc, waits desc (idle events last)
                                                                       Avg
                                                         Total Wait   wait    Waits
    Event                               Waits   Timeouts   Time (s)   (ms)     /txn
    PX Deq: Parse Reply                     6          2          0      1      0.0
    PX Deq Credit: send blkd                2          1          0      0      0.0
    PX Deq: Signal ACK                      3          1          0      0      0.0
    library cache load lock                 1          0          0      0      0.0
    buffer deadlock                         6          6          0      0      0.0
    lock escalate retry                     4          4          0      0      0.0
    SQL*Net message from client     9,470,867          0    643,285     68    124.4
    queue messages                     42,829     41,144     42,888   1001      0.6
    wakeup time manager                   601        600     16,751  27872      0.0
    gcs remote message                795,414    120,163     13,606     17     10.4
    jobq slave wait                     2,546      2,462      7,375   2897      0.0
    PX Idle Wait                        2,895      2,841      7,021   2425      0.0
    virtual circuit status                120        120      3,513  29273      0.0
    ges remote message                142,306     69,912      3,504     25      1.9
    SQL*Net more data from clien      206,559          0         19      0      2.7
    SQL*Net message to client       9,470,903          0         14      0    124.4
    PX Deq: Execution Msg                 313        103          2      7      0.0
    Background Wait Events for DB: EPIP  Instance: epip1  Snaps: 61688 -61698
    -> ordered by wait time desc, waits desc (idle events last)
                                                                       Avg
                                                         Total Wait   wait    Waits
    Event                               Waits   Timeouts   Time (s)   (ms)     /txn
    db file parallel write              8,243          0        260     32      0.1
    log file parallel write           187,797          0        158      1      2.5
    log file sequential read              316          0         22     70      0.0
    enqueue                            56,204          0         15      0      0.7
    control file sequential read        5,694          0          6      1      0.1
    DFS lock handle                     8,682          0          3      0      0.1
    db file sequential read               276          0          2      8      0.0
    control file parallel write         1,334          0          2      2      0.0
    wait for master scn                 1,194          0          1      1      0.0
    CGS wait for IPC msg               30,830     30,714          1      0      0.4
    ksxr poll remote instances         30,972     12,681          1      0      0.4
    latch free                            356         54          1      2      0.0
    direct path read                      752          0          0      1      0.0
    log file single write                  24          0          0     13      0.0
    LGWR wait for redo copy             1,751          0          0      0      0.0
    async disk IO                         812          0          0      0      0.0
    global cache cr request                69          0          0      1      0.0
    row cache lock                         45          0          0      1      0.0
    direct path write                     298          0          0      0      0.0
    library cache pin                      29          0          0      1      0.0
    rdbms ipc reply                        29          0          0      0      0.0
    buffer busy waits                      10          0          0      0      0.0
    library cache lock                      2          0          0      0      0.0
    global cache open x                     2          0          0      0      0.0
    rdbms ipc message                 179,764     36,258     29,215    163      2.4
    gcs remote message                795,409    120,169     13,605     17     10.4
    pmon timer                          1,388      1,388      3,508   2527      0.0
    ges remote message                142,295     69,912      3,504     25      1.9
    smon timer                            414          0      3,463   8366      0.0
              -------------------------------------------------------------

  • Does Oracle RAC improve performance over single-instance db

    Hello! I am running a single-instance database that seems to be overwhelmed by its workload: CPU is constantly 100% utilized despite code and instance tuning, and users persistently complain of system slowness.
    We plan to upgrade to more powerful servers, but we would also like to implement load balancing and have 2 instances on 2 servers share the workload via a RAC solution.
    Some of my peers state that RAC is mostly a high availability solution to provide continuity should one instance fail. I would like to know whether database performance would improve on RAC, given that load balancing between 2 or more instances occurs in a RAC setup.
    Thanks.

    4joey1 wrote:
    Some of my peers state that RAC is mostly a high availability solution to provide continuity should one instance fail. I would like to know whether database performance would improve on RAC, given that load balancing between 2 or more instances occurs in a RAC setup.
    RAC also provides scalability, as you have additional servers and database instances to deal with the workload. It is not merely a high availability and redundancy architecture.
    BTW, a 2 node RAC is IMO not a "real" RAC... We also have one of those and I regret not insisting that it be, at minimum, a 4 node RAC - from both a redundancy/availability and performance/scalability viewpoints.
    Simple example - on our bigger RAC we are dealing with about 37,000+ row inserts every second (3+ billion rows per day). This workload is only handled (on smallish dual 2-core CPU servers) thanks to RAC. There is no way that a single one of these servers alone (as a single non-RAC instance) would be capable of dealing with that workload.
    So yeah - RAC is most definitely also about performance and scalability too. (but that does not mean it's a magic wand for solving your performance problems either).

  • RCA for Oracle RAC Performance Issue

    Hi DBAs,
    I have set up a 2-node Oracle RAC 10.2.0.3 on Linux 4.5 (64-bit) with 16 GB of memory and 4 dual-core CPUs on each node. The database serves a web application, but unfortunately the system is on its knees and the performance is terrible. The storage is an EMC SAN, but ASM was not implemented for fear of degrading performance further or complicating the system.
    I am seeking expert advice from the gurus on this forum to formulate an action plan for a root cause analysis of the system and database. Please advise me on what tools I can use to gather information about the root cause. The AWR report has not been very helpful. System stats from top, vmstat and iostat only show high resource usage; it is difficult to find the reason. OEM is configured and very frequently reports all kinds of high wait events.
    How can I effectively find network bottlenecks (the netstat command, which I need help interpreting)?
    How can I look at system I/O (iostat) in a way that provides useful information? I don't understand what the baseline or optimal values should be to compare the I/O activity against.
    I am seeking help and advice to diagnose the issue. I also want to write this issue up as a case study.
    Thanks
    -Samar-

    First of all, RAC is mainly suited for OLTP applications.
    Secondly, if your application is unscalable (it doesn't use bind variables, no SQL statements have been tuned, and/or it has been ported from Sukkelserver 200<whatever>), running it against RAC will make things worse.
    Thirdly: RAC uses a chatty interconnect. If you didn't configure the interconnect properly, and/or are using slow network cards (1 Gb is mandatory), and/or you are not using a 9k MTU on your 1 Gb NIC, this again will make things worse.
    You can't install RAC 'out of the box'. It won't perform! PERIOD.
    Fourthly: you might suffer from your 'application' connecting and disconnecting for every individual SQL statement and/or committing every individual INSERT or UPDATE.
    You need to address this.
    Using ADDM and/or AWR is compulsory for analysing the problem, as is having read Cary Millsap's book on Optimizing Oracle Performance.
    You won't get anywhere without AWR, and OS statistics will not provide any clue.
    Because, paraphrasing William Jefferson Clinton, former president of the US of A:
    It's the application, stupid.
    99 out of 100 cases. Trust me. All developers I know currently are 100 percent clueless.
    That said, if you can't be bothered to post the top 5 AWR events, and you aren't up to using AWR reports, maybe you should hire a consultant who can.
    Regards,
    Sybrand Bakker
    Senior Oracle DBA
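    As a starting point for the "top 5 events" request, a rough since-startup view can be pulled from GV$SYSTEM_EVENT while proper AWR access is sorted out (a sketch for 10g+, where the wait_class column exists; AWR snapshot deltas remain the right tool):
    -- Top 10 non-idle wait events across both instances, cumulative since startup
    SELECT *
      FROM (SELECT inst_id, event, total_waits,
                   ROUND(time_waited_micro / 1e6) AS time_waited_s
              FROM gv$system_event
             WHERE wait_class <> 'Idle'
             ORDER BY time_waited_micro DESC)
     WHERE ROWNUM <= 10;
    Because these counters are cumulative, two samples taken a few minutes apart and diffed are much closer to what an AWR or Statspack interval shows.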

  • Gig Ethernet V/S  SCI as Cluster Private Interconnect for Oracle RAC

    Hello Gurus,
    Can anyone please confirm whether it is possible to configure 2 or more Gigabit Ethernet interconnects (Sun Cluster 3.1 private interconnects) on an E6900 cluster?
    It is for a high availability requirement of Oracle 9i RAC. I need to know:
    1) Can I use gigabit ethernet as the private cluster interconnect for deploying Oracle RAC on an E6900?
    2) What is the recommended private cluster interconnect for Oracle RAC? Gigabit ethernet or SCI with RSM?
    3) How about scenarios where one has, say, 3 x Gig Ethernet vs 2 x SCI as the cluster's private interconnects?
    4) How does the interconnect traffic get distributed amongst the multiple Gigabit ethernet interconnects (for Oracle RAC), and is anything required at the Oracle RAC level for Oracle to recognise that there are multiple interconnect cards and to start utilizing all of the Gigabit ethernet interfaces for transferring packets?
    5) What would happen to Oracle RAC if one of the Gigabit ethernet private interconnects fails?
    I have tried searching for this info but could not locate any doc that precisely clarifies these doubts.
    Thanks for the patience.
    Regards,
    Nilesh

    Answers inline...
    Tim
    > Can anyone please confirm whether it is possible to configure 2 or more Gigabit Ethernet interconnects (Sun Cluster 3.1 private interconnects) on an E6900 cluster?
    Yes, absolutely. You can configure up to 6 NICs for the private networks. Traffic is automatically striped across them if you specify clprivnet0 to Oracle RAC (9i or 10g). That covers TCP connections and UDP messages.
    > 1) Can I use gigabit ethernet as the private cluster interconnect for deploying Oracle RAC on an E6900?
    Yes, definitely.
    > 2) What is the recommended private cluster interconnect for Oracle RAC? Gigabit ethernet or SCI with RSM?
    SCI is, or is in the process of being, EOL'ed. Gigabit is usually sufficient. Longer term you may want to consider InfiniBand or 10 Gigabit ethernet with RDS.
    > 3) How about scenarios where one has, say, 3 x Gig Ethernet vs 2 x SCI as the cluster's private interconnects?
    I would still go for 3 x GbE because it is usually cheaper and will probably work just as well. The latency and bandwidth differences are often masked by the performance of the software higher up the stack. In short, unless you have tuned the heck out of your application and just about everything else, don't worry too much about the difference between GbE and SCI.
    > 4) How does the interconnect traffic get distributed amongst the multiple Gigabit ethernet interconnects, and is anything required at the Oracle RAC level for Oracle to recognise the multiple interconnect cards and utilize all of the interfaces?
    You don't need to do anything at the Oracle level. That's the beauty of using Oracle RAC with Sun Cluster as opposed to RAC on its own. The striping takes place automatically and transparently behind the scenes.
    > 5) What would happen to Oracle RAC if one of the Gigabit ethernet private interconnects fails?
    It's completely transparent. Oracle will never see the failure.
    > I have tried searching for this info but could not locate any doc that precisely clarifies these doubts.
    This is all covered in a paper that I have just completed and should be published after Christmas. Unfortunately, I cannot give out the paper yet.

  • Oracle RAC - Not getting performance(TPS) as we expect on insert/update

    Hi All,
    We have a problem when executing insert/update and delete queries on our Oracle RAC system: we are not getting the TPS we expected from Oracle RAC. The TPS of Oracle RAC (for insert/update and delete) is lower than that of a single Oracle system.
    But when executing select queries, we get almost double the TPS of a single Oracle system.
    We have done server-side and client-side load balancing.
    Does anyone know how to resolve this strange behaviour? Do we need to change any other settings in ASM or on the Oracle nodes for better performance on insert/update and delete queries?
    The following is the Oracle RAC configuration:
    OS & Hardware: Windows 2008 R2, Core 2 Duo 2.66 GHz, 4 GB
    Software: Oracle 11g R2 64-bit, Oracle Clusterware & ASM, Microsoft iSCSI Initiator.
    Storage simulation: Xeon, 4 GB, 240 GB, Win 2008 R2, Microsoft iSCSI Target
    Please help us solve this. We are almost stuck with this situation.
    Thanks
    Roy

    Load Profile                 Per Second   Per Transaction   Per Exec   Per Call
    ~~~~~~~~~~~~            ---------------   ---------------   --------   --------
          DB time(s):                  48.3               0.3       0.26       0.10
           DB CPU(s):                   0.1               0.0       0.00       0.00
            Redo size:             523,787.9           3,158.4
        Logical reads:               6,134.6              37.0
        Block changes:               3,247.1              19.6
       Physical reads:                   3.5               0.0
      Physical writes:                  50.7               0.3
           User calls:                 497.6               3.0
               Parses:                 182.0               1.1
          Hard parses:                   0.1               0.0
     W/A MB processed:                   0.1               0.0
               Logons:                   0.1               0.0
             Executes:                 184.0               1.1
            Rollbacks:                   0.0               0.0
         Transactions:                 165.8
    Instance Efficiency Indicators
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                Buffer Nowait %:   93.74          Redo NoWait %:   99.96
                 Buffer  Hit  %:   99.99      Optimal W/A Exec %:  100.00
                 Library Hit  %:  100.19           Soft Parse %:   99.96
             Execute to Parse %:    1.09            Latch Hit %:   99.63
    Parse CPU to Parse Elapsd %:   16.44         % Non-Parse CPU:   84.62
    Shared Pool Statistics        Begin    End
                 Memory Usage %:  75.89  77.67
        % SQL with executions>1:  71.75  69.88
      % Memory for SQL w/exec>1:  75.63  71.38
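    One quick check on where the DML time is going on each instance is to group the waits by wait class; in this kind of symptom the Cluster and Commit classes usually dominate. A minimal sketch against GV$SYSTEM_EVENT (11g wait classes, cumulative since instance startup):
    -- Time per wait class and instance; large Cluster or Commit figures narrow the search
    SELECT inst_id, wait_class,
           SUM(total_waits)                    AS waits,
           ROUND(SUM(time_waited_micro) / 1e6) AS time_waited_s
      FROM gv$system_event
     WHERE wait_class IN ('Cluster', 'Commit', 'Concurrency', 'User I/O')
     GROUP BY inst_id, wait_class
     ORDER BY inst_id, time_waited_s DESC;
    If Cluster dominates the inserts, hot index blocks and uncached sequences are the usual suspects; if Commit dominates, redo write latency on the iSCSI target is the first place to look.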

  • Copper cable / GigE Copper Interface as Private Interconnect for Oracle RAC

    Hello Gurus,
    Can someone confirm whether copper cables (Cat5/RJ45) can be used for the Gig Ethernet private interconnects when deploying Oracle RAC 9.x or 10gR2 on Solaris 9/10?
    I am planning to use 2 x GigE interfaces (one port each from X4445 quad-port Ethernet adapters) and to connect them using copper cables. (All the documents that I came across refer to fiber cables for the private interconnects connecting GigE interfaces, so I am getting a bit confused.)
    I would appreciate it if someone could shed some light on this.
    regards,
    Nilesh Naik
    thanks

    Cat5/RJ45 can be used for Gig Ethernet private interconnects for Oracle RAC. I would recommend trunking the two or more interconnects for redundancy. The X4445 adapters are compatible with the Sun Trunking 1.3 software (http://www.sun.com/products/networking/ethernet/suntrunking/). If you have servers that support the Nemo framework (bge, e1000g, xge, nge, rge, ixgb), you can use the native Solaris 10 link aggregation tool, dladm.
    We have a couple of Sun T2000 servers and are using the onboard GigE ports for the Oracle 10gR2 RAC interconnects. We upgraded the onboard NIC drivers to e1000g and used the Solaris 10 link aggregation (dladm). The next update of Solaris will have the e1000g drivers as the default for the Sun T2000 servers.

  • Oracle RAC suddenly terminates on one node of a two-node cluster

    I have a strange problem that happens from time to time: my M400 machine, which is part of a two-node RAC cluster, goes down suddenly.
    I have tried many times to understand the cause, but when I read the logs there are so many messages related to Oracle RAC, which I have no experience or knowledge of, so I hope I can find someone here who can explain these log messages to me. Note that they are always the same.
    Jun 18 08:30:00 kfc-rac1 sendmail[17709]: [ID 702911 mail.crit] My unqualified host name (kfc-rac1) unknown; sleeping for retry
    Jun 18 08:31:00 kfc-rac1 sendmail[17709]: [ID 702911 mail.alert] unable to qualify my own domain name (kfc-rac1) -- using short name
    Jun 18 11:44:15 kfc-rac1 iscsi: [ID 454097 kern.notice] NOTICE: unrecognized ioctl 0x403
    Jun 18 11:44:15 kfc-rac1 scsi: [ID 243001 kern.warning] WARNING: /pseudo/fcp@0 (fcp0):
    Jun 18 11:44:15 kfc-rac1 Invalid ioctl opcode = 0x403
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_udlm_monitor_stop> for resource <rac-udlm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_framework_monitor_stop> for resource <rac-fw-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <3600> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_svm_monitor_stop> for resource <rac-svm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <scal_dg_monitor_stop> for resource <scal-racdg-rs>, resource group <scal-racdg-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <scal_mountpoint_monitor_stop> for resource <racfs-mntpnt-rs>, resource group <racfs-mntpnt-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_framework_monitor_stop> completed successfully for resource <rac-fw-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <3600 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_udlm_monitor_stop> completed successfully for resource <rac-udlm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_svm_monitor_stop> completed successfully for resource <rac-svm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_udlm_stop> for resource <rac-udlm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <scal_dg_monitor_stop> completed successfully for resource <scal-racdg-rs>, resource group <scal-racdg-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <scal_mountpoint_monitor_stop> completed successfully for resource <racfs-mntpnt-rs>, resource group <racfs-mntpnt-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <scal_mountpoint_postnet_stop> for resource <racfs-mntpnt-rs>, resource group <racfs-mntpnt-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 SC[SUNW.rac_udlm.rac_udlm_stop]: [ID 854390 daemon.notice] Resource state of rac-udlm-rs is changed to offline. Note that RAC framework will not be stopped by STOP method.
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_udlm_stop> completed successfully for resource <rac-udlm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:42 kfc-rac1 samfs: [ID 320134 kern.notice] NOTICE: SAM-QFS: racfs: Initiated unmount filesystem: vers 2
    Jun 18 17:09:43 kfc-rac1 samfs: [ID 522083 kern.notice] NOTICE: SAM-QFS: racfs: Completed unmount filesystem: vers 2
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <scal_mountpoint_postnet_stop> completed successfully for resource <racfs-mntpnt-rs>, resource group <racfs-mntpnt-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <scal_dg_postnet_stop> for resource <scal-racdg-rs>, resource group <scal-racdg-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <scal_dg_postnet_stop> completed successfully for resource <scal-racdg-rs>, resource group <scal-racdg-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_svm_stop> for resource <rac-svm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:43 kfc-rac1 SC[SUNW.rac_svm.rac_svm_stop]: [ID 854390 daemon.notice] Resource state of rac-svm-rs is changed to offline. Note that RAC framework will not be stopped by STOP method.
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_svm_stop> completed successfully for resource <rac-svm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_framework_stop> for resource <rac-fw-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:43 kfc-rac1 SC[SUNW.rac_framework.rac_framework_stop]: [ID 854390 daemon.notice] Resource state of rac-fw-rs is changed to offline. Note that RAC framework will not be stopped by STOP method.
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_framework_stop> completed successfully for resource <rac-fw-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:44 kfc-rac1 root: [ID 702911 user.error] Oracle CRSD 3932 set to stop
    Jun 18 17:09:44 kfc-rac1 root: [ID 702911 user.error] Oracle CRSD 3932 shutdown completed
    Jun 18 17:09:44 kfc-rac1 root: [ID 702911 user.error] Oracle EVMD set to stop
    Jun 18 17:09:44 kfc-rac1 root: [ID 702911 user.error] Oracle CSSD being stopped
    Jun 18 17:09:45 kfc-rac1 xntpd[980]: [ID 866926 daemon.notice] xntpd exiting on signal 15
    Jun 18 17:09:45 kfc-rac1 ip: [ID 646971 kern.notice] ip_create_dl: hw addr length = 0
    Jun 18 17:09:45 kfc-rac1 pppd[516]: [ID 702911 daemon.notice] Connection terminated.
    Jun 18 17:09:47 kfc-rac1 pppd[9462]: [ID 860527 daemon.notice] pppd 2.4.0b1 (Sun Microsystems, Inc.) started by root, uid 0
    Jun 18 17:09:47 kfc-rac1 pppd[9462]: [ID 702911 daemon.notice] Connect: sppp0 <--> /dev/dm2s0
    Jun 18 17:09:47 kfc-rac1 rpc.metamedd: [ID 702911 daemon.error] Terminated
    Jun 18 17:09:48 kfc-rac1 inetd[482]: [ID 702911 daemon.warning] inetd_offline method for instance svc:/network/rpc/scrcmd:default is unspecified. Taking default action: kill.
    Jun 18 17:09:48 kfc-rac1 inetd[482]: [ID 702911 daemon.warning] inetd_offline method for instance svc:/network/rpc/metacld:default is unspecified. Taking default action: kill.
    Jun 18 17:09:49 kfc-rac1 inetd[482]: [ID 702911 daemon.warning] inetd_offline method for instance svc:/network/rpc/scadmd:default is unspecified. Taking default action: kill.
    Jun 18 17:09:50 kfc-rac1 pppd[9462]: [ID 702911 daemon.notice] local IP address 192.168.224.2
    Jun 18 17:09:50 kfc-rac1 pppd[9462]: [ID 702911 daemon.notice] remote IP address 192.168.224.1
    Jun 18 17:09:50 kfc-rac1 cl_eventlogd[1554]: [ID 247336 daemon.error] Going down on signal 15.
    Jun 18 17:09:52 kfc-rac1 ip: [ID 372019 kern.error] ipsec_check_inbound_policy: Policy Failure for the incoming packet (not secure); Source 192.168.224.001, Destination 192.168.224.002.
    Jun 18 17:09:56 kfc-rac1 ip: [ID 646971 kern.notice] ip_create_dl: hw addr length = 0
    Jun 18 17:09:56 kfc-rac1 pppd[9462]: [ID 702911 daemon.notice] Connection terminated.
    Jun 18 17:09:56 kfc-rac1 Cluster.PNM: [ID 226280 daemon.notice] PNM daemon exiting.
    Jun 18 17:09:57 kfc-rac1 pseudo: [ID 129642 kern.info] pseudo-device: tod0
    Jun 18 17:09:57 kfc-rac1 genunix: [ID 936769 kern.info] tod0 is /pseudo/tod@0
    Jun 18 17:09:57 kfc-rac1 pseudo: [ID 129642 kern.info] pseudo-device: pm0
    Jun 18 17:09:57 kfc-rac1 genunix: [ID 936769 kern.info] pm0 is /pseudo/pm@0
    Jun 18 17:09:57 kfc-rac1 rpc.metad: [ID 702911 daemon.error] Terminated
    Jun 18 17:10:01 kfc-rac1 syslogd: going down on signal 15
    Jun 18 17:10:07 kfc-rac1 rpcbind: [ID 564983 daemon.error] rpcbind terminating on signal.
    Jun 18 17:10:32 kfc-rac1 Cluster.RGM.fed: [ID 831843 daemon.notice] SCSLM thread WARNING pools facility is disabled
    Jun 18 17:10:40 kfc-rac1 genunix: [ID 672855 kern.notice] syncing file systems...
    Jun 18 17:10:40 kfc-rac1 genunix: [ID 904073 kern.notice] done
    Jun 19 14:20:12 kfc-rac1 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_141444-09 64-bit
    Jun 19 14:20:12 kfc-rac1 genunix: [ID 943908 kern.notice] Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
    Jun 19 14:20:12 kfc-rac1 Use is subject to license terms.
    Jun 19 14:20:12 kfc-rac1 genunix: [ID 678236 kern.info] Ethernet address = 0:21:28:2:21:b2
    Thanks in advance to all of you.
    Your response is highly appreciated.

    Hi, I have checked the interconnect between the two nodes and it is as follows:
    ifconfig -a
    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
    inet 127.0.0.1 netmask ff000000
    bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    inet 10.1.100.126 netmask ffffff00 broadcast 10.1.100.255
    groupname sc_ipmp0
    ether 0:14:4f:3a:6c:19
    bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
    inet 10.1.100.127 netmask ffffff00 broadcast 10.1.100.255
    bge0:2: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
    inet 10.1.100.140 netmask ffffff00 broadcast 10.1.100.255
    bge1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 6
    inet 172.16.0.129 netmask ffffff80 broadcast 172.16.0.255
    ether 0:14:4f:3a:6c:1a
    nxge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
    inet 10.1.100.128 netmask ffffff00 broadcast 10.1.100.255
    groupname sc_ipmp0
    ether 0:21:28:d:c9:8e
    nxge1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 5
    inet 172.16.1.1 netmask ffffff80 broadcast 172.16.1.127
    ether 0:21:28:d:c9:8f
    e1000g1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 4
    inet 172.16.1.129 netmask ffffff80 broadcast 172.16.1.255
    ether 0:15:17:81:15:c3
    clprivnet0: flags=1009843<UP,BROADCAST,RUNNING,MULTICAST,MULTI_BCAST,PRIVATE,IPv4> mtu 1500 index 7
    inet 172.16.4.1 netmask fffffe00 broadcast 172.16.5.255
    ether 0:0:0:0:0:1
    sppp0: flags=10010008d1<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST,IPv4,FIXEDMTU> mtu 1500 index 8
    inet 192.168.224.2 --> 192.168.224.1 netmask ffffff00
    ether 0:0:0:0:0:0
    root@kfc-rac1 #
    The interconnect is directly attached between both nodes' interfaces (back to back).
    As for the status of the HBA cards, here it is as well:
    fcinfo hba-port -l
    HBA Port WWN: 2100001b3284c042
    OS Device Name: /dev/cfg/c1
    Manufacturer: QLogic Corp.
    Model: 375-3355-02
    Firmware Version: 05.01.00
    FCode/BIOS Version:  BIOS: 1.24; fcode: 1.24; EFI: 1.8;
    Serial Number: 0402R00-0844647023
    Driver Name: qlc
    Driver Version: 20090519-2.31
    Type: N-port
    State: online
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: 4Gb
    Node WWN: 2000001b3284c042
    Link Error Statistics:
    Link Failure Count: 0
    Loss of Sync Count: 0
    Loss of Signal Count: 0
    Primitive Seq Protocol Error Count: 0
    Invalid Tx Word Count: 0
    Invalid CRC Count: 0
    HBA Port WWN: 2100001b321c462b
    OS Device Name: /dev/cfg/c2
    Manufacturer: QLogic Corp.
    Model: 375-3355-02
    Firmware Version: 05.01.00
    FCode/BIOS Version:  BIOS: 1.24; fcode: 1.24; EFI: 1.8;
    Serial Number: 0402R00-0844646557
    Driver Name: qlc
    Driver Version: 20090519-2.31
    Type: N-port
    State: online
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: 4Gb
    Node WWN: 2000001b321c462b
    Link Error Statistics:
    Link Failure Count: 0
    Loss of Sync Count: 0
    Loss of Signal Count: 0
    Primitive Seq Protocol Error Count: 0
    Invalid Tx Word Count: 0
    Invalid CRC Count: 0
    HBA Port WWN: 2100001b32934b3c
    OS Device Name: /dev/cfg/c3
    Manufacturer: QLogic Corp.
    Model: 375-3294-01
    Firmware Version: 05.01.00
    FCode/BIOS Version:  BIOS: 2.2; fcode: 2.1; EFI: 2.0;
    Serial Number: 0402R00-0947745866
    Driver Name: qlc
    Driver Version: 20090519-2.31
    Type: N-port
    State: online
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: 4Gb
    Node WWN: 2000001b32934b3c
    Link Error Statistics:
    Link Failure Count: 0
    Loss of Sync Count: 0
    Loss of Signal Count: 0
    Primitive Seq Protocol Error Count: 0
    Invalid Tx Word Count: 0
    Invalid CRC Count: 0
    HBA Port WWN: 2101001b32b34b3c
    OS Device Name: /dev/cfg/c4
    Manufacturer: QLogic Corp.
    Model: 375-3294-01
    Firmware Version: 05.01.00
    FCode/BIOS Version:  BIOS: 2.2; fcode: 2.1; EFI: 2.0;
    Serial Number: 0402R00-0947745866
    Driver Name: qlc
    Driver Version: 20090519-2.31
    Type: unknown
    State: offline
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: not established
    Node WWN: 2001001b32b34b3c
    Link Error Statistics:
    Link Failure Count: 0
    Loss of Sync Count: 0
    Loss of Signal Count: 0
    Primitive Seq Protocol Error Count: 0
    Invalid Tx Word Count: 0
    Invalid CRC Count: 0
    root@kfc-rac1 #
    In addition, here is the ocssd log file as well:
    http://www.4shared.com/file/Txl9DqLW/log_25155156.html?
    If you look at the lines for the dates on which this issue happens (2012-06-09, 2012-06-18 and 2012-06-21), you will see something related to the voting disk: it suddenly becomes unavailable, which causes the problem.
    Thanks a lot for your help; I'm waiting for your recommendation and hope these logs give more insight into the problem.
    Thanks in advance :)

  • Oracle RAC new feature for interconnect configuration HAIP vs BONDING

    Hi All:
    I would like to get some opinions about using Oracle HAIP (High Availability IP) for configuring the RAC interconnect versus using network interface bonding.
    This seems to be a new feature of Oracle Grid Infrastructure from 11.2.0.2 onward. Has anyone had any experience with HAIP, and were there any issues?
    Thanks

    Hi,
    Multiple private network adapters can be defined either during the installation phase or afterwards using oifcfg. Grid Infrastructure can activate a maximum of four private network adapters at a time. With HAIP, by default, interconnect traffic is load balanced across all active interconnect interfaces, and the corresponding HAIP address fails over transparently to another adapter if one fails or becomes non-communicative.
    It is quite helpful in scenarios such as the following:
    1) If one private network adapter fails, the virtual private IP on that adapter is relocated to a healthy adapter.
    There is a very good document on Metalink (Doc ID 1210883.1).
    Rgds
    Harvinder
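    A quick way to see what HAIP has actually brought up is to check the interconnect from the database side; with HAIP enabled this normally shows the 169.254.x.x link-local addresses rather than the physical private IPs. A minimal sketch for 11.2:
    -- Interconnects as seen by each instance (SOURCE shows where the address came from)
    SELECT inst_id, name, ip_address, is_public, source
      FROM gv$cluster_interconnects
     ORDER BY inst_id, name;
    The same information is available at the clusterware level with "oifcfg getif", which lists the interfaces registered as cluster_interconnect.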

  • Oracle RAC performance

    Hello all,
    I have a 3-node Oracle RAC. Each server has 2 Intel quad-core processors. I was asked to extend the RAC to 5 nodes to increase the processing power. I was told that Intel 6-core processors are now on the market, so I am looking at the following options and need your input/expertise and suggestions:
    Option 1: Add 2 more nodes with 2 quad-core processors each to the existing 3-node RAC, so the total number of cores available to process the load will be 40.
    Option 2: Completely replace all 5 of the above nodes (in option 1) with 4 nodes, each with 2 x 6-core processors. Total cores available: 48.
    Option 2 will leave us with 1 less server to maintain.
    Will the performance of option 2 be better than option 1? Are there any white papers or documentation to help me decide which option I should go with?
    Thanks in advance.
    Raj

    hi
    I would first try to tune the database; however, that demands some professional skills, and there are many areas of tuning.
    For example, if your storage subsystem runs slowly (often a bottleneck), then you will not benefit from adding nodes at all!
    Upgrading hardware is the last thing to do, as it is costly.
    If you have the budget, then go for it and buy new "toys", why not, but make sure you actually address the performance issue, lest you spend money with no improvement.
    So start with storage: if the disks are old, e.g. 10k RPM, replace them with new 15k RPM disks, or even SSDs.
    Run AWR and look at the "Top 5 Timed Events" to find the slowest subsystem; if it is I/O then invest in disks, if it is CPU then invest in CPU, and so on (a quick sketch follows this reply).
    Replacing old hardware (scale-up) has the benefit that you don't have to buy all the components and install all the software from scratch (OS, add nodes).
    Also, while it is not forbidden, Oracle recommends that nodes have similar performance.
    Next, adding more than 3 nodes to a RAC cluster will probably not improve overall performance in a linear way, and it is more expensive.
    My suggestion is to scale up only after inspecting the cluster and finding its performance bottleneck.
    Hope this helps,
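    To illustrate the "find the slowest subsystem" step: the complete picture comes from an AWR report (@?/rdbms/admin/awrrpt.sql), but a quick, rough look at the top waits accumulated since instance startup could be taken with a query like this sketch, run in SQL*Plus as a DBA user:
    -- Top 5 non-idle wait events since instance startup
    SELECT *
      FROM (SELECT event, wait_class,
                   ROUND(time_waited_micro / 1e6) AS seconds_waited
              FROM v$system_event
             WHERE wait_class <> 'Idle'
             ORDER BY time_waited_micro DESC)
     WHERE ROWNUM <= 5;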

  • Is Interconnect mandatory in ORACLE RAC??

    The private interconnect is the physical construct that allows inter-node communication, or Cache Fusion communication.
    Each instance is connected to every other instance in the cluster by a high-speed interconnect. This makes it possible to share memory between 2 or more servers. Previously only datafile sharing was possible; now, because of the interconnect, even the cache memory can be shared.
    Also, the private IP range used for cluster services cannot be used in the public IP domain, for security reasons.
    So in my opinion, an interconnect is a must in RAC. Please share your opinion. Thanks.

    Previously only datafile sharing was possible, now because of interconnect, even the cache memory can be shared.
    Only datafile sharing? I don't get it. What age are you talking about? Since Oracle8i (1999), Cache Fusion optimizes read/write concurrency by using the interconnect to directly transfer data blocks among instances. This eliminates I/O and reduces delays for block reads to the speed of interprocess communication (IPC) over the interconnecting network.
    From the documentation, back in 2000:
    The ability to spread instances across different machines provides you with scalability. Many hardware vendors offer machines that have upper limits on both memory and numbers of processors. With Parallel Server, though, if you have a machine with a 2-gigabyte upper limit on memory and your memory requirements grow beyond the 2 gigabytes, you can split your users across two machines. Each machine will have its own instance, but all users will share the same database.
    Let's move to 2012 and forget about the past. See this: the InfiniBand roadmap shows that NDR (Next Data Rate) will scale to 320Gb/s.
    Is the interconnect mandatory in Oracle RAC? RAC is a shared-everything architecture, and there has always been a network between the instances, so: YES.
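    If you want to see which network your own cluster is actually using for Cache Fusion traffic, a small sketch run in SQL*Plus as SYSDBA (the view is standard in 10g/11g):
    -- Each instance reports the interface and address it uses for interconnect traffic
    SELECT inst_id, name, ip_address, is_public
      FROM gv$cluster_interconnects
     ORDER BY inst_id;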

  • Recommendations - Oracle RAC 10g on Solaris 10 Containers Logical/Local..

    Dear Oracle Experts et all
    I have a couple of questions for Oracle 10g RAC implementation on Solaris and seek your advice. we are attempting to implement oracle 10g RAC on Solaris OS and SPARC Platform.
    1. We are wondering whether Oracle 10g RAC can be implemented in Solaris Local/Logical Containers. I was assuming that Oracle always links itself with OS binaries and libraries during software installation and hence needs an OS image/root disk to install onto. In containers, however, I assume we have a single Solaris installation and configuration which is shared with the containers configured inside it. In such a situation, how does the Oracle installation proceed? Do I need to look at a scenario where the Global Container/Zone has the Oracle installation and that image is shared across to the zones/containers? If so, which OS filesystems need to be shared across to these zones/containers?
    Additionally, even if this approach is supported, is it recommended? I am unsure about the stability and functionality of Oracle in such cases and am not able to completely conceptualize it. I assume there are certain items which need to be appropriately taken care of. It would help if you could share observations from your experience.
    2. The idea of RAC we are looking at is to have multiple Oracle installations on top of a native clustering solution, say Veritas Cluster or Sun Cluster. Do we still need Oracle's cluster solution, Clusterware (Oracle CRS), on top of this to achieve Oracle clustering? Will I be able to install Oracle as a standalone installation on top of a native clustering solution such as Veritas Cluster or Sun Cluster?
    Our requirement is to have the above-mentioned multiple Oracle installations spread across two (2) separate hardware platforms, say Node A and Node B, and to configure our cluster solution to behave as active-passive across Node A and Node B. In other words, I will configure a clustering solution like VRTS/Sun Cluster in active-passive mode, then have 3 Oracle installations on Node A and another 3 on Node B. I will configure one database for each of these Oracle software installations (with the idea of not having Clusterware between the VRTS/Sun Cluster clustering solution and the Oracle installation, if that works). I will thus run 3 databases on each of these nodes. If downtime happens on any one of the nodes, say Node A, I will fail all Oracle databases and software over to the alternate available node, Node B in this case, using the native clustering solution, and I will want the database to behave as it did earlier on Node A. I am not sure, though, whether I will be able to bring the database up on Node B when the resources, from the OS perspective, are failed over.
    We want to use Oracle 10g RAC Release 2 EE on Solaris 10, on the latest release or the one before it.
    Please share your thoughts.
    Regards!
    Sarat

    Sarat Chandra C wrote:
    Dear Oracle Experts et all
    I have a couple of questions for Oracle 10g RAC implementation on Solaris and seek your advice. we are attempting to implement oracle 10g RAC on Solaris OS and SPARC Platform.
    1. We are wondering whether Oracle 10g RAC can be implemented in Solaris Local/Logical Containers.
    My understanding is that RAC in a Zone (Container) is not supported by Oracle, and will not work anyway. Regardless of installation, RAC needs to do cluster-level stuff about the cluster configuration, changing network addresses dynamically, and sending guaranteed messages over the cluster interconnect. None of this stuff can be done in a Local Zone in Solaris, because Local Zones have fewer permissions than the Global Zone. This is part of the design of Solaris Zones, and nothing to do with how Oracle RAC itself works on them.
    This is all down to the security model of Zones, and Local Zones lack the ability to do certain things, to stop them reconfiguring themselves and impacting other Zones. Hence RAC cannot do dynamic cluster reconfiguration in a Local Zone, such as changing virtual network addresses when a node fails.
    My understanding is that RAC just cannot work in a Local Zone. This was certainly true 5 years ago (mid 2005), and was a result of the inherent design and implementation of Zones in Solaris. Things may have changed, so check the Solaris documentation, and check if Oracle RAC is supported in Local Zones. However, as I said, this limitation was inherent in the design of Zones, so I do not see how Sun could possibly have changed it so that RAC would work in a Local Zone.
    To me, your only option is the Global Zone, which pretty much destroys the argument for having Zones on a Solaris system, unless you can host other non-Oracle applications in the other Zones.
    2. The idea of RAC we are looking at is to have multiple Oracle installations on top of a native clustering solution, say Veritas Cluster or Sun Cluster. Do we still need Oracle's cluster solution, Clusterware (Oracle CRS), on top of this to achieve Oracle clustering? Will I be able to install Oracle as a standalone installation on top of a native clustering solution such as Veritas Cluster or Sun Cluster?
    I am not sure the term 'native' is correct. All 'Cluster' software is low level, and has components that run within the operating system, whether that is Sun Cluster, Veritas Cluster Server, or Oracle Clusterware. They are all as 'native' to Solaris as each other, and they all perform the same function for Oracle RAC around cluster management - which nodes are members of the cluster, heartbeats between nodes, reliable fast message delivery, etc.
    You only need one piece of Cluster software. So pick one and use it. If you use the Sun or Veritas cluster products, then you do not need the Oracle Clusterware software. But I would use it, because it is free (included with RAC), is from Oracle themselves and so guaranteed to work, is fully supported, and is one less third party product to deal with. Having an all Oracle software stack makes things simpler and more reliable, as far as I am concerned. You can be sure that Oracle will have fully tested RAC on their own Clusterware, and be able to replicate any issues in their own support environments.
    Officially the Sun and Veritas products will work and are supported. But when you get a problem with your Cluster environment, who are you going to call? You really want to avoid "finger pointing" when you have a problem, with each vendor blaming the cause of the problem on another vendor. Using an all Oracle stack is simpler, and ensures Oracle will "own" all your support problems.
    Also future upgrades between versions will be simpler, as Oracle will release all their software together, and have tested it together. When using third party Cluster software, you have to wait for all vendors to release new versions of their own software, and then wait again while it is tested against all the different third party software that runs on it. I have heard of customers stuck on old versions of certain cluster products, who cannot upgrade because there are no compatible combinations in the support matrices between the cluster product and Oracle database versions.
    I will configure a clustering solution like VRTS/Sun Cluster in active-passive mode, then have 3 Oracle installations on Node A and another 3 on Node B.
    As I said before, these 3 Oracle installations will actually all be in the same Global Zone, because RAC will not go into Local Zones.
    John
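    As a small aside on the Zones point above, a quick check of where you are actually running, using standard Solaris 10 commands (a sketch, not from the thread):
    # Print the current zone name; per the discussion above, RAC is expected to run in "global"
    zonename
    # List configured zones and their states, to see what else shares this host
    zoneadm list -cv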

  • ORACLE RAC - Clusterware 11Gr1  -

    Hi,
    We are starting a fresh installation of an Oracle RAC database on:
    IBM AIX 5.3
    Processor Type: PowerPC_POWER7
    Processor Implementation Mode: POWER 6
    Processor Version: PV_6_Compat
    Number Of Processors: 2
    Processor Clock Speed: 3108 MHz
    CPU Type: 64-bit
    Kernel Type: 64-bit
    We are changing the servers and installing new binaries: Clusterware, ASM, and RDBMS. After that, the database will be migrated to this server. Oracle Clusterware will be on 11.1.0.7, ASM on 11.1.0.7 and the RDBMS on 10.2.0.4. We are installing exactly as it is in production now.
    After the servers were released to the DBAs, we configured the OCR and voting disks and the pre-reqs. During the installation we faced an error on the second node while running root.sh. The installer itself completed 100%, and on the first node root.sh finished successfully and started the services. On the second node, while running root.sh, it failed with the message:
    --> Failure at final check of oracle CRS stack. 10
    Checking the logs, we could see the following:
    [    CSSD]2011-09-25 02:59:51.633 [1030] >TRACE: clssnmReadDskHeartbeat: node 1, hodev001ler, has a disk HB, but no network HB, DHB has rcfg 212382813, wrtcnt, 2978, LATS 3178240991, lastSeqNo 2978, timestamp 1316930391/924708736
    [    CSSD]2011-09-25 02:59:51.984 [1287] >TRACE: clssnmReadDskHeartbeat: node 1, hodev001ler, has a disk HB, but no network HB, DHB has rcfg 212382813, wrtcnt, 2978, LATS 3178241341, lastSeqNo 2978, timestamp 1316930391/924708736
    [    CSSD]2011-09-25 02:59:52.304 [1801] >TRACE: clssnmReadDskHeartbeat: node 1, hodev001ler, has a disk HB, but no network HB, DHB has rcfg 212382813, wrtcnt, 2978, LATS 3178241662, lastSeqNo 2978, timestamp 1316930391/924708736
    [    CSSD]2011-09-25 02:59:52.635 [1030] >TRACE: clssnmReadDskHeartbeat: node 1, hodev001ler, has a disk HB, but no network HB, DHB has rcfg 212382813, wrtcnt, 2979, LATS 3178241992, lastSeqNo 2979, timestamp 1316930392/924709737
    [    CSSD]2011-09-25 02:59:52.771 [5399] >TRACE: clssnmLocalJoinEvent: node(1), state(0), cont (1), sleep (0), diskHB 1, diskinfo 110aa46f0
    [    CSSD]2011-09-25 02:59:52.771 [5399] >TRACE: clssnmLocalJoinEvent: node(2), state(1), cont (0), sleep (0), diskHB 1, diskinfo 110aa46f0
    All the notes seem to point to an interface/interconnect problem, but we don't have any clue about which parameters or checks we need to go through with the Unix team. Has anybody had this issue? Any clue about what may need to be adjusted or configured to solve it? Below are the interfaces of both servers, hodb001lernew and hodb002lernew:
    hodb001lernew
    root@hodev001lernew:/u01/crs/log/hodev001ler # ifconfig -a
    en1: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
    inet 10.124.85.221 netmask 0xffffffe0 broadcast 10.124.85.223
    tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    en2: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
    inet 10.124.140.109 netmask 0xffffffe0 broadcast 10.124.140.127
    tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    en3: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
    inet 10.56.96.47 netmask 0xffffe000 broadcast 10.56.127.255
    tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    en5: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
    inet 10.124.251.4 netmask 0xffffffc0 broadcast 10.124.251.63
    tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
    inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
    inet6 ::1/0
    tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
    hodb002lernew
    root@hodev002lernew:/u01/crs/log/hodev002ler # ifconfig -a
    en0: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
    inet 172.16.1.11 netmask 0xffffff00 broadcast 172.16.1.255
    tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    en1: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
    inet 10.124.85.220 netmask 0xffffffe0 broadcast 10.124.85.223
    tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    en2: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
    inet 10.124.140.110 netmask 0xffffffe0 broadcast 10.124.140.127
    tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    en3: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
    inet 10.56.96.48 netmask 0xffffe000 broadcast 10.56.127.255
    tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    en5: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
    inet 10.124.251.7 netmask 0xffffffc0 broadcast 10.124.251.63
    tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
    inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
    inet6 ::1/0
    tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
    Also, while we run the cluverify, it appears the following warning on the interfaces verification:
    Checking node connectivity...
    Node connectivity check passed for subnet "172.16.1.0" with node(s) hodev002ler,hodev001ler.
    Node connectivity check passed for subnet "10.124.85.192" with node(s) hodev002ler,hodev001ler.
    Node connectivity check passed for subnet "10.124.140.96" with node(s) hodev002ler,hodev001ler.
    Node connectivity check passed for subnet "10.56.96.0" with node(s) hodev002ler,hodev001ler.
    Node connectivity check passed for subnet "10.124.107.224" with node(s) hodev002ler,hodev001ler.
    Node connectivity check passed for subnet "10.124.251.0" with node(s) hodev002ler,hodev001ler.
    Interfaces found on subnet "172.16.1.0" that are likely candidates for VIP:
    hodev002ler en0:172.16.1.11
    hodev001ler en0:172.16.1.13
    Interfaces found on subnet "10.124.85.192" that are likely candidates for VIP:
    hodev002ler en1:10.124.85.200
    hodev001ler en1:10.124.85.222
    Interfaces found on subnet "10.124.140.96" that are likely candidates for VIP:
    hodev002ler en2:10.124.140.97
    hodev001ler en2:10.124.140.101
    Interfaces found on subnet "10.56.96.0" that are likely candidates for VIP:
    hodev002ler en3:10.56.96.8
    hodev001ler en3:10.56.96.32
    Interfaces found on subnet "10.124.107.224" that are likely candidates for VIP:
    hodev002ler en4:10.124.107.231
    hodev001ler en4:10.124.107.241
    Interfaces found on subnet "10.124.251.0" that are likely candidates for VIP:
    hodev002ler en5:10.124.251.7
    hodev001ler en5:10.124.251.4
    WARNING:
    Could not find a suitable set of interfaces for the private interconnect.
    Any helps?
    Thanks,

    Hi,
    Thanks for the reply. Here is what I have:
    Node 1:
    oracle@hodev001lernew:/home/oracle # ssh hodev002lernew date
    Tue Oct 4 19:09:16 GRNLNDST 2011
    oracle@hodev001lernew:/home/oracle # ssh hodev002lernew_pri date
    Tue Oct 4 19:09:25 GRNLNDST 2011
    Node 2:
    oracle@hodev002lernew:/home/oracle # ssh hodev001lernew date
    Tue Oct 4 19:10:24 GRNLNDST 2011
    oracle@hodev002lernew:/home/oracle # ssh hodev001lernew_pri date
    Tue Oct 4 19:10:57 GRNLNDST 2011
    Regarding the firewall point, I issued the lsfilt command but it did not return any filter rules. Does that mean filtering is not enabled? I'm on AIX 5.3; is there any other command to verify this point?
    root@hodev001lernew:/ # /usr/sbin/lsfilt -a
    Can not open device /dev/ipsec4_filt.
    root@hodev002lernew:/ # lsfilt -a
    Can not open device /dev/ipsec4_filt.
    Thanks
    Edited by: user11969939 on 04/10/2011 15:23
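    For anyone debugging the same "disk HB, but no network HB" / "Could not find a suitable set of interfaces for the private interconnect" symptoms, a hedged sketch of checks (the node and private host names are the ones used in this thread; the AIX filter check is an assumption, so adjust for your platform):
    # Let Clusterware's own verifier test node connectivity on every subnet it can see
    # (run as the oracle owner from the Clusterware staging area or CRS home)
    ./runcluvfy.sh comp nodecon -n hodev001ler,hodev002ler -verbose
    # Confirm the private host names resolve to the subnet intended for the interconnect
    # and that each node can reach the other over it
    ping -c 3 hodev001lernew_pri
    ping -c 3 hodev002lernew_pri
    # On AIX, check whether any IPsec/IP-filter devices are even configured; if nothing
    # shows up as Available, kernel-level IP filtering is most likely not active
    lsdev -C | grep -i ipsec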

  • Will RAC's performance bottleneck be the shared disk storage ?

    Hi All
    I'm studying RAC and I'm concerned about RAC's I/O performance bottleneck.
    If I have 10 nodes and they all use the same shared storage to hold the database, then
    they will all do I/O to the same disks simultaneously.
    Maybe we get more latency...
    Will that be a performance problem?
    How does RAC solve this kind of problem?
    Thanks.

    J.Laurence wrote:
    I see FC can solve the problem with bandwidth (throughput).
    There are a couple of layers in the I/O subsystem for RAC.
    There is Cache Fusion, as already mentioned. Why read a data block from disk when another node has it in its buffer cache and can provide that instead (over the interconnect communication layer)?
    Then there are the actual pipes between the server nodes and the storage system. Fibre is slow and not what the latest RAC architectures (such as Exadata) use.
    Traditionally, you pop an HBA card into the server that provides you with 2 fibre channel pipes to the storage switch. These usually run at 2Gb/s and the I/O driver can load balance and fail over, so in theory it can scale to 4Gb/s and provide redundancy should one pipe fail.
    Exadata and more modern RAC systems use HCA cards running InfiniBand (IB). This provides scalability of up to 40Gb/s, and is also dual port, which means that you have 2 cables running into the storage switch.
    IB supports a protocol called RDMA (Remote Direct Memory Access). This essentially allows memory to be shared across the IB fabric layer, and is used to read data blocks from the storage array's buffer cache into the local Oracle RAC instance's buffer cache.
    Port to port latency for a properly configured IB layer running QDR (4 speed) can be lower than 70ns.
    And this does not stop there. You can of course add a huge memory cache in the storage array (which is essentially a server with a bunch of disks). Current x86-64 motherboard technology supports up to 512GB RAM.
    Exadata takes it even further as special ASM software on the storage node reconstructs data blocks on the fly to supply the RAC instance with only relevant data. This reduces the data volume to push from the storage node to the database node.
    So Fibre Channel in this sense is a bit dated, as is GigE.
    But what about the hard drives' read and write I/O? Not a problem, as the storage array deals with that. A RAC instance that writes a data block writes it into the storage buffer cache, where the storage array software manages that cache and does the physical write to disk.
    Of course, it will stripe heavily and will have 24+ disk controllers available to write that data block, so do not think of I/O latency in terms of the actual speed of a single disk.
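    A practical way to see which layer dominates on your own cluster is to compare the global cache (interconnect) waits against the physical read waits. A rough sketch, run in SQL*Plus as SYSDBA (not from the thread; the views and event names are standard):
    -- Interconnect (Cache Fusion) waits vs. single/multi-block disk reads, per instance
    SELECT inst_id, event, total_waits,
           ROUND(time_waited_micro / 1e6) AS seconds_waited
      FROM gv$system_event
     WHERE event LIKE 'gc%'
        OR event IN ('db file sequential read', 'db file scattered read')
     ORDER BY inst_id, time_waited_micro DESC;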
