RAC node and disk ping

Hi,
I have a 10.2.0.3 RAC running on RHEL v4. What are the node ping and disk ping in terms of Oracle RAC? I assume the node ping refers to the hangcheck-timer, or is this wrong? The only disk ping that I recall was in 9i, but I have not heard of it in 10g unless it is something else.
Also, where would I configure these parameters?
Thanks.

Try ML Note 294430.1: "CSS Timeout Computation in RAC 10g (10g Release 1 and 10g Release 2)" for example.
Thanks.
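For reference, the CSS timeouts that the note discusses can be read straight from the cluster. A minimal sketch using standard 10.2 crsctl commands (run from the Clusterware home; changing these values is normally only done under Oracle Support guidance):
$ crsctl get css misscount      # node ("network heartbeat") timeout, in seconds
$ crsctl get css disktimeout    # voting disk ("disk heartbeat") timeout, in seconds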

Similar Messages

  • Does Maintenance Wizard allow registering a RAC node, and if yes, then how?

    Maintenance Wizard is not allowing us to register additional nodes for RAC. This forces manual runs on the other nodes. Is there any way we can set up multiple nodes?
    For example, we have two CM/Admin nodes and a RAC database. We are using Maintenance Wizard version 2.19.
    Thanks

    Edit the following script to work around bug 11699526:
    $EOF_HOME/scripts/COMMON/eof1122e.sh
    Line 131 reads "(CONNECT_DATA=(SID=${DB_SID})"; change it to read "(CONNECT_DATA=(SERVICE_NAME=${DB_SID})" (a sed one-liner for this edit is sketched below).
    Enter the database service name when configuring the DB nodes.
    You will still need to do the OS-related operations manually on node 2.
    Robert
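    A minimal sketch of that one-line edit using sed, on the assumption that the SID string appears only once in the script (-i.bak keeps a backup copy to diff against):
    cd $EOF_HOME/scripts/COMMON
    sed -i.bak 's/(CONNECT_DATA=(SID=${DB_SID})/(CONNECT_DATA=(SERVICE_NAME=${DB_SID})/' eof1122e.sh
    diff eof1122e.sh.bak eof1122e.sh   # confirm only the intended line changed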

  • Oracle RAC Diskgroups and disks

    Hi all,
    I want to find out what disks are assigned to the diskgroups in my RAC.
    I want to do that without using ASMCA, but rather using CLI.
    I do know what diskgroups I have and on what nodes they're active using "crsctl status resource -l".
    Is there a similar command to extract a mapping of disks to diskgroups?
    Regards,
    Igor.

    922172 wrote:
    Hi all,
    I want to find out what disks are assigned to the diskgroups in my RAC.
    export ORACLE_SID=+ASM
    sqlplus / as sysdba
    SQL> select d.name from v$asm_disk d, v$asm_diskgroup g where d.GROUP_NUMBER=g.GROUP_NUMBER and g.name='Diskgroup name';
    I want to do that without using ASMCA, but rather using CLI.
    I do know what diskgroups I have and on what nodes they're active using "crsctl status resource -l".
    Is there a similar command to extract a mapping of disks to diskgroups?
    ASMCMD [+] > lsdsk -t -G data
    where data is the diskgroup name
    http://docs.oracle.com/cd/E16338_01/server.112/e10500/asm_util004.htm#CIHDCADB
    Regards,
    Igor.
    Also, close your pending threads and mark posts as correct or helpful if you think they were.
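    If you want the complete disk-to-diskgroup mapping in one pass, here is a hedged sketch of a query against the standard V$ASM views, run on the ASM instance (+ASM1 is just an example SID; adjust for your node):
    export ORACLE_SID=+ASM1
    sqlplus / as sysdba
    SQL> select g.name diskgroup, d.name disk, d.path
         from v$asm_diskgroup g, v$asm_disk d
         where g.group_number = d.group_number
         order by g.name, d.name;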

  • Invalid B-tree node and disk stuck

    Last night my 2 year old iMac froze up. When I re-started I received the flashing question-mark folder. I ended up inserting my OS X install disk and attempting to run the Disk Utility. I tried to repair the disk, but got the "Invalid B-tree node size" error with "The volume needs to be repaired."
    I have tried a lot of things and will probably end up buying Disk Warrior and hoping that fixes it. BUT, I have another problem. I can't eject the OS X install disk. The option to eject is greyed out on the Disk Utility, the eject key doesn't work, and I can't find any way to unload the disk. Help would be appreciated.

    If you have an external hard drive it would be easier to clone your system to the external then boot to the external and run Disk Warrior from there. However if your system won't boot in the first place that won't work. You could pick up a copy of Disk Warrior from the Apple store.
    http://www.bombich.com/software/ccc.html
    George
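    Two other stock ways to get a stuck optical disc out of a Mac, offered as general suggestions rather than anything specific to this machine: hold down the mouse/trackpad button while the Mac boots, or eject from Terminal with drutil:
    drutil eject          # ask the optical drive to eject the disc
    drutil tray eject     # variant for tray-loading drives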

  • Query on setting of ORACLE_HOME in RAC node and more errors

    I'm following the Oracle docs to install Oracle 10g R2 RAC on RHEL 5.
    I am able to get through most of it, but one minor thing is still bugging me.
    Enter commands similar to the following to set the ORACLE_BASE and
    ORACLE_HOME environment variable in preparation for the Oracle Clusterware
    installation:
    ■ Bourne, Bash, or Korn shell:
    $ ORACLE_BASE=/u01/app/oracle
    $ ORACLE_HOME=/u01/crs/oracle/product/10/app
    $ export ORACLE_BASE
    $ export ORACLE_HOME
    The above extract from the docs (B14203-09) says ORACLE_HOME should be set to the directory where the Clusterware is installed.
    My query is: do we change this ORACLE_HOME to '/u01/app/oracle/product/10/db_1' when we install the Oracle database after we finish installing the Clusterware? (See the environment sketch at the end of this thread.)
    Further, should the permissions for the /u01/crs directory be root:oinstall and 775 during the installation of the Clusterware and then be changed to 644 after the installation? Is that correct? Is it 644 or 640? Does it need read access for everyone?
    Edited by: iinfi on Jul 5, 2009 1:42 AM
    Edited by: iinfi on Jul 5, 2009 12:20 PM

    I initially thought that not setting ORACLE_HOME to the CRS directory was the cause of the error message:
    [root@node1 ~]# /ora/app/oracle/product/10.2.0/crs/root.sh
    WARNING: directory '/ora/app/oracle/product/10.2.0' is not owned by root
    WARNING: directory '/ora/app/oracle/product' is not owned by root
    WARNING: directory '/ora/app/oracle' is not owned by root
    WARNING: directory '/ora/app' is not owned by root
    WARNING: directory '/ora' is not owned by root
    Checking to see if Oracle CRS stack is already configured
    Setting the permissions on OCR backup directory
    Setting up NS directories
    Failed to upgrade Oracle Cluster Registry configuration
    The CRS directory is owned by root:oinstall with permissions set to 775. The CRS and voting disks are raw devices on my RHEL 5.3 setup with Openfiler as storage (iSCSI targets).
    [root@node1 node1]# cat /ora/app/oracle/product/10.2.0/crs/log/node1/client/ocrconfig_9680.log
    Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle.  All rights reserved.
    2009-07-05 02:03:01.250: [ OCRCONF][14166224]ocrconfig starts...
    2009-07-05 02:03:01.253: [ OCRCONF][14166224]Upgrading OCR data
    2009-07-05 02:03:01.300: [ OCRCONF][14166224]OCR already in current version.
    2009-07-05 02:03:01.363: [ OCRCONF][14166224]Failed to call clsssinit (21)
    2009-07-05 02:03:01.363: [ OCRCONF][14166224]Failed to make a backup copy of OCR
    2009-07-05 02:03:01.363: [ OCRCONF][14166224]Exiting [status=failed]...
    Can someone please shed some light on what I am missing here?
    thanks
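    Coming back to the original ORACLE_HOME question, a minimal sketch of how the two homes are usually set; the paths are the ones from the question and should be treated as an example of the standard layout, not a definitive answer:
    # While installing Clusterware (CRS):
    export ORACLE_BASE=/u01/app/oracle
    export ORACLE_HOME=/u01/crs/oracle/product/10/app
    # Later, while installing the database software, repoint ORACLE_HOME:
    export ORACLE_HOME=/u01/app/oracle/product/10/db_1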

  • RAC node failed - how do you bring it back up?

    Example: If there are 3 RAC nodes and one becomes unavailable/fails, how do you bring it back up?

    There are typically two basic reasons why a RAC node will go down.
    A cluster issue, causing the node to be evicted. This usually means the node lost access to the cluster storage (e.g. no access to voting disks), or lost access to the Interconnect.
    An o/s issue causing the node to fail. E.g. a kernel panic due to a page swap and memory not syncing, or a soft CPU lockup, etc.
    You need to determine why it went down, and then determine what is needed to enable it to join the cluster successfully again.
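    Once the underlying o/s or storage problem is fixed, a hedged sketch of the usual 10g sequence for bringing the node back (run as root on the failed node; the database name is a placeholder):
    crsctl check crs                    # is the CRS stack running on this node?
    crsctl start crs                    # start it if it is down (a reboot often does this for you)
    crs_stat -t                         # watch the resources (VIP, ASM, instance) come online
    srvctl status database -d MYDB      # confirm the instance has rejoined the database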

  • 2 x T2000 and RAC and disk formatting

    Hi,
    Do I have to format my disks differently on my T2000s simply because I am planning on using Oracle 10g Release 2 RAC on both boxes?

    I am not sure what you are aiming at.
    The Oracle Clusterware needs a cluster filesystem for its voting and registry files, because these files need to be shared by all the nodes you are going to use RAC on (which means the filesystem needs to be able to cope with concurrent access from more than one node).
    Next, the Oracle database files (datafiles, tempfiles, online redo logs, spfile) need to be shared too, which can be done either with a cluster filesystem, with raw disks, or with ASM. The same applies here: these files need to be able to cope with multi-node access.
    So it actually depends on how you have formatted your disks currently, but assuming you mean Solaris, and disks formatted with UFS, then yes, some disks need to be reformatted.

  • 2 node RAC: one 10gR2  node and one 11.2.0.3 node on Solaris 10.

    Is it possible to have a mixed Oracle version 2-node RAC, with a 10gR2 database on one machine and an 11.2.0.3 database installed on the other machine? Has anyone done this?

    Hi,
    if you are talking about setting up a RAC and having a 10g database running on one node and a different 11g database running on the other node, this is possible.
    You will have to use the newest Clusterware/GI (11.2.0.3) and multiple Oracle Homes (one for 10g and one for 11.2.0.3).
    If however you want one database with 2 instances running different versions: then no.
    Regards
    Sebastian

  • RAC node connected to outside DB and passes 2 IP addresses

    Experts,
    We have a 4-node 11.1 RAC on Red Hat.
    As we know, each node has 3 IPs: the public IP, the VIP and the private IP.
    It works well inside our domain network.
    But we get a problem when trying to connect to a client's database on an outside network:
    the connection passes 2 IPs to the client firewall (based on a network monitor).
    The listener log shows that the connection is OK, but the connection is still blocked on the client's firewall side.
    The client's network staff told us that we passed two IP addresses during the connection.
    Could some experts explain why the RAC node's connection request passes two IPs to the client database?
    It was only discovered by the network staff; we could not see the 2-IP information in the listener log file.
    Is it our firewall NAT setting issue, or the client's firewall NAT setting issue?
    Thanks
    Jim
    Edited by: user589812 on Jan 21, 2010 2:25 PM

    Hi Experts,
    The two IP addresses that were being passed were one of the load balancer and one of the DB server. The load balancer was supposed to mask its own IP address and only pass the DB IP address. Somehow, we were sending both IPs to the client database on the outside network, yet it works well on the internal network side. How do we keep the load balancer IP address from reaching the client network firewall on the client database server side?
    I am looking for help!
    Jim

  • Replication between 2 node RAC environment and standalone

    I would like to find out if we can set up replication between a (2-node) RAC environment and a standalone database located at a different location. Any help regarding this would be greatly appreciated.

    Thanks for the reply.
    Consider for a moment that I cannot implement Data Guard/Streams -- because I believe both involve licensing issues -- so the only option left is writing my own code. If I write my own code, what are the prerequisites for this and what do I have to keep in mind (technically) before I start implementing it? Any help or any lead would be greatly appreciated.
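    For a home-grown approach, one common pattern is read-only replication with materialized views over a database link. A minimal sketch; SCOTT/EMP, the link name and the TNS alias are placeholders, and fast refresh requires a materialized view log on the source table:
    -- on the RAC source database
    create materialized view log on emp with primary key;
    -- on the standalone database
    create database link rac_src connect to scott identified by tiger using 'RAC_TNS_ALIAS';
    create materialized view emp_copy
      refresh fast start with sysdate next sysdate + 1/24   -- refresh hourly
      with primary key
      as select * from emp@rac_src;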

  • Log file sequential read and RFS ping/write - among Top 5 events

    I have a situation here to discuss. In a 3-node RAC setup which is a logical standby DB, one node is showing high CPU utilization, around 40~50%. The CPU utilization was less than 20% 10 days back, but about 9 days ago it jumped and has consistently stayed at double figures. I ran AWR reports on all three nodes and found one node with high CPU utilization, which shows the top events below-
    EVENT WAITS TIME(S) AVG WAIT(MS) %TOTAL CALL TIME WAIT CLASS
    CPU time 5,802 34.9
    RFS ping 15 5,118 33,671 30.8 Other
    Log file sequential read 234,831 5,036 21 30.3 System I/O
    SQL*Net more data from client 24,171 1,087 45 6.5 Network
    Db file sequential read 130,939 453 3 2.7 User I/O
    Findings:
    In the AWR report (file attached) for node sipd207, we can see that the "RFS ping" wait event accounts for 30% of the waits and the "log file sequential read" wait event accounts for another 30% of the waits occurring in the database.
    Environment: (Oracle 10.2.0.4.0, O/S - AIX .3)
    1) The other node's AWR shows "log file sync" - is it due to an oversized log buffer?
    2) Network wait events can be reduced by tweaking SDU & TDU values based on the MTU.
    3) Why are the ARCH processes taking so long to archive the filled redo logs; is it an issue with slow disk I/O?
    (A quick check of SQL Apply progress is sketched at the end of this thread.)
    Regards
    WORKLOAD REPOSITORY report
    DB Name DB Id Instance Inst Num Release RAC Host
    XXXPDB 4123595889 XXX2p2 2 10.2.0.4.0 YES sipd207
    Snap Id Snap Time Sessions Curs/Sess
    Begin Snap: 1053 04-Apr-11 18:00:02 59 7.4
    End Snap: 1055 04-Apr-11 20:00:35 56 7.5
    Elapsed: 120.55 (mins)
    DB Time: 233.08 (mins)
    Cache Sizes
    ~~~~~~~~~~~ Begin End
    Buffer Cache: 3,728M 3,728M Std Block Size: 8K
    Shared Pool Size: 4,080M 4,080M Log Buffer: 14,332K
    Load Profile
    ~~~~~~~~~~~~ Per Second Per Transaction
    Redo size: 245,392.33 10,042.66
    Logical reads: 9,080.80 371.63
    Block changes: 1,518.12 62.13
    Physical reads: 7.50 0.31
    Physical writes: 44.00 1.80
    User calls: 36.44 1.49
    Parses: 25.84 1.06
    Hard parses: 0.59 0.02
    Sorts: 12.06 0.49
    Logons: 0.05 0.00
    Executes: 295.91 12.11
    Transactions: 24.43
    % Blocks changed per Read: 16.72 Recursive Call %: 94.18
    Rollback per transaction %: 4.15 Rows per Sort: 53.31
    Instance Efficiency Percentages (Target 100%)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Buffer Nowait %: 99.99 Redo NoWait %: 100.00
    Buffer Hit %: 99.92 In-memory Sort %: 100.00
    Library Hit %: 99.83 Soft Parse %: 97.71
    Execute to Parse %: 91.27 Latch Hit %: 99.79
    Parse CPU to Parse Elapsd %: 15.69 % Non-Parse CPU: 99.95
    Shared Pool Statistics Begin End
    Memory Usage %: 83.60 84.67
    % SQL with executions>1: 97.49 97.19
    % Memory for SQL w/exec>1: 97.10 96.67
    Top 5 Timed Events Avg %Total
    ~~~~~~~~~~~~~~~~~~ wait Call
    Event Waits Time (s) (ms) Time Wait Class
    CPU time 4,503 32.2
    RFS ping 168 4,275 25449 30.6 Other
    log file sequential read 183,537 4,173 23 29.8 System I/O
    SQL*Net more data from client 21,371 1,009 47 7.2 Network
    RFS write 25,438 343 13 2.5 System I/O
    RAC Statistics DB/Inst: UDAS2PDB/udas2p2 Snaps: 1053-1055
    Begin End
    Number of Instances: 3 3
    Global Cache Load Profile
    ~~~~~~~~~~~~~~~~~~~~~~~~~ Per Second Per Transaction
    Global Cache blocks received: 0.78 0.03
    Global Cache blocks served: 1.18 0.05
    GCS/GES messages received: 131.69 5.39
    GCS/GES messages sent: 139.26 5.70
    DBWR Fusion writes: 0.06 0.00
    Estd Interconnect traffic (KB) 68.60
    Global Cache Efficiency Percentages (Target local+remote 100%)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Buffer access - local cache %: 99.91
    Buffer access - remote cache %: 0.01
    Buffer access - disk %: 0.08
    Global Cache and Enqueue Services - Workload Characteristics
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Avg global enqueue get time (ms): 0.5
    Avg global cache cr block receive time (ms): 0.9
    Avg global cache current block receive time (ms): 1.0
    Avg global cache cr block build time (ms): 0.0
    Avg global cache cr block send time (ms): 0.1
    Global cache log flushes for cr blocks served %: 2.9
    Avg global cache cr block flush time (ms): 4.6
    Avg global cache current block pin time (ms): 0.0
    Avg global cache current block send time (ms): 0.1
    Global cache log flushes for current blocks served %: 0.1
    Avg global cache current block flush time (ms): 5.0
    Global Cache and Enqueue Services - Messaging Statistics
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Avg message sent queue time (ms): 0.1
    Avg message sent queue time on ksxp (ms): 0.6
    Avg message received queue time (ms): 0.0
    Avg GCS message process time (ms): 0.0
    Avg GES message process time (ms): 0.1
    % of direct sent messages: 31.57
    % of indirect sent messages: 5.17
    % of flow controlled messages: 63.26
    Time Model Statistics DB/Inst: UDAS2PDB/udas2p2 Snaps: 1053-1055
    -> Total time in database user-calls (DB Time): 13984.6s
    -> Statistics including the word "background" measure background process
    time, and so do not contribute to the DB time statistic
    -> Ordered by % or DB time desc, Statistic name
    Statistic Name Time (s) % of DB Time
    sql execute elapsed time 7,270.6 52.0
    DB CPU 4,503.1 32.2
    parse time elapsed 506.7 3.6
    hard parse elapsed time 497.8 3.6
    sequence load elapsed time 152.4 1.1
    failed parse elapsed time 19.5 .1
    repeated bind elapsed time 3.4 .0
    PL/SQL execution elapsed time 0.7 .0
    hard parse (sharing criteria) elapsed time 0.3 .0
    connection management call elapsed time 0.3 .0
    hard parse (bind mismatch) elapsed time 0.0 .0
    DB time 13,984.6 N/A
    background elapsed time 869.1 N/A
    background cpu time 276.6 N/A
    Wait Class DB/Inst: UDAS2PDB/udas2p2 Snaps: 1053-1055
    -> s - second
    -> cs - centisecond - 100th of a second
    -> ms - millisecond - 1000th of a second
    -> us - microsecond - 1000000th of a second
    -> ordered by wait time desc, waits desc
    Avg
    %Time Total Wait wait Waits
    Wait Class Waits -outs Time (s) (ms) /txn
    System I/O 529,934 .0 4,980 9 3.0
    Other 582,349 37.4 4,611 8 3.3
    Network 279,858 .0 1,009 4 1.6
    User I/O 54,899 .0 317 6 0.3
    Concurrency 136,907 .1 58 0 0.8
    Cluster 60,300 .0 41 1 0.3
    Commit 80 .0 10 130 0.0
    Application 6,707 .0 3 0 0.0
    Configuration 17,528 98.5 1 0 0.1
    Wait Events DB/Inst: UDAS2PDB/udas2p2 Snaps: 1053-1055
    -> s - second
    -> cs - centisecond - 100th of a second
    -> ms - millisecond - 1000th of a second
    -> us - microsecond - 1000000th of a second
    -> ordered by wait time desc, waits desc (idle events last)
    Avg
    %Time Total Wait wait Waits
    Event Waits -outs Time (s) (ms) /txn
    RFS ping 168 .0 4,275 25449 0.0
    log file sequential read 183,537 .0 4,173 23 1.0
    SQL*Net more data from clien 21,371 .0 1,009 47 0.1
    RFS write 25,438 .0 343 13 0.1
    db file sequential read 54,680 .0 316 6 0.3
    DFS lock handle 97,149 .0 214 2 0.5
    log file parallel write 104,808 .0 157 2 0.6
    db file parallel write 143,905 .0 149 1 0.8
    RFS random i/o 25,438 .0 86 3 0.1
    RFS dispatch 25,610 .0 56 2 0.1
    control file sequential read 39,309 .0 55 1 0.2
    row cache lock 130,665 .0 47 0 0.7
    gc current grant 2-way 35,498 .0 23 1 0.2
    wait for scn ack 50,872 .0 20 0 0.3
    enq: WL - contention 6,156 .0 14 2 0.0
    gc cr grant 2-way 16,917 .0 11 1 0.1
    log file sync 80 .0 10 130 0.0
    Log archive I/O 3,986 .0 9 2 0.0
    control file parallel write 3,493 .0 8 2 0.0
    latch free 2,356 .0 6 2 0.0
    ksxr poll remote instances 278,473 49.4 6 0 1.6
    enq: XR - database force log 2,890 .0 4 1 0.0
    enq: TX - index contention 325 .0 3 11 0.0
    buffer busy waits 4,371 .0 3 1 0.0
    gc current block 2-way 3,002 .0 3 1 0.0
    LGWR wait for redo copy 9,601 .2 2 0 0.1
    SQL*Net break/reset to clien 6,438 .0 2 0 0.0
    latch: ges resource hash lis 23,223 .0 2 0 0.1
    enq: WF - contention 32 6.3 2 62 0.0
    enq: FB - contention 660 .0 2 2 0.0
    enq: PS - contention 1,088 .0 2 1 0.0
    library cache lock 869 .0 1 2 0.0
    enq: CF - contention 671 .1 1 2 0.0
    gc current grant busy 1,488 .0 1 1 0.0
    gc current multi block reque 1,072 .0 1 1 0.0
    reliable message 618 .0 1 2 0.0
    CGS wait for IPC msg 62,402 100.0 1 0 0.4
    gc current block 3-way 998 .0 1 1 0.0
    name-service call wait 18 .0 1 57 0.0
    cursor: pin S wait on X 78 100.0 1 11 0.0
    os thread startup 16 .0 1 53 0.0
    enq: RO - fast object reuse 193 .0 1 3 0.0
    IPC send completion sync 652 99.2 1 1 0.0
    local write wait 194 .0 1 3 0.0
    gc cr block 2-way 534 .0 0 1 0.0
    log file switch completion 17 .0 0 20 0.0
    SQL*Net message to client 258,483 .0 0 0 1.5
    undo segment extension 17,282 99.9 0 0 0.1
    gc cr block 3-way 286 .7 0 1 0.0
    enq: TM - contention 76 .0 0 4 0.0
    PX Deq: reap credit 15,246 95.6 0 0 0.1
    kksfbc child completion 5 100.0 0 49 0.0
    enq: TT - contention 141 .0 0 2 0.0
    enq: HW - contention 203 .0 0 1 0.0
    RFS create 2 .0 0 115 0.0
    rdbms ipc reply 339 .0 0 1 0.0
    PX Deq Credit: send blkd 452 20.1 0 0 0.0
    gcs log flush sync 128 32.8 0 2 0.0
    latch: cache buffers chains 128 .0 0 1 0.0
    library cache pin 441 .0 0 0 0.0
    Wait Events DB/Inst: UDAS2PDB/udas2p2 Snaps: 1053-1055
    -> s - second
    -> cs - centisecond - 100th of a second
    -> ms - millisecond - 1000th of a second
    -> us - microsecond - 1000000th of a second
    -> ordered by wait time desc, waits desc (idle events last)

    We only apply on one node in a cluster so I would expect that the node running SQL Apply would have much higher usage and waits. Is this what you are asking?
    Larry
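    Since this is a logical standby, a quick hedged way to see how far SQL Apply has progressed relative to what has been received, using the standard Data Guard view (no claim that this explains the CPU jump by itself):
    SQL> select applied_scn, applied_time, newest_scn, newest_time
         from dba_logstdby_progress;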

  • RMAN, RAC, NFS, and server lock ups

    Good day. My environment is:
    --a 2-node RAC
    --Enterprise Edition 11.2.0.3
    --RHEL 5.1
    The goal is to use RMAN to push backups to a shared NFS mount (on a different server). Both nodes will have access to this location (in the event one node goes down, the other can still run backups). Easy, right?
    Wrong.
    I've tried every NFS mount option in the book. Most work just fine, some don't. When I use the recommended NFS mount options:
    rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp, vers=3,timeo=600, actimeo=0
    or
    rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,forcedirectio, vers=3,suid
    The mount works normally. I can "ls" and "mkdir" and "touch" and "vi" and "cp" files back and forth from the NFS backup location to the RAC node all day long. No problems. However, when I try to do almost anything in RMAN which requires writing to the NFS backup location, such as the command "backup archivelog all delete input;" (or even things as simple as a crosscheck or an RMAN configuration change which writes any changes back to the controlfile autobackup), the node locks up. There are no errors (or if there are, I don't know where to find them), even when I use an RMAN log.
    Just to recap: I run a Crosscheck (or any RMAN process that writes to the NFS backup location), the node will lock up, and I can let it sit for a day, inaccessible, with CRSCTL on the other node saying it's offline, and the node will never come out of a "frozen" state. It cannot be pinged or connected to.
    I think I can safely rule out NFS mount options at this point.
    I understand (after extensive reading of MOS docs and testing) that RAC RMAN can and does suffer from inefficient I/O when writing to an NFS mount. I don't think that's the culprit either. The autobackup ControlFile is not that big and I cannot see how running a simple Crosscheck would lock an entire node.
    I am hoping someone has encountered this in the past and hopefully it's just a simple misconfiguration somewhere.

    My NFS line in /etc/fstab is (these options are for supporting 11.2.0.3, 11.1.0.7, and 10.2.0.4/5 simultaneously): server.domain:/NFS_Export /backup nfs rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600 0 0
    Before you installed GI, did you by chance do a yum update? I've encountered a similar issue which ended up being due to mkinitrd creating a corrupted kernel; mkinitrd is invoked during the GI installation when the ADVM drivers are added and in my case mkinitrd created a new kernel prior to the new kernel being installed. Second to that, make sure you have the matching kernel headers to your kernel version. If they are different then you could probably get away with just creating a new kernel with mkinitrd and relinking GI/RDBMS homes, but be prepared to wipe GI and reinstall.
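    A quick way to check the kernel point raised above (RHEL package names; your channel may ship kernel-devel rather than kernel-headers):
    uname -r                                    # kernel actually running
    rpm -q kernel kernel-headers kernel-devel   # kernel packages installed
    If the running kernel does not match an installed package set, rebuilding the initrd and relinking, as described above, is the likely next step.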

  • What is best use of 1400 gb SGA (2 rac nodes 768gb each)

    Currently using 11.2.0.3.0 on a Unix Sun server with 2 RAC nodes, each with 8 UltraSPARC-T1 CPU cores (came out in 2005), four threads each, so Oracle sees 32 CPUs, and very slow (1.2 GHz). The database is 4 TB in size on a regular SAN (10k-rpm disks).
    8 GB SGA.
    The new boss wants to update the system to the max to get the best performance possible. Money is a concern of course, but the budget is pretty high. Our use case is 12-16 users at the same time, running reports, some small, others very large (returning a single row or 10,000s of rows); reports take 5 seconds to 5 minutes. Our job is to get the fastest system possible. We have a total of 8 licenses available, so we can have 16 cores. We are also getting a 6 TB all-flash SSD array for the database. We can get any CPU we want, but we can't use parallel query servers due to all kinds of issues we have experienced (too many slaves, RAC interconnect saturation, etc. - whack-a-mole). SPARC has too many threads, and without parallel query Oracle runs a query in a single thread.
    we have speced out the following system for each RAC node
    HP ProLiant DL380p Gen8 8 SFF server
    2 Intel Xeon E5-2637v2 3.5GHz/4-core cpus
    768 gb ram
    2 HP 300GB 6G SAS 15K drives for database software
    This will give us a total of 4 Xeon E5-2637v2 CPUs, 16 cores total (0.5 factor for 8 licenses), and 1536 GB of RAM (leaving ~1400 GB for the SGA). This will guarantee an available core for each user. We intend to create a very, very large keep pool, around 300 GB on each node, that will hold all our dimension tables. This, we hope, will reduce reads from the SSD to just data from the fact tables.
    Are we doing massive overkill here? The budget for this was way less than what our boss expected. Will that big an SGA be wasted, or would, say, 256 GB be fine? Or will Oracle take advantage of it and be able to keep most blocks in there?
    Will an SGA that big cause Oracle problems due to the overhead of handling that much RAM?

    Current System:
    ===========
    a. Version : 11.2.0.3
    b. Unix Sun
    c. CPU - 8 cpus with 4 threads => 32 logical cpus or cores
    d. database 4TB
    e. SAN - 10k speed disk drives
    f. 8gb SGA
    g. 1.2 gb ??
    h. Users --> 12-16 concurrent and run reports varying size
    i. reports elasped time 5 sec to 5 mins
    j. cpu license -->8
    Target System
    ===========
    a. Version: 11.2.0.3
    b. HP ProLiant DL380p Gen8 8 SFF server
    c. RAM --> 768 GB
    d. 2 HP 300GB 6G SAS 15K drives for database software
    e. large keep pool -->90 gb to  hold all dimension tables. 
    f.  SSD to just data from fact tables
    g. SGA -->256gb
    A reassessment of the performance issues of the current system appears to be required. A good performance tuning expert is needed to look into the tuning issues of the current application by analyzing AWR performance metrics. If an 8 GB SGA is not enough, the reason is that the queries running in the system do not have good access paths that select less data, so recently used buffers from the different tables involved in a query keep getting flushed out. Until those issues are identified, the performance issues won't go away wherever you go; as table sizes increase in the future, the problem will reappear. If the queries mostly run with FULL scans, then re-platforming to Exadata might be the right decision, as Exadata's Smart Scan and cell offloading features work faster and might be the right direction for the best performance and the best investment for the future. Compression (COMPRESS FOR OLTP) could be another feature to exploit to improve efficiency further by reading fewer blocks in less read time.
    Investment in infrastructure will solve a few issues in the short term, but the long-term issues will arise again.
    Investment in identifying the performance issues of the current system would be the best investment in the current scenario.
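    On the keep-pool idea from the original post, a minimal sketch of how it is usually configured; the size and the table name are placeholders, and with memory_target the keep cache is carved out of the SGA you allocate:
    SQL> alter system set db_keep_cache_size = 300G scope=both sid='*';
    SQL> alter table dim_customer storage (buffer_pool keep);
    SQL> select owner, segment_name from dba_segments where buffer_pool = 'KEEP';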

  • RAC node restart

    Hello everyone,
    I have hit an error: our RAC node auto-restarted with the messages below.
    #/u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/alert_odsdb1.log
    Fri Jun 07 12:23:42 2013
    Thread 1 cannot allocate new log, sequence 58363
    Checkpoint not complete
    Current log# 2 seq# 58362 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
    Current log# 2 seq# 58362 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
    Fri Jun 07 12:23:42 2013
    NOTE: ASMB terminating
    Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID:
    Session ID: 2047 Serial number: 5
    Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID:
    Session ID: 2047 Serial number: 5
    ASMB (ospid: 32641): terminating the instance due to error 15064
    Fri Jun 07 12:23:44 2013
    ORA-1092 : opitsk aborting process
    Fri Jun 07 12:23:46 2013
    ORA-1092 : opitsk aborting process
    Instance terminated by ASMB, pid = 32641
    Fri Jun 07 12:25:02 2013
    Starting ORACLE instance (normal)
    Fri Jun 07 12:25:23 2013
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
    [name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
    Public Interface 'eth0' configured from GPnP for use as a public interface.
    [name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
    Public Interface 'eth0:1' configured from GPnP for use as a public interface.
    [name='eth0:1', type=1, ip=135.33.2.13, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
    Picked latch-free SCN scheme 3
    Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/11.2.0/dbhome_2/dbs/arch
    Autotune of undo retention is turned on.
    LICENSE_MAX_USERS = 0
    SYS auditing is disabled
    Starting up:
    Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
    With the Partitioning, Real Application Clusters, OLAP, Data Mining
    and Real Application Testing options.
    ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
    System name:     Linux
    Node name:     odsdb1
    Release:     2.6.18-308.el5
    Version:     #1 SMP Fri Jan 27 17:17:51 EST 2012
    Machine:     x86_64
    Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/dbhome_2/dbs/initodsdb1.ora
    System parameters with non-default values:
    processes = 4500
    sessions = 6784
    event = ""
    spfile = "+DATA/odsdb/spfileodsdb.ora"
    nls_language = "SIMPLIFIED CHINESE"
    nls_territory = "CHINA"
    memory_target = 170G
    control_files = "+DATA/odsdb/controlfile/current.262.812288837"
    control_files = "+DATA/odsdb/controlfile/current.261.812288837"
    db_block_size = 8192
    compatible = "11.2.0.0.0"
    db_files = 4096
    cluster_database = TRUE
    db_create_file_dest = "+DATA"
    db_recovery_file_dest = ""
    db_recovery_file_dest_size= 38820M
    thread = 1
    undo_tablespace = "UNDOTBS1"
    instance_number = 1
    remote_login_passwordfile= "EXCLUSIVE"
    db_domain = ""
    dispatchers = "(PROTOCOL=TCP) (SERVICE=odsdbXDB)"
    remote_listener = "odsdb-cluster-scan:1521"
    job_queue_processes = 1000
    audit_file_dest = "/u01/app/oracle/admin/odsdb/adump"
    audit_trail = "DB"
    db_name = "odsdb"
    open_cursors = 300
    diagnostic_dest = "/u01/app/oracle"
    Cluster communication is configured to use the following interface(s) for this instance
    169.254.37.103
    cluster interconnect IPC version:Oracle UDP/IP (generic)
    IPC Vendor 1 proto 2
    Fri Jun 07 12:25:33 2013
    PMON started with pid=2, OS id=22959
    Fri Jun 07 12:25:33 2013
    PSP0 started with pid=3, OS id=22962
    Fri Jun 07 12:25:34 2013
    VKTM started with pid=4, OS id=22971 at elevated priority
    VKTM running at (1)millisec precision with DBRM quantum (100)ms
    Fri Jun 07 12:25:34 2013
    GEN0 started with pid=5, OS id=22977
    Fri Jun 07 12:25:34 2013
    DIAG started with pid=6, OS id=22979
    Fri Jun 07 12:25:35 2013
    DBRM started with pid=7, OS id=22981
    Fri Jun 07 12:25:35 2013
    PING started with pid=8, OS id=22983
    Fri Jun 07 12:25:35 2013
    ACMS started with pid=9, OS id=22985
    Fri Jun 07 12:25:35 2013
    DIA0 started with pid=10, OS id=22987
    Fri Jun 07 12:25:35 2013
    LMON started with pid=11, OS id=22989
    Fri Jun 07 12:25:35 2013
    LMD0 started with pid=12, OS id=22991
    * Load Monitor used for high load check
    * New Low - High Load Threshold Range = [61440 - 81920]
    Fri Jun 07 12:25:35 2013
    LMS0 started with pid=13, OS id=22994 at elevated priority
    Fri Jun 07 12:25:35 2013
    LMS1 started with pid=14, OS id=22998 at elevated priority
    Fri Jun 07 12:25:35 2013
    LMS2 started with pid=15, OS id=23002 at elevated priority
    Fri Jun 07 12:25:35 2013
    LMS3 started with pid=16, OS id=23006 at elevated priority
    Fri Jun 07 12:25:35 2013
    RMS0 started with pid=17, OS id=23010
    Fri Jun 07 12:25:35 2013
    LMHB started with pid=18, OS id=23013
    Fri Jun 07 12:25:35 2013
    MMAN started with pid=19, OS id=23015
    Fri Jun 07 12:25:35 2013
    DBW0 started with pid=20, OS id=23017
    Fri Jun 07 12:25:35 2013
    DBW1 started with pid=21, OS id=23019
    Fri Jun 07 12:25:35 2013
    DBW2 started with pid=22, OS id=23022
    Fri Jun 07 12:25:35 2013
    DBW3 started with pid=23, OS id=23024
    Fri Jun 07 12:25:35 2013
    DBW4 started with pid=24, OS id=23026
    Fri Jun 07 12:25:35 2013
    DBW5 started with pid=25, OS id=23028
    Fri Jun 07 12:25:35 2013
    DBW6 started with pid=26, OS id=23031
    Fri Jun 07 12:25:35 2013
    DBW7 started with pid=27, OS id=23033
    Fri Jun 07 12:25:35 2013
    LGWR started with pid=28, OS id=23035
    Fri Jun 07 12:25:35 2013
    CKPT started with pid=29, OS id=23037
    Fri Jun 07 12:25:35 2013
    SMON started with pid=30, OS id=23039
    Fri Jun 07 12:25:35 2013
    RECO started with pid=31, OS id=23041
    Fri Jun 07 12:25:35 2013
    RBAL started with pid=32, OS id=23043
    Fri Jun 07 12:25:35 2013
    ASMB started with pid=33, OS id=23045
    Fri Jun 07 12:25:35 2013
    MMON started with pid=34, OS id=23048
    Fri Jun 07 12:25:35 2013
    MMNL started with pid=35, OS id=23052
    Fri Jun 07 12:25:35 2013
    starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
    NOTE: initiating MARK startup
    starting up 1 shared server(s) ...
    Starting background process MARK
    Fri Jun 07 12:25:35 2013
    MARK started with pid=37, OS id=23056
    NOTE: MARK has subscribed
    lmon registered with NM - instance number 1 (internal mem no 0)
    Reconfiguration started (old inc 0, new inc 119)
    List of instances:
    1 2 (myinst: 1)
    Global Resource Directory frozen
    * allocate domain 0, invalid = TRUE
    Communication channels reestablished
    * domain 0 valid according to instance 2
    * domain 0 valid = 1 according to instance 2
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Set master node info
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Submitted all GCS remote-cache requests
    Fix write in gcs resources
    Reconfiguration started (old inc 119, new inc 121)
    List of instances:
    1 2 (myinst: 1)
    Nested reconfiguration detected.
    Global Resource Directory frozen
    Communication channels reestablished
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Set master node info
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Fri Jun 07 12:25:45 2013
    Submitted all GCS remote-cache requests
    Fri Jun 07 12:26:08 2013
    Fix write in gcs resources
    Reconfiguration complete
    Fri Jun 07 12:26:10 2013
    LCK0 started with pid=40, OS id=23632
    Fri Jun 07 12:26:10 2013
    Starting background process RSMN
    Fri Jun 07 12:26:10 2013
    RSMN started with pid=41, OS id=23646
    ORACLE_BASE not set in environment. It is recommended
    that ORACLE_BASE be set in the environment
    Reusing ORACLE_BASE from an earlier startup = /u01/app/oracle
    Fri Jun 07 12:26:11 2013
    ALTER SYSTEM SET local_listener=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=135.33.2.13)(PORT=1521))))' SCOPE=MEMORY SID='odsdb1';
    ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
    Fri Jun 07 12:26:11 2013
    NOTE: Loaded library: System
    Fri Jun 07 12:26:11 2013
    SUCCESS: diskgroup DATA was mounted
    Fri Jun 07 12:26:11 2013
    NOTE: dependency between database odsdb and diskgroup resource ora.DATA.dg is established
    Fri Jun 07 12:26:16 2013
    Successful mount of redo thread 1, with mount id 3452000551
    Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
    Lost write protection disabled
    Completed: ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
    ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
    Picked broadcast on commit scheme to generate SCNs
    Thread 1 advanced to log sequence 58364 (thread open)
    Thread 1 opened at log sequence 58364
    Current log# 2 seq# 58364 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
    Current log# 2 seq# 58364 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
    Successful open of redo thread 1
    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
    Fri Jun 07 12:26:21 2013
    SMON: enabling cache recovery
    Fri Jun 07 12:26:23 2013
    minact-scn: Inst 1 is a slave inc#:121 mmon proc-id:23048 status:0x2
    minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
    Fri Jun 07 12:26:34 2013
    [23651] Successfully onlined Undo Tablespace 2.
    Undo initialization finished serial:0 start:2061372614 end:2061384964 diff:12350 (123 seconds)
    Verifying file header compatibility for 11g tablespace encryption..
    Verifying 11g file header compatibility for tablespace encryption completed
    Fri Jun 07 12:26:34 2013
    SMON: enabling tx recovery
    Database Characterset is ZHS16GBK
    No Resource Manager plan active
    Starting background process GTX0
    Fri Jun 07 12:26:35 2013
    GTX0 started with pid=45, OS id=23931
    Starting background process RCBG
    Fri Jun 07 12:26:35 2013
    RCBG started with pid=46, OS id=23933
    replication_dependency_tracking turned off (no async multimaster replication found)
    Starting background process QMNC
    Fri Jun 07 12:26:35 2013
    QMNC started with pid=48, OS id=23940
    Completed: ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
    Fri Jun 07 12:26:38 2013
    Starting background process CJQ0
    Fri Jun 07 12:26:38 2013
    CJQ0 started with pid=55, OS id=23977
    Fri Jun 07 12:27:56 2013
    Thread 1 advanced to log sequence 58365 (LGWR switch)
    Current log# 1 seq# 58365 mem# 0: +DATA/odsdb/onlinelog/group_1.263.812288839
    Current log# 1 seq# 58365 mem# 1: +DATA/odsdb/onlinelog/group_1.264.812288839
    Fri Jun 07 12:28:18 2013
    Starting background process SMCO
    Fri Jun 07 12:28:18 2013
    SMCO started with pid=70, OS id=25166
    Fri Jun 07 12:29:01 2013
    Thread 1 cannot allocate new log, sequence 58366
    Trace file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc
    Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
    With the Partitioning, Real Application Clusters, OLAP, Data Mining
    and Real Application Testing options
    ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
    System name: Linux
    Node name: odsdb1
    Release: 2.6.18-308.el5
    Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
    Machine: x86_64
    Instance name: odsdb1
    Redo thread mounted by this instance: 0 <none>
    Oracle process number: 33
    Unix process pid: 32641, image: oracle@odsdb1 (ASMB)
    *** 2013-05-14 15:37:08.705
    *** SESSION ID:(3499.1) 2013-05-14 15:37:08.705
    *** CLIENT ID:() 2013-05-14 15:37:08.705
    *** SERVICE NAME:() 2013-05-14 15:37:08.705
    *** MODULE NAME:() 2013-05-14 15:37:08.705
    *** ACTION NAME:() 2013-05-14 15:37:08.705
    NOTE: initiating MARK startup
    *** 2013-05-14 15:37:16.835
    instance health monitoring reports instance shutting down
    *** 2013-06-07 12:23:42.700
    NOTE: ASMB terminating
    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID:
    Session ID: 2047 Serial number: 5
    error 15064 detected in background process
    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID:
    Session ID: 2047 Serial number: 5
    kjzduptcctx: Notifying DIAG for crash event
    ----- Abridged Call Stack Trace -----
    ksedsts()+461<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+53<-ksuitm()+1332<-ksbrdp()+3344<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
    ----- End of Abridged Call Stack Trace -----
    *** 2013-06-07 12:23:42.783
    ASMB (ospid: 32641): terminating the instance due to error 15064
    /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
    NOTE: ASMB process exiting, either shutdown is in progress
    NOTE: or foreground connected to ASMB was killed.
    Fri Jun 07 12:23:42 2013
    NOTE: client exited [14808]
    Fri Jun 07 12:23:44 2013
    Received an instance abort message from instance 2
    Please check instance 2 alert and LMON trace files for detail.
    Fri Jun 07 12:23:44 2013
    Received an instance abort message from instance 2
    Please check instance 2 alert and LMON trace files for detail.
    LMD0 (ospid: 31201): terminating the instance due to error 481
    Instance terminated by LMD0, pid = 31201
    Fri Jun 07 12:24:30 2013
    * instance_number obtained from CSS = 1, checking for the existence of node 0...
    * node 0 does not exist. instance_number = 1
    Starting ORACLE instance (normal)
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
    [name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
    Public Interface 'eth0' configured from GPnP for use as a public interface.
    [name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
    Picked latch-free SCN scheme 3
    Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/11.2.0.2/grid/dbs/arch
    Autotune of undo retention is turned on.
    LICENSE_MAX_USERS = 0
    [grid@odsdb1 cssd]$ file core.30481
    core.30481: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'ocssd.bin'
    [grid@odsdb1 cssd]$ gdb
    gdb gdbserver gdbtui
    [grid@odsdb1 cssd]$ gdb ocssd.bin core.30481
    GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-42.el5)
    Copyright (C) 2009 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /u01/app/11.2.0.2/grid/bin/ocssd.bin...(no debugging symbols found)...done.
    [New Thread 30486]
    [New Thread 30530]
    [New Thread 30526]
    [New Thread 30525]
    [New Thread 30523]
    [New Thread 30522]
    [New Thread 30521]
    [New Thread 30520]
    [New Thread 30519]
    [New Thread 30504]
    [New Thread 30503]
    [New Thread 30495]
    [New Thread 30485]
    [New Thread 30484]
    [New Thread 30483]
    [New Thread 30481]
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libhasgen11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libhasgen11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libocr11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocr11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrb11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrb11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrutl11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrutl11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxn2.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxn2.so
    Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libdl.so.2
    Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libm.so.6
    Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
    [Thread debugging using libthread_db enabled]
    Loaded symbols for /lib64/libpthread.so.0
    Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libnsl.so.1
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libcell11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libcell11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxp11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxp11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libnnz11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnnz11.so
    Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libc.so.6
    Reading symbols from /usr/lib64/libaio.so.1...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib64/libaio.so.1
    Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib64/ld-linux-x86-64.so.2
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libnque11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnque11.so
    Reading symbols from /opt/oracle/extapi/64/asm/orcl/1/libasm.so...(no debugging symbols found)...done.
    Loaded symbols for /opt/oracle/extapi/64/asm/orcl/1/libasm.so
    warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff505fd000
    Core was generated by `/u01/app/11.2.0.2/grid/bin/ocssd.bin '.
    Program terminated with signal 6, Aborted.
    #0 0x000000369ea30265 in raise () from /lib64/libc.so.6
    (gdb) where
    #0 0x000000369ea30265 in raise () from /lib64/libc.so.6
    #1 0x000000369ea31d10 in abort () from /lib64/libc.so.6
    #2 0x00002afc67f9aeda in scls_abort (flags=0) at scls.c:7088
    #3 0x000000000040babd in clssscExit (thrd=0x10d325a0, status=clssscreasonSHUTNORM) at clsssc.c:2155
    #4 0x0000000000446221 in clssgmClientShutdown (thrd=0x10d325a0, cmInfo=0x10b40090) at clssgmc.c:6415
    #5 0x0000000000436707 in clssgmProcClientReqs (thrd=0x10d325a0, clctx=0x10b40630) at clssgmc.c:704
    #6 0x0000000000436405 in clssgmclientlsnr (thrd=0x10d325a0) at clssgmc.c:644
    #7 0x000000000040ac2f in clssscthrdmain (thrd=0x10d325a0) at clsssc.c:1716
    #8 0x000000369fa0677d in start_thread () from /lib64/libpthread.so.0
    #9 0x000000369ead49ad in clone () from /lib64/libc.so.6
    (gdb)
    2013-06-07 12:19:37.377: [    CSSD][1085888832]clssscSelect: cookie accept request 0x10b40630
    2013-06-07 12:19:37.377: [    CSSD][1085888832]clssgmAllocProc: (0x2aaab0133ea0) allocated
    2013-06-07 12:19:37.379: [    CSSD][1085888832]clssgmClientConnectMsg: properties of cmProc 0x2aaab0133ea0 - 1,2,3,4,5
    2013-06-07 12:19:37.379: [    CSSD][1085888832]clssgmClientConnectMsg: Connect from con(0x6ae44fa) proc(0x2aaab0133ea0) pid(14139/14139) version 11:2:1:4, properties: 1,2,3,4,5
    2013-06-07 12:19:37.379: [    CSSD][1085888832]clssgmClientConnectMsg: msg flags 0x0000
    2013-06-07 12:19:37.384: [    CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
    2013-06-07 12:19:37.384: [    CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
    2013-06-07 12:19:37.384: [    CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(1/0x2aaab010c5c0)
    2013-06-07 12:19:37.385: [    CSSD][1085888832]clssgmRegisterShared: grp DBODSDB, mbr 0, type 1
    2013-06-07 12:19:37.385: [    CSSD][1085888832]clssgmQueueShare: (0x2aaab0085790) target global grock DBODSDB member 0 type 1 queued from client (0x2aaab010c5c0), global grock DBODSDB, refcount 23
    2013-06-07 12:19:37.385: [    CSSD][1085888832]clssgmRegisterShared: global grock DBODSDB member 0 share type 1, refcount 23
    2013-06-07 12:19:37.391: [    CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
    2013-06-07 12:19:37.391: [    CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
    2013-06-07 12:19:37.391: [    CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(2/0x2aaab0061f10)
    What is the problem?
    Edited by: 徐振富 on 2013-6-7 6:38 PM
    Edited by: 徐振富 on 2013-6-7 6:45 PM

    Is your ASM instance up?
    If not, try bringing the ASM instance up just by itself and see if it throws any errors.
    Post the status of: crsctl check cluster -all
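    A hedged sketch of the checks being asked for, run as the grid owner on the affected node (11.2 commands):
    crsctl check cluster -all     # CSS/CRS/EVM state on every node
    srvctl status asm             # is the ASM instance running on this node?
    asmcmd lsdg                   # are the diskgroups (e.g. DATA) mounted?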

  • Private Interconnect: Should any nodes other than RAC nodes have one?

    The contractors that set up our four-node production 10g RAC (and a standalone development server) also assigned private interconnect addresses to 2 Apache/APEX servers and a standalone development database server.
    There are service names in the tnsnames.ora on all servers in our infrastructure referencing these private interconnects - even the non-RAC member servers. The NICs on these servers are not bound for failover with the NICs bound to the public/VIP addresses. These NICs are isolated on their own switch.
    Could this configuration be related to lost heartbeats or voting disk errors? We experience RAC node evictions and even arbitrary bounces (reboots!) of all the RAC nodes.

    I do not have access to the contractors. . . . I can only look at what they have left behind and try to figure out their intention. . .
    I am reading the Ault/Tumma book Oracle 10g Grid and Real Application Clusters, looking through our own settings and config files, and learning srvctl and crsctl commands from their examples. I am also googling and searching OTN through the library full of documentation. . .
    I still have yet to figure out whether the private interconnect spoken about so frequently in the cluster configuration documents is the binding to the set of node VIP address specifications in the tnsnames.ora (bound to the first eth adapter along with the public IP addresses for the nodes), or the binding on the second eth adapter to the node.prv addresses, which are not found in the local pfile, the tnsnames.ora, or the listener.ora (but are found at the operating system level in ifconfig). If the node.prv addresses are not the private interconnect, then can anyone tell me what they are for?
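    One way to see exactly which network the clusterware itself treats as public and which as the private interconnect, independent of anything in tnsnames.ora, is oifcfg (the output below is illustrative, not from this cluster):
    $ oifcfg getif
    eth0  10.10.1.0     global  public
    eth1  192.168.10.0  global  cluster_interconnect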
