Unable to bring up crs on rac node 2 after server reboot

Hi,
We have a 2-node RAC architecture. We are only able to bring up node 1 in the cluster; node 2 is failing. Here is the sequence of events:
1. After the server reboot, the CRS resources on node 2 weren't starting, apart from OHAS.
2. We stopped CRS on both nodes and tried bringing up CRS on node 2 first, which succeeded. But then node 1 wouldn't come up.
3. We again brought down CRS on both nodes and brought up CRS on node 1, which succeeded, but ASM wasn't showing the disk groups. So we changed the pfile to set asm_diskstring to /dev/oracleasm/disks instead of ORCL*, and lsdg in ASM works now. We then started all the instances from node 1.
After all this, CRS on node 2 still wasn't starting. In the alert log I saw "CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds;". But we were able to query the voting disks initially. What has gone wrong now?
./crsctl status res -t -init
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
Cluster Resources
ora.asm
      1        OFFLINE OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       kusmnd0r
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       kusmnd0r
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       kusmnd0r
ora.gpnpd
      1        ONLINE  ONLINE       kusmnd0r
ora.mdnsd
      1        ONLINE  ONLINE       kusmnd0r
This is the history of activities. Could someone please shed some light on this?
Thanks,
Anirban.
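
One way to read the listing above is to pick out the resources whose TARGET is ONLINE but whose STATE is not. The following is a minimal sketch, assuming the output format shown in this post; `stuck_resources` is a hypothetical helper, not part of any Oracle tool.

```python
# Hypothetical helper: list resources from "crsctl status res -t -init"
# whose TARGET is ONLINE but whose STATE is something else.
SAMPLE = """\
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
Cluster Resources
ora.asm
      1        OFFLINE OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       kusmnd0r
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       kusmnd0r
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       kusmnd0r
ora.gpnpd
      1        ONLINE  ONLINE       kusmnd0r
ora.mdnsd
      1        ONLINE  ONLINE       kusmnd0r
"""


def stuck_resources(text):
    """Return resource names with TARGET ONLINE and STATE not ONLINE."""
    stuck = []
    name = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("ora."):
            # A resource name line; the status line follows it.
            name = fields[0]
        elif name and fields[0].isdigit() and len(fields) >= 3:
            target, state = fields[1], fields[2]
            if target == "ONLINE" and state != "ONLINE":
                stuck.append(name)
    return stuck


if __name__ == "__main__":
    for resource in stuck_resources(SAMPLE):
        print(resource)
```

On this listing the sketch reports ora.cluster_interconnect.haip, ora.crsd, ora.cssd, ora.ctssd and ora.evmd. Of these, only ora.cssd shows a state detail (STARTING), which is consistent with the CRS-1714 voting file message quoted above.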

It is on a raw device.
Healthy node:
ls -ltrh /dev/vote*
brw-rw---- 1 crsdwqa dbadwqa 120, 1057 Nov  6 11:32 /dev/vote3
brw-rw---- 1 crsdwqa dbadwqa 120, 1025 Nov  6 11:32 /dev/vote1
brw-rw---- 1 crsdwqa dbadwqa 120, 1041 Nov  6 11:32 /dev/vote2
Affected node:
ls -ltrh /dev/vote*
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1025 Nov  4 12:06 /dev/vote1
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1041 Nov  4 12:06 /dev/vote2
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1057 Nov  5 04:42 /dev/vote3
Regards,
Anirban.
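
To make the comparison between the two listings above explicit, here is a small sketch that parses both `ls -l` outputs and reports which fields differ per device. The function names are hypothetical, and the parsing assumes the block-device `ls -l` layout shown in this post.

```python
# Listings copied from the post; timestamps are ignored in the comparison.
HEALTHY = """\
brw-rw---- 1 crsdwqa dbadwqa 120, 1057 Nov  6 11:32 /dev/vote3
brw-rw---- 1 crsdwqa dbadwqa 120, 1025 Nov  6 11:32 /dev/vote1
brw-rw---- 1 crsdwqa dbadwqa 120, 1041 Nov  6 11:32 /dev/vote2
"""

AFFECTED = """\
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1025 Nov  4 12:06 /dev/vote1
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1041 Nov  4 12:06 /dev/vote2
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1057 Nov  5 04:42 /dev/vote3
"""

FIELDS = ("mode", "owner", "group", "major", "minor")


def parse(listing):
    """Map device path -> (mode, owner, group, major, minor)."""
    devices = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) < 10:
            continue  # not a block-device line in the expected layout
        mode, _links, owner, group, major, minor = parts[:6]
        devices[parts[-1]] = (mode, owner, group, major.rstrip(","), minor)
    return devices


def differences(healthy, affected):
    """Return (path, field, healthy_value, affected_value) for each mismatch."""
    diffs = []
    for path in sorted(healthy):
        other = affected.get(path)
        if other is None:
            diffs.append((path, "missing", "present", "absent"))
            continue
        for field, h_val, a_val in zip(FIELDS, healthy[path], other):
            if h_val != a_val:
                diffs.append((path, field, h_val, a_val))
    return diffs


if __name__ == "__main__":
    for diff in differences(parse(HEALTHY), parse(AFFECTED)):
        print(diff)
```

For these listings the only mismatch, apart from the timestamps, is the mode: brw-rw---- on the healthy node versus brw-rw-r-- on the affected node. Owner, group, and major/minor numbers match. Whether that difference is related to CRS-1714 is not established in this thread.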

Similar Messages

  • Unable to bring database online on 2nd node in FailSafe and MSCS config

    I have a two node cluster running Windows 2003 Server, MSCS, and Oracle Failsafe. I have been unable to bring the database online on Node2. All resources in Cluster Administrator fail over fine and come online except for the Database resource itself. The shared disks, listener, etc, are all online. All resources come online on Node1. When I failover to Node2, the Database Resource fails and posts this error in the Event Viewer:
    Oracle Fail Safe encountered an error starting resource DISS.
    ORA-01100: database already mounted
    The database is not mounted and the database Windows service is not running (it is set to manual). I can manually start the database through Windows Services and access it fine, but I am still unable to bring it online via Cluster Administrator or Oracle Fail Safe Manager.
    Any help would be greatly appreciated.
    Thanks in advance!

    The only message in the event viewer is the ORA-01100 I originally posted.
    ORACLE_HOME is set correctly.
    Here are the contents of the alert.log:
    Fri Sep 04 13:06:21 2009
    Starting ORACLE instance (normal)
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Picked latch-free SCN scheme 2
    Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST
    Autotune of undo retention is turned on.
    IMODE=BR
    ILAT =18
    LICENSE_MAX_USERS = 0
    SYS auditing is disabled
    Starting up ORACLE RDBMS Version: 11.1.0.7.0.
    Using parameter settings in server-side spfile C:\ORACLE\PRODUCT\11.1.0\DB_1\DATABASE\SPFILEDISS.ORA
    System parameters with non-default values:
    processes = 150
    memory_target = 820M
    control_files = "O:\ORACLE\ORADATA\DISS\CONTROL01.CTL"
    control_files = "O:\ORACLE\ORADATA\DISS\CONTROL02.CTL"
    control_files = "O:\ORACLE\ORADATA\DISS\CONTROL03.CTL"
    db_block_size = 8192
    compatible = "11.1.0.0.0"
    db_recovery_file_dest = "C:\Oracle\flash_recovery_area"
    db_recovery_file_dest_size= 2G
    undo_tablespace = "UNDOTBS1"
    remote_login_passwordfile= "EXCLUSIVE"
    db_domain = ""
    dispatchers = "(PROTOCOL=TCP) (SERVICE=DISSXDB)"
    audit_file_dest = "C:\ORACLE\ADMIN\DISS\ADUMP"
    audit_trail = "DB"
    db_name = "DISS"
    open_cursors = 300
    diagnostic_dest = "C:\ORACLE"
    Fri Sep 04 13:06:22 2009
    PMON started with pid=2, OS id=2892
    Fri Sep 04 13:06:22 2009
    VKTM started with pid=3, OS id=3932 at elevated priority
    VKTM running at (20)ms precision
    Fri Sep 04 13:06:22 2009
    DIAG started with pid=4, OS id=1280
    Fri Sep 04 13:06:22 2009
    DBRM started with pid=5, OS id=1084
    Fri Sep 04 13:06:22 2009
    PSP0 started with pid=6, OS id=648
    Fri Sep 04 13:06:22 2009
    DIA0 started with pid=7, OS id=1792
    Fri Sep 04 13:06:22 2009
    MMAN started with pid=8, OS id=3172
    Fri Sep 04 13:06:22 2009
    DBW0 started with pid=9, OS id=3740
    Fri Sep 04 13:06:22 2009
    LGWR started with pid=10, OS id=524
    Fri Sep 04 13:06:22 2009
    CKPT started with pid=11, OS id=2020
    Fri Sep 04 13:06:22 2009
    SMON started with pid=12, OS id=1068
    Fri Sep 04 13:06:22 2009
    RECO started with pid=13, OS id=3636
    Fri Sep 04 13:06:22 2009
    MMON started with pid=14, OS id=3512
    starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
    starting up 1 shared server(s) ...
    ORACLE_BASE from environment = C:\Oracle
    Fri Sep 04 13:06:22 2009
    alter database mount exclusive
    Fri Sep 04 13:06:22 2009
    MMNL started with pid=15, OS id=2996
    Setting recovery target incarnation to 2
    Successful mount of redo thread 1, with mount id 739001231
    Database mounted in Exclusive Mode
    Lost write protection disabled
    Completed: alter database mount exclusive
    alter database open
    Beginning crash recovery of 1 threads
    Started redo scan
    Completed redo scan
    295 redo blocks read, 99 data blocks need recovery
    Started redo application at
    Thread 1: logseq 134, block 176
    Recovery of Online Redo Log: Thread 1 Group 2 Seq 134 Reading mem 0
    Mem# 0: O:\ORACLE\ORADATA\DISS\REDO02.LOG
    Completed redo application of 0.13MB
    Completed crash recovery at
    Thread 1: logseq 134, block 471, scn 3797812
    99 data blocks read, 99 data blocks written, 295 redo blocks read
    Thread 1 advanced to log sequence 135 (thread open)
    Thread 1 opened at log sequence 135
    Current log# 3 seq# 135 mem# 0: O:\ORACLE\ORADATA\DISS\REDO03.LOG
    Successful open of redo thread 1
    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
    SMON: enabling cache recovery
    Successfully onlined Undo Tablespace 2.
    Verifying file header compatibility for 11g tablespace encryption..
    Verifying 11g file header compatibility for tablespace encryption completed
    SMON: enabling tx recovery
    Database Characterset is WE8MSWIN1252
    Opening with internal Resource Manager plan
    Starting background process FBDA
    Fri Sep 04 13:06:29 2009
    FBDA started with pid=19, OS id=4004
    replication_dependency_tracking turned off (no async multimaster replication found)
    Starting background process QMNC
    Fri Sep 04 13:06:29 2009
    QMNC started with pid=20, OS id=1132
    db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
    user-specified limit on the amount of space that will be used by this
    database for recovery-related files, and does not reflect the amount of
    space available in the underlying filesystem or ASM diskgroup.
    Fri Sep 04 13:06:35 2009
    Starting background process CJQ0
    Fri Sep 04 13:06:35 2009
    CJQ0 started with pid=21, OS id=2864
    Fri Sep 04 13:06:35 2009
    Completed: alter database open
    Fri Sep 04 13:06:42 2009
    Starting ORACLE instance (restrict)
    alter database "DISS" mount exclusive
    ORA-1100 signalled during: alter database "DISS" mount exclusive...
    Stopping background process FBDA
    Shutting down instance: further logons disabled
    Stopping background process QMNC
    Stopping background process CJQ0
    Stopping background process MMNL
    Stopping background process MMON
    Shutting down instance (immediate)
    License high water mark = 3
    Waiting for dispatcher 'D000' to shutdown
    All dispatchers and shared servers shutdown
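
A long alert log like the one above is easier to read as a short timeline. The sketch below is one way to pull out instance startups, completed ALTER DATABASE statements, and ORA- errors together with the most recent timestamp. The excerpt is abridged from the log in this post; the `timeline` helper and its marker list are assumptions made for illustration.

```python
import re

# Abridged excerpt from the alert log quoted in the post.
EXCERPT = """\
Fri Sep 04 13:06:21 2009
Starting ORACLE instance (normal)
Fri Sep 04 13:06:35 2009
Completed: alter database open
Fri Sep 04 13:06:42 2009
Starting ORACLE instance (restrict)
alter database "DISS" mount exclusive
ORA-1100 signalled during: alter database "DISS" mount exclusive...
"""

# Matches alert log timestamp lines such as "Fri Sep 04 13:06:21 2009".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4}$")

# Substrings treated as interesting events (an illustrative choice).
MARKERS = ("Starting ORACLE instance", "Completed: alter database", "ORA-")


def timeline(text):
    """Return (timestamp, line) pairs for lines containing a marker."""
    events = []
    stamp = None
    for raw in text.splitlines():
        line = raw.strip()
        if TIMESTAMP.match(line):
            stamp = line
        elif any(marker in line for marker in MARKERS):
            events.append((stamp, line))
    return events


if __name__ == "__main__":
    for stamp, line in timeline(EXCERPT):
        print(stamp, "|", line)
```

Read this way, the log shows the instance being started and opened at 13:06:21 to 13:06:35, followed by a second start request in restricted mode at 13:06:42 that fails with ORA-1100. The log does not show what issued each of the two startups.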

  • Node Manager not starting services after server reboot

    We have WLS 10.1.3.5 with IDM/OAM (two domains on the same server, as it is a TEST environment) on Oracle Linux 5 (64-bit). Whenever the server reboots, Node Manager does not start the managed servers.
    I have set CrashRecoveryEnabled=true.
    Here is nodemanager.properties:
    [oracle@oam nodemanager]$ cat nodemanager.properties
    #Changed NM Listen Port
    #Tue Mar 05 03:12:20 IST 2013
    DomainsFile=/u01/Oracle/Middleware/wlserver_10.3/common/nodemanager/nodemanager.domains
    LogLimit=0
    DomainsDirRemoteSharingEnabled=false
    PropertiesVersion=10.3
    AuthenticationEnabled=true
    NodeManagerHome=/u01/Oracle/Middleware/wlserver_10.3/common/nodemanager
    javaHome=/u01/Oracle/Middleware/jdk160_24
    JavaHome=/u01/Oracle/Middleware/jdk160_24/jre
    LogLevel=INFO
    DomainsFileEnabled=true
    StartScriptName=startWebLogic.sh
    ListenAddress=
    NativeVersionEnabled=true
    ListenPort=5556
    LogToStderr=true
    SecureListener=false
    LogCount=1
    StopScriptEnabled=false
    DomainRegistrationEnabled=false
    QuitEnabled=false
    LogAppend=true
    StateCheckInterval=500
    CrashRecoveryEnabled=true
    StartScriptEnabled=true
    LogFile=/u01/Oracle/Middleware/wlserver_10.3/common/nodemanager/nodemanager.log
    LogFormatter=weblogic.nodemanager.server.LogFormatter
    ListenBacklog=50
    Here is my domain information on the server:
    =========================
    [oracle@oam nodemanager]$ cat nodemanager.domains
    #Domains and directories created by Configuration Wizard
    #Tue Mar 05 05:24:37 IST 2013
    OAMDomain=/u01/Oracle/Middleware/user_projects/domains/OAMDomain
    IDMDomain=/u01/Oracle/Middleware/user_projects/domains/IDMDomain

    Hi,
    The entries in nodemanager.properties look fine, but how are you testing this functionality?
    If you stop the servers monitored by Node Manager, stop Node Manager, and then reboot the physical server, then on startup Node Manager (if it is configured as a daemon) will start its own process but will do nothing for the managed server processes.
    To test "CrashRecoveryEnabled=true", kill the managed server PID and then stop or kill the Node Manager process. Then reboot the physical server; if Node Manager is configured as a daemon, the reboot will start it. Once Node Manager is up, the first thing it checks is the servers it was monitoring earlier. If it finds the "pid" and "lck" files under $WL_HOME/servers/<server>/data/nodemanager, it checks whether that PID exists, and if it does not, it restarts the WebLogic Server instance.
    This has worked for me...!
    Thanks,
    Ranjan
    http://www.middlewaresupport.wordpress.com
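
The crash recovery check described in the reply above can be sketched roughly as follows. This only illustrates the described logic and is not Node Manager's actual implementation; the file names `<server>.pid` and `<server>.lck` and the function names are assumptions, and the state directory is passed in rather than hard-coded.

```python
import os
from pathlib import Path


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def needs_restart(state_dir, server):
    """Sketch of the decision: leftover pid and lck files, but no live process."""
    pid_file = Path(state_dir) / f"{server}.pid"  # assumed file name
    lck_file = Path(state_dir) / f"{server}.lck"  # assumed file name
    if not (pid_file.exists() and lck_file.exists()):
        return False  # server was stopped cleanly, nothing to recover
    try:
        pid = int(pid_file.read_text().strip())
    except ValueError:
        return False  # unreadable pid file; this sketch does not guess
    return not pid_is_running(pid)


if __name__ == "__main__":
    # Hypothetical directory and server name, for illustration only.
    print(needs_restart("/tmp/nodemanager-state", "oam_server1"))
```

This matches the testing advice in the reply: a clean shutdown removes the state that would trigger recovery, so the restart only happens when the server process disappeared without Node Manager stopping it.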

  • Unable to bring up ASM on 2nd node of a 2-node Cluster

    Having a very weird problem on a 2-node cluster. I can only bring up one ASM instance at a time. If I bring up the second, it hangs. This is what the second (hung) instance puts in the alert log:
    Starting ORACLE instance (normal)
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Picked latch-free SCN scheme 3
    Using LOG_ARCHIVE_DEST_1 parameter default value as /ORAUTL/oraasm/product/ASM/dbs/arch
    Autotune of undo retention is turned off.
    LICENSE_MAX_USERS = 0
    SYS auditing is disabled
    ksdpec: called for event 13740 prior to event group initialization
    Starting up ORACLE RDBMS Version: 10.2.0.3.0.
    System parameters with non-default values:
    large_pool_size = 12582912
    instance_type = asm
    cluster_interconnects = 192.168.0.12
    cluster_database = TRUE
    instance_number = 2
    remote_login_passwordfile= EXCLUSIVE
    background_dump_dest = /ORAUTL/oraasm/admin/+ASM2/bdump
    user_dump_dest = /ORAUTL/oraasm/admin/+ASM2/udump
    core_dump_dest = /ORAUTL/oraasm/admin/+ASM2/cdump
    pga_aggregate_target = 0
    Cluster communication is configured to use the following interface(s) for this instance
    192.168.0.12
    Fri Nov 21 21:10:48 2008
    cluster interconnect IPC version:Oracle UDP/IP (generic)
    IPC Vendor 1 proto 2
    PMON started with pid=2, OS id=5428
    DIAG started with pid=3, OS id=5430
    PSP0 started with pid=4, OS id=5432
    LMON started with pid=5, OS id=5434
    LMD0 started with pid=6, OS id=5436
    LMS0 started with pid=7, OS id=5438
    MMAN started with pid=8, OS id=5442
    DBW0 started with pid=9, OS id=5444
    LGWR started with pid=10, OS id=5446
    CKPT started with pid=11, OS id=5448
    SMON started with pid=12, OS id=5458
    RBAL started with pid=13, OS id=5475
    GMON started with pid=14, OS id=5487
    Fri Nov 21 21:10:49 2008
    lmon registered with NM - instance id 2 (internal mem no 1)
    Fri Nov 21 21:10:49 2008
    Reconfiguration started (old inc 0, new inc 2)
    ASM instance
    List of nodes:
    0 1
    Global Resource Directory frozen
    Communication channels reestablished
    After this it hangs. I've checked everything. CRS is fine.
    I suspect it's the kernel revision. This is a cluster of two v890s. Kernel rev is 127127-11. Has anyone seen this issue?
    thanks

    Responses in-line:
    > Have you got any issue reported from the Lock Monitor (LMON)? (Those messages in the alert.log are summaries of the reconfiguration event.)
    No issues that I have seen. I see trc files on both nodes for lmon, but neither contains errors.
    > Do you have any issues posted on the date the issue began (something with "Reconfiguration started")?
    This is a new build. It's going to be a DR environment (Data Guard physical standby), so we've never managed to get ASM up yet.
    > Do you have any other errors on the second node on the date the issue appears (some ORA-27041 or other messages)?
    No errors at all.
    > What is the result of a crs_stat -t?
    HA Resource Target State
    ora.vzdfwsdbp01.LISTENER_VZDFWSDBP01.lsnr ONLINE ONLINE on vzdfwsdbp01
    ora.vzdfwsdbp01.gsd ONLINE ONLINE on vzdfwsdbp01
    ora.vzdfwsdbp01.ons ONLINE ONLINE on vzdfwsdbp01
    ora.vzdfwsdbp01.vip ONLINE ONLINE on vzdfwsdbp01
    ora.vzdfwsdbp02.LISTENER_VZDFWSDBP02.lsnr ONLINE ONLINE on vzdfwsdbp02
    ora.vzdfwsdbp02.gsd ONLINE ONLINE on vzdfwsdbp02
    ora.vzdfwsdbp02.ons ONLINE ONLINE on vzdfwsdbp02
    ora.vzdfwsdbp02.vip ONLINE ONLINE on vzdfwsdbp02
    ASM isn't registered with CRS/OCR yet. I did add it at one time, but it didn't seem to make any difference.
    > What is the release of your installation, 10.2.0.4? Otherwise check whether you can upgrade CRS, ASM and your RDBMS to that release.
    CRS, ASM and Oracle will be 10.2.0.3. Can't go to 10.2.0.4 yet as the primary site is at 10.2.0.3 on a live system.
    > Can you please tell us what OS / hardware is in use?
    Solaris 10, Sun v890
    $ uname -a
    SunOS dbp02 5.10 Generic_127127-11 sun4u sparc SUNW,Sun-Fire-V890
    > What is the result of the following on the second node?
    > sqlplus / as sysdba
    > startup nomount
    > desc v$asm_diskgroup
    > select name, state from v$asm_diskgroup;
    > If no disk group is mounted, run:
    > alter diskgroup your_diskgroup_name mount;
    > What is the result of that?
    Even a startup nomount hangs on the second node.
    thanks
    -toby

  • Unable to log in as a regular user after a reboot

    Hello Everybody,
    I have a SPARC SunBlade 2000 with 2 GB of memory running Solaris 10 with the following configuration:
    root@auyantepui # cat /etc/release
                           Solaris 10 5/09 s10s_u7wos_08 SPARC
               Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                            Use is subject to license terms.
                                 Assembled 30 March 2009
    root@auyantepui # uname -a
    SunOS auyantepui 5.10 Generic_139555-08 sun4u sparc SUNW,Sun-Blade-1000
    root@auyantepui #
    I had an instance of Oracle 11g running without much activity and was just finishing installing WebLogic 10 Application Server on the same system
    when the system became so slow that it was pretty much unresponsive.
    As soon as I could, I typed init 5 in a terminal window to bring the system down.
    Since I brought the system back up I have been unable to log in as my user "morillo" or as user "oracle".
    I can only log in to the system as user "root".
    root@auyantepui # su - morillo
    su: No shell
    root@auyantepui #
    root@auyantepui #
    root@auyantepui #
    root@auyantepui # su - oracle
    su: No shell
    root@auyantepui #
    Obviously I have a shell for both users in /etc/passwd:
    root@auyantepui # cat /etc/passwd
    root:x:0:0:Super-User:/:/sbin/sh
    daemon:x:1:1::/:
    bin:x:2:2::/usr/bin:
    sys:x:3:3::/:
    adm:x:4:4:Admin:/var/adm:
    lp:x:71:8:Line Printer Admin:/usr/spool/lp:
    uucp:x:5:5:uucp Admin:/usr/lib/uucp:
    nuucp:x:9:9:uucp Admin:/var/spool/uucppublic:/usr/lib/uucp/uucico
    smmsp:x:25:25:SendMail Message Submission Program:/:
    listen:x:37:4:Network Admin:/usr/net/nls:
    gdm:x:50:50:GDM Reserved UID:/:
    webservd:x:80:80:WebServer Reserved UID:/:
    postgres:x:90:90:PostgreSQL Reserved UID:/:/usr/bin/pfksh
    svctag:x:95:12:Service Tag UID:/:
    nobody:x:60001:60001:NFS Anonymous Access User:/:
    noaccess:x:60002:60002:No Access User:/:
    nobody4:x:65534:65534:SunOS 4.x NFS Anonymous Access User:/:
    morillo:x:33353:10:Carlos A. Morillo:/home/morillo:/bin/csh
    oracle:x:100:102:Oracle DBA:/auyantepui/oracle:/bin/csh
    apache:x:101:101:Oracle Apache:/auyantepui/apache:/bin/csh
    root@auyantepui #      
    root@auyantepui #
    root@auyantepui #
    root@auyantepui # ls -l /bin/csh
    -r-xr-xr-x   2 root     bin       151456 Aug  8  2006 /bin/csh
    root@auyantepui # file /bin/csh
    /bin/csh:       ELF 32-bit MSB executable SPARC Version 1, dynamically linked, stripped
    root@auyantepui #
    Only including the last part of the truss output:
    1557:   fstat64(4, 0xFFBFE1D0)                          = 0
    1557:   ioctl(4, TCGETA, 0xFFBFE2B4)                    Err#25 ENOTTY
    1557:   write(4, " S U   0 7 / 1 2   1 1 :".., 36)      = 36
    1557:   close(4)                                        = 0
    1557:   setgid(10)                                      = 0
    1557:   sysconfig(_CONFIG_NGROUPS)                      = 16
    1557:   open("/etc/default/nss", O_RDONLY|O_LARGEFILE)  = 4
    1557:   fcntl(4, F_DUPFD, 0x00000100)                   Err#22 EINVAL
    1557:   read(4, " #   i d e n t\t " @ ( #".., 1024)     = 749
    1557:   read(4, 0xFF342400, 1024)                       = 0
    1557:   close(4)                                        = 0
    1557:   getuid()                                        = 0 [0]
    1557:   getuid()                                        = 0 [0]
    1557:   door_info(3, 0xFFBFE9B8)                        = 0
    1557:   door_call(3, 0xFFBFEA60)                        = 0
    1557:   setgroups(2, 0x0002C248)                        = 0
    1557:   setuid(33353)                                   = 0
    1557:   chdir("/home/morillo")                          = 0
    1557:   munmap(0xFF060000, 4458)                        = 0
    1557:   munmap(0xFF072000, 544)                         = 0
    1557:   munmap(0xFF010000, 11897)                       = 0
    1557:   munmap(0xFF024000, 948)                         = 0
    1557:   munmap(0xFEFF0000, 10532)                       = 0
    1557:   munmap(0xFF004000, 1298)                        = 0
    1557:   munmap(0xFEFD0000, 14306)                       = 0
    1557:   munmap(0xFEFE4000, 1494)                        = 0
    1557:   munmap(0xFEFA0000, 129457)                      = 0
    1557:   munmap(0xFEFC0000, 7432)                        = 0
    1557:   munmap(0xFEF80000, 13029)                       = 0
    1557:   munmap(0xFEF94000, 1592)                        = 0
    1557:   munmap(0xFEF50000, 125737)                      = 0
    1557:   munmap(0xFEF70000, 4824)                        = 0
    1557:   munmap(0xFEF30000, 38927)                       = 0
    1557:   munmap(0xFEF4A000, 1860)                        = 0
    1557:   munmap(0xFEF10000, 4760)                        = 0
    1557:   munmap(0xFEF22000, 988)                         = 0
    1557:   munmap(0xFF030000, 54114)                       = 0
    1557:   munmap(0xFF04E000, 5104)                        = 0
    1557:   sigaction(SIGXCPU, 0xFFBFF048, 0xFFBFF0E8)      = 0
    1557:   sigaction(SIGXFSZ, 0xFFBFF048, 0xFFBFF0E8)      = 0
    1557:   execve("/bin/csh", 0xFFBFF118, 0x000268A0)      Err#13 EACCES [file_dac_search]
    1557:   fstat64(2, 0xFFBFE1E8)                          = 0
    su: 1557:       write(2, " s u :  ", 4)                         = 4
    No shell1557:   write(2, " N o   s h e l l", 8)                 = 8
    1557:   write(2, "\n", 1)                               = 1
    1557:   _exit(3)
    auyantepui#
    Checking Err#13 EACCES [file_dac_search] against /usr/include/sys/errno.h, I have:
    errno.h:#define EACCES  13      /* Permission denied                    */
    I suspect file_dac_search has to do with process privileges and the "Least Privilege" Security framework
    introduced with Solaris 10.
    Any ideas? Suggestions? Recommendations how to fix this?
    There has to be some way to restore the default privileges I had before the reboot.
    I guess something got corrupted during the reboot.
    Thanks in advance,
    Carlos.
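
In the truss output above, execve("/bin/csh") fails with EACCES [file_dac_search] even though ls shows the shell itself as r-xr-xr-x. file_dac_search is the Solaris privilege that lets a process bypass directory search permission checks, so one thing worth checking is the search (execute) bit on every directory leading to the shell. The sketch below is a hedged diagnostic, not a confirmed fix for this case; `search_bits` is a hypothetical helper, and it only looks at the "other" execute bit, which is a simplification (owner and group bits and ACLs are ignored).

```python
import os
import stat
from pathlib import Path


def search_bits(path):
    """Return (directory, mode string, others_can_search) for each parent.

    The path is resolved first so that symlinked directories are followed
    to the directories that are really traversed.
    """
    resolved = Path(os.path.realpath(path))
    rows = []
    # Walk from the root down to the directory that holds the file.
    for parent in list(resolved.parents)[::-1]:
        mode = os.stat(parent).st_mode
        rows.append((str(parent), stat.filemode(mode), bool(mode & stat.S_IXOTH)))
    return rows


if __name__ == "__main__":
    for directory, mode, searchable in search_bits("/bin/csh"):
        flag = "ok" if searchable else "NO SEARCH BIT FOR OTHERS"
        print(f"{mode} {directory} {flag}")
```

Run as root, this lists the mode of each directory on the way to /bin/csh. A directory reported without the search bit would be a candidate explanation for why non-root users cannot exec their shell while root still can.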

    Just as an FYI OpenSolaris build 134 uses the automounter to mount /export/home to /home and /home is actually being written to /etc/passwd. So if you're still in the habit of telling Linux admins to disable the automounter please stop.
    http://opensolaris.org/jive/thread.jspa?messageID=482048&#482048
    alan

    Never was in that habit. My automounter is not disabled and is configured as such. I don't think it was that way out of the box, however. Dunno about OpenSolaris. The SMC, whenever you make changes to a user account, still overwrites your home directory of "/export/home/userid" with "/home/userid", and this will prevent you from logging in. If you use the SMC you need to go back and change the home directory. I don't know whether patches were released in the past year to correct this, since I quit using the SMC, that being one of the reasons.

  • Unable to run Fortune sample after server reboot

    Hi,
    I have installed Application Server 6.5 on Solaris 9. After installation, I was able to run the fortune sample to verify a successful installation.
    However, after a server reboot, I am not able to run the fortune sample again. The access log on the iPlanet web server shows error code 504. Can anyone help me?
    Thanks.

    The above steps resolved the problem - This is posted in case someone else suffers from the same issue.

  • RAC node outage causes SOA Suite 10.1.3.4 BPEL  failure

    We are using WebLogic 9.2 and SOA Suite 10.1.3.4 with a 10g Oracle RAC (2 nodes). The WebLogic cluster has a multi data source of 2 pools; each pool points to a single node in the RAC and is deployed to the cluster, and the multi data source is in load-balancing mode.
    So the other night, one of the db nodes had a hardware failure ( ironically, with a remote monitoring / management card ). Annoying, but it should not have caused the BPEL servers to be in "FAILED NOT RESTARTABLE" status the next morning.
    <Jun 9, 2009 12:10:07 AM EDT> <Warning> <JDBC> <BEA-001129> <Received exception while creating connection for pool "esbaqds2": Io exception: The Network Adapter could not establish the connection>
    SEVERE: Destroying JMSDequeuer failed
    oracle.jms.AQjmsException: Connection has been administratively destroyed. Reconnect.
    at oracle.jms.AQjmsSession.preClose(AQjmsSession.java:980)
    at oracle.jms.AQjmsObject.close(AQjmsObject.java:409)
    at oracle.jms.AQjmsSession.close(AQjmsSession.java:1020)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.destroy(JMSDequeuer.java:419)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.destroyWithoutUnsubscribing(JMSDequeuer.java:395)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:175)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
    This was followed by a 2 GB log file containing 1.3 million iterations of the following trace, written within the next 10 minutes before the managed servers failed:
    java.lang.NullPointerException
    at java.lang.String.<init>(String.java:144)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:168)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
    Both managed instances of the BPEL cluster failed, even though the 1st node of the Oracle RAC was still available.
    Our 10.3 cluster, also using multi data sources to the same RAC for the OSB components, simply went on about its business using the remaining RAC node pool.
    Seems to be a single point of failure...

    We haven't changed the JDBC connection string yet, but we did run a test in the same environment while Oracle support considers the situation.
    For the test, we simply shut down one node of the RAC and watched what happened. Within a minute, the JDBC "Failed Reserve Request Count" was increasing by thousands on every refresh of the screen. We restarted the RAC node after 5 minutes, by which time the "Failed Reserve Request Count" was over 190,000.
    The 2 BPEL managed servers remained in Running status, and each created a 660 MB log file within those 5 minutes. In the original outage, the nodes were down for about 15 minutes. Most of the logging is generated from within the oracle.tip.esb classes, not by the WebLogic classes. It looks like once the pool pointing to the downed RAC node becomes disabled, the Oracle BPEL code still tries to use it, even though the multi-source JNDI is the published lookup:
    INFO: JMSDequeuer::createConnection - AQ Topics
    java.sql.SQLException: weblogic.common.ResourceException:
    esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
    esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
    at weblogic.jdbc.common.internal.JDBCUtil.wrapAndThrowResourceException(JDBCUtil.java:250)
    at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:348)
    at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:364)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:559)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
    Caused by: weblogic.common.ResourceException:
    esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
    esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
    at weblogic.jdbc.common.internal.MultiPool.searchLoadBalance(MultiPool.java:331)
    at weblogic.jdbc.common.internal.MultiPool.findPool(MultiPool.java:202)
    at weblogic.jdbc.common.internal.ConnectionPoolManager.reserve(ConnectionPoolManager.java:77)
    at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:346)
    ... 11 more
    SEVERE: Failed to process deferred message
    oracle.tip.esb.server.dispatch.QueueHandlerException: Error creating "weblogic.common.ResourceException: No good connections available."
    at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:661)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
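
    To put the figures quoted in this thread in perspective, here is a rough back-of-the-envelope calculation in Python. It only uses the numbers stated in the posts (1.3 million traces and 2 GB in about 10 minutes; over 190,000 failed reserve requests and a 660 MB log per server in 5 minutes). Treating "2 GB" as 2 GiB is an assumption, and the helper name per_second is made up for this sketch.

```python
# Rough rates derived from the numbers reported in the thread.
# Nothing here is measured; it is plain arithmetic on the quoted figures.

def per_second(count: float, minutes: float) -> float:
    """Average number of events per second over a window given in minutes."""
    return count / (minutes * 60)

# Original outage: ~1.3 million NullPointerException traces in ~10 minutes.
npe_rate = per_second(1_300_000, 10)

# Average log bytes per trace, assuming "2 GB" means 2 GiB (assumption).
bytes_per_trace = 2 * 1024**3 / 1_300_000

# Test run: "Failed Reserve Request Count" over 190,000 after 5 minutes,
# and a 660 MB log file per managed server in the same window.
reserve_rate = per_second(190_000, 5)
log_mb_per_min = 660 / 5

print(f"NPE traces per second:          {npe_rate:,.0f}")
print(f"Average bytes per trace:        {bytes_per_trace:,.0f}")
print(f"Failed reserves per second:     {reserve_rate:,.0f}")
print(f"Log growth per server (MB/min): {log_mb_per_min:,.0f}")
```

    Rates of this size (on the order of two thousand traces per second) suggest a retry loop with no pause between attempts, which would be consistent with the repeated JMSDequeuer.dequeue frames in the traces above. That reading is an inference from the numbers, not something confirmed by Oracle support in this thread.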

  • RAC node failed: how do you bring it back up?

    Example: If there are 3 RAC nodes and one becomes unavailable/fails, how do you bring it back up?

    There are typically two basic reasons why a RAC node will go down:
    1. A cluster issue, causing the node to be evicted. This usually means the node lost access to the cluster storage (e.g. no access to voting disks), or lost access to the Interconnect.
    2. An o/s issue causing the node to fail, e.g. a kernel panic due to a page swap and memory not syncing, or a soft CPU lockup, etc.
    You need to determine why it went down and what is needed to enable it to successfully join the cluster again.

  • root.sh fails to bring up CRS

    Hi All,
    I am installing 11gR2 on IBM AIX.
    Everything ran fine until I ran root.sh:
    CRS-2676: Start of 'ora.diskmon' on 't24db1' succeeded
    CRS-2676: Start of 'ora.cssd' on 't24db1' succeeded
    CRS-2672: Attempting to start 'ora.ctssd' on 't24db1'
    CRS-2676: Start of 'ora.ctssd' on 't24db1' succeeded
    ASM created and started successfully.
    DiskGroup OCR_VOTE created successfully.
    clscfg: -install mode specified
    Successfully accumulated necessary OCR keys.
    Creating OCR keys for user 'root', privgrp 'system'..
    Operation successful.
    CRS-2672: Attempting to start 'ora.crsd' on 't24db1'
    CRS-2676: Start of 'ora.crsd' on 't24db1' succeeded
    CRS-4256: Updating the profile
    Successful addition of voting disk 70e7b3c678074f2bbfcf933764645975.
    Successful addition of voting disk a85fe5cb0c9a4f88bf5adf830732d62f.
    Successful addition of voting disk 52d343aad6374f79bf0257908be9a6c9.
    Successfully replaced voting disk group with +OCR_VOTE.
    CRS-4256: Updating the profile
    CRS-4266: Voting file(s) successfully replaced
    ## STATE File Universal Id File Name Disk group
    1. ONLINE 70e7b3c678074f2bbfcf933764645975 (/dev/rhdisk10) [OCR_VOTE]
    2. ONLINE a85fe5cb0c9a4f88bf5adf830732d62f (/dev/rhdisk11) [OCR_VOTE]
    3. ONLINE 52d343aad6374f79bf0257908be9a6c9 (/dev/rhdisk5) [OCR_VOTE]
    Located 3 voting disk(s).
    CRS-2673: Attempting to stop 'ora.crsd' on 't24db1'
    CRS-2677: Stop of 'ora.crsd' on 't24db1' succeeded
    CRS-2673: Attempting to stop 'ora.asm' on 't24db1'
    CRS-2677: Stop of 'ora.asm' on 't24db1' succeeded
    CRS-2673: Attempting to stop 'ora.ctssd' on 't24db1'
    CRS-2677: Stop of 'ora.ctssd' on 't24db1' succeeded
    CRS-2673: Attempting to stop 'ora.cssdmonitor' on 't24db1'
    CRS-2677: Stop of 'ora.cssdmonitor' on 't24db1' succeeded
    CRS-2673: Attempting to stop 'ora.cssd' on 't24db1'
    CRS-2677: Stop of 'ora.cssd' on 't24db1' succeeded
    CRS-2673: Attempting to stop 'ora.gpnpd' on 't24db1'
    CRS-2677: Stop of 'ora.gpnpd' on 't24db1' succeeded
    CRS-2673: Attempting to stop 'ora.gipcd' on 't24db1'
    CRS-2677: Stop of 'ora.gipcd' on 't24db1' succeeded
    CRS-2673: Attempting to stop 'ora.mdnsd' on 't24db1'
    CRS-2677: Stop of 'ora.mdnsd' on 't24db1' succeeded
    CRS-2672: Attempting to start 'ora.mdnsd' on 't24db1'
    CRS-2676: Start of 'ora.mdnsd' on 't24db1' succeeded
    CRS-2672: Attempting to start 'ora.gipcd' on 't24db1'
    CRS-2676: Start of 'ora.gipcd' on 't24db1' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 't24db1'
    CRS-2676: Start of 'ora.gpnpd' on 't24db1' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 't24db1'
    CRS-2676: Start of 'ora.cssdmonitor' on 't24db1' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 't24db1'
    CRS-2672: Attempting to start 'ora.diskmon' on 't24db1'
    CRS-2676: Start of 'ora.diskmon' on 't24db1' succeeded
    CRS-2676: Start of 'ora.cssd' on 't24db1' succeeded
    CRS-2672: Attempting to start 'ora.ctssd' on 't24db1'
    CRS-2676: Start of 'ora.ctssd' on 't24db1' succeeded
    CRS-2672: Attempting to start 'ora.asm' on 't24db1'
    CRS-2676: Start of 'ora.asm' on 't24db1' succeeded
    CRS-2672: Attempting to start 'ora.crsd' on 't24db1'
    CRS-2676: Start of 'ora.crsd' on 't24db1' succeeded
    CRS-2672: Attempting to start 'ora.evmd' on 't24db1'
    CRS-2676: Start of 'ora.evmd' on 't24db1' succeeded
    Timed out waiting for the CRS stack to start.
    I can run ocrcheck, and I am also able to connect to ASM.
    The crsd process is not coming up.
    I tried to start it manually, but no luck.
    Alert log:
    [crsd(11534476)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:29:50.913
    [crsd(11534476)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:29:52.130
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:29:53.760
    [crsd(11468998)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:29:54.029
    [crsd(11468998)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:29:55.246
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:29:55.246
    [ohasd(6947036)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
    2011-09-21 20:51:59.589
    [crsd(8650916)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:51:59.854
    [crsd(8650916)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:52:00.895
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:52:02.588
    [crsd(9306182)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:52:02.852
    [crsd(9306182)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:52:04.024
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:52:05.693
    [crsd(9306184)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:52:05.961
    [crsd(9306184)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:52:07.174
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:52:08.807
    [crsd(11337884)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:52:09.078
    [crsd(11337884)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:52:10.289
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:52:11.942
    [crsd(11206904)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:52:12.207
    [crsd(11206904)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:52:13.400
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:52:15.031
    [crsd(11337894)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:52:15.302
    [crsd(11337894)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:52:16.521
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:52:18.132
    [crsd(11337896)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:52:18.403
    [crsd(11337896)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:52:19.627
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:52:21.225
    [crsd(9306200)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:52:21.498
    [crsd(9306200)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:52:22.717
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:52:25.895
    [crsd(8650934)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:52:26.176
    [crsd(8650934)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:52:27.369
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:52:29.046
    [crsd(9306206)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:52:29.316
    [crsd(9306206)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:52:30.511
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:52:32.169
    [crsd(11337908)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:52:32.435
    [crsd(11337908)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:52:33.629
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:52:33.630
    [ohasd(6947036)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
    2011-09-21 20:57:11.777
    [ctssd(8388722)]CRS-2405:The Cluster Time Synchronization Service on host t24db1 is shutdown by user
    2011-09-21 20:58:15.374
    [ctssd(8519782)]CRS-2403:The Cluster Time Synchronization Service on host t24db1 is in observer mode.
    2011-09-21 20:58:15.391
    [ctssd(8519782)]CRS-2407:The new Cluster Time Synchronization Service reference node is host t24db1.
    2011-09-21 20:58:16.064
    [ctssd(8519782)]CRS-2401:The Cluster Time Synchronization Service started on host t24db1.
    2011-09-21 20:58:32.499
    [crsd(11862176)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:58:33.649
    [crsd(11862176)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:58:34.450
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:58:36.075
    [crsd(11534462)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:58:36.339
    [crsd(11534462)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:58:37.560
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:58:39.190
    [crsd(9699394)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:58:39.555
    [crsd(9699394)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:58:40.677
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:58:42.311
    [crsd(9699396)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:58:42.586
    [crsd(9699396)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:58:43.790
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:58:45.435
    [crsd(11600058)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:58:45.720
    [crsd(11600058)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:58:46.909
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:58:48.536
    [crsd(11600060)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:58:48.799
    [crsd(11600060)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:58:50.037
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:58:51.670
    [crsd(9699410)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:58:51.940
    [crsd(9699410)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:58:53.151
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:58:56.298
    [crsd(11600072)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:58:56.559
    [crsd(11600072)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:58:57.784
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:58:59.428
    [crsd(9699414)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:58:59.733
    [crsd(9699414)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:59:00.910
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:59:02.522
    [crsd(9699416)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:59:02.798
    [crsd(9699416)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:59:04.030
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:59:05.624
    [crsd(9699418)]CRS-1012:The OCR service started on node t24db1.
    2011-09-21 20:59:05.899
    [crsd(9699418)]CRS-1201:CRSD started on node t24db1.
    2011-09-21 20:59:07.120
    [ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
    2011-09-21 20:59:07.121
    [ohasd(6947036)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
    crsd log:
    2011-09-21 20:59:06.013: [ default][1]clsu_get_private_ip_addresses: required buffer number is 2. Buffer number passed in 1 is not enough. Return [8]
    2011-09-21 20:59:06.013: [GIPCXCPT][1] gipcShutdownF: skipping shutdown, count 2, from [ clsinet.c : 1735], ret gipcretSuccess (0)
    2011-09-21 20:59:06.014: [GIPCXCPT][1] gipcShutdownF: skipping shutdown, count 1, from [ clsgpnp0.c : 1021], ret gipcretSuccess (0)
    2011-09-21 20:59:06.015: [  CRSCCL][1]Listening endpoint created sucessfully @ (ADDRESS=(PROTOCOL=tcp)(DEV=52)(HOST=192.168.50.15)(PORT=49390)).con = 116b2c4f0
    2011-09-21 20:59:06.018: [  CRSCCL][10799]CSS Group Registration complete.
    2011-09-21 20:59:06.018: [  CRSCCL][10799]cclGetMemberData called
    2011-09-21 20:59:06.020: [  CRSCCL][10799]Obtained first membership map.
    2011-09-21 20:59:06.020: [  CRSCCL][10799]Dumping member data ------------------
    2011-09-21 20:59:06.020: [  CRSCCL][10799]Member (1, 1411327295) on node port=.
    2011-09-21 20:59:06.020: [  CRSCCL][10799]Done ------------------
    2011-09-21 20:59:06.020: [  CRSCCL][10799]Waiting for reconfigs
    2011-09-21 20:59:06.020: [  CRSCCL][11056]cclCommunicationHandler started.
    2011-09-21 20:59:06.020: [  CRSCCL][1]cclLibShutdown called
    2011-09-21 20:59:06.120: [  CRSCCL][10799]CCL Shutting down.
    2011-09-21 20:59:06.122: [  CRSCCL][10799]CSS Group Unregister complete.
    2011-09-21 20:59:06.122: [  CRSCCL][11056]Comunications Shutdown.
    2011-09-21 20:59:06.123: [  CRSCCL][10799]Membership Monitor exiting ...
    2011-09-21 20:59:06.123: [  CRSCCL][1]Clsc shutting down
    2011-09-21 20:59:06.123: [  CRSCCL][1]Clsc server shutdown
    2011-09-21 20:59:06.123: [ COMMCRS][1]clscugblmterm: (1158d7970) cleaning up icon (116b2bef0) with 1 cons
    2011-09-21 20:59:06.123: [ COMMCRS][1]clscugblmterm: (1158d7970) cleaning up open icon (116b2bef0)
    2011-09-21 20:59:06.123: [  CRSCCL][1]Clsc UGBLM shutdown
    2011-09-21 20:59:06.123: [  CRSCCL][1]Clsc shutdown Done
    2011-09-21 20:59:06.123: [  CRSCCL][1]ccllibShutdown done.
    2011-09-21 20:59:06.123: [CLSFRAME][1] clsCclInit returned failure:4
    2011-09-21 20:59:06.123: [CLSFRAME][1] Unable to start module-to-module comms: 2
    2011-09-21 20:59:06.123: [    CRSD][1][PANIC] CRSD exiting: unable to start CLS framework
    2011-09-21 20:59:06.123: [    CRSD][1] Done.
    Please assist me on this.
    Thanks and Regards,
    Daniesh
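
    The alert log above repeats the same three-step cycle many times, so it can help to summarize it rather than read it line by line. Below is a minimal Python sketch that counts CRS message codes per process and lists the distinct crsd PIDs. The regular expression is based only on the line format visible in this post; the function name summarize and the SAMPLE excerpt are illustrative, and a real alert log may contain other line shapes that this pattern skips.

```python
import re
from collections import Counter

# Matches lines such as:
#   [crsd(11534476)]CRS-1201:CRSD started on node t24db1.
ENTRY = re.compile(
    r"\[(?P<proc>\w+)\((?P<pid>\d+)\)\](?P<code>CRS-\d+):(?P<text>.*)"
)

def summarize(lines):
    """Count (process, CRS code) pairs and collect distinct crsd PIDs in order."""
    codes = Counter()
    crsd_pids = []
    for line in lines:
        m = ENTRY.search(line)
        if not m:
            continue  # timestamp lines and anything unrecognized are skipped
        codes[(m["proc"], m["code"])] += 1
        if m["proc"] == "crsd" and m["pid"] not in crsd_pids:
            crsd_pids.append(m["pid"])
    return codes, crsd_pids

# A few lines copied from the alert log in the post.
SAMPLE = """\
2011-09-21 20:29:50.913
[crsd(11534476)]CRS-1201:CRSD started on node t24db1.
2011-09-21 20:29:52.130
[ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
2011-09-21 20:29:53.760
[crsd(11468998)]CRS-1012:The OCR service started on node t24db1.
2011-09-21 20:29:54.029
[crsd(11468998)]CRS-1201:CRSD started on node t24db1.
2011-09-21 20:29:55.246
[ohasd(6947036)]CRS-2765:Resource 'ora.crsd' has failed on server 't24db1'.
2011-09-21 20:29:55.246
[ohasd(6947036)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
""".splitlines()

codes, crsd_pids = summarize(SAMPLE)
for (proc, code), count in sorted(codes.items()):
    print(f"{proc:6} {code}: {count}")
print("distinct crsd PIDs:", crsd_pids)
```

    Each new crsd PID corresponds to one restart attempt by ohasd. In the log above, every attempt is reported as failed roughly a second after CRS-1201, until CRS-2771 stops the retries, so the crsd log around those timestamps is the place to look for the actual cause.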

    1. Try giving the root user the capabilities CAP_NUMA_ATTACH, CAP_BYPASS_RAC_VMM, and CAP_PROPAGATE.
    2. Install version 11.2.0.2.
    I solved the problem of root.sh failing to bring up CRS.

  • What happens when one RAC node's public NIC is down?

    Dear all,
    There's a two-node RAC in my office. From what I have observed, when one RAC node's public NIC is down, the "crs_stat -t" command doesn't show any resource as OFFLINE. So at that point, if I use sqlplus to connect to my RAC db, it will still choose the node1 or node2 instance at random, right? And if I am assigned to the instance on the node whose public NIC is down, will my sqlplus connection fail?
    So, can we say that Oracle RAC can't provide HA if one node's public NIC is down? Or is there some other way to solve this issue? Any suggestion would be appreciated.

    If a node's public NIC is down, only the other node can take the load; in your case that is node 2. It may happen that CRS does not record the status change, in which case connections routed to the affected node may fail.

  • dbca doesn't show the complete list of RAC nodes

    Hello,
    While using dbca to create a RAC database spread over two nodes of a 4-node cluster, dbca lists only 3 nodes to choose from. It's not showing the 4th node, srv1053, which is in the same cluster as the other 3. Our CRS is 10204, and the db home where I am going to create the database is also 10204.
    Below is an excerpt from the dbca log:
    [AWT-EventQueue-0] [5:5:19:540] [GetActiveNodes.create:221] Returning an existing instance of GetActiveNodes
    [AWT-EventQueue-0] [5:5:19:540] [Cluster.verifyNodeList:954] clusterNodes[0]=srv1052
    [AWT-EventQueue-0] [5:5:19:541] [Cluster.verifyNodeList:954] clusterNodes[1]=srv1053
    [AWT-EventQueue-0] [5:5:19:541] [Cluster.verifyNodeList:954] clusterNodes[2]=srv1050
    [AWT-EventQueue-0] [5:5:19:541] [Cluster.verifyNodeList:954] clusterNodes[3]=srv1051
    [AWT-EventQueue-0] [5:5:19:541] [Cluster.verifyNodeList:959] nodeList[0]=srv1050
    [AWT-EventQueue-0] [5:5:19:542] [Cluster.verifyNodeList:959] nodeList[1]=srv1051
    [AWT-EventQueue-0] [5:5:19:542] [Cluster.verifyNodeList:959] nodeList[2]=srv1052
    I've compared the /etc/hosts file and the output of the olsnodes -n -p -i command, and they are the same on all 4 nodes. The /etc/ocfs2/cluster.conf file is also the same on all 4 nodes. I guess I'm missing something silly but am unable to pin it down.
    Thanks,
    M
    Edited by: Mmubeen on Oct 26, 2009 7:37 AM

    Hello Chandra,
    I guess you hit the jackpot ;) Entries for crs and the 10204 home are as follows:
    <HOME NAME="CRS_10203" LOC="/u01/crs/oracle/product/10.2.0.3/crs_1" TYPE="O" IDX="1" CRS="true">
    <NODE_LIST>
    <NODE NAME="srvr1012"/>
    <NODE NAME="srvr1013"/>
    <NODE NAME="srvr1014"/>
    <NODE NAME="srvr1052"/>
    <NODE NAME="srvr1053"/>
    <NODE NAME="srvr1050"/>
    <NODE NAME="srvr1051"/>
    </NODE_LIST>
    </HOME>
    <HOME NAME="10204_oraclehome" LOC="/u01/app/oracle/product/10.2.0.4/dbee_1" TYPE="O" IDX="4">
    <NODE_LIST>
    <NODE NAME="srvr1050"/>
    <NODE NAME="srvr1051"/>
    <NODE NAME="srvr1052"/>
    </NODE_LIST>
    </HOME>
    srvr1053 is missing from this list, so I guess that is why it's not showing up, isn't it? If yes, can we manually add the entry under the 10204 home and also remove the obsolete srvr1012, srvr1013, and srvr1014 entries from the CRS home (since they are no longer in the same cluster)?
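
    The comparison made by eye in this post can also be scripted. The following Python sketch parses an inventory fragment and reports which nodes appear under one home but not the other. The XML is copied from the post and wrapped in a <HOME_LIST> root with every <HOME> closed so that it parses on its own; that wrapper, and the function name nodes_by_home, are assumptions made for this example. It only reads the data and says nothing about whether editing the inventory by hand is supported.

```python
import xml.etree.ElementTree as ET

# Fragment based on the post. The <HOME_LIST> root element is added here
# (assumption) so the excerpt is a single well-formed document.
INVENTORY = """
<HOME_LIST>
<HOME NAME="CRS_10203" LOC="/u01/crs/oracle/product/10.2.0.3/crs_1" TYPE="O" IDX="1" CRS="true">
<NODE_LIST>
<NODE NAME="srvr1012"/>
<NODE NAME="srvr1013"/>
<NODE NAME="srvr1014"/>
<NODE NAME="srvr1052"/>
<NODE NAME="srvr1053"/>
<NODE NAME="srvr1050"/>
<NODE NAME="srvr1051"/>
</NODE_LIST>
</HOME>
<HOME NAME="10204_oraclehome" LOC="/u01/app/oracle/product/10.2.0.4/dbee_1" TYPE="O" IDX="4">
<NODE_LIST>
<NODE NAME="srvr1050"/>
<NODE NAME="srvr1051"/>
<NODE NAME="srvr1052"/>
</NODE_LIST>
</HOME>
</HOME_LIST>
"""

def nodes_by_home(xml_text):
    """Map each HOME's NAME attribute to the list of its NODE names."""
    root = ET.fromstring(xml_text)
    return {
        home.get("NAME"): [node.get("NAME") for node in home.iter("NODE")]
        for home in root.iter("HOME")
    }

homes = nodes_by_home(INVENTORY)
crs_nodes = homes["CRS_10203"]
db_nodes = homes["10204_oraclehome"]

# Nodes registered for the CRS home but absent from the 10204 database home.
missing = [n for n in crs_nodes if n not in db_nodes]
print("in CRS home but not in 10204 home:", missing)
```

    For the fragment above this lists srvr1053 together with the three obsolete nodes, which matches what the poster noticed by hand.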

  • Rac node restart

    Hello everyone,
    I have run into an error: our RAC node restarted automatically with the messages below.
    #/u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/alert_odsdb1.log
    Fri Jun 07 12:23:42 2013
    Thread 1 cannot allocate new log, sequence 58363
    Checkpoint not complete
    Current log# 2 seq# 58362 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
    Current log# 2 seq# 58362 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
    Fri Jun 07 12:23:42 2013
    NOTE: ASMB terminating
    Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
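    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID:
    Session ID: 2047 Serial number: 5
    Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID:
    Session ID: 2047 Serial number: 5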
    ASMB (ospid: 32641): terminating the instance due to error 15064
    Fri Jun 07 12:23:44 2013
    ORA-1092 : opitsk aborting process
    Fri Jun 07 12:23:46 2013
    ORA-1092 : opitsk aborting process
    Instance terminated by ASMB, pid = 32641
    Fri Jun 07 12:25:02 2013
    Starting ORACLE instance (normal)
    Fri Jun 07 12:25:23 2013
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
    [name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
    Public Interface 'eth0' configured from GPnP for use as a public interface.
    [name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
    Public Interface 'eth0:1' configured from GPnP for use as a public interface.
    [name='eth0:1', type=1, ip=135.33.2.13, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
    Picked latch-free SCN scheme 3
    Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/11.2.0/dbhome_2/dbs/arch
    Autotune of undo retention is turned on.
    LICENSE_MAX_USERS = 0
    SYS auditing is disabled
    Starting up:
    Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
    With the Partitioning, Real Application Clusters, OLAP, Data Mining
    and Real Application Testing options.
    ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
    System name:     Linux
    Node name:     odsdb1
    Release:     2.6.18-308.el5
    Version:     #1 SMP Fri Jan 27 17:17:51 EST 2012
    Machine:     x86_64
    Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/dbhome_2/dbs/initodsdb1.ora
    System parameters with non-default values:
    processes = 4500
    sessions = 6784
    event = ""
    spfile = "+DATA/odsdb/spfileodsdb.ora"
    nls_language = "SIMPLIFIED CHINESE"
    nls_territory = "CHINA"
    memory_target = 170G
    control_files = "+DATA/odsdb/controlfile/current.262.812288837"
    control_files = "+DATA/odsdb/controlfile/current.261.812288837"
    db_block_size = 8192
    compatible = "11.2.0.0.0"
    db_files = 4096
    cluster_database = TRUE
    db_create_file_dest = "+DATA"
    db_recovery_file_dest = ""
    db_recovery_file_dest_size= 38820M
    thread = 1
    undo_tablespace = "UNDOTBS1"
    instance_number = 1
    remote_login_passwordfile= "EXCLUSIVE"
    db_domain = ""
    dispatchers = "(PROTOCOL=TCP) (SERVICE=odsdbXDB)"
    remote_listener = "odsdb-cluster-scan:1521"
    job_queue_processes = 1000
    audit_file_dest = "/u01/app/oracle/admin/odsdb/adump"
    audit_trail = "DB"
    db_name = "odsdb"
    open_cursors = 300
    diagnostic_dest = "/u01/app/oracle"
    Cluster communication is configured to use the following interface(s) for this instance
    169.254.37.103
    cluster interconnect IPC version:Oracle UDP/IP (generic)
    IPC Vendor 1 proto 2
    Fri Jun 07 12:25:33 2013
    PMON started with pid=2, OS id=22959
    Fri Jun 07 12:25:33 2013
    PSP0 started with pid=3, OS id=22962
    Fri Jun 07 12:25:34 2013
    VKTM started with pid=4, OS id=22971 at elevated priority
    VKTM running at (1)millisec precision with DBRM quantum (100)ms
    Fri Jun 07 12:25:34 2013
    GEN0 started with pid=5, OS id=22977
    Fri Jun 07 12:25:34 2013
    DIAG started with pid=6, OS id=22979
    Fri Jun 07 12:25:35 2013
    DBRM started with pid=7, OS id=22981
    Fri Jun 07 12:25:35 2013
    PING started with pid=8, OS id=22983
    Fri Jun 07 12:25:35 2013
    ACMS started with pid=9, OS id=22985
    Fri Jun 07 12:25:35 2013
    DIA0 started with pid=10, OS id=22987
    Fri Jun 07 12:25:35 2013
    LMON started with pid=11, OS id=22989
    Fri Jun 07 12:25:35 2013
    LMD0 started with pid=12, OS id=22991
    * Load Monitor used for high load check
    * New Low - High Load Threshold Range = [61440 - 81920]
    Fri Jun 07 12:25:35 2013
    LMS0 started with pid=13, OS id=22994 at elevated priority
    Fri Jun 07 12:25:35 2013
    LMS1 started with pid=14, OS id=22998 at elevated priority
    Fri Jun 07 12:25:35 2013
    LMS2 started with pid=15, OS id=23002 at elevated priority
    Fri Jun 07 12:25:35 2013
    LMS3 started with pid=16, OS id=23006 at elevated priority
    Fri Jun 07 12:25:35 2013
    RMS0 started with pid=17, OS id=23010
    Fri Jun 07 12:25:35 2013
    LMHB started with pid=18, OS id=23013
    Fri Jun 07 12:25:35 2013
    MMAN started with pid=19, OS id=23015
    Fri Jun 07 12:25:35 2013
    DBW0 started with pid=20, OS id=23017
    Fri Jun 07 12:25:35 2013
    DBW1 started with pid=21, OS id=23019
    Fri Jun 07 12:25:35 2013
    DBW2 started with pid=22, OS id=23022
    Fri Jun 07 12:25:35 2013
    DBW3 started with pid=23, OS id=23024
    Fri Jun 07 12:25:35 2013
    DBW4 started with pid=24, OS id=23026
    Fri Jun 07 12:25:35 2013
    DBW5 started with pid=25, OS id=23028
    Fri Jun 07 12:25:35 2013
    DBW6 started with pid=26, OS id=23031
    Fri Jun 07 12:25:35 2013
    DBW7 started with pid=27, OS id=23033
    Fri Jun 07 12:25:35 2013
    LGWR started with pid=28, OS id=23035
    Fri Jun 07 12:25:35 2013
    CKPT started with pid=29, OS id=23037
    Fri Jun 07 12:25:35 2013
    SMON started with pid=30, OS id=23039
    Fri Jun 07 12:25:35 2013
    RECO started with pid=31, OS id=23041
    Fri Jun 07 12:25:35 2013
    RBAL started with pid=32, OS id=23043
    Fri Jun 07 12:25:35 2013
    ASMB started with pid=33, OS id=23045
    Fri Jun 07 12:25:35 2013
    MMON started with pid=34, OS id=23048
    Fri Jun 07 12:25:35 2013
    MMNL started with pid=35, OS id=23052
    Fri Jun 07 12:25:35 2013
    starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
    NOTE: initiating MARK startup
    starting up 1 shared server(s) ...
    Starting background process MARK
    Fri Jun 07 12:25:35 2013
    MARK started with pid=37, OS id=23056
    NOTE: MARK has subscribed
    lmon registered with NM - instance number 1 (internal mem no 0)
    Reconfiguration started (old inc 0, new inc 119)
    List of instances:
    1 2 (myinst: 1)
    Global Resource Directory frozen
    * allocate domain 0, invalid = TRUE
    Communication channels reestablished
    * domain 0 valid according to instance 2
    * domain 0 valid = 1 according to instance 2
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Set master node info
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Submitted all GCS remote-cache requests
    Fix write in gcs resources
    Reconfiguration started (old inc 119, new inc 121)
    List of instances:
    1 2 (myinst: 1)
    Nested reconfiguration detected.
    Global Resource Directory frozen
    Communication channels reestablished
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Set master node info
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Fri Jun 07 12:25:45 2013
    Submitted all GCS remote-cache requests
    Fri Jun 07 12:26:08 2013
    Fix write in gcs resources
    Reconfiguration complete
    Fri Jun 07 12:26:10 2013
    LCK0 started with pid=40, OS id=23632
    Fri Jun 07 12:26:10 2013
    Starting background process RSMN
    Fri Jun 07 12:26:10 2013
    RSMN started with pid=41, OS id=23646
    ORACLE_BASE not set in environment. It is recommended
    that ORACLE_BASE be set in the environment
    Reusing ORACLE_BASE from an earlier startup = /u01/app/oracle
    Fri Jun 07 12:26:11 2013
    ALTER SYSTEM SET local_listener=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=135.33.2.13)(PORT=1521))))' SCOPE=MEMORY SID='odsdb1';
    ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
    Fri Jun 07 12:26:11 2013
    NOTE: Loaded library: System
    Fri Jun 07 12:26:11 2013
    SUCCESS: diskgroup DATA was mounted
    Fri Jun 07 12:26:11 2013
    NOTE: dependency between database odsdb and diskgroup resource ora.DATA.dg is established
    Fri Jun 07 12:26:16 2013
    Successful mount of redo thread 1, with mount id 3452000551
    Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
    Lost write protection disabled
    Completed: ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
    ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
    Picked broadcast on commit scheme to generate SCNs
    Thread 1 advanced to log sequence 58364 (thread open)
    Thread 1 opened at log sequence 58364
    Current log# 2 seq# 58364 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
    Current log# 2 seq# 58364 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
    Successful open of redo thread 1
    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
    Fri Jun 07 12:26:21 2013
    SMON: enabling cache recovery
    Fri Jun 07 12:26:23 2013
    minact-scn: Inst 1 is a slave inc#:121 mmon proc-id:23048 status:0x2
    minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
    Fri Jun 07 12:26:34 2013
    [23651] Successfully onlined Undo Tablespace 2.
    Undo initialization finished serial:0 start:2061372614 end:2061384964 diff:12350 (123 seconds)
    Verifying file header compatibility for 11g tablespace encryption..
    Verifying 11g file header compatibility for tablespace encryption completed
    Fri Jun 07 12:26:34 2013
    SMON: enabling tx recovery
    Database Characterset is ZHS16GBK
    No Resource Manager plan active
    Starting background process GTX0
    Fri Jun 07 12:26:35 2013
    GTX0 started with pid=45, OS id=23931
    Starting background process RCBG
    Fri Jun 07 12:26:35 2013
    RCBG started with pid=46, OS id=23933
    replication_dependency_tracking turned off (no async multimaster replication found)
    Starting background process QMNC
    Fri Jun 07 12:26:35 2013
    QMNC started with pid=48, OS id=23940
    Completed: ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
    Fri Jun 07 12:26:38 2013
    Starting background process CJQ0
    Fri Jun 07 12:26:38 2013
    CJQ0 started with pid=55, OS id=23977
    Fri Jun 07 12:27:56 2013
    Thread 1 advanced to log sequence 58365 (LGWR switch)
    Current log# 1 seq# 58365 mem# 0: +DATA/odsdb/onlinelog/group_1.263.812288839
    Current log# 1 seq# 58365 mem# 1: +DATA/odsdb/onlinelog/group_1.264.812288839
    Fri Jun 07 12:28:18 2013
    Starting background process SMCO
    Fri Jun 07 12:28:18 2013
    SMCO started with pid=70, OS id=25166
    Fri Jun 07 12:29:01 2013
    Thread 1 cannot allocate new log, sequence 58366
    Trace file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc
    Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
    With the Partitioning, Real Application Clusters, OLAP, Data Mining
    and Real Application Testing options
    ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
    System name: Linux
    Node name: odsdb1
    Release: 2.6.18-308.el5
    Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
    Machine: x86_64
    Instance name: odsdb1
    Redo thread mounted by this instance: 0 <none>
    Oracle process number: 33
    Unix process pid: 32641, image: oracle@odsdb1 (ASMB)
    *** 2013-05-14 15:37:08.705
    *** SESSION ID:(3499.1) 2013-05-14 15:37:08.705
    *** CLIENT ID:() 2013-05-14 15:37:08.705
    *** SERVICE NAME:() 2013-05-14 15:37:08.705
    *** MODULE NAME:() 2013-05-14 15:37:08.705
    *** ACTION NAME:() 2013-05-14 15:37:08.705
    NOTE: initiating MARK startup
    *** 2013-05-14 15:37:16.835
    instance health monitoring reports instance shutting down
    *** 2013-06-07 12:23:42.700
    NOTE: ASMB terminating
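    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID:
    Session ID: 2047 Serial number: 5
    error 15064 detected in background process
    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID:
    Session ID: 2047 Serial number: 5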
    kjzduptcctx: Notifying DIAG for crash event
    ----- Abridged Call Stack Trace -----
    ksedsts()+461<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+53<-ksuitm()+1332<-ksbrdp()+3344<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
    ----- End of Abridged Call Stack Trace -----
    *** 2013-06-07 12:23:42.783
    ASMB (ospid: 32641): terminating the instance due to error 15064
    /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
    NOTE: ASMB process exiting, either shutdown is in progress
    NOTE: or foreground connected to ASMB was killed.
    Fri Jun 07 12:23:42 2013
    NOTE: client exited [14808]
    Fri Jun 07 12:23:44 2013
    Received an instance abort message from instance 2
    Please check instance 2 alert and LMON trace files for detail.
    Fri Jun 07 12:23:44 2013
    Received an instance abort message from instance 2
    Please check instance 2 alert and LMON trace files for detail.
    LMD0 (ospid: 31201): terminating the instance due to error 481
    Instance terminated by LMD0, pid = 31201
    Fri Jun 07 12:24:30 2013
    * instance_number obtained from CSS = 1, checking for the existence of node 0...
    * node 0 does not exist. instance_number = 1
    Starting ORACLE instance (normal)
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
    [name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
    Public Interface 'eth0' configured from GPnP for use as a public interface.
    [name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
    Picked latch-free SCN scheme 3
    Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/11.2.0.2/grid/dbs/arch
    Autotune of undo retention is turned on.
    LICENSE_MAX_USERS = 0
    [grid@odsdb1 cssd]$ file core.30481
    core.30481: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'ocssd.bin'
    [grid@odsdb1 cssd]$ gdb
    gdb gdbserver gdbtui
    [grid@odsdb1 cssd]$ gdb ocssd.bin core.30481
    GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-42.el5)
    Copyright (C) 2009 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /u01/app/11.2.0.2/grid/bin/ocssd.bin...(no debugging symbols found)...done.
    [New Thread 30486]
    [New Thread 30530]
    [New Thread 30526]
    [New Thread 30525]
    [New Thread 30523]
    [New Thread 30522]
    [New Thread 30521]
    [New Thread 30520]
    [New Thread 30519]
    [New Thread 30504]
    [New Thread 30503]
    [New Thread 30495]
    [New Thread 30485]
    [New Thread 30484]
    [New Thread 30483]
    [New Thread 30481]
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libhasgen11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libhasgen11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libocr11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocr11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrb11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrb11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrutl11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrutl11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxn2.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxn2.so
    Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libdl.so.2
    Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libm.so.6
    Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
    [Thread debugging using libthread_db enabled]
    Loaded symbols for /lib64/libpthread.so.0
    Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libnsl.so.1
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libcell11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libcell11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxp11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxp11.so
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libnnz11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnnz11.so
    Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libc.so.6
    Reading symbols from /usr/lib64/libaio.so.1...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib64/libaio.so.1
    Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib64/ld-linux-x86-64.so.2
    Reading symbols from /u01/app/11.2.0.2/grid/lib/libnque11.so...(no debugging symbols found)...done.
    Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnque11.so
    Reading symbols from /opt/oracle/extapi/64/asm/orcl/1/libasm.so...(no debugging symbols found)...done.
    Loaded symbols for /opt/oracle/extapi/64/asm/orcl/1/libasm.so
    warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff505fd000
    Core was generated by `/u01/app/11.2.0.2/grid/bin/ocssd.bin '.
    Program terminated with signal 6, Aborted.
    #0 0x000000369ea30265 in raise () from /lib64/libc.so.6
    (gdb) where
    #0 0x000000369ea30265 in raise () from /lib64/libc.so.6
    #1 0x000000369ea31d10 in abort () from /lib64/libc.so.6
    #2 0x00002afc67f9aeda in scls_abort (flags=0) at scls.c:7088
    #3 0x000000000040babd in clssscExit (thrd=0x10d325a0, status=clssscreasonSHUTNORM) at clsssc.c:2155
    #4 0x0000000000446221 in clssgmClientShutdown (thrd=0x10d325a0, cmInfo=0x10b40090) at clssgmc.c:6415
    #5 0x0000000000436707 in clssgmProcClientReqs (thrd=0x10d325a0, clctx=0x10b40630) at clssgmc.c:704
    #6 0x0000000000436405 in clssgmclientlsnr (thrd=0x10d325a0) at clssgmc.c:644
    #7 0x000000000040ac2f in clssscthrdmain (thrd=0x10d325a0) at clsssc.c:1716
    #8 0x000000369fa0677d in start_thread () from /lib64/libpthread.so.0
    #9 0x000000369ead49ad in clone () from /lib64/libc.so.6
    (gdb)
    2013-06-07 12:19:37.377: [    CSSD][1085888832]clssscSelect: cookie accept request 0x10b40630
    2013-06-07 12:19:37.377: [    CSSD][1085888832]clssgmAllocProc: (0x2aaab0133ea0) allocated
    2013-06-07 12:19:37.379: [    CSSD][1085888832]clssgmClientConnectMsg: properties of cmProc 0x2aaab0133ea0 - 1,2,3,4,5
    2013-06-07 12:19:37.379: [    CSSD][1085888832]clssgmClientConnectMsg: Connect from con(0x6ae44fa) proc(0x2aaab0133ea0) pid(14139/14139) version 11:2:1:4, properties: 1,2,3,4,5
    2013-06-07 12:19:37.379: [    CSSD][1085888832]clssgmClientConnectMsg: msg flags 0x0000
    2013-06-07 12:19:37.384: [    CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
    2013-06-07 12:19:37.384: [    CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
    2013-06-07 12:19:37.384: [    CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(1/0x2aaab010c5c0)
    2013-06-07 12:19:37.385: [    CSSD][1085888832]clssgmRegisterShared: grp DBODSDB, mbr 0, type 1
    2013-06-07 12:19:37.385: [    CSSD][1085888832]clssgmQueueShare: (0x2aaab0085790) target global grock DBODSDB member 0 type 1 queued from client (0x2aaab010c5c0), global grock DBODSDB, refcount 23
    2013-06-07 12:19:37.385: [    CSSD][1085888832]clssgmRegisterShared: global grock DBODSDB member 0 share type 1, refcount 23
    2013-06-07 12:19:37.391: [    CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
    2013-06-07 12:19:37.391: [    CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
    2013-06-07 12:19:37.391: [    CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(2/0x2aaab0061f10)
    What is the problem?
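The gdb backtrace pasted above is easier to read once the frames are reduced to function, arguments and location. The sketch below is only an illustration (the names `FRAME` and `parse_backtrace` are made up here); it assumes the frame layout printed by `where` in the post and does not interpret Oracle internals.

```python
import re

# Frame lines in the post look like:
#   #3 0x000000000040babd in clssscExit (thrd=..., status=...) at clsssc.c:2155
#   #0 0x000000369ea30265 in raise () from /lib64/libc.so.6
FRAME = re.compile(
    r"#(?P<num>\d+)\s+0x[0-9a-f]+ in (?P<func>\w+) \((?P<args>.*?)\)"
    r"(?: at (?P<src>\S+)| from (?P<lib>\S+))?"
)


def parse_backtrace(text):
    """Return one dict per frame: number, function, arguments, location."""
    frames = []
    for line in text.splitlines():
        match = FRAME.search(line)
        if match:
            frames.append({
                "num": int(match["num"]),
                "func": match["func"],
                "args": match["args"],
                # Source file for frames with line info, shared library otherwise.
                "where": match["src"] or match["lib"],
            })
    return frames


# First five frames copied from the gdb session in the post.
SAMPLE = """\
#0 0x000000369ea30265 in raise () from /lib64/libc.so.6
#1 0x000000369ea31d10 in abort () from /lib64/libc.so.6
#2 0x00002afc67f9aeda in scls_abort (flags=0) at scls.c:7088
#3 0x000000000040babd in clssscExit (thrd=0x10d325a0, status=clssscreasonSHUTNORM) at clsssc.c:2155
#4 0x0000000000446221 in clssgmClientShutdown (thrd=0x10d325a0, cmInfo=0x10b40090) at clssgmc.c:6415
"""

if __name__ == "__main__":
    for frame in parse_backtrace(SAMPLE):
        print(frame["num"], frame["func"], frame["args"], frame["where"])
```

Read this way, the abort is reached through `clssgmClientShutdown` and `clssscExit` with `status=clssscreasonSHUTNORM`. That suggests a requested shutdown of ocssd rather than a crash inside it, but this is an inference from the function names, not something the log states.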

    Is your ASM instance up?
    If not, try bringing up the ASM instance by itself and see whether it throws any errors.
    Post the output of crsctl check cluster -all
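Before answering that, it can help to pull out which background process terminated each instance. The following is a minimal sketch, assuming the "terminating the instance due to error N" wording seen in the pasted trace and alert log; the names `TERMINATION` and `find_terminations` are made up for illustration.

```python
import re

# Matches lines such as:
#   ASMB (ospid: 32641): terminating the instance due to error 15064
TERMINATION = re.compile(
    r"(?P<process>\w+) \(ospid: (?P<ospid>\d+)\): "
    r"terminating the instance due to error (?P<error>\d+)"
)


def find_terminations(log_text):
    """Return (process, ospid, error number) for each termination line."""
    events = []
    for line in log_text.splitlines():
        match = TERMINATION.search(line)
        if match:
            events.append(
                (match["process"], int(match["ospid"]), int(match["error"]))
            )
    return events


# Lines copied from the database ASMB trace and the +ASM1 alert log above.
SAMPLE = """\
ASMB (ospid: 32641): terminating the instance due to error 15064
Received an instance abort message from instance 2
LMD0 (ospid: 31201): terminating the instance due to error 481
Instance terminated by LMD0, pid = 31201
"""

if __name__ == "__main__":
    for process, ospid, error in find_terminations(SAMPLE):
        print(f"{process} (ospid {ospid}) -> ORA-{error:05d}")
```

On the pasted logs this reports ASMB with error 15064 for the database instance and LMD0 with error 481 for +ASM1. The LMD0 line comes right after "Received an instance abort message from instance 2", so the alert log and LMON trace of instance 2 are probably the next place to look.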

  • Issues while Add / Delete RAC Node in Oracle 10g R2

    Hi,
    I have a requirement to add a new node to an existing 2-node RAC in production, where one node is active and the other is passive; because of licensing, we cannot keep both nodes active. Due to performance issues (memory, CPU cores, etc.) we are adding another node.
    We plan to add a third database node, make the new node active and the current active node passive (a swap), and later, after final observation, delete and decommission the current passive node.
    This activity has been tested on the Dev database with the same infrastructure (OS, memory, etc.), but we want to check the best approach and the challenges we may face during RAC node addition and deletion.
    RAC DB Version : 10.2.0.4
    OS Version : RHEL 5.8
    (1) Is the approach right: first add the node, then delete the old one later?
    (2) If the approach is correct, how will the 3rd node behave in terms of active or passive?
    (3) We have taken an RMAN backup, an OS backup, and backups of the CRS home, ORACLE_HOME, ASM home, OCR and voting disk.
    (4) Could you please give detailed steps for adding and deleting a node in 10g R2?
    (5) Are there any known bugs with this DB release or OS when performing this activity?
    Since this is a production machine, we want to be more proactive. Please correct me or add anything I am missing.
    With Thanks,
    Rakesh

    Hello Rakesh,
    Please follow these steps.
    Node Addition Steps
    1. Install and configure the OS and hardware for the new node.
    2. Add Oracle Clusterware to the new node.
    3. Configure ONS for the new node.
    4. Add the ASM home to the new node.
    5. Add the database home to the new node.
    6. Add a listener to the new node.
    7. Add an ASM instance to the new node.
    8. Add a database instance to the new node.
    Details of steps
    1. Run cluvfy to verify whether the new node is ready to be added.
         $ cluvfy stage -pre crsinst -n node2
    2. From node1, execute
               $ /u01/app/crs11g/oui/bin/addNode.sh
    3. Specify the node2 VIP address and follow the instructions.
    4. At the end of the installation it may throw a warning and ask you to click YES. Click YES.
    5. From node1,
               /u01/app/crs11g/bin/racgons add_config node2:6200
    6. From node1, set ORACLE_HOME=ASM_HOME, then execute addNode.sh from $ASM_HOME/oui/bin and follow the instructions.
    7. From node1, set ORACLE_HOME=DB_HOME and then run
         /u01/app/oracle/product/11.1.0/db_1/oui/bin/addNode.sh
         and follow the instructions.
    8. From node2, start NETCA and configure a listener for the new node. While configuring the listener, select the name of the new node.
    9. From node1, start dbca from the ASM home to configure the ASM instance for the new node.
    10. Again from node1, start dbca from the DB home to add the DB instance.
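The order of the addition steps matters (clusterware first, then the ASM home, then the database home), so it may be worth writing the plan down as data before touching production. This is a rough sketch that only prints the commands and never runs them. The function name `node_addition_plan` is made up, the paths are the example paths from this post, and calling `netca` and `dbca` from each home's `bin` directory is an assumption about where those tools are started.

```python
def node_addition_plan(new_node, crs_home, asm_home, db_home, ons_port=6200):
    """Return the node addition steps as (run_on, command) pairs, in order."""
    return [
        ("existing node", f"cluvfy stage -pre crsinst -n {new_node}"),
        ("existing node", f"{crs_home}/oui/bin/addNode.sh"),
        ("existing node", f"{crs_home}/bin/racgons add_config {new_node}:{ons_port}"),
        ("existing node", f"{asm_home}/oui/bin/addNode.sh"),
        ("existing node", f"{db_home}/oui/bin/addNode.sh"),
        ("new node", "netca"),
        ("existing node", f"{asm_home}/bin/dbca"),
        ("existing node", f"{db_home}/bin/dbca"),
    ]


if __name__ == "__main__":
    plan = node_addition_plan(
        new_node="node2",
        crs_home="/u01/app/crs11g",
        asm_home="/u01/app/oracle/product/11.1.0/asm_1",
        db_home="/u01/app/oracle/product/11.1.0/db_1",
    )
    for number, (run_on, command) in enumerate(plan, start=1):
        print(f"{number}. [{run_on}] {command}")
```

The printed list can be reviewed against the Dev run. The interactive parts (OUI screens, the VIP prompt, root scripts requested by the installer) are deliberately left out.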
    Node deletion Steps
    1. Delete the Database instance on the node to be deleted.
    2. Clean up the ASM instance.
    3. Remove the listener from the node to be deleted.
    4. Remove the node from the database.
    5. Remove the node from ASM.
    6. Remove ONS configuration from the node to be deleted.
    7. Remove the node from the clusterware
    Details of Steps
    1. Remove the database instance of node2.
         DBCA -> Instance Management -> Delete an instance -> password for sys -> select node -> Finish.
    2. Stop ASM for node2 from any node.
         $ srvctl stop asm -n node2
    3. Remove asm for node2
         $ srvctl remove asm -n node2
    4. Remove Listener from Node2 using NETCA.
    5. From Node2:
              ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node2" -local
    6. From Node2, start runInstaller from Oracle_DB_Home/oui/bin, and remove "DB_HOME"
         $ ./runInstaller
         On the WELCOME Screen -> Deinstall product -> Select dbhome name (OraDb10g_Home1) -> Remove
    7. From Node1:
              ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node1"
    8. From Node2, set ORACLE_HOME to the asm_1 home and then run:
              ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node2" -local
    9. From Node2, start OUI and deinstall ASM Home.
    10. From Node1, Set ORACLE_HOME= /u01/app/oracle/product/11.1.0/asm_1
    11. From Node1: from /u01/app/oracle/product/11.1.0/asm_1/oui/bin, start OUI
              ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node1"
    12. From Node2: as a root user (#) execute rootdelete.sh from /u01/app/crs11g/install
         # /u01/app/crs11g/install/rootdelete.sh
    13. From Node-1, first find out the node numbers
         # /u01/app/crs11g/bin/olsnodes -n
         output : node1 1
              node2 2
    14. From Node-1 as a root user (#), passing the node name and node number as a comma-separated pair:
         # /u01/app/crs11g/install/rootdeletenode.sh node2,2
         output:
              CRS nodeapps are deleted successfully
              clscfg: EXISTING configuration version 4 detected.
              clscfg: version 4 is 11 Release 1.
              Node deletion operation successful.
              'node2' deleted successfully
    15. From Node2, set ORACLE_HOME=CRS_HOME and then execute
         $ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node2" CRS=TRUE -local
    16. Start ./runInstaller and remove CRS_HOME.
    17. From Node-1:
         $ /u01/app/crs11g/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node1" CRS=TRUE
    18. Check that the node has been deleted using ./crs_stat -t
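Several of the deletion steps need the node number reported by `olsnodes -n` and the list of nodes that stay in the cluster. Here is a small sketch for deriving both from the command output. It assumes the two-column "name number" layout shown in step 13; the helper names `parse_olsnodes` and `remaining_nodes` are made up for illustration.

```python
def parse_olsnodes(output):
    """Map node name -> node number from `olsnodes -n` style output."""
    nodes = {}
    for line in output.splitlines():
        parts = line.split()
        # Keep only well-formed "name number" rows.
        if len(parts) == 2 and parts[1].isdigit():
            nodes[parts[0]] = int(parts[1])
    return nodes


def remaining_nodes(nodes, removed):
    """Names of the nodes left after `removed` is deleted, by node number."""
    if removed not in nodes:
        raise KeyError(f"{removed} is not in the olsnodes output")
    return sorted((name for name in nodes if name != removed), key=nodes.get)


# Output as shown in step 13 of the post.
SAMPLE = """\
node1 1
node2 2
"""

if __name__ == "__main__":
    nodes = parse_olsnodes(SAMPLE)
    print("node number of node2:", nodes["node2"])
    print("nodes that remain:", remaining_nodes(nodes, "node2"))
```

The node number feeds `rootdeletenode.sh`, and the remaining names are what goes into `CLUSTER_NODES` in the `-updateNodeList` calls run from the surviving node.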

  • RAC node Hung

    Hi Friends,
    Server info:
    Windows 2003 server
    Oracle 10.2.0.5, 2 Node RAC
    Node 2 hung because of a blue screen (dump) error, but Oracle logged no errors in the CRS or alert logs. Restarting the server resolved the problem. How can we identify the cause of the hang? We see no errors on the operating system side either. Is there any way to identify the cause of a server hang after the server has been restarted?
    Thanks in advance.

    user12159566 wrote:
    Hi,
    Thanks for your reply.
    The OS side has no logs either, except "Blue Screen Trap (BugCheck, STOP: 0x0000FFFF (0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000))". As far as I know, this is not a node eviction problem. We cannot find any node eviction entries in the Oracle logs.
    See this note:
    RAC on Windows: Oracle Clusterware Node Evictions a.k.a. Why do we get a Blue Screen (BSOD) Caused By Orafencedrv.sys? [ID 337784.1]

  • Rac node crash

    Hi,
    I am using a 2-node RAC environment and am running some availability tests. When I power off one node of the cluster, the other node reboots itself. The voting disk is online. I think this is caused by the network heartbeat on the private network. So I wonder: when one node of the cluster goes down, does the other node have to reboot itself or not?
    Thanks.

    Hi,
    This is 10gR1 RAC. I could not find the alert log, but I have taken the ocssd logs from both nodes. I rebooted node2 and node1 rebooted itself.
    Node2
    CLSS-3001: local node number 2, master node number 1
    2012-05-22 08:45:57.247 [773] >TRACE:   clssgmClientConnectMsg: Connect from con(111cdebb0), proc(111ce38b0)  pid()
    2012-05-22 08:45:57.247 [773] >TRACE:   clssgmClientConnectMsg: Connect from con(111ce1650), proc(111ce3b70)  pid()
    2012-05-22 08:47:20.865 [773] >TRACE:   clsc_receive: (111ce0e10) Connection failed, transport error (507, 0, 0)
    2012-05-22 08:47:20.865 [773] >TRACE:   clscreceive: (111ce1650) Physical connection (111ce0e10) not active, rc 11
    2012-05-22 08:47:20.865 [773] >TRACE:   clscreceive: (111ce3fb0) Physical connection (111ce0e10) not active, rc 11
    2012-05-22 08:47:20.865 [773] >TRACE:   clssgmDeleteClientListener: cleanup for proc(111ce3b70) con(111ce1650) pid()
    2012-05-22 08:47:20.879 [1287] >TRACE:   clscsendx: (111ce3fb0) Connection not active
    2012-05-22 08:47:20.934 [773] >TRACE:   clsc_receive: (111cde370) Connection failed, transport error (507, 0, 0)
    2012-05-22 08:47:20.934 [773] >TRACE:   clscreceive: (111cdebb0) Physical connection (111cde370) not active, rc 11
    2012-05-22 08:47:20.934 [773] >TRACE:   clssgmDeleteClientListener: cleanup for proc(111ce38b0) con(111cdebb0) pid()
    2012-05-22 08:47:51.100 [773] >TRACE:   clssgmClientConnectMsg: Connect from con(111cdebb0), proc(111ce0e10)  pid()
    Node1
    CLSS-3001: local node number 1, master node number 1
    2012-05-22 08:47:51.112 [516] >TRACE:   clsc_receive: (111ce6c50) Connection failed, transport error (507, 0, 0)
    2012-05-22 08:47:51.112 [516] >WARNING: clssnmeventhndlr: Receive failure with node 2, rc=11
    2012-05-22 08:47:51.112 [1287] >TRACE:   clsc_receive: (111bd8ab0) Connection failed, transport error (507, 0, 0)
    2012-05-22 08:47:51.157 [1544] >WARNING: clssnmPollingThread: Eviction started for node 2, flags 0x0001, state 3, wt4c 0
    2012-05-22 08:47:56.157 [1544] >TRACE:   clssnmDoSyncUpdate: Initiating sync 3
    2012-05-22 08:47:56.157 [516] >TRACE:   clssnmHandleSync: Acknowledging sync: src[1] seq[9] sync[3]
    2012-05-22 08:47:56.203 [1] >USER:    NMEVENT_SUSPEND [00][00][00][02]
    2012-05-22 08:48:00.165 [1544] >TRACE:   clssnmEvict: Evicting node 2, birth 2, death 0, killme 1
    2012-05-22 08:48:00.165 [1544] >TRACE:   clssnmWaitOnEvictions: Waiting for node 2 to die, missed HB 0 of 22503
    2012-05-22 08:48:01.165 [1544] >TRACE:   clssnmWaitOnEvictions: Waiting for node 2 to die, missed HB 1 of 22503
    2012-05-22 08:48:02.166 [1544] >TRACE:   clssnmWaitOnEvictions: Waiting for node 2 to die, missed HB 2 of 22503
    While node2 was rebooting, node1 began to reboot. Below is the OS log from node1:
    LABEL:          CORE_DUMP
    IDENTIFIER:     C69F5C9B
    Date/Time:       Tue May 22 08:48:05 2012
    Sequence Number: 26098
    Machine Id:      00C2F7704C00
    Node Id:         kopstest1
    Class:           S
    Type:            PERM
    Resource Name:   SYSPROC
    Description
    SOFTWARE PROGRAM ABNORMALLY TERMINATED
    Probable Causes
    SOFTWARE PROGRAM
    User Causes
    USER GENERATED SIGNAL
            Recommended Actions
            CORRECT THEN RETRY
    Failure Causes
    SOFTWARE PROGRAM
            Recommended Actions
            RERUN THE APPLICATION PROGRAM
            IF PROBLEM PERSISTS THEN DO THE FOLLOWING
            CONTACT APPROPRIATE SERVICE REPRESENTATIVE
    Detail Data
    SIGNAL NUMBER
               6
    USER'S PROCESS ID:
                    909402
    FILE SYSTEM SERIAL NUMBER
              12
    INODE NUMBER
          109027
    CORE FILE NAME
    /oracle/product/10.1.0/crs/css/init/core
    PROGRAM NAME
    ocssd.bin
    STACK EXECUTION DISABLED
               0
    COME FROM ADDRESS REGISTER
    PROCESSOR ID
      hw_fru_id: N/A
      hw_cpu_id: N/A
    ADDITIONAL INFORMATION
    pthread_k 88
    Symptom Data
    REPORTABLE
    1
    INTERNAL ERROR
    0
    SYMPTOM CODE
    PCSS/SPI2 FLDS/ocssd.bin SIG/6 FLDS/pthread_k VALU/88
    Thanks.
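To see how quickly node1 moved from losing contact with node2 to evicting it, the ocssd lines can be reduced to a small timeline. This is a sketch only: it assumes the 10gR1 ocssd line layout shown in the post, and the marker strings and the names `MILESTONES` and `eviction_timeline` are chosen here for illustration.

```python
import re
from datetime import datetime

# Layout assumed from the post: "<date> <time>.<ms> [<thread>] ><LEVEL>: <message>"
LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[\d+\] >(\w+):\s+(.*)$"
)

# Message fragments taken from the node1 log, mapped to short labels.
MILESTONES = {
    "Receive failure with node": "receive_failure",
    "Eviction started for node": "eviction_started",
    "clssnmEvict: Evicting node": "evicting",
}


def eviction_timeline(log_text):
    """Return {label: datetime} for the first occurrence of each milestone."""
    found = {}
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        for marker, label in MILESTONES.items():
            if marker in match.group(3) and label not in found:
                found[label] = stamp
    return found


# Lines copied from the node1 ocssd log in the post.
SAMPLE = """\
2012-05-22 08:47:51.112 [516] >WARNING: clssnmeventhndlr: Receive failure with node 2, rc=11
2012-05-22 08:47:51.157 [1544] >WARNING: clssnmPollingThread: Eviction started for node 2, flags 0x0001, state 3, wt4c 0
2012-05-22 08:48:00.165 [1544] >TRACE:   clssnmEvict: Evicting node 2, birth 2, death 0, killme 1
"""

if __name__ == "__main__":
    timeline = eviction_timeline(SAMPLE)
    start = timeline["receive_failure"]
    for label, stamp in timeline.items():
        print(f"{label}: +{(stamp - start).total_seconds():.3f}s")
```

On these lines the eviction of node2 follows the receive failure by roughly 9 seconds. The AIX error report then shows ocssd.bin on node1 dumping core with signal 6 at 08:48:05, a few seconds later, which is probably what takes node1 down.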
