RAC node failure: CRS cleanup failing

I have a three-node RAC database, 10.2.0.4, running on Windows Server 2008. I lost a hard drive on one of the servers and it corrupted the mirror disk as well, so I am having to rebuild. I am following the procedure in RAC on Windows: How to Cleanup When A Node Has Been Disconnected or The OS Rebuilt (Doc ID 742737.1), and I ran into problems when I tried to delete the listener and then, in CRS, the nodeapps for node3.
For the listener, I go into netca and the option to delete a listener is grayed out. When I run crs_stat I can still see ora.node3.lsnr there. Does this mean that I just need to update tnsnames.ora, or is the information held somewhere else as well? I am reluctant to delete it manually because I am afraid I won't clean it out from everywhere. Any idea why that option would not be available?
My second issue is when I run this:
srvctl stop nodeapps -n node3
The nodeapps stop doesn't return any output, and then when I try to remove nodeapps it gives me PRKO-2112: Some or all node applications are not removed successfully on node.
I have searched Metalink for that error with no success, as the document I found also says that you must stop nodeapps. I have already deleted the node from the database and ASM and updated the appropriate inventory. I just need to finish the listener and CRS cleanup and update the inventory for CRS. Also, I noticed that the VIP for the failed node was reassigned to node2, and cluvfy shows that it has been released. Would CRS give me errors on this if that were not the case?
I appreciate any help or guidance!

I wanted to post a follow-up in case anyone else is interested in the results.
I had tried to add the listener back to the .ora file on one of the remaining nodes and then delete it, but that didn't work. Also, remove nodeapps continued to throw an error that it could not stop the listener or VIP for the failed node.
After a few days of reading I decided to just unregister the abandoned services from CRS. I made sure to back up the OCR with ocrconfig before I ran crs_unregister, and I was able to successfully remove the listener and VIP services for the failed node.
This eliminated my issue with netca; the node did not show up there anymore. I then went on to remove nodeapps, and it failed saying it could not find the VIP resource. I then ran olsnodes -n and used crssetup to remove the node entirely. Everything showed as removed, and I updated the CRS inventory to finish.
All looks good and now I am working to rebuild and add the node back in.
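
Before unregistering anything, it can help to list exactly which CRS resources still reference the failed node. The following is a minimal Python sketch, assuming crs_stat prints NAME=/TYPE=/TARGET=/STATE= attribute blocks separated by blank lines; the sample text and the resource names in it are hypothetical and are not taken from the cluster described above.

```python
def parse_crs_stat(text):
    """Parse crs_stat attribute blocks (KEY=VALUE lines) into a list of dicts."""
    resources, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current resource block.
            if current:
                resources.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key] = value
    if current:
        resources.append(current)
    return resources


def resources_for_node(resources, node):
    """Return names of resources whose name embeds the node, e.g. ora.node3.vip."""
    marker = f".{node}."
    return [r["NAME"] for r in resources if marker in r.get("NAME", "")]


# Hypothetical sample output; the resource names are illustrative only.
SAMPLE = """\
NAME=ora.node2.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on node2

NAME=ora.node3.LISTENER_NODE3.lsnr
TYPE=application
TARGET=ONLINE
STATE=OFFLINE

NAME=ora.node3.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on node2
"""

if __name__ == "__main__":
    for name in resources_for_node(parse_crs_stat(SAMPLE), "node3"):
        print(name)
```

The list this produces is only a checklist of candidates to review; which of them are safe to unregister still has to be decided by hand.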

Similar Messages

  • RAC node outage causes SOA Suite 10.1.3.4 BPEL failure

    We are using WebLogic 9.2 and SOA Suite 10.1.3.4. We use a 10g Oracle RAC (2 nodes); the WebLogic cluster has a multi data source of 2 pools, each pool pointing to a single node in the RAC, each pool deployed to the cluster, and the multi data source in load-balancing mode.
    So the other night, one of the db nodes had a hardware failure ( ironically, with a remote monitoring / management card ). Annoying, but it should not have caused the BPEL servers to be in "FAILED NOT RESTARTABLE" status the next morning.
    <Jun 9, 2009 12:10:07 AM EDT> <Warning> <JDBC> <BEA-001129> <Received exception while creating connection for pool "esbaqds2": Io exception: The Network Adapter could not establish the connection>
    SEVERE: Destroying JMSDequeuer failed
    oracle.jms.AQjmsException: Connection has been administratively destroyed. Reconnect.
    at oracle.jms.AQjmsSession.preClose(AQjmsSession.java:980)
    at oracle.jms.AQjmsObject.close(AQjmsObject.java:409)
    at oracle.jms.AQjmsSession.close(AQjmsSession.java:1020)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.destroy(JMSDequeuer.java:419)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.destroyWithoutUnsubscribing(JMSDequeuer.java:395)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:175)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
    This was followed by a 2 GB log file containing 1.3 million iterations of the following within the next 10 minutes, before the managed servers failed:
    java.lang.NullPointerException
    at java.lang.String.<init>(String.java:144)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:168)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
    Both managed instances of the BPEL cluster failed, even though the 1st node of the Oracle RAC was still available.
    Our 10.3 cluster, also using multi data sources to the same RAC for the OSB components, simply went on about its business using the pool for the remaining RAC node.
    Seems to be a single point of failure...

    We haven't changed the JDBC connection string yet, but we did run a test in the same environment while Oracle support considers the situation.
    For the test, we simply shut down one node of the RAC and watched what happened. Within a minute, the JDBC "Failed Reserve Request Count" was increasing by thousands on every refresh of the screen. We restarted the RAC node after 5 minutes, by which time the "Failed Reserve Request Count" was over 190,000.
    The 2 BPEL managed servers remained in Running status and each created a 660 MB log file within those 5 minutes. In the original outage, the nodes were down for about 15 minutes. Most of the logging is generated from within the oracle.tip.esb classes, not by the WebLogic classes. It looks like once the pool pointing to the downed RAC node becomes disabled, the Oracle BPEL code still tries to use it, even though the multi data source JNDI name is the published lookup:
    INFO: JMSDequeuer::createConnection - AQ Topics
    java.sql.SQLException: weblogic.common.ResourceException:
    esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
    esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
    at weblogic.jdbc.common.internal.JDBCUtil.wrapAndThrowResourceException(JDBCUtil.java:250)
    at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:348)
    at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:364)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:559)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
    Caused by: weblogic.common.ResourceException:
    esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
    esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
    at weblogic.jdbc.common.internal.MultiPool.searchLoadBalance(MultiPool.java:331)
    at weblogic.jdbc.common.internal.MultiPool.findPool(MultiPool.java:202)
    at weblogic.jdbc.common.internal.ConnectionPoolManager.reserve(ConnectionPoolManager.java:77)
    at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:346)
    ... 11 more
    SEVERE: Failed to process deferred message
    oracle.tip.esb.server.dispatch.QueueHandlerException: Error creating "weblogic.common.ResourceException: No good connections available."
    at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:661)
    at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
    at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
    at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
    at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
    at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
    at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
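
With logs of this size, a per-exception tally is a quick way to see which error dominates. Below is a small Python sketch that counts the first line of each Java stack trace; the regular expression is an assumption about what an exception header looks like in these logs, and the sample lines are copied from the traces above.

```python
import re
from collections import Counter

# First line of a Java stack trace, optionally prefixed with "Caused by:",
# e.g. "java.lang.NullPointerException" or
# "Caused by: weblogic.common.ResourceException:".
EXCEPTION_HEAD = re.compile(
    r"^\s*(?:Caused by:\s*)?"
    r"((?:[A-Za-z_][\w$]*\.)+[A-Z][\w$]*(?:Exception|Error))\b"
)


def tally_exceptions(lines):
    """Count exception class names that start a line (stack frames are ignored)."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_HEAD.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "SEVERE: Destroying JMSDequeuer failed",
        "oracle.jms.AQjmsException: Connection has been administratively destroyed. Reconnect.",
        "    at oracle.jms.AQjmsSession.preClose(AQjmsSession.java:980)",
        "java.lang.NullPointerException",
        "    at java.lang.String.<init>(String.java:144)",
        "Caused by: weblogic.common.ResourceException:",
    ]
    for name, count in tally_exceptions(sample).most_common():
        print(count, name)
```

For a multi-gigabyte file, passing the open file object itself (`tally_exceptions(open(path, errors="replace"))`) keeps memory use flat because the lines are read one at a time.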

  • Unable to bring up crs on rac node 2 after server reboot

    Hi,
    We have a 2-node RAC architecture. We are only able to bring up node 1 in the cluster, whereas node 2 is failing. Here is what happened:
    1. After the server reboot, the CRS resources on node 2 weren't starting up, apart from OHAS.
    2. We stopped CRS on both nodes and tried bringing up CRS on node 2 first, which succeeded. But now node 1 wasn't coming up.
    3. We again brought down CRS on both nodes and tried bringing up CRS on node 1, which succeeded, but ASM wasn't showing the disk groups. So we changed the pfile to set asm_diskstring from ORCL* to /dev/oracleasm/disks, and lsdg in ASM now lists them. We then started all the instances from node 1. After this, CRS on node 2 still wasn't starting. In the alert log I saw "CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds;". But we were able to query the voting disks initially. What has gone wrong now?
    ./crsctl status res -t -init
    NAME           TARGET  STATE        SERVER                   STATE_DETAILS
    Cluster Resources
    ora.asm
          1        OFFLINE OFFLINE
    ora.cluster_interconnect.haip
          1        ONLINE  OFFLINE
    ora.crf
          1        ONLINE  ONLINE       kusmnd0r
    ora.crsd
          1        ONLINE  OFFLINE
    ora.cssd
          1        ONLINE  OFFLINE                               STARTING
    ora.cssdmonitor
          1        ONLINE  ONLINE       kusmnd0r
    ora.ctssd
          1        ONLINE  OFFLINE
    ora.diskmon
          1        OFFLINE OFFLINE
    ora.evmd
          1        ONLINE  OFFLINE
    ora.gipcd
          1        ONLINE  ONLINE       kusmnd0r
    ora.gpnpd
          1        ONLINE  ONLINE       kusmnd0r
    ora.mdnsd
          1        ONLINE  ONLINE       kusmnd0r
    This is the history of activities. Could someone please shed some light on this?
    Thanks,
    Anirban.
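
The status table above is easier to read once it is reduced to the resources that are supposed to be running but are not. Here is a minimal Python sketch, assuming the two-line layout shown in the post (a resource name line followed by an instance line with TARGET and STATE columns); the sample is an excerpt of that output.

```python
def parse_crsctl_table(text):
    """Parse `crsctl status res -t -init` output into (name, target, state) tuples."""
    rows, name = [], None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith("ora."):
            # Resource name line, e.g. "ora.cssd".
            name = tokens[0]
        elif name and tokens[0].isdigit() and len(tokens) >= 3:
            # Instance line: "<n> TARGET STATE [SERVER] [STATE_DETAILS]".
            rows.append((name, tokens[1], tokens[2]))
    return rows


def not_started(rows):
    """Resources whose target is ONLINE but whose current state is not."""
    return [name for name, target, state in rows
            if target == "ONLINE" and state != "ONLINE"]


SAMPLE = """\
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
Cluster Resources
ora.asm
      1        OFFLINE OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       kusmnd0r
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING
"""

if __name__ == "__main__":
    for name in not_started(parse_crsctl_table(SAMPLE)):
        print(name)
```

In the excerpt, ora.cssd is among the resources reported, which fits the CRS-1714 message: CSS cannot finish starting until it finds the voting files, and the resources that depend on it stay offline behind it.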

    It is on a raw device.
    Healthy node:
    ls -ltrh /dev/vote*
    brw-rw---- 1 crsdwqa dbadwqa 120, 1057 Nov  6 11:32 /dev/vote3
    brw-rw---- 1 crsdwqa dbadwqa 120, 1025 Nov  6 11:32 /dev/vote1
    brw-rw---- 1 crsdwqa dbadwqa 120, 1041 Nov  6 11:32 /dev/vote2
    Affected node:
    ls -ltrh /dev/vote*
    brw-rw-r-- 1 crsdwqa dbadwqa 120, 1025 Nov  4 12:06 /dev/vote1
    brw-rw-r-- 1 crsdwqa dbadwqa 120, 1041 Nov  4 12:06 /dev/vote2
    brw-rw-r-- 1 crsdwqa dbadwqa 120, 1057 Nov  5 04:42 /dev/vote3
    Regards,
    Anirban.
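
The two listings above can be compared mechanically. This Python sketch, using the `ls -l` lines exactly as posted, reports every device whose mode, owner, or group differs between the nodes. Whether the difference it finds is the cause of CRS-1714 is not established in this thread; the sketch only makes the difference visible.

```python
def device_modes(ls_output):
    """Map device path -> (mode, owner, group) from `ls -l` lines for block/char devices."""
    modes = {}
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0][0] in "bc":
            modes[fields[-1]] = (fields[0], fields[2], fields[3])
    return modes


def diff_modes(healthy, affected):
    """Return {device: (healthy_attrs, affected_attrs)} for devices that differ."""
    h, a = device_modes(healthy), device_modes(affected)
    return {dev: (h[dev], a[dev])
            for dev in sorted(h.keys() & a.keys())
            if h[dev] != a[dev]}


HEALTHY = """\
brw-rw---- 1 crsdwqa dbadwqa 120, 1057 Nov  6 11:32 /dev/vote3
brw-rw---- 1 crsdwqa dbadwqa 120, 1025 Nov  6 11:32 /dev/vote1
brw-rw---- 1 crsdwqa dbadwqa 120, 1041 Nov  6 11:32 /dev/vote2
"""

AFFECTED = """\
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1025 Nov  4 12:06 /dev/vote1
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1041 Nov  4 12:06 /dev/vote2
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1057 Nov  5 04:42 /dev/vote3
"""

if __name__ == "__main__":
    for dev, (good, bad) in diff_modes(HEALTHY, AFFECTED).items():
        print(dev, "healthy:", good[0], "affected:", bad[0])
```

On the posted data, all three devices differ only in the mode string (brw-rw---- on the healthy node, brw-rw-r-- on the affected one); owner and group match.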

  • Exception while failing over to 2nd RAC Node

    We are using WebLogic 10.3.4. Our setup is a web application (a Tapestry front-end web UI) with an EJB 2.1 back end talking to the Oracle database. The EJBs are CMP. Our product was always stand-alone, and it wasn't until this release that we needed to make it work with RAC. To get this to work we followed the model of having a multi data source with data sources pointing to our RAC nodes. We use two types of data sources, persistent and non-persistent, and we are using the Oracle thin driver (non-XA) for RAC Service Instances, supporting global transactions.
    When we fail over to the 2nd node we get a nasty exception in our GUI, but after logging out and logging back in we are fine.
    My question: I assumed I shouldn't have to restart our web application and that it should have stayed up. Or is there something wrong with our setup?
    Thanks,
    Ian

    Showing us the exception and/or the error messages at the server might help...
    Note that failing over does not save any ongoing connection or transaction that
    had been to the dead RAC node... Does your web-app get-use-close JDBC
    connections on a per-user-invoke basis, or does it hold onto connections?
    Joe
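
The "get-use-close" pattern Joe asks about means that each request reserves a connection, uses it, and releases it before returning, so no request is left holding a connection to a node that has since died. The application in question is Java on WebLogic; the following is only a language-neutral illustration in Python, with an in-memory sqlite3 database standing in for the pooled data source (`make_connection` and `handle_request` are hypothetical names).

```python
import sqlite3
from contextlib import closing


def make_connection():
    # Stand-in for reserving a connection from a pooled data source.
    return sqlite3.connect(":memory:")


def handle_request(connect=make_connection):
    """Get a connection, use it, and close it within a single request."""
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute("SELECT 1")
            return cur.fetchone()[0]


if __name__ == "__main__":
    print(handle_request())
```

A component that instead caches the connection in a field across requests keeps pointing at the failed node after failover, which would match the symptom of errors that clear only after logging out and back in.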

  • Rac node failed how do you bring it back up?

    Example: If there are 3 RAC nodes and one becomes unavailable/fails, how do you bring it back up?

    There are typically two basic reasons why a RAC node will go down.
    A cluster issue, causing the node to be evicted. This usually means the node lost access to the cluster storage (e.g. no access to voting disks), or lost access to the Interconnect.
    An o/s issue causing the node to fail. E.g. a kernel panic due to a page swap and memory not syncing, or a soft CPU lockup, etc.
    You need to determine why it went down, and what is needed to enable it to successfully join the cluster again.

  • Dbconsole failed to start on one RAC node

    Hi
    I have 2 RAC nodes (RHEL 4) running 10.2.0.1. dbconsole is running on one, and on the other I get the following. Earlier, dbconsole used to run perfectly fine on both nodes. I would appreciate any suggestions to rectify this problem.
    Regards
    oracle@rac01<18>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> emctl start dbconsole
    TZ set to Canada/Newfoundland
    Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
    Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.
    http://rac01:1158/em/console/aboutApplication
    Agent Version : 10.1.0.4.1
    OMS Version : Unknown
    Protocol Version : 10.1.0.2.0
    Agent Home : /u01/app/oracle/product/10.2/db_1/rac01_RACDB1
    Agent binaries : /u01/app/oracle/product/10.2/db_1
    Agent Process ID : 23329
    Parent Process ID : 21132
    Agent URL : http://rac01:3938/emd/main
    Started at : 2007-07-25 11:37:32
    Started by user : oracle
    Last Reload : 2007-07-25 11:37:32
    Last successful upload : (none)
    Last attempted upload : (none)
    Total Megabytes of XML files uploaded so far : 0.00
    Number of XML files pending upload : 371
    Size of XML files pending upload(MB) : 7.66
    Available disk space on upload filesystem : 44.78%
    Agent is already started. Will restart the agent
    Stopping agent ... stopped.
    Starting Oracle Enterprise Manager 10g Database Control ............................................................................................. failed.
    Logs are generated in directory /u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log
    oracle@rac01<19>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log>
    ON OTHER NODE:
    oracle@rac02<2>:/u01/app/oracle> emctl start dbconsole
    TZ set to Canada/Newfoundland
    Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
    Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.
    http://rac01:1158/em/console/aboutApplication
    Starting Oracle Enterprise Manager 10g Database Control .................................... started.
    Logs are generated in directory /u01/app/oracle/product/10.2/db_1/rac02_RACDB2/sysman/log
    oracle@rac02<3>:/u01/app/oracle>

    Thanks for your time and reply.
    Well, here is what I got; I couldn't make anything out of it.
    Regards
    oracle@rac01<19>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> ls -lart
    total 13500
    drwxr----- 7 oracle dba 4096 Jul 14 10:48 ..
    -rw-r----- 1 oracle dba 0 Jul 14 10:48 emdctl.log
    drwxrwx--- 2 oracle dba 4096 Jul 14 10:54 nmcRACDB11521
    -rw-r----- 1 oracle dba 4655792 Jul 24 23:01 emoms.trc
    -rw-r----- 1 oracle dba 4655792 Jul 24 23:01 emoms.log
    drwxr----- 3 oracle dba 4096 Jul 25 11:35 .
    -rw-r----- 1 oracle dba 4096 Jul 25 12:05 emdb.nohup.lr
    -rw-r----- 1 oracle dba 1074 Jul 25 12:05 emagent_perl.trc
    -rw-r----- 1 oracle dba 1731 Jul 25 12:06 emagent.log
    -rw-r----- 1 oracle dba 1080 Jul 25 12:07 emagentfetchlet.trc
    -rw-r----- 1 oracle dba 1080 Jul 25 12:07 emagentfetchlet.log
    -rw-r----- 1 oracle dba 81089 Jul 25 13:28 emdctl.trc
    -rw-r----- 1 oracle dba 3309143 Jul 25 13:28 emdb.nohup
    -rw-r----- 1 oracle dba 1044518 Jul 25 13:28 emagent.trc
    oracle@rac01<20>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> cat emagent.log
    2007-07-14 10:50:44 Thread-3086936288 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
    2007-07-14 10:51:16 Thread-3086936288 EMAgent started successfully (00702)
    2007-07-14 14:38:21 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
    2007-07-14 14:39:00 Thread-3086935744 EMAgent started successfully (00702)
    2007-07-24 07:05:06 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
    2007-07-24 07:07:11 Thread-3086935744 target {+ASM1_rac01, osm_instance} is broken: cannot compute dynamic properties in time. (00155)
    2007-07-24 07:07:14 Thread-3086935744 EMAgent started successfully (00702)
    2007-07-24 12:06:27 Thread-3086935744 EMAgent normal shutdown (00703)
    2007-07-24 12:08:26 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
    2007-07-24 12:08:51 Thread-3086935744 EMAgent started successfully (00702)
    2007-07-25 11:35:35 Thread-3086935744 EMAgent normal shutdown (00703)
    2007-07-25 11:37:32 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
    2007-07-25 11:39:29 Thread-3086935744 target {+ASM1_rac01, osm_instance} is broken: cannot compute dynamic properties in time. (00155)
    2007-07-25 11:39:30 Thread-3086935744 EMAgent started successfully (00702)
    2007-07-25 12:03:36 Thread-3086935744 EMAgent normal shutdown (00703)
    2007-07-25 12:05:15 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
    2007-07-25 12:06:23 Thread-3086935744 target {+ASM1_rac01, osm_instance} is broken: cannot compute dynamic properties in time. (00155)
    2007-07-25 12:06:24 Thread-3086935744 EMAgent started successfully (00702)
    oracle@rac01<21>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> cat emagentfetchlet.log
    2007-07-14 11:01:44,208 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
    2007-07-14 14:40:29,096 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
    2007-07-24 07:10:44,123 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
    2007-07-24 12:12:48,187 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
    2007-07-25 11:41:25,628 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
    2007-07-25 12:07:30,335 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
    oracle@rac01<22>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log>
    oracle@rac01<22>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> tail -40 emagentfetchlet.trc
    2007-07-14 11:01:44,208 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
    2007-07-14 14:40:29,096 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
    2007-07-24 07:10:44,123 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
    2007-07-24 12:12:48,187 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
    2007-07-25 11:41:25,628 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
    2007-07-25 12:07:30,335 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
    oracle@rac01<25>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> tail -10 emdctl.trc
    2007-07-25 13:01:02 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:04:41 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:07:12 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:10:50 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:14:32 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:18:09 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:20:40 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:24:27 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:28:06 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:31:43 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    oracle@rac01<28>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> tail -10 emagent.trc
    2007-07-25 13:31:44 Thread-43162528 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:31:44 Thread-43162528 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
    2007-07-25 13:32:14 Thread-74791840 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:32:14 Thread-74791840 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
    2007-07-25 13:32:14 Thread-74791840 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:32:14 Thread-74791840 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
    2007-07-25 13:32:44 Thread-74791840 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:32:44 Thread-74791840 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
    2007-07-25 13:32:44 Thread-74791840 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
    2007-07-25 13:32:44 Thread-74791840 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
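
The repeated "Connection refused (error = 111)" lines mean nothing was accepting connections on rac01:1158 when the agent tried. A quick way to confirm that from any host is a plain TCP connect test; this is a minimal Python sketch (the function name is hypothetical), using the host and port from the trace above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("rac01:1158 reachable:", port_is_open("rac01", 1158))
```

A False result here only confirms that the console port is not listening; the reason the console process fails to start would still have to come from emdb.nohup and emoms.log.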

  • RAC - Oracle Grid Infrastructure configure failed

    Hi, I am trying to install a 2-node RAC on Oracle VMs. Before the installation, the -preinst check reported a few issues, which were resolved (e.g. user equivalence). During the Grid installation it then failed at the step "Configure Oracle Grid Infrastructure for a cluster". After that failure the subsequent steps failed too; I asked OUI to ignore them and then ran both post-installation scripts. I then ran the post crsinst check, which failed. Pasted below are the output of the root.sh script, the post crsinst check, and other checks.
    [root@bsfrac01 grid]# sh root.sh
    Running Oracle 11g root.sh script...
    The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME= /u01/app/11.2/grid
    Enter the full pathname of the local bin directory: [usr/local/bin]:
    Copying dbhome to /usr/local/bin ...
    Copying oraenv to /usr/local/bin ...
    Copying coraenv to /usr/local/bin ...
    Creating /etc/oratab file...
    Entries will be added to the /etc/oratab file as needed by
    Database Configuration Assistant when a database is created
    Finished running generic part of root.sh script.
    Now product-specific root actions will be performed.
    2011-02-13 00:11:55: Parsing the host name
    2011-02-13 00:11:55: Checking for super user privileges
    2011-02-13 00:11:55: User has super user privileges
    Using configuration parameter file: /u01/app/11.2/grid/crs/install/crsconfig_params
    Creating trace directory
    LOCAL ADD MODE
    Creating OCR keys for user 'root', privgrp 'root'..
    Operation successful.
    root wallet
    root wallet cert
    root cert export
    peer wallet
    profile reader wallet
    pa wallet
    peer wallet keys
    pa wallet keys
    peer cert request
    pa cert request
    peer cert
    pa cert
    peer root cert TP
    profile reader root cert TP
    pa root cert TP
    peer pa cert TP
    pa peer cert TP
    profile reader pa cert TP
    profile reader peer cert TP
    peer user cert
    pa user cert
    Adding daemon to inittab
    CRS-4123: Oracle High Availability Services has been started.
    ohasd is starting
    CRS-2672: Attempting to start 'ora.gipcd' on 'bsfrac01'
    CRS-2672: Attempting to start 'ora.mdnsd' on 'bsfrac01'
    CRS-2676: Start of 'ora.mdnsd' on 'bsfrac01' succeeded
    CRS-2676: Start of 'ora.gipcd' on 'bsfrac01' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 'bsfrac01'
    CRS-2676: Start of 'ora.gpnpd' on 'bsfrac01' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'bsfrac01'
    CRS-2676: Start of 'ora.cssdmonitor' on 'bsfrac01' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'bsfrac01'
    CRS-2672: Attempting to start 'ora.diskmon' on 'bsfrac01'
    CRS-2676: Start of 'ora.diskmon' on 'bsfrac01' succeeded
    CRS-2676: Start of 'ora.cssd' on 'bsfrac01' succeeded
    CRS-2672: Attempting to start 'ora.ctssd' on 'bsfrac01'
    CRS-2676: Start of 'ora.ctssd' on 'bsfrac01' succeeded
    ASM created and started successfully.
    DiskGroup DATA1 created successfully.
    clscfg: -install mode specified
    Successfully accumulated necessary OCR keys.
    Creating OCR keys for user 'root', privgrp 'root'..
    Operation successful.
    CRS-2672: Attempting to start 'ora.crsd' on 'bsfrac01'
    CRS-2676: Start of 'ora.crsd' on 'bsfrac01' succeeded
    CRS-4256: Updating the profile
    Successful addition of voting disk 0ea2052d8a714fd7bf46d9d5c785483e.
    Successfully replaced voting disk group with +DATA1.
    CRS-4256: Updating the profile
    CRS-4266: Voting file(s) successfully replaced
    ## STATE File Universal Id File Name Disk group
    1. ONLINE 0ea2052d8a714fd7bf46d9d5c785483e (ORCL:DISK1) [DATA1]
    Located 1 voting disk(s).
    *Failed to rmtcopy "/tmp/filekRIMbG" to "/u01/app/11.2/grid/gpnp/manifest.txt" for nodes {bsfrac01,bsfrac02}, rc=256*
    *Failed to rmtcopy "/u01/app/11.2/grid/gpnp/bsfrac01/profiles/peer/profile.xml" to "/u01/app/11.2/grid/gpnp/profiles/peer/profile.xml" for nodes {bsfrac01,bsfrac02}, rc=256*
    rmtcopy aborted
    Failed to promote local gpnp setup to other cluster nodes
    CRS-2673: Attempting to stop 'ora.crsd' on 'bsfrac01'
    CRS-2677: Stop of 'ora.crsd' on 'bsfrac01' succeeded
    CRS-2673: Attempting to stop 'ora.asm' on 'bsfrac01'
    CRS-2677: Stop of 'ora.asm' on 'bsfrac01' succeeded
    CRS-2673: Attempting to stop 'ora.ctssd' on 'bsfrac01'
    CRS-2677: Stop of 'ora.ctssd' on 'bsfrac01' succeeded
    CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'bsfrac01'
    CRS-2677: Stop of 'ora.cssdmonitor' on 'bsfrac01' succeeded
    CRS-2673: Attempting to stop 'ora.cssd' on 'bsfrac01'
    CRS-2677: Stop of 'ora.cssd' on 'bsfrac01' succeeded
    CRS-2673: Attempting to stop 'ora.gpnpd' on 'bsfrac01'
    CRS-2677: Stop of 'ora.gpnpd' on 'bsfrac01' succeeded
    CRS-2673: Attempting to stop 'ora.gipcd' on 'bsfrac01'
    CRS-2677: Stop of 'ora.gipcd' on 'bsfrac01' succeeded
    CRS-2673: Attempting to stop 'ora.mdnsd' on 'bsfrac01'
    CRS-2677: Stop of 'ora.mdnsd' on 'bsfrac01' succeeded
    Initial cluster configuration failed. See /u01/app/11.2/grid/cfgtoollogs/crsconfig/rootcrs_bsfrac01.log for details
    [root@bsfrac01 grid]#
    [oracle@bsfrac01 bin]$ ./cluvfy stage -post crsinst -n bsfrac01,bsfrac02 -verbose
    Performing post-checks for cluster services setup
    Checking node reachability...
    Check: Node reachability from node "bsfrac01"
    Destination Node Reachable?
    bsfrac01 yes
    bsfrac02 yes
    Result: Node reachability check passed from node "bsfrac01"
    Checking user equivalence...
    Check: User equivalence for user "oracle"
    Node Name Comment
    bsfrac01 passed
    bsfrac02 passed
    Result: User equivalence check passed for user "oracle"
    ERROR:
    PRKC-1094 : Failed to retrieve the active version of crs: {0}
    Checking time zone consistency...
    Time zone consistency check passed.
    ERROR:
    PRKC-1093 : Failed to retrieve the version of crs software on node "java.io.IOException: /u01/app/11.2.0/grid/bin/crsctl: not found
    " : {1}
    ERROR:
    Cluster manager integrity check failed
    PRVF-5434 : Cannot identify the current CRS software version
    UDev attributes check for OCR locations started...
    Result: UDev attributes check passed for OCR locations
    UDev attributes check for Voting Disk locations started...
    ERROR:
    PRVF-5197 : Failed to retrieve voting disk locationsPRKC-1092 : Failed to retrieve the location of votedisks: java.io.IOException: /u01/app/11.2.0/grid/bin/crsctl: not found
    Result: UDev attributes check failed for Voting Disk locations
    Check default user file creation mask
    Node Name Available Required Comment
    bsfrac01 0022 0022 passed
    bsfrac02 0022 0022 passed
    Result: Default user file creation mask check passed
    Checking cluster integrity...
    Node Name
    bsfrac01
    Cluster integrity check failed This check did not run on the following node(s):
    bsfrac02
    Checking OCR integrity...
    Checking the absence of a non-clustered configuration...
    All nodes free of non-clustered, local-only configurations
    ERROR:
    PRKC-1094 : Failed to retrieve the active version of crs: {0}
    ERROR:
    PRVF-5300 : Failed to retrieve active version for CRS on this node
    OCR integrity check failed
    Checking CRS integrity...
    ERROR:
    PRKC-1094 : Failed to retrieve the active version of crs: {0}
    ERROR:
    PRVF-5300 : Failed to retrieve active version for CRS on this node
    CRS integrity check failed
    OCR detected on ASM. Running ACFS Integrity checks...
    Starting check to see if ASM is running on all cluster nodes...
    PRVF-5137 : Failure while checking ASM status on node "bsfrac01"
    PRVF-5137 : Failure while checking ASM status on node "bsfrac02"
    Starting Disk Groups check to see if at least one Disk Group configured...
    PRVF-5112 : An Exception occurred while checking for Disk Groups
    PRVF-5114 : Disk Group check failed. No Disk Groups configured
    Task ACFS Integrity check failed
    Checking Oracle Cluster Voting Disk configuration...
    ERROR:
    PRKC-1093 : Failed to retrieve the version of crs software on node "java.io.IOException: /u01/app/11.2.0/grid/bin/crsctl: not found
    " : {1}
    ERROR:
    PRVF-5434 : Cannot identify the current CRS software version
    PRVF-5431 : Oracle Cluster Voting Disk configuration check failed
    Checking to make sure user "oracle" is not in "root" group
    Node Name Status Comment
    bsfrac01 does not exist passed
    bsfrac02 does not exist passed
    Result: User "oracle" is not part of "root" group. Check passed
    Post-check for cluster services setup was unsuccessful on all the nodes.
    [oracle@bsfrac01 bin]$ /u01/app/11.2/grid/bin/ocrcheck
    Status of Oracle Cluster Registry is as follows :
    Version : 3
    Total space (kbytes) : 262120
    Used space (kbytes) : 408
    Available space (kbytes) : 261712
    ID : 1671840043
    Device/File Name : +DATA1
    Device/File integrity check succeeded
    Device/File not configured
    Device/File not configured
    Device/File not configured
    Device/File not configured
    Cluster registry integrity check succeeded
    Logical corruption check bypassed due to non-privileged user
    ASM appears to be up and running:
    [oracle@bsfrac01 bin]$ /usr/sbin/oracleasm listdisks
    DISK1
    DISK2
    DISK3
    DISK4
    DISK5
    DISK6
    [oracle@bsfrac01 bin]$ /usr/sbin/oracleasm status
    Checking if ASM is loaded: yes
    Checking if /dev/oracleasm is mounted: yes
    Please help.
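
One detail in the output above may be worth checking: cluvfy reports /u01/app/11.2.0/grid/bin/crsctl as "not found", while ocrcheck was run from /u01/app/11.2/grid/bin. Below is a minimal Python sketch, assuming the cluvfy output has been saved as a list of lines, that pulls out the missing paths and the Grid home they imply. The function names are hypothetical helpers for illustration and are not part of cluvfy.

```python
import posixpath
import re

# Matches the "<path>: not found" fragment that appears inside the PRKC errors.
NOT_FOUND = re.compile(r"java\.io\.IOException: (\S+): not found")


def missing_binaries(lines):
    """Return the sorted, de-duplicated paths reported as 'not found'."""
    found = set()
    for line in lines:
        for match in NOT_FOUND.finditer(line):
            found.add(match.group(1))
    return sorted(found)


def implied_grid_home(binary_path):
    """Strip the trailing /bin/<tool> to get the home cluvfy was looking in."""
    return posixpath.dirname(posixpath.dirname(binary_path))


# Two error lines copied from the cluvfy output in this thread.
sample = [
    'PRKC-1093 : Failed to retrieve the version of crs software on node '
    '"java.io.IOException: /u01/app/11.2.0/grid/bin/crsctl: not found',
    'PRKC-1092 : Failed to retrieve the location of votedisks: '
    'java.io.IOException: /u01/app/11.2.0/grid/bin/crsctl: not found',
]

for path in missing_binaries(sample):
    home = implied_grid_home(path)
    # The ocrcheck command in the post was run from /u01/app/11.2/grid.
    print(path, "implies home", home, "same as ocrcheck home:", home == "/u01/app/11.2/grid")
```

If the two homes really differ on disk, that alone could explain every "crsctl: not found" error, but the post does not confirm which directory actually exists.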

    Before installation, did you configure the private interconnect on both nodes to use the same network adapter?
    For example, if the private interconnect is on eth0 on node 1, then node 2 should use eth0 as well.
    For the private interconnect, use the host-only option on both nodes in the network configuration page of VMware or VirtualBox.
    The public network can be bridged.
    Moreover, if you are installing on a laptop, it is better to configure SSH using the OUI rather than doing it manually, as it saves time.
    The private and public networks should not use the same range of IP addresses. For example, if the public addresses look like 192.168.2.222/255.255.255.0, the private addresses have to be different, like 10.10.1.2/255.0.0.0 (this is just an example).
    You also have to configure NTP.
    Anyway, try installing Oracle RAC on VirtualBox by following the steps on the website below; they are pretty straightforward.
    http://www.oracle-base.com/articles/11g/OracleDB11gR2RACInstallationOnOEL5UsingVirtualBox.php
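
The point about keeping the public and private networks in different IP ranges can be checked quickly. This is a small Python sketch using the standard ipaddress module; the helper name networks_overlap is made up for illustration, and the addresses are the example values from the reply above.

```python
import ipaddress


def networks_overlap(addr_a, mask_a, addr_b, mask_b):
    """True if the two address/netmask pairs fall in overlapping IP ranges."""
    # strict=False lets us pass a host address instead of the network address.
    net_a = ipaddress.ip_network(f"{addr_a}/{mask_a}", strict=False)
    net_b = ipaddress.ip_network(f"{addr_b}/{mask_b}", strict=False)
    return net_a.overlaps(net_b)


# Example values quoted in the reply (public first, then private).
public = ("192.168.2.222", "255.255.255.0")
private = ("10.10.1.2", "255.0.0.0")
print("public/private overlap:", networks_overlap(*public, *private))
```

Run it with the real public and private addresses of each node; an overlap would be a reason to re-plan the interconnect addressing before installing.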

  • RAC node restarting!

    Hi,
    One of our RAC environments keeps restarting.
    I've disabled init.cssd, init.crs, and init.evmd in /etc/inittab in order to check the logs.
    This is the situation:
    crsd.log:
    2009-02-04 00:09:00.118: [ COMMCRS][9]clsc_connect: (8000000100318640) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
    2009-02-04 00:09:00.132: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
    2009-02-04 00:09:00.134: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
    2009-02-04 00:09:08.016: [    CRSD][1]32Daemon Version: 10.2.0.2.0 Active Version: 10.2.0.2.0
    2009-02-04 00:09:08.016: [    CRSD][1]32Active Version and Software Version are same
    2009-02-04 00:09:08.017: [ CRSMAIN][1]32Initializing OCR
    2009-02-04 00:09:08.037: [  OCRRAW][1]proprioo: for disk 0 (/dev/rdsk/ora_ocr_raw), id match (1), my id set (752560621,1028247821) total id sets (1), 1st set
    (752560621,1028247821), 2nd set (0,0) my votes (2), total votes (2)
    2009-02-04 00:09:08.140: [ CSSCLNT][24]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
    ocssd.log:
    [    CSSD]2009-02-03 21:52:08.651 [9] >USER: clssnmHandleUpdate: NODE 1 (node1l) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2009-02-03 21:52:08.651 [9] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
    [    CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmWaitForAcks: done, msg type(15)
    [    CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmDoSyncUpdate: Sync Complete!
    [    CSSD]2009-02-03 21:52:08.722 [1] >USER: NMEVENT_SUSPEND [00][00][00][00]
    [    CSSD]2009-02-03 21:52:08.724 [17] >TRACE: clssgmReconfigThread: started for reconfig (1)
    [    CSSD]2009-02-03 21:52:08.749 [17] >USER: NMEVENT_RECONFIG [00][00][00][02]
    [    CSSD]2009-02-03 21:52:08.749 [17] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 1
    [    CSSD]2009-02-03 21:52:08.751 [13] >TRACE: clssgmPeerListener: connects done (1/1)
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmEstablishMasterNode: MASTER for 1 is node(1) birth(1)
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
    [    CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes
    [    CSSD]CLSS-3001: local node number 1, master node number 1
    [    CSSD]2009-02-03 21:52:08.753 [17] >TRACE: clssgmReconfigThread: completed for reconfig(1), with status(1)
    [    CSSD]2009-02-03 21:52:08.863 [10] >TRACE: clssgmClientConnectMsg: Connect from con(80000001008fd2a0) proc(8000000100ae26a8) pid() proto(10:2:1:1)
    [    CSSD]2009-02-03 21:52:08.864 [10] >TRACE: clssgmClientConnectMsg: Connect from con(8000000100ae0128) proc(8000000100ae2a10) pid() proto(10:2:1:1) from con(8000000100aa32c0) proc(8000000100aa5b90) pid() proto(10:2:1:1)
    alertlog:
    [cssd(2535)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 .
    2009-02-03 23:55:20.821
    [cssd(2575)]CRS-1605:CSSD voting file is online: /dev/rdsk/ora_voting_raw. Details in /work/crs/product/10.2/crs/log/lourmel/cssd/ocssd.log.
    2009-02-03 23:55:28.376
    evmd.log:
    Oracle Database 10g CRS Release 10.2.0.2.0 Production Copyright 1996, 2004, Oracle. All rights reserved
    2009-02-04 00:08:58.331: [    EVMD][1]32EVMD waiting for CSS to be ready err = 3
    2009-02-04 00:08:59.939: [ COMMCRS][9]clsc_connect: (800000010007d658) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
    2009-02-04 00:08:59.946: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
    2009-02-04 00:08:59.948: [    EVMD][1]32EVMD waiting for CSS to be ready err = 3
    2009-02-04 00:09:07.596: [ CSSCLNT][1]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
    syslog:
    Feb 4 00:08:41 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
    Feb 4 00:08:45 lourmel sfd[2153]: starting the daemon.
    Feb 4 00:08:45 lourmel su: + tty?? root-orac
    Feb 4 00:08:45 lourmel krsd[2152]: Delay time is 300 seconds
    Feb 4 00:08:43 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
    Feb 4 00:08:52 lourmel above message repeats 2 times
    Feb 4 00:08:52 lourmel syslog: Cluster Ready Services completed waiting on dependencies.
    Feb 4 00:08:53 lourmel syslog: Running CRSD with TZ =
    When I ran crs_stat (before the restart), I got the message:
    ORA-0184: Cannot communicate with CRS
    crsctl check crs gives us:
    Failure 1 contacting CSS daemon
    Cannot communicate with CRS
    Cannot communicate with EVM
    As I said before, the machine keeps restarting.
    Does anyone have an idea? Please help.

    Dear All,
    I recently upgraded a few RAC setups to Oracle 10g Patchset 3 (10.2.0.4) on Linux servers.
    In one of the RAC setups, I found the servers rebooting daily. The same setup had been working fine, and the problem started only after applying the patchset. I checked all the logs and found nothing relevant.
    Then I checked what was added with this patchset.
    The most interesting finding: Oracle added a new daemon, oprocd.
    # ps -efl | grep oprocd
    4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -hsi 5:10:50:75:90 -f
    These are the interesting points about the line above:
    1. The process runs as the root user.
    2. It runs with the highest priority (-40).
    3. It probes every second (-t 1000).
    4. It waits 500 milliseconds for a response from the CPU (-m 500 means the margin time is 500 milliseconds).
    5. It runs in fatal mode (-f).
    From these points I conclude the following: this daemon probes the CPU every second and waits for a response within 500 milliseconds. If it gets no response within 500 milliseconds, it assumes the CPU is hung and reboots the machine. The operating system does not get enough time to write the system logs before the server reboots.
    So the solution is to increase the margin time from 500 milliseconds to 10 seconds.
    These are the steps to increase the margin time.
    Please remember: the modification needs downtime, and you need to stop the cluster service on all member nodes.
    1. Stop the CRS processes
    #crsctl stop crs
    #<CRS_HOME>/bin/oprocd stop
    2. Ensure that Clusterware stack is down and not running
    #ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
    This should return no processes.
    3. From one node of the cluster, change the value of the "diagwait" parameter to 13 by issuing the command as root:
    #crsctl set css diagwait 13 -force
    4. Check if diagwait is successfully set.
    #crsctl get css diagwait
    5. Restart the Oracle Clusterware on all the nodes by executing:
    #crsctl start crs
    (Note: if you face any problem restarting the CRS services, ASM, and the database, you can reboot the nodes. The cluster and database will come up automatically thanks to the init startup scripts.)
    6. The oprocd daemon process will now show -m 10000
    # ps -efl| grep oprocd
    4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f
    Rollback procedure:
    If you need to unset the diagwait value for any reason:
    #crsctl unset css diagwait
    I am confident the abnormal RAC node restart problem will be solved with this workaround.
    Regards,
    Sumit
    Bangalore,India
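
To confirm the change on every node without reading the ps output by eye, the -t and -m values can be extracted from the oprocd command line. The following is a minimal Python sketch, assuming the ps output has been captured as text; oprocd_timing is a hypothetical helper name, and the two sample lines are the ones quoted in the post above.

```python
import re


def oprocd_timing(ps_line):
    """Return (interval_ms, margin_ms) from an oprocd command line, else None."""
    if "oprocd" not in ps_line:
        return None
    interval = re.search(r"\s-t\s+(\d+)", ps_line)
    margin = re.search(r"\s-m\s+(\d+)", ps_line)
    if interval is None or margin is None:
        # For example the "grep oprocd" process itself, which has no -t/-m.
        return None
    return int(interval.group(1)), int(margin.group(1))


before = ("4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 "
          "/opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 "
          "-hsi 5:10:50:75:90 -f")
after = ("4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 "
         "/opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 "
         "-hsi 5:10:50:75:90 -f")

for label, line in (("before", before), ("after", after)):
    interval_ms, margin_ms = oprocd_timing(line)
    print(label, "interval:", interval_ms, "ms, margin:", margin_ms, "ms")
```

A margin that is still 500 after the restart would suggest the diagwait setting did not take effect on that node.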

  • RAC node is not starting

    I have a 2-node RAC in 2 virtual machines. The ASM instance on node 2 is not starting after a reboot, but node 1 is working fine.
    I got the output below from node 2.
    srvctl status asm
    PRCR-1070 : Failed to check if resource ora.asm is registered
    Cannot communicate with crsd
    alertlog
    [/u01/app/11.2.0/grid/bin/cssdagent(3782)]CRS-5818:Aborted command 'start for resource: ora.cssd 1 1' for resource 'ora.cssd'. Details at (:CRSAGF00113:) in /u01/app/11.2.0/grid/log/ol5-112-rac2/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
    2015-03-13 16:38:23.671
    [ohasd(3315)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cssd'. Details at (:CRSPE00111:) in /u01/app/11.2.0/grid/log/ol5-112-rac2/ohasd/ohasd.log.
    2015-03-13 16:38:24.264
    [ohasd(3315)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'ol5-112-rac2'.
    2015-03-13 16:38:35.915
    [cssd(4326)]CRS-1713:CSSD daemon is started in clustered mode
    2015-03-13 16:38:41.195
    [cssd(4326)]CRS-1707:Lease acquisition for node ol5-112-rac2 number 2 completed
    2015-03-13 16:38:41.259
    [cssd(4326)]CRS-1605:CSSD voting file is online: /dev/oracleasm/disks/DISK1; details in /u01/app/11.2.0/grid/log/ol5-112-rac2/cssd/ocssd.log.
    2015-03-13 16:38:41.562
    [crsd(4285)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/ol5-112-rac2/crsd/crsd.log.
    2015-03-13 16:38:42.217
    [ohasd(3315)]CRS-2765:Resource 'ora.crsd' has failed on server 'ol5-112-rac2'.
    2015-03-13 16:38:43.313
    [crsd(4357)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/ol5-112-rac2/crsd/crsd.log.
    2015-03-13 16:38:44.262
    [ohasd(3315)]CRS-2765:Resource 'ora.crsd' has failed on server 'ol5-112-rac2'.
    2015-03-13 16:38:44.716
    [ohasd(3315)]CRS-2765:Resource 'ora.diskmon' has failed on server 'ol5-112-rac2'.
    crsd log
    2015-03-13 16:39:02.140: [  OCRASM][1266976496]proprasmo: kgfoCheckMount returned [7]
    2015-03-13 16:39:02.140: [  OCRASM][1266976496]proprasmo: The ASM instance is down
    2015-03-13 16:39:02.140: [  OCRRAW][1266976496]proprioo: Failed to open [+DATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
    2015-03-13 16:39:02.140: [  OCRRAW][1266976496]proprioo: No OCR/OLR devices are usable
    2015-03-13 16:39:02.140: [  OCRASM][1266976496]proprasmcl: asmhandle is NULL
    2015-03-13 16:39:02.140: [  OCRRAW][1266976496]proprinit: Could not open raw device
    2015-03-13 16:39:02.140: [  OCRASM][1266976496]proprasmcl: asmhandle is NULL
    2015-03-13 16:39:02.140: [  OCRAPI][1266976496]a_init:16!: Backend init unsuccessful : [26]
    2015-03-13 16:39:02.140: [  CRSOCR][1266976496] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
    ORA-15077: could not locate ASM instance serving a required diskgroup
    ] [7]
    2015-03-13 16:39:02.140: [    CRSD][1266976496][PANIC] CRSD exiting: Could not init OCR, code: 26
    2015-03-13 16:39:02.140: [    CRSD][1266976496] Done
    Please help.

    Hi Levi,
    I got the output below from cssd.log:
    2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos op  :  sgipcnTcpConnect
    2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos dep :  No route to host (113)
    2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos loc :  connect
    2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos info:  addr '192.168.1.101:42729'
    2015-03-15 10:38:48.248: [    CSSD][1214789952]clssscSelect: conn complete ctx 0x24d7aa0 endp 0x10d4
    2015-03-15 10:38:48.248: [    CSSD][1214789952]clssnmeventhndlr: node(1), endp(0x10d4) failed, probe((nil)) ninf->endp (0x1000010d4) CONNCOMPLETE
    2015-03-15 10:38:48.248: [    CSSD][1214789952]clssnmDiscHelper: ol5-112-rac1, node(1) connection failed, endp (0x10d4), probe(0x100000000), ninf->endp 0x10d4
    2015-03-15 10:38:48.248: [    CSSD][1214789952]clssnmDiscHelper: node 1 clean up, endp (0x10d4), init state 0, cur state 0
    2015-03-15 10:38:48.248: [GIPCXCPT][1214789952]gipcInternalDissociate: obj 0x27c8050 [00000000000010d4] { gipcEndpoint : localAddr 'gipc://ol5-112-rac2:de93-de83-5e0c-a373#192.168.1.102#50439', remoteAddr 'gipc://ol5-112-rac1:nm_ol5-112-scan#192.168.1.101#42729', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
    2015-03-15 10:38:48.248: [GIPCXCPT][1214789952]gipcDissociateF [clssnmDiscHelper : clssnm.c : 3215]: EXCEPTION[ ret gipcretFail (1) ]  failed to dissociate obj 0x27c8050 [00000000000010d4] { gipcEndpoint : localAddr 'gipc://ol5-112-rac2:de93-de83-5e0c-a373#192.168.1.102#50439', remoteAddr 'gipc://ol5-112-rac1:nm_ol5-112-scan#192.168.1.101#42729', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 }, flags 0x0
    2015-03-15 10:38:48.248: [    CSSD][1214789952]clssnmDiscEndp: gipcDestroy 0x10d4
    2015-03-15 10:38:48.414: [    CSSD][1147648320]clssnmvDHBValidateNCopy: node 1, ol5-112-rac1, has a disk HB, but no network HB, DHB has rcfg 320087493, wrtcnt, 207342, LATS 355744, lastSeqNo 207342, uniqueness 1426244111, timestamp 1426396128/31164744
    2015-03-15 10:38:48.414: [    CSSD][1214789952]clssnmconnect: connecting to addr gipc://ol5-112-rac1:nm_ol5-112-scan#192.168.1.101#42729
    2015-03-15 10:38:48.414: [    CSSD][1214789952]clssscConnect: endp 0x10e0 - cookie 0x24d7aa0 - addr gipc://ol5-112-rac1:nm_ol5-112-scan#192.168.1.101#42729
    2015-03-15 10:38:48.414: [    CSSD][1214789952]clssnmconnect: connecting to node(1), endp(0x10e0), flags 0x10002
    2015-03-15 10:38:48.710: [    CSSD][1181219136]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmRcfgMgrThread: Local Join
    2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmLocalJoinEvent: begin on node(2), waittime 193000
    2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmLocalJoinEvent: set curtime (356544) for my node
    2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmLocalJoinEvent: scanning 32 nodes
    2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmLocalJoinEvent: Node ol5-112-rac1, number 1, is in an existing cluster with disk state 3
    2015-03-15 10:38:49.210: [    CSSD][1206397248]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
    Thanks,
    Suranga
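
The log above shows node 2 seeing a disk heartbeat from node 1 but no network heartbeat, together with "No route to host" on the connection attempt. A rough Python sketch for pulling those symptoms out of a longer ocssd.log follows; the regular expressions are based only on the lines quoted in this thread, and interconnect_symptoms is a made-up helper name, so treat it as a starting point rather than a complete parser.

```python
import re

# Patterns taken from the ocssd.log lines quoted in this thread.
NO_NET_HB = re.compile(r"node (\d+), ([^,\s]+), has a disk HB, but no network HB")
NO_ROUTE = re.compile(r"slos dep\s*:\s*No route to host")
PEER_ADDR = re.compile(r"slos info:\s*addr '([^']+)'")


def interconnect_symptoms(lines):
    """Summarise network-heartbeat symptoms found in ocssd.log lines."""
    nodes_without_net_hb = set()
    peer_addrs = set()
    no_route_count = 0
    for line in lines:
        match = NO_NET_HB.search(line)
        if match:
            nodes_without_net_hb.add((int(match.group(1)), match.group(2)))
        if NO_ROUTE.search(line):
            no_route_count += 1
        match = PEER_ADDR.search(line)
        if match:
            peer_addrs.add(match.group(1))
    return {
        "nodes_without_net_hb": sorted(nodes_without_net_hb),
        "no_route_count": no_route_count,
        "peer_addrs": sorted(peer_addrs),
    }


# Three lines copied from the log above.
sample = [
    "2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos dep :  No route to host (113)",
    "2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos info:  addr '192.168.1.101:42729'",
    "2015-03-15 10:38:48.414: [    CSSD][1147648320]clssnmvDHBValidateNCopy: node 1, ol5-112-rac1, has a disk HB, but no network HB, DHB has rcfg 320087493, wrtcnt, 207342, LATS 355744, lastSeqNo 207342, uniqueness 1426244111, timestamp 1426396128/31164744",
]
print(interconnect_symptoms(sample))
```

This combination usually points at the network path between the nodes (for example a firewall or a misconfigured interface) rather than at the shared disks, but that is an inference from the log and not something the thread confirms.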

  • Huge number of idle connections from loopback ip on oracle RAC node

    Hi,
    We have a 2-node 11gR2 (11.2.0.3) Oracle RAC cluster. We are seeing a huge number of idle connections (more than 5000 on each node) on both nodes, and the number is increasing day by day. All the idle connections are from the VIP and the loopback address (127.0.0.1.47971).
    netstat -an |grep -i idle|more
    127.0.0.1.47971 Idle
    Any insight will be helpful.
    The server occasionally suffers from memory issues (about once a month):
    ORA-27300: OS system dependent operation:fork failed with status: 11
    ORA-27301: OS failure message: Resource temporarily unavailable
    Thanks
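
For reading the two errors above: the status in ORA-27300 is the OS errno of the failed call. A small Python sketch that extracts it is shown below. The ERRNO_NAMES table is an assumption that covers only the values 11 and 12 as defined on Linux and Solaris, and parse_ora_27300 is a hypothetical helper name.

```python
import re

# Assumed errno names; 11 and 12 have these meanings on Linux and Solaris.
ERRNO_NAMES = {11: "EAGAIN", 12: "ENOMEM"}

ORA_27300 = re.compile(
    r"ORA-27300: OS system dependent operation:(\w+) failed with status: (\d+)"
)


def parse_ora_27300(line):
    """Return (operation, status, errno_name) or None if the line does not match."""
    match = ORA_27300.search(line)
    if match is None:
        return None
    status = int(match.group(2))
    return match.group(1), status, ERRNO_NAMES.get(status, "unknown")


line = "ORA-27300: OS system dependent operation:fork failed with status: 11"
print(parse_ora_27300(line))
```

EAGAIN on fork generally means the OS could not create another process at that moment (a process limit or a shortage of memory or swap). Whether the idle connections are what exhausts that resource is not established by the post.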

    We cannot control what occurs on your DB server.
    How do I ask a question on the forums?
    SQL and PL/SQL FAQ
    Post the results of the following SQL:
    SELECT * FROM V$VERSION;

  • RAC node rebooting frequently

    Hi all,
    I am working on a two-node RAC environment. One of my RAC nodes is rebooting very frequently. I am using Oracle 10g for both the database and the clusterware (10.2.0.1).
    I have checked the OS logs (Linux AS 4) and the RAC-related logs, but was not able to find anything. I am posting all the logs; please suggest.

    Hi, I am posting the alert log, OS log, and ocssd logs.
    Clusterware alert log:
    [crsd(5649)]CRS-1201:CRSD started on node ctmisdb1.
    2012-03-21 09:50:38.188
    [cssd(7490)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 .
    2012-03-21 09:50:46.726
    [crsd(5649)]CRS-1204:Recovering CRS resources for node ctmisdb2.
    2012-03-21 09:55:21.760
    [cssd(7490)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
    2012-03-21 12:07:46.681
    [cssd(7426)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
    2012-03-21 12:07:50.432
    [cssd(7426)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
    2012-03-21 12:07:50.893
    [crsd(5549)]CRS-1012:The OCR service started on node ctmisdb1.
    2012-03-21 12:07:50.942
    [evmd(7304)]CRS-1401:EVMD started on node ctmisdb1.
    2012-03-21 12:07:52.827
    [crsd(5549)]CRS-1201:CRSD started on node ctmisdb1.
    2012-03-21 12:48:41.908
    [cssd(7448)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
    2012-03-21 12:48:45.741
    [cssd(7448)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
    2012-03-21 12:48:49.173
    [crsd(5546)]CRS-1012:The OCR service started on node ctmisdb1.
    2012-03-21 12:48:49.190
    [evmd(7328)]CRS-1401:EVMD started on node ctmisdb1.
    2012-03-21 12:48:50.818
    [crsd(5546)]CRS-1201:CRSD started on node ctmisdb1.
    2012-03-21 13:26:36.398
    [cssd(7343)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
    2012-03-21 13:26:40.492
    [cssd(7343)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
    2012-03-21 13:26:40.939
    [crsd(5542)]CRS-1012:The OCR service started on node ctmisdb1.
    2012-03-21 13:26:40.977
    [evmd(7223)]CRS-1401:EVMD started on node ctmisdb1.
    2012-03-21 13:26:42.772
    [crsd(5542)]CRS-1201:CRSD started on node ctmisdb1.
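
To see how often the node is coming back up, the CRS-1201 "CRSD started" entries in the alert log above can be listed with their timestamps. This is a minimal Python sketch, assuming the alert log keeps the layout shown here (a timestamp line followed by the message line); crsd_start_times and gaps_in_seconds are hypothetical helper names.

```python
import re
from datetime import datetime

# A line that holds only a timestamp, e.g. "2012-03-21 12:07:52.827".
TS_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}$")


def crsd_start_times(lines):
    """Return the timestamps of CRS-1201 messages that follow a timestamp line."""
    starts = []
    last_ts = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if TS_LINE.match(line):
            last_ts = datetime.strptime(line, "%Y-%m-%d %H:%M:%S.%f")
            continue
        if "CRS-1201" in line and last_ts is not None:
            starts.append(last_ts)
        # A timestamp belongs only to the message directly after it.
        last_ts = None
    return starts


def gaps_in_seconds(times):
    """Seconds between consecutive start times."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# A few lines copied from the alert log above.
sample = """[crsd(5649)]CRS-1201:CRSD started on node ctmisdb1.
2012-03-21 09:50:38.188
[cssd(7490)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 .
2012-03-21 12:07:52.827
[crsd(5549)]CRS-1201:CRSD started on node ctmisdb1.
2012-03-21 12:48:50.818
[crsd(5546)]CRS-1201:CRSD started on node ctmisdb1.
""".splitlines()

starts = crsd_start_times(sample)
print([t.isoformat() for t in starts], gaps_in_seconds(starts))
```

The first CRS-1201 line in the excerpt has no timestamp before it, so the sketch skips it instead of guessing a time.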
    Node OS log:
    Mar 21 12:06:35 ctmisdb1 rc: Starting readahead: succeeded
    Mar 21 12:06:35 ctmisdb1 messagebus: messagebus startup succeeded
    Mar 21 12:06:36 ctmisdb1 cups-config-daemon: cups-config-daemon startup succeeded
    Mar 21 12:06:36 ctmisdb1 haldaemon: haldaemon startup succeeded
    Mar 21 12:06:37 ctmisdb1 fstab-sync[6267]: removed all generated mount points
    Mar 21 12:06:37 ctmisdb1 fstab-sync[6378]: added mount point /media/cdrecorder for /dev/hde
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6323]: session opened for user oracle by (uid=0)
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6324]: session opened for user oracle by (uid=0)
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6229]: session opened for user oracle by (uid=0)
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6229]: session closed for user oracle
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6644]: session opened for user oracle by (uid=0)
    Mar 21 12:06:37 ctmisdb1 kernel: matroxfb: cannot set xres to 800, rounded up to 832
    Mar 21 12:06:37 ctmisdb1 last message repeated 2 times
    Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6323]: session closed for user oracle
    Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6644]: session closed for user oracle
    Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6324]: session closed for user oracle
    Mar 21 12:06:41 ctmisdb1 logger: Cluster Ready Services completed waiting on dependencies.
    Mar 21 12:06:41 ctmisdb1 last message repeated 2 times
    Mar 21 12:06:45 ctmisdb1 gdm(pam_unix)[6379]: session opened for user root by (uid=0)
    Mar 21 12:06:46 ctmisdb1 gconfd (root-7052): starting (version 2.8.1), pid 7052 user 'root'
    Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
    Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 1
    Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
    Mar 21 12:06:55 ctmisdb1 gconfd (root-7052): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 0
    Mar 21 12:07:41 ctmisdb1 su(pam_unix)[5547]: session opened for user oracle by (uid=0)
    Mar 21 12:07:41 ctmisdb1 logger: Running CRSD with TZ =
    Mar 21 12:07:43 ctmisdb1 su(pam_unix)[7399]: session opened for user oracle by (uid=0)
    Mar 21 12:12:49 ctmisdb1 sshd(pam_unix)[15323]: session opened for user root by root(uid=0)
    Mar 21 12:12:57 ctmisdb1 su(pam_unix)[15531]: session opened for user oracle by root(uid=0)
    Mar 21 12:47:05 ctmisdb1 syslogd 1.4.1: restart.
    ocssd log:
    [    CSSD]2012-03-21 11:24:41.045 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661f0c0) proc(0x8006622560) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 11:24:41.078 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660cfe0) proc(0x800662ba70) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:07:44.564 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
    [  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=ctmisdb1DBG_CSSD))
    [    CSSD]2012-03-21 12:07:44.564 >USER: CSS daemon log for node ctmisdb1, number 1, in cluster crs
    [    CSSD]2012-03-21 12:07:44.581 [28260544] >TRACE: clssscmain: local-only set to false
    [    CSSD]2012-03-21 12:07:44.603 [28260544] >TRACE: clssnmReadNodeInfo: added node 1 (ctmisdb1) to cluster
    [    CSSD]2012-03-21 12:07:44.621 [28260544] >TRACE: clssnmReadNodeInfo: added node 2 (ctmisdb2) to cluster
    [    CSSD]2012-03-21 12:07:44.627 [72925824] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
    [    CSSD]2012-03-21 12:07:44.627 [28260544] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
    [    CSSD]2012-03-21 12:07:44.641 [28260544] >TRACE: clssnmInitNMInfo: misscount set to 60
    [    CSSD]2012-03-21 12:07:44.655 [28260544] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw2)
    [    CSSD]2012-03-21 12:07:46.661 [72925824] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw2)
    [    CSSD]2012-03-21 12:07:46.690 [72925824] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(18) wrtcnt(7920) LATS(0) Disk lastSeqNo(7920)
    [    CSSD]2012-03-21 12:07:46.752 [28260544] >TRACE: clssnmFatalInit: fatal mode enabled
    [    CSSD]2012-03-21 12:07:46.752 [94777984] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
    [    CSSD]2012-03-21 12:07:46.753 [94777984] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
    [    CSSD]2012-03-21 12:07:46.753 [94777984] >TRACE: clssnmClusterListener: Probing node(2)
    [    CSSD]2012-03-21 12:07:46.755 [94777984] >TRACE: clssnmConnComplete: connected to node 2 (con 0x8006601040), state 3 birth 0, unique 1332303918/1332303918 prevConuni(0)
    [    CSSD]2012-03-21 12:07:46.756 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
    [    CSSD]2012-03-21 12:07:46.756 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ctmisdb1_crs))
    [    CSSD]2012-03-21 12:07:46.757 [151810688] >TRACE: clssnmPollingThread: Connection complete
    [    CSSD]2012-03-21 12:07:46.757 [162296448] >TRACE: clssnmSendingThread: Connection complete
    [    CSSD]2012-03-21 12:07:46.757 [172782208] >TRACE: clssnmRcfgMgrThread: Connection complete
    [    CSSD]2012-03-21 12:07:46.757 [172782208] >TRACE: clssnmRcfgMgrThread: Local Join
    [    CSSD]2012-03-21 12:07:46.757 [172782208] >WARNING: clssnmLocalJoinEvent: takeover aborted due to connected but inactive nodes
    [    CSSD]2012-03-21 12:07:47.339 [94777984] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[ctmisdb2] seq[5] sync[18]
    [    CSSD]2012-03-21 12:07:47.759 [172782208] >TRACE: clssnmRcfgMgrThread: lastleader(2) unique(1332311864)
    [    CSSD]2012-03-21 12:07:48.341 [94777984] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(18)
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmDeactivateNode: node 0 () left cluster
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 1, state (1/2) unique (1332311864/1332311864) prevConuni(0) birth (0/18) (old/new)
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 2, state (4/3) unique (1332303918/1332303918) prevConuni(0) birth (0/16) (old/new)
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: SYNC(18) from node(2) completed
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: NODE 1 (ctmisdb1) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: NODE 2 (ctmisdb2) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2012-03-21 12:07:50.429 [28260544] >USER: NMEVENT_SUSPEND [00][00][00][00]
    [    CSSD]2012-03-21 12:07:50.429 [183267968] >TRACE: clssgmReconfigThread: started for reconfig (18)
    [    CSSD]2012-03-21 12:07:50.429 [183267968] >USER: NMEVENT_RECONFIG [00][00][00][06]
    [    CSSD]2012-03-21 12:07:50.429 [183267968] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 18
    [    CSSD]2012-03-21 12:07:50.430 [140255872] >TRACE: clssgmInitialRecv: (0x102a0360) accepted a new connection from node 2 born at 16 active (2, 2), vers (10,3,1,2)
    [    CSSD]2012-03-21 12:07:50.430 [140255872] >TRACE: clssgmInitialRecv: conns done (2/2)
    [    CSSD]2012-03-21 12:07:50.430 [183267968] >TRACE: clssgmEstablishMasterNode: MASTER for 18 is node(2) birth(16)
    [    CSSD]2012-03-21 12:07:50.430 [183267968] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
    [    CSSD]2012-03-21 12:07:50.432 [140255872] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(72) incarn 18
    [    CSSD]CLSS-3000: reconfiguration successful, incarnation 18 with 2 nodes
    [    CSSD]CLSS-3001: local node number 1, master node number 2
    [    CSSD]2012-03-21 12:07:50.433 [183267968] >TRACE: clssgmReconfigThread: completed for reconfig(18), with status(1)
    [    CSSD]2012-03-21 12:07:50.550 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006603bb0) proc(0x8006608b00) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:07:50.551 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066066f0) proc(0x8006608d70) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:07:53.569 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660ec70) proc(0x8006611260) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:00.829 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006610990) proc(0x800660de00) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:04.698 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006613030) proc(0x8006612930) pid(8115) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:04.816 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8115) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:04.832 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8115) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:06.615 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8171) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:07.114 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006615960) proc(0x8006616350) pid(8175) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:11.373 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066192a0) proc(0x8006619470) pid(8302) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:11.669 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661ee20) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:17.135 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661ee70) pid(8458) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:17.268 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661fc00) proc(0x80066220d0) pid(8460) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:17.305 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066223e0) proc(0x8006625250) pid(8462) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:17.353 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006625560) proc(0x8006628430) pid(8464) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:24.585 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006625560) proc(0x8006628430) pid(8645) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:27.957 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006628740) proc(0x800662b610) pid(8722) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:30.931 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662cce0) proc(0x800662c860) pid(8801) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:36.400 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661c5f0) proc(0x800661eb50) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:37.863 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662f1c0) proc(0x800661eee0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:38.537 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662f1c0) proc(0x800661d500) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:39.232 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661d500) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:43.085 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006611210) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:58.971 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x80066112c0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:09:59.290 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:10:59.589 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:11:59.904 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:13:00.203 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:13:14.029 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:14:00.501 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006611210) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:15:00.809 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:16:01.117 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:17:01.447 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:01.762 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:39.841 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:42.123 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:42.316 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:42.843 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:42.963 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:43.098 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800662bd20) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:44.173 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:44.368 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:45.351 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:46.236 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:47.031 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:47.694 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:47.819 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:48.103 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:48.327 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:48.484 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006611210) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:48.758 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:49.529 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:50.509 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:51.060 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:51.558 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:48:39.836 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
    [  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=ctmisdb1DBG_CSSD))
    [    CSSD]2012-03-21 12:48:39.836 >USER: CSS daemon log for node ctmisdb1, number 1, in cluster crs
    [    CSSD]2012-03-21 12:48:39.849 [28260544] >TRACE: clssscmain: local-only set to false
    [    CSSD]2012-03-21 12:48:39.865 [28260544] >TRACE: clssnmReadNodeInfo: added node 1 (ctmisdb1) to cluster
    [    CSSD]2012-03-21 12:48:39.872 [28260544] >TRACE: clssnmReadNodeInfo: added node 2 (ctmisdb2) to cluster
    [    CSSD]2012-03-21 12:48:39.879 [72925824] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
    [    CSSD]2012-03-21 12:48:39.879 [28260544] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
    [    CSSD]2012-03-21 12:48:39.881 [28260544] >TRACE: clssnmInitNMInfo: misscount set to 60
    [    CSSD]2012-03-21 12:48:39.888 [28260544] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw2)
    [    CSSD]2012-03-21 12:48:41.892 [72925824] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw2)
    [    CSSD]2012-03-21 12:48:41.915 [72925824] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(20) wrtcnt(10367) LATS(0) Disk lastSeqNo(10367)
    [    CSSD]2012-03-21 12:48:41.959 [28260544] >TRACE: clssnmFatalInit: fatal mode enabled
    [    CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
    [    CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
    [    CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmClusterListener: Probing node(2)
    [    CSSD]2012-03-21 12:48:41.961 [94777984] >TRACE: clssnmConnComplete: connected to node 2 (con 0x8006702790), state 3 birth 0, unique 1332303918/1332303918 prevConuni(0)
    [    CSSD]2012-03-21 12:48:41.962 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
    [    CSSD]2012-03-21 12:48:41.962 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ctmisdb1_crs))
    [    CSSD]2012-03-21 12:48:41.963 [152330880] >TRACE: clssnmPollingThread: Connection complete
    [    CSSD]2012-03-21 12:48:41.963 [162816640] >TRACE: clssnmSendingThread: Connection complete
    [    CSSD]2012-03-21 12:48:41.963 [173302400] >TRACE: clssnmRcfgMgrThread: Connection complete
    [    CSSD]2012-03-21 12:48:41.963 [173302400] >TRACE: clssnmRcfgMgrThread: Local Join
    [    CSSD]2012-03-21 12:48:41.963 [173302400] >WARNING: clssnmLocalJoinEvent: takeover aborted due to connected but inactive nodes
    [    CSSD]2012-03-21 12:48:42.631 [94777984] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[ctmisdb2] seq[13] sync[20]
    [    CSSD]2012-03-21 12:48:42.965 [173302400] >TRACE: clssnmRcfgMgrThread: lastleader(2) unique(1332314319)
    [    CSSD]2012-03-21 12:48:43.636 [94777984] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(20)
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmDeactivateNode: node 0 () left cluster
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 1, state (1/2) unique (1332314319/1332314319) prevConuni(0) birth (0/20) (old/new)
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 2, state (4/3) unique (1332303918/1332303918) prevConuni(0) birth (0/16) (old/new)
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: SYNC(20) from node(2) completed
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: NODE 1 (ctmisdb1) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: NODE 2 (ctmisdb2) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2012-03-21 12:48:45.737 [28260544] >USER: NMEVENT_SUSPEND [00][00][00][00]
    [    CSSD]2012-03-21 12:48:45.738 [183788160] >TRACE: clssgmReconfigThread: started for reconfig (20)
    [    CSSD]2012-03-21 12:48:45.738 [183788160] >USER: NMEVENT_RECONFIG [00][00][00][06]
    [    CSSD]2012-03-21 12:48:45.738 [183788160] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 20
    [    CSSD]2012-03-21 12:48:45.739 [140776064] >TRACE: clssgmInitialRecv: (0x102a0370) accepted a new connection from node 2 born at 16 active (2, 2), vers (10,3,1,2)
    [    CSSD]2012-03-21 12:48:45.739 [140776064] >TRACE: clssgmInitialRecv: conns done (2/2)
    [    CSSD]2012-03-21 12:48:45.739 [183788160] >TRACE: clssgmEstablishMasterNode: MASTER for 20 is node(2) birth(16)
    [    CSSD]2012-03-21 12:48:45.739 [183788160] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
    [    CSSD]2012-03-21 12:48:45.741 [140776064] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(72) incarn 20
    [    CSSD]CLSS-3000: reconfiguration successful, incarnation 20 with 2 nodes
    Please check and help.

  • What happens when one RAC node's public NIC is down?

    Dear all,
    There is a two-node RAC in my office. From what I have observed, when one RAC node's public NIC is down, the "crs_stat -t" command does not show any resource as OFFLINE. So if I use sqlplus to connect to my RAC database at that moment, will it still pick the node1 or node2 instance at random? And if I am assigned to the instance on the node whose public NIC is down, will my sqlplus connection fail?
    So, can we say that Oracle RAC cannot provide HA if one node's public NIC is down? Or is there another way to handle this? Any suggestions would be appreciated.

    If the public NIC on one node is down, only the other node can take the load; in your case that would be node 2. CRS may fail to record the status change, and in that case connections directed to the affected node may fail.
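
    To illustrate the point about failed connections to one node: a client that knows the addresses of both nodes can try them in order and use the first one that answers. The sketch below is only an illustration of that idea in Python, not Oracle Net's own failover logic. The host names and port in the usage note are hypothetical, and the probe is a plain TCP connect rather than a database login.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(
    addresses: Iterable[Address],
    probe: Callable[[str, int], bool] = tcp_probe,
) -> Optional[Address]:
    """Try each (host, port) in order and return the first that answers, else None."""
    for host, port in addresses:
        if probe(host, port):
            return (host, port)
    return None
```

    For example, first_reachable([("node1-vip", 1521), ("node2-vip", 1521)]) would return the second address if the first listener cannot be reached. Both host names are placeholders.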

  • Remove RAC node on Windows

    I have done all the steps to remove one RAC node, but I am stuck at the step that runs rootdelete.sh from the $CRS_HOME/install directory, because this file does not exist in a Windows environment.
    What is the equivalent of rootdelete.sh on the Windows platform? I want to run it to remove the node information from the clusterware configuration.
    Is there a good document that explains how to remove a node on the Windows platform?

    Hello,
    You need to run the following steps to remove a node from a RAC cluster on Windows platform:
    Perform the following steps on a node other than the node you want to delete:
    1. Run the Database Configuration Assistant (DBCA) utility to delete the instance.
    2. Then run the Net Configuration Assistant (NetCA) to delete the listener.
    3. If the node that you are deleting has an ASM instance, then delete the ASM instance using the srvctl stop asm and srvctl remove asm commands.
    4. Run the command srvctl stop nodeapps -n nodename, where nodename is the node to be deleted, to stop the node applications.
    5. Run the command srvctl remove nodeapps -n nodename, where nodename is the node to be deleted, to remove the node applications.
    6. Stop isqlplus if it is running.
    7. Run the command setup.exe -updateNodeList ORACLE_HOME=Oracle_home ORACLE_HOME_NAME=Oracle_home_name CLUSTER_NODES=remaining nodes, where "remaining nodes" is a list of the nodes that are to remain part of the cluster.
    Perform the following steps on the deleted RAC node:
    1. Run the command setup.exe -updateNodeList -local -noClusterEnabled ORACLE_HOME=Oracle_home ORACLE_HOME_NAME=Oracle_home_name CLUSTER_NODES="".
    Note that you do not need a value for "" after the CLUSTER_NODES= entry in this command. If you delete more than one node, then you must run this command on every deleted node to remove the Oracle home if you have a non-shared Oracle home (non-cluster file system) installation.
    2. On the same node, delete the Windows Registry entries and ASM services using Oradim.
    3. From the deleted RAC node, run the command Oracle_home\oui\bin\setup.exe to start the Oracle Universal Installer (OUI). Select Deinstall Products and select the Oracle home that you want to de-install.
    4. Then to delete the CRS node, from a remaining node run the command crssetup del -nn node_name of the deleted node, node number
    5. Then run the command setup.exe -updateNodeList ORACLE_HOME=CRS home ORACLE_HOME_NAME=CRS home name CLUSTER_NODES=remaining nodes where remaining nodes is a list of the nodes that are to remain in the cluster.
    6. Then on the deleted CRS node, run the command setup.exe -updateNodeList -local -noClusterEnabled ORACLE_HOME=CRS home ORACLE_HOME_NAME=CRS home name CLUSTER_NODES=""
    7. Remove the Oracle home manually from the new node if the home is not shared and then manually remove the HKLM/software/Oracle registry keys and the Oracle services.
    8. After adding or deleting nodes from your Oracle Database 10g with RAC environment, and after you are sure that your system is functioning properly, make a backup of the contents of the voting disk using the dd.exe utility. The dd.exe utility is part of the MKS toolkit.
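
    The commands from steps 4, 5 and 7 of the first list (run on a surviving node) can be assembled ahead of time and reviewed before anything is executed. The sketch below only builds and prints the command strings; it does not run them. The node names, Oracle home path and home name in the usage part are hypothetical, and the comma-separated CLUSTER_NODES value is an assumption, since the steps above only say "a list of the nodes".

```python
from typing import List


def removal_commands(
    node: str,
    remaining_nodes: List[str],
    oracle_home: str,
    oracle_home_name: str,
) -> List[str]:
    """Return the srvctl and setup.exe commands from steps 4, 5 and 7, in order."""
    # Assumption: the node list is passed as a comma-separated value.
    node_list = ",".join(remaining_nodes)
    return [
        f"srvctl stop nodeapps -n {node}",
        f"srvctl remove nodeapps -n {node}",
        "setup.exe -updateNodeList "
        f"ORACLE_HOME={oracle_home} "
        f"ORACLE_HOME_NAME={oracle_home_name} "
        f"CLUSTER_NODES={node_list}",
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for command in removal_commands(
        "node3", ["node1", "node2"], r"C:\oracle\db_1", "OraDb10g_home1"
    ):
        print(command)
```

    Printing the commands first makes it easier to check the node name and the remaining-node list before running each command by hand.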
    ASM Instance Cleanup Procedures after Node Deletion on Windows-Based Platforms
    The delete node procedure requires the following additional steps on Windows-based systems to remove the ASM instances:
    1. If this is the Oracle home from which the node-specific listener named LISTENER_nodename runs, then use NetCA to remove this listener and its CRS resources. If necessary, re-create this listener in another home.
    2. If this is the Oracle home from which the ASM instance runs, then remove the ASM configuration by running the following command for all nodes on which this Oracle home exists:
    srvctl stop asm -n node
    Then run the following command for the nodes that you are removing:
    srvctl remove asm -n node
    3. If you are using a cluster file system for your ASM Oracle home, then run the following commands on the local node:
    rd -s -q %ORACLE_BASE%\admin\+ASM
    delete %ORACLE_HOME%\database\*ASM*
    4. If you are not using a cluster file system for your ASM Oracle home, then run the delete command mentioned in the previous step on each node on which the Oracle home exists.
    5. Run the following command on each node that has an ASM instance:
    oradim -delete -asmsid +ASMnode_number
    Source:
    Oracle® Real Application Clusters Administrator's Guide
    10g Release 1 (10.1)
    Part Number B10765-02
    Chapter 5: Adding and Deleting Nodes and Instances
    Hope this helps,
    Ben Prusinski, Oracle 10g OCP
    http://oracle-magician.blogspot.com

  • Scan-vip running only on one RAC node

    Hi ,
    While setting up RAC 11.2 on CentOS 5.7, I got this error during the grid installation:
    PRCR-1079 : Failed to start resource ora.scan1.vip
    CRS-5005: IP Address: 192.168.100.208 is already in use in the network
    CRS-2674: Start of 'ora.scan1.vip' on 'falcen6b' failed
    CRS-2632: There are no more servers to try to place resource 'ora.scan1.vip' on that would satisfy its placement policy
    PRCR-1079 : Failed to start resource ora.scan2.vip
    CRS-5005: IP Address: 192.168.100.209 is already in use in the network
    CRS-2674: Start of 'ora.scan2.vip' on 'falcen6b' failed
    CRS-2632: There are no more servers to try to place resource 'ora.scan2.vip' on that would satisfy its placement policy
    PRCR-1079 : Failed to start resource ora.scan3.vip
    CRS-5005: IP Address: 192.168.100.210 is already in use in the network
    CRS-2674: Start of 'ora.scan3.vip' on 'falcen6b' failed
    CRS-2632: There are no more servers to try to place resource 'ora.scan3.vip' on that would satisfy its placement policy
    I figured out that the SCAN service is able to run on only one node at a time. When I stopped the service on rac1 and started it on rac2, it started.
    But I think the grid installation needs the SCAN service to run on both nodes simultaneously.
    How do I resolve this?
    Any suggestions, please.
    PS: I am planning to try the 11.2.0.3 patch, but it will be a while until I get access to it.
    Until then, can someone suggest a workaround?
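
    When several SCAN VIPs fail this way, it can help to pull the resource, IP address and node out of the error text so each address can be checked on the network. The following is a minimal sketch that pairs each CRS-5005 line with the CRS-2674 line that follows it; it assumes the messages appear in the same order and wording as in the output above.

```python
import re
from typing import List, Tuple

IP_IN_USE = re.compile(r"CRS-5005: IP Address: (\S+) is already in use")
START_FAILED = re.compile(r"CRS-2674: Start of '([^']+)' on '([^']+)' failed")


def scan_vip_conflicts(output: str) -> List[Tuple[str, str, str]]:
    """Return (resource, ip, node) for each CRS-5005 / CRS-2674 pair found."""
    conflicts = []
    pending_ip = None
    for line in output.splitlines():
        ip_match = IP_IN_USE.search(line)
        if ip_match:
            # Remember the address until the matching start failure appears.
            pending_ip = ip_match.group(1)
            continue
        start_match = START_FAILED.search(line)
        if start_match and pending_ip is not None:
            conflicts.append((start_match.group(1), pending_ip, start_match.group(2)))
            pending_ip = None
    return conflicts
```

    Each returned address is one that CRS reported as already in use, so it is a candidate for checking whether another host or another node is already holding it.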

    Hi Balazs Papp and onedbguru,
    I was able to resolve that error by running the following command on rac2, and that part of the installer now passes.
    crsctl start res ora.scan1.vip
    However, the cluster verification utility fails at the end of the installer.
    When I ran the command below, this was the output:
    [oracle@falcen6a grid]$ ./runcluvfy.sh stage -post crsinst -n falcen6a,falcen6b -verbose
    Performing post-checks for cluster services setup
    Checking node reachability...
    Check: Node reachability from node "falcen6a"
    Destination Node Reachable?
    falcen6a yes
    falcen6b yes
    Result: Node reachability check passed from node "falcen6a"
    Checking user equivalence...
    Check: User equivalence for user "oracle"
    Node Name Comment
    falcen6b passed
    falcen6a passed
    Result: User equivalence check passed for user "oracle"
    Checking time zone consistency...
    Time zone consistency check passed.
    Checking Cluster manager integrity...
    Checking CSS daemon...
    Node Name Status
    falcen6b running
    falcen6a running
    Oracle Cluster Synchronization Services appear to be online.
    Cluster manager integrity check passed
    UDev attributes check for OCR locations started...
    Result: UDev attributes check passed for OCR locations
    UDev attributes check for Voting Disk locations started...
    Result: UDev attributes check passed for Voting Disk locations
    Check default user file creation mask
    Node Name Available Required Comment
    falcen6b 0022 0022 passed
    falcen6a 0022 0022 passed
    Result: Default user file creation mask check passed
    Checking cluster integrity...
    Cluster is divided into 2 partitions
    Partition 1 consists of the following members:
    Node Name
    falcen6b
    Partition 2 consists of the following members:
    Node Name
    falcen6a
    Cluster integrity check failed. Cluster is divided into 2 partition(s).
    Checking OCR integrity...
    Checking the absence of a non-clustered configuration...
    All nodes free of non-clustered, local-only configurations
    ERROR:
    PRVF-4193 : Asm is not running on the following nodes. Proceeding with the remaining nodes.
    Checking OCR config file "/etc/oracle/ocr.loc"...
    OCR config file "/etc/oracle/ocr.loc" check successful
    ERROR:
    PRVF-4195 : Disk group for ocr location "+DATA" not available on the following nodes:
    Checking size of the OCR location "+DATA" ...
    Size check for OCR location "+DATA" successful...
    OCR integrity check failed
    Checking CRS integrity...
    ERROR:
    PRVF-5316 : Failed to retrieve version of CRS installed on node "falcen6b"
    The Oracle clusterware is healthy on node "falcen6b"
    The Oracle clusterware is healthy on node "falcen6a"
    CRS integrity check failed
    Checking node application existence...
    Checking existence of VIP node application
    Node Name Required Status Comment
    falcen6b yes unknown failed
    falcen6a yes unknown failed
    Result: Check failed.
    Checking existence of ONS node application
    Node Name Required Status Comment
    falcen6b no unknown ignored
    falcen6a no online passed
    Result: Check ignored.
    Checking existence of GSD node application
    Node Name Required Status Comment
    falcen6b no unknown ignored
    falcen6a no does not exist ignored
    Result: Check ignored.
    Checking existence of EONS node application
    Node Name Required Status Comment
    falcen6b no unknown ignored
    falcen6a no online passed
    Result: Check ignored.
    Checking existence of NETWORK node application
    Node Name Required Status Comment
    falcen6b no unknown ignored
    falcen6a no online passed
    Result: Check ignored.
    Checking Single Client Access Name (SCAN)...
    SCAN VIP name Node Running? ListenerName Port Running?
    falcen6-scan unknown false LISTENER 1521 false
    WARNING:
    PRVF-5056 : Scan Listener "LISTENER" not running
    Checking name resolution setup for "falcen6-scan"...
    SCAN Name IP Address Status Comment
    falcen6-scan 192.168.100.210 passed
    falcen6-scan 192.168.100.208 passed
    falcen6-scan 192.168.100.209 passed
    Verification of SCAN VIP and Listener setup failed
    OCR detected on ASM. Running ACFS Integrity checks...
    Starting check to see if ASM is running on all cluster nodes...
    PRVF-5137 : Failure while checking ASM status on node "falcen6b"
    Starting Disk Groups check to see if at least one Disk Group configured...
    Disk Group Check passed. At least one Disk Group configured
    Task ACFS Integrity check failed
    Checking Oracle Cluster Voting Disk configuration...
    Oracle Cluster Voting Disk configuration check passed
    Checking to make sure user "oracle" is not in "root" group
    Node Name Status Comment
    falcen6b does not exist passed
    falcen6a does not exist passed
    Result: User "oracle" is not part of "root" group. Check passed
    Checking if Clusterware is installed on all nodes...
    Check of Clusterware install passed
    Checking if CTSS Resource is running on all nodes...
    Check: CTSS Resource running on all nodes
    Node Name Status
    falcen6b passed
    falcen6a passed
    Result: CTSS resource check passed
    Querying CTSS for time offset on all nodes...
    Result: Query of CTSS for time offset passed
    Check CTSS state started...
    Check: CTSS state
    Node Name State
    falcen6b Observer
    falcen6a Observer
    CTSS is in Observer state. Switching over to clock synchronization checks using NTP
    Starting Clock synchronization checks using Network Time Protocol(NTP)...
    NTP Configuration file check started...
    The NTP configuration file "/etc/ntp.conf" is available on all nodes
    NTP Configuration file check passed
    Checking daemon liveness...
    Check: Liveness for "ntpd"
    Node Name Running?
    falcen6b yes
    falcen6a yes
    Result: Liveness check passed for "ntpd"
    Checking NTP daemon command line for slewing option "-x"
    Check: NTP daemon command line
    Node Name Slewing Option Set?
    falcen6b yes
    falcen6a yes
    Result:
    NTP daemon slewing option check passed
    Checking NTP daemon's boot time configuration, in file "/etc/sysconfig/ntpd", for slewing option "-x"
    Check: NTP daemon's boot time configuration
    Node Name Slewing Option Set?
    falcen6b yes
    falcen6a yes
    Result:
    NTP daemon's boot time configuration check for slewing option passed
    NTP common Time Server Check started...
    NTP Time Server "133.243.236.19" is common to all nodes on which the NTP daemon is running
    NTP Time Server "133.243.236.18" is common to all nodes on which the NTP daemon is running
    NTP Time Server "210.173.160.86" is common to all nodes on which the NTP daemon is running
    NTP Time Server ".LOCL." is common to all nodes on which the NTP daemon is running
    Check of common NTP Time Server passed
    Clock time offset check from NTP Time Server started...
    Checking on nodes "[falcen6b, falcen6a]"...
    Check: Clock time offset from NTP Time Server
    Time Server: 133.243.236.19
    Time Offset Limit: 1000.0 msecs
    Node Name Time Offset Status
    falcen6b 15.332 passed
    falcen6a -1.503 passed
    Time Server "133.243.236.19" has time offsets that are within permissible limits for nodes "[falcen6b, falcen6a]".
    Time Server: 133.243.236.18
    Time Offset Limit: 1000.0 msecs
    Node Name Time Offset Status
    falcen6b 15.115 passed
    falcen6a -1.614 passed
    Time Server "133.243.236.18" has time offsets that are within permissible limits for nodes "[falcen6b, falcen6a]".
    Time Server: 210.173.160.86
    Time Offset Limit: 1000.0 msecs
    Node Name Time Offset Status
    falcen6b 15.219 passed
    falcen6a -1.527 passed
    Time Server "210.173.160.86" has time offsets that are within permissible limits for nodes "[falcen6b, falcen6a]".
    Time Server: .LOCL.
    Time Offset Limit: 1000.0 msecs
    Node Name Time Offset Status
    falcen6b 0.0 passed
    falcen6a 0.0 passed
    Time Server ".LOCL." has time offsets that are within permissible limits for nodes "[falcen6b, falcen6a]".
    Clock time offset check passed
    Result: Clock synchronization check using Network Time Protocol(NTP) passed
    Oracle Cluster Time Synchronization Services check passed
    Post-check for cluster services setup was unsuccessful on all the nodes.
    [oracle@falcen6a grid]$
    Any suggestions?
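
    With output this long, the failing checks are easy to miss among the passing ones. The sketch below filters a saved copy of the cluvfy output down to the PRVF message codes and the lines that report a failure. It is a plain text filter based on the wording seen above ("check failed", or a line ending in "failed"), not an official cluvfy report format.

```python
import re
from typing import List, Tuple

PRVF_CODE = re.compile(r"(PRVF-\d+)\s*:")
CHECK_FAILED = re.compile(r"\bcheck failed\b", re.IGNORECASE)


def cluvfy_failures(output: str) -> Tuple[List[str], List[str]]:
    """Return (PRVF codes, failure lines) found in cluvfy text output."""
    codes = []
    failed_lines = []
    for raw in output.splitlines():
        line = raw.strip()
        code_match = PRVF_CODE.match(line)
        if code_match:
            codes.append(code_match.group(1))
        elif CHECK_FAILED.search(line) or line.endswith(" failed"):
            # Summary lines and per-node table rows that end in "failed".
            failed_lines.append(line)
    return codes, failed_lines
```

    Applied to the output above, this would surface the cluster integrity, OCR, CRS, VIP, SCAN and ACFS failures together, along with PRVF-4193, PRVF-4195, PRVF-5316, PRVF-5056 and PRVF-5137.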

  • Errors found in the CSSD logs of a RAC node

    We found the error below in the CSSD log on one of our RAC nodes between 5:15 and 5:18 PM; after that the error disappeared. Does anyone have an idea what could have caused it?
    We did not find any errors in the alert log at that time.
    [    CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization failed (112bd2a70), timed out, start 17:13:51.041, duration 120009
    [    CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization prepare time: 2 ms
    [    CSSD]2009-07-19 17:15:51.233 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b67930) proc(112b680b0) pid(1049540) proto(10:2:1:1)
    [    CSSD]2009-07-19 17:15:51.268 [3600] >TRACE: Authorization failed (112bd4a10), timed out, start 17:13:51.268, duration 120003
    [    CSSD]2009-07-19 17:15:51.268 [3600] >TRACE: Authorization prepare time: 3 ms
    [    CSSD]2009-07-19 17:15:52.544 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b67930) proc(112b680b0) pid(786918) proto(10:2:1:1)
    [    CSSD]2009-07-19 17:15:53.297 [3600] >TRACE: Authorization failed (112c38af0), timed out, start 17:13:53.290, duration 120009
    [    CSSD]2009-07-19 17:15:53.297 [3600] >TRACE: Authorization prepare time: 3 ms
    [    CSSD]2009-07-19 17:15:53.317 [3600] >TRACE: Authorization failed (112d356f0), timed out, start 17:13:53.320, duration 120000
    [    CSSD]2009-07-19 17:15:53.317 [3600] >TRACE: Authorization prepare time: 2 ms
    [    CSSD]2009-07-19 17:16:02.342 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b932b0) proc(112b67d10) pid(1336252) proto(10:2:1:1)
    [    CSSD]2009-07-19 17:16:02.977 [3600] >TRACE: Authorization failed (112d04f70), timed out, start 17:14:02.978, duration 120001
    [    CSSD]2009-07-19 17:16:02.977 [3600] >TRACE: Authorization prepare time: 2 ms
    [    CSSD]2009-07-19 17:16:03.007 [3600] >TRACE: Authorization failed (112d38210), timed out, start 17:14:03.006, duration 120002
    [    CSSD]2009-07-19 17:16:03.007 [3600] >TRACE: Authorization prepare time: 2 ms
    [    CSSD]2009-07-19 17:16:10.447 [3600] >TRACE: Authorization failed (112bd7e30), timed out, start 17:14:10.441, duration 120007
    [    CSSD]2009-07-19 17:16:10.447 [3600] >TRACE: Authorization prepare time: 2 ms
    [    CSSD]2009-07-19 17:16:10.847 [3600] >TRACE: Authorization failed (112d3ee70), timed out, start 17:14:10.840, duration 120008
    [    CSSD]2009-07-19 17:16:10.847 [3600] >TRACE: Authorization prepare time: 2 ms
    Thanks,
    Mahi
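
    To see how many of these timeouts occurred and how long each one waited, the "Authorization failed" lines can be extracted from the log. The sketch below is a minimal parser for the line format shown above. The duration appears to be in milliseconds, which is an inference from the start time and the log timestamp being about two minutes apart, not something the log states.

```python
import re
from typing import List, Tuple

AUTH_FAILED = re.compile(
    r"\[\s*CSSD\](?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"Authorization failed \((?P<handle>[0-9a-f]+)\), timed out, "
    r"start (?P<start>[\d:.]+), duration (?P<duration>\d+)"
)


def auth_timeouts(log_text: str) -> List[Tuple[str, str, int]]:
    """Return (log timestamp, handle, duration) for each authorization timeout."""
    events = []
    for line in log_text.splitlines():
        match = AUTH_FAILED.search(line)
        if match:
            events.append(
                (match.group("stamp"), match.group("handle"), int(match.group("duration")))
            )
    return events
```

    Counting the events per minute, or comparing the start times, may show whether the timeouts were one short burst or a recurring pattern.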

    Check the metalink note:
    6996694-OCSSD.BIN CONSUMING 100% CPU AND ASM/DB HANGING
