RAC node failure: CRS cleanup failing

I have a three-node RAC database, 10.2.0.4, running on Windows Server 2008. I lost a hard drive on one of the servers, and it corrupted the mirror disk as well, so I have to rebuild the node. I am following the procedure in RAC on Windows: How to Cleanup When A Node Has Been Disconnected or The OS Rebuilt (Doc ID 742737.1), and I ran into problems when I tried to delete the listener and then moved on to CRS to delete the nodeapps for node3.
For the listener, I go into netca and the option to delete a listener is grayed out. When I run crs_stat I can still see ora.node3.lsnr there. Does this mean that I just need to update tnsnames.ora, or is there another place where the information would be held? I hate to delete it manually because I am afraid I won't get it cleaned out from everywhere. Any idea why that option would not be available?
My second issue is when I run this:
srvctl stop nodeapps -n node3
The nodeapps stop doesn't return any output, and then when I try to remove nodeapps it gives me PRKO-2112: Some or all node applications are not removed successfully on node.
I have searched Metalink for that error with no success; the document I found also says that you must stop nodeapps first. I have already deleted the node from the database and ASM and updated the appropriate inventory. I just need to finish the listener and CRS and update the inventory for CRS. Also, I noticed that the VIP for the failed node was reassigned to node2, and cluvfy shows that it has been released when I run the check. Would CRS give me errors on this if that was not the case?
I appreciate any help or guidance!

I wanted to post a follow-up in case anyone else is interested in the results.
I had tried adding the listener back to the .ora file on one of the remaining nodes and then deleting it, but that didn't work. Also, remove nodeapps continued to throw an error that it could not stop the listener or VIP for the failed node.
After a few days of reading, I decided to simply unregister the abandoned services from CRS. I made sure to back up the OCR (ocrconfig) before I ran crs_unregister, and I was able to successfully remove the failed node's listener and VIP services.
This eliminated my issue with netca; the node did not show up there anymore. I then went on to remove nodeapps, and it failed saying it could not find the VIP resource. I then ran olsnodes -n and used crssetup to remove the node entirely. Everything showed as removed, and I updated the CRS inventory to finish.
All looks good, and now I am working to rebuild the node and add it back in.
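
For anyone repeating this cleanup, it can help to list exactly which CRS resources still reference the failed node before unregistering anything. Below is a minimal Python sketch, assuming the default crs_stat output (blocks of NAME=, TYPE=, TARGET=, STATE= lines separated by blank lines) has been captured as text. The sample text and the helper names are illustrative only, not output from this cluster.

```python
def parse_crs_stat(text):
    """Parse captured `crs_stat` output into a list of dicts, one per resource."""
    resources, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current resource block.
            if current:
                resources.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key] = value
    if current:
        resources.append(current)
    return resources


def resources_for_node(resources, node):
    """Return names of resources that have `node` as a dot-separated name part."""
    return [r["NAME"] for r in resources if node in r.get("NAME", "").split(".")]


# Illustrative sample only; the states shown here are hypothetical.
SAMPLE = """\
NAME=ora.node2.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on node2

NAME=ora.node3.lsnr
TYPE=application
TARGET=ONLINE
STATE=OFFLINE

NAME=ora.node3.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on node2
"""

leftovers = resources_for_node(parse_crs_stat(SAMPLE), "node3")
print(leftovers)
```

The resulting list is only a checklist of candidates for crs_unregister; whether each one should be removed is still a judgment call, and the OCR backup comes first.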

Similar Messages

RAC node outage causes SOA Suite 10.1.3.4 BPEL failure

Using WebLogic 9.2 and SOA Suite 10.1.3.4. We use a 10g Oracle RAC (2 nodes); the WebLogic cluster has a multi data source of 2 pools, each pool pointing to a single node in the RAC, each pool deployed to the cluster, and the multi data source in load-balancing mode.
So the other night, one of the db nodes had a hardware failure ( ironically, with a remote monitoring / management card ). Annoying, but it should not have caused the BPEL servers to be in "FAILED NOT RESTARTABLE" status the next morning.
Jun 9, 2009 12:10:07 AM EDT> <Warning> <JDBC> <BEA-001129> <Received exception while creating connection for pool "esbaqds2": Io exception: The Network Adapter could not establish the connection>
SEVERE: Destroying JMSDequeuer failed
oracle.jms.AQjmsException: Connection has been administratively destroyed. Reconnect.
at oracle.jms.AQjmsSession.preClose(AQjmsSession.java:980)
at oracle.jms.AQjmsObject.close(AQjmsObject.java:409)
at oracle.jms.AQjmsSession.close(AQjmsSession.java:1020)
at oracle.tip.esb.server.dispatch.JMSDequeuer.destroy(JMSDequeuer.java:419)
at oracle.tip.esb.server.dispatch.JMSDequeuer.destroyWithoutUnsubscribing(JMSDequeuer.java:395)
at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:175)
at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
followed by a 2 GB log file containing 1.3 million iterations of the following within the next 10 minutes before the managed servers failed.
java.lang.NullPointerException
at java.lang.String.<init>(String.java:144)
at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:168)
at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
Both managed instances of the BPEL cluster failed, even though the 1st node of the Oracle RAC was still available.
Our 10.3 cluster, also using multi data sources to the same RAC for the OSB components, simply went on about its business using the remaining RAC node pool.
Seems to be a single point of failure...

We haven't changed the JDBC connection string yet, but we did run a test in the same environment while Oracle support considers the situation.
For the test, we simply shut down one node of the RAC and watched what happened. Within the space of a minute, the JDBC "Failed Reserve Request Count" was increasing by thousands on every refresh of the screen. We restarted the RAC node after 5 minutes, by which time the "Failed Reserve Request Count" was over 190,000.
The 2 BPEL managed servers remained in Running status, and each created a 660 MB log file within those 5 minutes. In the original outage, the nodes were down for about 15 minutes. Most of the logging is generated from within the oracle.tip.esb classes, not by the WebLogic classes. It looks like once the pool pointing to the downed RAC node becomes disabled, the Oracle BPEL code is still trying to use it, even though the multi data source JNDI name is the published lookup:
INFO: JMSDequeuer::createConnection - AQ Topics
java.sql.SQLException: weblogic.common.ResourceException:
esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
at weblogic.jdbc.common.internal.JDBCUtil.wrapAndThrowResourceException(JDBCUtil.java:250)
at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:348)
at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:364)
at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:559)
at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
Caused by: weblogic.common.ResourceException:
esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
at weblogic.jdbc.common.internal.MultiPool.searchLoadBalance(MultiPool.java:331)
at weblogic.jdbc.common.internal.MultiPool.findPool(MultiPool.java:202)
at weblogic.jdbc.common.internal.ConnectionPoolManager.reserve(ConnectionPoolManager.java:77)
at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:346)
... 11 more
SEVERE: Failed to process deferred message
oracle.tip.esb.server.dispatch.QueueHandlerException: Error creating "weblogic.common.ResourceException: No good connections available."
at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:661)
at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
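
The log volume described above (hundreds of MB within minutes) is what a retry loop with no delay tends to produce when every attempt fails immediately. The following Python sketch is not the ESB dequeuer's code and makes no claim about how it works internally. It only illustrates, under that assumption, how capped exponential backoff bounds the number of attempts (and log lines) during an outage. `connect` is a hypothetical callable and the delay values are arbitrary.

```python
import time


def backoff_delays(base=0.5, cap=30.0, attempts=8):
    """Yield capped exponential delays: base, 2*base, 4*base, ... up to cap."""
    for n in range(attempts):
        yield min(cap, base * (2 ** n))


def connect_with_backoff(connect, base=0.5, cap=30.0, attempts=8, sleep=time.sleep):
    """Call connect() until it succeeds, sleeping between failed attempts.

    `connect` is a hypothetical zero-argument callable that returns a
    connection or raises. The last error is re-raised once attempts run out.
    """
    last_error = None
    for delay in backoff_delays(base, cap, attempts):
        try:
            return connect()
        except Exception as exc:  # real code would catch the driver's error type
            last_error = exc
            sleep(delay)
    raise last_error or RuntimeError("no connection attempts were made")
```

With the defaults, eight failed attempts are spread over about a minute and a half, rather than thousands per second.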

Unable to bring up crs on rac node 2 after server reboot

Hi,
We have a 2-node RAC architecture. We are only able to bring up node 1 in the cluster, whereas node 2 is failing. Here is the history:
1. After the server reboot, the CRS resources on node 2 weren't starting up, apart from OHAS.
2. We stopped CRS on both nodes again, tried bringing up CRS on node 2 first, and succeeded. But now node 1 wasn't coming up.
3. We again brought down CRS on both nodes, tried bringing up CRS on node 1, and succeeded, but ASM wasn't showing the disk groups. So we changed asm_diskstring in the pfile from ORCL* to /dev/oracleasm/disks, and lsdg in ASM works now. We then started all the instances from node 1. Apart from this, CRS on node 2 still wasn't starting. In the alert log I saw "CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds;". But we were able to query the voting disks initially. What has gone wrong now?
./crsctl status res -t -init
NAME           TARGET STATE        SERVER                   STATE_DETAILS
Cluster Resources
ora.asm
      1        OFFLINE OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE OFFLINE
ora.crf
      1        ONLINE ONLINE       kusmnd0r
ora.crsd
      1        ONLINE OFFLINE
ora.cssd
      1        ONLINE OFFLINE                               STARTING
ora.cssdmonitor
      1        ONLINE ONLINE       kusmnd0r
ora.ctssd
      1        ONLINE OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE OFFLINE
ora.gipcd
      1        ONLINE ONLINE       kusmnd0r
ora.gpnpd
      1        ONLINE ONLINE       kusmnd0r
ora.mdnsd
      1        ONLINE ONLINE       kusmnd0r
This is the history of activities. Could someone please shed some light on this?
Thanks,
Anirban.
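
When reading a listing like the one above, the interesting rows are those where TARGET is ONLINE but STATE is not. Here is a small Python sketch that pulls them out of captured `crsctl status res -t -init` text, assuming the two-line layout shown in this post (a resource name line followed by an instance line). The function name is made up for illustration, and the sample is an excerpt of the listing above.

```python
def stalled_resources(text):
    """Return (name, state_details) pairs where TARGET is ONLINE but STATE is not."""
    stalled, name = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("ora."):
            # Resource name line, e.g. "ora.cssd".
            name = fields[0]
        elif name and fields[0].isdigit() and len(fields) >= 3:
            # Instance line: id, TARGET, STATE, then optional SERVER / STATE_DETAILS.
            target, state = fields[1], fields[2]
            if target == "ONLINE" and state != "ONLINE":
                stalled.append((name, " ".join(fields[3:])))
    return stalled


# Excerpt of the listing posted above.
LISTING = """\
NAME           TARGET STATE        SERVER                   STATE_DETAILS
Cluster Resources
ora.asm
      1        OFFLINE OFFLINE
ora.crf
      1        ONLINE ONLINE       kusmnd0r
ora.crsd
      1        ONLINE OFFLINE
ora.cssd
      1        ONLINE OFFLINE                               STARTING
"""

for name, details in stalled_resources(LISTING):
    print(name, details)
```

In this excerpt ora.cssd is reported with the detail STARTING, which would be consistent with the CRS-1714 message, since CSS needs to discover the voting files before it can come online.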

It is on a raw device.
Healthy node::
ls -ltrh /dev/vote*
brw-rw---- 1 crsdwqa dbadwqa 120, 1057 Nov 6 11:32 /dev/vote3
brw-rw---- 1 crsdwqa dbadwqa 120, 1025 Nov 6 11:32 /dev/vote1
brw-rw---- 1 crsdwqa dbadwqa 120, 1041 Nov 6 11:32 /dev/vote2
Affected Node::
ls -ltrh /dev/vote*
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1025 Nov 4 12:06 /dev/vote1
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1041 Nov 4 12:06 /dev/vote2
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1057 Nov 5 04:42 /dev/vote3
Regards,
Anirban.
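
The two listings above can be compared mechanically rather than by eye. The following Python sketch parses `ls -l` lines for device files and reports per-device differences in mode, owner, group, and major/minor numbers. The helper names are hypothetical, and the sample data is copied from the two listings in this post.

```python
FIELDS = ("mode", "owner", "group", "major", "minor")


def parse_ls(text):
    """Map device path -> (mode, owner, group, major, minor) for block/char devices."""
    devices = {}
    for line in text.splitlines():
        f = line.split()
        # Device lines start with 'b' or 'c' and carry "major, minor" instead of a size.
        if len(f) >= 6 and f[0][0] in "bc":
            devices[f[-1]] = (f[0], f[2], f[3], f[4].rstrip(","), f[5])
    return devices


def diff_devices(healthy, affected):
    """Return (path, field, healthy_value, affected_value) for every mismatch."""
    diffs = []
    for path in sorted(healthy.keys() | affected.keys()):
        a, b = healthy.get(path), affected.get(path)
        if a is None or b is None:
            diffs.append((path, "present", a is not None, b is not None))
            continue
        for field, x, y in zip(FIELDS, a, b):
            if x != y:
                diffs.append((path, field, x, y))
    return diffs


HEALTHY = """\
brw-rw---- 1 crsdwqa dbadwqa 120, 1057 Nov 6 11:32 /dev/vote3
brw-rw---- 1 crsdwqa dbadwqa 120, 1025 Nov 6 11:32 /dev/vote1
brw-rw---- 1 crsdwqa dbadwqa 120, 1041 Nov 6 11:32 /dev/vote2
"""
AFFECTED = """\
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1025 Nov 4 12:06 /dev/vote1
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1041 Nov 4 12:06 /dev/vote2
brw-rw-r-- 1 crsdwqa dbadwqa 120, 1057 Nov 5 04:42 /dev/vote3
"""

for diff in diff_devices(parse_ls(HEALTHY), parse_ls(AFFECTED)):
    print(diff)
```

On this data the owner, group, and device numbers match on both nodes; only the mode bits (and the timestamps, which the sketch ignores) differ. Whether that difference is what causes CRS-1714 is not established by the listings alone.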

Exception while failing over to 2nd RAC Node

We are using WebLogic 10.3.4. Our setup is a web application (a Tapestry front-end web UI) with an EJB 2.1 back end talking to the Oracle database. The EJBs are CMP. Our product was always standalone, and it wasn't until this release that we needed to make it work with RAC. To get this to work, we followed the model of having a multi data source with data sources pointing to our RAC nodes. We use two types of data sources, persistent and non-persistent, and we are using the Oracle thin driver, non-XA, for RAC Service Instances, supporting global transactions.
When we fail over to the 2nd node we get a nasty exception in our GUI, but after logging out and logging back in we are fine.
My question: I assumed I shouldn't have to restart our web application and that it should have stayed up. Or is there something wrong with our setup?
Thanks,
Ian

Showing us the exception and/or the error messages at the server might help...
Note that failing over does not save any ongoing connection or transaction that
had been to the dead RAC node... Does your web-app get-use-close JDBC
connections on a per-user-invoke basis, or does it hold onto connections?
Joe
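
To make the get-use-close question above concrete, here is a language-neutral illustration in Python (the application in this thread is Java/EJB, so this is not its code). It assumes a pool object with `reserve` and `release` methods; `FakePool`, `borrowed`, and `handle_request` are all hypothetical names used only for this sketch.

```python
from contextlib import contextmanager


class FakePool:
    """Stand-in for a data source: hands out connection ids and tracks open ones."""

    def __init__(self):
        self.open = set()
        self._next = 0

    def reserve(self):
        self._next += 1
        self.open.add(self._next)
        return self._next

    def release(self, conn):
        self.open.discard(conn)


@contextmanager
def borrowed(pool):
    """Get a connection, hand it to the caller, and always give it back."""
    conn = pool.reserve()
    try:
        yield conn
    finally:
        pool.release(conn)


def handle_request(pool, work):
    """One user invocation: get, use, close. Nothing is held between calls."""
    with borrowed(pool) as conn:
        return work(conn)
```

With this pattern a connection that died with its RAC node affects only the invocation that was using it; the next invocation asks the pool again. Code that caches a connection across invocations keeps the dead one until something forces it to let go, such as logging out and back in.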

Rac node failed how do you bring it back up?

Example: If there are 3 RAC nodes and one becomes unavailable/fails, how do you bring it back up?

There are typically two basic reasons why a RAC node will go down.
A cluster issue, causing the node to be evicted. This usually means the node lost access to the cluster storage (e.g. no access to voting disks), or lost access to the Interconnect.
An o/s issue causing the node to fail. E.g. a kernel panic due to a page swap and memory not syncing, or a soft CPU lockup, etc.
You need to determine why it went down, and what is needed to enable it to successfully join the cluster again.

Dbconsole failed to start on one RAC node

Hi
I have 2 RAC nodes (RHEL 4) on 10.2.0.1. On one node dbconsole is running, and on the other I get the following. Earlier, dbconsole used to run perfectly fine on both nodes. I would appreciate any suggestions to rectify this problem.
Regards
oracle@rac01<18>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> emctl start dbconsole
TZ set to Canada/Newfoundland
Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.
http://rac01:1158/em/console/aboutApplication
Agent Version : 10.1.0.4.1
OMS Version : Unknown
Protocol Version : 10.1.0.2.0
Agent Home : /u01/app/oracle/product/10.2/db_1/rac01_RACDB1
Agent binaries : /u01/app/oracle/product/10.2/db_1
Agent Process ID : 23329
Parent Process ID : 21132
Agent URL : http://rac01:3938/emd/main
Started at : 2007-07-25 11:37:32
Started by user : oracle
Last Reload : 2007-07-25 11:37:32
Last successful upload : (none)
Last attempted upload : (none)
Total Megabytes of XML files uploaded so far : 0.00
Number of XML files pending upload : 371
Size of XML files pending upload(MB) : 7.66
Available disk space on upload filesystem : 44.78%
Agent is already started. Will restart the agent
Stopping agent ... stopped.
Starting Oracle Enterprise Manager 10g Database Control ............................................................................................. failed.
Logs are generated in directory /u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log
oracle@rac01<19>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log>
ON OTHER NODE:
oracle@rac02<2>:/u01/app/oracle> emctl start dbconsole
TZ set to Canada/Newfoundland
Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.
http://rac01:1158/em/console/aboutApplication
Starting Oracle Enterprise Manager 10g Database Control .................................... started.
Logs are generated in directory /u01/app/oracle/product/10.2/db_1/rac02_RACDB2/sysman/log
oracle@rac02<3>:/u01/app/oracle>

Thanks for your time and reply .
Well, here is what I got; I couldn't make anything out from it.
Regards
oracle@rac01<19>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> ls -lart
total 13500
drwxr----- 7 oracle dba 4096 Jul 14 10:48 ..
-rw-r----- 1 oracle dba 0 Jul 14 10:48 emdctl.log
drwxrwx--- 2 oracle dba 4096 Jul 14 10:54 nmcRACDB11521
-rw-r----- 1 oracle dba 4655792 Jul 24 23:01 emoms.trc
-rw-r----- 1 oracle dba 4655792 Jul 24 23:01 emoms.log
drwxr----- 3 oracle dba 4096 Jul 25 11:35 .
-rw-r----- 1 oracle dba 4096 Jul 25 12:05 emdb.nohup.lr
-rw-r----- 1 oracle dba 1074 Jul 25 12:05 emagent_perl.trc
-rw-r----- 1 oracle dba 1731 Jul 25 12:06 emagent.log
-rw-r----- 1 oracle dba 1080 Jul 25 12:07 emagentfetchlet.trc
-rw-r----- 1 oracle dba 1080 Jul 25 12:07 emagentfetchlet.log
-rw-r----- 1 oracle dba 81089 Jul 25 13:28 emdctl.trc
-rw-r----- 1 oracle dba 3309143 Jul 25 13:28 emdb.nohup
-rw-r----- 1 oracle dba 1044518 Jul 25 13:28 emagent.trc
oracle@rac01<20>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> cat emagent.log
2007-07-14 10:50:44 Thread-3086936288 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
2007-07-14 10:51:16 Thread-3086936288 EMAgent started successfully (00702)
2007-07-14 14:38:21 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
2007-07-14 14:39:00 Thread-3086935744 EMAgent started successfully (00702)
2007-07-24 07:05:06 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
2007-07-24 07:07:11 Thread-3086935744 target {+ASM1_rac01, osm_instance} is broken: cannot compute dynamic properties in time. (00155)
2007-07-24 07:07:14 Thread-3086935744 EMAgent started successfully (00702)
2007-07-24 12:06:27 Thread-3086935744 EMAgent normal shutdown (00703)
2007-07-24 12:08:26 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
2007-07-24 12:08:51 Thread-3086935744 EMAgent started successfully (00702)
2007-07-25 11:35:35 Thread-3086935744 EMAgent normal shutdown (00703)
2007-07-25 11:37:32 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
2007-07-25 11:39:29 Thread-3086935744 target {+ASM1_rac01, osm_instance} is broken: cannot compute dynamic properties in time. (00155)
2007-07-25 11:39:30 Thread-3086935744 EMAgent started successfully (00702)
2007-07-25 12:03:36 Thread-3086935744 EMAgent normal shutdown (00703)
2007-07-25 12:05:15 Thread-3086935744 Starting Agent 10.1.0.4.1 from /u01/app/oracle/product/10.2/db_1 (00701)
2007-07-25 12:06:23 Thread-3086935744 target {+ASM1_rac01, osm_instance} is broken: cannot compute dynamic properties in time. (00155)
2007-07-25 12:06:24 Thread-3086935744 EMAgent started successfully (00702)
oracle@rac01<21>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> cat emagentfetchlet.log
2007-07-14 11:01:44,208 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-14 14:40:29,096 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-24 07:10:44,123 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-24 12:12:48,187 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-25 11:41:25,628 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-25 12:07:30,335 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
oracle@rac01<22>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log>
oracle@rac01<22>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> tail -40 emagentfetchlet.trc
2007-07-14 11:01:44,208 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-14 14:40:29,096 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-24 07:10:44,123 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-24 12:12:48,187 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-25 11:41:25,628 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
2007-07-25 12:07:30,335 [main] WARN track.OracleInventory collectInventory.439 - ECM: The inventory location file for the special Windows NT case does not exist or is unreadable.
oracle@rac01<25>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> tail -10 emdctl.trc
2007-07-25 13:01:02 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:04:41 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:07:12 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:10:50 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:14:32 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:18:09 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:20:40 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:24:27 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:28:06 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:31:43 Thread-3086935744 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
oracle@rac01<28>:/u01/app/oracle/product/10.2/db_1/rac01_RACDB1/sysman/log> tail -10 emagent.trc
2007-07-25 13:31:44 Thread-43162528 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:31:44 Thread-43162528 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
2007-07-25 13:32:14 Thread-74791840 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:32:14 Thread-74791840 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
2007-07-25 13:32:14 Thread-74791840 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:32:14 Thread-74791840 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
2007-07-25 13:32:44 Thread-74791840 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:32:44 Thread-74791840 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
2007-07-25 13:32:44 Thread-74791840 WARN http: snmehl_connect: connect failed to (rac01:1158): Connection refused (error = 111)
2007-07-25 13:32:44 Thread-74791840 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://rac01:1158/em/upload/: retStatus=-32
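
The traces above show the agent being refused on rac01:1158, which suggests that nothing is listening on the console port on that node. A quick way to confirm this independently of emctl is a plain TCP connect. The Python sketch below uses only the standard library; the function name is made up, and the host and port in the usage comment are the ones that appear in the traces.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Example use on the affected node:
#   port_open("rac01", 1158)
```

A False result only confirms the symptom (no listener on the port); the reason the console process fails to start would still have to come from emdb.nohup or emoms.log.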

RAC - Oracle Grid Infrastructure configure failed

Hi, I am trying to install a 2-node RAC on Oracle VMs. Before the installation, the pre-installation check flagged a few issues, which were resolved (e.g. user equivalence). During the Grid installation itself, it failed at the step "Configure Oracle Grid Infrastructure for a cluster". After that failure the subsequent steps failed too; I asked OUI to ignore them and then ran both post-installation scripts. I then ran the post crsinst check, which failed. Below is the output of the root.sh script, the post crsinst check, and other checks.
[root@bsfrac01 grid]# sh root.sh
Running Oracle 11g root.sh script...
The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/app/11.2/grid
Enter the full pathname of the local bin directory: [usr/local/bin]:
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2011-02-13 00:11:55: Parsing the host name
2011-02-13 00:11:55: Checking for super user privileges
2011-02-13 00:11:55: User has super user privileges
Using configuration parameter file: /u01/app/11.2/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
root wallet
root wallet cert
root cert export
peer wallet
profile reader wallet
pa wallet
peer wallet keys
pa wallet keys
peer cert request
pa cert request
peer cert
pa cert
peer root cert TP
profile reader root cert TP
pa root cert TP
peer pa cert TP
pa peer cert TP
profile reader pa cert TP
profile reader peer cert TP
peer user cert
pa user cert
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-2672: Attempting to start 'ora.gipcd' on 'bsfrac01'
CRS-2672: Attempting to start 'ora.mdnsd' on 'bsfrac01'
CRS-2676: Start of 'ora.mdnsd' on 'bsfrac01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'bsfrac01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'bsfrac01'
CRS-2676: Start of 'ora.gpnpd' on 'bsfrac01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'bsfrac01'
CRS-2676: Start of 'ora.cssdmonitor' on 'bsfrac01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'bsfrac01'
CRS-2672: Attempting to start 'ora.diskmon' on 'bsfrac01'
CRS-2676: Start of 'ora.diskmon' on 'bsfrac01' succeeded
CRS-2676: Start of 'ora.cssd' on 'bsfrac01' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'bsfrac01'
CRS-2676: Start of 'ora.ctssd' on 'bsfrac01' succeeded
ASM created and started successfully.
DiskGroup DATA1 created successfully.
clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-2672: Attempting to start 'ora.crsd' on 'bsfrac01'
CRS-2676: Start of 'ora.crsd' on 'bsfrac01' succeeded
CRS-4256: Updating the profile
Successful addition of voting disk 0ea2052d8a714fd7bf46d9d5c785483e.
Successfully replaced voting disk group with +DATA1.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
## STATE File Universal Id File Name Disk group
1. ONLINE 0ea2052d8a714fd7bf46d9d5c785483e (ORCL:DISK1) [DATA1]
Located 1 voting disk(s).
*Failed to rmtcopy "/tmp/filekRIMbG" to "/u01/app/11.2/grid/gpnp/manifest.txt" for nodes {bsfrac01,bsfrac02}, rc=256*
*Failed to rmtcopy "/u01/app/11.2/grid/gpnp/bsfrac01/profiles/peer/profile.xml" to "/u01/app/11.2/grid/gpnp/profiles/peer/profile.xml" for nodes {bsfrac01,bsfrac02}, rc=256*
rmtcopy aborted
Failed to promote local gpnp setup to other cluster nodes
CRS-2673: Attempting to stop 'ora.crsd' on 'bsfrac01'
CRS-2677: Stop of 'ora.crsd' on 'bsfrac01' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'bsfrac01'
CRS-2677: Stop of 'ora.asm' on 'bsfrac01' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'bsfrac01'
CRS-2677: Stop of 'ora.ctssd' on 'bsfrac01' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'bsfrac01'
CRS-2677: Stop of 'ora.cssdmonitor' on 'bsfrac01' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'bsfrac01'
CRS-2677: Stop of 'ora.cssd' on 'bsfrac01' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'bsfrac01'
CRS-2677: Stop of 'ora.gpnpd' on 'bsfrac01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'bsfrac01'
CRS-2677: Stop of 'ora.gipcd' on 'bsfrac01' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'bsfrac01'
CRS-2677: Stop of 'ora.mdnsd' on 'bsfrac01' succeeded
Initial cluster configuration failed. See /u01/app/11.2/grid/cfgtoollogs/crsconfig/rootcrs_bsfrac01.log for details
[root@bsfrac01 grid]#
[oracle@bsfrac01 bin]$ ./cluvfy stage -post crsinst -n bsfrac01,bsfrac02 -verbose
Performing post-checks for cluster services setup
Checking node reachability...
Check: Node reachability from node "bsfrac01"
Destination Node Reachable?
bsfrac01 yes
bsfrac02 yes
Result: Node reachability check passed from node "bsfrac01"
Checking user equivalence...
Check: User equivalence for user "oracle"
Node Name Comment
bsfrac01 passed
bsfrac02 passed
Result: User equivalence check passed for user "oracle"
ERROR:
PRKC-1094 : Failed to retrieve the active version of crs: {0}
Checking time zone consistency...
Time zone consistency check passed.
ERROR:
PRKC-1093 : Failed to retrieve the version of crs software on node "java.io.IOException: /u01/app/11.2.0/grid/bin/crsctl: not found
" : {1}
ERROR:
Cluster manager integrity check failed
PRVF-5434 : Cannot identify the current CRS software version
UDev attributes check for OCR locations started...
Result: UDev attributes check passed for OCR locations
UDev attributes check for Voting Disk locations started...
ERROR:
PRVF-5197 : Failed to retrieve voting disk locationsPRKC-1092 : Failed to retrieve the location of votedisks: java.io.IOException: /u01/app/11.2.0/grid/bin/crsctl: not found
Result: UDev attributes check failed for Voting Disk locations
Check default user file creation mask
Node Name Available Required Comment
bsfrac01 0022 0022 passed
bsfrac02 0022 0022 passed
Result: Default user file creation mask check passed
Checking cluster integrity...
Node Name
bsfrac01
Cluster integrity check failed This check did not run on the following node(s):
bsfrac02
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ERROR:
PRKC-1094 : Failed to retrieve the active version of crs: {0}
ERROR:
PRVF-5300 : Failed to retrieve active version for CRS on this node
OCR integrity check failed
Checking CRS integrity...
ERROR:
PRKC-1094 : Failed to retrieve the active version of crs: {0}
ERROR:
PRVF-5300 : Failed to retrieve active version for CRS on this node
CRS integrity check failed
OCR detected on ASM. Running ACFS Integrity checks...
Starting check to see if ASM is running on all cluster nodes...
PRVF-5137 : Failure while checking ASM status on node "bsfrac01"
PRVF-5137 : Failure while checking ASM status on node "bsfrac02"
Starting Disk Groups check to see if at least one Disk Group configured...
PRVF-5112 : An Exception occurred while checking for Disk Groups
PRVF-5114 : Disk Group check failed. No Disk Groups configured
Task ACFS Integrity check failed
Checking Oracle Cluster Voting Disk configuration...
ERROR:
PRKC-1093 : Failed to retrieve the version of crs software on node "java.io.IOException: /u01/app/11.2.0/grid/bin/crsctl: not found
" : {1}
ERROR:
PRVF-5434 : Cannot identify the current CRS software version
PRVF-5431 : Oracle Cluster Voting Disk configuration check failed
Checking to make sure user "oracle" is not in "root" group
Node Name Status Comment
bsfrac01 does not exist passed
bsfrac02 does not exist passed
Result: User "oracle" is not part of "root" group. Check passed
Post-check for cluster services setup was unsuccessful on all the nodes.
[oracle@bsfrac01 bin]$ /u01/app/11.2/grid/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 408
Available space (kbytes) : 261712
ID : 1671840043
Device/File Name : +DATA1
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user
ASM looks to be up and running..
[oracle@bsfrac01 bin]$ /usr/sbin/oracleasm listdisks
DISK1
DISK2
DISK3
DISK4
DISK5
DISK6
[oracle@bsfrac01 bin]$ /usr/sbin/oracleasm status
Checking if ASM is loaded: yes
Checking if /dev/oracleasm is mounted: yes
Please help.

Before installing, did you configure the private interconnect on the same network adapter on both nodes?
For example, if the private interconnect is on eth0 on node 1, it should also be on eth0 on node 2.
For the private interconnect, use the host-only option on both nodes in the network configuration page of VMware or VirtualBox.
The public network can be bridged.
Moreover, if you are installing on a laptop, it is better to configure SSH through the OUI rather than manually, as it saves time.
The private and public networks should not share the same IP address range. For example, if the public addresses look like 192.168.2.222/255.255.255.0, the private addresses have to be in a different range, such as 10.10.1.2/255.0.0.0 (this is just an example).
NTP also has to be configured.
In any case, try installing Oracle RAC on VirtualBox by following the steps on the website below; they are pretty straightforward.
http://www.oracle-base.com/articles/11g/OracleDB11gR2RACInstallationOnOEL5UsingVirtualBox.php
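The advice that the public and private networks should not share an address range can be checked mechanically. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `networks_overlap` is made up for this illustration, and the two addresses are the example values from the reply above, not values from the poster's cluster.

```python
import ipaddress


def networks_overlap(addr_a, addr_b):
    """Return True if the two 'address/netmask' strings fall in overlapping subnets."""
    net_a = ipaddress.ip_interface(addr_a).network
    net_b = ipaddress.ip_interface(addr_b).network
    return net_a.overlaps(net_b)


# Example values taken from the reply above (illustrative only).
public = "192.168.2.222/255.255.255.0"
private = "10.10.1.2/255.0.0.0"

print(networks_overlap(public, private))  # False: the ranges are distinct
```

If the function returns True for the addresses assigned to the public and private interfaces, the two networks share a range and one of them would need to be renumbered.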

RAC node restarting!

Hi,
One of our RAC environments keeps restarting.
I've disabled init.cssd, init.crs and init.evmd in /etc/inittab in order to check the logs.
This is the situation:
crsd.log:
2009-02-04 00:09:00.118: [ COMMCRS][9]clsc_connect: (8000000100318640) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
2009-02-04 00:09:00.132: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2009-02-04 00:09:00.134: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2009-02-04 00:09:08.016: [    CRSD][1]32Daemon Version: 10.2.0.2.0 Active Version: 10.2.0.2.0
2009-02-04 00:09:08.016: [    CRSD][1]32Active Version and Software Version are same
2009-02-04 00:09:08.017: [ CRSMAIN][1]32Initializing OCR
2009-02-04 00:09:08.037: [ OCRRAW][1]proprioo: for disk 0 (/dev/rdsk/ora_ocr_raw), id match (1), my id set (752560621,1028247821) total id sets (1), 1st set
(752560621,1028247821), 2nd set (0,0) my votes (2), total votes (2)
2009-02-04 00:09:08.140: [ CSSCLNT][24]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
ocssd.log:
[    CSSD]2009-02-03 21:52:08.651 [9] >USER: clssnmHandleUpdate: NODE 1 (node1l) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2009-02-03 21:52:08.651 [9] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[    CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmWaitForAcks: done, msg type(15)
[    CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmDoSyncUpdate: Sync Complete!
[    CSSD]2009-02-03 21:52:08.722 [1] >USER: NMEVENT_SUSPEND [00][00][00][00]
[    CSSD]2009-02-03 21:52:08.724 [17] >TRACE: clssgmReconfigThread: started for reconfig (1)
[    CSSD]2009-02-03 21:52:08.749 [17] >USER: NMEVENT_RECONFIG [00][00][00][02]
[    CSSD]2009-02-03 21:52:08.749 [17] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 1
[    CSSD]2009-02-03 21:52:08.751 [13] >TRACE: clssgmPeerListener: connects done (1/1)
[    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmEstablishMasterNode: MASTER for 1 is node(1) birth(1)
[    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes
[    CSSD]CLSS-3001: local node number 1, master node number 1
[    CSSD]2009-02-03 21:52:08.753 [17] >TRACE: clssgmReconfigThread: completed for reconfig(1), with status(1)
[    CSSD]2009-02-03 21:52:08.863 [10] >TRACE: clssgmClientConnectMsg: Connect from con(80000001008fd2a0) proc(8000000100ae26a8) pid() proto(10:2:1:1)
[    CSSD]2009-02-03 21:52:08.864 [10] >TRACE: clssgmClientConnectMsg: Connect from con(8000000100ae0128) proc(8000000100ae2a10) pid() proto(10:2:1:1) from con(8000000100aa32c0) proc(8000000100aa5b90) pid() proto(10:2:1:1)
alertlog:
[cssd(2535)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 .
2009-02-03 23:55:20.821
[cssd(2575)]CRS-1605:CSSD voting file is online: /dev/rdsk/ora_voting_raw. Details in /work/crs/product/10.2/crs/log/lourmel/cssd/ocssd.log.
2009-02-03 23:55:28.376
evmd.log:
Oracle Database 10g CRS Release 10.2.0.2.0 Production Copyright 1996, 2004, Oracle. All rights reserved
2009-02-04 00:08:58.331: [    EVMD][1]32EVMD waiting for CSS to be ready err = 3
2009-02-04 00:08:59.939: [ COMMCRS][9]clsc_connect: (800000010007d658) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
2009-02-04 00:08:59.946: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2009-02-04 00:08:59.948: [    EVMD][1]32EVMD waiting for CSS to be ready err = 3
2009-02-04 00:09:07.596: [ CSSCLNT][1]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
syslog:
Feb 4 00:08:41 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
Feb 4 00:08:45 lourmel sfd[2153]: starting the daemon.
Feb 4 00:08:45 lourmel su: + tty?? root-orac
Feb 4 00:08:45 lourmel krsd[2152]: Delay time is 300 seconds
Feb 4 00:08:43 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
Feb 4 00:08:52 lourmel above message repeats 2 times
Feb 4 00:08:52 lourmel syslog: Cluster Ready Services completed waiting on dependencies.
Feb 4 00:08:53 lourmel syslog: Running CRSD with TZ =
When I ran crs_stat (before the restart), I got this message:
ORA-0184: Cannot communicate with CRS
crsctl check crs gives us:
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
As I said before, the machine keeps restarting.
Does anyone have an idea? Please help.

Dear All,
I recently upgraded a few RAC setups to Oracle 10g Patchset 3 (10.2.0.4) on Linux servers.
In one of the RAC setups, I found that the servers were rebooting daily. The same setup had been working fine, and the problem started only after applying the patchset. I checked all the logs and found nothing relevant.
Then I checked what had been added with this patchset.
The most interesting finding: Oracle added a new daemon, oprocd.
# ps -efl | grep oprocd
4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -hsi 5:10:50:75:90 -f
These are the interesting points about the line above:
1. The process runs as the root user.
2. It runs with the highest priority (-40).
3. It probes every second (-t 1000).
4. It waits up to 500 milliseconds for a CPU response (-m 500 means the margin time is 500 milliseconds).
5. It runs in fatal mode (-f).
From these points I conclude the following: the daemon probes the CPU every second and waits for a response within 500 milliseconds. If it gets no response from the CPU within 500 milliseconds, it assumes the CPU is hung and reboots the machine. The operating system does not get enough time to write the system logs before the server reboots.
So the solution is to increase the margin time from 500 milliseconds to 10 seconds.
These are the steps to increase the margin time.
Please remember: the modification requires downtime, and you need to stop the cluster services on all member nodes.
1. Stop the CRS process
#crsctl stop crs
#<CRS_HOME>/bin/oprocd stop
2. Ensure that the Clusterware stack is down
#ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
This should return no processes.
3. From one node of the cluster, change the value of the "diagwait" parameter to 13 by issuing the command as root:
#crsctl set css diagwait 13 -force
4. Check if diagwait is successfully set.
#crsctl get css diagwait
5. Restart the Oracle Clusterware on all the nodes by executing:
#crsctl start crs
(Note: if you face any problem restarting the CRS services, ASM and the database, you can reboot the nodes. The cluster and database will come up automatically through the init startup scripts.)
6. The oprocd daemon process will now show -m 10000
# ps -efl| grep oprocd
# 4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f
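The two `ps` lines in this post differ only in the value after `-m`. As a rough illustration (not part of the original procedure), the sketch below pulls the `-t` and `-m` values out of such a line so they can be compared before and after the change. The function name `parse_oprocd_args` and the dictionary keys are made up for this example; the sample strings are the two `ps` lines quoted in the post.

```python
import re


def parse_oprocd_args(ps_line):
    """Extract the oprocd -t (probe interval) and -m (margin) values, in milliseconds."""
    values = {}
    for flag, name in (("-t", "interval_ms"), ("-m", "margin_ms")):
        # Look for the flag as a separate word followed by a number.
        match = re.search(r"\s" + re.escape(flag) + r"\s+(\d+)\b", ps_line)
        if match:
            values[name] = int(match.group(1))
    return values


BEFORE = ("4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 "
          "/opt/oracle/product/10.2.0/crs/bin/oprocd.bin run "
          "-t 1000 -m 500 -hsi 5:10:50:75:90 -f")
AFTER = ("4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 "
         "/opt/oracle/product/10.2.0/crs/bin/oprocd.bin run "
         "-t 1000 -m 10000 -hsi 5:10:50:75:90 -f")

print(parse_oprocd_args(BEFORE))  # {'interval_ms': 1000, 'margin_ms': 500}
print(parse_oprocd_args(AFTER))   # {'interval_ms': 1000, 'margin_ms': 10000}
```

Running this against the live `ps` output on each node would be one way to confirm that every node picked up the new margin after the restart.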
Rollback procedure:
If you need to unset the diagwait value for any reason:
#crsctl unset css diagwait
I am confident the abnormal RAC node restart problem will be solved with this workaround.
Regards,
Sumit
Bangalore,India

RAC node is not starting

I have a 2-node RAC in two virtual machines. The ASM instance on node 2 does not start after a reboot, but node 1 is working fine.
I got the output below from node 2.
srvctl status asm
PRCR-1070 : Failed to check if resource ora.asm is registered
Cannot communicate with crsd
alertlog
[/u01/app/11.2.0/grid/bin/cssdagent(3782)]CRS-5818:Aborted command 'start for resource: ora.cssd 1 1' for resource 'ora.cssd'. Details at (:CRSAGF00113:) in /u01/app/11.2.0/grid/log/ol5-112-rac2/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2015-03-13 16:38:23.671
[ohasd(3315)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cssd'. Details at (:CRSPE00111:) in /u01/app/11.2.0/grid/log/ol5-112-rac2/ohasd/ohasd.log.
2015-03-13 16:38:24.264
[ohasd(3315)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'ol5-112-rac2'.
2015-03-13 16:38:35.915
[cssd(4326)]CRS-1713:CSSD daemon is started in clustered mode
2015-03-13 16:38:41.195
[cssd(4326)]CRS-1707:Lease acquisition for node ol5-112-rac2 number 2 completed
2015-03-13 16:38:41.259
[cssd(4326)]CRS-1605:CSSD voting file is online: /dev/oracleasm/disks/DISK1; details in /u01/app/11.2.0/grid/log/ol5-112-rac2/cssd/ocssd.log.
2015-03-13 16:38:41.562
[crsd(4285)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/ol5-112-rac2/crsd/crsd.log.
2015-03-13 16:38:42.217
[ohasd(3315)]CRS-2765:Resource 'ora.crsd' has failed on server 'ol5-112-rac2'.
2015-03-13 16:38:43.313
[crsd(4357)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/ol5-112-rac2/crsd/crsd.log.
2015-03-13 16:38:44.262
[ohasd(3315)]CRS-2765:Resource 'ora.crsd' has failed on server 'ol5-112-rac2'.
2015-03-13 16:38:44.716
[ohasd(3315)]CRS-2765:Resource 'ora.diskmon' has failed on server 'ol5-112-rac2'.
crsd log
2015-03-13 16:39:02.140: [ OCRASM][1266976496]proprasmo: kgfoCheckMount returned [7]
2015-03-13 16:39:02.140: [ OCRASM][1266976496]proprasmo: The ASM instance is down
2015-03-13 16:39:02.140: [ OCRRAW][1266976496]proprioo: Failed to open [+DATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2015-03-13 16:39:02.140: [ OCRRAW][1266976496]proprioo: No OCR/OLR devices are usable
2015-03-13 16:39:02.140: [ OCRASM][1266976496]proprasmcl: asmhandle is NULL
2015-03-13 16:39:02.140: [ OCRRAW][1266976496]proprinit: Could not open raw device
2015-03-13 16:39:02.140: [ OCRASM][1266976496]proprasmcl: asmhandle is NULL
2015-03-13 16:39:02.140: [ OCRAPI][1266976496]a_init:16!: Backend init unsuccessful : [26]
2015-03-13 16:39:02.140: [ CRSOCR][1266976496] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
] [7]
2015-03-13 16:39:02.140: [    CRSD][1266976496][PANIC] CRSD exiting: Could not init OCR, code: 26
2015-03-13 16:39:02.140: [    CRSD][1266976496] Done
Please help.

Hi Levi,
I got the output below from cssd.log:
2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos op : sgipcnTcpConnect
2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos dep : No route to host (113)
2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos loc : connect
2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos info: addr '192.168.1.101:42729'
2015-03-15 10:38:48.248: [    CSSD][1214789952]clssscSelect: conn complete ctx 0x24d7aa0 endp 0x10d4
2015-03-15 10:38:48.248: [    CSSD][1214789952]clssnmeventhndlr: node(1), endp(0x10d4) failed, probe((nil)) ninf->endp (0x1000010d4) CONNCOMPLETE
2015-03-15 10:38:48.248: [    CSSD][1214789952]clssnmDiscHelper: ol5-112-rac1, node(1) connection failed, endp (0x10d4), probe(0x100000000), ninf->endp 0x10d4
2015-03-15 10:38:48.248: [    CSSD][1214789952]clssnmDiscHelper: node 1 clean up, endp (0x10d4), init state 0, cur state 0
2015-03-15 10:38:48.248: [GIPCXCPT][1214789952]gipcInternalDissociate: obj 0x27c8050 [00000000000010d4] { gipcEndpoint : localAddr 'gipc://ol5-112-rac2:de93-de83-5e0c-a373#192.168.1.102#50439', remoteAddr 'gipc://ol5-112-rac1:nm_ol5-112-scan#192.168.1.101#42729', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
2015-03-15 10:38:48.248: [GIPCXCPT][1214789952]gipcDissociateF [clssnmDiscHelper : clssnm.c : 3215]: EXCEPTION[ ret gipcretFail (1) ] failed to dissociate obj 0x27c8050 [00000000000010d4] { gipcEndpoint : localAddr 'gipc://ol5-112-rac2:de93-de83-5e0c-a373#192.168.1.102#50439', remoteAddr 'gipc://ol5-112-rac1:nm_ol5-112-scan#192.168.1.101#42729', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 }, flags 0x0
2015-03-15 10:38:48.248: [    CSSD][1214789952]clssnmDiscEndp: gipcDestroy 0x10d4
2015-03-15 10:38:48.414: [    CSSD][1147648320]clssnmvDHBValidateNCopy: node 1, ol5-112-rac1, has a disk HB, but no network HB, DHB has rcfg 320087493, wrtcnt, 207342, LATS 355744, lastSeqNo 207342, uniqueness 1426244111, timestamp 1426396128/31164744
2015-03-15 10:38:48.414: [    CSSD][1214789952]clssnmconnect: connecting to addr gipc://ol5-112-rac1:nm_ol5-112-scan#192.168.1.101#42729
2015-03-15 10:38:48.414: [    CSSD][1214789952]clssscConnect: endp 0x10e0 - cookie 0x24d7aa0 - addr gipc://ol5-112-rac1:nm_ol5-112-scan#192.168.1.101#42729
2015-03-15 10:38:48.414: [    CSSD][1214789952]clssnmconnect: connecting to node(1), endp(0x10e0), flags 0x10002
2015-03-15 10:38:48.710: [    CSSD][1181219136]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmRcfgMgrThread: Local Join
2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmLocalJoinEvent: begin on node(2), waittime 193000
2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmLocalJoinEvent: set curtime (356544) for my node
2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmLocalJoinEvent: scanning 32 nodes
2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmLocalJoinEvent: Node ol5-112-rac1, number 1, is in an existing cluster with disk state 3
2015-03-15 10:38:49.210: [    CSSD][1206397248]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
Thanks,
Suranga
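The cssd.log excerpt above contains two patterns that are easy to miss in a long file: a peer node that has a disk heartbeat but no network heartbeat, and a "No route to host" failure on a connect attempt. The sketch below is one possible way to pull those out of a log; the function name `summarize_cssd_log` and the regular expressions are assumptions written against the lines quoted in this post, not an Oracle-supplied format description.

```python
import re

# Patterns written against the log lines quoted in this thread (assumed format).
NO_NET_HB = re.compile(
    r"clssnmvDHBValidateNCopy: node (\d+), ([^,\s]+), has a disk HB, but no network HB")
NO_ROUTE = re.compile(r"slos dep : No route to host")
ADDR = re.compile(r"slos info: addr '([^']+)'")


def summarize_cssd_log(lines):
    """Collect peers with disk-only heartbeats and addresses seen in failed connects."""
    summary = {"disk_only_nodes": set(), "no_route_count": 0, "addresses": set()}
    for line in lines:
        match = NO_NET_HB.search(line)
        if match:
            summary["disk_only_nodes"].add((match.group(1), match.group(2)))
        if NO_ROUTE.search(line):
            summary["no_route_count"] += 1
        match = ADDR.search(line)
        if match:
            summary["addresses"].add(match.group(1))
    return summary


SAMPLE = [
    "2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos dep : No route to host (113)",
    "2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos info: addr '192.168.1.101:42729'",
    "2015-03-15 10:38:48.414: [    CSSD][1147648320]clssnmvDHBValidateNCopy: node 1, ol5-112-rac1, has a disk HB, but no network HB, DHB has rcfg 320087493",
]

print(summarize_cssd_log(SAMPLE))
```

A node that is visible on the voting disk but unreachable over the network, as in this sample, usually points toward checking the network path to the listed address rather than the storage.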

Huge number of idle connections from loopback ip on oracle RAC node

Hi,
We have a 2-node 11gR2 (11.2.0.3) Oracle RAC cluster. We are seeing a huge number of idle connections (more than 5000 on each node) on both nodes, and the number is increasing day by day. All the idle connections are from the VIP and the loopback address (127.0.0.1.47971).
netstat -an |grep -i idle|more
127.0.0.1.47971 Idle
Any insight will be helpful.
The server occasionally (about once a month) suffers memory issues:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
Thanks
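To see where the idle connections come from, the `netstat` output can be grouped by local address. The following is a minimal sketch that assumes the `address.port ... Idle` layout shown in the post, with the port as the last dot-separated component of the first field; the function name `count_idle_by_address` and the sample lines other than the 127.0.0.1 one are hypothetical.

```python
from collections import Counter


def count_idle_by_address(netstat_lines):
    """Count lines whose last field is 'Idle', grouped by address without the port."""
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 2 or fields[-1].lower() != "idle":
            continue
        # '127.0.0.1.47971' -> host '127.0.0.1', port '47971'
        host, _, _port = fields[0].rpartition(".")
        counts[host] += 1
    return counts


# Hypothetical sample in the same layout as the line quoted in the post.
SAMPLE = [
    "127.0.0.1.47971 Idle",
    "127.0.0.1.47972 Idle",
    "10.0.0.5.1521 Idle",
    "10.0.0.5.1521 ESTABLISHED",
]

for host, count in count_idle_by_address(SAMPLE).most_common():
    print(host, count)
```

Comparing the counts over a few days would show whether the growth is concentrated on the loopback address or on the VIP.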

We cannot control what occurs on your DB server.
How do I ask a question on the forums?
SQL and PL/SQL FAQ
Post the results of the following SQL:
SELECT * FROM V$VERSION;

RAC node rebooting frequently

Hi all,
I am working on a two-node RAC environment. One of my RAC nodes is rebooting very frequently. I am using Oracle 10g database and Clusterware (10.2.0.1).
I have checked the OS logs (Linux AS 4) and the RAC-related logs, but I am not able to find anything. I am posting all the logs; please suggest.

Hi, I am posting the alert log, OS log and ocssd log.
Clusterware alert log:
[crsd(5649)]CRS-1201:CRSD started on node ctmisdb1.
2012-03-21 09:50:38.188
[cssd(7490)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 .
2012-03-21 09:50:46.726
[crsd(5649)]CRS-1204:Recovering CRS resources for node ctmisdb2.
2012-03-21 09:55:21.760
[cssd(7490)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
2012-03-21 12:07:46.681
[cssd(7426)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
2012-03-21 12:07:50.432
[cssd(7426)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
2012-03-21 12:07:50.893
[crsd(5549)]CRS-1012:The OCR service started on node ctmisdb1.
2012-03-21 12:07:50.942
[evmd(7304)]CRS-1401:EVMD started on node ctmisdb1.
2012-03-21 12:07:52.827
[crsd(5549)]CRS-1201:CRSD started on node ctmisdb1.
2012-03-21 12:48:41.908
[cssd(7448)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
2012-03-21 12:48:45.741
[cssd(7448)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
2012-03-21 12:48:49.173
[crsd(5546)]CRS-1012:The OCR service started on node ctmisdb1.
2012-03-21 12:48:49.190
[evmd(7328)]CRS-1401:EVMD started on node ctmisdb1.
2012-03-21 12:48:50.818
[crsd(5546)]CRS-1201:CRSD started on node ctmisdb1.
2012-03-21 13:26:36.398
[cssd(7343)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
2012-03-21 13:26:40.492
[cssd(7343)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
2012-03-21 13:26:40.939
[crsd(5542)]CRS-1012:The OCR service started on node ctmisdb1.
2012-03-21 13:26:40.977
[evmd(7223)]CRS-1401:EVMD started on node ctmisdb1.
2012-03-21 13:26:42.772
[crsd(5542)]CRS-1201:CRSD started on node ctmisdb1.
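The alert log above shows the same startup sequence repeating. One way to quantify how often the node comes back up is to treat each CRS-1605 "voting file is online" message as a startup marker and measure the gaps between them. This is only a sketch under that assumption: the function name `startup_times` is made up, and the sample is an abbreviated copy of lines from the log above, where each timestamp line precedes its message line.

```python
from datetime import datetime


def startup_times(alert_lines, marker="CRS-1605"):
    """Return the timestamps that immediately precede lines containing the marker."""
    times = []
    last_ts = None
    for line in alert_lines:
        line = line.strip()
        try:
            # Timestamp lines look like '2012-03-21 12:07:46.681'.
            last_ts = datetime.strptime(line, "%Y-%m-%d %H:%M:%S.%f")
            continue
        except ValueError:
            pass  # not a timestamp line, so treat it as a message line
        if marker in line and last_ts is not None:
            times.append(last_ts)
    return times


# Abbreviated lines copied from the alert log in this post.
SAMPLE = [
    "2012-03-21 12:07:46.681",
    "[cssd(7426)]CRS-1605:CSSD voting file is online: /dev/raw/raw2.",
    "2012-03-21 12:07:50.432",
    "[cssd(7426)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .",
    "2012-03-21 12:48:41.908",
    "[cssd(7448)]CRS-1605:CSSD voting file is online: /dev/raw/raw2.",
    "2012-03-21 13:26:36.398",
    "[cssd(7343)]CRS-1605:CSSD voting file is online: /dev/raw/raw2.",
]

starts = startup_times(SAMPLE)
gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
for gap in gaps:
    print(gap)
```

On this sample the gaps come out at roughly 40 and 38 minutes, which is the kind of regularity worth comparing against cron jobs, backups, or other scheduled load on the node.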
Node OS log:
Mar 21 12:06:35 ctmisdb1 rc: Starting readahead: succeeded
Mar 21 12:06:35 ctmisdb1 messagebus: messagebus startup succeeded
Mar 21 12:06:36 ctmisdb1 cups-config-daemon: cups-config-daemon startup succeeded
Mar 21 12:06:36 ctmisdb1 haldaemon: haldaemon startup succeeded
Mar 21 12:06:37 ctmisdb1 fstab-sync[6267]: removed all generated mount points
Mar 21 12:06:37 ctmisdb1 fstab-sync[6378]: added mount point /media/cdrecorder for /dev/hde
Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6323]: session opened for user oracle by (uid=0)
Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6324]: session opened for user oracle by (uid=0)
Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6229]: session opened for user oracle by (uid=0)
Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6229]: session closed for user oracle
Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6644]: session opened for user oracle by (uid=0)
Mar 21 12:06:37 ctmisdb1 kernel: matroxfb: cannot set xres to 800, rounded up to 832
Mar 21 12:06:37 ctmisdb1 last message repeated 2 times
Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6323]: session closed for user oracle
Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6644]: session closed for user oracle
Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6324]: session closed for user oracle
Mar 21 12:06:41 ctmisdb1 logger: Cluster Ready Services completed waiting on dependencies.
Mar 21 12:06:41 ctmisdb1 last message repeated 2 times
Mar 21 12:06:45 ctmisdb1 gdm(pam_unix)[6379]: session opened for user root by (uid=0)
Mar 21 12:06:46 ctmisdb1 gconfd (root-7052): starting (version 2.8.1), pid 7052 user 'root'
Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 1
Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
Mar 21 12:06:55 ctmisdb1 gconfd (root-7052): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 0
Mar 21 12:07:41 ctmisdb1 su(pam_unix)[5547]: session opened for user oracle by (uid=0)
Mar 21 12:07:41 ctmisdb1 logger: Running CRSD with TZ =
Mar 21 12:07:43 ctmisdb1 su(pam_unix)[7399]: session opened for user oracle by (uid=0)
Mar 21 12:12:49 ctmisdb1 sshd(pam_unix)[15323]: session opened for user root by root(uid=0)
Mar 21 12:12:57 ctmisdb1 su(pam_unix)[15531]: session opened for user oracle by root(uid=0)
Mar 21 12:47:05 ctmisdb1 syslogd 1.4.1: restart.
ocssd log:
[    CSSD]2012-03-21 11:24:41.045 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661f0c0) proc(0x8006622560) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 11:24:41.078 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660cfe0) proc(0x800662ba70) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:07:44.564 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=ctmisdb1DBG_CSSD))
[    CSSD]2012-03-21 12:07:44.564 >USER: CSS daemon log for node ctmisdb1, number 1, in cluster crs
[    CSSD]2012-03-21 12:07:44.581 [28260544] >TRACE: clssscmain: local-only set to false
[    CSSD]2012-03-21 12:07:44.603 [28260544] >TRACE: clssnmReadNodeInfo: added node 1 (ctmisdb1) to cluster
[    CSSD]2012-03-21 12:07:44.621 [28260544] >TRACE: clssnmReadNodeInfo: added node 2 (ctmisdb2) to cluster
[    CSSD]2012-03-21 12:07:44.627 [72925824] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
[    CSSD]2012-03-21 12:07:44.627 [28260544] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
[    CSSD]2012-03-21 12:07:44.641 [28260544] >TRACE: clssnmInitNMInfo: misscount set to 60
[    CSSD]2012-03-21 12:07:44.655 [28260544] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw2)
[    CSSD]2012-03-21 12:07:46.661 [72925824] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw2)
[    CSSD]2012-03-21 12:07:46.690 [72925824] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(18) wrtcnt(7920) LATS(0) Disk lastSeqNo(7920)
[    CSSD]2012-03-21 12:07:46.752 [28260544] >TRACE: clssnmFatalInit: fatal mode enabled
[    CSSD]2012-03-21 12:07:46.752 [94777984] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
[    CSSD]2012-03-21 12:07:46.753 [94777984] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
[    CSSD]2012-03-21 12:07:46.753 [94777984] >TRACE: clssnmClusterListener: Probing node(2)
[    CSSD]2012-03-21 12:07:46.755 [94777984] >TRACE: clssnmConnComplete: connected to node 2 (con 0x8006601040), state 3 birth 0, unique 1332303918/1332303918 prevConuni(0)
[    CSSD]2012-03-21 12:07:46.756 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[    CSSD]2012-03-21 12:07:46.756 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ctmisdb1_crs))
[    CSSD]2012-03-21 12:07:46.757 [151810688] >TRACE: clssnmPollingThread: Connection complete
[    CSSD]2012-03-21 12:07:46.757 [162296448] >TRACE: clssnmSendingThread: Connection complete
[    CSSD]2012-03-21 12:07:46.757 [172782208] >TRACE: clssnmRcfgMgrThread: Connection complete
[    CSSD]2012-03-21 12:07:46.757 [172782208] >TRACE: clssnmRcfgMgrThread: Local Join
[    CSSD]2012-03-21 12:07:46.757 [172782208] >WARNING: clssnmLocalJoinEvent: takeover aborted due to connected but inactive nodes
[    CSSD]2012-03-21 12:07:47.339 [94777984] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[ctmisdb2] seq[5] sync[18]
[    CSSD]2012-03-21 12:07:47.759 [172782208] >TRACE: clssnmRcfgMgrThread: lastleader(2) unique(1332311864)
[    CSSD]2012-03-21 12:07:48.341 [94777984] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(18)
[    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmDeactivateNode: node 0 () left cluster
[    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 1, state (1/2) unique (1332311864/1332311864) prevConuni(0) birth (0/18) (old/new)
[    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 2, state (4/3) unique (1332303918/1332303918) prevConuni(0) birth (0/16) (old/new)
[    CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: SYNC(18) from node(2) completed
[    CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: NODE 1 (ctmisdb1) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: NODE 2 (ctmisdb2) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2012-03-21 12:07:50.429 [28260544] >USER: NMEVENT_SUSPEND [00][00][00][00]
[    CSSD]2012-03-21 12:07:50.429 [183267968] >TRACE: clssgmReconfigThread: started for reconfig (18)
[    CSSD]2012-03-21 12:07:50.429 [183267968] >USER: NMEVENT_RECONFIG [00][00][00][06]
[    CSSD]2012-03-21 12:07:50.429 [183267968] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 18
[    CSSD]2012-03-21 12:07:50.430 [140255872] >TRACE: clssgmInitialRecv: (0x102a0360) accepted a new connection from node 2 born at 16 active (2, 2), vers (10,3,1,2)
[    CSSD]2012-03-21 12:07:50.430 [140255872] >TRACE: clssgmInitialRecv: conns done (2/2)
[    CSSD]2012-03-21 12:07:50.430 [183267968] >TRACE: clssgmEstablishMasterNode: MASTER for 18 is node(2) birth(16)
[    CSSD]2012-03-21 12:07:50.430 [183267968] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[    CSSD]2012-03-21 12:07:50.432 [140255872] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(72) incarn 18
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 18 with 2 nodes
[    CSSD]CLSS-3001: local node number 1, master node number 2
[    CSSD]2012-03-21 12:07:50.433 [183267968] >TRACE: clssgmReconfigThread: completed for reconfig(18), with status(1)
[    CSSD]2012-03-21 12:07:50.550 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006603bb0) proc(0x8006608b00) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:07:50.551 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066066f0) proc(0x8006608d70) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:07:53.569 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660ec70) proc(0x8006611260) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:00.829 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006610990) proc(0x800660de00) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:04.698 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006613030) proc(0x8006612930) pid(8115) proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:04.816 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8115) proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:04.832 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8115) proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:06.615 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8171) proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:07.114 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006615960) proc(0x8006616350) pid(8175) proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:11.373 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066192a0) proc(0x8006619470) pid(8302) proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:11.669 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661ee20) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:17.135 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661ee70) pid(8458) proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:17.268 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661fc00) proc(0x80066220d0) pid(8460) proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:17.305 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066223e0) proc(0x8006625250) pid(8462) proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:17.353 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006625560) proc(0x8006628430) pid(8464) proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:24.585 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006625560) proc(0x8006628430) pid(8645) proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:27.957 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006628740) proc(0x800662b610) pid(8722) proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:30.931 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662cce0) proc(0x800662c860) pid(8801) proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:36.400 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661c5f0) proc(0x800661eb50) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:37.863 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662f1c0) proc(0x800661eee0) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:38.537 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662f1c0) proc(0x800661d500) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:39.232 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661d500) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:43.085 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006611210) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:08:58.971 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x80066112c0) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:09:59.290 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:10:59.589 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:11:59.904 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:13:00.203 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:13:14.029 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800660b190) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:14:00.501 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006611210) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:15:00.809 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:16:01.117 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:17:01.447 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:01.762 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:39.841 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:42.123 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:42.316 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:42.843 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:42.963 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:43.098 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800662bd20) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:44.173 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:44.368 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:45.351 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:46.236 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:47.031 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:47.694 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:47.819 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:48.103 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:48.327 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:48.484 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006611210) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:48.758 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:49.529 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:50.509 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:51.060 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:18:51.558 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
[    CSSD]2012-03-21 12:48:39.836 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=ctmisdb1DBG_CSSD))
[    CSSD]2012-03-21 12:48:39.836 >USER: CSS daemon log for node ctmisdb1, number 1, in cluster crs
[    CSSD]2012-03-21 12:48:39.849 [28260544] >TRACE: clssscmain: local-only set to false
[    CSSD]2012-03-21 12:48:39.865 [28260544] >TRACE: clssnmReadNodeInfo: added node 1 (ctmisdb1) to cluster
[    CSSD]2012-03-21 12:48:39.872 [28260544] >TRACE: clssnmReadNodeInfo: added node 2 (ctmisdb2) to cluster
[    CSSD]2012-03-21 12:48:39.879 [72925824] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
[    CSSD]2012-03-21 12:48:39.879 [28260544] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
[    CSSD]2012-03-21 12:48:39.881 [28260544] >TRACE: clssnmInitNMInfo: misscount set to 60
[    CSSD]2012-03-21 12:48:39.888 [28260544] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw2)
[    CSSD]2012-03-21 12:48:41.892 [72925824] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw2)
[    CSSD]2012-03-21 12:48:41.915 [72925824] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(20) wrtcnt(10367) LATS(0) Disk lastSeqNo(10367)
[    CSSD]2012-03-21 12:48:41.959 [28260544] >TRACE: clssnmFatalInit: fatal mode enabled
[    CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
[    CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
[    CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmClusterListener: Probing node(2)
[    CSSD]2012-03-21 12:48:41.961 [94777984] >TRACE: clssnmConnComplete: connected to node 2 (con 0x8006702790), state 3 birth 0, unique 1332303918/1332303918 prevConuni(0)
[    CSSD]2012-03-21 12:48:41.962 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[    CSSD]2012-03-21 12:48:41.962 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ctmisdb1_crs))
[    CSSD]2012-03-21 12:48:41.963 [152330880] >TRACE: clssnmPollingThread: Connection complete
[    CSSD]2012-03-21 12:48:41.963 [162816640] >TRACE: clssnmSendingThread: Connection complete
[    CSSD]2012-03-21 12:48:41.963 [173302400] >TRACE: clssnmRcfgMgrThread: Connection complete
[    CSSD]2012-03-21 12:48:41.963 [173302400] >TRACE: clssnmRcfgMgrThread: Local Join
[    CSSD]2012-03-21 12:48:41.963 [173302400] >WARNING: clssnmLocalJoinEvent: takeover aborted due to connected but inactive nodes
[    CSSD]2012-03-21 12:48:42.631 [94777984] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[ctmisdb2] seq[13] sync[20]
[    CSSD]2012-03-21 12:48:42.965 [173302400] >TRACE: clssnmRcfgMgrThread: lastleader(2) unique(1332314319)
[    CSSD]2012-03-21 12:48:43.636 [94777984] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(20)
[    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmDeactivateNode: node 0 () left cluster
[    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 1, state (1/2) unique (1332314319/1332314319) prevConuni(0) birth (0/20) (old/new)
[    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 2, state (4/3) unique (1332303918/1332303918) prevConuni(0) birth (0/16) (old/new)
[    CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: SYNC(20) from node(2) completed
[    CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: NODE 1 (ctmisdb1) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: NODE 2 (ctmisdb2) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2012-03-21 12:48:45.737 [28260544] >USER: NMEVENT_SUSPEND [00][00][00][00]
[    CSSD]2012-03-21 12:48:45.738 [183788160] >TRACE: clssgmReconfigThread: started for reconfig (20)
[    CSSD]2012-03-21 12:48:45.738 [183788160] >USER: NMEVENT_RECONFIG [00][00][00][06]
[    CSSD]2012-03-21 12:48:45.738 [183788160] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 20
[    CSSD]2012-03-21 12:48:45.739 [140776064] >TRACE: clssgmInitialRecv: (0x102a0370) accepted a new connection from node 2 born at 16 active (2, 2), vers (10,3,1,2)
[    CSSD]2012-03-21 12:48:45.739 [140776064] >TRACE: clssgmInitialRecv: conns done (2/2)
[    CSSD]2012-03-21 12:48:45.739 [183788160] >TRACE: clssgmEstablishMasterNode: MASTER for 20 is node(2) birth(16)
[    CSSD]2012-03-21 12:48:45.739 [183788160] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[    CSSD]2012-03-21 12:48:45.741 [140776064] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(72) incarn 20
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 20 with 2 nodes
Please check and help.
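A long CSSD trace like the one above is easier to read once the repeated clssgmClientConnectMsg lines are filtered out. The following is a minimal sketch, assuming the line layout seen in this excerpt (component tag, timestamp, optional thread id, level, message). The function names are made up for illustration and are not part of any Oracle tool.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt above, for example:
# [    CSSD]2012-03-21 12:48:41.963 [173302400] >WARNING: clssnmLocalJoinEvent: ...
LINE_RE = re.compile(
    r"^\[\s*(?P<component>\w+)\]"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"
    r"(?: \[(?P<thread>\d+)\])?"
    r" >(?P<level>[A-Z]+): (?P<message>.*)$"
)


def parse_cssd_line(line):
    """Return a dict for a timestamped CSSD line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return rec


def interesting(lines, noisy_prefix="clssgmClientConnectMsg"):
    """Keep parsed lines, dropping TRACE lines that start with noisy_prefix."""
    kept = []
    for line in lines:
        rec = parse_cssd_line(line)
        if rec is None:
            continue  # lines without a timestamp (e.g. the CLSS-3000 line) are skipped
        if rec["level"] == "TRACE" and rec["message"].startswith(noisy_prefix):
            continue
        kept.append(rec)
    return kept


sample = [
    "[    CSSD]2012-03-21 12:18:51.558 [106332800] >TRACE: clssgmClientConnectMsg: "
    "Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)",
    "[    CSSD]2012-03-21 12:48:41.963 [173302400] >WARNING: clssnmLocalJoinEvent: "
    "takeover aborted due to connected but inactive nodes",
    "[    CSSD]CLSS-3000: reconfiguration successful, incarnation 20 with 2 nodes",
]

for rec in interesting(sample):
    print(rec["ts"], rec["level"], rec["message"])
```

On the sample, only the WARNING line survives the filter, which is usually where to start reading.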

What happens when one RAC node's public NIC is down?

Dear all,
There is a two-node RAC in my office. From what I have observed, when one RAC node's public NIC is down, the "crs_stat -t" command does not show any resource as OFFLINE. So if I use sqlplus to connect to my RAC database at that moment, will it still pick the node1 or node2 instance at random? And if I am assigned to the instance on the node whose public NIC is down, will my sqlplus connection fail?
So, can we say that Oracle RAC cannot provide HA if one node's public NIC is down? Or is there another way to solve this issue? Any suggestions would be appreciated.

If the public NIC on one node is down, only the other node can take the load; in your case that would be node 2. CRS may fail to record the changed status, and in that case connections directed to the affected node can fail.

Remove RAC node on Windows

I have done all the steps to remove one RAC node but got stuck at the step of running the rootdelete.sh file from the $CRS_HOME/install directory, as I don't have this file in a Windows environment.
What is the equivalent of rootdelete.sh on the Windows platform? I want to run it to remove the node information from the clusterware configuration.
Is there a good document that explains how to remove a node on the Windows platform?

Hello,
You need to run the following steps to remove a node from a RAC cluster on Windows platform:
Perform the following steps on a node other than the node you want to delete:
1. Run the Database Configuration Assistant (DBCA) utility to delete the instance.
2. Then run the Net Configuration Assistant (NetCA) to delete the listener.
3. If the node that you are deleting has ASM instance, then delete the ASM instance using the srvctl stop asm and srvctl remove asm commands.
4. Run the command srvctl stop nodeapps -n nodename of the node to be deleted to stop the node applications.
5. Run the command srvctl remove nodeapps -n nodename of the node to be deleted to remove the node applications.
6. Stop isqlplus if it is running.
7. Run the command setup.exe -updateNodeList ORACLE_HOME=Oracle_home ORACLE_HOME_NAME=Oracle_home_name CLUSTER_NODES=remaining nodes, where remaining nodes is a list of the nodes that are to remain part of the cluster.
Perform the following steps on the deleted RAC node:
1. Run the command setup.exe -updateNodeList -local -noClusterEnabled ORACLE_HOME=Oracle_home ORACLE_HOME_NAME=Oracle_home_name CLUSTER_NODES="".
Note that you do not need a value for "" after the CLUSTER_NODES= entry in this command. If you delete more than one node, then you must run this command on every deleted node to remove the Oracle home if you have a non-shared Oracle home (non-cluster file system) installation.
2. On the same node, delete the Windows Registry entries and ASM services using Oradim.
3. From the deleted RAC node, run the command Oracle_home\oui\bin\setup.exe to start the Oracle Universal Installer (OUI). Select Deinstall Products and select the Oracle home that you want to de-install.
4. Then to delete the CRS node, from a remaining node run the command crssetup del -nn node_name of the deleted node, node number
5. Then run the command setup.exe -updateNodeList ORACLE_HOME=CRS home ORACLE_HOME_NAME=CRS home name CLUSTER_NODES=remaining nodes where remaining nodes is a list of the nodes that are to remain in the cluster.
6. Then on the deleted CRS node, run the command setup.exe -updateNodeList -local -noClusterEnabled ORACLE_HOME=CRS home ORACLE_HOME_NAME=CRS home name CLUSTER_NODES=""
7. Remove the Oracle home manually from the new node if the home is not shared and then manually remove the HKLM/software/Oracle registry keys and the Oracle services.
8. After adding or deleting nodes from your Oracle Database 10g with RAC environment, and after you are sure that your system is functioning properly, make a backup of the contents of the voting disk using the dd.exe utility. The dd.exe utility is part of the MKS toolkit.
ASM Instance Cleanup Procedures after Node Deletion on Windows-Based Platforms
The delete node procedure requires the following additional steps on Windows-based systems to remove the ASM instances:
1. If this is the Oracle home from which the node-specific listener named LISTENER_nodename runs, then use NetCA to remove this listener and its CRS resources. If necessary, re-create this listener in another home.
2. If this is the Oracle home from which the ASM instance runs, then remove the ASM configuration by running the following command for all nodes on which this Oracle home exists:
srvctl stop asm -n node
Then run the following command for the nodes that you are removing:
srvctl remove asm -n node
3. If you are using a cluster file system for your ASM Oracle home, then run the following commands on the local node:
rd -s -q %ORACLE_BASE%\admin\+ASM
delete %ORACLE_HOME%\database\*ASM*
4. If you are not using a cluster file system for your ASM Oracle home, then run the delete command mentioned in the previous step on each node on which the Oracle home exists.
5. Run the following command on each node that has an ASM instance:
oradim -delete -asmsid +ASMnode_number
Source:
Oracle® Real Application Clusters Administrator's Guide
10g Release 1 (10.1)
Part Number B10765-02
Chapter 5: Adding and Deleting Nodes and Instances
Hope this helps,
Ben Prusinski, Oracle 10g OCP
http://oracle-magician.blogspot.com
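The srvctl and setup.exe steps in the answer above can be collected into a checklist before anything is run. The following is a minimal sketch that only builds and prints the command strings quoted in the answer; it does not execute them. The node names, Oracle home path and home name are hypothetical placeholders, and joining the remaining nodes with commas is an assumption about the CLUSTER_NODES format.

```python
def remaining_node_commands(node, remaining, oracle_home, home_name):
    """Commands from the answer that are run from a node staying in the cluster."""
    node_list = ",".join(remaining)  # assumption: comma-separated node list
    return [
        f"srvctl stop nodeapps -n {node}",
        f"srvctl remove nodeapps -n {node}",
        "setup.exe -updateNodeList "
        f"ORACLE_HOME={oracle_home} ORACLE_HOME_NAME={home_name} "
        f"CLUSTER_NODES={node_list}",
    ]


def deleted_node_commands(oracle_home, home_name):
    """Command from the answer that is run on the node being deleted."""
    return [
        "setup.exe -updateNodeList -local -noClusterEnabled "
        f'ORACLE_HOME={oracle_home} ORACLE_HOME_NAME={home_name} CLUSTER_NODES=""',
    ]


# Hypothetical values, for illustration only.
home = r"C:\oracle\product\10.2.0\db_1"
name = "OraDb10g_home1"

for cmd in remaining_node_commands("node3", ["node1", "node2"], home, name):
    print("remaining node:", cmd)
for cmd in deleted_node_commands(home, name):
    print("deleted node:  ", cmd)
```

Printing the list first makes it easier to check the node names and homes against the documented procedure before running each step by hand.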

Scan-vip running only on one RAC node

Hi ,
While setting up RAC 11.2 on CentOS 5.7, I was getting this error during the grid installation:
PRCR-1079 : Failed to start resource ora.scan1.vip
CRS-5005: IP Address: 192.168.100.208 is already in use in the network
CRS-2674: Start of 'ora.scan1.vip' on 'falcen6b' failed
CRS-2632: There are no more servers to try to place resource 'ora.scan1.vip' on that would satisfy its placement policy
PRCR-1079 : Failed to start resource ora.scan2.vip
CRS-5005: IP Address: 192.168.100.209 is already in use in the network
CRS-2674: Start of 'ora.scan2.vip' on 'falcen6b' failed
CRS-2632: There are no more servers to try to place resource 'ora.scan2.vip' on that would satisfy its placement policy
PRCR-1079 : Failed to start resource ora.scan3.vip
CRS-5005: IP Address: 192.168.100.210 is already in use in the network
CRS-2674: Start of 'ora.scan3.vip' on 'falcen6b' failed
CRS-2632: There are no more servers to try to place resource 'ora.scan3.vip' on that would satisfy its placement policy
I found that the SCAN service is able to run on only one node at a time. When I stopped the service on rac1 and started it on rac2, the service started.
But I think the grid installation needs the SCAN service to run on both nodes simultaneously.
How do I resolve this?
Any suggestions, please.
PS - I am planning to try with the 11.2.0.3 patch, but it will be a while until I get access to it.
Until then, can someone suggest a workaround?

Hi Balazs Papp and onedbguru,
I was able to resolve that error by running the following command on rac2, now that part of the installer passed.
crsctl start res ora.scan1.vip
However the cluster verification utility is failing at the end of installer.
When I executed the below command, this is my output:
[oracle@falcen6a grid]$ ./runcluvfy.sh stage -post crsinst -n falcen6a,falcen6b -verbose
Performing post-checks for cluster services setup
Checking node reachability...
Check: Node reachability from node "falcen6a"
Destination Node Reachable?
falcen6a yes
falcen6b yes
Result: Node reachability check passed from node "falcen6a"
Checking user equivalence...
Check: User equivalence for user "oracle"
Node Name Comment
falcen6b passed
falcen6a passed
Result: User equivalence check passed for user "oracle"
Checking time zone consistency...
Time zone consistency check passed.
Checking Cluster manager integrity...
Checking CSS daemon...
Node Name Status
falcen6b running
falcen6a running
Oracle Cluster Synchronization Services appear to be online.
Cluster manager integrity check passed
UDev attributes check for OCR locations started...
Result: UDev attributes check passed for OCR locations
UDev attributes check for Voting Disk locations started...
Result: UDev attributes check passed for Voting Disk locations
Check default user file creation mask
Node Name Available Required Comment
falcen6b 0022 0022 passed
falcen6a 0022 0022 passed
Result: Default user file creation mask check passed
Checking cluster integrity...
Cluster is divided into 2 partitions
Partition 1 consists of the following members:
Node Name
falcen6b
Partition 2 consists of the following members:
Node Name
falcen6a
Cluster integrity check failed. Cluster is divided into 2 partition(s).
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ERROR:
PRVF-4193 : Asm is not running on the following nodes. Proceeding with the remaining nodes.
Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful
ERROR:
PRVF-4195 : Disk group for ocr location "+DATA" not available on the following nodes:
Checking size of the OCR location "+DATA" ...
Size check for OCR location "+DATA" successful...
OCR integrity check failed
Checking CRS integrity...
ERROR:
PRVF-5316 : Failed to retrieve version of CRS installed on node "falcen6b"
The Oracle clusterware is healthy on node "falcen6b"
The Oracle clusterware is healthy on node "falcen6a"
CRS integrity check failed
Checking node application existence...
Checking existence of VIP node application
Node Name Required Status Comment
falcen6b yes unknown failed
falcen6a yes unknown failed
Result: Check failed.
Checking existence of ONS node application
Node Name Required Status Comment
falcen6b no unknown ignored
falcen6a no online passed
Result: Check ignored.
Checking existence of GSD node application
Node Name Required Status Comment
falcen6b no unknown ignored
falcen6a no does not exist ignored
Result: Check ignored.
Checking existence of EONS node application
Node Name Required Status Comment
falcen6b no unknown ignored
falcen6a no online passed
Result: Check ignored.
Checking existence of NETWORK node application
Node Name Required Status Comment
falcen6b no unknown ignored
falcen6a no online passed
Result: Check ignored.
Checking Single Client Access Name (SCAN)...
SCAN VIP name Node Running? ListenerName Port Running?
falcen6-scan unknown false LISTENER 1521 false
WARNING:
PRVF-5056 : Scan Listener "LISTENER" not running
Checking name resolution setup for "falcen6-scan"...
SCAN Name IP Address Status Comment
falcen6-scan 192.168.100.210 passed
falcen6-scan 192.168.100.208 passed
falcen6-scan 192.168.100.209 passed
Verification of SCAN VIP and Listener setup failed
OCR detected on ASM. Running ACFS Integrity checks...
Starting check to see if ASM is running on all cluster nodes...
PRVF-5137 : Failure while checking ASM status on node "falcen6b"
Starting Disk Groups check to see if at least one Disk Group configured...
Disk Group Check passed. At least one Disk Group configured
Task ACFS Integrity check failed
Checking Oracle Cluster Voting Disk configuration...
Oracle Cluster Voting Disk configuration check passed
Checking to make sure user "oracle" is not in "root" group
Node Name Status Comment
falcen6b does not exist passed
falcen6a does not exist passed
Result: User "oracle" is not part of "root" group. Check passed
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
Node Name Status
falcen6b passed
falcen6a passed
Result: CTSS resource check passed
Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed
Check CTSS state started...
Check: CTSS state
Node Name State
falcen6b Observer
falcen6a Observer
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
The NTP configuration file "/etc/ntp.conf" is available on all nodes
NTP Configuration file check passed
Checking daemon liveness...
Check: Liveness for "ntpd"
Node Name Running?
falcen6b yes
falcen6a yes
Result: Liveness check passed for "ntpd"
Checking NTP daemon command line for slewing option "-x"
Check: NTP daemon command line
Node Name Slewing Option Set?
falcen6b yes
falcen6a yes
Result:
NTP daemon slewing option check passed
Checking NTP daemon's boot time configuration, in file "/etc/sysconfig/ntpd", for slewing option "-x"
Check: NTP daemon's boot time configuration
Node Name Slewing Option Set?
falcen6b yes
falcen6a yes
Result:
NTP daemon's boot time configuration check for slewing option passed
NTP common Time Server Check started...
NTP Time Server "133.243.236.19" is common to all nodes on which the NTP daemon is running
NTP Time Server "133.243.236.18" is common to all nodes on which the NTP daemon is running
NTP Time Server "210.173.160.86" is common to all nodes on which the NTP daemon is running
NTP Time Server ".LOCL." is common to all nodes on which the NTP daemon is running
Check of common NTP Time Server passed
Clock time offset check from NTP Time Server started...
Checking on nodes "[falcen6b, falcen6a]"...
Check: Clock time offset from NTP Time Server
Time Server: 133.243.236.19
Time Offset Limit: 1000.0 msecs
Node Name Time Offset Status
falcen6b 15.332 passed
falcen6a -1.503 passed
Time Server "133.243.236.19" has time offsets that are within permissible limits for nodes "[falcen6b, falcen6a]".
Time Server: 133.243.236.18
Time Offset Limit: 1000.0 msecs
Node Name Time Offset Status
falcen6b 15.115 passed
falcen6a -1.614 passed
Time Server "133.243.236.18" has time offsets that are within permissible limits for nodes "[falcen6b, falcen6a]".
Time Server: 210.173.160.86
Time Offset Limit: 1000.0 msecs
Node Name Time Offset Status
falcen6b 15.219 passed
falcen6a -1.527 passed
Time Server "210.173.160.86" has time offsets that are within permissible limits for nodes "[falcen6b, falcen6a]".
Time Server: .LOCL.
Time Offset Limit: 1000.0 msecs
Node Name Time Offset Status
falcen6b 0.0 passed
falcen6a 0.0 passed
Time Server ".LOCL." has time offsets that are within permissible limits for nodes "[falcen6b, falcen6a]".
Clock time offset check passed
Result: Clock synchronization check using Network Time Protocol(NTP) passed
Oracle Cluster Time Synchronization Services check passed
Post-check for cluster services setup was unsuccessful on all the nodes.
[oracle@falcen6a grid]$
Any suggestions?
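With cluvfy output this long, it helps to pull out just the PRVF messages and the "check failed" summary lines. The following is a minimal sketch, assuming the text layout shown above; the function name is made up for illustration and the patterns are only as good as this one sample.

```python
import re

# "PRVF-4193 : Asm is not running ..." style messages
PRVF_RE = re.compile(r"\b(PRVF-\d+)\s*:\s*(.+)")
# Summary lines such as "OCR integrity check failed" or "Result: Check failed."
CHECK_FAILED_RE = re.compile(r"\bcheck failed\b", re.IGNORECASE)


def summarize_cluvfy(lines):
    """Return (prvf_messages, failed_summary_lines) from cluvfy text output."""
    prvf = []
    failed = []
    for raw in lines:
        line = raw.strip()
        m = PRVF_RE.search(line)
        if m:
            prvf.append((m.group(1), m.group(2)))
        elif CHECK_FAILED_RE.search(line):
            failed.append(line)
    return prvf, failed


sample = [
    "Cluster integrity check failed. Cluster is divided into 2 partition(s).",
    "PRVF-4193 : Asm is not running on the following nodes. Proceeding with the remaining nodes.",
    "OCR integrity check failed",
    'PRVF-5316 : Failed to retrieve version of CRS installed on node "falcen6b"',
    "falcen6b yes unknown failed",
    "Result: Check failed.",
    "Result: CTSS resource check passed",
]

prvf, failed = summarize_cluvfy(sample)
for code, text in prvf:
    print(code, "->", text)
for line in failed:
    print("FAILED:", line)
```

Per-node table rows such as "falcen6b yes unknown failed" are deliberately not matched, so the summary stays short. In the output above, the cluster partition failure is the first failed check listed.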

Found the errors in CSSD logs of RAC node

We found the error below in the CSSD logs on one of our RAC nodes from 5:15 to 5:18 PM; after that the error disappeared. Does anyone have an idea what could have caused it?
Also, we did not find any errors in the alert log at that time.
[    CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization failed (112bd2a70), timed out, start 17:13:51.041, duration 120009
[    CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization prepare time: 2 ms
[    CSSD]2009-07-19 17:15:51.233 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b67930) proc(112b680b0) pid(1049540) proto(10:2:1:1)
[    CSSD]2009-07-19 17:15:51.268 [3600] >TRACE: Authorization failed (112bd4a10), timed out, start 17:13:51.268, duration 120003
[    CSSD]2009-07-19 17:15:51.268 [3600] >TRACE: Authorization prepare time: 3 ms
[    CSSD]2009-07-19 17:15:52.544 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b67930) proc(112b680b0) pid(786918) proto(10:2:1:1)
[    CSSD]2009-07-19 17:15:53.297 [3600] >TRACE: Authorization failed (112c38af0), timed out, start 17:13:53.290, duration 120009
[    CSSD]2009-07-19 17:15:53.297 [3600] >TRACE: Authorization prepare time: 3 ms
[    CSSD]2009-07-19 17:15:53.317 [3600] >TRACE: Authorization failed (112d356f0), timed out, start 17:13:53.320, duration 120000
[    CSSD]2009-07-19 17:15:53.317 [3600] >TRACE: Authorization prepare time: 2 ms
[    CSSD]2009-07-19 17:16:02.342 [3086] >TRACE: clssgmClientConnectMsg: Connect from con(112b932b0) proc(112b67d10) pid(1336252) proto(10:2:1:1)
[    CSSD]2009-07-19 17:16:02.977 [3600] >TRACE: Authorization failed (112d04f70), timed out, start 17:14:02.978, duration 120001
[    CSSD]2009-07-19 17:16:02.977 [3600] >TRACE: Authorization prepare time: 2 ms
[    CSSD]2009-07-19 17:16:03.007 [3600] >TRACE: Authorization failed (112d38210), timed out, start 17:14:03.006, duration 120002
[    CSSD]2009-07-19 17:16:03.007 [3600] >TRACE: Authorization prepare time: 2 ms
[    CSSD]2009-07-19 17:16:10.447 [3600] >TRACE: Authorization failed (112bd7e30), timed out, start 17:14:10.441, duration 120007
[    CSSD]2009-07-19 17:16:10.447 [3600] >TRACE: Authorization prepare time: 2 ms
[    CSSD]2009-07-19 17:16:10.847 [3600] >TRACE: Authorization failed (112d3ee70), timed out, start 17:14:10.840, duration 120008
[    CSSD]2009-07-19 17:16:10.847 [3600] >TRACE: Authorization prepare time: 2 ms
Thanks,
Mahi
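To see the pattern in the "Authorization failed" lines above, the start time and duration can be extracted from each one. The following is a minimal sketch, assuming the line layout in this excerpt. The duration appears to be in milliseconds, since each line is logged roughly 120 seconds after its start time, but that is an inference from the sample and not a documented fact.

```python
import re

AUTH_RE = re.compile(
    r"(?P<logged>\d{2}:\d{2}:\d{2}\.\d{3}) \[\d+\] >TRACE: "
    r"Authorization failed \((?P<handle>\w+)\), timed out, "
    r"start (?P<start>\d{2}:\d{2}:\d{2}\.\d{3}), duration (?P<duration>\d+)"
)


def auth_timeouts(lines):
    """Return one dict per 'Authorization failed ... timed out' line."""
    found = []
    for line in lines:
        m = AUTH_RE.search(line)
        if m:
            rec = m.groupdict()
            rec["duration"] = int(rec["duration"])  # assumed to be milliseconds
            found.append(rec)
    return found


sample = [
    "[    CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization failed "
    "(112bd2a70), timed out, start 17:13:51.041, duration 120009",
    "[    CSSD]2009-07-19 17:15:51.048 [3600] >TRACE: Authorization prepare time: 2 ms",
    "[    CSSD]2009-07-19 17:15:51.268 [3600] >TRACE: Authorization failed "
    "(112bd4a10), timed out, start 17:13:51.268, duration 120003",
]

for rec in auth_timeouts(sample):
    print(rec["start"], "->", rec["logged"], rec["duration"])
```

Every failure in the excerpt has a duration just over 120000, which suggests the client requests waited for a fixed two-minute timeout rather than being rejected outright.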

Check the metalink note:
6996694-OCSSD.BIN CONSUMING 100% CPU AND ASM/DB HANGING
