UCCX 4.0 - Second node status is unknown

In UCCX 4.0, the status of the second node is shown as Unknown, and the status of all services in Control Center for that node is also Unknown. I am also unable to log in to the crsadmin page of the second node. Please suggest a possible remedy to bring this node back into production.

Hi Rehan,
Was adding your UCCX second node successful?
Have you performed the initial appadmin configuration on the second node?
Do you have an HA license on the first node?
Is the second node's hardware capacity equal to or greater than that of the first node?
Is the time zone setting on the second node the same as on the first node?
Is the CRS Node Manager up and running on the second node? Check this in the Windows Services page.
You can bring up any failed service on the second node by navigating to Start -> Programs -> UCCX Admin -> Serviceability tool. Click on the second node, check the failure count and the failed services, and enable all the failed services; you then need to restart the CRS (or UCCX) Node Manager service to activate them.
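If you prefer a command prompt on the second node, a minimal check looks like the lines below (the exact service display name here is an assumption; use whatever name the Windows Services panel actually shows, e.g. "Cisco CRS Node Manager"):
    net start | find "CRS"                      (is the node manager listed among the running services?)
    net stop "Cisco CRS Node Manager"           (only if you need to restart a hung service)
    net start "Cisco CRS Node Manager"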
Hope it helps.
Anand
Please rate the helpful posts by clicking on the stars below the right answers !!

Similar Messages

  • UCCX 8.0 - 1st node crashed

    Hi,
    We have a 2-server UCCX 8.0 cluster running on UCS servers. Recently, while moving the publisher (1st node) to a new UCS server, we accidentally deleted some files of the virtual machine. (There are two folders in the datastore, named UCCX1 and UCCX1_1; my colleague deleted the UCCX1_1 folder as he thought it was not necessary.) After that, ESXi kept asking for the UCCX1_1\UCCX1_1.vmx file when we tried to boot the server. We had to re-add the server (browse to the vmx file in the datastore and Add to Inventory); the server can boot up now, but I think we lost all the data (we cannot access the Application web page).
    Now we still have UCCX 2 running; could we force the 1st server to update its database to sync with UCCX 2? If YES, how do we do that?
    If NO, what should we do? Re-install everything, or is there a better way to recover the cluster?
    Thanks,
    hoanghiep

    Hi Hoanghiep,
    You cannot promote the UCCX 8.x second node to be the first node; that was supported only on the Windows platform (i.e., UCCX 7.x and earlier releases).
    If you have taken a valid DRS backup, then yes: reinstall the UCCX 8.x first node (with the same details as before, such as hostname, IP address, DNS, etc.) and then restore that backup.
    http://www.cisco.com/en/US/docs/voice_ip_comm/cust_contact/contact_center/crs/express_8_0/configuration/guide/uccx801drs.pdf
    Restoring only the Publisher Node in an HA Setup (with Rebuild)
    In a high availability (HA) setup, if there is a hard-drive failure or any other critical hardware or
    software failure which requires a rebuild of the Publisher (first) node, follow the procedure below to
    recover the publisher node to its last backed-up state. Run this procedure only if you
    have a valid backup taken before the failure of the node.
    Procedure
    Step 1 Perform a fresh installation of the same version of Cisco Unified Contact Center Express (using the same
    administrator credentials, network configuration and security password used earlier) on the node prior
    to restoring it. 
    For more information on installing Cisco Unified Contact Center Express, see the Installing Cisco
    Unified Contact Center Express available here:
    http://www.cisco.com/en/US/products/sw/custcosw/ps1846/prod_installation_guides_list.html
    Step 2 Navigate to Cisco Unified Contact Center Administration, select Disaster Recovery System from the
    Navigation drop-down list box in the upper-right corner of the Cisco Unified Contact Center Express
    Administration window, and click Go.
    The Disaster Recovery System Logon window displays.
    Step 3 Log in to the Disaster Recovery System by using the same Platform Administrator username and
    password that you use to log in to Cisco Unified Operating System Administration.
    Step 4 Configure the backup device. For more information, see Managing Backup Devices, page 7.
    Step 5 Navigate to Restore > Restore Wizard. The Restore Wizard Step 1 window displays.
    Step 6 In the Select Backup Device area, choose the backup device from which to restore.
    Step 7 Click Next. The Restore Wizard Step 2 window displays.
    Step 8 Choose the backup file that you want to restore.
    Note The backup filename indicates the date and time that the system created the backup file.
    Step 9 Click Next. The Restore Wizard Step 3 window displays.
    Step 10 Select the feature UCCX.
    Step 11 Click Next. The Restore Wizard Step 4 window displays.
    Step 12 When you get prompted to choose the nodes to restore, choose only the first node (the publisher).
    Caution: Do not select the second (subscriber) node in this situation, as doing so will cause the restore attempt to fail.
    Step 13 To start restoring the data, click Restore.
    Note During the restore process, do not perform any tasks with Cisco Unified Contact Center Express
    Administration or User Options.
    Restoring the first node may take several hours, depending on the size of the database being restored.
    Note Based on the requirements, you have the option to either retrieve the existing publisher node data
    from the DRS backup to be available on all the nodes in the cluster or retrieve the more recent
    data (if available) from the subscriber node to be available in the cluster.
    Step 14 Run the following CLI command from the Subscriber node after the restore process is successful (restore
    status indicates 100 percent) to initiate restoring the Publisher node only (with rebuild).
    utils uccx setuppubrestore
    Step 15 Run the following CLI command on the target node; that is, if you want to retrieve the publisher node's
    data, run this command on the subscriber node, but if you want to retrieve the subscriber node's data
    (which is more up-to-date), run this command on the publisher node.
    utils uccx database forcedatasync
    Warning: In either case, you must execute this command on one of the nodes after restoring the publisher node.
    Step 16 Restart both the nodes and run the following CLI command on the Publisher node to set up replication.
    utils uccx dbreplication reset
    For more information on restarting, see the Cisco Unified Communications Operating System
    Administration Guide available here:
    http://www.cisco.com/en/US/products/sw/custcosw/ps1846/prod_maintenance_guides_list.html.
    Caution: If you made any configuration or hardware changes during the fresh installation in Step 1 that might impact the License MAC, rehost your license using the license rehosting mechanism before running the CLI command "utils uccx dbreplication reset". For more information on the license rehosting mechanism, see Installing Cisco Unified Contact Center Express, available here:
    http://www.cisco.com/en/US/products/sw/custcosw/ps1846/prod_installation_guides_list.html
    Step 17 Your data gets restored on the publisher node. To view the status of the restore, see the “Viewing the
    Restore Status” section on page 19.
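    For quick reference, the CLI portion of the quoted procedure boils down to the following (commands exactly as quoted above; node placement as described in Steps 14-16):
        Step 14 - on the subscriber, once the restore status shows 100 percent:
            utils uccx setuppubrestore
        Step 15 - on the node that should pull the other node's data:
            utils uccx database forcedatasync
        Step 16 - restart both nodes, then on the publisher:
            utils uccx dbreplication reset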
    Hope this helps.
    Anand
    Please rate helpful posts !!

  • Error CLSRSC-507 during the execution of root.sh on second node

    Hi all.
       OS:    Red Hat 6.5
       RDBMS: Oracle 12.1.0.2.0
       During the installation of a 2-node RAC on RHEL 6.5, while executing the root.sh script on the second node, I get the following error:
    [root@oraprd02 grid]# ./root.sh
    Performing root user operation.
    The following environment variables are set as:
        ORACLE_OWNER= grid
        ORACLE_HOME=  /u01/app/12.1.0/grid
    Enter the full pathname of the local bin directory: [/usr/local/bin]:
       Copying dbhome to /usr/local/bin ...
       Copying oraenv to /usr/local/bin ...
       Copying coraenv to /usr/local/bin ...
    Creating /etc/oratab file...
    Entries will be added to the /etc/oratab file as needed by
    Database Configuration Assistant when a database is created
    Finished running generic part of root script.
    Now product-specific root actions will be performed.
    Using configuration parameter file: /u01/app/12.1.0/grid/crs/install/crsconfig_params
    2015/05/04 22:47:16 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
    2015/05/04 22:47:59 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
    2015/05/04 22:48:00 CLSRSC-363: User ignored prerequisites during installation
    OLR initialization - successful
    2015/05/04 22:48:46 CLSRSC-507: The root script cannot proceed on this node oraprd02 because either the first-node operations have not completed on node oraprd01 or there was an error in obtaining the status of the first-node operations.
    Died at /u01/app/12.1.0/grid/crs/install/crsutils.pm line 3681.
    The command '/u01/app/12.1.0/grid/perl/bin/perl -I/u01/app/12.1.0/grid/perl/lib -I/u01/app/12.1.0/grid/crs/install /u01/app/12.1.0/grid/crs/install/rootcrs.pl ' execution failed
    root.sh on the first node completed successfully; I got the success message from the script on the first node.
    Has anyone faced this problem? Any assistance will be most helpful.
    Thanks in advance.

    The crsd and cssd logs were empty and there was no relevant info in the CRS alert log.
    I am just reinstalling the clusterware now.
    One thing I wanted to ask: why does the ownership of the raw device files change back to root (after a node restart)
    even though I changed them to oracle?
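    On RHEL 6 the ownership and permissions of device nodes are re-applied by udev at every boot, so a manual chown does not survive a restart; the usual fix is a udev rule. A minimal sketch, assuming the shared disks show up as /dev/sdb1 and /dev/sdc1 and should be owned by grid:asmadmin (the device names, owner, group and rule file name are all placeholders to adjust for your environment), e.g. in /etc/udev/rules.d/99-oracle-asmdevices.rules:
        # adjust the KERNEL match, OWNER and GROUP to your storage and software owner
        KERNEL=="sdb1", OWNER="grid", GROUP="asmadmin", MODE="0660"
        KERNEL=="sdc1", OWNER="grid", GROUP="asmadmin", MODE="0660"
    Then reload the rules without rebooting:
        udevadm control --reload-rules
        udevadm trigger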

  • Root.sh failed on second node while installing CRS 10g on CentOS 5.5

    root.sh failed on the second node while installing CRS 10g
    Hi all,
    I am able to install the Oracle 10g RAC Clusterware on the first node of the cluster. However, when I run the root.sh script as the root
    user on the second node of the cluster, it fails with the following error message:
    NO KEYS WERE WRITTEN. Supply -force parameter to override.
    -force is destructive and will destroy any previous cluster
    configuration.
    Oracle Cluster Registry for cluster has already been initialized
    Startup will be queued to init within 90 seconds.
    Adding daemons to inittab
    Expecting the CRS daemons to be up within 600 seconds.
    Failure at final check of Oracle CRS stack.
    10
    Then I ran cluvfy stage -post hwos -n all -verbose, which shows:
    ERROR:
    Could not find a suitable set of interfaces for VIPs.
    Result: Node connectivity check failed.
    Checking shared storage accessibility...
    Disk        Sharing Nodes (2 in count)
    /dev/sda    db2 db1
    Then I ran cluvfy stage -pre crsinst -n all -verbose, which shows:
    ERROR:
    Could not find a suitable set of interfaces for VIPs.
    Result: Node connectivity check failed.
    Checking system requirements for 'crs'...
    No checks registered for this product.
    Then I ran cluvfy stage -post crsinst -n all -verbose, which shows:
    Result: Node reachability check passed from node "DB2".
    Result: User equivalence check passed for user "oracle".
    Node Name    CRS daemon    CSS daemon    EVM daemon
    db2          no            no            no
    db1          yes           yes           yes
    Check: Health of CRS
    Node Name    CRS OK?
    db1          unknown
    Result: CRS health check failed.
    Checking crsd.log shows:
    clsc_connect: (0x143ca610) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_db2_crs))
    clsssInitNative: connect failed, rc 9
    Any help would be greatly appreciated.
    Edited by: 868121 on 2011-6-24 12:31 AM

    Hello, it took a little searching, but I found this in a note in the GRID installation guide for Linux/UNIX:
    Public IP addresses and virtual IP addresses must be in the same subnet.
    In your case, you are using two different subnets for the VIPs.
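    In practice that means each node's public IP and VIP must sit in the same network; a minimal /etc/hosts sketch for a two-node cluster (the addresses and host names below are made up purely for illustration):
        # public interfaces and VIPs - same subnet
        192.168.1.101   db1.example.com      db1
        192.168.1.102   db2.example.com      db2
        192.168.1.111   db1-vip.example.com  db1-vip
        192.168.1.112   db2-vip.example.com  db2-vip
        # private interconnect - its own subnet
        10.0.0.1        db1-priv
        10.0.0.2        db2-priv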

  • Root.sh fails on second node

    I already posted this issue on database installation forum, and was suggested to post it on this forum.
    Here are the details.
    I am running 64-bit Linux on ESX virtual machines, installing Oracle 11gR2.
    It passed all the pre-requisite checks. I ran root.sh on the first node; it finished with no errors.
    On the second node I got the following:
    Running Oracle 11g root.sh script...
    The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME= /u01/app/11.2.0/grid
    Enter the full pathname of the local bin directory: [usr/local/bin]:
    The file "dbhome" already exists in /usr/local/bin. Overwrite it? (y/n)
    [n]:
    The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
    [n]:
    The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
    [n]:
    Entries will be added to the /etc/oratab file as needed by
    Database Configuration Assistant when a database is created
    Finished running generic part of root.sh script.
    Now product-specific root actions will be performed.
    2010-07-13 12:51:28: Parsing the host name
    2010-07-13 12:51:28: Checking for super user privileges
    2010-07-13 12:51:28: User has super user privileges
    Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
    Creating trace directory
    LOCAL ADD MODE
    Creating OCR keys for user 'root', privgrp 'root'..
    Operation successful.
    Adding daemon to inittab
    CRS-4123: Oracle High Availability Services has been started.
    ohasd is starting
    CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node fred0224, number 1, and is terminating
    An active cluster was found during exclusive startup, restarting to join the cluster
    CRS-2672: Attempting to start 'ora.mdnsd' on 'fred0225'
    CRS-2676: Start of 'ora.mdnsd' on 'fred0225' succeeded
    CRS-2672: Attempting to start 'ora.gipcd' on 'fred0225'
    CRS-2676: Start of 'ora.gipcd' on 'fred0225' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 'fred0225'
    CRS-2676: Start of 'ora.gpnpd' on 'fred0225' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'fred0225'
    CRS-2676: Start of 'ora.cssdmonitor' on 'fred0225' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'fred0225'
    CRS-2672: Attempting to start 'ora.diskmon' on 'fred0225'
    CRS-2676: Start of 'ora.diskmon' on 'fred0225' succeeded
    CRS-2676: Start of 'ora.cssd' on 'fred0225' succeeded
    CRS-2672: Attempting to start 'ora.ctssd' on 'fred0225'
    Start action for octssd aborted
    CRS-2676: Start of 'ora.ctssd' on 'fred0225' succeeded
    CRS-2672: Attempting to start 'ora.drivers.acfs' on 'fred0225'
    CRS-2672: Attempting to start 'ora.asm' on 'fred0225'
    CRS-2676: Start of 'ora.drivers.acfs' on 'fred0225' succeeded
    CRS-2676: Start of 'ora.asm' on 'fred0225' succeeded
    CRS-2664: Resource 'ora.ctssd' is already running on 'fred0225'
    CRS-4000: Command Start failed, or completed with errors.
    Command return code of 1 (256) from command: /u01/app/11.2.0/grid/bin/crsctl start resource ora.asm -init
    Start of resource "ora.asm -init" failed
    Failed to start ASM
    Failed to start Oracle Clusterware stack
    In the ocssd.log I found
    [ CSSD][3559689984]clssnmvDHBValidateNCopy: node 1, fred0224, has a disk HB, but no network HB, DHB has rcfg 174483948, wrtcnt, 232, LATS 521702664, lastSeqNo 232, uniqueness 1279039649, timestamp 1279039959/521874274
    In oraagent_oracle.log I found
    [ clsdmc][1212365120]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=fred0225DBG_GPNPD)) with status 9
    2010-07-13 12:54:07.234: [ora.gpnpd][1212365120] [check] Error = error 9 encountered when connecting to GPNPD
    2010-07-13 12:54:07.238: [ora.gpnpd][1212365120] [check] Calling PID check for daemon
    2010-07-13 12:54:07.238: [ora.gpnpd][1212365120] [check] Trying to check PID = 20584
    2010-07-13 12:54:07.432: [ COMMCRS][1285794112]clsc_connect: (0x1304d850) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=fred0225DBG_GPNPD))
    [ clsdmc][1222854976]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=fred0225DBG_MDNSD)) with status 9
    2010-07-13 12:54:08.649: [ora.mdnsd][1222854976] [check] Error = error 9 encountered when connecting to MDNSD
    2010-07-13 12:54:08.649: [ora.mdnsd][1222854976] [check] Calling PID check for daemon
    2010-07-13 12:54:08.649: [ora.mdnsd][1222854976] [check] Trying to check PID = 20571
    2010-07-13 12:54:08.841: [ COMMCRS][1201875264]clsc_connect: (0x12f3b1d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=fred0225DBG_MDNSD))
    [ clsdmc][1159915840]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=fred0225DBG_GIPCD)) with status 9
    2010-07-13 12:54:10.051: [ora.gipcd][1159915840] [check] Error = error 9 encountered when connecting to GIPCD
    2010-07-13 12:54:10.051: [ora.gipcd][1159915840] [check] Calling PID check for daemon
    2010-07-13 12:54:10.051: [ora.gipcd][1159915840] [check] Trying to check PID = 20566
    2010-07-13 12:54:10.242: [ COMMCRS][1254324544]clsc_connect: (0x12f35630) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=fred0225DBG_GIPCD))
    In oracssdagent_root.log I found
    2010-07-13 12:52:28.698: [ CSSCLNT][1102481728]clssscConnect: gipc request failed with 29 (0x16)
    2010-07-13 12:52:28.698: [ CSSCLNT][1102481728]clsssInitNative: connect failed, rc 29
    2010-07-13 12:53:55.222: [ CSSCLNT][1102481728]clssnsqlnum: RPC failed rc 3
    2010-07-13 12:53:55.222: [ USRTHRD][1102481728] clsnomon_cssini: failed 3 to fetch node number
    2010-07-13 12:53:55.222: [ USRTHRD][1102481728] clsnomon_init: css init done, nodenum -1.
    2010-07-13 12:53:55.222: [ CSSCLNT][1102481728]clsssRecvMsg: got a disconnect from the server while waiting for message type 43
    2010-07-13 12:53:55.222: [ CSSCLNT][1102481728]clsssGetNLSData: Failure receiving a msg, rc 3
    If you need more info, let me know.

    Well, the error clearly indicates that a communication problem exists on the private interconnect.
    Could this be a setting in ESX which prevents some communication between the guests on the second network card? Is a routing table in ESX perhaps not configured correctly?
    Sebastian
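    A quick way to confirm (or rule out) an interconnect problem from inside the guests is to check plain IP reachability across the private network and make sure no host firewall is in the way; a rough sketch only (the -priv names are placeholders for whatever private hostnames or IPs you actually configured, and the iptables check assumes a RHEL/OEL-style system):
        # from fred0224
        ping -c 3 fred0225-priv
        # from fred0225
        ping -c 3 fred0224-priv
        # is a host firewall filtering the interconnect traffic?
        service iptables status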

  • Root.sh on second node fails

    I am running 64-bit Linux, installing Oracle 11gR2.
    It passed all the pre-requisite checks. I ran root.sh on the first node; it finished with no errors.
    On the second node I got the following:
    Running Oracle 11g root.sh script...
    The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME= /u01/app/11.2.0/grid
    Enter the full pathname of the local bin directory: [usr/local/bin]:
    The file "dbhome" already exists in /usr/local/bin. Overwrite it? (y/n)
    [n]:
    The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
    [n]:
    The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
    [n]:
    Entries will be added to the /etc/oratab file as needed by
    Database Configuration Assistant when a database is created
    Finished running generic part of root.sh script.
    Now product-specific root actions will be performed.
    2010-07-13 12:51:28: Parsing the host name
    2010-07-13 12:51:28: Checking for super user privileges
    2010-07-13 12:51:28: User has super user privileges
    Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
    Creating trace directory
    LOCAL ADD MODE
    Creating OCR keys for user 'root', privgrp 'root'..
    Operation successful.
    Adding daemon to inittab
    CRS-4123: Oracle High Availability Services has been started.
    ohasd is starting
    CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node fred0224, number 1, and is terminating
    An active cluster was found during exclusive startup, restarting to join the cluster
    CRS-2672: Attempting to start 'ora.mdnsd' on 'fred0225'
    CRS-2676: Start of 'ora.mdnsd' on 'fred0225' succeeded
    CRS-2672: Attempting to start 'ora.gipcd' on 'fred0225'
    CRS-2676: Start of 'ora.gipcd' on 'fred0225' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 'fred0225'
    CRS-2676: Start of 'ora.gpnpd' on 'fred0225' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'fred0225'
    CRS-2676: Start of 'ora.cssdmonitor' on 'fred0225' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'fred0225'
    CRS-2672: Attempting to start 'ora.diskmon' on 'fred0225'
    CRS-2676: Start of 'ora.diskmon' on 'fred0225' succeeded
    CRS-2676: Start of 'ora.cssd' on 'fred0225' succeeded
    CRS-2672: Attempting to start 'ora.ctssd' on 'fred0225'
    Start action for octssd aborted
    CRS-2676: Start of 'ora.ctssd' on 'fred0225' succeeded
    CRS-2672: Attempting to start 'ora.drivers.acfs' on 'fred0225'
    CRS-2672: Attempting to start 'ora.asm' on 'fred0225'
    CRS-2676: Start of 'ora.drivers.acfs' on 'fred0225' succeeded
    CRS-2676: Start of 'ora.asm' on 'fred0225' succeeded
    CRS-2664: Resource 'ora.ctssd' is already running on 'fred0225'
    CRS-4000: Command Start failed, or completed with errors.
    Command return code of 1 (256) from command: /u01/app/11.2.0/grid/bin/crsctl start resource ora.asm -init
    Start of resource "ora.asm -init" failed
    Failed to start ASM
    Failed to start Oracle Clusterware stack
    In the ocssd.log I found
    [    CSSD][3559689984]clssnmvDHBValidateNCopy: node 1, fred0224, has a disk HB, but no network HB, DHB has rcfg 174483948, wrtcnt, 232, LATS 521702664, lastSeqNo 232, uniqueness 1279039649, timestamp 1279039959/521874274
    In oraagent_oracle.log I found
    [  clsdmc][1212365120]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=fred0225DBG_GPNPD)) with status 9
    2010-07-13 12:54:07.234: [ora.gpnpd][1212365120] [check] Error = error 9 encountered when connecting to GPNPD
    2010-07-13 12:54:07.238: [ora.gpnpd][1212365120] [check] Calling PID check for daemon
    2010-07-13 12:54:07.238: [ora.gpnpd][1212365120] [check] Trying to check PID = 20584
    2010-07-13 12:54:07.432: [ COMMCRS][1285794112]clsc_connect: (0x1304d850) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=fred0225DBG_GPNPD))
    [  clsdmc][1222854976]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=fred0225DBG_MDNSD)) with status 9
    2010-07-13 12:54:08.649: [ora.mdnsd][1222854976] [check] Error = error 9 encountered when connecting to MDNSD
    2010-07-13 12:54:08.649: [ora.mdnsd][1222854976] [check] Calling PID check for daemon
    2010-07-13 12:54:08.649: [ora.mdnsd][1222854976] [check] Trying to check PID = 20571
    2010-07-13 12:54:08.841: [ COMMCRS][1201875264]clsc_connect: (0x12f3b1d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=fred0225DBG_MDNSD))
    [  clsdmc][1159915840]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=fred0225DBG_GIPCD)) with status 9
    2010-07-13 12:54:10.051: [ora.gipcd][1159915840] [check] Error = error 9 encountered when connecting to GIPCD
    2010-07-13 12:54:10.051: [ora.gipcd][1159915840] [check] Calling PID check for daemon
    2010-07-13 12:54:10.051: [ora.gipcd][1159915840] [check] Trying to check PID = 20566
    2010-07-13 12:54:10.242: [ COMMCRS][1254324544]clsc_connect: (0x12f35630) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=fred0225DBG_GIPCD))
    In oracssdagent_root.log I found
    2010-07-13 12:52:28.698: [ CSSCLNT][1102481728]clssscConnect: gipc request failed with 29 (0x16)
    2010-07-13 12:52:28.698: [ CSSCLNT][1102481728]clsssInitNative: connect failed, rc 29
    2010-07-13 12:53:55.222: [ CSSCLNT][1102481728]clssnsqlnum: RPC failed rc 3
    2010-07-13 12:53:55.222: [ USRTHRD][1102481728] clsnomon_cssini: failed 3 to fetch node number
    2010-07-13 12:53:55.222: [ USRTHRD][1102481728] clsnomon_init: css init done, nodenum -1.
    2010-07-13 12:53:55.222: [ CSSCLNT][1102481728]clsssRecvMsg: got a disconnect from the server while waiting for message type 43
    2010-07-13 12:53:55.222: [ CSSCLNT][1102481728]clsssGetNLSData: Failure receiving a msg, rc 3
    If anyone needs more info please let me know.

    On all nodes,
    1. Modify the /etc/sysconfig/oracleasm with:
    ORACLEASM_SCANORDER="dm"
    ORACLEASM_SCANEXCLUDE="sd"
    2. Restart ASMLib:
    # /etc/init.d/oracleasm restart
    3. Run root.sh on the 2nd node
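    Before re-running root.sh, it may be worth confirming that ASMLib now sees the disk labels through the intended devices (a hedged check, assuming the standard oracleasm init script is in place):
        /etc/init.d/oracleasm scandisks
        /etc/init.d/oracleasm listdisks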
    Hope this helps you.

  • SC 3.2 second node panics on boot

    I am trying to get a two node (potentially three if the cluster works :) ) cluster running in a solaris 10 x86 (AMD64) environment. Following are the machine specifications:
    AMD 64 single core
    SATA2 hdd partitioned as / (100+gb), swap (4gb) and /globaldevices (1gb)
    Solaris 10 Generic_127112-07
    Completely patched
    2 gb RAM
    NVidia nge nic
    Syskonnect skge nic
    Realtek rge nic
    Sun Cluster 3.2
    Two unmanaged gigabit switches
    The cluster setup would look like the following:
    DB03 (First node of the cluster)
    db03nge0 -- public interconnect
    db03skge0 -- private interconnect 1 -- connected to sw07
    db03rge0 -- private interconnect 2 -- connected to sw09
    /globaldevices -- local disk
    DB02 (Second node of the cluster)
    db02nge0 -- public interconnect
    db02skge0 -- private interconnect 1 -- connected to sw07
    db02rge0 -- private interconnect 2 -- connected to sw09
    /globaldevices -- local disk
    DB01 (Third node of the cluster)
    db01nge0 -- public interconnect
    db01skge0 -- private interconnect 1 -- connected to sw07
    db01rge0 -- private interconnect 2 -- connected to sw09
    /globaldevices -- local disk
    All external/public communication happens at the nge0 nic.
    Switch sw07 and sw09 connects these machines for private interconnect.
    All of them have a local disk partition mounted as /globaldevices
    Another fourth server which is not a part of the cluster environment acts as a quorum server. The systems connect to the quorum server over nge nic. the quorum device name is cl01qs
    Next, I did a single-node configuration on DB03 through the scinstall utility and it completed successfully. The DB03 system rebooted, acquired a quorum vote from the quorum server and came up fine.
    Then I added the second node to the cluster (running the scinstall command from the second node). scinstall completes successfully and the node goes down for a reboot.
    I can see the following from the first node:
    db03nge0# cluster show 
    Cluster ===                                   
    Cluster Name:                                   cl01
      installmode:                                     disabled
      private_netaddr:                                 172.16.0.0
      private_netmask:                                 255.255.248.0
      max_nodes:                                       64
      max_privatenets:                                 10
      udp_session_timeout:                             480
      global_fencing:                                  pathcount
      Node List:                                       db03nge0, db02nge0
      Host Access Control ===                     
      Cluster name:                                 cl01
        Allowed hosts:                                 Any
        Authentication Protocol:                       sys
      Cluster Nodes ===                           
      Node Name:                                    db03nge0
        Node ID:                                       1
        Enabled:                                       yes
        privatehostname:                               clusternode1-priv
        reboot_on_path_failure:                        disabled
        globalzoneshares:                              1
        defaultpsetmin:                                1
        quorum_vote:                                   1
        quorum_defaultvote:                            1
        quorum_resv_key:                               0x479C227E00000001
        Transport Adapter List:                        skge0, rge0
      Node Name:                                    db02nge0
        Node ID:                                       2
        Enabled:                                       yes
        privatehostname:                               clusternode2-priv
        reboot_on_path_failure:                        disabled
        globalzoneshares:                              1
        defaultpsetmin:                                1
        quorum_vote:                                   0
        quorum_defaultvote:                            1
        quorum_resv_key:                               0x479C227E00000002
        Transport Adapter List:                        skge0, rge0
    Now, the problem: when scinstall completes on the second node, it sends the machine for a reboot, and the second node encounters a panic and shuts itself down. This panic-and-reboot cycle keeps going unless I place the second node in non-cluster mode. The output from both nodes looks like the following:
    First Node DB03 (Primary)
    Jan 27 18:34:49 db03nge0 genunix: [ID 537175 kern.notice] NOTICE: CMM: Node db02nge0 (nodeid: 2, incarnation #: 1201476860) has become reachable.
    Jan 27 18:34:49 db03nge0 genunix: [ID 387288 kern.notice] NOTICE: clcomm: Path db03nge0:rge0 - db02nge0:rge0 online
    Jan 27 18:34:49 db03nge0 genunix: [ID 387288 kern.notice] NOTICE: clcomm: Path db03nge0:skge0 - db02nge0:skge0 online
    Jan 27 18:34:49 db03nge0 genunix: [ID 377347 kern.notice] NOTICE: CMM: Node db02nge0 (nodeid = 2) is up; new incarnation number = 1201476860.
    Jan 27 18:34:49 db03nge0 genunix: [ID 108990 kern.notice] NOTICE: CMM: Cluster members: db03nge0 db02nge0.
    Jan 27 18:34:49 db03nge0 Cluster.Framework: [ID 801593 daemon.notice] stdout: releasing reservations for scsi-2 disks shared with db02nge0
    Jan 27 18:34:49 db03nge0 genunix: [ID 279084 kern.notice] NOTICE: CMM: node reconfiguration #7 completed.
    Jan 27 18:34:59 db03nge0 genunix: [ID 446068 kern.notice] NOTICE: CMM: Node db02nge0 (nodeid = 2) is down.
    Jan 27 18:34:59 db03nge0 genunix: [ID 108990 kern.notice] NOTICE: CMM: Cluster members: db03nge0.
    Jan 27 18:34:59 db03nge0 genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path db03nge0:skge0 - db02nge0:skge0 being drained
    Jan 27 18:34:59 db03nge0 genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path db03nge0:rge0 - db02nge0:rge0 being drained
    Jan 27 18:35:00 db03nge0 genunix: [ID 279084 kern.notice] NOTICE: CMM: node reconfiguration #8 completed.
    Jan 27 18:35:00 db03nge0 Cluster.Framework: [ID 801593 daemon.notice] stdout: fencing node db02nge0 from shared devices
    Jan 27 18:35:59 db03nge0 genunix: [ID 604153 kern.notice] NOTICE: clcomm: Path db03nge0:skge0 - db02nge0:skge0 errors during initiation
    Jan 27 18:35:59 db03nge0 genunix: [ID 618107 kern.warning] WARNING: Path db03nge0:skge0 - db02nge0:skge0 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    Jan 27 18:35:59 db03nge0 genunix: [ID 604153 kern.notice] NOTICE: clcomm: Path db03nge0:rge0 - db02nge0:rge0 errors during initiation
    Jan 27 18:35:59 db03nge0 genunix: [ID 618107 kern.warning] WARNING: Path db03nge0:rge0 - db02nge0:rge0 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    Jan 27 18:40:27 db03nge0 genunix: [ID 273354 kern.notice] NOTICE: CMM: Node db02nge0 (nodeid = 2) is dead.
    Second Node DB02 (secondary node just added to the cluster):
    Jan 27 18:33:43 db02nge0 ipf: [ID 774698 kern.info] IP Filter: v4.1.9, running.
    Jan 27 18:33:50 db02nge0 svc.startd[8]: [ID 652011 daemon.warning] svc:/system/pools:default: Method "/lib/svc/method/svc-pools start" failed with exit status 96.
    Jan 27 18:33:50 db02nge0 svc.startd[8]: [ID 748625 daemon.error] system/pools:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
    Jan 27 18:34:20 db02nge0 genunix: [ID 965873 kern.notice] NOTICE: CMM: Node db03nge0 (nodeid = 1) with votecount = 1 added.
    Jan 27 18:34:20 db02nge0 genunix: [ID 965873 kern.notice] NOTICE: CMM: Node db02nge0 (nodeid = 2) with votecount = 0 added.
    Jan 27 18:34:20 db02nge0 genunix: [ID 884114 kern.notice] NOTICE: clcomm: Adapter rge0 constructed
    Jan 27 18:34:20 db02nge0 genunix: [ID 884114 kern.notice] NOTICE: clcomm: Adapter skge0 constructed
    Jan 27 18:34:20 db02nge0 genunix: [ID 843983 kern.notice] NOTICE: CMM: Node db02nge0: attempting to join cluster.
    Jan 27 18:34:23 db02nge0 skge: [ID 418734 kern.notice] skge0: Network connection up on port A
    Jan 27 18:34:23 db02nge0 skge: [ID 249518 kern.notice]     Link Speed:      1000 Mbps
    Jan 27 18:34:23 db02nge0 skge: [ID 966250 kern.notice]     Autonegotiation: Yes
    Jan 27 18:34:23 db02nge0 skge: [ID 676895 kern.notice]     Duplex Mode:     Full
    Jan 27 18:34:23 db02nge0 skge: [ID 825410 kern.notice]     Flow Control:    Symmetric
    Jan 27 18:34:23 db02nge0 skge: [ID 512437 kern.notice]     Role:            Slave
    Jan 27 18:34:23 db02nge0 rge: [ID 801725 kern.info] NOTICE: rge0: link up 1000Mbps Full_Duplex (initialized)
    Jan 27 18:34:24 db02nge0 genunix: [ID 537175 kern.notice] NOTICE: CMM: Node db03nge0 (nodeid: 1, incarnation #: 1201416440) has become reachable.
    Jan 27 18:34:24 db02nge0 genunix: [ID 387288 kern.notice] NOTICE: clcomm: Path db02nge0:rge0 - db03nge0:rge0 online
    Jan 27 18:34:24 db02nge0 genunix: [ID 525628 kern.notice] NOTICE: CMM: Cluster has reached quorum.
    Jan 27 18:34:24 db02nge0 genunix: [ID 377347 kern.notice] NOTICE: CMM: Node db03nge0 (nodeid = 1) is up; new incarnation number = 1201416440.
    Jan 27 18:34:24 db02nge0 genunix: [ID 377347 kern.notice] NOTICE: CMM: Node db02nge0 (nodeid = 2) is up; new incarnation number = 1201476860.
    Jan 27 18:34:24 db02nge0 genunix: [ID 108990 kern.notice] NOTICE: CMM: Cluster members: db03nge0 db02nge0.
    Jan 27 18:34:24 db02nge0 genunix: [ID 387288 kern.notice] NOTICE: clcomm: Path db02nge0:skge0 - db03nge0:skge0 online
    Jan 27 18:34:25 db02nge0 genunix: [ID 279084 kern.notice] NOTICE: CMM: node reconfiguration #7 completed.
    Jan 27 18:34:25 db02nge0 genunix: [ID 499756 kern.notice] NOTICE: CMM: Node db02nge0: joined cluster.
    Jan 27 18:34:25 db02nge0 cl_dlpitrans: [ID 624622 kern.notice] Notifying cluster that this node is panicking
    Jan 27 18:34:25 db02nge0 unix: [ID 836849 kern.notice]
    Jan 27 18:34:25 db02nge0 ^Mpanic[cpu0]/thread=ffffffff8202a1a0:
    Jan 27 18:34:25 db02nge0 genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fffffe8000636b90 addr=30 occurred in module "cl_comm" due to a NULL pointer dereference
    Jan 27 18:34:25 db02nge0 cl_dlpitrans: [ID 624622 kern.notice] Notifying cluster that this node is panicking
    Jan 27 18:34:25 db02nge0 unix: [ID 836849 kern.notice]
    Jan 27 18:34:25 db02nge0 ^Mpanic[cpu0]/thread=ffffffff8202a1a0:
    Jan 27 18:34:25 db02nge0 genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fffffe8000636b90 addr=30 occurred in module "cl_comm" due to a NULL pointer dereference
    Jan 27 18:34:25 db02nge0 unix: [ID 100000 kern.notice]
    Jan 27 18:34:25 db02nge0 unix: [ID 839527 kern.notice] cluster:
    Jan 27 18:34:25 db02nge0 unix: [ID 753105 kern.notice] #pf Page fault
    Jan 27 18:34:25 db02nge0 unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x30
    Jan 27 18:34:25 db02nge0 unix: [ID 243837 kern.notice] pid=4, pc=0xfffffffff262c3f6, sp=0xfffffe8000636c80, eflags=0x10202
    Jan 27 18:34:25 db02nge0 unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f0<xmme,fxsr,pge,mce,pae,pse>
    Jan 27 18:34:25 db02nge0 unix: [ID 354241 kern.notice] cr2: 30 cr3: efd4000 cr8: c
    Jan 27 18:34:25 db02nge0 unix: [ID 592667 kern.notice]  rdi: ffffffff8c932b18 rsi: ffffffffc055a8e6 rdx:               10
    Jan 27 18:34:25 db02nge0 unix: [ID 592667 kern.notice]  rcx: ffffffff8d10d0c0  r8:                0  r9:                0
    Jan 27 18:34:25 db02nge0 unix: [ID 592667 kern.notice]  rax:               10 rbx:                0 rbp: fffffe8000636cd0
    Jan 27 18:34:25 db02nge0 unix: [ID 592667 kern.notice]  r10:                0 r11: fffffffffbce2d40 r12: ffffffff8216a008
    Jan 27 18:34:25 db02nge0 unix: [ID 592667 kern.notice]  r13:              800 r14:                0 r15: ffffffff8216a0d8
    Jan 27 18:34:25 db02nge0 unix: [ID 592667 kern.notice]  fsb: ffffffff80000000 gsb: fffffffffbc25520  ds:               43
    Jan 27 18:34:25 db02nge0 unix: [ID 592667 kern.notice]   es:               43  fs:                0  gs:              1c3
    Jan 27 18:34:25 db02nge0 unix: [ID 592667 kern.notice]  trp:                e err:                0 rip: fffffffff262c3f6
    Jan 27 18:34:25 db02nge0 unix: [ID 592667 kern.notice]   cs:               28 rfl:            10202 rsp: fffffe8000636c80
    Jan 27 18:34:25 db02nge0 unix: [ID 266532 kern.notice]   ss:               30
    Jan 27 18:34:25 db02nge0 unix: [ID 100000 kern.notice]
    Jan 27 18:34:25 db02nge0 genunix: [ID 655072 kern.notice] fffffe8000636aa0 unix:die+da ()
    Jan 27 18:34:25 db02nge0 genunix: [ID 655072 kern.notice] fffffe8000636b80 unix:trap+d86 ()
    Jan 27 18:34:25 db02nge0 genunix: [ID 655072 kern.notice] fffffe8000636b90 unix:cmntrap+140 ()
    Jan 27 18:34:25 db02nge0 genunix: [ID 655072 kern.notice] fffffe8000636cd0 cl_comm:__1cKfp_adapterNget_fp_header6MpCLHC_pnEmsgb__+163 ()
    Jan 27 18:34:25 db02nge0 genunix: [ID 655072 kern.notice] fffffe8000636d30 cl_comm:__1cJfp_holderVupdate_remote_macaddr6MrnHnetworkJmacinfo_t__v_+e5 ()
    Jan 27 18:34:25 db02nge0 genunix: [ID 655072 kern.notice] fffffe8000636d80 cl_comm:__1cLpernodepathOstart_matching6MnM_ManagedSeq_4nL_NormalSeq_4nHnetworkJmacinfo_t__
    _n0C____v_+180 ()
    Jan 27 18:34:25 db02nge0 genunix: [ID 655072 kern.notice] fffffe8000636e60 cl_comm:__1cGfpconfIfp_ns_if6M_v_+195 ()
    Jan 27 18:34:25 db02nge0 genunix: [ID 655072 kern.notice] fffffe8000636e70 cl_comm:.XDKsQAiaUkSGENQ.__1fTget_idlversion_impl1AG__CCLD_+320bf51b ()
    Jan 27 18:34:25 db02nge0 genunix: [ID 655072 kern.notice] fffffe8000636ed0 cl_orb:cllwpwrapper+106 ()
    Jan 27 18:34:25 db02nge0 genunix: [ID 655072 kern.notice] fffffe8000636ee0 unix:thread_start+8 ()
    Jan 27 18:34:25 db02nge0 unix: [ID 100000 kern.notice]
    Jan 27 18:34:25 db02nge0 genunix: [ID 672855 kern.notice] syncing file systems...
    Jan 27 18:34:25 db02nge0 genunix: [ID 433738 kern.notice]  [1]
    Jan 27 18:34:25 db02nge0 genunix: [ID 733762 kern.notice]  33
    Jan 27 18:34:26 db02nge0 genunix: [ID 433738 kern.notice]  [1]
    Jan 27 18:34:26 db02nge0 genunix: [ID 733762 kern.notice]  2
    Jan 27 18:34:27 db02nge0 genunix: [ID 433738 kern.notice]  [1]
    Jan 27 18:34:48 db02nge0 last message repeated 20 times
    Jan 27 18:34:49 db02nge0 genunix: [ID 622722 kern.notice]  done (not all i/o completed)
    Jan 27 18:34:50 db02nge0 genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1d0s1, offset 860356608, content: kernel
    Jan 27 18:34:55 db02nge0 genunix: [ID 409368 kern.notice] ^M100% done: 92936 pages dumped, compression ratio 5.02,
    Jan 27 18:34:55 db02nge0 genunix: [ID 851671 kern.notice] dump succeeded
    Jan 27 18:35:41 db02nge0 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_127112-07 64-bit
    Jan 27 18:35:41 db02nge0 genunix: [ID 943907 kern.notice] Copyright 1983-2007 Sun Microsystems, Inc.  All rights reserved.
    Jan 27 18:35:41 db02nge0 Use is subject to license terms.
    Jan 27 18:35:41 db02nge0 unix: [ID 126719 kern.info] features: 1076fdf<cpuid,sse3,nx,asysc,sse2,sse,pat,cx8,pae,mca,mmx,cmov,pge,mtrr,msr,tsc,lgpg>
    Jan 27 18:35:41 db02nge0 unix: [ID 168242 kern.info] mem = 3144188K (0xbfe7f000)
    Jan 27 18:35:41 db02nge0 rootnex: [ID 466748 kern.info] root nexus = i86pc
    I don't know what the next step is to overcome this problem. I have tried the same with the DB01 machine, but that machine also throws a kernel panic at the same point. From what I can see in the logs, it seems as if the secondary node(s) do join the cluster:
    Jan 27 18:34:20 db02nge0 genunix: [ID 965873 kern.notice] NOTICE: CMM: Node db03nge0 (nodeid = 1) with votecount = 1 added.
    Jan 27 18:34:20 db02nge0 genunix: [ID 965873 kern.notice] NOTICE: CMM: Node db02nge0 (nodeid = 2) with votecount = 0 added.
    Jan 27 18:34:20 db02nge0 genunix: [ID 884114 kern.notice] NOTICE: clcomm: Adapter rge0 constructed
    Jan 27 18:34:20 db02nge0 genunix: [ID 884114 kern.notice] NOTICE: clcomm: Adapter skge0 constructed
    Jan 27 18:34:20 db02nge0 genunix: [ID 843983 kern.notice] NOTICE: CMM: Node db02nge0: attempting to join cluster.
    Jan 27 18:34:23 db02nge0 rge: [ID 801725 kern.info] NOTICE: rge0: link up 1000Mbps Full_Duplex (initialized)
    Jan 27 18:34:24 db02nge0 genunix: [ID 537175 kern.notice] NOTICE: CMM: Node db03nge0 (nodeid: 1, incarnation #: 1201416440) has become reachable.
    Jan 27 18:34:24 db02nge0 genunix: [ID 387288 kern.notice] NOTICE: clcomm: Path db02nge0:rge0 - db03nge0:rge0 online
    Jan 27 18:34:24 db02nge0 genunix: [ID 525628 kern.notice] NOTICE: CMM: Cluster has reached quorum.
    Jan 27 18:34:24 db02nge0 genunix: [ID 377347 kern.notice] NOTICE: CMM: Node db03nge0 (nodeid = 1) is up; new incarnation number = 1201416440.
    Jan 27 18:34:24 db02nge0 genunix: [ID 377347 kern.notice] NOTICE: CMM: Node db02nge0 (nodeid = 2) is up; new incarnation number = 1201476860.
    Jan 27 18:34:24 db02nge0 genunix: [ID 108990 kern.notice] NOTICE: CMM: Cluster members: db03nge0 db02nge0.
    Jan 27 18:34:24 db02nge0 genunix: [ID 387288 kern.notice] NOTICE: clcomm: Path db02nge0:skge0 - db03nge0:skge0 online
    Jan 27 18:34:25 db02nge0 genunix: [ID 279084 kern.notice] NOTICE: CMM: node reconfiguration #7 completed.
    Jan 27 18:34:25 db02nge0 genunix: [ID 499756 kern.notice] NOTICE: CMM: Node db02nge0: joined cluster.
    But then, immediately, for some reason they hit the kernel panic.
    The only thing that comes to mind is that the skge driver is somehow causing the problem when it is part of the cluster interconnect. I don't know, but another thread somewhere on the internet describes a similar problem:
    http://unix.derkeiler.com/Mailing-Lists/SunManagers/2005-12/msg00114.html
    The next step looks like interchanging the nge and skge NICs and trying again.
    Any help is much appreciated.
    Thanks in advance.
    tualha

    I'm not sure I can solve your problem but I have some suggestions that you might want to consider. I can't find anything in the bugs database that is identical to this, but that may be because we haven't certified the adapters you are using and thus never came across the problem.
    Although I'm not that hot on kernel debugging, looking at the stack traces seems to suggest that there might have been a problem with MAC addresses. Can you check that you have the equivalent of local_mac_address = true set, so that each adapter has a separate MAC address? If they don't, it might confuse the cl_comm module, which seems to have had the fault.
    If that checks out, then I would try switching the SysKonnect adapter to the public network and making the nge adapter the other private network. Again, I don't think any of these adapters have ever been tested, so there is no guarantee they will work.
    Other ideas to try are to set the adapters to not auto-negotiate speeds, disable jumbo frames, check that they don't have any power saving modes that might put them to sleep periodically, etc.
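    On Solaris 10, a practical way to check the MAC-address point is simply to compare what each adapter reports (run as root; adapter names as in the post; a rough sketch only):
        # each interface should show a distinct "ether" address
        ifconfig nge0
        ifconfig skge0
        ifconfig rge0
    On SPARC the equivalent of local_mac_address is the local-mac-address? EEPROM variable; on these x86 boxes whether each port gets its own MAC is driver-dependent, so comparing the ifconfig output is the practical check.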
    Let us know if any of these make any difference.
    Tim
    ---

  • Crs doesn't start on second node

    Guys,
    RAC on 2 nodes
    Release 10.2.0.5.0
    Solaris 10
    There was a problem with the cable used for the interconnect, but it has been fixed. One of the nodes was evicted and all resources were moved to the other node. Once the problem was solved I tried to start CRS on the evicted node, but with no success. When I run crs_stat -t I get the infamous CRS-0184.
    I have checked the OCR and olsnodes; the OCR seems to be fine and the second node is recognized as part of the cluster.
    cluvfy comp ocr -n lenin,trotsky -verbose
    Verifying OCR integrity
    Checking OCR integrity...
    Checking the absence of a non-clustered configuration...
    All nodes free of non-clustered, local-only configurations.
    Uniqueness check for OCR device passed.
    Checking the version of OCR...
    OCR of correct Version "2" exists.
    Checking data integrity of OCR...
    Data integrity check for OCR passed.
    OCR integrity check passed.
    Verification of OCR integrity was successful.
    oracle@trotsky > cluvfy comp nodereach -n lenin,trotsky -srcnode trotsky -verbose
    Verifying node reachability
    Checking node reachability...
    Check: Node reachability from node "trotsky"
    Destination Node Reachable?
    lenin yes
    trotsky yes
    Result: Node reachability check passed from node "trotsky".
    I have checked /var/adm/messages and the crs and cssd logs, but I didn't see anything that stands out.
    I have also tried deleting the contents of /var/tmp/.oracle and restarting CRS, but again with no success.
    I have read in another thread in this forum that CRS problems are usually related either to the interconnect or to the OCR/voting disks, but as mentioned before those seem to be OK.
    I'm running out of ideas; any suggestions?
    One of the nodes now holds both vip addresses:
    bge0:1: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
    inet 192.168.191.184 netmask ffffff00 broadcast 192.168.191.255
    bge0:2: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
    inet 192.168.191.182 netmask ffffff00 broadcast 192.168.191.255
    Do I need to manually reconfigure the interface so that it is then held by the second node?
    Thanks in advance for your help

    Cheers for your input!
    The result of the suggested cluvfy command: it passed all checks with the exception of the daemon liveness check (as expected).
    Excerpts from the different logs:
    alert.log
    2010-11-19 13:12:35.033
    [cssd(4928)]CRS-1605:CSSD voting file is online: /dev/rdsk/c1t500601604BA03AEAd0s5. Details in /u01/crs/10.2.0/crs_1/log/trotsky/cssd/ocssd.log.
    2010-11-19 13:12:35.050
    [cssd(4928)]CRS-1605:CSSD voting file is online: /dev/rdsk/c1t500601604BA03AEAd0s4. Details in /u01/crs/10.2.0/crs_1/log/trotsky/cssd/ocssd.log.
    2010-11-19 13:12:35.062
    [cssd(4928)]CRS-1605:CSSD voting file is online: /dev/rdsk/c1t500601604BA03AEAd0s6. Details in /u01/crs/10.2.0/crs_1/log/trotsky/cssd/ocssd.log.
    cssd.log
    [    CSSD]2010-11-19 13:16:47.059 [21] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
    [    CSSD]2010-11-19 13:16:47.059 [21] >WARNING: clssnmRcfgMgrThread: not possible to join the cluster. Please reboot the node.
    [    CSSD]2010-11-19 13:16:47.059 [21] >WARNING: clssnmReconfigThread: state(1) clusterState(0) exit
    I have tried rebooting the node but that did not help.
    crsd.log
    2010-11-19 13:53:49.652: [  CRSRTI][1] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
    2010-11-19 13:53:50.889: [ COMMCRS][1802]clsc_connect: (1009ac310) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_trotsky_))
    2010-11-19 13:53:50.889: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
    2010-11-19 13:53:50.890: [  CRSRTI][1] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
    2010-11-19 13:53:51.899: [    CRSD][1][PANIC] CRSD exiting: Could not init the CSS context
    2010-11-19 13:53:51.899: [    CRSD][1] Done.
    Does this help?
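    For reference, the basic liveness checks usually run at this point on a 10.2 stack are the ones below (a hedged sketch only; run as root or the CRS owner with the CRS home's bin directory on the PATH; crsctl check crs covers CSS, CRS and EVM together, and the individual checks follow):
        crsctl check crs
        crsctl check cssd
        crsctl check crsd
        crsctl check evmd
        ps -ef | egrep 'ocssd|crsd|evmd' | grep -v grep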

  • Question: managed server status displays unknown?

    Platform: AS5, WebLogic 8.1.6 cluster
    When the managed server starts up its status is OK, but after some days the managed server status shows as Unknown or Shutdown in the web console.
    I checked the process with "ps -ef|grep java"; the managed server process exists,
    and the application on the managed server is OK.
    This is the entry in domain.log:
    ####<2009-9-9 <Warning> <Management> <web3> <myserver> <ExecuteThread: '1' for queue: 'weblogic.kernel.System'> <<WLS Kernel>> <BEA-141138> <Managed Server web03 is disconnected from the admin server. This may be either due to a managed server getting temporarily partitioned or the managed server process exiting.>
    After restarting the managed server, the status is RUNNING again.
    So why does the managed server status show Unknown/Shutdown after some days? How can I solve this problem?

    Status Unknown means that the node manager can't establish contact with the managed server. If your server is dropping out of the cluster, that would result in Unknown.
    Your server could be dropping out of the cluster for many, many reasons. If it is always the same server dropping out, that points to a particular node in your cluster.
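    A quick, rough check of that contact path is to confirm the admin host can still reach the managed server's node manager port (5556 is only the node manager default; substitute your actual host name and configured port):
        ping <managed-server-host>
        telnet <managed-server-host> 5556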

  • RAC resource status is unknown, only VIP and ASM are running fine

    I have just installed Oracle 10g Clusterware and the Oracle software on RHEL 5.
    Only the ASM and VIP resources are working fine; the rest of the resources are not working and their status is UNKNOWN.
    [oracle@rac1 bin]$ ./crs_stat -t
    Name            Type         Target   State     Host
    ora....SM1.asm  application  ONLINE   ONLINE    rac1
    ora....C1.lsnr  application  ONLINE   UNKNOWN   rac1
    ora.rac1.gsd    application  ONLINE   UNKNOWN   rac1
    ora.rac1.ons    application  ONLINE   UNKNOWN   rac1
    ora.rac1.vip    application  ONLINE   ONLINE    rac1
    ora....SM2.asm  application  ONLINE   ONLINE    rac2
    ora....C2.lsnr  application  ONLINE   UNKNOWN   rac2
    ora.rac2.gsd    application  ONLINE   UNKNOWN   rac2
    ora.rac2.ons    application  ONLINE   UNKNOWN   rac2
    ora.rac2.vip    application  ONLINE   ONLINE    rac2
    I tried to start them manually but it returns errors like this:
    [oracle@rac1 bin]$ ./crs_stop ora.rac1.LISTENER_RAC1.lsnr -f
    Attempting to stop `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
    Stop of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.
    [oracle@rac1 bin]$ ./crs_start ora.rac1.LISTENER_RAC1.lsnr -f
    Attempting to start `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
    `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` has experienced an unrecoverable failure.
    Human intervention required to resume its availability.
    CRS-0215: Could not start resource 'ora.rac1.LISTENER_RAC1.lsnr'.
    [oracle@rac1 bin]$ ./crs_start ora.rac1.LISTENER_RAC1.lsnr -f
    CRS-1028: Dependency analysis failed because of:
    'Resource in UNKNOWN state: ora.rac1.LISTENER_RAC1.lsnr'
    CRS-0223: Resource 'ora.rac1.LISTENER_RAC1.lsnr' has placement error.
    [oracle@rac1 bin]$ ./crs_start ora.rac2.LISTENER_RAC2.lsnr -f
    CRS-1028: Dependency analysis failed because of:
    'Resource in UNKNOWN state: ora.rac2.LISTENER_RAC2.lsnr'
    CRS-0223: Resource 'ora.rac2.LISTENER_RAC2.lsnr' has placement error.
    I have rebooted the system 3 times but the problem is the same. Please help me solve this problem.
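    Before fighting crs_start further, one rough thing worth checking is whether the listener starts cleanly outside Clusterware, as the oracle user on each node (the listener name below is taken from the resource name shown above; adjust per node):
        # is a stale listener process already around?
        ps -ef | grep tnslsnr | grep -v grep
        # try the listener by hand with the name CRS manages
        lsnrctl status LISTENER_RAC1
        lsnrctl start LISTENER_RAC1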

    /opt/app/crs/log/rac1/alertrac1.log output..
    [cssd(5441)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /opt/app/crs/log/rac1/cssd/ocssd.log.
    2012-12-19 05:17:02.561
    [cssd(5441)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 .
    2012-12-19 05:17:03.998
    [crsd(4718)]CRS-1012:The OCR service started on node rac1.
    2012-12-19 05:17:04.028
    [evmd(5327)]CRS-1401:EVMD started on node rac1.
    2012-12-19 05:17:12.456
    [crsd(4718)]CRS-1201:CRSD started on node rac1.
    2012-12-19 05:17:23.668
    [cssd(5441)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
    2012-12-19 07:23:46.211
    [cssd(5216)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /opt/app/crs/log/rac1/cssd/ocssd.log.
    2012-12-19 07:23:49.399
    [cssd(5216)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 .
    2012-12-19 07:23:50.458
    [crsd(4709)]CRS-1012:The OCR service started on node rac1.
    2012-12-19 07:23:50.490
    [evmd(5098)]CRS-1401:EVMD started on node rac1.
    2012-12-19 07:23:55.776
    [crsd(4709)]CRS-1201:CRSD started on node rac1.
    2012-12-19 07:25:00.583
    [cssd(5216)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
    2012-12-20 00:09:11.199
    [cssd(5286)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /opt/app/crs/log/rac1/cssd/ocssd.log.
    2012-12-20 00:09:14.907
    [cssd(5286)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 .
    2012-12-20 00:09:16.446
    [evmd(5128)]CRS-1401:EVMD started on node rac1.
    2012-12-20 00:09:16.459
    [crsd(4756)]CRS-1012:The OCR service started on node rac1.
    2012-12-20 00:10:02.406
    [crsd(4756)]CRS-1201:CRSD started on node rac1.
    2012-12-20 00:10:39.220
    [cssd(5286)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
    /opt/app/crs/log/rac1/crsd/crsd.log output:-
    2012-12-20 00:09:15.606: [    CRSD][7390912]0Daemon Version: 10.2.0.1.0 Active Version: 10.2.0.1.0
    2012-12-20 00:09:15.606: [    CRSD][7390912]0Active Version and Software Version are same
    2012-12-20 00:09:15.606: [ CRSMAIN][7390912]0Initializing OCR
    2012-12-20 00:09:15.801: [  OCRRAW][7390912]proprioo: for disk 0 (/dev/raw/raw1), id match (1), my id set (1669906634,1028247821) total id sets (1), 1st set (1669906634,1028247821), 2nd set (0,0) my votes (2), total votes (2)
    2012-12-20 00:09:16.264: [  OCRMAS][3065346960]th_master:12: I AM THE NEW OCR MASTER at incar 1. Node Number = 1
    2012-12-20 00:09:16.310: [  OCRRAW][3065346960]proprioo: for disk 0 (/dev/raw/raw1), id match (1), my id set (1669906634,1028247821) total id sets (1), 1st set (1669906634,1028247821), 2nd set (0,0) my votes (2), total votes (2)
    2012-12-20 00:09:16.524: [    CRSD][7390912]0ENV Logging level for Module: allcomp 0
    2012-12-20 00:09:16.528: [    CRSD][7390912]0ENV Logging level for Module: default 0
    2012-12-20 00:09:16.534: [    CRSD][7390912]0ENV Logging level for Module: COMMCRS 0
    2012-12-20 00:09:16.536: [    CRSD][7390912]0ENV Logging level for Module: COMMNS 0
    2012-12-20 00:09:16.549: [    CRSD][7390912]0ENV Logging level for Module: CRSUI 0
    2012-12-20 00:09:16.556: [    CRSD][7390912]0ENV Logging level for Module: CRSCOMM 0
    2012-12-20 00:09:16.559: [    CRSD][7390912]0ENV Logging level for Module: CRSRTI 0
    2012-12-20 00:09:16.562: [    CRSD][7390912]0ENV Logging level for Module: CRSMAIN 0
    2012-12-20 00:09:16.564: [    CRSD][7390912]0ENV Logging level for Module: CRSPLACE 0
    2012-12-20 00:09:16.567: [    CRSD][7390912]0ENV Logging level for Module: CRSAPP 0
    2012-12-20 00:09:16.570: [    CRSD][7390912]0ENV Logging level for Module: CRSRES 0
    2012-12-20 00:09:16.573: [    CRSD][7390912]0ENV Logging level for Module: CRSOCR 0
    2012-12-20 00:09:16.576: [    CRSD][7390912]0ENV Logging level for Module: CRSTIMER 0
    2012-12-20 00:09:16.582: [    CRSD][7390912]0ENV Logging level for Module: CRSEVT 0
    2012-12-20 00:09:16.586: [    CRSD][7390912]0ENV Logging level for Module: CRSD 0
    2012-12-20 00:09:16.590: [    CRSD][7390912]0ENV Logging level for Module: CLUCLS 0
    2012-12-20 00:09:16.593: [    CRSD][7390912]0ENV Logging level for Module: OCRRAW 0
    2012-12-20 00:09:16.596: [    CRSD][7390912]0ENV Logging level for Module: OCROSD 0
    2012-12-20 00:09:16.600: [    CRSD][7390912]0ENV Logging level for Module: CSSCLNT 0
    2012-12-20 00:09:16.603: [    CRSD][7390912]0ENV Logging level for Module: OCRAPI 0
    2012-12-20 00:09:16.606: [    CRSD][7390912]0ENV Logging level for Module: OCRUTL 0
    2012-12-20 00:09:16.609: [    CRSD][7390912]0ENV Logging level for Module: OCRMSG 0
    2012-12-20 00:09:16.613: [    CRSD][7390912]0ENV Logging level for Module: OCRCLI 0
    2012-12-20 00:09:16.651: [    CRSD][7390912]0ENV Logging level for Module: OCRCAC 0
    2012-12-20 00:09:16.671: [    CRSD][7390912]0ENV Logging level for Module: OCRSRV 0
    2012-12-20 00:09:16.678: [    CRSD][7390912]0ENV Logging level for Module: OCRMAS 0
    2012-12-20 00:09:16.678: [ CRSMAIN][7390912]0Filename is /opt/app/crs/crs/init/rac1.pid
    2012-12-20 00:09:16.956: [ CRSMAIN][7390912]0Using Authorizer location: /opt/app/crs/crs/auth/
    2012-12-20 00:09:17.080: [ CRSMAIN][7390912]0Initializing RTI
    2012-12-20 00:09:17.085: [CRSTIMER][2845059984]0Timer Thread Starting.
    [  clsdmt][2866039696]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_CRSD))
    2012-12-20 00:09:17.236: [  CRSRES][7390912]0Parameter SECURITY = 1, running in USER Mode
    2012-12-20 00:09:17.236: [ CRSMAIN][7390912]0Initializing EVMMgr
    2012-12-20 00:09:17.475: [ COMMCRS][2834570128]clsc_connect: (0xa4e7cc8) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
    2012-12-20 00:09:18.437: [ COMMCRS][2834570128]clsc_connect: (0xa45b6b8) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
    2012-12-20 00:09:18.888: [ COMMCRS][2834570128]clsc_connect: (0xa45af68) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
    2012-12-20 00:09:19.575: [ COMMCRS][2834570128]clsc_connect: (0xa456f50) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
    2012-12-20 00:09:20.029: [ COMMCRS][2834570128]clsc_connect: (0xa45b330) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
    2012-12-20 00:09:47.675: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [172339896] retval lht [-27] Signal CV.
    2012-12-20 00:09:55.947: [ CRSMAIN][7390912]0CRSD locked during state recovery, please wait.
    2012-12-20 00:10:01.118: [ CRSMAIN][7390912]0CRSD recovered, unlocked.
    2012-12-20 00:10:01.127: [ CRSMAIN][7390912]0QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))
    2012-12-20 00:10:02.329: [ CRSMAIN][7390912]0CRSD UI socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))
    2012-12-20 00:10:02.400: [ CRSMAIN][7390912]0E2E socket on: (ADDRESS=(PROTOCOL=tcp)(HOST=rac1-priv)(PORT=49896))
    2012-12-20 00:10:02.401: [ CRSMAIN][7390912]0Starting Threads
    2012-12-20 00:10:02.406: [ CRSMAIN][7390912]0CRS Daemon Started.
    2012-12-20 00:10:09.239: [  CRSRES][2740161424]0startRunnable: setting CLI values
    2012-12-20 00:10:09.612: [  CRSRES][2729671568]0startRunnable: setting CLI values
    2012-12-20 00:10:10.089: [  CRSRES][2740161424]0Attempting to start `ora.rac1.vip` on member `rac1`
    2012-12-20 00:10:10.147: [  CRSRES][2729671568]0Attempting to start `ora.rac2.vip` on member `rac1`
    2012-12-20 00:10:25.883: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:25.895: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:25.907: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:25.929: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.009: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.036: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.077: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.099: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.112: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.124: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.138: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.156: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.181: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.197: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.213: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.225: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.239: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.253: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.265: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.275: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.288: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.299: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.312: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.324: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.335: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.351: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.366: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.379: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.389: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.400: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.414: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.426: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.438: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.449: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.460: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.473: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.486: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.500: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.513: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.523: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.537: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.551: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.563: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.574: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.587: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.607: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.620: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.636: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.650: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.662: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.680: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:26.694: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [176455408] retval lht [-27] Signal CV.
    2012-12-20 00:10:40.640: [  CRSRES][2740161424]0Start of `ora.rac1.vip` on member `rac1` succeeded.
    2012-12-20 00:10:42.653: [  CRSRES][2729671568]0Start of `ora.rac2.vip` on member `rac1` succeeded.
    2012-12-20 00:10:44.290: [  CRSRES][2740161424]0startRunnable: setting CLI values
    2012-12-20 00:10:44.600: [  CRSRES][2729671568]0StopResource: setting CLI values
    2012-12-20 00:10:44.601: [  CRSRES][2740161424]0Attempting to start `ora.rac1.ASM1.asm` on member `rac1`
    2012-12-20 00:10:44.669: [  CRSRES][2729671568]0Attempting to stop `ora.rac2.vip` on member `rac1`
    2012-12-20 00:10:46.051: [  CRSRES][2729671568]0Stop of `ora.rac2.vip` on member `rac1` succeeded.
    2012-12-20 00:10:46.154: [ COMMCRS][2696936336]clsc_send_msg: (0xa84ecb0) NS err (12571, 12560), transport (530, 111, 0)
    2012-12-20 00:10:46.154: [ CRSCOMM][2729671568]0CLSC connection failed, ret = 9
    2012-12-20 00:10:46.154: [  CRSEVT][2729671568]0invokepeer ret 200
    2012-12-20 00:10:46.302: [  CRSRES][2729671568]0Remote start never sent to rac2: X_E2E_NotSent : Failed to connect to node: rac2
    (File: caa_CmdRTI.cpp, line: 492
    2012-12-20 00:10:46.303: [  CRSRES][2729671568][ALERT]0Remote start for `ora.rac2.vip` failed on member `rac2`
    2012-12-20 00:10:46.446: [  CRSEVT][2740161424]0CAAMonitorHandler :: 0:Action Script /opt/app/oracle/product/db_1/bin/racgwrap(start) timed out for ora.rac1.ASM1.asm! (timeout=600)
    2012-12-20 00:10:46.446: [  CRSAPP][2740161424]0StartResource error for ora.rac1.ASM1.asm error code = -2
    2012-12-20 00:10:46.558: [  CRSRES][2729671568]0startRunnable: setting CLI values
    2012-12-20 00:10:46.625: [  CRSEVT][2740161424]0CAAMonitorHandler :: 0:Action Script /opt/app/oracle/product/db_1/bin/racgwrap(stop) timed out for ora.rac1.ASM1.asm! (timeout=600)
    2012-12-20 00:10:46.626: [  CRSAPP][2740161424]0StopResource error for ora.rac1.ASM1.asm error code = -2
    2012-12-20 00:10:46.665: [  CRSRES][2729671568]0Attempting to start `ora.rac2.vip` on member `rac1`
    2012-12-20 00:10:46.750: [  CRSRES][2740161424]0X_OP_StopResourceFailed : Stop Resource failed
    (File: rti.cpp, line: 1698
    2012-12-20 00:10:46.750: [  CRSRES][2740161424][ALERT]0`ora.rac1.ASM1.asm` on member `rac1` has experienced an unrecoverable failure.
    2012-12-20 00:10:46.750: [  CRSRES][2740161424]0Human intervention required to resume its availability.
    2012-12-20 00:10:46.938: [  CRSRES][2740161424]0startRunnable: setting CLI values
    2012-12-20 00:10:46.978: [  CRSRES][2740161424]0Attempting to start `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
    2012-12-20 00:10:47.541: [  CRSEVT][2740161424]0CAAMonitorHandler :: 0:Action Script /opt/app/oracle/product/db_1/bin/racgwrap(start) timed out for ora.rac1.LISTENER_RAC1.lsnr! (timeout=600)
    2012-12-20 00:10:47.541: [  CRSAPP][2740161424]0StartResource error for ora.rac1.LISTENER_RAC1.lsnr error code = -2
    2012-12-20 00:10:47.807: [  CRSEVT][2740161424]0CAAMonitorHandler :: 0:Action Script /opt/app/oracle/product/db_1/bin/racgwrap(stop) timed out for ora.rac1.LISTENER_RAC1.lsnr! (timeout=600)
    2012-12-20 00:10:47.807: [  CRSAPP][2740161424]0StopResource error for ora.rac1.LISTENER_RAC1.lsnr error code = -2
    2012-12-20 00:10:48.181: [  CRSRES][2740161424]0X_OP_StopResourceFailed : Stop Resource failed
    (File: rti.cpp, line: 1698
    2012-12-20 00:10:48.181: [  CRSRES][2740161424][ALERT]0`ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` has experienced an unrecoverable failure.
    2012-12-20 00:10:48.181: [  CRSRES][2740161424]0Human intervention required to resume its availability.
    2012-12-20 00:10:50.692: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [177094664] retval lht [-27] Signal CV.
    2012-12-20 00:11:00.139: [  CRSRES][2729671568]0Start of `ora.rac2.vip` on member `rac1` succeeded.
    2012-12-20 00:11:00.257: [  CRSRES][2729671568]0StopResource: setting CLI values
    2012-12-20 00:11:00.269: [  CRSRES][2729671568]0Attempting to stop `ora.rac2.vip` on member `rac1`
    2012-12-20 00:11:00.882: [  CRSRES][2729671568]0Stop of `ora.rac2.vip` on member `rac1` succeeded.
    2012-12-20 00:11:00.938: [  CRSRES][2729671568]0Attempting to start `ora.rac2.vip` on member `rac2`
    2012-12-20 00:11:23.937: [  CRSRES][2729671568]0Start of `ora.rac2.vip` on member `rac2` failed.
    2012-12-20 00:11:24.051: [  CRSRES][2729671568]0startRunnable: setting CLI values
    2012-12-20 00:11:24.067: [  CRSRES][2729671568]0Attempting to start `ora.rac2.vip` on member `rac1`
    2012-12-20 00:11:36.257: [  OCRSRV][3044367248]th_select_handler: Failed to retrieve procctx from ht. constr = [177090520] retval lht [-27] Signal CV.
    2012-12-20 00:11:43.061: [  CRSAPP][2729671568]0StartResource error for ora.rac2.vip error code = 1
    2012-12-20 00:11:46.191: [  CRSAPP][2740161424]0CheckResource error for ora.rac1.vip error code = 1
    2012-12-20 00:11:46.196: [  CRSRES][2740161424]0In stateChanged, ora.rac1.vip target is ONLINE
    2012-12-20 00:11:46.197: [  CRSRES][2740161424]0ora.rac1.vip on rac1 went OFFLINE unexpectedly
    2012-12-20 00:11:46.197: [  CRSRES][2740161424]0StopResource: setting CLI values
    2012-12-20 00:11:46.211: [  CRSRES][2740161424]0Attempting to stop `ora.rac1.vip` on member `rac1`
    2012-12-20 00:11:46.422: [  CRSRES][2729671568]0Start of `ora.rac2.vip` on member `rac1` failed.
    2012-12-20 00:11:46.815: [  CRSRES][2686446480]0startRunnable: setting CLI values
    2012-12-20 00:11:46.848: [  CRSRES][2729671568]0startRunnable: setting CLI values
    2012-12-20 00:11:47.139: [  CRSRES][2686446480]0Attempting to start `ora.rac1.ons` on member `rac1`
    2012-12-20 00:11:47.163: [  CRSRES][2729671568]0Attempting to start `ora.rac1.gsd` on member `rac1`
    2012-12-20 00:11:47.266: [  CRSRES][2675956624]0Attempting to start `ora.rac2.gsd` on member `rac2`
    2012-12-20 00:11:47.571: [  CRSRES][2665466768]0Attempting to start `ora.rac2.ons` on member `rac2`
    2012-12-20 00:11:49.679: [  CRSEVT][2686446480]0CAAMonitorHandler :: 0:Action Script /opt/app/crs/bin/racgwrap(start) timed out for ora.rac1.ons! (timeout=600)
    2012-12-20 00:11:49.680: [  CRSAPP][2686446480]0StartResource error for ora.rac1.ons error code = -2
    2012-12-20 00:11:49.710: [  CRSEVT][2729671568]0CAAMonitorHandler :: 0:Action Script /opt/app/crs/bin/racgwrap(start) timed out for ora.rac1.gsd! (timeout=600)
    2012-12-20 00:11:49.710: [  CRSAPP][2729671568]0StartResource error for ora.rac1.gsd error code = -2
    2012-12-20 00:11:49.794: [  CRSEVT][2686446480]0CAAMonitorHandler :: 0:Action Script /opt/app/crs/bin/racgwrap(stop) timed out for ora.rac1.ons! (timeout=600)
    2012-12-20 00:11:49.794: [  CRSAPP][2686446480]0StopResource error for ora.rac1.ons error code = -2
    2012-12-20 00:11:49.813: [  CRSRES][2686446480]0X_OP_StopResourceFailed : Stop Resource failed
    (File: rti.cpp, line: 1698
    2012-12-20 00:11:49.813: [  CRSRES][2686446480][ALERT]0`ora.rac1.ons` on member `rac1` has experienced an unrecoverable failure.
    2012-12-20 00:11:49.813: [  CRSRES][2686446480]0Human intervention required to resume its availability.
    2012-12-20 00:11:49.839: [  CRSEVT][2729671568]0CAAMonitorHandler :: 0:Action Script /opt/app/crs/bin/racgwrap(stop) timed out for ora.rac1.gsd! (timeout=600)
    2012-12-20 00:11:49.839: [  CRSAPP][2729671568]0StopResource error for ora.rac1.gsd error code = -2
    2012-12-20 00:11:49.865: [  CRSRES][2729671568]0X_OP_StopResourceFailed : Stop Resource failed
    (File: rti.cpp, line: 1698
    2012-12-20 00:11:49.865: [  CRSRES][2729671568][ALERT]0`ora.rac1.gsd` on member `rac1` has experienced an unrecoverable failure.
    2012-12-20 00:11:49.865: [  CRSRES][2729671568]0Human intervention required to resume its availability.
    2012-12-20 00:11:50.076: [  CRSRES][2740161424]0Stop of `ora.rac1.vip` on member `rac1` succeeded.
    2012-12-20 00:11:50.079: [  CRSRES][2740161424]0ora.rac1.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
    2012-12-20 00:11:50.103: [  CRSRES][2740161424]0ora.rac1.vip failed on rac1 relocating.
    2012-12-20 00:11:50.520: [  CRSRES][2740161424]0Attempting to start `ora.rac1.vip` on member `rac2`
    2012-12-20 00:11:50.773: [  CRSRES][2675956624][ALERT]0`ora.rac2.gsd` on member `rac2` has experienced an unrecoverable failure.
    2012-12-20 00:11:50.774: [  CRSRES][2675956624]0Human intervention required to resume its availability.
    2012-12-20 00:11:50.810: [  CRSRES][2665466768][ALERT]0`ora.rac2.ons` on member `rac2` has experienced an unrecoverable failure.
    2012-12-20 00:11:50.810: [  CRSRES][2665466768]0Human intervention required to resume its availability.
    2012-12-20 00:12:13.625: [  CRSRES][2740161424]0Start of `ora.rac1.vip` on member `rac2` failed.
    2012-12-20 00:12:13.994: [  CRSRES][2686446480]0startRunnable: setting CLI values
    2012-12-20 00:12:26.925: [ COMMCRS][2824080272]clsc_receive: (0xa975368) Lock release 1 failed, rc 2
    2012-12-20 00:12:26.925: [ COMMCRS][2824080272]clsc_receive: (0xa975368) error 2
    2012-12-20 00:12:33.062: [  CRSAPP][2686446480]0StartResource error for ora.rac2.vip error code = 1
    [root@rac1 bin]# ./crsctl check crs
    CSS appears healthy
    CRS appears healthy
    EVM appears healthy
    [root@rac1 bin]#
    Current output of crs_stat:
    [root@rac1 bin]# ./crs_stat -t
    Name            Type         Target    State     Host
    ora....SM1.asm  application  ONLINE    UNKNOWN   rac1
    ora....C1.lsnr  application  ONLINE    UNKNOWN   rac1
    ora.rac1.gsd    application  ONLINE    UNKNOWN   rac1
    ora.rac1.ons    application  ONLINE    UNKNOWN   rac1
    ora.rac1.vip    application  ONLINE    OFFLINE
    ora....SM2.asm  application  ONLINE    OFFLINE
    ora....C2.lsnr  application  ONLINE    OFFLINE
    ora.rac2.gsd    application  ONLINE    UNKNOWN   rac2
    ora.rac2.ons    application  ONLINE    UNKNOWN   rac2
    ora.rac2.vip    application  ONLINE    OFFLINE
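    For what it is worth: once whatever is making racgwrap time out has been addressed (often environment or permission problems under the CRS and database homes), the UNKNOWN states themselves can usually be cleared by force-stopping and then restarting the affected resources. A rough sketch, assuming the CRS home /opt/app/crs and the resource names shown in the log above:
        # Run as the clusterware owner on rac1; adjust CRS_HOME and the resource list to your setup
        CRS_HOME=/opt/app/crs
        # Confirm the clusterware daemons themselves are healthy first
        $CRS_HOME/bin/crsctl check crs
        # Force-stop each resource stuck in UNKNOWN, then start it again
        for res in ora.rac1.gsd ora.rac1.ons ora.rac1.ASM1.asm ora.rac1.LISTENER_RAC1.lsnr; do
            $CRS_HOME/bin/crs_stop -f "$res"
            $CRS_HOME/bin/crs_start "$res"
        done
        # Re-check the cluster-wide picture
        $CRS_HOME/bin/crs_stat -t
    The same pattern applies to the rac2 resources once node rac2 is reachable again.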

  • UCCX 8.5: first Node cable disconnect results

    After disconnecting the first node's network cable:
    - Call Forward Busy/Unregistered/Failure only triggers after roughly 3-6 minutes. In the meantime callers get silence.
    - CAD becomes unresponsive for about 2-6 minutes.
    - Agents can be placed into Ready after 2-6 minutes, but the rest of the services don't work for another 3-5 minutes.
    - Sometimes CAD shows odd messages such as "JTAPI error" or "user does not exist" for about 2-8 minutes (clicking OK eventually clears all of the errors).
    In short, no failover mechanism works during the transient period (3-6 minutes).
    Any ideas?

    Hi Maxim,
    Page 39 of the below link states,
    http://www.cisco.com/en/US/docs/voice_ip_comm/cust_contact/contact_center/crs/express_8_5/installation/guide/cad85ccxig-cm.pdf
    Privileges:
    By default, Windows Installer installations run in the context of the logged-on user. CAD installations, which use Windows Installer, require either administrative or elevated (system) privileges. If the CAD installation is run in the context of an administrative account, no additional privileges are required.
    You can also cross-verify whether CAD was installed after the UCCX HA setup was completed by running
    PostInstall.exe from C:\Program Files\Cisco\Desktop\bin on the CAD/CSD boxes. If you don't see the second node's IP address there, enter it,
    and then check the behaviour again.
    Hope it helps,
    Anand
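    As a further sanity check after running PostInstall.exe, it may also be worth confirming from the platform CLI on both UCCX nodes that the servers see each other and that services and time are in sync. These are standard VOS CLI commands (assuming they are available on your UCCX version), and the exact output varies by release:
        admin: show network cluster
        admin: utils service list
        admin: utils ntp status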

  • RAC issues with second node

    We have a two-node RAC setup and are facing issues with the second node. If both nodes are up, the application is not able to connect to the database; if the second node is down, the application can connect.
    What is the reason for this abnormal behaviour?
    Any suggestions?
    We have Oracle 10g RAC (database version 10.2.0.4), CRS version 10.2.0.3.
    Please help.

    oracle@ora1-oam # crsstat
    HA Resource Target State
    ora.cms.cms1.inst ONLINE ONLINE on ora1-oam
    ora.cms.cms2.inst ONLINE ONLINE on ora2-oam
    ora.cms.db ONLINE ONLINE on ora1-oam
    ora.cms.db.cms.com.cms1.srv ONLINE ONLINE on ora1-oam
    ora.cms.db.cms.com.cms2.srv ONLINE ONLINE on ora2-oam
    ora.cms.db.cms.com.cs ONLINE ONLINE on ora1-oam
    ora.myrio.db ONLINE ONLINE on ora1-oam
    ora.myrio.db.myrio.com.cs ONLINE ONLINE on ora1-oam
    ora.myrio.db.myrio.com.myrio1.srv ONLINE ONLINE on ora1-oam
    ora.myrio.db.myrio.com.myrio2.srv ONLINE ONLINE on ora2-oam
    ora.myrio.myrio1.inst ONLINE ONLINE on ora1-oam
    ora.myrio.myrio2.inst ONLINE ONLINE on ora2-oam
    ora.ora1-oam.ASM1.asm ONLINE ONLINE on ora1-oam
    ora.ora1-oam.LISTENER_ORA1-OAM.lsnr ONLINE ONLINE on ora1-oam
    ora.ora1-oam.gsd ONLINE ONLINE on ora1-oam
    ora.ora1-oam.ons ONLINE ONLINE on ora1-oam
    ora.ora1-oam.vip ONLINE ONLINE on ora1-oam
    ora.ora2-oam.ASM2.asm ONLINE ONLINE on ora2-oam
    ora.ora2-oam.LISTENER_ORA2-OAM.lsnr ONLINE ONLINE on ora2-oam
    ora.ora2-oam.gsd ONLINE ONLINE on ora2-oam
    ora.ora2-oam.ons ONLINE ONLINE on ora2-oam
    ora.ora2-oam.vip ONLINE ONLINE on ora2-oam
    ora.rms.db ONLINE ONLINE on ora1-oam
    ora.rms.db.rms.com.cs ONLINE ONLINE on ora1-oam
    ora.rms.db.rms.com.rms1.srv ONLINE ONLINE on ora1-oam
    ora.rms.db.rms.com.rms2.srv ONLINE ONLINE on ora2-oam
    ora.rms.rms1.inst ONLINE ONLINE on ora1-oam
    ora.rms.rms2.inst ONLINE ONLINE on ora2-oam
    ora.tmpl.db ONLINE ONLINE on ora1-oam
    ora.tmpl.tmpl1.inst ONLINE ONLINE on ora1-oam
    ora.tmpl.tmpl2.inst ONLINE ONLINE on ora2-oam
    ora.vcas.db ONLINE ONLINE on ora1-oam
    ora.vcas.db.vcas.com.cs ONLINE ONLINE on ora1-oam
    ora.vcas.db.vcas.com.vcas1.srv ONLINE ONLINE on ora1-oam
    ora.vcas.db.vcas.com.vcas2.srv ONLINE ONLINE on ora2-oam
    ora.vcas.vcas1.inst ONLINE ONLINE on ora1-oam
    ora.vcas.vcas2.inst ONLINE ONLINE on ora2-oam
    ora.vmxcsmdb.db ONLINE ONLINE on ora1-oam
    ora.vmxcsmdb.vmxcsmdb1.inst ONLINE ONLINE on ora1-oam
    ora.vmxcsmdb.vmxcsmdb2.inst ONLINE ONLINE on ora2-oam
    This is the status when both nodes were up.
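    When connections only fail while both instances are up, a common culprit is connect-time load balancing handing sessions to a listener or VIP that the clients cannot actually reach, so listener registration on the second node is worth checking first. A minimal checklist, with the listener names taken from the crsstat output above and everything else assumed about the environment:
        $ lsnrctl status LISTENER_ORA1-OAM      # on ora1-oam
        $ lsnrctl status LISTENER_ORA2-OAM      # on ora2-oam
        SQL> show parameter local_listener
        SQL> show parameter remote_listener
        SQL> select inst_id, name, value from gv$parameter where name = 'service_names';
    If the second node's listener does show the expected services but clients still fail, the client-side TNS entry (both VIP addresses, LOAD_BALANCE/FAILOVER settings) is the next thing to compare.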

  • ORA-27504: IPC error creating OSD context : Unable to start second node

    I have set the DB parameter CLUSTER_INTERCONNECTS to point to the inet addr of the private interconnect interface.
    oifcfg getif
    bondeth0  172.23.250.128  global  public
    bondib0   192.168.8.0     global  cluster_interconnect
    When I try to restart the DB services, the error below is thrown while starting the second node.
    These are the commands I executed to change the DB parameter:
    alter system set cluster_interconnects =  '192.168.10.6' scope=spfile sid='RAC1' ;
    alter system set cluster_interconnects =  '192.168.10.7' scope=spfile sid='RAC2' ;
    alter system set cluster_interconnects =  '192.168.10.6' scope=spfile sid='ASM1' ;
    alter system set cluster_interconnects =  '192.168.10.7' scope=spfile sid='ASM2' ;
    On the second node:
    SQL> startup ;
    ORA-27504: IPC error creating OSD context
    ORA-27300: OS system dependent operation:if_not_found failed with status: 0
    ORA-27301: OS failure message: Error 0
    ORA-27302: failure occurred at: skgxpvaddr9
    ORA-27303: additional information: requested interface 192.168.10.6 not found. Check output from ifconfig command
    SQL>
    Please let me know whether the procedure I have followed is wrong.
    Thanks

    Node 1:
    [oracle@prdat137db03 etc]$ /sbin/ifconfig bondib0
    bondib0   Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
              inet addr:192.168.10.6  Bcast:192.168.11.255  Mask:255.255.252.0
              inet6 addr: fe80::221:2800:1ef:bc4f/64 Scope:Link
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
              RX packets:32550051 errors:0 dropped:0 overruns:0 frame:0
              TX packets:32395961 errors:0 dropped:42 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:19382043590 (18.0 GiB)  TX bytes:17164065360 (15.9 GiB)
    [oracle@prdat137db03 etc]$
    Node 2:
    [oracle@prdat137db04 ~]$ /sbin/ifconfig bondib0
    bondib0   Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
              inet addr:192.168.10.7  Bcast:192.168.11.255  Mask:255.255.252.0
              inet6 addr: fe80::221:2800:1ef:abdb/64 Scope:Link
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
              RX packets:29618287 errors:0 dropped:0 overruns:0 frame:0
              TX packets:30769233 errors:0 dropped:12 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:16453595058 (15.3 GiB)  TX bytes:18960175021 (17.6 GiB)
    [oracle@prdat137db04 ~]$
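    Since the second node is being told to use 192.168.10.6 even though that address only exists on node 1, it looks as if the per-SID settings did not land on the SIDs the instances actually use (or an older wildcard entry is still present in the spfile). A hedged check from the surviving node 1 instance, assuming a shared spfile as is usual for RAC, with the instance names taken from the ALTER SYSTEM commands above:
        SQL> select inst_id, instance_name from gv$instance;
        SQL> select sid, name, value from v$spparameter where name = 'cluster_interconnects';
        SQL> -- if a stale wildcard entry exists, clear it so only the per-SID values remain:
        SQL> alter system reset cluster_interconnects scope=spfile sid='*';
    The ASM instances keep their own spfile, so the same check would be repeated there for the +ASM1/+ASM2 entries.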

  • RAC, ASM failed to start up on second node , ORA-03113: end-of-file on comm

    I'm installing a RAC with 2 nodes on top of ASM.
    When creating the ASM disk group, it failed and reported error CRS-0215: failed to start ASM on node2.
    Oracle 10.2.0.1
    Linux CentOS 4.x
    /u01/app/oracle/product/10.2.0/db_1/bin/dbca -progress_only -configureASM -templateName NO_VALUE -gdbName NO -sid NO -emConfiguration NONE -diskList /dev/raw/raw2,/dev/raw/raw3 -diskGroupName DATA -datafileJarLocation /u01/app/oracle/product/10.2.0/db_1/assistants/dbca/templates -responseFile NO_VALUE -nodeinfo node1,node2 -obfuscatedPasswords true -oratabLocation /u01/app/oracle/product/10.2.0/db_1/install/oratab -asmSysPassword 05dbb0be38ecf8cca822cf3cf99e675448 -redundancy EXTERNAL
    [oracle@node2 bin]$ ./crs_stat -t -v
    Name           Type           R/RA   F/FT   Target    State     Host       
    ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    node1      
    ora....E1.lsnr application    0/5    0/0    ONLINE    ONLINE    node1      
    ora.node1.gsd  application    0/5    0/0    ONLINE    ONLINE    node1      
    ora.node1.ons  application    0/3    0/0    ONLINE    ONLINE    node1      
    ora.node1.vip  application    0/0    0/0    ONLINE    ONLINE    node1      
    ora....SM2.asm application    0/5    0/0    OFFLINE   OFFLINE              
    ora....E2.lsnr application    0/5    0/0    ONLINE    ONLINE    node2      
    ora.node2.gsd  application    0/5    0/0    ONLINE    ONLINE    node2      
    ora.node2.ons  application    0/3    0/0    ONLINE    ONLINE    node2      
    ora.node2.vip  application    0/0    0/0    ONLINE    ONLINE    node2  
    I checked the status; ASM is able to start on either node, just not on both at the same time.
    When trying to start it on the second node, with srvctl or SQL*Plus, each attempt gives error ORA-03113.
    Can anyone suggest how to bring up both instances?
    Thanks~
    [oracle@node2 bin]$ srvctl stop asm -n node1
    [oracle@node2 bin]$ srvctl start asm -n node1
    [oracle@node2 bin]$ srvctl start asm -n node2
    PRKS-1009 : Failed to start ASM instance "+ASM2" on node "node2", [PRKS-1009 : Failed to start ASM instance "+ASM2" on node "node2", [node2:ora.node2.ASM2.asm:
    node2:ora.node2.ASM2.asm:SQL*Plus: Release 10.2.0.1.0 - Production on Wed May 27 16:14:50 2009
    node2:ora.node2.ASM2.asm:
    node2:ora.node2.ASM2.asm:Copyright (c) 1982, 2005, Oracle.  All rights reserved.
    node2:ora.node2.ASM2.asm:
    node2:ora.node2.ASM2.asm:Enter user-name: Connected to an idle instance.
    node2:ora.node2.ASM2.asm:
    node2:ora.node2.ASM2.asm:SQL> ORA-03113: end-of-file on communication channel
    node2:ora.node2.ASM2.asm:SQL> Disconnected
    node2:ora.node2.ASM2.asm:
    Edited by: zs_hzh on May 27, 2009 1:25 AM

    Is it possible to start ASM on the second node with SQL*Plus in NOMOUNT state?
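    If it helps, one way to try exactly that by hand on node2, assuming the 10.2 home used by the dbca command above (the alert log path is also an assumption; check background_dump_dest):
        [oracle@node2 ~]$ export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
        [oracle@node2 ~]$ export ORACLE_SID=+ASM2
        [oracle@node2 ~]$ export PATH=$ORACLE_HOME/bin:$PATH
        [oracle@node2 ~]$ sqlplus / as sysdba
        SQL> startup nomount
        SQL> exit
        [oracle@node2 ~]$ tail -100 /u01/app/oracle/admin/+ASM/bdump/alert_+ASM2.log
    If the instance still dies with ORA-03113, the alert log usually names the real cause.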

  • ONS failed to start on second node

    Hi,
    I have a problem with ONS on a 10g RAC running on Linux 5.3.
    On node 1 it runs without problems, but on the second node I get this error:
    2009-04-08 16:30:41.318: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    RCV: Permission denied
    Communication error with the
    2009-04-08 16:30:41.319: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: OPMN server local port.
    Check the OPMN log files
    RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    2009-04-08 16:30:41.319: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    Number of onsconfiguration retrieved, numcfg = 2
    onscfg[0]
    {node = rac1, port = 6200}
    Adding remote host rac1:6200
    onscfg[1]
    {node = rac2, port = 6
    2009-04-08 16:30:41.319: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: 200}
    Adding remote host rac2:6200
    RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    RCV: Permission d
    2009-04-08 16:30:41.319: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: enied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    RCV: Permission denied
    Communication error with the OPMN server loca
    2009-04-08 16:30:41.319: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: l port.
    Check the OPMN log files
    RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    Number of onsconfiguration retrieved, numcfg = 2
    onscfg[0]
    {node = rac1, port = 6200}
    Adding remote host rac1:6200
    o
    2009-04-08 16:30:41.319: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: nscfg[1]
    {node = rac2, port = 6200}
    Adding remote host rac2:6200
    onsctl: ons failed to start
    2009-04-08 16:30:41.319: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/app/crs
    2009-04-08 16:30:41.319: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: clsrcexecut: cmd = /u01/app/crs/bin/racgeut -e USRORA_DEBUG=0 540 /u01/app/crs/bin/onsctl start
    2009-04-08 16:30:41.320: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: clsrcexecut: rc = 1, time = 2.580s
    2009-04-08 16:30:42.148: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    RCV: Permission denied
    Communication error with the
    2009-04-08 16:30:42.150: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: OPMN server local port.
    Check the OPMN log files
    RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    2009-04-08 16:30:42.150: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: RCV: Permission denied
    Communication error with the OPMN server local port.
    Check the OPMN log files
    Number of onsconfiguration retrieved, numcfg = 2
    onscfg[0]
    {node = rac1, port = 6200}
    Adding remote host rac1:6200
    onscfg[1]
    {node = rac2, port = 6
    2009-04-08 16:30:42.150: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: 200}
    Adding remote host rac2:6200
    ons is not running ...
    2009-04-08 16:30:42.151: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/app/crs
    2009-04-08 16:30:42.151: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: clsrcexecut: cmd = /u01/app/crs/bin/racgeut -e USRORA_DEBUG=0 540 /u01/app/crs/bin/onsctl ping
    2009-04-08 16:30:42.151: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: clsrcexecut: rc = 1, time = 0.840s
    2009-04-08 16:30:42.153: [    RACG][3065611968] [16553][3065611968][ora.rac2.ons]: end for resource = ora.rac2.ons, action = start, status = 1, time = 3.620s
    2009-04-08 16:30:44.376: [    RACG][3066242752] [17061][3066242752][ora.rac2.ons]: onsctl: shutting down ons daemon ...
    Number of onsconfiguration retrieved, numcfg = 2
    onscfg[0]
    {node = rac1, port = 6200}
    Adding remote host rac1:6200
    onscfg[1]
    {node = rac2, port = 6200}
    Adding remote host rac2:6200
    Any idea how to fix this?
    Thanks

    Check the output of crs_getperm for the resource on both nodes. If you can, post it here.
    Regards,
    Ganesh
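    Also, since the failure is "RCV: Permission denied" when talking to the local OPMN port, comparing the ONS resource permissions and the ownership of the OPMN files on both nodes often narrows it down. A rough sketch, assuming the CRS home /u01/app/crs shown in the log above:
        $ /u01/app/crs/bin/crs_getperm ora.rac1.ons      # run on rac1
        $ /u01/app/crs/bin/crs_getperm ora.rac2.ons      # run on rac2
        $ ls -l /u01/app/crs/opmn/conf/ons.config /u01/app/crs/opmn/logs
    Stale or root-owned OPMN state and log files are a frequent cause of this particular error.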
