RAC Node 2 is down !!
When we try to start node 2 of our RAC installation, we see the error below in the alert.log file:
Errors in file /u01/oracle/product/10.2.0/db/rdbms/log/uret2_ora_8166.trc:
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:skgxnqtsz failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: SKGXN not av
clsssinit ret = 21
interconnect information is not available from OCR
WARNING: No cluster interconnect has been specified. Depending on
the communication driver configured Oracle cluster traffic
may be directed to the public interface of this machine.
Oracle recommends that RAC clustered databases be configured
with a private interconnect for enhanced security and
performance.
Picked latch-free SCN scheme 3
Please refer to this note on Metalink (Oracle Support):
Subject: Troubleshooting ORA-27300 ORA-27301 ORA-27302 errors
Doc ID: Note:579365.1
See also:
http://translate.google.co.in/translate?hl=en&sl=zh-CN&u=http://space.itpub.net/35489/viewspace-687019&ei=nfpQToqnHMHprQeM7ICtAg&sa=X&oi=translate&ct=result&resnum=5&ved=0CEAQ7gEwBA&prev=/search%3Fq%3DORA-27504,ORA-27300,ORA-27301,ORA-27302%26hl%3Den%26biw%3D1366%26bih%3D575%26prmd%3Divnsfd
http://chandrapabba.blogspot.com/2010/02/ora-27504-while-starting-11gr1-database.html
http://www.oraclefaq.net/2007/05/23/ora-27504-ipc-error-creating-osd-context-on-ibm-aix-system-with-10gr2-rac/
Edited by: rajeysh on Aug 21, 2011 6:05 PM
Similar Messages
-
Archive REDO When One RAC Node is Down
I have a question about how redo log get archived when one of the instances in a two node RAC cluster is down (not open, not mounted).
For example, let's assume instance1 was shutdown and only instance 2 is running.
I have 6 redo logs:
SQL> SELECT GROUP#, THREAD#, SEQUENCE#, STATUS FROM V$LOG;
    GROUP#    THREAD#  SEQUENCE# STATUS
         1          1       3390 INACTIVE
         2          1       3389 INACTIVE
         3          1       3391 ACTIVE
         5          2       3886 INACTIVE
         4          2       3887 INACTIVE
         6          2       3888 CURRENT
If I run the following statement what will happen?
SQL> ALTER SYSTEM ARCHIVE LOG CURRENT;
The documentation says: "Specify CURRENT to manually archive the current redo log file group of the specified thread, forcing a log switch. If you omit the THREAD parameter, then Oracle Database archives all redo log file groups from all enabled threads, including logs previous to current logs. You can specify CURRENT only when the database is open."
Would Oracle archive sequence# 3391 from thread 1 even though the instance is not open?
When an instance is not running, its thread has no CURRENT redo log. So when you force a log switch, archiving is launched only for the current redo log of the running instance. Redo log 3391 is not current; it has ACTIVE status, which means it could still be needed for recovery, but it should already have been archived earlier.
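The point about thread 1 having no CURRENT log can be read straight off the V$LOG listing in the question. Below is a minimal Python sketch that only summarises that listing. The helper name current_sequence_by_thread is made up for illustration, and the code says nothing about what Oracle does internally when the statement runs.

```python
# Rows copied from the V$LOG output in the question:
# (group#, thread#, sequence#, status)
v_log = [
    (1, 1, 3390, "INACTIVE"),
    (2, 1, 3389, "INACTIVE"),
    (3, 1, 3391, "ACTIVE"),
    (5, 2, 3886, "INACTIVE"),
    (4, 2, 3887, "INACTIVE"),
    (6, 2, 3888, "CURRENT"),
]


def current_sequence_by_thread(rows):
    """Map each thread# to the sequence# of its CURRENT group, or None if it has none."""
    result = {thread: None for _, thread, _, _ in rows}
    for _, thread, sequence, status in rows:
        if status == "CURRENT":
            result[thread] = sequence
    return result


summary = current_sequence_by_thread(v_log)
print(summary)  # {1: None, 2: 3888}
```

Thread 1 (the instance that is shut down) shows no CURRENT group, while thread 2 has sequence 3888 as CURRENT.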
-
JDBC read stuck if RAC node goes down
We ran several tests with Java applications against our RAC DB, and the application hangs if we power off the RAC node that is executing the current (long-running) query.
We can see that the application receives HA-events via UCP:
2015-01-22 13:02:11 | r-thread-1 | WARN | o.ucp.jdbc.oracle.ONSDatabaseFailoverEvent | NO timezone in HA event
However, the application had started a query beforehand, and that query is not aborted with an exception. A thread dump after about 7 minutes shows that the application is hanging in a socket read call:
"pool-1-thread-1" #32 prio=5 os_prio=0 tid=0x00007fedf45b2000 nid=0xbc4 runnable [0x00007fee00cd3000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at oracle.net.ns.Packet.receive(Packet.java:283)
at oracle.net.ns.DataPacket.receive(DataPacket.java:103)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:230)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:175)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:100)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:85)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1122)
at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1099)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:288)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:863)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1153)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1275)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3576)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3620)
- locked <0x00000000c0ddcb20> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1491)
at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:703)
The expected behaviour would be that a running query is aborted with an exception. (BTW: this does happen if the service is taken down with "shutdown immediate"; all is OK in that case.)
We are considering implementing custom ONS listeners [1], but we actually expected that UCP would handle such situations or let us register strategies/callbacks for certain events.
Our config:
Oracle Enterprise 11.2.0.4.0 with RAC
ons.jar 12.1.0.1
ojdbc6.jar 11.2.0.2
ucp.jar 12.1.0.1
Server JRE 1.8.0_25
Any hints appreciated.
[1] http://docs.oracle.com/cd/E11882_01/java.112/e16548/apxracfan.htm#JJDBC28945
Your concept isn't right:
http://docs.oracle.com/cd/E11882_01/server.112/e25494/restart.htm#ADMIN13178
Overview of Fast Application Notification
FAN is a notification mechanism that Oracle Restart can use to notify other processes about configuration changes that include service status changes, such as UP or DOWN events. FAN provides the ability to immediately terminate inflight transaction when an instance or server fails. Integrated Oracle clients receive the events and respond. Applications can respond either by propagating the error to the user or by resubmitting the transactions and masking the error from the application user. When a DOWN event occurs, integrated clients immediately clean up connections to the terminated database. When an UP event occurs, the clients create new connections to the new primary database instance.
Also, take a look at these docs: http://docs.oracle.com/cd/E11882_01/java.112/e12265/rac.htm#JJUCP08100 and https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=890204623685515&id=566573.1&_afrWindowMode=0&_adf.ctrl-s…
Then run a test: execute a query that takes about 1 minute and, while it is running, power down the node where it is executing to see whether it still returns the results.
Regards.
-
One RAC node is down and gives the following error when starting the database!
When trying to start the database in a RAC environment:
SQL> connect sys as sysdba
Enter password:
Connected to an idle instance.
SQL> startup
ORA-27102: out of memory
HPUX-ia64 Error: 12: Not enough space
SQL> exit
Disconnected
At the time we were trying to start the database, bdf reported:
$ bdf
Filesystem kbytes used avail %used Mounted on
/dev/vg00/lvol3 2097152 240944 1841896 12% /
/dev/vg00/lvol1 344064 115376 226944 34% /stand
/dev/vg00/lvol8 10485760 9370960 1106232 89% /var
/dev/vg00/lvol7 4866048 2557680 2290400 53% /usr
/dev/vg00/u02 10485760 3502229 6547116 35% /u02
/dev/vg00/u01 10485760 10476596 9164 100% /u01
/dev/vg00/lvol4 2097152 601872 1483944 29% /tmp
/dev/vg00/lvol6 4194304 3231000 955792 77% /opt
/dev/vg00/lvol5 524288 311520 211136 60% /home
/u01 was 100% full. I have now freed up space in /u01:
$ bdf
Filesystem kbytes used avail %used Mounted on
/dev/vg00/lvol3 2097152 240944 1841896 12% /
/dev/vg00/lvol1 344064 115376 226944 34% /stand
/dev/vg00/lvol8 10485760 9370960 1106232 89% /var
/dev/vg00/lvol7 4866048 2557680 2290400 53% /usr
/dev/vg00/u02 10485760 3502229 6547116 35% /u02
/dev/vg00/u01 10485760 9508934 930943 91% /u01
/dev/vg00/lvol4 2097152 601872 1483944 29% /tmp
/dev/vg00/lvol6 4194304 3231000 955792 77% /opt
/dev/vg00/lvol5 524288 311520 211136 60% /home
When trying to start the database again, it gives the following error:
SQL> connect sys as sysdba
Enter password:
Connected to an idle instance.
SQL> startup
ORA-27102: out of memory
HPUX-ia64 Error: 12: Not enough space
SQL> exit
Disconnected
Here I changed sga_target, and now it says:
ORACLE instance started.
Total System Global Area 436207616 bytes
Fixed Size 1297912 bytes
Variable Size 148648456 bytes
Database Buffers 285212672 bytes
Redo Buffers 1048576 bytes
ORA-01105: mount is incompatible with mounts by other instances
ORA-19808: recovery destination parameter mismatch
What could be the issue?
Your help would be highly appreciated.
Hello,
SQL> startup
ORA-27102: out of memory
HPUX-ia64 Error: 12: Not enough space
SQL> exit
This error is not related to space on your mount point; it is related to memory.
If you are getting this error, check at the OS level whether something else is consuming so much memory that Oracle cannot allocate the SGA.
Use top/sar/glance to see what is consuming the most memory.
Total System Global Area 436207616 bytes
Fixed Size 1297912 bytes
Variable Size 148648456 bytes
Database Buffers 285212672 bytes
Redo Buffers 1048576 bytes
ORA-01105: mount is incompatible with mounts by other instances
ORA-19808: recovery destination parameter mismatch
It is not best practice to maintain different parameters for each instance in a RAC environment. Also check that db_recovery_file_dest and db_recovery_file_dest_size are the same on all nodes. They should match, and the destination should be a shared location.
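One way to make that check less error-prone is to collect the values from each instance (for example from V$PARAMETER on each node) and compare them in a small script. The following Python sketch is only an illustration: the sample rows are invented, not taken from the poster's system, and find_mismatched_parameters is a hypothetical helper rather than anything Oracle ships.

```python
from collections import defaultdict


def find_mismatched_parameters(rows, names):
    """Return {parameter: {inst_id: value}} for parameters whose values differ.

    rows  -- iterable of (inst_id, parameter_name, value) tuples
    names -- parameter names to compare (case-insensitive)
    """
    wanted = {name.lower() for name in names}
    by_param = defaultdict(dict)
    for inst_id, name, value in rows:
        if name.lower() in wanted:
            by_param[name.lower()][inst_id] = value
    # Keep only parameters that have more than one distinct value.
    return {
        name: values
        for name, values in by_param.items()
        if len(set(values.values())) > 1
    }


# Invented sample values, as if gathered from each instance by hand.
sample_rows = [
    (1, "db_recovery_file_dest", "+FRA"),
    (2, "db_recovery_file_dest", "/u01/flash_recovery_area"),
    (1, "db_recovery_file_dest_size", "10737418240"),
    (2, "db_recovery_file_dest_size", "10737418240"),
]

mismatches = find_mismatched_parameters(
    sample_rows, ["db_recovery_file_dest", "db_recovery_file_dest_size"]
)
for name, values in mismatches.items():
    print(name, values)
```

With the sample data only db_recovery_file_dest is reported, because the two instances point to different locations while the size matches.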
Anil Malkai -
ORA-12514 on R12 EBS Apps server when 1 DB RAC node crashed/down
Node 1 of our production 11.2.0.2 RAC DB on Windows Server 2008 has just crashed. On Node 2 all services, including the database, are up and running. But connecting from the EBS R12.1.2 application server as username/password in SQL*Plus throws an ORA-12514 error.
It does connect if I give username/password@TNSNAME, but not without @TNSNAME. Because of this, none of the application services will start.
Please help/advise. Thank you.
The tnsnames.ora follows:
# This file is automatically generated by AutoConfig. It will be read and
# overwritten. If you were instructed to edit this file, or if you are not
# able to use the settings created by AutoConfig, refer to Metalink Note
# 387859.1 for assistance.
#$Header: NetServiceHandler.java 120.19.12010000.6 2010/03/09 08:11:36 jmajumde ship $
ORCL=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orcldbscan)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=ORCL)
(INSTANCE_NAME=ORCL1)
ORCL1=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orcldbscan)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=ORCL)
(INSTANCE_NAME=ORCL1)
ORCL1_FO=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orcldbscan)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=ORCL)
(INSTANCE_NAME=ORCL1)
ORCL_FO=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orcldbscan)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=ORCL)
(INSTANCE_NAME=ORCL1)
ORCL1=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orcldbscan)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=ORCL)
(INSTANCE_NAME=ORCL1)
ORCL1_FO=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orcldbscan)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=ORCL)
(INSTANCE_NAME=ORCL1)
ORCL2=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orcldb2-vip.sa.company.net)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=ORCL)
(INSTANCE_NAME=ORCL2)
ORCL2_FO=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orcldb2-vip.sa.company.net)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=ORCL)
(INSTANCE_NAME=ORCL2)
ORCL_BALANCE=
(DESCRIPTION=
(ADDRESS_LIST=
(LOAD_BALANCE=YES)
(FAILOVER=YES)
(ADDRESS=(PROTOCOL=tcp)(HOST=orcldb1-vip.sa.company.net)(PORT=1521))
(ADDRESS=(PROTOCOL=tcp)(HOST=orcldb2-vip.sa.company.net)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=ORCL)
FNDFS_orclAPPL=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orclAPPL.sa.company.net)(PORT=1626))
(CONNECT_DATA=
(SID=FNDFS)
FNDFS_orclAPPL.sa.company.net=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orclAPPL.sa.company.net)(PORT=1626))
(CONNECT_DATA=
(SID=FNDFS)
FNDFS_ORCL_orclAPPL=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orclAPPL.sa.company.net)(PORT=1626))
(CONNECT_DATA=
(SID=FNDFS)
FNDFS_ORCL_orclAPPL.sa.company.net=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orclAPPL.sa.company.net)(PORT=1626))
(CONNECT_DATA=
(SID=FNDFS)
FNDSM_orclAPPL_ORCL=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orclAPPL.sa.company.net)(PORT=1626))
(CONNECT_DATA=
(SID=FNDSM)
FNDSM_orclAPPL.sa.company.net_ORCL=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=orclAPPL.sa.company.net)(PORT=1626))
(CONNECT_DATA=
(SID=FNDSM)
FNDFS_APPLTOP_orclappl=
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=tcp)(HOST=orclAPPL.sa.company.net)(PORT=1626))
(CONNECT_DATA=
(SID=FNDFS)
IFILE=E:\ORACLE\ORCL\INST\APPS\ORCL_orclappl\ora\10.1.2\network\admin\ORCL_orclappl_ifile.ora
A database/OS crash shouldn't have such an impact on your configuration. Please review the following docs and verify your setup.
Using Oracle 11g Release 2 Real Application Clusters with Oracle E-Business Suite Release 12 (Doc ID 823587.1)
Configuring and Managing E-Business Application Tier for RAC (Doc ID 1311528.1)
Thanks,
Hussein -
I have a two-node RAC setup. One node went down because of hardware issues, and it seems that I cannot connect from the client (JDBC) when SCAN returns a particular IP.
I receive: ORA-12514, TNS:listener does not currently know of service requested in connect descriptor. If DNS returns the correct IP, everything works fine.
connection string:
jdbc:oracle:thin:@(DESCRIPTION= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=testracscan.internal.int)(PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=testdb.internal.int)))
Interfaces show that VIPS and SCANS are assigned correctly on Node 1:
vlan65 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
inet addr:192.168.2.10 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::2e76:8aff:fe4f:b5cc/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:937195 errors:0 dropped:0 overruns:0 frame:0
TX packets:852745 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:186434457 (177.7 MiB) TX bytes:141217705 (134.6 MiB)
vlan65:1 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
inet addr:192.168.2.25 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
vlan65:2 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
inet addr:192.168.2.35 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
vlan65:3 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
inet addr:192.168.2.30 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
vlan65:4 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
inet addr:192.168.2.110 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
vlan65:5 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
inet addr:192.168.2.115 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
[oracle@srvtestdb1 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 03-SEP-2012 15:35:05
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date 29-AUG-2012 15:52:57
Uptime 4 days 23 hr. 42 min. 7 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/grid/diag/tnslsnr/srvtestdb1/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.10)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.110)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "testdb.internal.int" has 1 instance(s).
Instance "testdb1", status READY, has 1 handler(s) for this service...
Service "testdbXDB.internal.int" has 1 instance(s).
Instance "testdb1", status READY, has 1 handler(s) for this service...
Service "testdbsvc.internal.int" has 1 instance(s).
Instance "testdb1", status READY, has 1 handler(s) for this service...
The command completed successfully
[oracle@srvtestdb1 ~]$
SQL> show parameter listener
NAME TYPE VALUE
listener_networks string
local_listener string (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.2.110)(PORT=1521))))
remote_listener string testracscan.internal.int:1521
nslookup testracscan.internal.int
Server: 192.168.0.18
Address: 192.168.0.18#53
Name: testracscan.internal.int
Address: 192.168.2.30
Name: testracscan.internal.int
Address: 192.168.2.25
Name: testracscan.internal.int
Address: 192.168.2.35
Problems arise when the client resolves the name to 192.168.2.35: I get ORA-12514.
When the IP is resolved to 192.168.2.110, it simply sits and waits for a moment and then begins to work, and netstat shows:
tcp 0 0 ::ffff:1 192.168.2.5:51685 ::ffff:192.168.2.110:1521 ESTABLISHED
What might be causing this?
[grid@srvtestdb1 ~]$ ps -ef|grep tns
root 65 2 0 Aug29 ? 00:00:00 [netns]
grid 4449 1 0 Aug29 ? 00:00:25 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN2 -inherit
grid 4454 1 0 Aug29 ? 00:00:23 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN3 -inherit
grid 4481 1 0 Aug29 ? 00:00:33 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
grid 37028 1 0 09:38 ? 00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid 37901 36372 0 09:45 pts/0 00:00:00 grep tns
[grid@srvtestdb1 ~]$
[grid@srvtestdb1 ~]$ srvctl config scan_listener
SCAN Listener LISTENER_SCAN1 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN2 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN3 exists. Port: TCP:1521
[grid@srvtestdb1 ~]$
[grid@srvtestdb1 ~]$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node srvtestdb1
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node srvtestdb1
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node srvtestdb1
[grid@srvtestdb1 ~]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node srvtestdb1
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node srvtestdb1
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node srvtestdb1 -
Backing up the RAC DB when either one of the node is down
11.2.0.2/Solaris 10 (x86-64bit). For our 2-node production RAC DB, I had configured the RMAN backup to run from Node1 using a cron job. Last weekend Node1 went down. The SMS notification system that sends alerts to our mobiles went down that weekend as well. Only on Monday noon did we learn that Node1 was down and that there was no backup for Saturday and Sunday.
How can I make sure that the RMAN backup of the DB is taken even if one of the nodes goes down? My friend suggested the IBM TWS scheduler. Can Tivoli Workload Scheduler detect a dead RAC node and fire the RMAN backup from the surviving node?
I don't know the answer regarding TWS, but if you run the backup from crontab I guess that you don't have any third-party tool now.
I think the easiest solution is to have the script and the crontab job on both servers and let the script decide which one runs the backup.
For example, the script that is scheduled in the crontab will do:
1. if $HOSTNAME is node1 run the backup. If $HOSTNAME is node2, check if node1 is up and if not run the backup.
2. This is more elegant, check the "crsctl status resource" for something and run the backup accordingly. For example, the script will check where SCAN1 VIP is located and this is the node which will run the backup.
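Option 1 above could be sketched roughly as follows in Python. This is only an illustration under assumptions: should_run_backup is a made-up name, and how node1 is checked (ping, a crsctl query, something else) is deliberately left to a caller-supplied function, because the thread does not specify it.

```python
def should_run_backup(local_host, preferred_host, is_host_up):
    """Decide whether the node running this cron job should take the backup.

    local_host     -- name of the node the script is running on
    preferred_host -- node that normally takes the backup (node1 in the thread)
    is_host_up     -- callable(hostname) -> bool; the check itself is an assumption
    """
    if local_host == preferred_host:
        # The preferred node always runs its own backup.
        return True
    # Any other node takes over only when the preferred node looks down.
    return not is_host_up(preferred_host)


# Example with a fake availability check standing in for a real one.
node1_is_down = lambda host: False
node1_is_up = lambda host: True

print(should_run_backup("node1", "node1", node1_is_up))    # True
print(should_run_backup("node2", "node1", node1_is_up))    # False
print(should_run_backup("node2", "node1", node1_is_down))  # True
```

The same script can then be scheduled at the same time on both servers, and only the node for which the function returns True goes on to call RMAN. If the availability check can be wrong (for example during a network split), both nodes could start a backup, so the check should be chosen with that in mind.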
HTH
Liron -
What would happened when one RAC node's public NIC down ?
Dear all,
There's a two-node RAC in my office. From what I have observed, when one RAC node's public NIC is down, the "crs_stat -t" command doesn't show any resource as OFFLINE. So if I use sqlplus to connect to my RAC DB at that moment, it will still choose the node1 or node2 instance randomly, right? And if I am assigned to the instance on the node whose public NIC is down, will my sqlplus fail to connect to the DB?
So, can we say that Oracle RAC can't provide HA if one node's public NIC is down? Or is there some other way to solve this issue? Any suggestion would be appreciated.
If a node's public NIC is down, only one node is able to take the load; in your case that will be Node 2. It may happen that CRS is unable to record the status, in which case there may be failed connections to one node.
-
Oracle Forms 10g runtime handling during RAC node failover.
Hi,
Forms version 10g R2 (10.1.2.0.2)
Oracle DB version 10g R2 RAC with 3 nodes.
If the RAC DB node that the user is connected to goes down, the user gets FRM-40733 and ORA-03114 error messages and the client forms application gets locked down/ goes in a loop with the error messages. The user has to close the browser to get out of the loop. I understand that this is the expected behaviour, but I'm wondering whether we can trap the error ORA-03114 and fire the "key-exit" trigger to get out of the application.
Has anyone implemented a clean way to exit the Forms application when the RAC DB node goes down?
I'm looking for some suggestions or an elegant way to handle the above failure.
Thank you in advance.
Sudhakar
Glen,
I haven't solved this one yet. I have been playing around with the following:
In my environment, I am still using 6i (not web) forms/reports.
My clients are XP, NT, 2000.
I have the forms/report runtime installed on their PCs.
Their TNSNAMES.ORA will be pointing to the PRIMARY (PDB).
If a SWITCHOVER or FAILOVER happens to the physical standby (SDB), I want a trigger to kick off a batch file that will replace the TNSNAMES.ORA on each client's station.
On the standby:
CREATE OR REPLACE TRIGGER change_tns
AFTER DB_ROLE_CHANGE ON DATABASE
DECLARE
  role   VARCHAR2(30);
  dbname VARCHAR2(100);
BEGIN
  -- Find out which role this database has after the role change.
  SELECT
    DATABASE_ROLE,
    DB_UNIQUE_NAME
  INTO
    role,
    dbname
  FROM
    V$DATABASE;
  IF role = 'PRIMARY' AND dbname = 'SDB' THEN
    -- The standby has become primary: run the batch file that pushes the new tnsnames.
    dbms_scheduler.create_job(
      job_name   => 'move_sqlnet',
      job_type   => 'executable',
      job_action => 'c:\temp\movetns.cmd',
      enabled    => TRUE
    );
  ELSE
    -- if the standby >was< PRIMARY,
    -- but the primary comes BACK on line,
    -- need to reverse the step above.
    NULL; -- placeholder so the empty branch compiles
  END IF;
END;
/
As for movetns.cmd, something like:
rem -- attach to the workstation,
net use m: \\station name\share name
rem -- stdb_tnsname.ora would be pointing to STANDBY
copy stdb_tnsname.ora m:\orant\net80\tnsname.ora
net use m: /delete
rem -- need to do that for all workstations..
As you can see, there could be lots of problems with this procedure.
For example, a client that doesn't know about the failover may be rebooting the PC, so the new tnsnames.ora will not get to that client. What do I do for that client? Do I re-run the batch every hour?
Tell me if you come up with an answer.
p- -
Hello everyone,
I have run into an error: our RAC node restarted automatically with the messages below.
#/u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/alert_odsdb1.log
Fri Jun 07 12:23:42 2013
Thread 1 cannot allocate new log, sequence 58363
Checkpoint not complete
Current log# 2 seq# 58362 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
Current log# 2 seq# 58362 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
Fri Jun 07 12:23:42 2013
NOTE: ASMB terminating
Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
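ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2047 Serial number: 5
Errors in file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2047 Serial number: 5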
ASMB (ospid: 32641): terminating the instance due to error 15064
Fri Jun 07 12:23:44 2013
ORA-1092 : opitsk aborting process
Fri Jun 07 12:23:46 2013
ORA-1092 : opitsk aborting process
Instance terminated by ASMB, pid = 32641
Fri Jun 07 12:25:02 2013
Starting ORACLE instance (normal)
Fri Jun 07 12:25:23 2013
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
[name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth0' configured from GPnP for use as a public interface.
[name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
Public Interface 'eth0:1' configured from GPnP for use as a public interface.
[name='eth0:1', type=1, ip=135.33.2.13, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/11.2.0/dbhome_2/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
System name: Linux
Node name: odsdb1
Release: 2.6.18-308.el5
Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
Machine: x86_64
Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/dbhome_2/dbs/initodsdb1.ora
System parameters with non-default values:
processes = 4500
sessions = 6784
event = ""
spfile = "+DATA/odsdb/spfileodsdb.ora"
nls_language = "SIMPLIFIED CHINESE"
nls_territory = "CHINA"
memory_target = 170G
control_files = "+DATA/odsdb/controlfile/current.262.812288837"
control_files = "+DATA/odsdb/controlfile/current.261.812288837"
db_block_size = 8192
compatible = "11.2.0.0.0"
db_files = 4096
cluster_database = TRUE
db_create_file_dest = "+DATA"
db_recovery_file_dest = ""
db_recovery_file_dest_size= 38820M
thread = 1
undo_tablespace = "UNDOTBS1"
instance_number = 1
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
dispatchers = "(PROTOCOL=TCP) (SERVICE=odsdbXDB)"
remote_listener = "odsdb-cluster-scan:1521"
job_queue_processes = 1000
audit_file_dest = "/u01/app/oracle/admin/odsdb/adump"
audit_trail = "DB"
db_name = "odsdb"
open_cursors = 300
diagnostic_dest = "/u01/app/oracle"
Cluster communication is configured to use the following interface(s) for this instance
169.254.37.103
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Fri Jun 07 12:25:33 2013
PMON started with pid=2, OS id=22959
Fri Jun 07 12:25:33 2013
PSP0 started with pid=3, OS id=22962
Fri Jun 07 12:25:34 2013
VKTM started with pid=4, OS id=22971 at elevated priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Fri Jun 07 12:25:34 2013
GEN0 started with pid=5, OS id=22977
Fri Jun 07 12:25:34 2013
DIAG started with pid=6, OS id=22979
Fri Jun 07 12:25:35 2013
DBRM started with pid=7, OS id=22981
Fri Jun 07 12:25:35 2013
PING started with pid=8, OS id=22983
Fri Jun 07 12:25:35 2013
ACMS started with pid=9, OS id=22985
Fri Jun 07 12:25:35 2013
DIA0 started with pid=10, OS id=22987
Fri Jun 07 12:25:35 2013
LMON started with pid=11, OS id=22989
Fri Jun 07 12:25:35 2013
LMD0 started with pid=12, OS id=22991
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [61440 - 81920]
Fri Jun 07 12:25:35 2013
LMS0 started with pid=13, OS id=22994 at elevated priority
Fri Jun 07 12:25:35 2013
LMS1 started with pid=14, OS id=22998 at elevated priority
Fri Jun 07 12:25:35 2013
LMS2 started with pid=15, OS id=23002 at elevated priority
Fri Jun 07 12:25:35 2013
LMS3 started with pid=16, OS id=23006 at elevated priority
Fri Jun 07 12:25:35 2013
RMS0 started with pid=17, OS id=23010
Fri Jun 07 12:25:35 2013
LMHB started with pid=18, OS id=23013
Fri Jun 07 12:25:35 2013
MMAN started with pid=19, OS id=23015
Fri Jun 07 12:25:35 2013
DBW0 started with pid=20, OS id=23017
Fri Jun 07 12:25:35 2013
DBW1 started with pid=21, OS id=23019
Fri Jun 07 12:25:35 2013
DBW2 started with pid=22, OS id=23022
Fri Jun 07 12:25:35 2013
DBW3 started with pid=23, OS id=23024
Fri Jun 07 12:25:35 2013
DBW4 started with pid=24, OS id=23026
Fri Jun 07 12:25:35 2013
DBW5 started with pid=25, OS id=23028
Fri Jun 07 12:25:35 2013
DBW6 started with pid=26, OS id=23031
Fri Jun 07 12:25:35 2013
DBW7 started with pid=27, OS id=23033
Fri Jun 07 12:25:35 2013
LGWR started with pid=28, OS id=23035
Fri Jun 07 12:25:35 2013
CKPT started with pid=29, OS id=23037
Fri Jun 07 12:25:35 2013
SMON started with pid=30, OS id=23039
Fri Jun 07 12:25:35 2013
RECO started with pid=31, OS id=23041
Fri Jun 07 12:25:35 2013
RBAL started with pid=32, OS id=23043
Fri Jun 07 12:25:35 2013
ASMB started with pid=33, OS id=23045
Fri Jun 07 12:25:35 2013
MMON started with pid=34, OS id=23048
Fri Jun 07 12:25:35 2013
MMNL started with pid=35, OS id=23052
Fri Jun 07 12:25:35 2013
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
NOTE: initiating MARK startup
starting up 1 shared server(s) ...
Starting background process MARK
Fri Jun 07 12:25:35 2013
MARK started with pid=37, OS id=23056
NOTE: MARK has subscribed
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 119)
List of instances:
1 2 (myinst: 1)
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 valid according to instance 2
* domain 0 valid = 1 according to instance 2
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration started (old inc 119, new inc 121)
List of instances:
1 2 (myinst: 1)
Nested reconfiguration detected.
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Fri Jun 07 12:25:45 2013
Submitted all GCS remote-cache requests
Fri Jun 07 12:26:08 2013
Fix write in gcs resources
Reconfiguration complete
Fri Jun 07 12:26:10 2013
LCK0 started with pid=40, OS id=23632
Fri Jun 07 12:26:10 2013
Starting background process RSMN
Fri Jun 07 12:26:10 2013
RSMN started with pid=41, OS id=23646
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /u01/app/oracle
Fri Jun 07 12:26:11 2013
ALTER SYSTEM SET local_listener=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=135.33.2.13)(PORT=1521))))' SCOPE=MEMORY SID='odsdb1';
ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
Fri Jun 07 12:26:11 2013
NOTE: Loaded library: System
Fri Jun 07 12:26:11 2013
SUCCESS: diskgroup DATA was mounted
Fri Jun 07 12:26:11 2013
NOTE: dependency between database odsdb and diskgroup resource ora.DATA.dg is established
Fri Jun 07 12:26:16 2013
Successful mount of redo thread 1, with mount id 3452000551
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: ALTER DATABASE MOUNT /* db agent *//* {1:9971:2} */
ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
Picked broadcast on commit scheme to generate SCNs
Thread 1 advanced to log sequence 58364 (thread open)
Thread 1 opened at log sequence 58364
Current log# 2 seq# 58364 mem# 0: +DATA/odsdb/onlinelog/group_2.265.812288839
Current log# 2 seq# 58364 mem# 1: +DATA/odsdb/onlinelog/group_2.266.812288839
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Jun 07 12:26:21 2013
SMON: enabling cache recovery
Fri Jun 07 12:26:23 2013
minact-scn: Inst 1 is a slave inc#:121 mmon proc-id:23048 status:0x2
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Fri Jun 07 12:26:34 2013
[23651] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:2061372614 end:2061384964 diff:12350 (123 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
Fri Jun 07 12:26:34 2013
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Starting background process GTX0
Fri Jun 07 12:26:35 2013
GTX0 started with pid=45, OS id=23931
Starting background process RCBG
Fri Jun 07 12:26:35 2013
RCBG started with pid=46, OS id=23933
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Fri Jun 07 12:26:35 2013
QMNC started with pid=48, OS id=23940
Completed: ALTER DATABASE OPEN /* db agent *//* {1:9971:2} */
Fri Jun 07 12:26:38 2013
Starting background process CJQ0
Fri Jun 07 12:26:38 2013
CJQ0 started with pid=55, OS id=23977
Fri Jun 07 12:27:56 2013
Thread 1 advanced to log sequence 58365 (LGWR switch)
Current log# 1 seq# 58365 mem# 0: +DATA/odsdb/onlinelog/group_1.263.812288839
Current log# 1 seq# 58365 mem# 1: +DATA/odsdb/onlinelog/group_1.264.812288839
Fri Jun 07 12:28:18 2013
Starting background process SMCO
Fri Jun 07 12:28:18 2013
SMCO started with pid=70, OS id=25166
Fri Jun 07 12:29:01 2013
Thread 1 cannot allocate new log, sequence 58366
Trace file /u01/app/oracle/diag/rdbms/odsdb/odsdb1/trace/odsdb1_asmb_32641.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_2
System name: Linux
Node name: odsdb1
Release: 2.6.18-308.el5
Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
Machine: x86_64
Instance name: odsdb1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 33
Unix process pid: 32641, image: oracle@odsdb1 (ASMB)
*** 2013-05-14 15:37:08.705
*** SESSION ID:(3499.1) 2013-05-14 15:37:08.705
*** CLIENT ID:() 2013-05-14 15:37:08.705
*** SERVICE NAME:() 2013-05-14 15:37:08.705
*** MODULE NAME:() 2013-05-14 15:37:08.705
*** ACTION NAME:() 2013-05-14 15:37:08.705
NOTE: initiating MARK startup
*** 2013-05-14 15:37:16.835
instance health monitoring reports instance shutting down
*** 2013-06-07 12:23:42.700
NOTE: ASMB terminating
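ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2047 Serial number: 5
error 15064 detected in background process
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2047 Serial number: 5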
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+461<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+53<-ksuitm()+1332<-ksbrdp()+3344<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
----- End of Abridged Call Stack Trace -----
*** 2013-06-07 12:23:42.783
ASMB (ospid: 32641): terminating the instance due to error 15064
/u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
NOTE: ASMB process exiting, either shutdown is in progress
NOTE: or foreground connected to ASMB was killed.
Fri Jun 07 12:23:42 2013
NOTE: client exited [14808]
Fri Jun 07 12:23:44 2013
Received an instance abort message from instance 2
Please check instance 2 alert and LMON trace files for detail.
Fri Jun 07 12:23:44 2013
Received an instance abort message from instance 2
Please check instance 2 alert and LMON trace files for detail.
LMD0 (ospid: 31201): terminating the instance due to error 481
Instance terminated by LMD0, pid = 31201
Fri Jun 07 12:24:30 2013
* instance_number obtained from CSS = 1, checking for the existence of node 0...
* node 0 does not exist. instance_number = 1
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
[name='eth1:1', type=1, ip=169.254.37.103, mac=00-26-55-eb-61-89, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth0' configured from GPnP for use as a public interface.
[name='eth0', type=1, ip=135.33.2.8, mac=00-26-55-eb-61-88, net=135.33.2.0/27, mask=255.255.255.224, use=public/1]
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/11.2.0.2/grid/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
[grid@odsdb1 cssd]$ file core.30481
core.30481: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'ocssd.bin'
[grid@odsdb1 cssd]$ gdb
gdb gdbserver gdbtui
[grid@odsdb1 cssd]$ gdb ocssd.bin core.30481
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-42.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /u01/app/11.2.0.2/grid/bin/ocssd.bin...(no debugging symbols found)...done.
[New Thread 30486]
[New Thread 30530]
[New Thread 30526]
[New Thread 30525]
[New Thread 30523]
[New Thread 30522]
[New Thread 30521]
[New Thread 30520]
[New Thread 30519]
[New Thread 30504]
[New Thread 30503]
[New Thread 30495]
[New Thread 30485]
[New Thread 30484]
[New Thread 30483]
[New Thread 30481]
Reading symbols from /u01/app/11.2.0.2/grid/lib/libhasgen11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libhasgen11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libocr11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocr11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrb11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrb11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libocrutl11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libocrutl11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libclntsh.so.11.1
Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxn2.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxn2.so
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libasmclntsh11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libcell11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libcell11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libskgxp11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libskgxp11.so
Reading symbols from /u01/app/11.2.0.2/grid/lib/libnnz11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnnz11.so
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /usr/lib64/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /u01/app/11.2.0.2/grid/lib/libnque11.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/11.2.0.2/grid/lib/libnque11.so
Reading symbols from /opt/oracle/extapi/64/asm/orcl/1/libasm.so...(no debugging symbols found)...done.
Loaded symbols for /opt/oracle/extapi/64/asm/orcl/1/libasm.so
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff505fd000
Core was generated by `/u01/app/11.2.0.2/grid/bin/ocssd.bin '.
Program terminated with signal 6, Aborted.
#0 0x000000369ea30265 in raise () from /lib64/libc.so.6
(gdb) where
#0 0x000000369ea30265 in raise () from /lib64/libc.so.6
#1 0x000000369ea31d10 in abort () from /lib64/libc.so.6
#2 0x00002afc67f9aeda in scls_abort (flags=0) at scls.c:7088
#3 0x000000000040babd in clssscExit (thrd=0x10d325a0, status=clssscreasonSHUTNORM) at clsssc.c:2155
#4 0x0000000000446221 in clssgmClientShutdown (thrd=0x10d325a0, cmInfo=0x10b40090) at clssgmc.c:6415
#5 0x0000000000436707 in clssgmProcClientReqs (thrd=0x10d325a0, clctx=0x10b40630) at clssgmc.c:704
#6 0x0000000000436405 in clssgmclientlsnr (thrd=0x10d325a0) at clssgmc.c:644
#7 0x000000000040ac2f in clssscthrdmain (thrd=0x10d325a0) at clsssc.c:1716
#8 0x000000369fa0677d in start_thread () from /lib64/libpthread.so.0
#9 0x000000369ead49ad in clone () from /lib64/libc.so.6
(gdb)
2013-06-07 12:19:37.377: [ CSSD][1085888832]clssscSelect: cookie accept request 0x10b40630
2013-06-07 12:19:37.377: [ CSSD][1085888832]clssgmAllocProc: (0x2aaab0133ea0) allocated
2013-06-07 12:19:37.379: [ CSSD][1085888832]clssgmClientConnectMsg: properties of cmProc 0x2aaab0133ea0 - 1,2,3,4,5
2013-06-07 12:19:37.379: [ CSSD][1085888832]clssgmClientConnectMsg: Connect from con(0x6ae44fa) proc(0x2aaab0133ea0) pid(14139/14139) version 11:2:1:4, properties: 1,2,3,4,5
2013-06-07 12:19:37.379: [ CSSD][1085888832]clssgmClientConnectMsg: msg flags 0x0000
2013-06-07 12:19:37.384: [ CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
2013-06-07 12:19:37.384: [ CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
2013-06-07 12:19:37.384: [ CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(1/0x2aaab010c5c0)
2013-06-07 12:19:37.385: [ CSSD][1085888832]clssgmRegisterShared: grp DBODSDB, mbr 0, type 1
2013-06-07 12:19:37.385: [ CSSD][1085888832]clssgmQueueShare: (0x2aaab0085790) target global grock DBODSDB member 0 type 1 queued from client (0x2aaab010c5c0), global grock DBODSDB, refcount 23
2013-06-07 12:19:37.385: [ CSSD][1085888832]clssgmRegisterShared: global grock DBODSDB member 0 share type 1, refcount 23
2013-06-07 12:19:37.391: [ CSSD][1085888832]clssscSelect: cookie accept request 0x2aaab0133ea0
2013-06-07 12:19:37.391: [ CSSD][1085888832]clssscevtypSHRCON: getting client with cmproc 0x2aaab0133ea0
2013-06-07 12:19:37.391: [ CSSD][1085888832]clssgmRegisterClient: proc(69/0x2aaab0133ea0), client(2/0x2aaab0061f10)
What is the problem?
Edited by: 徐振富 on 2013-6-7 6:38 PM
Edited by: 徐振富 on 2013-6-7 6:45 PM
Is your ASM instance up?
If not, try bringing up the ASM instance by itself and see whether it throws any errors.
Post the output of crsctl status cluster -all
-
RAC node outage causes SOA Suite 10.1.3.4 BPEL failure
We are using WebLogic 9.2 and SOA Suite 10.1.3.4 with a two-node Oracle 10g RAC. The WebLogic cluster has a multi data source made up of two pools; each pool points to a single node in the RAC and is deployed to the cluster, and the multi data source runs in load-balancing mode.
The other night, one of the database nodes had a hardware failure (ironically, in a remote monitoring/management card). Annoying, but it should not have left the BPEL servers in "FAILED NOT RESTARTABLE" status the next morning.
<Jun 9, 2009 12:10:07 AM EDT> <Warning> <JDBC> <BEA-001129> <Received exception while creating connection for pool "esbaqds2": Io exception: The Network Adapter could not establish the connection>
SEVERE: Destroying JMSDequeuer failed
oracle.jms.AQjmsException: Connection has been administratively destroyed. Reconnect.
at oracle.jms.AQjmsSession.preClose(AQjmsSession.java:980)
at oracle.jms.AQjmsObject.close(AQjmsObject.java:409)
at oracle.jms.AQjmsSession.close(AQjmsSession.java:1020)
at oracle.tip.esb.server.dispatch.JMSDequeuer.destroy(JMSDequeuer.java:419)
at oracle.tip.esb.server.dispatch.JMSDequeuer.destroyWithoutUnsubscribing(JMSDequeuer.java:395)
at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:175)
at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
This was followed by a 2 GB log file containing 1.3 million iterations of the following exception, all written within the next 10 minutes before the managed servers failed:
java.lang.NullPointerException
at java.lang.String.<init>(String.java:144)
at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:168)
at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
Both managed instances of the BPEL cluster failed, even though the first node of the Oracle RAC was still available.
Our 10.3 cluster, which also uses multi data sources against the same RAC for the OSB components, simply went on about its business using the pool for the remaining RAC node.
This seems to be a single point of failure...
We haven't changed the JDBC connection string yet, but we did run a test in the same environment while Oracle Support considers the situation.
For the test, we simply shut down one node of the RAC and watched what happened. Within a minute, the JDBC "Failed Reserve Request Count" was increasing by thousands on every refresh of the screen. We restarted the RAC node after 5 minutes, by which time the "Failed Reserve Request Count" was over 190,000.
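To put the numbers in this thread in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it uses the approximate figures quoted above (1.3 million logged iterations in about 10 minutes, and roughly 190,000 failed reserve requests in about 5 minutes), and the helper name `per_second` is made up for illustration.

```python
def per_second(count: int, minutes: float) -> float:
    """Average number of events per second over a window given in minutes."""
    return count / (minutes * 60)

# Approximate figures quoted in this thread.
npe_rate = per_second(1_300_000, 10)    # logged NullPointerException iterations
reserve_rate = per_second(190_000, 5)   # JDBC "Failed Reserve Request Count"

print(f"NPE iterations: ~{npe_rate:.0f} per second")
print(f"Failed reserve requests: ~{reserve_rate:.0f} per second")
```

Rates in the hundreds to thousands per second would be consistent with a retry loop that has no delay between attempts, although the logs alone do not prove that.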
The two BPEL managed servers remained in Running status, and each created a 660 MB log file within those 5 minutes. In the original outage, the nodes were down for about 15 minutes. Most of the logging is generated from within the oracle.tip.esb classes, not by the WebLogic classes. It looks like once the pool pointing to the downed RAC node becomes disabled, the Oracle BPEL code keeps trying to use it, even though the multi data source JNDI name is the published lookup:
INFO: JMSDequeuer::createConnection - AQ Topics
java.sql.SQLException: weblogic.common.ResourceException:
esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
at weblogic.jdbc.common.internal.JDBCUtil.wrapAndThrowResourceException(JDBCUtil.java:250)
at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:348)
at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:364)
at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:559)
at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
Caused by: weblogic.common.ResourceException:
esbaqds(esbaqds2): Pool esbaqds2 is disabled, cannot allocate resources to applications..
esbaqds(esbaqds1): Pool esbaqds1 is disabled, cannot allocate resources to applications..
at weblogic.jdbc.common.internal.MultiPool.searchLoadBalance(MultiPool.java:331)
at weblogic.jdbc.common.internal.MultiPool.findPool(MultiPool.java:202)
at weblogic.jdbc.common.internal.ConnectionPoolManager.reserve(ConnectionPoolManager.java:77)
at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:346)
... 11 more
SEVERE: Failed to process deferred message
oracle.tip.esb.server.dispatch.QueueHandlerException: Error creating "weblogic.common.ResourceException: No good connections available."
at oracle.tip.esb.server.dispatch.JMSDequeuer.createAQConnection(JMSDequeuer.java:661)
at oracle.tip.esb.server.dispatch.JMSDequeuer.dequeue(JMSDequeuer.java:159)
at oracle.tip.esb.server.dispatch.agent.ESBWork.process(ESBWork.java:174)
at oracle.tip.esb.server.dispatch.agent.ESBWork.run(ESBWork.java:132)
at weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
at weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
at weblogic.connector.work.WorkRequest.run(WorkRequest.java:93)
at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
-
Failover did not happen when one node went down!!! PLEASE HELP
Hi gurus,
Yesterday a disaster struck my RAC database. We have a two-node 10.2.0.2 cluster, with the nodes located at different sites. The power suddenly failed, and one of the network switches went down and was destroyed. Node 1 of the RAC database was connected to that switch, but failover to node 2 did not happen, even though the surviving node should remain available for all of node 1's sessions and connections when one node goes down.
When I tried to ping or telnet to node 1, there was no response because the switch was down, so the network team connected the cables to another available switch. When I then connected to node 1, it showed the "Oracle is not available" message.
When I tried the other node, the result was the same, but I did not see any error in the alert log file. My TL then restarted both nodes, after which the database was available.
I am very confused about why the failover did not happen and how the database went down. Please suggest how to identify what happened.
Thanks & Regards
Thanks for your reply,
After the network switch was replaced, we connected to both nodes and found that the instances were down, with no reason given in the alert log file. We restarted both instances; the database then came up and the clients connected with an equal number of sessions on each instance. I want to know whether failover can be handled on the application side, or whether it should be configured on the database side, i.e. in the tnsnames.ora file with the required parameters. In our scenario there is no failover configuration in the tnsnames.ora file.
Thanks & Regards
-
RAC Node eviction question...
Say we have a 3-node RAC cluster on OEL 5.3. What happens if one node is evicted from it? I know the other two instances will do dynamic remastering, and something more.
I want to know each and every step in detail: what really happens when one node goes down in a RAC environment?
Experts, please comment.
Many thanks.
"I want to know each and every step in detail."
Assume you know each and every step in detail. What will you do differently based on this information?
Handle: vh_dba
Status Level: Newbie (30)
Registered: Jan 10, 2010
Total Posts: 38
Total Questions: 16 (15 unresolved)
So many questions with only a single answer.
:-(
-
Hi,
One of our RAC environments keeps restarting.
I've disabled init.cssd, init.crs and init.evmd in /etc/inittab in order to check the logs.
This is the situation:
crsd.log:
2009-02-04 00:09:00.118: [ COMMCRS][9]clsc_connect: (8000000100318640) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
2009-02-04 00:09:00.132: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2009-02-04 00:09:00.134: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2009-02-04 00:09:08.016: [ CRSD][1]32Daemon Version: 10.2.0.2.0 Active Version: 10.2.0.2.0
2009-02-04 00:09:08.016: [ CRSD][1]32Active Version and Software Version are same
2009-02-04 00:09:08.017: [ CRSMAIN][1]32Initializing OCR
2009-02-04 00:09:08.037: [ OCRRAW][1]proprioo: for disk 0 (/dev/rdsk/ora_ocr_raw), id match (1), my id set (752560621,1028247821) total id sets (1), 1st set (752560621,1028247821), 2nd set (0,0) my votes (2), total votes (2)
2009-02-04 00:09:08.140: [ CSSCLNT][24]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
ocssd.log:
[ CSSD]2009-02-03 21:52:08.651 [9] >USER: clssnmHandleUpdate: NODE 1 (node1l) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2009-02-03 21:52:08.651 [9] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmDoSyncUpdate: Sync Complete!
[ CSSD]2009-02-03 21:52:08.722 [1] >USER: NMEVENT_SUSPEND [00][00][00][00]
[ CSSD]2009-02-03 21:52:08.724 [17] >TRACE: clssgmReconfigThread: started for reconfig (1)
[ CSSD]2009-02-03 21:52:08.749 [17] >USER: NMEVENT_RECONFIG [00][00][00][02]
[ CSSD]2009-02-03 21:52:08.749 [17] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 1
[ CSSD]2009-02-03 21:52:08.751 [13] >TRACE: clssgmPeerListener: connects done (1/1)
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmEstablishMasterNode: MASTER for 1 is node(1) birth(1)
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes
[ CSSD]CLSS-3001: local node number 1, master node number 1
[ CSSD]2009-02-03 21:52:08.753 [17] >TRACE: clssgmReconfigThread: completed for reconfig(1), with status(1)
[ CSSD]2009-02-03 21:52:08.863 [10] >TRACE: clssgmClientConnectMsg: Connect from con(80000001008fd2a0) proc(8000000100ae26a8) pid() proto(10:2:1:1)
[ CSSD]2009-02-03 21:52:08.864 [10] >TRACE: clssgmClientConnectMsg: Connect from con(8000000100ae0128) proc(8000000100ae2a10) pid() proto(10:2:1:1) from con(8000000100aa32c0) proc(8000000100aa5b90) pid() proto(10:2:1:1)
alertlog:
[cssd(2535)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 .
2009-02-03 23:55:20.821
[cssd(2575)]CRS-1605:CSSD voting file is online: /dev/rdsk/ora_voting_raw. Details in /work/crs/product/10.2/crs/log/lourmel/cssd/ocssd.log.
2009-02-03 23:55:28.376
evmd.log:
Oracle Database 10g CRS Release 10.2.0.2.0 Production Copyright 1996, 2004, Oracle. All rights reserved
2009-02-04 00:08:58.331: [ EVMD][1]32EVMD waiting for CSS to be ready err = 3
2009-02-04 00:08:59.939: [ COMMCRS][9]clsc_connect: (800000010007d658) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
2009-02-04 00:08:59.946: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2009-02-04 00:08:59.948: [ EVMD][1]32EVMD waiting for CSS to be ready err = 3
2009-02-04 00:09:07.596: [ CSSCLNT][1]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
syslog:
Feb 4 00:08:41 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
Feb 4 00:08:45 lourmel sfd[2153]: starting the daemon.
Feb 4 00:08:45 lourmel su: + tty?? root-orac
Feb 4 00:08:45 lourmel krsd[2152]: Delay time is 300 seconds
Feb 4 00:08:43 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
Feb 4 00:08:52 lourmel above message repeats 2 times
Feb 4 00:08:52 lourmel syslog: Cluster Ready Services completed waiting on dependencies.
Feb 4 00:08:53 lourmel syslog: Running CRSD with TZ =
When I ran crs_stat (before the restart), I got the message:
CRS-0184: Cannot communicate with the CRS daemon.
crsctl check crs gives us:
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
As I said before, the machine keeps restarting.
Does anyone have an idea? Please.
Dear All,
I recently upgraded a few RAC setups to Oracle 10g Patchset 3 (10.2.0.4) on Linux servers.
In one of the RAC setups, I found the servers rebooting daily. The same setup had been working fine, and the problem started only after applying the patchset. I checked all the logs and found nothing relevant.
Then I checked what had been added with this patchset.
The most interesting finding: Oracle added a new daemon, oprocd.
# ps -efl | grep oprocd
4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -hsi 5:10:50:75:90 -f
These are the interesting points about the line above:
1. The process runs as the root user.
2. It runs with the highest priority (-40).
3. It probes every second (-t 1000).
4. It waits up to 500 milliseconds for the CPU to respond (-m 500 means the margin time is 500 milliseconds).
5. It runs in fatal mode (-f).
From these points I conclude the following: the daemon probes the CPU every second and waits for a response within 500 milliseconds. If it gets no response from the CPU within those 500 milliseconds, it assumes the CPU is hung and reboots the machine. The operating system does not get enough time to write the system logs before the server reboots.
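As a quick way to check these values on a node, the flags can be pulled out of the oprocd command line. This is only a minimal sketch: the function name `parse_oprocd_args` is made up for illustration, and the meaning given to -t, -m and -f follows the description in this post rather than any official reference.

```python
import shlex

def parse_oprocd_args(cmdline: str) -> dict:
    """Extract -t (probe interval), -m (margin) and -f (fatal mode) from an oprocd command line."""
    tokens = shlex.split(cmdline)
    result = {"interval_ms": None, "margin_ms": None, "fatal": False}
    for i, tok in enumerate(tokens):
        if tok == "-t" and i + 1 < len(tokens):
            result["interval_ms"] = int(tokens[i + 1])
        elif tok == "-m" and i + 1 < len(tokens):
            result["margin_ms"] = int(tokens[i + 1])
        elif tok == "-f":
            result["fatal"] = True
    return result

# Command line as shown in the ps output in this post.
cmd = "/opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -hsi 5:10:50:75:90 -f"
info = parse_oprocd_args(cmd)
print(info)
```

With the command line from this post, the margin comes out as 500 ms, which is the value the rest of the post is about.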
So the solution is to increase the margin time from 500 milliseconds to 10 seconds.
These are the steps to increase the margin time.
Please remember: the modification requires downtime, and you need to stop the cluster services on all member nodes.
1. Stop the CRS processes
#crsctl stop crs
#<CRS_HOME>/bin/oprocd stop
2. Ensure that Clusterware stack is down and not running
#ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
This should return no processes.
3. From one node of the cluster, change the value of the "diagwait" parameter to 13 by issuing the command as root:
#crsctl set css diagwait 13 -force
4. Check if diagwait is successfully set.
#crsctl get css diagwait
5. Restart the Oracle Clusterware on all the nodes by executing:
#crsctl start crs
(Note: if you face any problem restarting the CRS services, ASM or the database, you can reboot the nodes. The cluster and database will come up automatically thanks to the init startup scripts.)
6. The oprocd daemon process will now show -m 10000
# ps -efl| grep oprocd
# 4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f
Rollback procedure:
If you need to revert the oprocd margin for any reason, unset diagwait:
#crsctl unset css diagwait
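The check in step 2 (no Clusterware processes left before changing diagwait) can also be scripted. The sketch below is an assumption-laden illustration: the function name `running_clusterware` is made up, the process names are the ones from the egrep pattern in step 2, and the sample input is adapted from the ps line in this post plus one invented sshd line.

```python
import re

# Same process names as the egrep pattern in step 2.
CLUSTERWARE_PATTERN = re.compile(r"crsd\.bin|ocssd\.bin|evmd\.bin|oprocd")

def running_clusterware(ps_output: str) -> list:
    """Return the lines of ps output that belong to the Clusterware stack."""
    return [line for line in ps_output.splitlines()
            if CLUSTERWARE_PATTERN.search(line)]

# Illustrative input: the oprocd line is adapted from this post, the sshd line is made up.
sample = """\
root 6440 6063 0 Feb02 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f
root 2001 1 0 Feb02 ? 00:00:00 /usr/sbin/sshd
"""

leftover = running_clusterware(sample)
print("stack is down" if not leftover else f"{len(leftover)} process(es) still running")
```

On a real node the input would come from `ps -ef`; that part is left out so the sketch runs anywhere. An empty result corresponds to "This should return no processes" in step 2.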
I am confident the abnormal RAC node restart problem will be solved with this workaround.
Regards,
Sumit
Bangalore, India
-
Hi all,
I am working on a two-node RAC environment. One of my RAC nodes reboots very frequently. I am using Oracle 10g for both the database and the clusterware (10.2.0.1).
I have checked the OS logs (Linux AS 4) and the RAC-related logs, but I am not able to find anything. I am posting all the logs; please suggest.
Hi, I am posting the alert log, the OS log and the ocssd log.
Clusterware alert log:
[crsd(5649)]CRS-1201:CRSD started on node ctmisdb1.
2012-03-21 09:50:38.188
[cssd(7490)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 .
2012-03-21 09:50:46.726
[crsd(5649)]CRS-1204:Recovering CRS resources for node ctmisdb2.
2012-03-21 09:55:21.760
[cssd(7490)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
2012-03-21 12:07:46.681
[cssd(7426)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
2012-03-21 12:07:50.432
[cssd(7426)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
2012-03-21 12:07:50.893
[crsd(5549)]CRS-1012:The OCR service started on node ctmisdb1.
2012-03-21 12:07:50.942
[evmd(7304)]CRS-1401:EVMD started on node ctmisdb1.
2012-03-21 12:07:52.827
[crsd(5549)]CRS-1201:CRSD started on node ctmisdb1.
2012-03-21 12:48:41.908
[cssd(7448)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
2012-03-21 12:48:45.741
[cssd(7448)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
2012-03-21 12:48:49.173
[crsd(5546)]CRS-1012:The OCR service started on node ctmisdb1.
2012-03-21 12:48:49.190
[evmd(7328)]CRS-1401:EVMD started on node ctmisdb1.
2012-03-21 12:48:50.818
[crsd(5546)]CRS-1201:CRSD started on node ctmisdb1.
2012-03-21 13:26:36.398
[cssd(7343)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
2012-03-21 13:26:40.492
[cssd(7343)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
2012-03-21 13:26:40.939
[crsd(5542)]CRS-1012:The OCR service started on node ctmisdb1.
2012-03-21 13:26:40.977
[evmd(7223)]CRS-1401:EVMD started on node ctmisdb1.
2012-03-21 13:26:42.772
[crsd(5542)]CRS-1201:CRSD started on node ctmisdb1.
Node OS log:
Mar 21 12:06:35 ctmisdb1 rc: Starting readahead: succeeded
Mar 21 12:06:35 ctmisdb1 messagebus: messagebus startup succeeded
Mar 21 12:06:36 ctmisdb1 cups-config-daemon: cups-config-daemon startup succeeded
Mar 21 12:06:36 ctmisdb1 haldaemon: haldaemon startup succeeded
Mar 21 12:06:37 ctmisdb1 fstab-sync[6267]: removed all generated mount points
Mar 21 12:06:37 ctmisdb1 fstab-sync[6378]: added mount point /media/cdrecorder for /dev/hde
Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6323]: session opened for user oracle by (uid=0)
Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6324]: session opened for user oracle by (uid=0)
Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6229]: session opened for user oracle by (uid=0)
Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6229]: session closed for user oracle
Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6644]: session opened for user oracle by (uid=0)
Mar 21 12:06:37 ctmisdb1 kernel: matroxfb: cannot set xres to 800, rounded up to 832
Mar 21 12:06:37 ctmisdb1 last message repeated 2 times
Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6323]: session closed for user oracle
Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6644]: session closed for user oracle
Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6324]: session closed for user oracle
Mar 21 12:06:41 ctmisdb1 logger: Cluster Ready Services completed waiting on dependencies.
Mar 21 12:06:41 ctmisdb1 last message repeated 2 times
Mar 21 12:06:45 ctmisdb1 gdm(pam_unix)[6379]: session opened for user root by (uid=0)
Mar 21 12:06:46 ctmisdb1 gconfd (root-7052): starting (version 2.8.1), pid 7052 user 'root'
Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 1
Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
Mar 21 12:06:55 ctmisdb1 gconfd (root-7052): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 0
Mar 21 12:07:41 ctmisdb1 su(pam_unix)[5547]: session opened for user oracle by (uid=0)
Mar 21 12:07:41 ctmisdb1 logger: Running CRSD with TZ =
Mar 21 12:07:43 ctmisdb1 su(pam_unix)[7399]: session opened for user oracle by (uid=0)
Mar 21 12:12:49 ctmisdb1 sshd(pam_unix)[15323]: session opened for user root by root(uid=0)
Mar 21 12:12:57 ctmisdb1 su(pam_unix)[15531]: session opened for user oracle by root(uid=0)
Mar 21 12:47:05 ctmisdb1 syslogd 1.4.1: restart.
ocssd log:
[ CSSD]2012-03-21 11:24:41.045 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661f0c0) proc(0x8006622560) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 11:24:41.078 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660cfe0) proc(0x800662ba70) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:07:44.564 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=ctmisdb1DBG_CSSD))
[ CSSD]2012-03-21 12:07:44.564 >USER: CSS daemon log for node ctmisdb1, number 1, in cluster crs
[ CSSD]2012-03-21 12:07:44.581 [28260544] >TRACE: clssscmain: local-only set to false
[ CSSD]2012-03-21 12:07:44.603 [28260544] >TRACE: clssnmReadNodeInfo: added node 1 (ctmisdb1) to cluster
[ CSSD]2012-03-21 12:07:44.621 [28260544] >TRACE: clssnmReadNodeInfo: added node 2 (ctmisdb2) to cluster
[ CSSD]2012-03-21 12:07:44.627 [72925824] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
[ CSSD]2012-03-21 12:07:44.627 [28260544] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
[ CSSD]2012-03-21 12:07:44.641 [28260544] >TRACE: clssnmInitNMInfo: misscount set to 60
[ CSSD]2012-03-21 12:07:44.655 [28260544] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw2)
[ CSSD]2012-03-21 12:07:46.661 [72925824] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw2)
[ CSSD]2012-03-21 12:07:46.690 [72925824] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(18) wrtcnt(7920) LATS(0) Disk lastSeqNo(7920)
[ CSSD]2012-03-21 12:07:46.752 [28260544] >TRACE: clssnmFatalInit: fatal mode enabled
[ CSSD]2012-03-21 12:07:46.752 [94777984] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
[ CSSD]2012-03-21 12:07:46.753 [94777984] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
[ CSSD]2012-03-21 12:07:46.753 [94777984] >TRACE: clssnmClusterListener: Probing node(2)
[ CSSD]2012-03-21 12:07:46.755 [94777984] >TRACE: clssnmConnComplete: connected to node 2 (con 0x8006601040), state 3 birth 0, unique 1332303918/1332303918 prevConuni(0)
[ CSSD]2012-03-21 12:07:46.756 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2012-03-21 12:07:46.756 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ctmisdb1_crs))
[ CSSD]2012-03-21 12:07:46.757 [151810688] >TRACE: clssnmPollingThread: Connection complete
[ CSSD]2012-03-21 12:07:46.757 [162296448] >TRACE: clssnmSendingThread: Connection complete
[ CSSD]2012-03-21 12:07:46.757 [172782208] >TRACE: clssnmRcfgMgrThread: Connection complete
[ CSSD]2012-03-21 12:07:46.757 [172782208] >TRACE: clssnmRcfgMgrThread: Local Join
[ CSSD]2012-03-21 12:07:46.757 [172782208] >WARNING: clssnmLocalJoinEvent: takeover aborted due to connected but inactive nodes
[ CSSD]2012-03-21 12:07:47.339 [94777984] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[ctmisdb2] seq[5] sync[18]
[ CSSD]2012-03-21 12:07:47.759 [172782208] >TRACE: clssnmRcfgMgrThread: lastleader(2) unique(1332311864)
[ CSSD]2012-03-21 12:07:48.341 [94777984] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(18)
[ CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmDeactivateNode: node 0 () left cluster
[ CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 1, state (1/2) unique (1332311864/1332311864) prevConuni(0) birth (0/18) (old/new)
[ CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 2, state (4/3) unique (1332303918/1332303918) prevConuni(0) birth (0/16) (old/new)
[ CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: SYNC(18) from node(2) completed
[ CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: NODE 1 (ctmisdb1) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: NODE 2 (ctmisdb2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2012-03-21 12:07:50.429 [28260544] >USER: NMEVENT_SUSPEND [00][00][00][00]
[ CSSD]2012-03-21 12:07:50.429 [183267968] >TRACE: clssgmReconfigThread: started for reconfig (18)
[ CSSD]2012-03-21 12:07:50.429 [183267968] >USER: NMEVENT_RECONFIG [00][00][00][06]
[ CSSD]2012-03-21 12:07:50.429 [183267968] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 18
[ CSSD]2012-03-21 12:07:50.430 [140255872] >TRACE: clssgmInitialRecv: (0x102a0360) accepted a new connection from node 2 born at 16 active (2, 2), vers (10,3,1,2)
[ CSSD]2012-03-21 12:07:50.430 [140255872] >TRACE: clssgmInitialRecv: conns done (2/2)
[ CSSD]2012-03-21 12:07:50.430 [183267968] >TRACE: clssgmEstablishMasterNode: MASTER for 18 is node(2) birth(16)
[ CSSD]2012-03-21 12:07:50.430 [183267968] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[ CSSD]2012-03-21 12:07:50.432 [140255872] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(72) incarn 18
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 18 with 2 nodes
[ CSSD]CLSS-3001: local node number 1, master node number 2
[ CSSD]2012-03-21 12:07:50.433 [183267968] >TRACE: clssgmReconfigThread: completed for reconfig(18), with status(1)
[ CSSD]2012-03-21 12:07:50.550 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006603bb0) proc(0x8006608b00) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:07:50.551 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066066f0) proc(0x8006608d70) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:07:53.569 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660ec70) proc(0x8006611260) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:00.829 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006610990) proc(0x800660de00) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:04.698 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006613030) proc(0x8006612930) pid(8115) proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:04.816 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8115) proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:04.832 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8115) proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:06.615 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8171) proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:07.114 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006615960) proc(0x8006616350) pid(8175) proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:11.373 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066192a0) proc(0x8006619470) pid(8302) proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:11.669 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661ee20) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:17.135 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661ee70) pid(8458) proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:17.268 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661fc00) proc(0x80066220d0) pid(8460) proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:17.305 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066223e0) proc(0x8006625250) pid(8462) proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:17.353 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006625560) proc(0x8006628430) pid(8464) proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:24.585 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006625560) proc(0x8006628430) pid(8645) proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:27.957 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006628740) proc(0x800662b610) pid(8722) proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:30.931 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662cce0) proc(0x800662c860) pid(8801) proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:36.400 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661c5f0) proc(0x800661eb50) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:37.863 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662f1c0) proc(0x800661eee0) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:38.537 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662f1c0) proc(0x800661d500) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:39.232 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661d500) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:43.085 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006611210) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:08:58.971 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x80066112c0) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:09:59.290 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:10:59.589 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:11:59.904 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:13:00.203 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:13:14.029 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800660b190) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:14:00.501 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006611210) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:15:00.809 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:16:01.117 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:17:01.447 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:01.762 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:39.841 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:42.123 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:42.316 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:42.843 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:42.963 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:43.098 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800662bd20) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:44.173 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:44.368 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:45.351 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:46.236 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:47.031 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:47.694 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:47.819 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:48.103 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:48.327 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:48.484 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006611210) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:48.758 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:49.529 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:50.509 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:51.060 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:18:51.558 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
[ CSSD]2012-03-21 12:48:39.836 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=ctmisdb1DBG_CSSD))
[ CSSD]2012-03-21 12:48:39.836 >USER: CSS daemon log for node ctmisdb1, number 1, in cluster crs
[ CSSD]2012-03-21 12:48:39.849 [28260544] >TRACE: clssscmain: local-only set to false
[ CSSD]2012-03-21 12:48:39.865 [28260544] >TRACE: clssnmReadNodeInfo: added node 1 (ctmisdb1) to cluster
[ CSSD]2012-03-21 12:48:39.872 [28260544] >TRACE: clssnmReadNodeInfo: added node 2 (ctmisdb2) to cluster
[ CSSD]2012-03-21 12:48:39.879 [72925824] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
[ CSSD]2012-03-21 12:48:39.879 [28260544] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
[ CSSD]2012-03-21 12:48:39.881 [28260544] >TRACE: clssnmInitNMInfo: misscount set to 60
[ CSSD]2012-03-21 12:48:39.888 [28260544] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw2)
[ CSSD]2012-03-21 12:48:41.892 [72925824] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw2)
[ CSSD]2012-03-21 12:48:41.915 [72925824] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(20) wrtcnt(10367) LATS(0) Disk lastSeqNo(10367)
[ CSSD]2012-03-21 12:48:41.959 [28260544] >TRACE: clssnmFatalInit: fatal mode enabled
[ CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
[ CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
[ CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmClusterListener: Probing node(2)
[ CSSD]2012-03-21 12:48:41.961 [94777984] >TRACE: clssnmConnComplete: connected to node 2 (con 0x8006702790), state 3 birth 0, unique 1332303918/1332303918 prevConuni(0)
[ CSSD]2012-03-21 12:48:41.962 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2012-03-21 12:48:41.962 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ctmisdb1_crs))
[ CSSD]2012-03-21 12:48:41.963 [152330880] >TRACE: clssnmPollingThread: Connection complete
[ CSSD]2012-03-21 12:48:41.963 [162816640] >TRACE: clssnmSendingThread: Connection complete
[ CSSD]2012-03-21 12:48:41.963 [173302400] >TRACE: clssnmRcfgMgrThread: Connection complete
[ CSSD]2012-03-21 12:48:41.963 [173302400] >TRACE: clssnmRcfgMgrThread: Local Join
[ CSSD]2012-03-21 12:48:41.963 [173302400] >WARNING: clssnmLocalJoinEvent: takeover aborted due to connected but inactive nodes
[ CSSD]2012-03-21 12:48:42.631 [94777984] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[ctmisdb2] seq[13] sync[20]
[ CSSD]2012-03-21 12:48:42.965 [173302400] >TRACE: clssnmRcfgMgrThread: lastleader(2) unique(1332314319)
[ CSSD]2012-03-21 12:48:43.636 [94777984] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(20)
[ CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmDeactivateNode: node 0 () left cluster
[ CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 1, state (1/2) unique (1332314319/1332314319) prevConuni(0) birth (0/20) (old/new)
[ CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 2, state (4/3) unique (1332303918/1332303918) prevConuni(0) birth (0/16) (old/new)
[ CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: SYNC(20) from node(2) completed
[ CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: NODE 1 (ctmisdb1) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: NODE 2 (ctmisdb2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2012-03-21 12:48:45.737 [28260544] >USER: NMEVENT_SUSPEND [00][00][00][00]
[ CSSD]2012-03-21 12:48:45.738 [183788160] >TRACE: clssgmReconfigThread: started for reconfig (20)
[ CSSD]2012-03-21 12:48:45.738 [183788160] >USER: NMEVENT_RECONFIG [00][00][00][06]
[ CSSD]2012-03-21 12:48:45.738 [183788160] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 20
[ CSSD]2012-03-21 12:48:45.739 [140776064] >TRACE: clssgmInitialRecv: (0x102a0370) accepted a new connection from node 2 born at 16 active (2, 2), vers (10,3,1,2)
[ CSSD]2012-03-21 12:48:45.739 [140776064] >TRACE: clssgmInitialRecv: conns done (2/2)
[ CSSD]2012-03-21 12:48:45.739 [183788160] >TRACE: clssgmEstablishMasterNode: MASTER for 20 is node(2) birth(16)
[ CSSD]2012-03-21 12:48:45.739 [183788160] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[ CSSD]2012-03-21 12:48:45.741 [140776064] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(72) incarn 20
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 20 with 2 nodes
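A quick way to see how often CSSD restarted in a log like the one above is to pull out the daemon start banners and the CLSS-3000 lines. The following is only a minimal Python sketch, assuming the line formats shown in this post; the function name `summarize_ocssd` and the regular expressions are illustrative and not part of any Oracle tool.

```python
import re
from datetime import datetime

# Start banner written each time the CSS daemon comes up (format assumed from the log above).
START_RE = re.compile(
    r"\[\s*CSSD\](\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) >USER: Oracle Database \d+g CSS Release"
)
# Successful reconfiguration message with incarnation number and node count.
RECONFIG_RE = re.compile(
    r"CLSS-3000: reconfiguration successful, incarnation (\d+) with (\d+) nodes"
)


def summarize_ocssd(lines):
    """Return (start_times, reconfigs) found in an iterable of ocssd.log lines."""
    starts = []
    reconfigs = []
    for line in lines:
        m = START_RE.search(line)
        if m:
            starts.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f"))
            continue
        m = RECONFIG_RE.search(line)
        if m:
            reconfigs.append((int(m.group(1)), int(m.group(2))))
    return starts, reconfigs


# Four lines copied from the log in this post.
sample = [
    "[ CSSD]2012-03-21 12:07:44.564 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.",
    "[ CSSD]CLSS-3000: reconfiguration successful, incarnation 18 with 2 nodes",
    "[ CSSD]2012-03-21 12:48:39.836 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.",
    "[ CSSD]CLSS-3000: reconfiguration successful, incarnation 20 with 2 nodes",
]

starts, reconfigs = summarize_ocssd(sample)
for ts in starts:
    print("CSSD started at", ts)
for incarnation, nodes in reconfigs:
    print("incarnation", incarnation, "with", nodes, "nodes")
```

To run it against a real file, pass an open file handle instead of `sample`. The gaps between start times show how long the daemon stayed up between restarts.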
Please check and help.
-
I have a pre-production 2-node cluster running on Solaris 10, Oracle 10.2.0.3 with the Oracle CRS, and using a NetApp filer as the shared storage.
I also have a separate Solaris server running Grid Control 10.2.0.3, with the repository as one of the databases on the RAC (don't know if this is relevant to my problem).
Periodically both RAC nodes reboot, with no trace of why (the GC server is fine). There is nothing logged in the Solaris logs (messages file), CRS logs, Oracle logs or the NetApp logs.
All that is shown is the relevant service starting up following the shutdown.
Has anyone any experience of this, or any thoughts on which component may cause such an issue?
Thanks in advance
Bob
What type of Sun hardware are you using?
Below is the Action Plan Oracle support sent me on my SR on this issue. I am not sure whether any of this was provided to you or would be of help.
ACTION PLAN
============
1. There is nothing in the files at all that sheds any light on the issue.
Again, 3 separate sets of clusters all losing all nodes at the same time is a very strange occurrence. Please be sure to have the admin look for
anything in common with all clusters.
2. Advise placing OSWatcher on the systems: Note 301137.1 Ext/Pub OS Watcher User Guide.
If we should have another occurrence, we will want the OSWatcher logs from 1 hr before the issue through the issue.
Also see if the Unix admin perhaps has any OS stats from this occurrence.
3. Advise setting ntpd to run with the -x option. I do see that you are having negative time changes at times.
-x will give us a slew rather than an abrupt time change.
4. Advise setting this when you can.
Please do the following:
set the diagwait parameter:
crsctl set css diagwait N [-force]
Where N is the number of seconds to wait for a filesystem sync to
complete (after this wait the node will reboot regardless of whether the
sync has completed). This change must be made with the clusterware
down, which will require the '-force', or with the stack up on just 1
node, after which the stack on that node must be restarted before the
stack starts up on any of the other nodes.
N should be set to 25 (25 seconds)
5. Advise that you have with pcw mlr#6 Patch 5980915 on the systems as well,
but I do not believe that this was an Oracle bug. The reason for placing the patch on is the advanced diagnostics in that patchset.
6. The two issues Sun is working on:
Sun is working to resolve a time skew issue and a Solaris 10 kernel SIGALRM Sun#6292092 in addition to Sun#6595936.
7. We do have a diagnostic oprocd that some sites have used, but on their test systems. It stops reboots and dumps information, but I have
been hesitant to place it on production boxes. If you continue to have issues we may consider downloading the oprocd_skewfix_noreboot from
Bug 6279879, but at this time I do not believe that is warranted.
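On the ntpd point in item 3: one rough way to look for backward clock steps is to scan the timestamps of a log that is written in order and flag any entry that is earlier than the one before it. This is only a sketch under that assumption. The function name `find_backward_steps` and the sample timestamps are hypothetical and do not come from any real log.

```python
from datetime import datetime


def find_backward_steps(timestamps, fmt="%Y-%m-%d %H:%M:%S.%f"):
    """Return (previous, current, seconds_back) wherever the clock appears to go backwards.

    Assumes the timestamps are listed in the order they were written, so a
    decrease suggests the system clock was stepped back rather than slewed.
    """
    steps = []
    prev = None
    for raw in timestamps:
        cur = datetime.strptime(raw, fmt)
        if prev is not None and cur < prev:
            steps.append((prev, cur, (prev - cur).total_seconds()))
        prev = cur
    return steps


# Hypothetical timestamps for illustration only.
sample = [
    "2012-03-21 12:00:00.000",
    "2012-03-21 12:00:01.000",
    "2012-03-21 11:59:58.500",  # clock stepped back here
    "2012-03-21 11:59:59.500",
]

steps = find_backward_steps(sample)
for prev, cur, back in steps:
    print(f"clock moved back {back} s: {prev} -> {cur}")
```

If ntpd is slewing the clock (the -x behaviour), timestamps should keep increasing and this check should find nothing.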