RAC is down

Hi all. I have a huge problem with Oracle RAC. I can't start CRS:
[oracle@host ~]$ sudo crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
[oracle@host ~]$ sudo crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
[oracle@host ~]$
What may be the problem?
Environment: 2-node RAC, Oracle 10.2.0.4, RHEL 5.3, NFS.

2009-08-06 11:18:55.623: [ COMMCRS][3176671696]Authentication OSD error, op: scls_auth_response_set
loc: open
info: failed to open
dep: 28
2009-08-06 11:18:55.623: [ CSSCLNT][3176671696]clsssInitNative: connect failed, rc 2
2009-08-06 11:18:55.623: [ CRSRTI][3176671696]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2009-08-06 11:18:56.859: [ COMMCRS][3176671696]Authentication OSD error, op: scls_auth_response_set
loc: open
info: failed to open
dep: 28
2009-08-06 11:18:56.859: [ CSSCLNT][3176671696]clsssInitNative: connect failed, rc 2
2009-08-06 11:18:56.859: [ CRSRTI][3176671696]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2009-08-06 11:18:58.091: [ COMMCRS][3176671696]Authentication OSD error, op: scls_auth_response_set
loc: open
info: failed to open
dep: 28
2009-08-06 11:18:58.091: [ CSSCLNT][3176671696]clsssInitNative: connect failed, rc 2
2009-08-06 11:18:58.091: [ CRSRTI][3176671696]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
alert:
[crsd(2566)]CRS-1204:Recovering CRS resources for node node2.
[cssd(3242)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 node2 .
2009-08-05 09:30:16.058
[cssd(3242)]CRS-1612:node node2 (2) at 50% heartbeat fatal, eviction in 29.058 seconds
2009-08-05 09:30:17.060
[cssd(3242)]CRS-1612:node node2 (2) at 50% heartbeat fatal, eviction in 28.058 seconds
2009-08-05 09:30:31.068
[cssd(3242)]CRS-1611:node node2 (2) at 75% heartbeat fatal, eviction in 14.058 seconds
2009-08-05 09:30:32.070
[cssd(3242)]CRS-1611:node node2 (2) at 75% heartbeat fatal, eviction in 13.048 seconds
2009-08-05 09:30:40.066
[cssd(3242)]CRS-1610:node node2 (2) at 90% heartbeat fatal, eviction in 5.058 seconds
2009-08-05 09:30:41.068
[cssd(3242)]CRS-1610:node node2 (2) at 90% heartbeat fatal, eviction in 4.058 seconds
2009-08-05 09:30:42.070
[cssd(3242)]CRS-1610:node node2 (2) at 90% heartbeat fatal, eviction in 3.058 seconds
2009-08-05 09:30:43.072
[cssd(3242)]CRS-1610:node node2 (2) at 90% heartbeat fatal, eviction in 2.048 seconds
2009-08-05 09:30:44.064
[cssd(3242)]CRS-1610:node node2 (2) at 90% heartbeat fatal, eviction in 1.058 seconds
2009-08-05 09:30:45.066
[cssd(3242)]CRS-1610:node node2 (2) at 90% heartbeat fatal, eviction in 0.058 seconds
2009-08-05 09:30:45.638
[cssd(3242)]CRS-1607:CSSD evicting node node2. Details in /mnt/rac/crs/log/node1/cssd/ocssd.log.
[cssd(3242)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 .
2009-08-05 09:31:23.512
[crsd(2566)]CRS-1204:Recovering CRS resources for node node2.
[cssd(3242)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 node2 .
2009-08-06 09:38:03.947
[cssd(4984)]CRS-1605:CSSD voting file is online: /mnt/rac/voting_disk. Details in /mnt/rac/crs/log/node1/cssd/ocssd.log.
[cssd(4984)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 node2 .
[oracle@node1 node1]$
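A note on the log above: assuming the "dep: 28" value is the operating system errno, 28 on Linux is ENOSPC ("No space left on device"), so a full filesystem is worth ruling out first. The sketch below is only one way to check; the function name full_filesystems and the 95% default are illustrative, and /tmp and /var/tmp are assumptions about where the clusterware keeps its socket files (/mnt/rac is the CRS location shown in the alert log).

```shell
# Read `df -P` output on stdin and print "mountpoint percent" for every
# filesystem whose use percentage is at or above the threshold in $1
# (default 95). Mount points containing spaces are not handled.
full_filesystems() {
    awk -v limit="${1:-95}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= limit) print $6, pct
        }'
}

# Possible usage (the paths are assumptions, adjust to your layout):
#   df -P /tmp /var/tmp /mnt/rac | full_filesystems 95
```

If nothing is reported as full, df -i is worth a look as well, since running out of inodes produces the same errno.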

Similar Messages

What steps to follow to make RAC Database down.

hi all,
I need to know the order to follow when bringing a RAC database down completely.
Information regarding the database:
OS: IBM AIX,
ASM storage,
2-node RAC,
2 databases.
By order I mean what has to be shut down first (database, ASM, cluster, etc.).
Please also give the respective commands for reference.
Regards,
vamsi.

844795 wrote:
hi all,
I need to know the order to follow when bringing a RAC database down completely.
Information regarding the database:
OS: IBM AIX,
ASM storage,
2-node RAC,
2 databases.
By order I mean what has to be shut down first (database, ASM, cluster, etc.).
Please also give the respective commands for reference.
Regards,
vamsi.
Stopping the Oracle RAC 10g Environment
The first step is to stop the Oracle instance. When the instance (and related services) is down, then bring down the ASM instance. Finally, shut down the node applications (Virtual IP, GSD, TNS Listener, and ONS).
$ export ORACLE_SID=orcl1
$ emctl stop dbconsole
$ srvctl stop instance -d orcl -i orcl1
$ srvctl stop asm -n linux1
$ srvctl stop nodeapps -n linux1
Starting the Oracle RAC 10g Environment
The first step is to start the node applications (Virtual IP, GSD, TNS Listener, and ONS). When the node applications are successfully started, then bring up the ASM instance. Finally, bring up the Oracle instance (and related services) and the Enterprise Manager Database console.
$ export ORACLE_SID=orcl1
$ srvctl start nodeapps -n linux1
$ srvctl start asm -n linux1
$ srvctl start instance -d orcl -i orcl1
$ emctl start dbconsole
Start/Stop All Instances with SRVCTL
Start/stop all the instances and their enabled services. I have included this step just for fun as a way to bring down all instances!
$ srvctl start database -d orcl
$ srvctl stop database -d orcl
Reference: http://www.rampant-books.com/art_hunter_rac_start_stop_cluster.htm
Refer to these links for more information:
Starting and Stopping Instances and Oracle Real Application Clusters Databases
http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/dbinstmgt.htm#BCEBGHHC
Server Control Utility Reference
http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/srvctladmin.htm
answered by ssolbach
Just a minor comment on stopping the nodeapps.
While it is fine to stop the nodeapps on a server, the drawback is that the VIP will not fail over if you stop the nodeapps; it will simply be stopped.
Hence, if you shut down only one server, clients fail to connect to the VIP and have to wait for the TCP timeout.
So if you are not going to shut down all the servers, but just want to shut down one node, you should fail the VIP over to the other node.
See: Note 749160.1 Vip Does Not Failover When Nodeapps Stopped
So it is sometimes better, instead of stopping the nodeapps, to simply shut down the clusterware on that node with crsctl stop crs (which will fail over the VIP).
Sebastian
reference:-
Re: RAC Questions
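The order described in this thread (databases first, then ASM, then the node applications) can be sketched as a small plan generator. This is only an illustration: rac_stop_plan is a made-up helper name, the database and node names are placeholders, and it prints the 10g-style srvctl commands quoted above without running them, so the plan can be reviewed first.

```shell
# Print, in order, the srvctl commands that stop a whole RAC environment.
# $1: space-separated database names
# $2: space-separated node names
rac_stop_plan() (
    for db in $1; do
        echo "srvctl stop database -d $db"
    done
    for node in $2; do
        echo "srvctl stop asm -n $node"
    done
    for node in $2; do
        echo "srvctl stop nodeapps -n $node"
    done
)

# Possible usage for two databases on two nodes (all names are placeholders):
#   rac_stop_plan "db1 db2" "node1 node2"
```

Startup is the same list in reverse order with start instead of stop. If only one node is going down, Sebastian's remark applies: crsctl stop crs on that node lets the VIP fail over, whereas stopping the nodeapps does not.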

RAC Node down and ORA-12514

I have a two-node RAC setup. One node went down because of hardware issues, and it seems that I cannot connect from a client (JDBC) when the SCAN name resolves to one particular IP.
I receive: ORA-12514, TNS:listener does not currently know of service requested in connect descriptor. If DNS returns one of the other IPs, everything works fine.
connection string:
jdbc:oracle:thin:@(DESCRIPTION= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=testracscan.internal.int)(PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=testdb.internal.int)))
Interfaces show that VIPS and SCANS are assigned correctly on Node 1:
vlan65 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
inet addr:192.168.2.10 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::2e76:8aff:fe4f:b5cc/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:937195 errors:0 dropped:0 overruns:0 frame:0
TX packets:852745 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:186434457 (177.7 MiB) TX bytes:141217705 (134.6 MiB)
vlan65:1 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
inet addr:192.168.2.25 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
vlan65:2 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
inet addr:192.168.2.35 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
vlan65:3 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
inet addr:192.168.2.30 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
vlan65:4 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
inet addr:192.168.2.110 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
vlan65:5 Link encap:Ethernet HWaddr 2C:76:8A:4F:B5:CC
inet addr:192.168.2.115 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
[oracle@srvtestdb1 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 03-SEP-2012 15:35:05
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date 29-AUG-2012 15:52:57
Uptime 4 days 23 hr. 42 min. 7 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/grid/diag/tnslsnr/srvtestdb1/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.10)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.110)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "testdb.internal.int" has 1 instance(s).
Instance "testdb1", status READY, has 1 handler(s) for this service...
Service "testdbXDB.internal.int" has 1 instance(s).
Instance "testdb1", status READY, has 1 handler(s) for this service...
Service "testdbsvc.internal.int" has 1 instance(s).
Instance "testdb1", status READY, has 1 handler(s) for this service...
The command completed successfully
[oracle@srvtestdb1 ~]$
SQL> show parameter listener
NAME TYPE VALUE
listener_networks string
local_listener string (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.2.110)(PORT=1521))))
remote_listener string testracscan.internal.int:1521
nslookup testracscan.internal.int
Server: 192.168.0.18
Address: 192.168.0.18#53
Name: testracscan.internal.int
Address: 192.168.2.30
Name: testracscan.internal.int
Address: 192.168.2.25
Name: testracscan.internal.int
Address: 192.168.2.35
Problems arise when the name is resolved to 192.168.2.35 on the client: I get ORA-12514.
When the IP is resolved to 192.168.2.110, the client simply sits and waits for a moment and then begins to work, and netstat shows:
tcp 0 0 ::ffff:1 192.168.2.5:51685 ::ffff:192.168.2.110:1521 ESTABLISHED
What might be causing this?

[grid@srvtestdb1 ~]$ ps -ef|grep tns
root 65 2 0 Aug29 ? 00:00:00 [netns]
grid 4449 1 0 Aug29 ? 00:00:25 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN2 -inherit
grid 4454 1 0 Aug29 ? 00:00:23 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN3 -inherit
grid 4481 1 0 Aug29 ? 00:00:33 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
grid 37028 1 0 09:38 ? 00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid 37901 36372 0 09:45 pts/0 00:00:00 grep tns
[grid@srvtestdb1 ~]$
[grid@srvtestdb1 ~]$ srvctl config scan_listener
SCAN Listener LISTENER_SCAN1 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN2 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN3 exists. Port: TCP:1521
[grid@srvtestdb1 ~]$
[grid@srvtestdb1 ~]$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node srvtestdb1
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node srvtestdb1
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node srvtestdb1
[grid@srvtestdb1 ~]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node srvtestdb1
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node srvtestdb1
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node srvtestdb1
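One way to narrow this down, as a sketch: ORA-12514 means the listener that accepted the connection does not know the requested service, so it may help to ask each SCAN listener in turn whether testdb.internal.int is registered with it. The helper below only looks for the 'Service "..." has' line that appears in the lsnrctl listing above. The function names are illustrative, and the loop assumes it is run as the Grid Infrastructure owner with the Grid home environment set.

```shell
# Succeed if the service named in $1 appears in lsnrctl output read on stdin.
listener_has_service() {
    grep -F -q "Service \"$1\" has"
}

# Report, for each of the three SCAN listeners shown above, whether the
# service named in $1 is registered with it.
check_scan_listeners() {
    for l in LISTENER_SCAN1 LISTENER_SCAN2 LISTENER_SCAN3; do
        if lsnrctl services "$l" | listener_has_service "$1"; then
            echo "$l: $1 registered"
        else
            echo "$l: $1 NOT registered"
        fi
    done
}

# Possible usage:
#   check_scan_listeners testdb.internal.int
```

A SCAN listener that reports the service as not registered would be the one to look at, for example by comparing its start time with the others in the ps output above.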

10g RAC Node down Enterprise Manager

Hello
We have a 2-node Oracle 10g Release 2 RAC setup on Linux. Enterprise Manager was first installed on node1 and we access it using http://node1:5500/em.
This node has a hardware failure and is out of commission at the moment. When I try to connect to http://node2:5500/em it does not work.
I see the dbconsole process is running on node2.
How can I use Enterprise Manager if node1 is down?
Thanks

Can you try deconfig and config again on node 2?
emctl stop dbconsole
emca -deconfig dbcontrol
emca -config dbcontrol
Salman

RAC node down

Hi All,
Oracle 10.2.0
Windows server 2003
My question is about an Oracle RAC server where applications hold connections to both node1 and node2.
If node1 goes down, will the existing connections to node1 be redirected to node2?
Please advise.
TIA,

If you did configure transparent application failover (TAF) in the tnsnames.ora of the clients, yes.
You can read about TAF in the Net Administrator's Manual at http://tahiti.oracle.com
Sybrand Bakker
Senior Oracle DBA
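To make the TAF suggestion concrete, here is a hedged sketch that prints a client tnsnames.ora entry with a FAILOVER_MODE clause. The alias, service name, VIP host names, port 1521, and the RETRIES and DELAY values are all placeholders, and taf_entry is only an illustrative helper; the generated text is what goes into tnsnames.ora on the client, whatever its platform.

```shell
# Print a tnsnames.ora entry with TAF enabled.
# $1: net alias, $2: service name, $3 and $4: VIP host names of the two nodes
taf_entry() {
    cat <<EOF
$1 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = yes)
      (FAILOVER = on)
      (ADDRESS = (PROTOCOL = TCP)(HOST = $3)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = $4)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = $2)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 5))
    )
  )
EOF
}

# Possible usage (all names are placeholders):
#   taf_entry ORCL_TAF orcl node1-vip node2-vip
```

Note that TAF re-establishes the session on a surviving node; with TYPE = SELECT an open query can continue, but an uncommitted transaction is rolled back and has to be retried by the application.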

Where can I get a free download of 9i RAC for the Windows 2000 platform?

Can anyone tell me how I can download 9i RAC for Windows 2000?
Thanks in advance
Vijaya Bhaskar Utla.

I am having the same issue here, which is really no fun. I have not tried to contact them, but looking at your experience I am not sure if I should.
My key comes from the back of a laptop, so I guess that would be a W7
OEM key, and that could explain why you can't download an ISO from their website. Would that also be your case?
Also, could it be because we are attempting to get the ISO after the product became EOL?
http://windows.microsoft.com/en-us/windows/lifecycle
Maybe I will give a go with the customer care people and maybe I will get a right answer ;)
Will keep you posted

Application responds FASTER with one of two RAC instances DOWN.

Has anyone in the forum ever had a case where a two-node RAC database's application responds faster when only one of the instances is alive?
We have performed several tests, and the evidence is clear: if both instances are running, the response time is slower than with only one instance up. It doesn't matter which of the two is up, though.
The application is a GIS tool, which means that there may be lots of calculations in the database (I assume). I tried to increase the sga_target parameter slightly, but there was no change in response time. Unfortunately I cannot increase the memory as much as I want to because of hardware limitations. Currently the database has a sga_target of 1024M on both instances.
Any idea that could point me in the right direction would be appreciated.
- Vegard

Yes I do, confirmed. It is a regular Ethernet connection, 1 Gb, with a gigabit switch between them as recommended by Oracle.
We have also confirmed that the application using the RAC database does not like the concept of RAC: it performs very well with one instance alive. As soon as we open the other one, the response time increases considerably.
This leads me, too, to believe that there's a problem with the RAC interconnect communication. My colleague, who is a sysadmin, has suggested looking into whether "jumbo frames" can be enabled. Our network cards do support this setting, so we might buy some performance gain there.
The application using the database is a GIS application, so a lot of graphics are generated from the data in the database, and a SELECT involves lots and lots of tables. The transactions, however, involve only a few tables. So there shouldn't be any I/O issues here, which is also confirmed by the performance graphs in EM.
I appreciate your comments.
- Vegard

10g RAC's listener cannot startup when the node's back from down

Hello
My 10g RAC node1 went down and I rebooted it.
Everything came back except the listener on node1.
[oracle@node2 crsd]$ crs_stat -t
Name Type Target State Host
ora....SM1.asm application ONLINE ONLINE node1
ora....E1.lsnr application ONLINE OFFLINE
ora.node1.gsd application ONLINE ONLINE node1
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip application ONLINE ONLINE node1
ora....SM2.asm application ONLINE ONLINE node2
ora....E2.lsnr application ONLINE ONLINE node2
ora.node2.gsd application ONLINE ONLINE node2
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip application ONLINE ONLINE node2
ora.v20db.db application ONLINE ONLINE node2
ora....b1.inst application ONLINE ONLINE node1
ora....b2.inst application ONLINE ONLINE node2
And I did start it manually, but failed:
[oracle@node2 crsd]$ crs_start ora.node1.LISTENER_NODE1.lsnr
Attempting to start `ora.node1.LISTENER_NODE1.lsnr` on member `node1`
`ora.node1.LISTENER_NODE1.lsnr` on member `node1` has experienced an unrecoverable failure.
Human intervention required to resume its availability.
CRS-0215: Could not start resource 'ora.node1.LISTENER_NODE1.lsnr'.
What does that mean?
And what should I do?
Thanks a lot.

You should first force stop it by doing:
crs_stop -f ora.node1.LISTENER_NODE1.lsnr
and then do :
crs_start ora.node1.LISTENER_NODE1.lsnr
either the listener will start or it will show you the error which will point to the real cause.
regards

10g R2 RAC on Solaris 10 with EMC Storage

We are in the process of setting up 3/4 node RAC with the following components:
Oracle 10g R2 RAC
Oracle Clusterware / Sun Cluster / Veritas Cluster
Sun Solaris 10
EMC storage
ASM/Cluster FS
I would appreciate it if someone could throw some light on:
* ) Is Veritas Cluster / Sun Cluster a must-have component, or can I use Oracle Clusterware? What are the advantages and disadvantages of using Oracle Clusterware compared with Veritas Cluster or Sun Cluster?
* ) Is a cluster filesystem a compulsory component, or can I use ASM instead of a cluster filesystem?
* ) If I don't use a cluster filesystem, where do I put the CRS repository and voting disk?
* ) What is the best option for ORACLE_HOME: a shared Oracle home or a separate ORACLE_HOME on each node?
* ) Are there any known risks involved in using ASM? How is the I/O performance with ASM on EMC with Solaris? Are there any best practices?
* ) Is GigE okay for the interconnect or do I need to go for InfiniBand?
* ) Are there any notes on best practices for the above components?
*) Do I need to consider a failover option for the NICs (interconnect and public)? If yes, how do I do that?
*) Are there any other risks I need to consider?
Thanks
G

Hi,
I see a lot of good input. I have done a few RAC installs on Sun/Solaris/EMC ...
Here are a few things to consider.
* ) Is Veritas Cluster / Sun Cluster a must-have component, or can I use Oracle Clusterware? What are the advantages and disadvantages of using Oracle Clusterware compared with Veritas Cluster or Sun Cluster?
Just stay with Oracle Clusterware. If there are any issues then you only have to deal with one vendor and there will be no finger pointing. In any case Oracle Clusterware is needed even if you install Veritas/Sun.
* ) Is a cluster filesystem a compulsory component, or can I use ASM instead of a cluster filesystem?
For the database you can use ASM. The only time I have considered a cluster filesystem is if external tables were in use.
When you use ASM you need to partition the disk with a 1 MB offset or start at cylinder 1.
* ) If I don't use a cluster filesystem, where do I put the CRS repository and voting disk?
OCR and Voting Disk go on raw devices.
* ) What is the best option for ORACLE_HOME: a shared Oracle home or a separate ORACLE_HOME on each node?
Install ORACLE_HOME, ASM_HOME and CRS_HOME locally on each server.
* ) Are there any known risks involved in using ASM? How is the I/O performance with ASM on EMC with Solaris? Are there any best practices?
http://www.oracle.com/technology/products/database/asm/pdf/asm-on-emc-5_3.pdf
We have always installed 2 HBAs and used powerpath.
* ) Is GigE okay for interconnect or do I need to go for Infiniband ?
For a majority of cases gigE is sufficient.
* ) Is there any notes on Best practices for the above components
Have redundancy at each level.
*) Do I need to consider fail over option for NIC's (interconnect and public), if yes, how to do that ?
You can use IPMP. Use large send/receive buffers. Enable Jumbo Frames.
We had to apply some patches.
5128575 - RAC install of 10.2.0.2 does not update libknlopt.a on all nodes
4769197 - WHILE ONE NODE OF RAC IS DOWN, CONNECTIONS FROM CLIENT HANG
patch 5749953
Thanks
G

Connect to 11g RAC cluster failing

Hi,
I have a simple Java client that tries to connect to an 11g RAC Oracle server at a remote location using the Oracle thin driver. We have been provided the SID of the load balancer, and the client is not able to connect with it. When I connect to the individual nodes with their respective SIDs, it is able to connect. The server architect recommends switching to the OCI-based driver instead of the thin driver. On doing that I get this error: "Exception in thread "main" java.lang.UnsatisfiedLinkError: no ocijdbc11 in java.library.path". The client resides on a 64-bit Linux machine. How can I make it work?
Below is the code I am using to connect:
// Requires the Oracle JDBC driver jar on the classpath.
import oracle.jdbc.pool.OracleDataSource;

OracleDataSource source = new OracleDataSource();
source.setURL(mInitConnString); // mInitConnString = "jdbc:oracle:oci:@ip:port:SID"
source.setUser(mDbUser);
source.setPassword(mDbPass);
mConn = source.getConnection();

user10183827 wrote:
It seems that the failover and load balancing features of JDBC do not work with Oracle RAC.
As phrased, that statement is wrong.
JDBC is an interface. It doesn't do much of anything.
JDBC drivers, on the other hand, do something.
Oracle has always created a JDBC driver and still does. So do other companies, for Oracle. The merge will have no impact on that.
Oracle certainly seems to think that RAC does work with their JDBC driver. The following link demonstrates that. (Notice that the copyright on that page is 2008, so perhaps they have thought so for a while.)
http://download.oracle.com/docs/cd/E13222_01/wls/docs103/jdbc_admin/oracle_rac.html
and the fail over strings connect but without failover. So if I take a rac node down then the sites go down.
As far as I know, the primary and most significant goal of RAC is to support clusters of servers, so if there isn't in fact some way to do what you are saying, then why does Oracle claim that RAC does work with Java? Why does the above link specifically discuss failover/retries within a Java context if it does not work at all?

How ConnectionPool works in RAC?

For example, we create a ConnectionPool. If one of the instances in the RAC goes down, and some connections in the ConnectionPool are connected to that instance, then those connections become invalid. Can the ConnectionPool automatically eliminate those connections from the pool?
If this happens, how can I deal with it?


RAC crashed - errors in messages log

2 node RAC. running 1 node config.
OEL 5.
Oracle 11.2.0.2.0
RAC crashed this morning with following errors in ASM log
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpsemsper
Found blog notes that say to raise the semaphore limits in the kernel: http://arjudba.blogspot.ie/2008/09/database-startup-fails-with-ora-27302.html
Bumped them up to the following as per the note, but it didn't fix the problem:
from: kernel.sem = 256 32000 100 128
to: kernel.sem = 256 32768 100 228
Then tried 258, which didn't work.
Found another blog note that said to try 288, and that DID work, so it appears to be a semaphore issue. 2 questions:
1. How can this happen all of a sudden? No changes on the box for the last 5-6 weeks, nothing else running except RAC.
2. Error in the /var/log/messages log file. RAC came down at 10:02, and there are messages in the log file from 10:01, so they must be related. Does anyone know what they are? Looks like storage to me.
Jan 10 10:01:48 orcldub817 kernel: lpfc 0000:08:00.0: 0:(0):0713 SCSI layer issued Device Reset (0, 23) return x2002
Jan 10 10:02:17 orcldub817 kernel: INFO: task MpxPeriodicCall:28484 blocked for more than 120 seconds.
Jan 10 10:02:17 orcldub817 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 10 10:02:17 orcldub817 kernel: MpxPeriodicCa D ffffffff80151248     0 28484      1         28485 28483 (L-TLB)
Jan 10 10:02:17 orcldub817 kernel: ffff810c0d8e78a0 0000000000000046 ffff810095560840 ffff810c2f7b14f8
Jan 10 10:02:17 orcldub817 kernel: ffff810c2f7b1000 000000000000000a ffff810c28a96100 ffff81062fd6a100
Jan 10 10:02:17 orcldub817 kernel: 00267af542a39442 0000000000008106 ffff810c28a962e8 000000072e01a0c8
Jan 10 10:02:17 orcldub817 kernel: Call Trace:
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff80064167>] wait_for_completion+0x79/0xa2
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8008e16d>] default_wake_function+0x0/0xe
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff801459cc>] blk_execute_rq_nowait+0x86/0x9a
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff80145a78>] blk_execute_rq+0x98/0xc0
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8840b90d>] :emcp:emcp_scsi_cmd_ioctl+0x2ad/0x400
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff88412cb5>] :emcp:PowerPlatformBottomDispatch+0x3d5/0x760
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff884130d8>] :emcp:PowerSyncIoBottomDispatch+0x78/0xd0
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff88413626>] :emcp:PowerDispatchX+0x356/0x410
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff884147fd>] :emcp:EmsInquiry+0x9d/0x1a0
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8863245a>] :emcpmpx:ClariionKLam_getPathLunStatus+0x8a/0x110
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff88622391>] :emcpmpx:MpxDefaultTestPath+0x21/0x80
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff88631bf3>] :emcpmpx:MpxLnxTestPath+0x43/0x270
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff800a1bad>] autoremove_wake_function+0x9/0x2e
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8008c597>] __wake_up_common+0x3e/0x68
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8002e511>] __wake_up+0x38/0x4f
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff88407e9e>] :emcp:PowerPutSema+0x4e/0x60
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8006f1f5>] do_gettimeofday+0x40/0x90
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8862a054>] :emcpmpx:MpxTestPath+0x234/0x1fc0
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff80063ff8>] thread_return+0x62/0xfe
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8862e8cb>] :emcpmpx:MpxPeriodicTestPath+0x9b/0x230
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8840e9c4>] :emcp:PowerTimeout+0x1a4/0x1e0
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8006f1f5>] do_gettimeofday+0x40/0x90
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8862ed5a>] :emcpmpx:MpxPeriodicCallout+0x2fa/0x410
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8862ea60>] :emcpmpx:MpxPeriodicCallout+0x0/0x410
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8841164d>] :emcp:PowerServiceDaemonQ+0xad/0xd0
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8005efb1>] child_rip+0xa/0x11
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff88411670>] :emcp:PowerDaemonStart+0x0/0x20
Jan 10 10:02:17 orcldub817 kernel: [<ffffffff8005efa7>] child_rip+0x0/0x11
Jan 10 10:02:17 orcldub817 kernel:
The second node's messages log has exactly the same...
Jan 10 10:01:48 orcldub818 kernel: lpfc 0000:08:00.0: 0:(0):0713 SCSI layer issued Device Reset (1, 23) return x2002
Jan 10 10:02:11 orcldub818 kernel: INFO: task MpxPeriodicCall:28298 blocked for more than 120 seconds.
Jan 10 10:02:11 orcldub818 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 10 10:02:11 orcldub818 kernel: MpxPeriodicCa D ffffffff80151248     0 28298      1         28299 28297 (L-TLB)
Jan 10 10:02:11 orcldub818 kernel: ffff810626d7d8a0 0000000000000046 ffff8107adac6500 ffff810c2ff7e4f8
Jan 10 10:02:11 orcldub818 kernel: ffff810c2ff7e000 000000000000000a ffff81062a2ad7a0 ffff81062fcf80c0
Jan 10 10:02:11 orcldub818 kernel: 00267d060c8f888d 000000000002ac41 ffff81062a2ad988 0000000676dd2cd8
and the same messages as per node 1 above.
Edited by: 961469 on Jan 10, 2013 5:46 AM

Dude wrote:
Why do you think it's a problem with kernel semaphores?
There are 2 other issues that I suggest to look into first:
SCSI layer issued Device Reset (0, 23)
ORA-27301: OS failure message: No space left on device
To me this looks like a hardware problem with your EMC storage
Also please provide exact details about your OS version and system.
See the errors in the ASM log first of all. A few notes come up when we search for those errors. Upping the semaphores resolved it. My task now is to get to the root cause.
versions: x86_64 2.6.18-194.el5
While increasing the semaphores resolved it, I too think it looks like a problem with the storage, as we get the same issue in the messages file on both nodes at the same time.
A couple of things in the ipcs output don't make sense:
[root@orcld818 proc]# ipcs -sa | wc -l
242
[root@orcl818 proc]# ipcs -sa
242 listed -
0x45116b13 7241949    root      666        1
0x45116ab3 7274718    root      666        1
0x45116a13 7307487    root      666        1
0x45116a73 7340256    root      666        1
0x45116a93 7373025    root      666        1
0x45119ca9 7405794    root      666        1
0x451187f7 7438563    root      666        1
All those returned are root owned. This doesn't make sense to me; I thought there should be only a few listed, owned by oracle.
If I try to identify the processes associated with those, there's a PID listed, yet the PID doesn't exist when I try to query it, e.g. the last one listed above...
[root@orcl818 proc]# ipcs -i 7438563 -s
Semaphore Array semid=7438563
uid=0    gid=0   cuid=0 cgid=0
mode=0666, access_perms=0666
nsems = 1
otime = Thu Jan 10 11:15:13 2013
ctime = Thu Jan 10 11:15:12 2013
semnum     value      ncount     zcount     pid
0          0          0          0          22922
[root@orcl818 proc]# ps -ef | grep 22922
root     19611 9740 0 14:54 pts/2    00:00:00 grep 22922
[root@orcl818 proc]#
[root@orcl818 proc]# ls /proc/22922
ls: /proc/22922: No such file or directory
[root@orcl818 proc]#
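For what it's worth, semget() returns ENOSPC ("No space left on device") when creating a set would exceed either the maximum number of semaphore sets (SEMMNI, the fourth kernel.sem value) or the system-wide number of semaphores (SEMMNS, the second value). A rough sketch for checking the first of these follows. The function name sem_arrays_left is illustrative, and the figure of 238 arrays in the comment is an assumption derived from the 242 lines reported by wc above minus a few header and blank lines.

```shell
# Print how many more semaphore arrays can be created before SEMMNI is hit.
# $1: the four kernel.sem values "SEMMSL SEMMNS SEMOPM SEMMNI"
# $2: number of semaphore arrays currently allocated
sem_arrays_left() {
    echo "$1" | awk -v used="$2" '{ print $4 - used }'
}

# Possible usage on a live Linux system:
#   sem_arrays_left "$(cat /proc/sys/kernel/sem)" "$(ipcs -s | grep -c '^0x')"
#
# With the settings from this thread and an assumed 238 arrays in use:
#   sem_arrays_left "256 32768 100 228" 238   # negative, limit already exceeded
#   sem_arrays_left "256 32768 100 288" 238   # positive, some headroom left
```

If the headroom is small, the many root-owned single-semaphore arrays in the ipcs listing are worth tracing to whatever creates them, since they count against the same SEMMNI limit as Oracle's own sets.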

JDBC connection creation with ORACLE RAC

Hello All,
My scenario: whenever one of the VIPs/instances in my Oracle RAC goes down, WebLogic/Java (JDBC) takes close to 3 minutes to fail over to the secondary host. I am looking for a way to reduce the connect-time failover duration.
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=HOST1)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=HOST2)(PORT=1521))(FAILOVER=on)(LOAD_BALANCE=off))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=SERVICE)))

Hi,
In such a case, failover will depend on RAC instead of WebLogic.
Oracle does not recommend this approach.
Please try to use a multi data source to implement such failover. In older versions the failover time is 60 seconds.
It can be changed, though, by applying a fix for a bug (I do not remember the number at the moment).
Regards,
Kal

Questions on Client Failover and Fast Failover

I have a few questions regarding Client Failover and Fast Failover in Oracle Database HA. Before I ask those questions I would like to explain my environment. Below are the details.
- We have two physical locations called 'ABC' and 'PQR'
- ABC is the primary site.
- PQR is the standby site.
- In ABC, we have Oracle RAC database (11.2.0.2) with two nodes.
- In PQR, we have a single server standalone database (11.2.0.2) with ASM. This is not a RAC.
- Data Guard has been configured between ABC and PQR and it is working as expected.
- Please note that we have a licence for Active Data Guard.
- We have Oracle Identity Management products at both ABC and PQR and they are going to use RAC database as a primary database which is in ABC.
- We did not configure Data Guard broker yet.
We want to achieve below goals:
Goal 1:
Whenever RAC primary goes down completely, standby database should become primary database AUTOMATICALLY and it should allow read/write operation.
I guess this is called 'Fast Failover'. Please let me know if I am wrong.
Questions :
- To make this happen, do I need to configure Data Guard Broker so that the standby database becomes primary when RAC goes down completely in a planned or unplanned outage?
- Let's say RAC goes down completely; how long does Data Guard Broker take to make the standby database primary?
- What about the clients/applications already connected to RAC?
- Let's say the standby database has become primary and after some time RAC comes back. Does Data Guard automatically change the role of RAC back to primary?
Goal 2:
As I explained above, all Oracle IDM products and applications speak to the RAC database; they only know about the RAC database, which is primary. They are not aware of the standby database.
- Whenever a client session is in progress with the RAC primary database and RAC goes down completely, we would like the client session to be transferred to the standby database without losing session information. However, before this happens, the standby database should become primary because the client session may perform write operations.
- Whenever a client is trying to connect to the RAC primary and RAC is completely down, we would like the client connection to be transferred to the standby database.
However, before this happens, the standby database should become primary because the client session may perform write operations.
As per my knowledge, the above scenarios are called 'client failover'. Please let me know if I am wrong.
Questions:
1. Please throw some light on how to achieve the above features.
2. As per my understanding, before client failover happens, fast failover should have already occurred and the standby should have switched to the primary role. I guess all this happens through TIMEOUT parameters. What are those?
Could you please help ?
Thanks

859875 wrote:
Goal 1:
Whenever RAC primary goes down completely, standby database should become primary database AUTOMATICALLY and it should allow read/write operation.
I guess this is called 'Fast Failover'. Please let me know if I am wrong.
You are correct.
Questions :
- To make this happen, do I need to configure Data Guard Broker so that the standby database becomes primary when RAC goes down completely in a planned or unplanned outage?
Yes (you can also use Grid Control, which will use Data Guard Broker).
Goal 2:
As I explained above, all Oracle IDM products and applications speak to the RAC database; they only know about the RAC database, which is primary. They are not aware of the standby database.
- Whenever a client session is in progress with the RAC primary database and RAC goes down completely, we would like the client session to be transferred to the standby database without losing session information.
This is not possible: it is possible only for SELECT statements and only within a single RAC database.
You can find some interesting documents on the MAA home page (best practices, case studies...):
http://www.oracle.com/technetwork/database/features/availability/maa-090890.html

Opening OEM for two Oracle 11gR2 Databases in the same web browser automatically log out.

Hi to everyone,
I have an issue regarding Oracle Enterprise Manager in 11gR2. I have two databases (SWPROD, PDPROD) on a single server. When I open the OEM URL for SWPROD, I log on successfully, but when I then open the OEM URL for PDPROD and log on, the tab for SWPROD is automatically logged out. And when I switch back and log on to SWPROD again, the PDPROD tab is logged out as well. Both OEM URLs are open in a single web browser (Mozilla Firefox). What would be the reason why the OEM sessions log each other out when I open them at the same time?
Thank you for your incoming response.

Well, it seems the only way to clear these out of EM was to shut BOTH RAC nodes down and power them up one at a time. Now the updates aren't shown as required and my compliance score is where it should be.
Is this a bug? It seems pretty stupid to have to shut down both RAC nodes to fix this. Powering a single RAC node off and back on did not clear this.
Unless I'm missing something?
