RAC node eviction question...

Say we have a 3-node RAC cluster on OEL 5.3. What happens when one node is evicted from it? I know the other two instances will do dynamic remastering... and something more.
I want to know each and every step in detail: what really happens when one node goes down in a RAC environment?
Experts, please comment.
Many thanks.

"I want to know each and every steps in detail." Assume you knew each and every step in detail: what would you do differently based upon this information?
Handle:      vh_dba
Status Level:      Newbie (30)
Registered:      Jan 10, 2010
Total Posts:      38
Total Questions:      16 (15 unresolved)
So many questions with only a single answer.
:-(
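To the original question: the surviving instances record the post-eviction steps (cluster reconfiguration, Global Resource Directory freeze and remastering, instance recovery) in their alert logs, so one concrete way to "see the steps" is to grep for them. A minimal sketch; the sample lines below are typical examples and an assumption on my part, not output from this cluster. In practice you would grep the real alert_<SID>.log on a surviving node.

```shell
# Spot the reconfiguration steps in a surviving instance's alert log after an
# eviction. The sample lines are typical examples (an assumption), not output
# from this thread's cluster.
cat > /tmp/alert_sample.log <<'EOF'
Reconfiguration started (old inc 4, new inc 6)
List of nodes: 0 1
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Reconfiguration complete
EOF
grep -E 'Reconfiguration (started|complete)|Resource Directory' /tmp/alert_sample.log
```

The same grep against a real alert log brackets the reconfiguration window, which is usually the first thing to establish before digging into ocssd.log.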

Similar Messages

  • Rac node evicted and asm related

    Hi friends,
    I have a few doubts about the RAC environment:
    1. In a 2-node RAC, while adding a datafile to a tablespace, if you forget to mention '+' (the ASM disk group prefix), what will happen? Will the file be created, or will it throw an error? If it is created, where exactly is it located, and how can users on the other node work with that tablespace? What steps make that datafile usable for users on all nodes?
    2. In a RAC environment, how do you check how many sessions are connected to a particular node?
    3. If a node is evicted due to a network failure and we then repair the network, are there any manual steps needed to bring the failed node back after rebuilding the network, or will it automatically become available in the cluster again? Which service performs this activity?
    4. While configuring the clusterware, you choose the voting disk and OCR locations and a redundancy level. If you go for normal redundancy, how many disks can you select for each file: one or two?
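On question 2, the usual approach is a GV$SESSION query run from any node. A sketch: the SQL in the comment is the standard query, and the pipeline below aggregates a canned result set (an assumption) so the example is self-contained and runnable without a database.

```shell
# Question 2: count sessions connected to each instance.
# Real query (run via sqlplus with a DBA account):
#   SELECT inst_id, username FROM gv$session WHERE type = 'USER';
# Below we aggregate a canned result set so the pipeline runs as-is.
printf '%s\n' '1 SCOTT' '1 HR' '2 SCOTT' |
  awk '{count[$1]++} END {for (i in count) print "inst " i ": " count[i] " sessions"}' |
  sort
```

The first column of GV$SESSION's INST_ID maps a session to its instance, which is why grouping on it answers "how many sessions per node".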

    [grid@srvtestdb1 ~]$ ps -ef|grep tns
    root 65 2 0 Aug29 ? 00:00:00 [netns]
    grid 4449 1 0 Aug29 ? 00:00:25 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN2 -inherit
    grid 4454 1 0 Aug29 ? 00:00:23 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN3 -inherit
    grid 4481 1 0 Aug29 ? 00:00:33 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
    grid 37028 1 0 09:38 ? 00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
    grid 37901 36372 0 09:45 pts/0 00:00:00 grep tns
    [grid@srvtestdb1 ~]$
    [grid@srvtestdb1 ~]$ srvctl config scan_listener
    SCAN Listener LISTENER_SCAN1 exists. Port: TCP:1521
    SCAN Listener LISTENER_SCAN2 exists. Port: TCP:1521
    SCAN Listener LISTENER_SCAN3 exists. Port: TCP:1521
    [grid@srvtestdb1 ~]$
    [grid@srvtestdb1 ~]$ srvctl status scan_listener
    SCAN Listener LISTENER_SCAN1 is enabled
    SCAN listener LISTENER_SCAN1 is running on node srvtestdb1
    SCAN Listener LISTENER_SCAN2 is enabled
    SCAN listener LISTENER_SCAN2 is running on node srvtestdb1
    SCAN Listener LISTENER_SCAN3 is enabled
    SCAN listener LISTENER_SCAN3 is running on node srvtestdb1
    [grid@srvtestdb1 ~]$ srvctl status scan
    SCAN VIP scan1 is enabled
    SCAN VIP scan1 is running on node srvtestdb1
    SCAN VIP scan2 is enabled
    SCAN VIP scan2 is running on node srvtestdb1
    SCAN VIP scan3 is enabled
    SCAN VIP scan3 is running on node srvtestdb1
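All three SCAN listeners and VIPs above are running on srvtestdb1, which normally happens after the other nodes went down and the resources failed over. Once the other nodes rejoin, they can be spread back out with srvctl relocate scan_listener. The target node name below is hypothetical, and the commands are echoed so this sketch runs without a cluster; run the real commands as the grid user.

```shell
# Spread SCAN listeners back out after other nodes rejoin the cluster.
# "srvtestdb2" is a hypothetical target node name.
echo "srvctl relocate scan_listener -i 2 -n srvtestdb2"
echo "srvctl relocate scan_listener -i 3 -n srvtestdb2"
```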

  • Oracle 10g RAC on Solaris Node Eviction

    I've been having periodic node evictions on my server. I've found several threads regarding RAC node reboots, but nothing specific. In my case, the node eviction warning appears to be "immediate":
    [cssd(9530)]CRS-1612:node mbdmb2 (0) at 50% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1612:node mbdmb2 (0) at 50% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1611:node mbdmb2 (0) at 75% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1611:node mbdmb2 (0) at 75% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1610:node mbdmb2 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1610:node mbdmb2 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1610:node mbdmb2 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1610:node mbdmb2 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1610:node mbdmb2 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1610:node mbdmb2 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1610:node mbdmb2 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1610:node mbdmb2 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1610:node mbdmb2 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1610:node mbdmb2 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1610:node mbdmb2 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1610:node mbdmb2 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
    [cssd(9530)]CRS-1607:CSSD evicting node mbdmb2. Details in /u01/crs/oracle/product/10.2/app/log/mbdmb1/cssd/ocssd.log.
    Other people's logs seem to show time to recover, and only reboot when the countdown eventually runs out:
    2009-08-31 16:05:41.405
    [cssd(4968)]CRS-1612:node simsd1 (1) at 50% heartbeat fatal, eviction in 29.611 seconds
    2009-08-31 16:05:42.403
    [cssd(4968)]CRS-1612:node simsd1 (1) at 50% heartbeat fatal, eviction in 28.613 seconds
    2009-08-31 16:05:56.412
    [cssd(4968)]CRS-1611:node simsd1 (1) at 75% heartbeat fatal, eviction in 14.604 seconds
    2009-08-31 16:05:57.411
    [cssd(4968)]CRS-1611:node simsd1 (1) at 75% heartbeat fatal, eviction in 13.605 seconds
    2009-08-31 16:06:05.413
    [cssd(4968)]CRS-1610:node simsd1 (1) at 90% heartbeat fatal, eviction in 5.603 seconds
    2009-08-31 16:06:06.412
    [cssd(4968)]CRS-1610:node simsd1 (1) at 90% heartbeat fatal, eviction in 4.604 seconds
    2009-08-31 16:06:07.410
    [cssd(4968)]CRS-1610:node simsd1 (1) at 90% heartbeat fatal, eviction in 3.606 seconds
    2009-08-31 16:06:08.409
    [cssd(4968)]CRS-1610:node simsd1 (1) at 90% heartbeat fatal, eviction in 2.607 seconds
    2009-08-31 16:06:09.407
    [cssd(4968)]CRS-1610:node simsd1 (1) at 90% heartbeat fatal, eviction in 1.609 seconds
    2009-08-31 16:06:10.405
    [cssd(4968)]CRS-1610:node simsd1 (1) at 90% heartbeat fatal, eviction in 0.611 seconds
    2009-08-31 16:06:11.061
    [cssd(4968)]CRS-1609:CSSD detected a network split. Details in C:\product\11.1.0\crs\log\simsd2\cssd\ocssd.log.
    2009-08-31 16:14:37.873
    I'm led to think this is due to a setting for the heartbeat-loss window. Some threads suggest the hangcheck-timer, but that does not appear to apply to Solaris. Where, if anywhere, can I check or change this setting?

    Ah, thanks; even just looking at the right log yielded something different. I was grepping the alert log instead, which apparently doesn't show as much (and shows the time as 0). The ocssd.log shows the time-to-live.
    One more question: can you tell from this whether it is network-heartbeat or disk-heartbeat related, or something else?
    Thanks!
    [    CSSD]2010-02-24 06:46:17.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:46:17.941 [19] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
    [    CSSD]2010-02-24 06:46:22.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:46:22.941 [19] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
    [    CSSD]2010-02-24 06:46:27.685 [14] >TRACE: clssgmRegisterClient: proc(17/1009503f0), client(344/10097f7f0)
    [    CSSD]2010-02-24 06:46:27.685 [14] >TRACE: clssgmExecuteClientRequest: GRKJOIN recvd from client 344 (10097f7f0)
    [    CSSD]2010-02-24 06:46:27.685 [14] >TRACE: clssgmJoinGrock: grock DG_FLASH51 new client 10097f7f0 with con 100932430, requested num -1
    [    CSSD]2010-02-24 06:46:27.685 [14] >TRACE: clssgmAddGrockMember: adding member to grock DG_FLASH51
    [    CSSD]2010-02-24 06:46:27.685 [14] >TRACE: clssgmAddMember: member (2/100921830) added. pbsz(123) prsz(42) flags 0x0 to grock (100914210/DG_FLASH51)
    [    CSSD]2010-02-24 06:46:27.686 [14] >TRACE: clssgmQueueGrockEvent: groupName(DG_FLASH51) count(3) master(0) event(1), incarn 208505, mbrc 3, to member 0, events 0x0, state 0x0
    [    CSSD]2010-02-24 06:46:27.686 [14] >TRACE: clssgmCommonAddMember: Local member(2) node(1) flags 0x0 0x100 grock (2/100914210/DG_FLASH51)
    [    CSSD]2010-02-24 06:46:27.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:46:27.941 [19] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
    [    CSSD]2010-02-24 06:46:28.941 [18] >WARNING: clssnmPollingThread: node mbdmb2 (2) at 50% heartbeat fatal, eviction in 59.577 seconds
    [    CSSD]2010-02-24 06:46:28.941 [18] >TRACE: clssnmPollingThread: node mbdmb2 (2) is impending reconfig, flag 1, misstime 60423
    [    CSSD]2010-02-24 06:46:28.941 [18] >TRACE: clssnmPollingThread: diskTimeout set to (117000)ms impending reconfig status(1)
    [    CSSD]2010-02-24 06:46:29.941 [18] >WARNING: clssnmPollingThread: node mbdmb2 (2) at 50% heartbeat fatal, eviction in 58.577 seconds
    [    CSSD]2010-02-24 06:46:30.363 [17] >TRACE: clssgmDispatchCMXMSG(): msg type(12) src(2) dest(1) size(360) tag(00d2002a) incarnation(88)
    [    CSSD]2010-02-24 06:46:32.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:46:32.941 [19] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
    [    CSSD]2010-02-24 06:46:37.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:46:37.941 [19] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
    [    CSSD]2010-02-24 06:46:42.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:46:42.941 [19] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
    [    CSSD]2010-02-24 06:46:47.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:46:47.941 [19] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
    [    CSSD]2010-02-24 06:46:52.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:46:57.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:46:57.941 [19] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
    [    CSSD]2010-02-24 06:46:58.941 [18] >WARNING: clssnmPollingThread: node mbdmb2 (2) at 75% heartbeat fatal, eviction in 29.577 seconds
    [    CSSD]2010-02-24 06:46:59.941 [18] >WARNING: clssnmPollingThread: node mbdmb2 (2) at 75% heartbeat fatal, eviction in 28.577 seconds
    [    CSSD]2010-02-24 06:47:02.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:47:02.941 [19] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
    [    CSSD]2010-02-24 06:47:07.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:47:07.941 [19] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
    [    CSSD]2010-02-24 06:47:12.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:47:12.941 [19] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
    [    CSSD]2010-02-24 06:47:16.941 [18] >WARNING: clssnmPollingThread: node mbdmb2 (2) at 90% heartbeat fatal, eviction in 11.577 seconds
    [    CSSD]2010-02-24 06:47:17.725 [17] >TRACE: clssgmDispatchCMXMSG(): msg type(12) src(2) dest(1) size(360) tag(00d3002a) incarnation(88)
    [    CSSD]2010-02-24 06:47:17.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:47:17.941 [18] >WARNING: clssnmPollingThread: node mbdmb2 (2) at 90% heartbeat fatal, eviction in 10.577 seconds
    [    CSSD]2010-02-24 06:47:17.941 [19] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
    [    CSSD]2010-02-24 06:47:18.941 [18] >WARNING: clssnmPollingThread: node mbdmb2 (2) at 90% heartbeat fatal, eviction in 9.577 seconds
    [    CSSD]2010-02-24 06:47:19.941 [18] >WARNING: clssnmPollingThread: node mbdmb2 (2) at 90% heartbeat fatal, eviction in 8.577 seconds
    [    CSSD]2010-02-24 06:47:20.941 [18] >WARNING: clssnmPollingThread: node mbdmb2 (2) at 90% heartbeat fatal, eviction in 7.577 seconds
    [    CSSD]2010-02-24 06:47:21.941 [18] >WARNING: clssnmPollingThread: node mbdmb2 (2) at 90% heartbeat fatal, eviction in 6.577 seconds
    [    CSSD]2010-02-24 06:47:22.941 [19] >TRACE: clssnmSendingThread: sending status msg to all nodes
    [    CSSD]2010-02-24 06:47:22.941 [18] >WARNING: clssnmPollingThread: node mbdmb2 (2) at 90% heartbeat fatal, eviction in 5.577 seconds
    [    CSSD]2010-02-24 06:47:22.941 [19] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
    [    CSSD]2010-02-24 06:47:26.941 [18] >WARNING: clssnmPollingThread: node mbdmb2 (2) at 90% heartbeat fatal, eviction in 1.577 seconds
    [    CSSD]2010-02-24 06:47:26.941 [19] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
    [    CSSD]2010-02-24 06:47:27.703 [17] >TRACE: clssgmPeerEventHndlr: receive failed, node 2 (mbdmb2) (10090eb90), rc 11
    [    CSSD]2010-02-24 06:47:27.704 [17] >TRACE: clssgmPeerDeactivate: node 2 (mbdmb2), death 0, state 0x80000001 connstate 0xf
    [    CSSD]2010-02-24 06:47:27.704 [17] >TRACE: clssgmPeerListener: discarded 0 future msgs for 2
    [    CSSD]2010-02-24 06:47:27.941 [18] >WARNING: clssnmPollingThread: node mbdmb2 (2) at 90% heartbeat fatal, eviction in 0.577 seconds
    [    CSSD]2010-02-24 06:47:28.521 [18] >TRACE: clssnmPollingThread: Eviction started for node mbdmb2 (2), flags 0x0001, state 3, wt4c 0
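On the network-versus-disk question above: the excerpts show clssnmPollingThread heartbeat-percentage warnings and, in the earlier log, a CRS-1609 "network split", both of which point at the network heartbeat; a disk-heartbeat problem would instead surface as voting-disk I/O errors. A rough, self-contained classifier over sample lines (the message shapes are copied from the excerpts in this thread):

```shell
# Rough classifier: does an ocssd.log excerpt point at the network heartbeat?
# Sample lines copied from the log excerpts quoted above.
cat > /tmp/ocssd_sample.log <<'EOF'
[cssd(4968)]CRS-1612:node simsd1 (1) at 50% heartbeat fatal, eviction in 29.611 seconds
[cssd(4968)]CRS-1609:CSSD detected a network split. Details in ocssd.log.
EOF
if grep -q 'network split' /tmp/ocssd_sample.log; then
  echo "network heartbeat problem indicated"
else
  echo "no network split logged; check for voting-disk I/O messages"
fi
```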

  • Question on Rebooting RAC Nodes

    Hi, I heard that when rebooting all RAC nodes, one has to wait at least 5 minutes between node reboots: you would reboot node 1 at time 0, node 2 five minutes later, and so on.
    However, I could not find any documentation on this. Can someone please point me to the right place to look? Thanks.

    I have not heard that before. I generally use srvctl to stop/start the databases, and I do not think it waits 5 minutes between nodes, as it does not take 15 minutes to stop/start the databases. As for rebooting the hosts, the only interval between machines was the time it took to send the reboot command to each host.
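If one did want the 5-minute stagger described in the question, it is just a loop. The node names below are hypothetical, and echo stands in for the real ssh call so the sketch runs anywhere:

```shell
# Staggered rolling reboot, one node at a time with a gap between them.
# Node names are hypothetical; replace echo with the real ssh invocation.
for node in rac1 rac2 rac3; do
  echo "ssh root@$node 'crsctl stop crs && reboot'"
  echo "sleep 300  # 5-minute gap before the next node"
done
```

Stopping the clusterware stack cleanly before the reboot is the main point; the gap simply lets each node rejoin before the next one leaves.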

  • 11g R1 Node Evictions on Linux

    We are getting random node evictions on Linux. OSWatcher shows interconnect latencies typically below 0.500 ms, but sometimes around 2.0 ms. We have a VLAN interconnect switch. If latencies above 0.500 ms are seen, how long must they persist to cause an eviction? For example, a 30-second period of interconnect latencies above 0.500 ms?
    Thanks in advance!

    You can check the values of disktimeout and misscount in your CRS: crsctl get css disktimeout for the first, and look at the ocrdump for the second.
    Have you already checked similar questions here? This looks like something similar to your problem, with a good discussion: RAC nodes rebooting
    Regards.
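For reference, on 11g both values can, to the best of my knowledge, be read directly with crsctl; the defaults noted in the comments (roughly 30 s misscount, 200 s disktimeout on Linux) are an assumption to verify on your release. The commands are echoed here so the sketch runs without a cluster:

```shell
# Read the CSS timeouts that govern eviction (run on a cluster node).
echo "crsctl get css misscount     # network heartbeat timeout (default ~30 s, assumption)"
echo "crsctl get css disktimeout   # voting-disk I/O timeout (default ~200 s, assumption)"
```

In other words, the duration question in the post is governed by misscount: missed network heartbeats must persist for the whole misscount window before eviction, not for a single high-latency sample.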

  • Exception while failing over to 2nd RAC Node

    We are using WebLogic 10.3.4. Our setup has a web application (a Tapestry front-end web UI) and an EJB 2.1 back end talking to the Oracle database. The EJBs are CMP. Our product was always standalone, and it wasn't until this release that we needed to make it work with RAC. To get this to work, we followed the model of a multi data source with data sources pointing to our RAC nodes. We use two types of data sources, persistent and non-persistent, and we are using the Oracle thin driver (non-XA) for RAC service instances, supporting global transactions.
    When we fail over to the 2nd node, we get a nasty exception in our GUI, but after logging out and logging back in we are fine.
    My question is: I assumed I shouldn't have to restart our web application and that it should have stayed up? Or is there something wrong with our setup?
    Thanks,
    Ian

    Showing us the exception and/or the error messages at the server might help...
    Note that failing over does not save any ongoing connection or transaction that had been made to the dead RAC node. Does your web app get-use-close JDBC connections on a per-user-invoke basis, or does it hold onto connections?
    Joe

  • Running Oracle database 10g and 11g on same 5 RAC nodes

    Hello Gurus,
    Could anybody throw light on whether I can install and successfully run Oracle Database 10g and 11g on the same Oracle RAC installation? My setup is below:
    Number of nodes-5
    OS- windows 2003 or RHEL5
    storage- DELL EMC SAN
    Clusterware- oracle version11g
    File system-Automatic storage management(ASM)
    After I successfully set up the clusterware and ASM on the nodes, I would want to install an 11g database on all 5 nodes, then install a 10g database on only 3 of the nodes using the same clusterware.
    What are your views on this?
    Also, FYI: as per Metalink note 220970.1 (RAC: Frequently Asked Questions), one can do such a setup.
    What I am looking for is practical experience: has anyone implemented this in a production system, and if so, what issues were faced and how tough is it to support?
    Thanks,
    Imtiyaz

    You could run an 11g database and 10g database on the same cluster as long as you use Clusterware 11g.
    The administration aspect will drastically change according to the platform you run on. As of now, it appears you don't know whether it will be Linux or Windows.
    It would be practical to support the same database release.

  • Huge number of idle connections from loopback ip on oracle RAC node

    Hi,
    We have a 2-node 11gR2 (11.2.0.3) Oracle RAC. We are seeing a huge number of idle connections (more than 5000 on each node) on both nodes, increasing day by day. All the idle connections are from the VIP and the loopback address (127.0.0.1.47971):
    netstat -an |grep -i idle|more
    127.0.0.1.47971 Idle
    any insight will be helpful.
    The server is suffering memory issues occasionally (about once a month):
    ORA-27300: OS system dependent operation:fork failed with status: 11
    ORA-27301: OS failure message: Resource temporarily unavailable
    Thanks
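A quick way to quantify the leak described above is to count the idle loopback sockets from netstat. The sample data below mimics the Solaris-style netstat -an output quoted in the post, so the pipeline is runnable as-is:

```shell
# Count idle connections from the loopback address.
# Sample lines mimic the Solaris-style "netstat -an" output quoted in the post.
cat > /tmp/netstat_sample.txt <<'EOF'
127.0.0.1.47971              Idle
127.0.0.1.47972              Idle
192.168.1.10.1521  192.168.1.20.40001  ESTABLISHED
EOF
grep 'Idle' /tmp/netstat_sample.txt | grep -c '^127\.0\.0\.1'
```

Tracking this count over time (e.g. from cron) shows whether the idle sockets grow without bound, which would line up with the ORA-27300 fork failures when process/memory limits are hit.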

    user12959884 wrote:
    (quoted question above)
    We cannot control what occurs on your DB Server.
    How do I ask a question on the forums?
    SQL and PL/SQL FAQ
    Post the results of the following SQL:
    SELECT * FROM V$VERSION;

  • RAC node reboot

    Hi,
    May I ask how to prevent split-brain from happening in a healthy two-node RAC? I understand Oracle decides to restart one node based on network-messaging health; on the other hand, I think from 10g to 11g there are bugs about evicting a node due to IPC timeout.
    Thanks

    You should check the following to identify the exact issue:
    Refer to the database log and associated trace files, and the ASM log and its associated trace files, then drill down further into the ocssd, crsd and evmd log files.
    From the trace files you will get the reason for the node eviction, normally one of the following:
    Reason 0 = No reconfiguration
    Reason 1 = The Node Monitor generated the reconfiguration.
    Reason 2 = An instance death was detected.
    Reason 3 = Communications Failure
    Reason 4 = Reconfiguration after suspend
    Once you know the reason, look for the cause and fix it. For troubleshooting and data gathering, refer to the Metalink notes.
    Thanks.

  • RAC node Hung

    Hi Friends,
    Server info:
    Windows 2003 server
    Oracle 10.2.0.5, 2 Node RAC
    We are having a problem with the node 2 server hanging due to a blue-screen dump error, but in Oracle we are not getting any errors in the CRS or alert logs. After restarting the server, the problem was solved. How can we identify the reason for the server hang? We are not getting any errors on the operating-system side either. Is there any way to identify the cause of a server hang after the server has been restarted?
    Thanks in advance.

    user12159566 wrote:
    Hi,
    Thanks for your reply.
    The OS side also has no logs generated except "*Blue Screen Trap (BugCheck, STOP: 0x0000FFFF (0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000))*". As per my knowledge, this is not a node-eviction problem; we are not able to find any node-eviction log in the Oracle logs.
    See this note:
    *RAC on Windows: Oracle Clusterware Node Evictions a.k.a. Why do we get a Blue Screen (BSOD) Caused By Orafencedrv.sys? [ID 337784.1]*

  • Rac node failed how do you bring it back up?

    Example: If there are 3 RAC nodes and one becomes unavailable/fails, how do you bring it back up?

    There are typically two basic reasons why a RAC node will go down:
    A cluster issue, causing the node to be evicted. This usually means the node lost access to the cluster storage (e.g. no access to the voting disks) or lost access to the interconnect.
    An O/S issue causing the node to fail, e.g. a kernel panic due to a page swap and memory not syncing, a soft CPU lockup, etc.
    You need to determine why it went down, and what is needed to enable it to successfully join the cluster again.
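Once the root cause is fixed, rejoining the node is normally just restarting the clusterware stack; these are standard 10g/11g commands, echoed here so the sketch runs without a cluster. Run the real commands as root on the failed node:

```shell
# Bring a failed/evicted node back into the cluster (run as root on that node).
echo "crsctl check crs    # is the stack already running?"
echo "crsctl start crs    # start clusterware; the node rejoins automatically"
echo "crsctl stat res -t  # 11.2 syntax: confirm resources come back online"
```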

  • Error during RAC Node Registration: WB_RT_SERVICE_MANAGEMENT missed

    Hi all,
    Can you help me with an OWB installation on RAC?
    I start on Unix-Server (Node1) with:
    Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
    PL/SQL Release 11.1.0.7.0 - Production
    CORE     11.1.0.7.0     Production
    TNS for Linux: Version 11.1.0.7.0 - Production
    NLSRTL Version 11.1.0.7.0 - Production
    owb/bin/unix/reposinst.sh.
    Now in Repository Assistant -> <IP of HOST1>:1521:<Service Name of RAC> -> Register a RAC Instance -> Finish.
    And now during registration an error appears:
    Error occurred during RAC Node Registration. Exception =java.lang.Exception: java.sql.SQLException: ORA-06550: line 7, column 14:
    PLS-00201: identifier 'WB_RT_SERVICE_MANAGEMENT.ADD_NODE' must be declared;
    I then selected all matching objects in the database:
    SQL> select * from dba_objects where lower(object_name) like '%wb_rt%';
    no rows selected
    All components are installed:
    COMP_ID COMP_NAME VERSION
    OWB OWB 11.1.0.7.0
    APEX Oracle Application Express 3.0.1.00.1
    EM Oracle Enterprise Manager 11.1.0.7.0
    WK Oracle Ultra Search 11.1.0.7.0
    AMD OLAP Catalog 11.1.0.7.0
    SDO Spatial 11.1.0.7.0
    ORDIM Oracle Multimedia 11.1.0.7.0
    XDB Oracle XML Database 11.1.0.7.0
    CONTEXT Oracle Text 11.1.0.7.0
    EXF Oracle Expression Filter 11.1.0.7.0
    RUL Oracle Rules Manager 11.1.0.7.0
    OWM Oracle Workspace Manager 11.1.0.7.0
    CATALOG Oracle Database Catalog Views 11.1.0.7.0
    CATPROC Oracle Database Packages and Types 11.1.0.7.0
    JAVAVM JServer JAVA Virtual Machine 11.1.0.7.0
    XML Oracle XDK 11.1.0.7.0
    CATJAVA Oracle Database Java Packages 11.1.0.7.0
    APS OLAP Analytic Workspace 11.1.0.7.0
    XOQ Oracle OLAP API 11.1.0.7.0
    RAC Oracle Real Application Clusters 11.1.0.7.0
    What must I do, and what can I do?
    Edited by: AndreyT on 26.04.2010 05:23

    No. Must I install the repository with the server's Repository Assistant (on Unix, node 1) or with the client's Repository Assistant (on Windows)? How do I configure the Control Center?
    And one more question: I have Oracle 11.1.0.7.0 on Unix with OWB 11.1.0.7.0. There are two versions of the OWB standalone software on the Oracle site to download: OWB 11.2.0.1.0 and 11.1.0.6.0. Which one must I download to install on the client computer? Does 11.2.0.1.0 on the client work with 11.1.0.7.0 on the server?
    Edited by: AndreyT on 27.04.2010 04:12
    Edited by: AndreyT on 27.04.2010 12:46

  • Is it possible to move some of the capture processes to another rac node?

    Hi All,
    Is it possible to move some of the ODI (Oracle Data Integrator) capture processes running on node1 to node2? Once moved, will they work as usual? If it is possible, please provide the steps.
    Appreciate your response
    Best Regards
    SK.

    Hi Cezar,
    Thanks for your post. I have a related question: is it really necessary to have multiple capture and multiple apply processes, one for each schema in ODI? When set to automatic configuration, ODI seems to create a capture and a related apply process for each schema, which I guess leads to the specific performance problem (high CPU, etc.) I mentioned in my other post: Re: Is it possible to move some of the capture processes to another rac node?
    Is there a way to use just one capture and one apply process for all of the schemas in ODI?
    Thanks a million.
    Edited by: oyigit on Nov 6, 2009 5:31 AM

  • Node Eviction and SGA

    Hello All,
    We have a 6-node RAC on 10g Release 2 / Windows 2003 64-bit. It was working well in all respects.
    About 3 weeks back (3 days before I was to go on vacation), the SA needed to add more power modules, so the entire system (including the SAN) was powered down and then brought back up. The DB machines by themselves have undergone complete reboots before without any issues; this time it was the entire IT system.
    Two days after that, all of a sudden, we started witnessing node-eviction issues. Every day one node would get evicted, but the machine would not go down. The typical messages seen were (below is the message from ocssd.log on node 2):
    [    CSSD]2008-07-27 16:04:14.605 [5540] >WARNING: clssnmPollingThread: node serv-db01 (1) at 50% heartbeat fatal, eviction in 29.125 seconds
    [    CSSD]2008-07-27 16:04:29.605 [5540] >WARNING: clssnmPollingThread: node serv-db01 (1) at 75% heartbeat fatal, eviction in 14.125 seconds
    [    CSSD]2008-07-27 16:04:38.606 [5540] >WARNING: clssnmPollingThread: node serv-db01 (1) at 90% heartbeat fatal, eviction in 5.125 seconds
    [    CSSD]2008-07-27 16:04:39.606 [5540] >WARNING: clssnmPollingThread: node serv-db01 (1) at 90% heartbeat fatal, eviction in 4.125 seconds
    [    CSSD]2008-07-27 16:04:40.606 [5540] >TRACE: clssnmPollingThread: node serv-db01 (1) is impending reconfig
    [    CSSD]2008-07-27 16:04:40.606 [5540] >WARNING: clssnmPollingThread: node serv-db01 (1) at 90% heartbeat fatal, eviction in 3.125 seconds
    [    CSSD]2008-07-27 16:04:40.606 [5540] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
    [    CSSD]2008-07-27 16:04:41.606 [5540] >TRACE: clssnmPollingThread: node serv-db01 (1) is impending reconfig
    [    CSSD]2008-07-27 16:04:41.606 [5540] >WARNING: clssnmPollingThread: node serv-db01 (1) at 90% heartbeat fatal, eviction in 2.125 seconds
    [    CSSD]2008-07-27 16:04:42.606 [5540] >TRACE: clssnmPollingThread: node serv-db01 (1) is impending reconfig
    [    CSSD]2008-07-27 16:04:42.606 [5540] >WARNING: clssnmPollingThread: node serv-db01 (1) at 90% heartbeat fatal, eviction in 1.125 seconds
    [    CSSD]2008-07-27 16:04:43.606 [5540] >TRACE: clssnmPollingThread: node serv-db01 (1) is impending reconfig
    [    CSSD]2008-07-27 16:04:43.606 [5540] >WARNING: clssnmPollingThread: node serv-db01 (1) at 90% heartbeat fatal, eviction in 0.125 seconds
    [    CSSD]2008-07-27 16:04:43.731 [5540] >TRACE: clssnmPollingThread: node serv-db01 (1) is impending reconfig
    [    CSSD]2008-07-27 16:04:43.731 [5540] >TRACE: clssnmPollingThread: Eviction started for node serv-db01 (1), flags 0x000f, state 3, wt4c 0
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmDoSyncUpdate: Initiating sync 8
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmSetupAckWait: Ack message type (11)
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmSetupAckWait: node(1) is ALIVE
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmSetupAckWait: node(3) is ALIVE
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmSetupAckWait: node(4) is ALIVE
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmSetupAckWait: node(5) is ALIVE
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmSendSync: syncSeqNo(8)
    [    CSSD]2008-07-27 16:04:43.731 [5648] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[serv-db02] seq[1] sync[8]
    [    CSSD]2008-07-27 16:04:43.731 [5648] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
    [    CSSD]2008-07-27 16:04:43.731 [4340] >USER: NMEVENT_SUSPEND [00][00][00][3e]
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(4)
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmWaitForAcks: node(1) is expiring, msg type(11)
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmWaitForAcks: done, msg type(11)
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmDoSyncUpdate: Terminating node 1, serv-db01, misstime(60000) state(3)
    [    CSSD]2008-07-27 16:04:43.731 [5640] >TRACE: clssnmSetupAckWait: Ack message type (13)
    No information was written to the alert logs on any of the nodes.
    We contacted Oracle Support and they said it was a network issue, etc., but my SA was adamant that it was an Oracle problem. Anyway, I went on my vacation. There was a suggestion (the SA had an Oracle contact) that the SGA needed to be increased; it was at 800 MB per node. My junior DBA was pushed to raise it to 2 GB on each node based on the SA's suggestion. Then, all of a sudden, from the next day the node evictions stopped.
    I still cannot believe that increasing the SGA has anything to do with node eviction. I told my upper management that node eviction has nothing to do with the SGA, but the consensus in my IT department is that the SGA increase solved the issue. Does anybody think there is any connection between an increase in SGA and node eviction? I have read the node-eviction papers on Metalink and they do not mention the SGA at all.
    I would really appreciate any help in this regard.
    Thank You,
    Sat

    "WARNING: clssnmPollingThread: node serv..." As you know, that warning message appears because of network delay, which causes the node eviction. If a node doesn't send a network heartbeat for <misscount> seconds, the node will be evicted from the cluster. Have you checked with your network team for any glitches around the node-eviction time?
    "I have read the node eviction papers in metalink and they do not mention about SGA at all." One of the other prime reasons for node eviction is lack of resources on the server. I am not sure that increasing the SGA is what resolved the node-eviction issue. Why don't you produce a test case and submit it to Oracle Support for more clarification?
    Jaffar

  • What is best use of 1400 gb SGA (2 rac nodes 768gb each)

    We are currently using 11.2.0.3.0 on Sun servers with 2 RAC nodes, each with 8 UltraSPARC-T1 CPUs (released in 2005), four threads each, so Oracle sees 32 CPUs; they are very slow (1.2 GHz). The database is 4 TB in size on a regular SAN (10k-rpm disks).
    8 GB SGA.
    The new boss wants to update the system to the max to get the best performance possible. Money is a concern, of course, but the budget is pretty high. Our use case is 12-16 users at the same time, running reports, some small, others very large (returning a single row or tens of thousands of rows). Reports take 5 seconds to 5 minutes. Our job is to get the fastest system possible. We have a total of 8 licenses available, so we can have 16 cores. We are also getting a 6 TB all-flash SSD array for the database. We can get any CPU we want, but we can't use parallel query servers due to all kinds of issues we have experienced (too many slaves, RAC interconnect saturation, etc.: whack-a-mole). SPARC has too many threads, and without parallel query Oracle runs each query in a single thread.
    We have specced out the following system for each RAC node:
    HP ProLiant DL380p Gen8 8 SFF server
    2 Intel Xeon E5-2637v2 3.5 GHz 4-core CPUs
    768 GB RAM
    2 HP 300 GB 6G SAS 15K drives for the database software
    This will give us a total of 4 Xeon E5-2637v2 CPUs, 16 cores total (0.5 licensing factor for 8 licenses), and 1536 GB of RAM (leaving ~1400 GB for SGA). This will guarantee an available core for each user. We intend to create a very, very large keep pool, around 300 GB on each node, that will hold all our dimension tables. This, we hope, will reduce reads from the SSD to just data from the fact tables.
    Are we doing massive overkill here? The budget for this was way less than what our boss expected. Will that big an SGA be wasted? Would, say, 256 GB be fine, or will Oracle take advantage of it and keep most blocks in memory?
    Will an SGA that big cause Oracle problems due to the overhead of handling that much RAM?
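The keep-pool idea above maps onto two standard settings: size the pool with the db_keep_cache_size parameter and assign each dimension table to it via its BUFFER_POOL storage attribute. A sketch; the table name is hypothetical and the 300 GB figure is the poster's, not a recommendation. The statements are echoed so the example runs without a database:

```shell
# Pin dimension tables in a large KEEP pool (sizes are the poster's figures,
# not recommendations; dim_customer is a hypothetical table name).
echo "ALTER SYSTEM SET db_keep_cache_size = 300G SCOPE=BOTH SID='*';"
echo "ALTER TABLE dim_customer STORAGE (BUFFER_POOL KEEP);"
```

Note that blocks only enter the KEEP pool as they are read, so the pool's effectiveness should be checked afterwards (e.g. via buffer-pool statistics) rather than assumed.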

    Current System:
    ===========
    a. Version : 11.2.0.3
    b. Unix Sun
    c. CPU - 8 cpus with 4 threads => 32 logical cpus or cores
    d. database 4TB
    e. SAN - 10k speed disk drives
    f. 8gb SGA
    g. 1.2 gb ??
    h. Users --> 12-16 concurrent and run reports varying size
    i. reports elasped time 5 sec to 5 mins
    j. cpu license -->8
    Target System
    ===========
    a. Version: 11.2.0.3
    b. HP ProLiant DL380p Gen8 8 SFF server
    c. RAM --> 768 GB
    d. 2 HP 300GB 6G SAS 15K drives for database software
    e. large keep pool -->90 gb to  hold all dimension tables. 
    f.  SSD to just data from fact tables
    g. SGA -->256gb
    A reassessment of the performance issues of the current system appears to be required. A good performance-tuning expert should look into the tuning issues of the current application by analyzing AWR performance metrics. If an 8 GB SGA is not enough, the likely reason is that queries in the system lack good access paths that select less data, so recently used buffers from the different tables involved keep getting flushed out. Until those issues are identified, the performance problems will follow you wherever you go: as table sizes increase in the future, the problem will reappear. If the queries mostly run with full scans, then re-platforming to Exadata might be the right decision, as Exadata's smart scan and cell offloading work faster and might be the right direction for the best performance and the best investment for the future. Compression (COMPRESS FOR OLTP) could be another feature to exploit, improving efficiency further by reading fewer blocks in less read time.
    Investment in infrastructure will solve a few issues in the short term, but the long-term issues will arise again.
    Investment in identifying the performance issues of the current system would be the best investment in the current scenario.
