ORACLE RAC failure scenarios

Hello,
We have heard all the good points about RAC, and many of them are true, but I would like to hear about real experiences where even a well-configured RAC failed and caused unplanned downtime.
Can anyone also describe the failure scenarios? I understand the very basic ones (for example, the interconnect fails or the SAN fails), but please share some real-life experiences where even Oracle Customer Service took not just hours but days to resolve the problem and simply termed it a bug.
Thanks,
S.Mann

I agree with Andreas and I think it's important to point out that the issues he mentioned (networking issues as well as other communication problems) are typically more common when RAC is deployed on a platform that isn't completely familiar to the implementor. That is, if you run Oracle on Windows servers, then deploying RAC on Linux successfully will probably be difficult.
My standard answer for "what's the best platform for RAC?" is to run RAC on the platform that you know the most about. When you're building a system to house your most critical applications, wouldn't you want to build it on the platform that you know the most about?

Similar Messages

Oracle RAC implementation scenario

Hello DB Experts,
I have a very specific business scenario to implement.
Basically, we need to implement a 2-node RAC infrastructure.
However, on each node we have 2 Oracle instances running.
Is it possible to have one instance running in active/active mode and the other instance in active/passive mode?
First of all, I want to know whether it is possible to implement RAC in this manner.
If yes, then for the active/passive instance, if the active instance goes down, how do I switch the traffic to the passive node? In other words, how do I make the passive node active?
I would really appreciate a quick response.
Thanks,
Darshan

Let's assume some better names for this discussion:
RAC Database MYDB (instance on node1 = MYDB1, instance on node2=MYDB2)
SI Database URDB (instance on node1 = URDB with active/passive on node2)
Can this be done? Yes. But why? You can use SERVICES to limit on which node your application is accessing.

Oracle RAC with ASM install failure

Hi guys,
I've just rebuilt an Oracle RAC system. I've built this system many times and have the build documented. I usually follow the build doc to the letter and everything works fine. I have now been asked to rebuild the system to go into the production environment. This is a standard two-server RAC setup using ASM, with HP MSA500 shared storage. I've run the cluster verification tool and all the checks came out fine. While installing the Clusterware, I get a failure message when it tries to run the Oracle Clusterware Configuration Assistant.
The error message i get is:
Command = C:\Windows\system 32\cmd\C Call E:\Oracle\Product\10.2.0\crs\install\crssetup.config.bat
PROT-1:Failed to intialize OCR Config
STEP 1 Checking status of CRS cluster
STEP 2 Creating directories (E:\Oracle\Product\10.2.0\Crs
STEP 3 Configuring OCR repository
ocr upgrade failed with (-1)
I've done this build many times and never seen this error. From what I've read on the internet, it looks like it could be something wrong with my shared storage, but I don't know what.
Any help would be greatly appreciated
Thanks
Lee

Try running the 'dd' command against the OCR and voting disks to make sure their headers are completely wiped and contain no data from an old installation:
dd if=/dev/zero of=YOUR_OCR_DISK bs=1024 count=10000
dd if=/dev/zero of=YOUR_VOTE_DISK bs=1024 count=10000

Gig Ethernet V/S SCI as Cluster Private Interconnect for Oracle RAC

Hello Gurus
Can anyone please confirm whether it is possible to configure 2 or more Gigabit Ethernet interconnects (Sun Cluster 3.1 private interconnects) on an E6900 cluster?
This is for a high-availability requirement for Oracle 9i RAC. I need to know:
1) Can I use Gigabit Ethernet as the private cluster interconnect for deploying Oracle RAC on the E6900?
2) What is the recommended private cluster interconnect for Oracle RAC: Gigabit Ethernet, or SCI with RSM?
3) What about a scenario with, say, 3 x Gigabit Ethernet versus 2 x SCI as the cluster's private interconnects?
4) How is interconnect traffic distributed among the multiple Gigabit Ethernet interconnects (for Oracle RAC), and does anything need to be done at the Oracle RAC level so that Oracle recognises the multiple interconnect cards and uses all of the Gigabit Ethernet interfaces for transferring packets?
5) What would happen to Oracle RAC if one of the Gigabit Ethernet private interconnects fails?
I have tried searching for this information but could not find any document that precisely clarifies these doubts.
thanks for the patience
Regards,
Nilesh

Answers inline...
Tim
> Can any one pls confirm if it's possible to configure 2 or more Gigabit Ethernet interconnects ( Sun Cluster 3.1 Private Interconnects) on a E6900 cluster ?
Yes, absolutely. You can configure up to 6 NICs for the private networks. Traffic is automatically striped across them if you specify clprivnet0 to Oracle RAC (9i or 10g). That is TCP connections and UDP messages.
> It's for a High Availability requirement of Oracle 9i RAC. i need to know ,
> 1) can i use gigabit ethernet as Private cluster interconnect for Deploying Oracle RAC on E6900 ?
Yes, definitely.
> 2) What is the recommended Private Cluster Interconnect for Oracle RAC ? GiG ethernet or SCI with RSM ?
SCI is or is in the process of being EOL'ed. Gigabit is usually sufficient. Longer term you may want to consider Infiniband or 10 Gigabit ethernet with RDS.
> 3) How about the scenarios where one can have say 3 X Gig Ethernet V/S 2 X SCI , as their cluster's Private Interconnects ?
I would still go for 3 x GbE because it is usually cheaper and will probably work just as well. The latency and bandwidth differences are often masked by the performance of the software higher up the stack. In short, unless you tuned the heck out of your application and just about everything else, don't worry too much about the difference between GbE and SCI.
> 4) How the Interconnect traffic gets distributed amongest the multiple GigaBit ethernet Interconnects ( For oracle RAC) , & is anything required to be done at oracle Rac Level to enable Oracle to recognise that there are multiple interconnect cards it needs to start utilizing all of the GigaBit ethernet Interfaces for transfering packets ?
You don't need to do anything at the Oracle level. That's the beauty of using Oracle RAC with Sun Cluster as opposed to RAC on its own. The striping takes place automatically and transparently behind the scenes.
> 5) what would happen to Oracle RAC if one of the Gigabit ethernet private interconnects fails
It's completely transparent. Oracle will never see the failure.
> Have tried searching for this info but could not locate any doc that can precisely clarify these doubts that i have .........
This is all covered in a paper that I have just completed and should be published after Christmas. Unfortunately, I cannot give out the paper yet.
> thanks for the patience
> Regards,
> Nilesh

Extended RAC failure question

Hi gurus.
I have a question about this kind of environment.
In an Extended RAC scenario, with host-based mirroring (active/active storage with ASM) and a third site to maintain the voting disk, what will happen if communication between the two sites fails (DWDM failure)?
Could this failure be a problem? Will Oracle RAC survive?
Thank you!

Hi,
> In a Extended RAC scenario, with host-based mirroring (active/active storage with ASM) and a third site to mantain the voting disk, What will happen in a communication failure between the two sites (DWDM failure)?
> This failure could be a problem? The Oracle RAC will survive?
You will not get a reliable answer to these questions here; you need to get the answers by testing your own environment.
Extended clusters need additional destructive testing, covering:
* Site failure
* Communication failure
That said, if the DWDM link fails, a split-brain scenario should occur. Search for "Oracle RAC Cluster Fencing" (for your specific release) to understand the symptoms caused by a split-brain scenario.
Regards,
Levi Pereira
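
As a rough illustration of what fencing does in the scenario discussed above, the sketch below models a cluster whose inter-site link has failed. It is a simplified model, not Oracle's actual algorithm: the function name `pick_survivor` and the rules it encodes (a node needs a majority of voting disks; the larger sub-cluster wins; on a tie, the sub-cluster holding the lowest node number wins) are assumptions to check against the documentation for your release. The example also assumes that each site still sees its local voting disk plus the one at the third site.

```python
from typing import Dict, List, Set


def pick_survivor(subclusters: List[Set[int]],
                  visible_votes: Dict[int, int],
                  total_votes: int) -> Set[int]:
    """Return the node numbers expected to stay up after an interconnect split.

    subclusters   -- groups of node numbers that can still talk to each other
    visible_votes -- number of voting disks each node can still access
    total_votes   -- number of voting disks configured in the cluster
    """
    majority = total_votes // 2 + 1
    # Assumed rule 1: a node without a majority of voting disks cannot survive.
    candidates = [
        {n for n in group if visible_votes.get(n, 0) >= majority}
        for group in subclusters
    ]
    candidates = [group for group in candidates if group]
    if not candidates:
        return set()
    # Assumed rule 2: largest sub-cluster wins; ties go to the lowest node number.
    return max(candidates, key=lambda group: (len(group), -min(group)))


# DWDM link down, third-site voting disk still reachable from both sites:
# each node sees 2 of the 3 voting disks, so only one side is kept.
print(pick_survivor([{1}, {2}], {1: 2, 2: 2}, total_votes=3))
```

In this model exactly one side of the split stays up, which is the point of fencing: the cluster loses a node rather than letting both halves write to the shared storage independently.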

Recommendations - Oracle RAC 10g on Solaris 10 Containers Logical/Local..

Dear Oracle Experts et al.,
I have a couple of questions about an Oracle 10g RAC implementation on Solaris and seek your advice. We are attempting to implement Oracle 10g RAC on Solaris OS and the SPARC platform.
1 We are wondering if Oracle 10g RAC could be implemented on Solaris Local/Logical Containers? I was assuming that Oracle always links itself with OS binaries and libraries during software installation and hence needs an OS image/root disk to install onto. However, with containers, I assume we have a single Solaris installation and configuration which is shared with the containers configured within it. In such situations, how does the Oracle installation proceed? Do I need to look at a scenario where the global container/zone has the Oracle install and this image is shared across the zones/containers accordingly? If so, which OS filesystems need to be shared across these zones/containers?
Additionally, even if this approach is supported, is it a recommended approach? I am unsure about the stability and functionality of Oracle in such cases and am not able to completely conceptualize it. However, I assume there could be certain items which need to be appropriately taken care of. It would help if you could share observations from your experience.
2 The idea of RAC we are looking at is to have multiple Oracle installations on top of a native clustering solution, say Veritas Cluster/Sun Cluster. Do we still need Oracle's cluster solution, Clusterware (ORACRS), on top of this to achieve Oracle clustering? Will I be able to install Oracle as a standalone installation on top of a native clustering solution such as Veritas Cluster/Sun Cluster?
Our requirement is to have the above-mentioned multiple Oracle installations spread across two (2) separate hardware platforms, say Node A and Node B, and to configure our cluster solution to behave as active-passive across Node A and Node B. In other words, I will configure a clustering solution like VRTS/SunCluster in active-passive mode, then have 3 Oracle installations on Node A and another 3 on Node B. I will configure one database for each of these Oracle software installations (with the idea of not having Clusterware between the VRTS/SunCluster clustering solution and the Oracle installation, if that works). I will thus run 3 databases on each of these nodes. If any downtime happens on one of the nodes, say Node A, I will fail all Oracle databases and software over to the other available node, Node B in this case, using the native clustering solution, and I will want the databases to behave as they did earlier on Node A. I am not sure, though, whether I will be able to bring the databases up on Node B when the resources are failed over from the OS perspective.
We want to use Oracle 10g RAC Release 2 EE on Solaris 10, either the latest release or the one before it.
Please share your thoughts.
Regards!
Sarat

Sarat Chandra C wrote:
> Dear Oracle Experts et all
> I have a couple of questions for Oracle 10g RAC implementation on Solaris and seek your advice. we are attempting to implement oracle 10g RAC on Solaris OS and SPARC Platform.
> 1 We are wondering if Oracle 10g RAC could be implemented on Solaris Local/Logical Containers?
My understanding is that RAC in a Zone (Container) is not supported by Oracle, and will not work anyway. Regardless of installation, RAC needs to do cluster-level work: managing the cluster configuration, changing network addresses dynamically, and sending guaranteed messages over the cluster interconnect. None of this can be done in a Local Zone in Solaris, because Local Zones have fewer permissions than the Global Zone. This is part of the design of Solaris Zones, and has nothing to do with how Oracle RAC itself works on them.
This is all down to the security model of Zones, and Local Zones lack the ability to do certain things, to stop them reconfiguring themselves and impacting other Zones. Hence RAC cannot do dynamic cluster reconfiguration in a Local Zone, such as changing virtual network addresses when a node fails.
My understanding is that RAC just cannot work in a Local Zone. This was certainly true 5 years ago (mid 2005), and was a result of the inherent design and implementation of Zones in Solaris. Things may have changed, so check the Solaris documentation, and check if Oracle RAC is supported in Local Zones. However, as I said, this limitation was inherent in the design of Zones, so I do not see how Sun could possibly have changed it so that RAC would work in a Local Zone.
To me, your only option is the Global Zone. Which pretty much destroys the argument for having Zones on a Solaris system, unless you can host other non-Oracle application on the other Zones.
> 2 The idea of RAC we are looking at is to have multiple Oracle Installations on top of native clustering solution say veritas clusters/Sun Clusters. Do we still need to have Oracle Cluster solution Clusterware (ORACRS) on top of this to achieve Oracle Clustering? Will I be able to install Oracle as a standalone installation on top of native clustering solution say veritas clusters/Sun Clusters?
I am not sure the term 'native' is correct. All 'Cluster' software is low level, and has components that run within the operating system, whether this is Sun Cluster, Veritas Cluster Server, or Oracle Clusterware. They are all as 'native' to Solaris as each other. They all perform the same function for Oracle RAC around Cluster management - which nodes are members of the cluster, heartbeats between nodes, reliable fast message delivery, etc.
You only need one piece of Cluster software. So pick one and use it. If you use the Sun or Veritas cluster products, then you do not need the Oracle Clusterware software. But I would use it, because it is free (included with RAC), is from Oracle themselves and so guaranteed to work, is fully supported, and is one less third party product to deal with. Having an all Oracle software stack makes things simpler and more reliable, as far as I am concerned. You can be sure that Oracle will have fully tested RAC on their own Clusterware, and be able to replicate any issues in their own support environments.
Officially the Sun and Veritas products will work and are supported. But when you get a problem with your Cluster environment, who are you going to call? You really want to avoid "finger pointing" when you have a problem, with each vendor blaming the cause of the problem on another vendor. Using an all Oracle stack is simpler, and ensures Oracle will "own" all your support problems.
Also future upgrades between versions will be simpler, as Oracle will release all their software together, and have tested it together. When using third party Cluster software, you have to wait for all vendors to release new versions of their own software, and then wait again while it is tested against all the different third party software that runs on it. I have heard of customers stuck on old versions of certain cluster products, who cannot upgrade because there are no compatible combinations in the support matrices between the cluster product and Oracle database versions.
> I will configure Clustering Solution like VRTS/SunCluster in Active-Passive, then have 3 Oracle installations on Node A, another 3 on Node B.
As I said before, these 3 Oracle installations will actually all be on the same Global Zone, because RAC will not go into Local Zones.
John

Oracle RAC 11G - Service configuration

Hi,
I have been reading a lot of documentation regarding oracle services and I have an ok understanding of how they work. However, I have a general question regarding configuring services using Oracle RAC. For instance, if I have a 2 node oracle 11GR2 RAC on a Linux Redhat server. I have an application that connects to a service I have created. I create the service as follows.
srvctl add service -d ORCL_RAC -s APP_SERVICE -r ORCL_RAC1,ORCL_RAC2
The tnsnames contains:
APP_OLTP =
  (DESCRIPTION =
    (LOAD_BALANCE = ON)
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = server01)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = server02)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = APP_SERVICE)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 20)
        (DELAY = 1)
      )
    )
  )
My questions are as follows:
1) When I do a 'srvctl status service -d ORCL_RAC', should I see the service running on both nodes of the RAC? Or does it run only one node, then it will fail over to the other when needed?
2) If I have a RAC environment where I see two services created (RAC_SRV1 and RAC_SRV2). I see that RAC_SRV1 is only running on node1 and RAC_SRV2 is only running on node2. There are two applications sharing the same database, one application is using RAC_SRV1 and the other application is using RAC_SRV2. Am I correct in thinking that there is no failover available here? If node1 goes down, the application connecting to RAC_SRV1 will not be able to connect to node2 right?
3) In the case of the scenario in question 2 above, would it be best practise to simply create one service and have both applications connecting to the one service? Could I configure the one service to point connections from one application to node1 and connections from the other application to node2?

> 1) When I do a 'srvctl status service -d ORCL_RAC', should I see the service running on both nodes of the RAC? Or does it run only one node, then it will fail over to the other when needed?
You should see it running on both nodes.
Use the -a option in srvctl (a list of available instances to which the service fails over when the database is administrator managed).
http://docs.oracle.com/cd/E11882_01/rac.112/e16795/srvctladmin.htm#i1008562
> 2) If I have a RAC environment where I see two services created (RAC_SRV1 and RAC_SRV2). I see that RAC_SRV1 is only running on node1 and RAC_SRV2 is only running on node2. There are two applications sharing the same database, one application is using RAC_SRV1 and the other application is using RAC_SRV2. Am I correct in thinking that there is no failover available here? If node1 goes down, the application connecting to RAC_SRV1 will not be able to connect to node2 right?
It all depends on your service configuration. Check it with srvctl config.
> 3) In the case of the scenario in question 2 above, would it be best practise to simply create one service and have both applications connecting to the one service? Could I configure the one service to point connections from one application to node1 and connections from the other application to node2?
Better to create two services, one for each application, each with a specific preferred node and the other node in its available list.
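
The last suggestion (one service per application, each with its own preferred instance and the other instance as available) can be written down as srvctl commands. The helper below is only a sketch: it assembles the command lines and does not run anything. The database and instance names are taken from the question, the function name `add_service_cmd` is hypothetical, and the `-r` (preferred) and `-a` (available) options are the 11.2 administrator-managed syntax, so check them against your release before use.

```python
from typing import List


def add_service_cmd(db: str, service: str,
                    preferred: List[str], available: List[str]) -> str:
    """Build an 'srvctl add service' command line (11.2 administrator-managed syntax)."""
    parts = ["srvctl", "add", "service", "-d", db, "-s", service,
             "-r", ",".join(preferred)]
    if available:
        # Available instances are where the service may fail over to.
        parts += ["-a", ",".join(available)]
    return " ".join(parts)


# One service per application, with mirrored preferred/available instances.
for cmd in (
    add_service_cmd("ORCL_RAC", "RAC_SRV1", ["ORCL_RAC1"], ["ORCL_RAC2"]),
    add_service_cmd("ORCL_RAC", "RAC_SRV2", ["ORCL_RAC2"], ["ORCL_RAC1"]),
):
    print(cmd)
```

With this layout each application normally stays on its own node, but its service can still be started on the other instance if the preferred one goes down.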

How to configure sun application server 8.2 for Oracle RAC 10g

Hello,
We have numerous boxes running the Sun platform application server 8.2 and 2 boxes running enterprise version 8.2, all connecting to a 4-node Oracle RAC 10g Release 2 database. The system is up and working: the application servers connect to the database just fine, and the apps have no problems querying, inserting, etc. However, when we do failover testing, in which one or more nodes of the Oracle RAC database go down, the application server does not obtain new valid connections.
Our configuration is as follows: OracleDataSource for the data source, table validation turned on with a valid table, ONS configuration set in the properties, and connectionCache and fastconnectionfailover enabled in the properties as well. The database URL is the long Oracle RAC URL with load balancing turned on. The checkbox to fail all connections on any failure is checked.
ONS is configured properly within the database, because we have a Java application that runs outside the application server with all the same settings (only set manually in our code for the OracleDataSource), and it works seamlessly when DB nodes are shut down. We can shut down all but one node and it keeps humming along without skipping a beat; start up one of the others, kill the last node, and it still hums along nicely.
We'd really like the applications running in the application server to work the same way. We've tried every combination of application server configuration settings we can think of, and it never works. I am tempted to rip the database connection pool out of the application server and configure it manually in code, but we are using entity beans, so the pool is the much easier approach if it can be made to work. It comes down to this: does Sun application server actually work with Oracle RAC for connection failover? Any help would be greatly appreciated.

Hi,
We are also facing a similar exception. Here is the error we get when a RAC node fails.
[#|2007-11-11T12:43:53.685+0000|WARNING|sun-appserver-ee8.1_02|javax.enterprise.system.core.transaction|_ThreadID=38;|JTS5041: The resource manager is doing work outside a global transaction
oracle.jdbc.xa.OracleXAException
at oracle.jdbc.xa.OracleXAResource.checkError(OracleXAResource.java:1270)
at oracle.jdbc.xa.client.OracleXAResource.start(OracleXAResource.java:318)
at com.sun.gjc.spi.XAResourceImpl.start(XAResourceImpl.java:184)
at com.sun.jts.jta.TransactionState.startAssociation(TransactionState.java:258)
at com.sun.jts.jta.TransactionImpl.enlistResource(TransactionImpl.java:181)
at com.sun.enterprise.distributedtx.J2EETransaction.enlistResource(J2EETransaction.java:397)
at com.sun.enterprise.distributedtx.J2EETransactionManagerImpl.enlistResource(J2EETransactionManagerImpl.java:312)
at com.sun.enterprise.distributedtx.J2EETransactionManagerOpt.enlistResource(J2EETransactionManagerOpt.java:114)
at com.sun.enterprise.resource.ResourceManagerImpl.registerResource(ResourceManagerImpl.java:113)
at com.sun.enterprise.resource.ResourceManagerImpl.enlistResource(ResourceManagerImpl.java:71)
at com.sun.enterprise.resource.PoolManagerImpl.getResource(PoolManagerImpl.java:176)
at com.sun.enterprise.connectors.ConnectionManagerImpl.internalGetConnection(ConnectionManagerImpl.java:268)
at com.sun.enterprise.connectors.ConnectionManagerImpl.allocateConnection(ConnectionManagerImpl.java:193)
at com.sun.enterprise.connectors.ConnectionManagerImpl.allocateConnection(ConnectionManagerImpl.java:122)
at com.sun.gjc.spi.DataSource.getConnection(DataSource.java:70)
at com.syntegra.nasp.etp.dax.DBManager.getConnection(DBManager.java:192)
at com.syntegra.nasp.etp.dax.DBManager.createDBCommand(DBManager.java:241)
at com.syntegra.nasp.etp.dax.DBManager.createDBCommand(DBManager.java:251)
at com.syntegra.nasp.etp.dax.sp.SPS_PRESCRIPTION_GUID_PROC.getCommand(SPS_PRESCRIPTION_GUID_PROC.java:31)
at com.syntegra.nasp.etp.dax.sp.SPS_PRESCRIPTION_GUID_PROC.execute(SPS_PRESCRIPTION_GUID_PROC.java:23)
at com.syntegra.nasp.etp.dax.PrescriptionBaseDataMapper.loadPresciptionByGUID(PrescriptionBaseDataMapper.java:203)
at com.syntegra.nasp.etp.model.PrescriptionBase.findByPrescriptionGUID(PrescriptionBase.java:176)
at com.syntegra.nasp.etp.messages.PatientPrescriptionReleaseRequest.execute(PatientPrescriptionReleaseRequest.java:120)
at com.syntegra.nasp.etp.service.ETPSLBean.processMessage(ETPSLBean.java:159)
at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at com.sun.enterprise.security.SecurityUtil.invoke(SecurityUtil.java:147)
at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:128)
at $Proxy6.processMessage(Unknown Source)
at com.syntegra.nasp.etp.listener.RequestListener.onRequest(RequestListener.java:204)
at com.syntegra.spine.csf.consumer.mdb.CSFListenerRegisteringConsumer.onRequest(CSFListenerRegisteringConsumer.java:54)
at com.syntegra.spine.csf.consumer.mdb.CSFConsumerBase.invokeListener(CSFConsumerBase.java:267)
at com.syntegra.spine.csf.consumer.mdb.CSFConsumerBase.processMessage(CSFConsumerBase.java:180)
at com.syntegra.spine.csf.consumer.mdb.CSFConsumerBase.onMessage(CSFConsumerBase.java:102)
at sun.reflect.GeneratedMethodAccessor96.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at com.sun.enterprise.security.SecurityUtil$2.run(SecurityUtil.java:153)
at java.security.AccessController.doPrivileged(Native Method)
at com.sun.enterprise.security.application.EJBSecurityManager.doAsPrivileged(EJBSecurityManager.java:955)
at com.sun.enterprise.security.SecurityUtil.invoke(SecurityUtil.java:158)
at com.sun.ejb.containers.MessageBeanContainer.deliverMessage(MessageBeanContainer.java:956)
at com.sun.ejb.containers.MessageBeanListenerImpl.deliverMessage(MessageBeanListenerImpl.java:42)
at com.sun.enterprise.connectors.inflow.MessageEndpointInvocationHandler.invoke(MessageEndpointInvocationHandler.java:130)
at $Proxy9.onMessage(Unknown Source)
at com.sun.genericra.inbound.DeliveryHelper.deliverMessage(DeliveryHelper.java:183)
at com.sun.genericra.inbound.DeliveryHelper.deliver(DeliveryHelper.
Regards
Selvan.

Oracle RAC Nodes getting reboot in case of preferred controller failed

When we disconnect both fiber cables from the preferred Controller A, or pull the Controller A card out of the disk array (IBM DS 4300), both servers reboot after 90 seconds.
During this time the complete RAC network is out of service for approximately 5 minutes. After the reboot, both servers come back up with both instances without any manual intervention.
This is a critical issue for us because we are losing high availability. Please let us know how we can resolve it.
Details of the environment:
1. Software: Oracle 10g Release 2
2. OS: Red Hat Linux 3 (kernel version 2.4.21-27.ELsmp)
3. Shared storage: IBM DS 4300
4. Multipathing driver: RDAC (rdac-LINUX-09.00 A5.13)
5. Nodes: IBM 346
6. Database on ASM
7. The preferred controller for ASM, OCR and the voting disk is A.
8. Hangcheck timer value is 210 seconds.
9. Both servers have 2 HBA ports. The first HBA port is connected to Controller A and the second HBA port to Controller B of the SAN disk array.
As per my understanding:
The voting disk resides in the disk array, and Controller A is the preferred owner of the voting disk LUN. When I disconnect both fiber cables from the preferred Controller A, the Clusterware software on both nodes tries to contact the voting disk. When the nodes cannot reach the voting disk within a specific time period, they reboot.
I ran the controller failure test both with the Oracle RAC software and without Oracle. Without Oracle it works fine, because in that case the disk array waits approximately 300 seconds before changing the preferred controller from A to B.
With Oracle, however, the Clusterware software reboots both nodes before the controller can shift from A to B.
So, in conclusion, someone with a good understanding of Oracle Clusterware on Linux and the IBM RDAC multipath driver should be able to help me.
When installing Oracle RAC on Linux, it is required to configure the hangcheck timer.
Oracle recommends 180 seconds.
This means that if one of the nodes hangs, the second node will wait for 180 seconds; if the situation is not resolved within 180 seconds, it will reboot the hung node.
I think the hangcheck timer configuration is required only on Linux.
Configuration file:
cat >> /etc/rc.d/rc.local << EOF
modprobe hangcheck-timer hangcheck_tick=15 hangcheck_margin=60
EOF

Sorry, the hangcheck timer configuration file is:
cat >> /etc/rc.d/rc.local << EOF
modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
EOF
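
The figures quoted in this thread can be lined up to see which mechanism acts first. The sketch below is plain arithmetic on numbers from the posts (reboot observed after about 90 seconds, hangcheck_tick=30 and hangcheck_margin=180, controller ownership change after about 300 seconds). The function names are hypothetical, and treating tick plus margin as the hangcheck limit is an assumption that matches the 210-second figure given above.

```python
def hangcheck_limit(tick_s: int, margin_s: int) -> int:
    """Seconds of hang tolerated before the hangcheck-timer module resets the node
    (assumed here to be tick plus margin)."""
    return tick_s + margin_s


def order_events(events: dict) -> list:
    """Return event names sorted by when they occur, earliest first."""
    return sorted(events, key=events.get)


events = {
    "observed node reboot": 90,
    "hangcheck-timer limit": hangcheck_limit(30, 180),
    "controller A-to-B ownership change": 300,
}

for name in order_events(events):
    print(f"{events[name]:>4} s  {name}")
```

On these numbers the nodes reboot well before either the hangcheck limit or the controller switch is reached, which suggests looking for a shorter timeout elsewhere in the stack. That is an inference from the quoted figures, not a diagnosis.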

Oracle RAC 10g on Solaris x86 using vmware

Guys,
I am practising a 10g RAC installation on my laptop, on which I have VMware Workstation with Solaris 10 x86 installed.
I am trying to install Oracle 10g Clusterware.
I have followed the steps for the Clusterware installation.
I am getting an error on one of the nodes while running root.sh.
On the first node, root.sh runs fine.
The following is the log.
bash-3.00# ./root.sh
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: xsan001 xsan001-priv xsan001
node 2: xsan002 xsan002-priv xsan002
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/rdsk/c0d0s4
Format of 1 voting devices complete.
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
xsan001
CSS is inactive on these nodes.
xsan002
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
=============================
On the second node, root.sh gives an error.
bash-3.00# ./root.sh
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: xsan001 xsan001-priv xsan001
node 2: xsan002 xsan002-priv xsan002
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/rdsk/c0d0s4
Format of 1 voting devices complete.
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
Failure at final check of Oracle CRS stack.
10
Can you provide some sort of clue what could be done to resolve the above error?

Refer to http://www.oracleracsig.org/. Under the documents section, select OS=Solaris. You will find a document on Oracle RAC on Solaris 10 using VMware.

Oracle RAC Interconnect, PowerVM VLANs, and the Limit of 20

Hello,
Our company has a requirement to build a multitude of Oracle RAC clusters on AIX using PowerVM on 770 and 795 hardware.
We presently have 802.1q trunking configured on our Virtual I/O Servers, and have currently consumed 12 of the 20 VLANs allowed on a virtual Ethernet adapter. We have read the Oracle RAC FAQ on Oracle Metalink, and it seems to discourage sharing these interconnect VLANs between different clusters. This puts us in a scalability bind: IBM limits VLANs to 20, and Oracle says there is a one-to-one relationship between VLANs, subnets, and RAC clusters. We must assume we have a fixed number of network interfaces available and that we absolutely have to leverage virtualized network hardware in order to build these environments. "Add more network adapters to VIO" isn't an acceptable solution for us.
Does anyone know if Oracle can offer any flexibility that would allow us to host multiple Oracle RAC interconnects on the same 802.1q trunk VLAN? We will independently guarantee that the bandwidth, latency, and redundancy requirements are met for proper Oracle RAC performance; however, we don't want a design "flaw" to cause us supportability issues in the future.
We'd like it very much if we could have a bunch of two-node clusters all sharing the same private interconnect. For example:
Cluster 1, node 1: 192.168.16.2 / 255.255.255.0 / VLAN 16
Cluster 1, node 2: 192.168.16.3 / 255.255.255.0 / VLAN 16
Cluster 2, node 1: 192.168.16.4 / 255.255.255.0 / VLAN 16
Cluster 2, node 2: 192.168.16.5 / 255.255.255.0 / VLAN 16
Cluster 3, node 1: 192.168.16.6 / 255.255.255.0 / VLAN 16
Cluster 3, node 2: 192.168.16.7 / 255.255.255.0 / VLAN 16
Cluster 4, node 1: 192.168.16.8 / 255.255.255.0 / VLAN 16
Cluster 4, node 2: 192.168.16.9 / 255.255.255.0 / VLAN 16
etc.
Whereas the concern is that Oracle Corp will only support us if we do this:
Cluster 1, node 1: 192.168.16.2 / 255.255.255.0 / VLAN 16
Cluster 1, node 2: 192.168.16.3 / 255.255.255.0 / VLAN 16
Cluster 2, node 1: 192.168.17.2 / 255.255.255.0 / VLAN 17
Cluster 2, node 2: 192.168.17.3 / 255.255.255.0 / VLAN 17
Cluster 3, node 1: 192.168.18.2 / 255.255.255.0 / VLAN 18
Cluster 3, node 2: 192.168.18.3 / 255.255.255.0 / VLAN 18
Cluster 4, node 1: 192.168.19.2 / 255.255.255.0 / VLAN 19
Cluster 4, node 2: 192.168.19.3 / 255.255.255.0 / VLAN 19
Which eats one VLAN per RAC cluster.
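To put numbers on the trade-off, here is a rough sketch in plain POSIX shell. The function names plan_dedicated and vlans_left are made up for illustration; the addressing follows the one-VLAN-per-cluster example above, and the 20 and 12 are the limit and the current usage mentioned earlier.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not from Oracle or IBM tooling.

# Print a one-VLAN-per-cluster addressing plan for two-node clusters.
# $1 = number of clusters, $2 = first VLAN id (also used as the third octet).
plan_dedicated() {
    clusters=$1
    vlan=$2
    c=1
    while [ "$c" -le "$clusters" ]; do
        for node in 1 2; do
            # node 1 -> .2, node 2 -> .3, matching the example above
            echo "Cluster $c, node $node: 192.168.$vlan.$((node + 1)) / 255.255.255.0 / VLAN $vlan"
        done
        vlan=$((vlan + 1))
        c=$((c + 1))
    done
}

# Remaining VLAN slots on one virtual Ethernet adapter.
# $1 = limit per adapter, $2 = VLANs already in use.
vlans_left() {
    echo $(( $1 - $2 ))
}

plan_dedicated 4 16
echo "Clusters that still fit with one VLAN each: $(vlans_left 20 12)"
```

With 12 of 20 VLANs already used, only 8 more clusters fit on that adapter under the dedicated-VLAN scheme, which is the scalability bind in question.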

Thank you for your answer!!
I think I roughly understand the argument behind a 2-node RAC and a 3-node or greater RAC. We, unfortunately, were provided with two physical pieces of hardware to virtualize to support production (and two more to support non-production) and as a result we really have no place to host a third RAC node without placing it within the same "failure domain" (I hate that term) as one of the other nodes.
My role is primarily as a system engineer, and, generally speaking, our main goals are eliminating single points of failure. We may be misusing 2-node RACs to eliminate single points of failure since it seems to violate the real intentions behind RAC, which is used more appropriately to scale wide to many nodes. Unfortunately, we've scaled out to only two nodes, and opted to scale these two nodes up, making them huge with many CPUs and lots of memory.
Other options, notably the active-passive failover cluster we have in HACMP or PowerHA on the AIX / IBM Power platform, are unattractive because the standby node carries no load yet must hold CPU and memory resources so that it is prepared for a failover of the primary node. We use HACMP / PowerHA with Oracle and it works nicely; however, Oracle RAC, even in a two-node configuration, drives load on both nodes, unlike an active-passive clustering technology.
All that aside, I am posing the question to both IBM and our Oracle DBAs (who will ask Oracle Support). Typically the answers we get vary widely depending on the experience and skill level of the support personnel we get on both the Oracle and IBM sides... so on a suggestion from a colleague (Hi Kevin!) I posted here. I'm concerned that the answer from Oracle Support will unthinkingly be "you can't do that, my script says to tell you the absolute most rigid interpretation of the support document" while all the time the same document talks of the use of NFS and/or iSCSI storage (eye roll).
We have a massive deployment of Oracle EBS and honestly the interconnect doesn't even touch 100 Mbit speeds, even though the configuration has been checked multiple times by Oracle and IBM, and with the knowledge that Oracle EBS is supposed to heavily leverage RAC. I haven't met a single person who doesn't look at our environment and suggest jumbo frames. It's a joke at this point... comments like "OMG YOU DON'T HAVE JUMBO FRAMES" and/or "OMG YOU'RE NOT USING INFINIBAND WHATTA NOOB" are commonplace when new DBAs are hired. I maintain that the utilization numbers don't support this.
I can tell you that we have 8Gb fiber channel storage and 10Gb network connectivity. I would probably assume that there was a bottleneck in the storage infrastructure first. But alas, I digress.
Mainly I'm looking for a real-world answer to this question. Aside from violating every last recommendation and making Oracle Support folk gently weep at the suggestion, are there any issues with sharing interconnects between RAC environments that will prevent its functionality and/or reduce its stability?
We have rapid spanning tree configured, as far as I know, and our network folks have tuned the timers razor thin. We have Nexus 5k and Nexus 7k network infrastructure. The typical issues you'd find with standard spanning tree really don't affect us because our network people are just that damn good.

Getting error when try to backup oracle rac to another location

Hi there,
I am attempting to back up a database to another location from an Oracle RAC database, version 11gR2. Here is my script:
#!/bin/ksh
export ORACLE_SID=vvsms1
ORACLE_BASE=/u01/app/oracle; export ORACLE_BASE
ORACLE_HOME=$ORACLE_BASE/product/11.2.0/dbhome_1; export ORACLE_HOME
BASE_PATH=/usr/sbin:$PATH; export BASE_PATH
PATH=$ORACLE_HOME/bin:$BASE_PATH; export PATH
/u01/app/oracle/product/11.2.0/dbhome_1/bin/rman target sys/viviet@vvsms log /home/oracle/log_rman/vvsms.log append <<EOF
RUN {
CROSSCHECK BACKUP;
CROSSCHECK ARCHIVELOG ALL;
ALLOCATE CHANNEL CHANNEL1 TYPE DISK FORMAT '/home/oracle/backup/vvsms/backup_%U';
BACKUP INCREMENTAL LEVEL 0 TAG 'incr_vvsms' DATABASE;
BACKUP ARCHIVELOG ALL;
DELETE OBSOLETE;
RELEASE CHANNEL CHANNEL1;
}
EXIT;
EOF
I saved it as a .sh file and set up crontab to run it. But when it runs, I get errors like these:
Starting backup at 22-OCT-12
channel CHANNEL1: starting incremental level 0 datafile backup set
channel CHANNEL1: specifying datafile(s) in backup set
input datafile file number=00002 name=+DISK2/vvsms/datafile/sysaux.289.794242439
input datafile file number=00006 name=+DISK2/vvsms/datafile/ts_service.dbf
input datafile file number=00007 name=+DISK2/vvsms/datafile/ts_viviet.dbf
input datafile file number=00008 name=+DISK2/vvsms/datafile/viviet.dbf
input datafile file number=00009 name=+DISK2/vvsms/datafile/ts_vivietct_primary.dbf
input datafile file number=00003 name=+DISK2/vvsms/datafile/undotbs1.290.794242445
input datafile file number=00001 name=+DISK2/vvsms/datafile/system.288.794242429
input datafile file number=00004 name=+DISK2/vvsms/datafile/undotbs2.292.794242453
input datafile file number=00005 name=+DISK2/vvsms/datafile/users.293.794242455
channel CHANNEL1: starting piece 1 at 22-OCT-12
released channel: CHANNEL1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on CHANNEL1 channel at 10/22/2012 01:50:16
ORA-19504: failed to create file "/home/oracle/backup/vvsms/backup_2anobqu5_1_1"
ORA-27040: file create error, unable to create file
Linux-x86_64 Error: 2: No such file or directory
I don't know what I'm doing wrong. The location is correct ("/home/oracle/backup/vvsms/").
Please give me some suggestions. What do I need to do?
Thanks in advance!
P.S.: Sorry for my bad English.

/u01/app/oracle/product/11.2.0/dbhome_1/bin/rman target sys/viviet@vvsms log /home/oracle/log_rman/vvsms.log append
This line may be your problem.
This database is RAC, you are connecting through the load-balanced service "vvsms", and "/home/oracle/backup/vvsms" is not a shared location. RMAN is starting its session on the other node, where "/home/oracle/backup/vvsms" does not exist.
Try changing this:
/u01/app/oracle/product/11.2.0/dbhome_1/bin/rman target sys/viviet@vvsms log /home/oracle/log_rman/vvsms.log append
to this (using the Easy Connect method):
/u01/app/oracle/product/11.2.0/dbhome_1/bin/rman target sys/viviet@localhost:1521/VVSMS log /home/oracle/log_rman/vvsms.log append
where:
localhost: your local node
VVSMS: the database service
Also check that "/u01/app/oracle/product/11.2.0/dbhome_1/network/admin/sqlnet.ora" contains the line "NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)".
P.S.: When RMAN starts a session it shows where it is connected; check the log to see which instance RMAN logged in to.
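As a complement, the cron script could refuse to start RMAN when the target directory is missing on the node it runs on. This is only a sketch: check_backup_dir is a hypothetical helper, not part of RMAN, and the path is the one from the script above.

```shell
#!/bin/sh
# Sketch only: check_backup_dir is a hypothetical helper, not an RMAN feature.

# Succeed only if the directory exists and is writable on THIS node.
check_backup_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$(uname -n): $dir does not exist on this node" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "$(uname -n): $dir is not writable by $(id -un)" >&2
        return 1
    fi
    return 0
}

BACKUP_DIR=/home/oracle/backup/vvsms   # path taken from the script above

if check_backup_dir "$BACKUP_DIR"; then
    echo "OK: $BACKUP_DIR is usable on $(uname -n); start RMAN here"
else
    echo "Skipping backup: create the directory or use shared storage" >&2
fi
```

Note that this only validates the node where the script runs. It helps when RMAN connects to the local instance, as suggested above; it cannot see which node a load-balanced service will pick.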
HTH,
Levi Pereira

FDM is not supported on Oracle RAC? Why not?

We are in the process of implementing Oracle Hyperion Financial Data Quality Management (FDM) version 11.1.1.3 as part of our Hyperion Essbase BI environment. I read over the documentation prior to building the databases, but apparently I missed an important bit of information. It seems that FDM is not supported on Oracle RAC. I just noticed this in the "Known Issues" section of the FDM 11.1.1.3 release notes, and it was only one sentence. The release notes do not explain why this is the case.
- Does anyone know why FDM release 11.1.1.3 is not supported for Oracle RAC?
- Has this always been the case?
- Can a single node database, running on an Oracle 10.2.0.4 two node RAC environment, be used instead of building a separate, stand-alone database?
Link to readme document. The reference to RAC support, or non-support, is on page 6:
http://download.oracle.com/docs/cd/E12825_01/epm.111/fdm_11113_readme.pdf

Hi Daan,
I believe we should all consider http://forums.oracle.com/forums/ann.jspa?annID=939
I get upset when I see people on this forum answer questions with 'use google', 'let me google it for you', or 'use this forum filter'.
I believe it is better to help them ask a better question or, if you don't like the question, to ignore it.
Hi user10511107,
I believe your problem is with Windows, not with BI; BI just can't get the Windows version.
If you are on the right version of Windows, go to MSDN and search for your error and how to fix it:
ERROR: Provider Load Failure.
Regards
Nicolae

Failover not happening the Oracle RAC 10g

Hi All,
I am new to RAC.
I have installed Oracle RAC 10g on Red Hat Linux 4.0. Until yesterday failover was happening: when I stopped one instance on node01, the VIP of node01 was transferred to node02. This was shown by ifconfig -a. But now that is not happening, and I don't know what has changed. Can you please help me out?
The relevant information is given below:
[oracle@node01 ~]$ crs_stat -t
Name            Type         Target   State    Host
ora.hitesh.db   application  ONLINE   ONLINE   node02
ora....h1.inst  application  ONLINE   ONLINE   node01
ora....h2.inst  application  OFFLINE  OFFLINE
ora....SM1.asm  application  ONLINE   ONLINE   node01
ora....01.lsnr  application  ONLINE   ONLINE   node01
ora.node01.gsd  application  ONLINE   ONLINE   node01
ora.node01.ons  application  ONLINE   ONLINE   node01
ora.node01.vip  application  ONLINE   ONLINE   node01
ora....SM2.asm  application  ONLINE   ONLINE   node02
ora....02.lsnr  application  ONLINE   ONLINE   node02
ora.node02.gsd  application  ONLINE   ONLINE   node02
ora.node02.ons  application  ONLINE   ONLINE   node02
ora.node02.vip  application  ONLINE   ONLINE   node02
Listener status on node01 is given below:
[oracle@node01 ~]$ lsnrctl status
LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 06-APR-2013 12:59:29
Copyright (c) 1991, 2005, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
Alias LISTENER_NODE01
Version TNSLSNR for Linux: Version 10.2.0.1.0 - Production
Start Date 06-APR-2013 11:59:03
Uptime 0 days 1 hr. 0 min. 25 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /home/oracle/oracle/product/10.2.0/db_1/network/admin/listener.ora
Listener Log File /home/oracle/oracle/product/10.2.0/db_1/network/log/listener_node01.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.131)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...
Service "+ASM_XPT" has 1 instance(s).
Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...
Service "PLSExtProc" has 1 instance(s).
Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "hitesh" has 2 instance(s).
Instance "hitesh1", status READY, has 2 handler(s) for this service...
Instance "hitesh2", status READY, has 1 handler(s) for this service...
Service "hiteshXDB" has 2 instance(s).
Instance "hitesh1", status READY, has 1 handler(s) for this service...
Instance "hitesh2", status READY, has 1 handler(s) for this service...
Service "hitesh_XPT" has 2 instance(s).
Instance "hitesh1", status READY, has 2 handler(s) for this service...
Instance "hitesh2", status READY, has 1 handler(s) for this service...
The command completed successfully
[root@node01 oracle]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
[root@node01 oracle]# ps -ef | grep lmon
oracle 5741 1 0 12:07 ? 00:00:03 ora_lmon_hitesh1
root 22582 20805 0 13:01 pts/2 00:00:00 grep lmon
oracle 23643 1 0 11:58 ? 00:00:01 asm_lmon_+ASM1
Please let me know what other information is required.
Edited by: user12924280 on Apr 6, 2013 12:36 AM

Since you didn't say "thank you", I assumed my time was of no value to you.
However, I shall try again.
There is no relationship between instance failure and VIP failover. How can there be? What if you are running ten instances on each node, and one fails? Would you want the VIP to relocate? And I've already told you how to test it: kill the node. Just reboot it.
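To see where each VIP is currently running after such a test, the Host column of the crs_stat -t output can be compared with the node name embedded in the VIP resource name. A minimal sketch, assuming the five-column layout shown earlier in this thread and VIP names of the form ora.<node>.vip that are not truncated with dots; vip_placement is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: vip_placement is a hypothetical helper that reads
# `crs_stat -t` style output on stdin (columns: Name Type Target State Host).

vip_placement() {
    awk '$1 ~ /\.vip$/ {
        # ora.node01.vip -> home node "node01"
        n = split($1, part, ".")
        home = part[n - 1]
        if ($4 != "ONLINE")
            print $1 ": " $4
        else if ($5 == home)
            print $1 ": at home on " $5
        else
            print $1 ": failed over to " $5
    }'
}

# Sample input (made up for illustration): node01's VIP has relocated to node02.
vip_placement <<'EOF'
Name            Type         Target  State   Host
ora.node01.vip  application  ONLINE  ONLINE  node02
ora.node02.vip  application  ONLINE  ONLINE  node02
EOF
```

On a live 10g cluster the same function could be fed with crs_stat -t | vip_placement. In the output posted above both VIPs are on their home nodes, which is what you would expect while both nodes are up, whatever the state of the instances.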

ASM pfile lost in Oracle RAC 11gr2

Hello Gurus,
I am new to Oracle RAC 11gR2 and am facing some issues. The spfile/pfile for our ASM instance is lost, and I am not able to start up the ASM instance.
Environment is as below:
RAC 2 nodes
Oracle RAC 11gr2
Enterprise Linux Server release 5.5 (Carthage)
ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/u01/app/oracle/product/11.2.0/db_1/dbs/init+ASM1.ora'
Is there any way to recover the spfile/pfile?
I am also getting the following error when I try to connect with "sqlplus / as sysasm":
$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.1.0 Production on Sun Mar 27 11:26:02 2011
Copyright (c) 1982, 2009, Oracle. All rights reserved.
ERROR:
ORA-01031: insufficient privileges
Enter user-name:
Thanks and Regards,

Hi,
Do we need to set the ORACLE_HOME variable properly (Grid home) while starting the RDBMS instance?
e.g. ORACLE_HOME=/u01/app/11.2.0/grid
I recommend you set ORACLE_HOME properly for each Oracle installation when you use SQL*Plus.
For example, cat /etc/oratab:
orcl:/u01/app/oracle/product/11.2.0/db_1
db10g:/u01/app/oracle/product/10.2.0/db_1
+ASM1:/u01/app/11.2.0/grid
You can also start up and shut down the database with SQL*Plus over Oracle Net, but it must be configured properly.
My recommendation: use SRVCTL and CRSCTL to manage your environment. Track the initialization processes through the logs using the ADRCI utility (11.1 or later only).
When you use SRVCTL or CRSCTL, I recommend running them from the Grid home.
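One way to avoid pointing SQL*Plus at the wrong home is to look the home up by SID instead of typing it. A small sketch, assuming an oratab file in the sid:home format shown above; oratab_home is a hypothetical helper name, and the demo uses a scratch file rather than the real /etc/oratab.

```shell
#!/bin/sh
# Sketch only: oratab_home is a hypothetical helper.

# Print the Oracle home recorded for a SID in an oratab-style file.
# $1 = SID (e.g. +ASM1), $2 = file (defaults to /etc/oratab).
oratab_home() {
    awk -F: -v sid="$1" '
        /^#/ { next }                              # skip comment lines
        $1 == sid { print $2; found = 1; exit }    # first matching entry wins
        END { exit found ? 0 : 1 }                 # non-zero status if no entry
    ' "${2:-/etc/oratab}"
}

# Demo against a scratch copy of the entries shown above.
tab=$(mktemp)
cat > "$tab" <<'EOF'
orcl:/u01/app/oracle/product/11.2.0/db_1
db10g:/u01/app/oracle/product/10.2.0/db_1
+ASM1:/u01/app/11.2.0/grid
EOF

ORACLE_SID=+ASM1
ORACLE_HOME=$(oratab_home "$ORACLE_SID" "$tab") || echo "no oratab entry for $ORACLE_SID" >&2
export ORACLE_SID ORACLE_HOME
echo "ORACLE_HOME for $ORACLE_SID is $ORACLE_HOME"
rm -f "$tab"
```

In the original question the error message refers to /u01/app/oracle/product/11.2.0/db_1/dbs/init+ASM1.ora, so the ASM instance was being started from the database home rather than the Grid home, which is the situation this lookup is meant to prevent.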
SQL> alter database open;
alter database open
ERROR at line 1:
ORA-16038: log 3 sequence# 1067 cannot be archived
ORA-19809: limit exceeded for recovery files
ORA-00312: online log 3 thread 2: '+FRA/yyy/onlinelog/group_3.259.738489481'
SQL>
I give you three options:
1° - Add more ASM disks to the +FRA disk group.
2° - Back up all archived logs using RMAN with the delete input option (i.e. backup archivelog all delete input;).
3° - If this database is for TEST ONLY and you do not need to back it up or recover it, you can delete all archived logs using RMAN (i.e. delete archivelog all;).
I recommend you create a routine of database and archivelog backups to prevent this area from being exhausted.
If you need the database in archivelog mode but it is only for testing, create a routine to delete archived logs periodically.
If you don't need the database in archivelog mode, just disable it.
Regards,
Levi Pereira
