Extended RAC failure question

Hi gurus.
I have a question about this kind of environment.
In an Extended RAC scenario with host-based mirroring (active/active storage with ASM) and a third site to maintain the voting disk, what will happen on a communication failure between the two sites (a DWDM failure)?
Could this failure be a problem? Will the Oracle RAC survive?
Thank you!

Hi,
> In an Extended RAC scenario with host-based mirroring (active/active storage with ASM) and a third site to maintain the voting disk, what will happen on a communication failure between the two sites (a DWDM failure)? Could this failure be a problem? Will the Oracle RAC survive?
You should not get these answers here; you must get them by testing your environment.
Extended clusters need additional destructive testing, covering:
- Site failure
- Communication failure
But if the DWDM link fails, a split-brain scenario can occur. Because the voting disk is on a third site, the site that can still reach a majority of the voting files should survive, while the other site's nodes are evicted. Search for "Oracle RAC Cluster Fencing" (for your specific release) to understand the symptoms caused by a split-brain scenario.
Regards,
Levi Pereira
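The third-site voting disk arbitration Levi refers to can be sketched as a simple majority vote. This is a minimal, hypothetical model for illustration (not Oracle's actual CSS implementation): a site survives only if it can still reach a strict majority of the voting disks, so with one voting disk per data site plus a third-site tiebreaker, exactly one data site survives an inter-site link failure.

```python
# Minimal sketch of majority-based voting-disk arbitration.
# Hypothetical model, not Oracle's actual CSS algorithm: a site
# survives only if it can still see a strict majority of votes.

VOTING_DISKS = {"site_a", "site_b", "site_c"}  # site_c = third site

def survives(reachable_disks):
    """A site stays up iff it sees a strict majority of voting disks."""
    return len(reachable_disks & VOTING_DISKS) > len(VOTING_DISKS) // 2

# DWDM failure between site A and site B: each data site still
# reaches its own disk; whoever keeps the third-site link wins.
site_a_view = {"site_a", "site_c"}   # A kept its link to the third site
site_b_view = {"site_b"}             # B lost both remote links

print(survives(site_a_view))  # True  -> site A keeps running
print(survives(site_b_view))  # False -> site B nodes are evicted (fenced)
```

This is why the third site matters: without it, both data sites would see exactly half the votes after a link failure and neither could safely continue.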

Similar Messages

  • More on Extended RAC

    Erik Peterson (and others who might be interested),
    I looked at your Sep '06 "Extended RAC" paper and thought it was very good. In it, you show data and case studies for distances up to 100 km. In a post (Re: RAC on SUN geo clusters) you say you would never do 1200 km. At what point do you say "X kilometers is just too far"?
    We have some folks talking about doing RAC over 256km with Dark Fibre. We can multiplex with 40 different frequencies, and tests have shown an average latency of 4ms. How do we decide whether to look at RAC or not? Obviously, other High Availability options are still appropriate (Active/Passive, Data Guard), but I wanted to field the question for the sake of gathering data. I realize some people would rightly question why in the world we are even thinking about this, so consider this post one mostly of curiosity and research. =)

    Well, it is all a matter of what performance you are willing to deal with.
    You would be adding your one-way latency to each cache fusion message, and round-trip latency to all disk I/Os, since you will want disk writes to be synchronous.
    How much this will affect an application will depend on how I/O-bound and cache-fusion-bound it is.
    That, combined with the unknown factor of what performance degradation you are willing to live with, makes it hard to give an absolute number. Things just get worse and worse.
    Personally, I am very comfortable implementing extended RAC in a metro area; anything beyond that I view first as an initial estimate (OK, are you willing to live with this degradation?) followed by a serious POC (OK, are you really, really willing to live with this degradation?).
    Hope this helps.
    -Erik
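    To make Erik's point concrete, here is a back-of-the-envelope sketch of the latency cost at the 256 km distance mentioned above. The figures are illustrative assumptions (light in fiber at roughly 200,000 km/s, plus the 4 ms measured one-way latency reported in the question): every cache fusion block transfer pays the one-way latency, and every synchronous mirrored write pays the round trip.

```python
# Back-of-the-envelope latency cost of a 256 km extended RAC link.
# All figures are assumptions for illustration only.

DISTANCE_KM = 256
FIBER_SPEED_KM_PER_MS = 200.0   # light in glass, ~2/3 of c, per millisecond
MEASURED_ONE_WAY_MS = 4.0       # average reported in the post above

# Theoretical minimum propagation delay, one way:
propagation_ms = DISTANCE_KM / FIBER_SPEED_KM_PER_MS
print(f"propagation floor: {propagation_ms:.2f} ms one way")   # ~1.28 ms

# Added cost per operation at the measured latency:
cache_fusion_penalty_ms = MEASURED_ONE_WAY_MS      # one-way per message
sync_write_penalty_ms = 2 * MEASURED_ONE_WAY_MS    # round trip per write
print(f"cache fusion block transfer: +{cache_fusion_penalty_ms:.1f} ms")
print(f"synchronous mirrored write:  +{sync_write_penalty_ms:.1f} ms")
```

    Note that the measured 4 ms is about three times the physical floor, so switching and multiplexing overhead dominate; and against a sub-millisecond local interconnect, a 4 ms penalty on every remote block request is exactly the "worse and worse" Erik describes.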

  • Any Benefits of Extended RAC Over Data Guard?

    Hi,
    My company is in the process of setting up a second data center, a fair distance from the current one (about 20 km).
    This data center will be used as a DR site, as well as to accommodate additional servers, since the current data center is already stretched.
    We're currently running about 5 RAC clusters, two nodes each, on Oracle 11g and AIX platforms. It's not yet decided what type of technology will be employed for the databases - RAC with Data Guard (DG) or Extended RAC. The network link will be fairly good, a dark fiber link.
    Does anyone have a suggestion as to which of the above technologies would be preferable? With Extended RAC I think we are able to continue operating from the second site without needing to 'fail over' as such, while with DG we will probably need to put an elaborate failover procedure in place, even though we can use the Fast-Start Failover feature of 10g Release 2 and above.
    Any thoughts/suggestions/clarifications?
    Dula

    Hi Dula,
    Some time back we were also considering the same: instead of using remote DG for a disaster recovery solution (active-passive), to create an extended RAC on a remote site, use those remote servers for disaster recovery, and also connect them to the PROD application servers, making an active-active clustering solution.
    Management wanted to look into the feasibility of using active-active nodes on remote sites, which they thought would let them use the remote servers to serve the LIVE application. With the active-passive mode, where there were just standby servers in recovery mode on the remote site, their opinion was: why waste power on remote servers rather than use them for the LIVE site?
    We thought of using dense optic fiber for the DB replication to the remote RAC nodes and also for our 30+ application servers to connect from site 1 to site 2 (site 2 being the remote site hosting the extended RAC nodes).
    However, many reservations came up while making the decision. First, we could not find any reliable source or group who had successfully implemented active-active remote extended RAC nodes in production. The cost of dense optic fiber was another consideration. How the interconnect would perform across remote sites was another grey area for us. Also, since the application servers use TAF to connect to the DB servers, it was not known what performance impact sessions from the same application servers (web-based applications) to local and remote nodes would have. With so many grey areas, we dropped the idea of using extended RAC nodes and went ahead with the DG solution.
    Amar

  • VIP and Public Network on Extended Rac

    Hi Everybody
    I want to install Oracle Extended RAC on two sites which are 25 km apart, with this spec:
    Oracle 11.1.0.6
    HP PA-RISC server
    EVA Storage
    My problem is that each of my sites has a different subnet for the public network, so VIP failover is a problem.
    How can I solve this problem?
    Can we have two sites on one subnet? How?
    I have this problem with the interconnect too.
    thanks anyway
    Hashem

    Hi,
    look into the oifcfg tool:
    http://www.databasejournal.com/features/oracle/article.php/3651826/Oracle-RAC-Administration---Part-12-RAC-Essentials.htm
    and
    http://download.oracle.com/docs/cd/B28359_01/rac.111/b28255/oifcfg.htm

  • 10g Extended RAC with Symantec/Veritas

    Hi,
    Do you know any solution for an active-active 10g extended RAC using Symantec/Veritas Storage Foundation?
    I've seen some active-passive solutions using Veritas/Symantec GCO and VVR, but we want the extended node to be online and doing some workload.
    Thanks in advance
    Regards
    Jose

    You might take a campus cluster into consideration (2 SFRAC clusters + VxVM mirror).
    Or you can also consider: 2 SFRAC clusters + GCO + logical Oracle Data Guard.
    Thanks.

  • Configuring our RAC environment Questions

    The environment consists of Sun Solaris 10, Veritas, and 10g RAC:
    Questions:
    I need to know the settings and configuration of the entire software stack that will be the foundation of the Oracle RAC environment: network configurations, settings, and requirements for any networks, including the RAC interconnect between servers.
    How do we set up the Solaris 10 structures: what goes into the global zones, the containers, the resource groups, RBAC roles, SMF configuration, schedulers?
    Can we use ZFS, and if so, with what configuration and settings?
    In addition, these questions I need answers to:
    What I am looking for is:
    -- special hardware configuration issues, in particular the server RAC interconnect. Do we need a hub, a switch, or crossover cables, and configured how?
    -- Operating system versions and configuration. If it is Solaris 10, then there are more specific requirements: how to handle SMF, containers, kernel settings, IPMP, NTP, RBAC, SSH, etc.
    -- Disk layout on the SAN, including a design for growth several years out: which file systems have the most contention and most use, command tag depth issues, etc. (I can send my questionnaire.)
    -- Configuration settings and best practices for Storage Foundation for RAC and Volume Manager
    -- How to test and tune the Foundation Suite settings for throughput optimization. I can provide stats from the server and the SAN, but how do we coordinate that with the database?
    -- How to test RAC failover: what items will be monitored for failover that need to be considered from the server perspective?
    -- How to test Data Guard failures and failover: does system administration have to be prepared to help out at all?
    -- How to configure NetBackup backups

    Answering all these questions accurately and correctly for your implementation might be a bit much for a forum posting.
    First, I'd recommend accessing the Oracle documentation on otn.oracle.com. This should get you the basics about what is supported for the environment you're looking to set up, and go a long way toward answering your detailed questions.
    Then I'd break this down into smaller sets of specific questions and try to get the RAC experts on the RAC forum to help out.
    See: Community Discussion Forums » Grid Computing » Real Application Clusters
    Finally, Oracle Support via Metalink should be able to fill in any gaps in the documentation.
    Good luck on your project,
    Tony

  • Oracle 10gR2 RAC - ASM question

    Hi
    I have a question regarding ASM storage. Let's say I have a system here running Oracle 10gR2 RAC and I would like to add a new disk to (extend) the current DATA disk group for more space. How do I do that? Will it affect the existing data stored inside it?

    So, to add a little more to the discussion: let's say your storage administrator presents you a LUN and is nice enough to add a partition of, say, 7G (/dev/sdo1).
    Now you need to take /dev/sdo1, stamp it, and alter your disk group.
    For illustration purposes I shall use rac1 and rac2 as my dual-instance RAC and add to the ASM disk group ARCH.
    As root on rac1
    /etc/init.d/oracleasm createdisk ARCH2 /dev/sdo1
    then run
    /etc/init.d/oracleasm listdisks
    to make sure ARCH2 shows up.
    On rac2 you run
    /etc/init.d/oracleasm listdisks
    You don't see ARCH2 so then run
    /etc/init.d/oracleasm scandisks
    then
    /etc/init.d/oracleasm listdisks
    Now you should see ARCH2
    Ok the asm stamps are in sync now.
    Back to rac1
    su - oracle
    set ORACLE_SID to the ASM instance and use sqlplus:
    sqlplus / as sysdba
    (Note: on 10gR2 you connect to ASM as SYSDBA; the SYSASM role only exists from 11g onwards.)
    If you query V$ASM_DISK you will see your disk with a header_status of PROVISIONED
    that's good ...
    Now, while still in sqlplus, let's bump up asm_power_limit so rebalancing runs faster:
    alter system set asm_power_limit=5 scope=both;
    If your ASM instances share the same spfile you only need to do this on one instance; otherwise run the command on all ASM instances.
    Lastly
    ALTER DISKGROUP ARCH ADD DISK 'ORCL:ARCH2' ;
    Now you can query V$ASM_OPERATION and watch ASM do its magic of rebalancing.
    That's it. All done while the DB is up and running.
    How does that work for you?
    -JR

  • RAC Interview Questions

    Hi All
    I am a newbie in RAC and want to appear in RAC job interviews.
    Please share questions which you have been asked.
    Regards
    Naveen Chhibber

    Hi Naveenchhibber,
    I think basic topics about architecture, backup, load balancing, and the interconnect.
    I found good questions in this blog http://dbaanswers.blogspot.com/2007/06/sroracle-dba-racdatagaurd-interview.html
    Regards,
    Rodrigo Mufalani
    http://mufalani.blogspot.com

  • Easy ORACLE RAC 11g question....

    Hi everyone.....
    I'm new at this... I've set up a 2-node cluster using VMware that works fine... 11g.
    I'm planning to move our production database, which is currently 10g...
    Here's the question: is it possible to run a 10g instance on 11g RAC, without an upgrade?
    Is 11g able to run 10g instances?

    Your instance is tied to your Oracle database software: you install the Oracle software first, then create a database.
    If you upgrade your database to 11g, then your instance will be 11g. The instance is just the Oracle memory structures and background processes; what we actually upgrade are the Oracle binaries and the Oracle database. Note that 11g Clusterware can manage a database running from a 10g home (the Clusterware version must be equal to or higher than the database version), so you can run your 10g database under 11g RAC without upgrading the database itself.
    Regards
    Asif Kabir

  • Using dbca to extend RAC cluster error

    Hi all,
    I'm trying to extend my 11gR2 RAC cluster (POC) using the Oracle documentation (http://vishalgupta.com/oracle/docs/Database11.2/rac.112/e10718/adddelunix.htm). I've already cloned and extended Clusterware and ASM (Grid Infrastructure) to the new node, as well as cloned the RAC database software to the new node. When I run the statement below to have dbca add a new instance on the node for the RAC, I get the error shown:
    CMD:
    $ORACLE_HOME/bin/dbca -silent -addInstance -nodeList newnode13 -gdbName racdb -instanceName racdb4 -sysDBAUserName sys -sysDBAPassword manager123
    ERROR:
    cat racdb0.log
    "Adding instance" operation on the admin managed database racdb requires instance configured on local node. There is no instance configured on the local node "newnode13".
    I set ORACLE_HOME before running dbca, and I've also tried setting ORACLE_SID to both racdb4 and racdb, no change. My environment is below, any help is appreciated.
    OS: SLES 11.1
    Database: 11.2.0.1
    Existing Nodes: node01,node02, node03
    New Node: newnode13
    DB Name: racdb
    Instances: racdb1, racdb2, racdb3
    New Instance: racdb4
    Thanks.

    Silly me, I was running the command from the new node instead of an existing node. I guess it was a rough weekend after all. Thanks all!

  • ODBC RAC configuration question

    Hello, I'm looking into doing some ODBC load-balancing testing on Oracle RAC, but I thought I would ask to see if anyone knew of any potential problems before I get too involved in the setup.
    Is load balancing configured more on the ODBC side or on the RAC listener side?
    Thanks,
    J

    Hi,
    I think you may not need Oracle RAC for this purpose. Hardware clustering might be sufficient.
    You will need expert advice (experts on the OS, network, hardware, and Oracle), but here are some inputs based on my experience with 8i for a similar purpose:
    - 2 or more nodes interconnected to each other and all connected to a disk array
    - an OS clustering software
    - OS clustering configured for failover
    - Oracle installed on both nodes
    - the database installed on the disk array
    - the Oracle service configured for failover on the cluster
    - Oracle configured for automatic startup on one node
    - startup not automatic on node 2
    - in case of failure, the Oracle instance is started on the second node by the clustering software
    HTH
    Regards,
    Badri.
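    The active-passive arrangement Badri lists above can be sketched as a tiny failover decision. This is a hypothetical illustration of what the clustering software does (not any real cluster product's API): the Oracle service runs on one node at a time, and on failure the cluster starts it on the surviving node.

```python
# Minimal sketch of active-passive failover (hypothetical model,
# not any real OS clustering product). The Oracle service runs on
# one node at a time; on failure it is started on another node.

def failover(active_node, healthy_nodes):
    """Return the node that should run the service.

    Keep the current active node if it is healthy; otherwise
    pick the first surviving node, or None if all are down.
    """
    if active_node in healthy_nodes:
        return active_node
    return healthy_nodes[0] if healthy_nodes else None

# Normal operation: node1 is active, node2 is a passive standby.
print(failover("node1", ["node1", "node2"]))  # node1 keeps the service

# node1 fails: the clustering software starts Oracle on node2.
print(failover("node1", ["node2"]))           # node2 takes over
```

    The key contrast with RAC is that here only one instance is ever open; the standby node contributes nothing until a failover occurs.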

  • ORACLE RAC failure scenarios

    Hello,
    We have heard all the good points about RAC, and many of them are true, but I want real experience of cases where even a well-configured RAC failed and had unplanned downtime.
    Can anyone describe failure scenarios as well? I understand the very basic ones, for example interconnect failure, SAN failure, etc., but please share some real-life experience where even Oracle Customer Service took not just hours but days to resolve the problem and simply termed the problem a bug.
    Thanks,
    S.Mann

    I agree with Andreas and I think it's important to point out that the issues he mentioned (networking issues as well as other communication problems) are typically more common when RAC is deployed on a platform that isn't completely familiar to the implementor. That is, if you run Oracle on Windows servers, then deploying RAC on Linux successfully will probably be difficult.
    My standard answer for "what's the best platform for RAC?" is to run RAC on the platform that you know the most about. When you're building a system to house your most critical applications, wouldn't you want to build it on the platform that you know the most about?

  • 11g R1 Rac Installation Question

    Hi Experts
    I hope you can help me.
    I have a production database; my manager asked me to take the test server and build an Oracle RAC together with the production server.
    Can I install RAC 11g R1 on only one server (the test server) and then include the second server (production) later?
    Which is the best way to do this?
    Thank you in advance
    J.A.

    Based on what you've posted, take management out behind the woodshed for some serious re-education. It is likely you cannot do RAC on your current hardware. It is highly probable you should not.
    Here are some questions to determine what you should/should not do.
    1. What is your shared storage solution? The possible answers are NetApp NFS, SAN with RAW or ASM, a clustered file system such as OCFS2.
    2. What is your cache fusion interconnection solution? Will your NIC cards and switches support jumbo frames?
    3. What operating system? If Windows I'd recommend not going forward even though it is supported.
    If this is your first time trying to build a RAC cluster I would advise not doing it with a single server and thinking you are going to magically add a second server later. Doing it this way is not a good exercise for someone doing this for the first time.
    For 95+% of organizations ... Data Guard should come before RAC ... this is, of course, not true for people on commission or being paid by the hour.

  • Recovery from failure question

    In order to use two racks of servers (in our case located in separate buildings) and make sure that the primary and backup (backup count 1) of each partition are allocated to nodes in different racks, I have understood that there is a "machine id" that can be assigned to each node.
    As far as I understand it, the cluster (when configured this way) will survive the failure of one rack (as long as no more failures occur before the other rack is online again), i.e. it will execute in a degraded mode. I have two questions about this:
    1. Let's say that one rack loses connection with the rest of the network long enough for the cluster to consider the other rack to be down. This will happen on both racks. What happens when the racks can communicate once again? Will they automatically sort out the "split brain" situation?
    2. Let's assume that rack A is hit by a power spike that causes the switch and all the servers except one to reboot. This will totally cut the failed rack off from the other rack while the switch reboots, and the nodes in rack B will consider all nodes in rack A to be down. After the switch has rebooted, the nodes on the one still-working server in rack A will be able to communicate with the nodes in rack B again. My question is whether Coherence in this situation will try to create backups for all partitions of rack B on the nodes of the single available server in rack A. If so, the nodes on that server will most likely quickly run out of heap. What will happen in this situation? Will Coherence retry rebalancing when more nodes join the cluster, or will the cluster crash?
         Best Regards
         Magnus

    The second answer makes me a bit confused - two follow-up questions:
    > 2A - If I use machine id 0 for N servers in rack A and machine id 1 for N servers in rack B, I had the impression that primary and backup could NEVER end up in the same rack (even after failures - i.e. the system would run with only primaries if all nodes with, say, machine id 1 went down) - is this not the way it works?
    Never is not exactly true; it holds only if it is theoretically possible to fit all the data on distinct nodes. For example, if you have two boxes and the cluster nodes on one box do not provide enough capacity to hold an entire copy of the data set, then, since both copies still have to exist, some of the duplicate data will inevitably end up on the nodes of the larger box, as it cannot fit in the smaller one. Also be aware that data distribution is done with the granularity of partitions. You cannot distribute a single partition to two nodes, so too uneven a partitioning of the data set can lead to problems.
    > 2B - My concern was that in the case of failures (or, as I stated in my question, during recovery from failures) the number of physical servers (and therefore nodes!) with machine id zero and machine id one may be different (it seems quite hard to make sure this can't happen!), and in this case (say, for instance, only one out of N servers with machine id zero comes up immediately, and the other N - 1 come up much later because they were forced to do a disk check or required operator intervention to start after a failure) the memory of the single server (heap space of its nodes) will not be enough to accommodate backups for all the primaries on all the N servers with the other machine id.
    While nodes are being lost and immediately afterwards (when their death is detected), partition backups will be promoted to primary copies. This might cause OutOfMemoryErrors in those nodes. I do not know if Coherence is able to recover from that (theoretically it should be possible to drop the newly created object references to get back to the state before the attempt to promote the partition backup to a primary partition).
    So while nodes are dying you can either temporarily or permanently lose access to data whose primaries were on the boxes which died, if the reconstruction from the backup caused an OutOfMemoryError. If the Coherence JVM is not able to recover from that OutOfMemoryError, then you might lose other primaries which reside on this cluster node and have a ripple effect because of it.
    You can reduce the risk of such events by having more than two racks (if multiple nodes in a rack can fail together), and also by sizing the cluster JVMs to have lower memory utilization from cached data. This way a rack failure will, for one, lose fewer primaries and, for two, leave more free memory to reconstruct data in. You can read the Production Checklist for some information on sizing the cluster.
    Whatever happens, I don't expect that you will lose data due to OutOfMemoryErrors when you do not lose a cluster node. So after that blackout on one of the racks, and after the cluster has reached a quiescent state as far as redistribution of partitions is concerned, you will not lose data just because you added back a node (I expect a copy of a backup is not dropped before the safe reception of the backup at the transfer destination is acknowledged).
    If there are not enough cluster nodes running on one of the racks to accommodate the entire data set, your data will not be balanced in a way that copies of the same data are fully on different racks, but they will still reside on separate cluster nodes. When more nodes are started, they will gradually be rebalanced to hold more data on separate racks.
         Best regards,
         Robert
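    Robert's point about rack-aware placement degrading gracefully can be sketched as follows. This is a simplified, hypothetical model of the placement policy (not Coherence's actual partition assignment code): a backup prefers a node with a different machine id, and falls back to another node in the same rack only when the other rack has no nodes available.

```python
# Simplified sketch of rack-aware backup placement (hypothetical
# model, not Coherence's actual partitioning algorithm). Each node
# has a machine id (rack); a partition's backup prefers a node on
# a different rack and falls back to the same rack only if it must.

def place_backup(primary_node, nodes):
    """Pick a backup node for a partition whose primary is primary_node.

    nodes: dict mapping node name -> machine id (rack).
    Prefer a node on a different rack; otherwise any other node.
    """
    primary_rack = nodes[primary_node]
    other_rack = [n for n in nodes if nodes[n] != primary_rack]
    same_rack = [n for n in nodes if n != primary_node and nodes[n] == primary_rack]
    if other_rack:
        return other_rack[0]
    return same_rack[0] if same_rack else None  # no backup possible

# Both racks available: the backup lands on the other rack.
nodes = {"a1": 0, "a2": 0, "b1": 1, "b2": 1}
print(place_backup("a1", nodes))  # a node in rack 1

# Rack B entirely down: the backup degrades to the same rack,
# but still on a separate node - exactly the behavior described above.
nodes_degraded = {"a1": 0, "a2": 0}
print(place_backup("a1", nodes_degraded))  # "a2"
```

    Once nodes on the other rack rejoin, a rebalancing pass using the same preference would gradually move backups back onto separate racks.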

  • RAC newbie question

    Hi, all:
    I have a question on RAC. When an application wants to connect to an instance, usually you need to know the port, host, and SID/service name. When you have a RAC database, you have multiple nodes. Where are this port and host? Basically, where is the listener?
    I am trying to convince my org to go with RAC. One of the developers was asking how an application developer can take advantage of RAC's failover capability. I am thinking that, logically, as a developer you should not need to worry about this kind of stuff: when the app connects to the database, RAC should take care of the transaction failover. There is nothing the developer should know or do. Then the question goes back to how the app connects to the database, i.e. where the listener is.
    thanks
    jw

    jw,
    How does an application connect to a single-instance server? An application connects to RAC instances in the same way. With RAC, multiple listeners on multiple nodes are configured to handle client connection requests for the same database services (so, to the end user or application, it is transparent which node it gets connected to). A multiple-listener configuration enables the application to benefit from the following failover and load-balancing features:
    1. Client-side connect-time load balancing
    2. Client-side connect-time failover
    3. Server-side connect-time load balancing
    All of the above can be implemented either one by one or in combination with each other. So if you are using connection pooling, you can quickly benefit from this and distribute the client work requests across the pool of connections available to the application (using an Oracle JDBC or .NET connection pool).
    You should look at an example of each of the above three and how it's done.
    Here is an example of client-side connect-time load balancing (tnsnames.ora); you can find the rest in the Oracle docs and on the web:
    ODS =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (LOAD_BALANCE = ON)
          (ADDRESS = (PROTOCOL = TCP)(HOST = HOSTA)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = HOSTB)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = ODS)
        )
      )
    Regards
