Oracle10g RAC Cluster Interconnect issues

Hello Everybody,
Just a brief overview of what I am currently doing. I have installed an Oracle10g RAC database on a cluster of two Windows 2000 AS nodes. The two nodes access an external SCSI hard disk, and I have used Oracle Cluster File System.
Currently I am facing some performance issues when it comes to balancing the workload across both nodes (a single-instance database load is faster than a parallel load using two database instances).
I suspect the performance issues are due to IPC going over the public Ethernet IP instead of the private interconnect.
(During a parallel load, a large number of packets are sent over the public IP rather than the private interconnect.)
How can I be sure that the private interconnect, and not the public IP, is used for transferring cluster traffic? (Oracle states that for an Oracle10g RAC database, the private IP should be used for the heartbeat as well as for cluster traffic.)
Thanks in advance,
Regards,
Salil

You'll find the answers here:
RAC: Frequently Asked Questions
Doc ID: NOTE:220970.1
For one thing, a crossover-cable interconnect is completely unsupported.
Werner
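A quick way to verify which interface the instances are actually using is the cluster interconnects view (a sketch, assuming 10g or later; run from any instance):
SQL> SELECT inst_id, name, ip_address, is_public
  2  FROM   gv$cluster_interconnects;
The IP address reported for each instance should be on the private subnet. Alternatively, "oradebug setmypid" followed by "oradebug ipc" dumps the interconnect details to a trace file in user_dump_dest. If the wrong interface shows up, the CLUSTER_INTERCONNECTS init parameter can be used to pin it - but check the FAQ note above before setting it.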

Similar Messages

  • Interesting load issue on a new 11.2.0.3.0 RAC cluster

    Hi All,
    This is for a two node 11.2.0.3.0 Std RAC cluster running RHEL 5.4 x64.
    I've built a good few RAC clusters before, and this is a new issue in 11.2.0.3.0 (I haven't seen it in 10.2.0.4/5, 11.1.0.6/7, or 11.2.0.1/2). What I've noticed is that the grid infrastructure processes are "busier" on both nodes than they were in previous releases. These include, but are not limited to, ocssd.bin, gipcd.bin, and oraagent.bin.
    Load isn't "high", but the database isn't in use and the load on the server is sitting at around 1.05, whereas on other idle clusters it would be a quarter of that, on average. Has anyone else observed this behavior? If possible, provide a MOS article.
    If not, I will escalate this to Oracle and see what they say.
    Thanks.

    It seems the grid processes in 11g are not fully optimized, so a somewhat higher baseline load can be expected even when there is no 'real' workload on the system.
    A few months ago we had a similar situation on an 11.2.0.2 two-node RAC.
    On one node the grid user (the eons resource) was generating high CPU usage, pushing the OS load to between 3 and 4, even though the database was almost completely inactive and no software other than Oracle was installed on the nodes. On the second node the load at the same time varied between 0.2 and 0.5.
    I resolved it by stopping and starting the eons resource on the overloaded node.
    There are several articles on support.oracle.com about similar situations with grid.
    Some of them are:
    High Resource Usage by 11.2.0.1 EONS [ID 1062675.1]
    Bug 9378784: EONS HIGH RESOURCE USAGE
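    For reference, bouncing the eons resource was nothing more exotic than this (a sketch; on our system it was an ohasd-managed resource, hence the -init flag, and the resource name can differ between 11.2 patch levels):
    # run as the grid infrastructure owner on the overloaded node
    $ crsctl status res ora.eons -init
    $ crsctl stop res ora.eons -init
    $ crsctl start res ora.eons -init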

  • Solaris cluster, Oracle10g RAC

    I just want to understand which combination is more favorable and most popular among Sun customers for Oracle10g RAC:
    1) Solaris cluster + VERITAS Storage Foundation
    2) Solaris cluster + QFS

    Please refer to http://docs.sun.com/app/docs/doc/820-2574/fmnyo?a=view for the supported options for Oracle RAC data storage. This is important because if you stray outside these you will not be on a jointly Sun/Oracle supported configuration.
    Therefore, if you want to put Oracle RAC (tablespace) data files on a cluster file system, you must use shared QFS as that is the only supported option open to you. Furthermore, you can only run sQFS on top of SVM/Oban or h/w RAID - we do not support running it on VxVM/CVM.
    Regards,
    Tim
    ---

  • Oracle10g RAC with ASM for stretch cluster

    Assuming suitable network between sites is in place for RAC interconnect (e.g. dark fibre / DWDM), does it make sense (or is it possible) to stretch a RAC cluster across 2 sites, using ASM to mirror database files between SAN storage devices located at each site? The idea being to combine local high availability with a disaster recovery capability, using hardware that is all active during normal operation (rather than say a single RAC cluster on one site with Data Guard to transport data to the other site for DR).
    Or, for a stretch cluster, would SAN- or OS-implemented remote mirroring be a better idea? (I'd have thought this is likely to incur even more network overhead than ASM, but that might be dependent on individual vendors' implementations.)
    Any thoughts welcome!
    Rob

    Please refer to the thread Re: 11GR2 ASM in non-rac node not starting... failing with error ORA-29701
    and this doc http://docs.oracle.com/cd/E11882_01/install.112/e24616/presolar.htm#CHDHAAHE
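    On the ASM side, the usual stretch-cluster pattern is a NORMAL redundancy disk group with one failure group per site, plus preferred local reads (11g onwards). A rough sketch - the disk paths, failure group names and ASM SIDs below are made up:
    SQL> CREATE DISKGROUP data NORMAL REDUNDANCY
           FAILGROUP site_a DISK '/dev/rdsk/siteA_lun1', '/dev/rdsk/siteA_lun2'
           FAILGROUP site_b DISK '/dev/rdsk/siteB_lun1', '/dev/rdsk/siteB_lun2';
    SQL> -- each ASM instance prefers reads from its local site's failure group
    SQL> ALTER SYSTEM SET asm_preferred_read_failure_groups='DATA.SITE_A' SID='+ASM1';
    SQL> ALTER SYSTEM SET asm_preferred_read_failure_groups='DATA.SITE_B' SID='+ASM2';
    Whether ASM mirroring or array-based remote mirroring costs more on the inter-site network really does depend on the vendor implementation, as you suspect.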

  • RAC Private interconnect redundancy

    Hello All,
    We are designing (implementation will come later) a 2-node RAC database with 12.1.0.2 Grid Infrastructure and 11.2.0.4 RDBMS software.
    We want to make the private interconnect redundant, but the sysadmin cannot provide two channels of the same bandwidth; he is offering two NICs, one 10GbE (Gigabit Ethernet) and one 1GbE.
    I understand that 1GbE is generally sufficient for GES and GCS traffic, but will this architecture work properly? Is there any harm in having two channels of different bandwidths? And if the 10GbE interface fails, there will presumably be a performance degradation.
    Thanks,
    Hemant.

    DO NOT use two different network bandwidths for your Cluster Interconnect. With two physical NICs, you will either resort to NIC bonding or use HAIP, the latter being Oracle's recommendation since you are on 12c. In either case, both NICs will be used equally, which means some traffic on the private network will be 'slower' than the rest. You run a real risk of performance issues with this configuration.
    Also...there are two reasons for implementing multiple NICs for the Cluster Interconnect, performance and high availability. I've addressed performance above. On the HA side, dual NICs mean that if one channel goes down, the other channel is available and the cluster can stay operational. There is a law of the universe that says if you have 10gE on one side and 1gE on the other side, you have a 99% chance that if one channel goes down, it will be the 10gE one.   Which means you may not have enough bandwidth on the remaining channel.
    Cheers,
    Brian
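    For what it is worth, if you do end up with two NICs of equal bandwidth, HAIP only needs both registered as cluster_interconnect interfaces - something along these lines (interface names and subnets are made up; run as the grid owner):
    $ oifcfg getif
    eth1  192.168.10.0  global  cluster_interconnect
    $ oifcfg setif -global eth2/192.168.11.0:cluster_interconnect
    After the clusterware is restarted, both interfaces should carry HAIP addresses (visible under the ora.cluster_interconnect.haip resource and in gv$cluster_interconnects).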

  • Oracle10g RAC vs Oracle9i RAC with WebLogic

    Hello, I am planning to install Oracle RAC on Red Hat Linux AS 3.0, with BEA WebLogic as the application server.
    What would you choose as the back end, Oracle9i RAC or Oracle10g RAC, and why?
    Also, which release/patch level (service pack?) of Linux AS should be used?
    EMC storage disk arrays would be used. Is anyone aware of any I/O issues between Linux AS and EMC disks?
    Thank you for your thoughts.
    Regards,
    Tom

    Sun would not recommend using Oracle 10g RAC without Sun Cluster. Setup for that configuration would be found in the Sun Cluster Data Service for Oracle 10g RAC and the Oracle installation manuals.
    The installation of Oracle Clusterware alone would be documented in Oracle's manuals. If you have any problems with that, you are better off asking Oracle.
    Regards,
    Tim
    ---

  • Cluster interconnect

    We have a 3-node RAC cluster on 10.2.0.3. The sysadmin is gearing up to change the
    1Gb interconnect to a 10Gb interconnect. I am just trying to find out whether there is anything we need to prepare for from the database/cluster point of view.
    Thanks

    riyaj wrote:
    But, if the protocol is not RDS, then the path becomes udp -> ip -> IPoIB -> HCA. Clearly, there is an additional layer, IPoIB. Considering that most latency is at the software layer level, not in the hardware layer, I am not sure an additional layer will improve the latency.
    Perhaps when one compares 10GigE with 10Gb IB... but that would be comparing new Ethernet technology with older IB technology. QDR (40Gb) has been pretty much the standard for IB for some years now.
    Originally we compared 1GigE with 10Gb IB, as 10GigE was not available. IPoIB was a lot faster on SDR IB than 1GigE.
    When 10GigE was released, it was pretty expensive (not sure if this is still the case). A 10Gb Ethernet port was more than 1.5x the cost of 40Gb IB port.
    IB also supports a direct socket (or something) for IP applications. As I understand, this simplifies the call interface and allows socket calls to be made with less latency (surpassing that of the socket interface of a standard IP stack on Ethernet). We never looked at this ourselves as our Interconnect using IB was pretty robust and performant using standard IPoIB.
    "Further, InfiniBand has data center implications. In huge companies, this is a problem: a separate InfiniBand architecture is needed to support the InfiniBand network, which is not exactly a mundane task. With 10Gb NIC cards, the existing network infrastructure can be used as long as the switch supports 10Gb traffic."
    True... but I see that more as resistance to new technology, and even the network vendor used (do not have to name names, do I?) will specifically slam IB technology as they do not supply IB kit. A pure profit and territory issue.
    All the resistance I've ever seen and responded to regarding IB versus Ethernet has been pretty much unwarranted - to the extent of seeing RACs being built with a 100Mb Ethernet interconnect because IB was a "foreign" technology and equated to evil/do not use/complex/unstable/etc.
    Another issue to keep in mind is that IB is a fabric layer. SRP scales and performs better than using fibre channel technology and protocols. So IB is not only suited as Interconnect, but also as the storage fabric layer. (Exadata pretty much proved that point).
    Some months ago OFED announced that the SRP specs have been made available to Ethernet vendors to implement (as there is nothing equivalent on Ethernet). Unfortunately, a 10Gig Ethernet SRP implementation will still lack in comparison with a QDR IB SRP implementation.
    "Third point, the skill set on the system admin side needs further adjustment to support InfiniBand hardware effectively."
    An important point. But it is not that difficult for a sysadmin to acquire the basic set of skills to manage IB from an O/S perspective. Likewise, it is not that difficult for a network engineer to acquire the basic skills for managing the switch and fabric layer.
    The one issue that I think is the single biggest negative in terms of using IB is getting a stable OFED driver stack running in the kernel. Some of the older versions were not that stable. However, later versions have improved considerably and the current version seems pretty robust. Oh yeah - this is specifically for SRP. IPoIB, bonding and so on have always worked pretty well. RDMA and SRP were not always that stable with the v1.3 drivers and earlier.
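    Coming back to the original question: from the database side there is little to change, but two things worth checking when the interconnect is swapped are the UDP socket buffer limits on each node and that the new device name and MTU are identical everywhere. A sketch for Linux - eth2 is a placeholder, and the required values should be taken from the installation guide for your release:
    $ sysctl net.core.rmem_max net.core.rmem_default net.core.wmem_max net.core.wmem_default
    $ /sbin/ip link show eth2     # same device name and MTU expected on every node
    Afterwards, confirm which interface the instances actually picked up (oifcfg getif, or the gv$cluster_interconnects view).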

  • Cluster Interconnect droped packets.

    Hi,
    We have a 4 node RAC cluster 10.2.0.3 that is seeing some reboot issues that seem to be network related. The network statistics are showing dropped packets across the interconnect (bond1,eth2). Is this normal behavior due to using UDP?
    $ netstat -i
    Kernel Interface table
    Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
    bond0 1500 0 387000915 0 0 0 377153910 0 0 0 BMmRU
    bond1 1500 0 942586399 0 2450416 0 884471536 0 0 0 BMmRU
    eth0 1500 0 386954905 0 0 0 377153910 0 0 0 BMsRU
    eth1 1500 0 46010 0 0 0 0 0 0 0 BMsRU
    eth2 1500 0 942583215 0 2450416 0 884471536 0 0 0 BMsRU
    eth3 1500 0 3184 0 0 0 0 0 0 0 BMsRU
    lo 16436 0 1048410 0 0 0 1048410 0 0 0 LRU
    Thanks

    Hi,
    To diagnose the reboot issues, refer to *Troubleshooting 10g and 11.1 Clusterware Reboots [ID 265769.1]*.
    Also monitor your lost blocks: *gc lost blocks diagnostics [ID 563566.1]*.
    I had an issue that turned out to be network-card related (gc lost blocks): http://www.asanga-pradeep.blogspot.com/2011/05/gathering-stats-for-gc-lost-blocks.html
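    To see whether those drops are actually hurting the database, the statistics behind note 563566.1 can be sampled per instance - a quick sketch:
    SQL> SELECT inst_id, name, value
         FROM   gv$sysstat
         WHERE  name IN ('gc blocks lost', 'gc blocks corrupt');
    A steadily increasing 'gc blocks lost' count usually points at the interconnect itself (NIC, switch or UDP buffer sizing) rather than at normal UDP behaviour.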

  • Aggregates, VLAN's, Jumbo-Frames and cluster interconnect opinions

    Hi All,
    I'm reviewing my options for a new cluster configuration and would like the opinions of people with more expertise than myself out there.
    What I have in mind as follows:
    2 x X4170 servers with 8 x NIC's in each.
    On each 4170 I was going to configure 2 aggregates with 3 nics in each aggregate as follows
    igb0 device in aggr1
    igb1 device in aggr1
    igb2 device in aggr1
    igb3 stand-alone device for iSCSI network
    e1000g0 device in aggr2
    e1000g1 device in aggr2
    e1000g2 device in aggr3
    e1000g3 stand-alone device for iSCSI network
    Now, on top of these aggregates, I was planning on creating VLAN interfaces which will allow me to connect to our two "public" network segments and for the cluster heartbeat network.
    I was then going to configure the VLANs in an IPMP group for failover. I know there are some questions around that configuration, in the sense that IPMP will not detect a failure if a NIC goes offline within the aggregate, but I could monitor that in a different manner.
    At this point, my questions are:
    [1] Are VLANs, on top of aggregates, supported within Solaris Cluster? I've not seen anything in the documentation to say that they are, or are not for that matter. I do see that VLANs are supported, including support for cluster interconnects over VLANs.
    Now, with the stand-alone interfaces I want to enable jumbo frames, but I've noticed that the igb.conf file has a global setting for all NIC ports, whereas I can enable it for a single NIC port in the e1000g.conf kernel driver. My questions are as follows:
    [2] What is the general feeling about mixing MTU sizes on the same LAN/VLAN? I've seen some comments that this is not a good idea, and others say it doesn't cause a problem.
    [3] If the underlying NICs, igb0-2 (aggr1) for example, have a 9k MTU enabled, I can force the MTU size to 1500 for "normal" networks on the VLAN interfaces pointing to my "public" network and the cluster interconnect VLAN. Does anyone have experience of this causing any issues?
    Thanks in advance for all comments/suggestions.

    For 1) the question is really "Do I need to enable jumbo frames if I don't want to use them (on either the public or the private network)?" - the answer is no.
    For 2) each cluster needs to have its own separate set of VLANs.
    Greets
    Thorsten
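    For the aggregate/VLAN mechanics themselves (leaving the support question aside), the Solaris 10 commands would look roughly like this - a sketch only, where VLAN ID 123 and the address are invented and the VLAN interface name follows the VLAN*1000+key naming convention:
    # build aggr1 (key 1) from the three igb ports
    dladm create-aggr -d igb0 -d igb1 -d igb2 1
    # plumb a tagged VLAN 123 interface on top of the aggregate
    ifconfig aggr123001 plumb 192.168.20.11 netmask 255.255.255.0 up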

  • Multiple databases in one single RAC cluster

    Hi, I would like to know whether one can have multiple databases running on a single RAC cluster. We have several databases in our shop and would like to consolidate all of them into a single 3-4 node RAC cluster running both 10.2 and 11.1 databases.
    I am a newbie to RAC and would like some clarification from anyone who has done this; a Google search comes up with only a few hits on this topic, so apparently this is not commonly done.
    In our case we have one database supporting critical applications and a few others that are less critical but are used very extensively between 9 and 5. What is the use of RAC if I cannot consolidate all my databases into one cluster, or if I need a separate cluster for each of these critical databases?
    I have been through all the Oracle docs, which keep repeating "one database, multiple instances" and "one instance, one machine/node"; they don't even discuss running multiple instances on a single node.
    I appreciate any insight.
    Thanks.

    ora-sql-dba wrote:
    Can you give more details on how you would set up multiple databases running different versions on a single RAC cluster? I have yet to find any documentation that supports or even elaborates on this topic.
    You can configure a cluster with 12 nodes. Then, using dbca, configure a dev instance for nodes 1 and 2, a prod1 instance for nodes 3 to 6 and a prod2 instance for nodes 7 to 12.
    You also can configure each of these instances for all 12 nodes. And use it on all 12 nodes.
    Or, after configuring it for all 12 nodes, you can start the dev instance on nodes 1 and 2, prod1 on 3 - 6 and prod2 on the remaining nodes. If dev needs more power, you can for example shutdown prod2 on node 12 and start another dev instance there.
    My issue is with the 2nd option - running more than one instance on the same node or server. Why? Each instance has a basic resource footprint in terms of shared memory needed, system processes required (like the database writer, log writer and system monitor), etc. It does not make sense to pay for that same footprint more than once on a server. Each time you do, you have to reduce the amount of resources available to each instance.
    So instead of using (for example) 60% of that server's memory as the SGA for a single instance, if you use 2 instances on that server you now have to reduce the SGA of each to 30% of system memory. Effectively crippling those instances by 50% - they will now have smaller buffer caches, require more physical I/O and be more limited in what they can do.
    So unless you have very sound technical reasons for running more than one instance on a server (RAC or non-RAC), do not.
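    A rough srvctl illustration of that last point, with made-up database, instance and node names (admin-managed syntax as used in 10g/11.1):
    # free node12 by stopping the prod2 instance that runs there
    $ srvctl stop instance -d prod2 -i prod212
    # register and start an extra dev instance on node12 (if it was not already configured via dbca)
    $ srvctl add instance -d dev -i dev3 -n node12
    $ srvctl start instance -d dev -i dev3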

  • JVM patch required for DST on 10.2.0.2 RAC cluster

    I have looked all over the internet and Metalink for information regarding the JVM patching on a RAC cluster and haven't found anything useful, so I apologize if this question has already been asked multiple times. Also if there is a forum dedicated to DST issues, please point me in that direction.
    I have a 10.2.0.2 RAC cluster so I know I have to do the JVM patching required because of the DST changes. The README for 5075470 says to follow post-implementation steps in the fix5075470README.txt file. Step 3 of those instructions say to bounce the database, and then not allow the use of java until step 4 is complete (which is to run the fix5075470b.sql script).
    Here's my question: since this is a RAC database, does that mean I have to shutdown both instances, start them back up, run the script, and then let users log back in? IN OTHER WORDS, AN OUTAGE IS REQUIRED?
    Is there a way around having to take an outage? Can I bounce each instance separately (in a rolling fashion) so there's no outage, and then run the script even though users are logged on if I think java isn't being used by the application? Is there a way to confirm whether or not it's being used? If I confirm the application isn't using java, is it ok to run the script while users are logged on?
    Any insight would be greatly appreciated.
    Thanks,
    Susan

    According to Note 414309.1, USA 2007 DST Changes: Frequently Asked Questions and Problems for Oracle JVM Patches, question 4 ("Does the database need to be down before the OJVM patch is applied?"), the bounce is necessary. It says nothing about a rolling upgrade in RAC.
    You might file an SR asking if a rolling upgrade is possible.
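    On the "is Java actually being used" question, one rough check (not an official verification method) is to look for Java objects outside the Oracle-maintained schemas:
    SQL> SELECT owner, object_type, COUNT(*)
         FROM   dba_objects
         WHERE  object_type LIKE 'JAVA%'
         AND    owner NOT IN ('SYS', 'SYSTEM')   -- extend with any other Oracle-maintained schemas in your database
         GROUP BY owner, object_type;
    If nothing application-owned shows up and the application team confirms no stored Java is called, the risk of running the fix script with users connected is lower - but confirming the rolling question with Oracle via an SR, as suggested above, is still the safe route.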

  • How to check my RAC private interconnect working properly?

    All,
    Is there any way to check whether my RAC private interconnect is working properly or not?
    Thanks,
    Mahi

    Mahi wrote:
    All,
    Is there any way to check whether my RAC private interconnect is working properly or not?
    Thanks,
    Mahi
    CVU verifies the connectivity between all of the nodes in the cluster through those interfaces.
    $cluvfy comp nodecon -n all -verbose
    http://docs.oracle.com/cd/E11882_01/rac.112/e16794/cvu.htm

  • Is there a way to config WLS to fail over from a primary RAC cluster to a DR RAC cluster?

    Here's the situation:
    We have two Oracle RAC clusters, one in a primary site, and the other in a DR site
    Although they run active/active using some sort of replication (Oracle Streams? not sure), we are being asked to use only the one currently being used as the primary to prevent latency & conflict issues
    We are using this only for read-only queries.
    We are not concerned with XA
    We're using WebLogic 10.3.5 with MultiDatasources, using the Oracle Thin driver (non-XA for this use case) for instances
    I know how to set up MultiDatasources for an individual RAC cluster, and I have been doing that for years.
    Question:
    Is there a way to configure MultiDatasources (mDS) in WebLogic to allow for automatic failover between the two clusters, or does the app have to be coded to failover from an mDS that's not working to one that's working (with preference to a currently labelled "primary" site).
    Note:
    We still want to have load balancing across the current "primary" cluster's members
    Is there a "best practice" here?

    Hi Steve,
    There are two ways to connect WLS to an Oracle RAC:
    1. Use the Oracle RAC service URL, which contains the details of all the RAC nodes with their respective IP addresses and DNS names.
    2. Connect to the primary cluster as you are currently doing and use an MDS to load-balance/fail over between multiple nodes in the primary RAC (if applicable).
        In case of a primary RAC node failure and a switch to the DR RAC nodes, use WLST scripts to change the connection URL and restart the application to remove any old connections.
        Such DB failover tests can be conducted in a test/reference environment to set up the required log monitoring and subsequent steps, and to measure the timelines.
    Thanks,
    Souvik.
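    The WLST change mentioned in step 2 would look roughly like this (admin URL, credentials, datasource name and the DR connect string are all placeholders for your own domain):
    connect('weblogic', 'welcome1', 't3://adminhost:7001')
    edit()
    startEdit()
    # the JDBC system resource, its JDBCResource and the driver params typically share the datasource name
    cd('/JDBCSystemResources/MyDS/JDBCResource/MyDS/JDBCDriverParams/MyDS')
    cmo.setUrl('jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dr-scan)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=app_svc)))')
    save()
    activate()
    disconnect()
    After activating, restart or redeploy the application as described above so that stale pooled connections are dropped.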

  • Routing all connections through a one node in a 2 node RAC cluster

    Hi everyone
    My client has the following requirement: an active/active RAC cluster (e.g. node1/node2), but with only one of the nodes being used (node1) and the other sitting there just in case.
    For things like services, I'm sure this is straightforward enough - just have them set to preferred on node1 and available on node2.
    For connections, I imagine I would just list the VIPs in order in the tnsnames file, but with LOAD_BALANCE=OFF, so connections go through the TNS entries in order (i.e. node1, then node2); this would still allow the VIP to fail over if node1 is down.
    Does that sound about right? Have I missed anything?
    Many thanks
    Rup

    user573914 wrote:
    My client has the following requirement: an active/active RAC cluster (e.g. node1/node2), but with only one of the nodes being used (node1) and the other sitting there just in case.
    Why? What is the reason for a "just in case" node - and when and how is it "enabled" when that just-in-case situation occurs?
    This does not make any kind of sense from a high availability or redundancy view.
    "For connections, I imagine I would just list the VIPs in order in the tnsnames file, but with LOAD_BALANCE=OFF, so connections go through the TNS entries in order (i.e. node1, then node2); this would still allow the VIP to fail over if node1 is down. Does that sound about right? Have I missed anything?"
    This won't work on 10g - and may not work on 11g. The Listener can and does hand off connections, depending on what the TNS connection string says. If you connect not via a SID entry but via a SERVICE entry, and that service is available on multiple nodes, you may not (and often will not) be connected to an instance on the single IP that you used in your TNS connection.
    Basic example:
    // note that this TEST-RAC alias refers to a single specific IP of a cluster, and use
    // SERVICE_NAME as the request
    /home/billy> tnsping test-rac
    TNS Ping Utility for Linux: Version 10.2.0.1.0 - Production on 18-JAN-2011 09:06:33
    Copyright (c) 1997, 2005, Oracle.  All rights reserved.
    Used parameter files:
    /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/network/admin/sqlnet.ora
    Used TNSNAMES adapter to resolve the alias
    Attempting to contact (DESCRIPTION = (ADDRESS=(PROTOCOL=TCP)(HOST= 196.1.83.116)(PORT=1521)) (LOAD_BALANCE=no) (CONNECT_DATA=(SERVER=shared)(SERVICE_NAME=myservicename)))
    OK (50 msec)
    // now connecting to the cluster using this TEST-RAC TNS alias - and despite we listing a single
    // IP in our TNS connection, we are handed off to a different RAC node (as the service is available
    // on all nodes)
    // and this also happens despite our TNS connection explicitly requesting no load balancing
    /home/billy> sqlplus scott/tiger@test-rac
    SQL*Plus: Release 10.2.0.1.0 - Production on Tue Jan 18 09:06:38 2011
    Copyright (c) 1982, 2005, Oracle.  All rights reserved.
    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
    With the Partitioning, Real Application Clusters, Data Mining and Real Application Testing options
    SQL> !lsof -n -p $PPID | grep TCP
    sqlplus 5432 billy    8u  IPv4 2199967      0t0     TCP 10.251.93.58:33220->196.1.83.127:37031 (ESTABLISHED)
    SQL>
    So we connected to RAC node 196.1.83.116 - and that listener handed us off to RAC node 196.1.83.127. The 11gR2 Listener seems to behave differently - it does not do a handoff (from a quick test I did on an 11.2.0.1 RAC) in the above scenario.
    This issue aside - how do you deal with just-in-case situation? How do you get clients to connect to node 2 when node 1 is down? Do you rely on the virtual IP of node 1 to be switched to node 2? Is this a 100% safe and guaranteed method?
    It can take some time (minutes, perhaps more) for a virtual IP address to fail over to another node. During that time, any client connection using that virtual IP will fail. Is this acceptable?
    I dunno - I dislike your client's concept of treating one RAC node as some kind of standby database for a just-in-case situation. I fail to see any logic in that approach.
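    For completeness, the service side of the original question (preferred on node1, available on node2) is simple enough - a sketch with made-up database, service and instance names:
    $ srvctl add service -d orcl -s app_svc -r orcl1 -a orcl2
    $ srvctl start service -d orcl -s app_svc
    Clients then connect using SERVICE_NAME=app_svc; if node1 goes down, CRS starts the service on the available instance on node2 - but the hand-off and VIP failover caveats above still apply.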

  • Why do we use reverse proxy for Oracle RAC Cluster setup

    Hello All,
    I got this question lately: "Why do we use a reverse proxy for an Oracle RAC cluster setup?" I know we use a reverse proxy at the middleware level for various security reasons.
    Thanks..

    "why do we use reverse proxy for Oracle RAC Cluster setup".
    I wouldn't. I wouldn't use a proxy of any sort for the Cluster Interconnect for sure.
    Cheers,
    Brian
