Oracle RAC 2-node architecture: node 2 always gets evicted

Hi,
I have an Oracle RAC database with a simple 2-node architecture (hosts run RHEL 5.5 x86_64). The problem we are facing is that whenever there is a network failure on either node, node 2 always gets evicted (rebooted). We do not see any abnormal errors in the alert.log file on either node.
The steps followed and results are:
**Node-1#service network restart**
**Result: Node-2 evicted**
**Node-2# service network restart**
**Result: Node-2 evicted**
I would like to know why node 1 never gets evicted, even when the network is down or restarted on node 1 itself. Is this normal?
Regards,
Raj

Hi,
Please find the output below:
2011-06-03 16:36:02.817: [    CSSD][1216194880]clssnmPollingThread: node prddbs02 (2) at 50% heartbeat fatal, removal in 14.120 seconds
2011-06-03 16:36:02.817: [    CSSD][1216194880]clssnmPollingThread: node prddbs02 (2) is impending reconfig, flag 132108, misstime 15880
2011-06-03 16:36:02.817: [    CSSD][1216194880]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)
2011-06-03 16:36:05.994: [    CSSD][1132276032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 760 > margin 750 cur_ms 1480138014 lastalive 1480137254
2011-06-03 16:36:07.493: [    CSSD][1226684736]clssnmSendingThread: sending status msg to all nodes
2011-06-03 16:36:07.493: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
2011-06-03 16:36:08.084: [    CSSD][1132276032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 850 > margin 750 cur_ms 1480140104 lastalive 1480139254
2011-06-03 16:36:09.831: [    CSSD][1216194880]clssnmPollingThread: node prddbs02 (2) at 75% heartbeat fatal, removal in 7.110 seconds
2011-06-03 16:36:10.122: [    CSSD][1132276032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 880 > margin 750 cur_ms 1480142134 lastalive 1480141254
2011-06-03 16:36:11.112: [    CSSD][1132276032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 860 > margin 750 cur_ms 1480143124 lastalive 1480142264
2011-06-03 16:36:12.212: [    CSSD][1132276032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 950 > margin 750 cur_ms 1480144224 lastalive 1480143274
2011-06-03 16:36:12.487: [    CSSD][1226684736]clssnmSendingThread: sending status msg to all nodes
2011-06-03 16:36:12.487: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
2011-06-03 16:36:13.840: [    CSSD][1216194880]clssnmPollingThread: local diskTimeout set to 200000 ms, remote disk timeout set to 200000, impending reconfig status(0)
2011-06-03 16:36:14.881: [    CSSD][1205705024]clssgmTagize: version(1), type(13), tagizer(0x494dfe)
2011-06-03 16:36:14.881: [    CSSD][1205705024]clssgmHandleDataInvalid: grock HB+ASM, member 2 node 2, birth 21
2011-06-03 16:36:17.487: [    CSSD][1226684736]clssnmSendingThread: sending status msg to all nodes
2011-06-03 16:36:17.487: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
2011-06-03 16:36:22.486: [    CSSD][1226684736]clssnmSendingThread: sending status msg to all nodes
2011-06-03 16:36:22.486: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
2011-06-03 16:36:23.162: [ GIPCNET][1205705024]gipcmodNetworkProcessRecv: [network] failed recv attempt endp 0x2eb80c0 [0000000001fed69c] { gipcEndpoint : localAddr 'gipc://prddbs01:80b3-6853-187b-4d2e#192.168.7.1#33842', remoteAddr 'gipc://prddbs02:gm_prddbs-cluster#192.168.7.2#60074', numPend 4, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x1e10, pidPeer 0, flags 0x2616, usrFlags 0x0 }, req 0x2aaaac308bb0 [0000000001ff4b7d] { gipcReceiveRequest : peerName '', data 0x2aaaac2e3cd8, len 10240, olen 0, off 0, parentEndp 0x2eb80c0, ret gipc
2011-06-03 16:36:23.162: [ GIPCNET][1205705024]gipcmodNetworkProcessRecv: slos op : sgipcnTcpRecv
2011-06-03 16:36:23.162: [ GIPCNET][1205705024]gipcmodNetworkProcessRecv: slos dep : Connection reset by peer (104)
2011-06-03 16:36:23.162: [ GIPCNET][1205705024]gipcmodNetworkProcessRecv: slos loc : recv
2011-06-03 16:36:23.162: [ GIPCNET][1205705024]gipcmodNetworkProcessRecv: slos info: dwRet 4294967295, cookie 0x2aaaac308bb0
2011-06-03 16:36:23.162: [    CSSD][1205705024]clssgmeventhndlr: Disconnecting endp 0x1fed69c ninf 0x2aaab0000f90
2011-06-03 16:36:23.162: [    CSSD][1205705024]clssgmPeerDeactivate: node 2 (prddbs02), death 0, state 0x80000001 connstate 0x1e
2011-06-03 16:36:23.162: [GIPCXCPT][1205705024]gipcInternalDissociate: obj 0x2eb80c0 [0000000001fed69c] { gipcEndpoint : localAddr 'gipc://prddbs01:80b3-6853-187b-4d2e#192.168.7.1#33842', remoteAddr 'gipc://prddbs02:gm_prddbs-cluster#192.168.7.2#60074', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x1e10, pidPeer 0, flags 0x261e, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
2011-06-03 16:36:32.494: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
2011-06-03 16:36:37.493: [    CSSD][1226684736]clssnmSendingThread: sending status msg to all nodes
2011-06-03 16:36:37.494: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
2011-06-03 16:36:40.598: [    CSSD][1216194880]clssnmPollingThread: node prddbs02 (2) at 90% heartbeat fatal, removal in 2.870 seconds, seedhbimpd 1
2011-06-03 16:36:42.497: [    CSSD][1226684736]clssnmSendingThread: sending status msg to all nodes
2011-06-03 16:36:42.497: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
2011-06-03 16:36:43.476: [    CSSD][1216194880]clssnmPollingThread: Removal started for node prddbs02 (2), flags 0x20000, state 3, wt4c 0
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmDoSyncUpdate: Initiating sync 178830908
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssscUpdateEventValue: NMReconfigInProgress val 1, changes 57
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmDoSyncUpdate: local disk timeout set to 27000 ms, remote disk timeout set to 27000
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed.
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 178830908
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSetupAckWait: Ack message type (11)
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSetupAckWait: node(1) is ALIVE
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSendSync: syncSeqNo(178830908), indicating EXADATA fence initialization complete
2011-06-03 16:36:43.476: [    CSSD][1237174592]List of nodes that have ACKed my sync: NULL
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSendSync: syncSeqNo(178830908)
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmWaitForAcks: Ack message type(11), ackCount(1)
2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmHandleSync: Node prddbs01, number 1, is EXADATA fence capable
2011-06-03 16:36:43.476: [    CSSD][1247664448]clssscUpdateEventValue: NMReconfigInProgress val 1, changes 58
2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmHandleSync: local disk timeout set to 27000 ms, remote disk timeout set t:
2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmQueueClientEvent: Sending Event(2), type 2, incarn 178830907
2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmQueueClientEvent: Node[1] state = 3, birth = 178830889, unique = 1305623432
2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmQueueClientEvent: Node[2] state = 5, birth = 178830907, unique = 1307103307
2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmHandleSync: Acknowledging sync: src[1] srcName[prddbs01] seq[73] sync[178830908]
2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmSendAck: node 1, prddbs01, syncSeqNo(178830908) type(11)
2011-06-03 16:36:43.476: [    CSSD][1240850064]clssgmStartNMMon: node 1 active, birth 178830889
2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmHandleAck: src[1] dest[1] dom[0] seq[0] sync[178830908] type[11] ackCount(0)
2011-06-03 16:36:43.476: [    CSSD][1240850064]clssgmStartNMMon: node 2 active, birth 178830907
2011-06-03 16:36:43.476: [    CSSD][1240850064]NMEVENT_SUSPEND [00][00][00][06]
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSendSync: syncSeqNo(178830908), indicating EXADATA fence initialization complete
2011-06-03 16:36:43.476: [    CSSD][1240850064]clssgmUpdateEventValue: CmInfo State val 5, changes 190
2011-06-03 16:36:43.476: [    CSSD][1237174592]List of nodes that have ACKed my sync: 1
2011-06-03 16:36:43.476: [    CSSD][1240850064]clssgmSuspendAllGrocks: Issue SUSPEND
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmWaitForAcks: done, msg type(11)
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSetMinMaxVersion:node1 product/protocol (11.2/1.4)
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSetMinMaxVersion: properties common to all nodes: 1,2,3,4,5,6,7,8,9,10,11,12,13,14
2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSetMinMaxVersion: min product/protocol (11.2/1.4)
2011-06-03 16:36:43.476: [    CSSD][1240850064]clssgmQueueGrockEvent: groupName(IG+ASMSYS$USERS) count(2) master(1) event(2), incarn 22, mbrc 2, to member 1, events 0x0, state 0x0
2011-06-03 16:36:43.477: [    CSSD][1237174592]clssnmSetMinMaxVersion: max product/protocol (11.2/1.4)
2011-06-03 16:36:43.477: [    CSSD][1237174592]clssnmNeedConfReq: No configuration to change
etc.etc....
Let me know if any other log file is required. There are no unusual messages in /var/log/messages.
Regards,
Raj
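When reading a long ocssd.log excerpt like the one above, it can help to pull out only the missed-heartbeat warnings. The following is a minimal Python sketch, assuming the 11.2-style message layout shown in this post; the regular expression and the function name `heartbeat_warnings` are illustrative and not part of any Oracle tool.

```python
import re

# Assumed pattern for the 11.2-style warnings quoted in this thread, e.g.
# "...clssnmPollingThread: node prddbs02 (2) at 50% heartbeat fatal, removal in 14.120 seconds"
HEARTBEAT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):.*"
    r"clssnmPollingThread: node (?P<node>\S+) \((?P<num>\d+)\) "
    r"at (?P<pct>\d+)% heartbeat fatal, removal in (?P<secs>\d+\.\d+) seconds"
)


def heartbeat_warnings(lines):
    """Return (timestamp, node, percent, seconds_left) for each warning line."""
    found = []
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            found.append((match["ts"], match["node"],
                          int(match["pct"]), float(match["secs"])))
    return found


# Three lines copied from the log above; only two are heartbeat warnings.
sample = [
    "2011-06-03 16:36:02.817: [    CSSD][1216194880]clssnmPollingThread: node prddbs02 (2) at 50% heartbeat fatal, removal in 14.120 seconds",
    "2011-06-03 16:36:07.493: [    CSSD][1226684736]clssnmSendingThread: sending status msg to all nodes",
    "2011-06-03 16:36:09.831: [    CSSD][1216194880]clssnmPollingThread: node prddbs02 (2) at 75% heartbeat fatal, removal in 7.110 seconds",
]

for ts, node, pct, secs in heartbeat_warnings(sample):
    print(f"{ts}  {node}: {pct}% heartbeat fatal, {secs:.3f}s until removal")
```

Running the same filter over the ocssd.log of both nodes and comparing the timestamps shows which node stopped hearing the other first.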

Similar Messages

  • Oracle RAC Installation: Unix nodes, Windows ASM

    I have a question about configuring Oracle RAC. I have never done any RAC or ASM installation before. Might be a stupid question for some of you.
    Is it possible to install Oracle RAC using following options?
    2 node RAC using Sun Solaris
    Shared Storage using ASM in Windows server
    Any additional information that you can provide will be greatly appreciated.
    Thanks in advance

    Hi,
    First of all, do you have shared storage available to both Unix servers, or do you want to use the Windows server as the shared storage?
    From the documentation
    Single Instance and Clustered Environments:
    Each database server that has database files managed by ASM needs to be running an ASM instance. A single ASM instance can service one or more single-instance databases on a stand-alone server. Each ASM disk group can be shared among all the databases on the server. In a clustered environment, each node runs an ASM instance, and the ASM instances communicate with each other on a peer-to-peer basis.
    This means that you need an ASM instance on every server where you have a database instance.
    If you don't have shared storage for the Unix servers, you have two options: iSCSI or NFS. You can set up the Windows machine as an iSCSI server and both Unix machines as iSCSI clients; then you will have shared storage on both Unix machines. The other option is to configure the Windows machine as an NFS server and mount the NFS share on both Unix machines. Then you can place the data files directly on the NFS share (not supported) or create empty files using dd and use them as device files for ASM.
    For more information on ASM over NFS you can read Tim Hall's article:
    http://www.oracle-base.com/articles/linux/UsingNFSWithASM.php
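The "create empty files using dd" step can be sketched in Python as follows. This is only an illustration of producing zero-filled files of a fixed size; the helper name `create_zero_file` and the file name used in the demo are made up, and ownership, permissions and ASM disk discovery settings are not covered here.

```python
import os
import tempfile


def create_zero_file(path, size_mb, block_size=1024 * 1024):
    """Write size_mb blocks of zeros to path, similar in effect to
    dd if=/dev/zero of=path bs=1M count=size_mb. Returns the file size."""
    block = bytes(block_size)  # block_size zero bytes
    with open(path, "wb") as handle:
        for _ in range(size_mb):
            handle.write(block)
    return os.path.getsize(path)


# Demo in a temporary directory; on a real system the target would be a
# directory on the mounted NFS share (hypothetical, not shown here).
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "asm_disk1")
    size = create_zero_file(demo_path, 4)
    print(f"{demo_path}: {size} bytes")
```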
    Regards,
    Sve

  • Oracle RAC 11g R2. Node pinned or unpinned?

    Hi all, I'm working with Oracle Clusterware & RAC 11g R2.
    As this is my first time, I don't really understand what pinning or unpinning a node means.
    Can anyone help me please?
    Thanks in advance!!

    From the 11.2 RAC deployment guide:
    Pinning a node means that the association of a node name with a node number is fixed. If a node is not pinned, its node number may change if the lease expires while it is down. The lease of a pinned node never expires.
    Since your installation is a clean installation (no previous installation was done), you don't need to pin the nodes; it will be done by Oracle Clusterware.
    HTH
    Aman....

  • ORACLE RAC - 11G 2-NODE- OEL , asmdisk listdisk not showing disk in second node

    Hi,
    I am using VMware to install a 2-node RAC. asmdisk listdisk does not show the disk on the second node.
    My SCSI values are: hardware: 0:0
    hardware 2 = 1:1
    Please suggest: do I need to change the above parameter values?

    Please elaborate more on this issue. Is this related to EBS? What are you trying to accomplish? Are you following any link/doc?
    Thanks,
    Hussein

  • Oracle RAC 10.2G reboots node every 45 minutes

    Hello:
    - We have installed Oracle RAC 10.2G for Solaris X86 ( 64 bit ).
    - On one node, there are no issues. But the other node (I think) is being rebooted by CRS every 45 minutes or so.
    - Is this issue caused by some misconfiguration I did during the install ?
    - Or is there a patch available to fix this ?
    - Has anyone else encountered this problem ?
    Thanks
    jlem

    Hello:
    - I re-installed Oracle RAC. The nodes were only rebooted once so far.
    So, the second install may be ok. If not, I have provided answers to the first email reply.
    - Any help given is most welcome. In meantime, I will continue searching the oracle forums
    for solutions.
    - My environment is:
    - both nodes are running under vmware ESX server version 3.0.1
    - the shared storage for OCR and Voting Disk is a raw shared device under vmware
    - both nodes are using Solaris X86 5.10 update 5
    - Oracle version is: 10.2.0.3 ( patched from version 10.2.0.1 )
    - My public network configuration is:
    node 1:
    e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    inet 10.20.1.74 netmask ffff0000 broadcast 10.20.255.255
    ether 0:c:29:3a:45:a9
    e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    inet 10.20.1.77 netmask ffff0000 broadcast 10.20.255.255
    node 2:
    e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    inet 10.20.1.75 netmask ffff0000 broadcast 10.20.255.255
    ether 0:c:29:2b:db:90
    e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    inet 10.20.1.78 netmask ffff0000 broadcast 10.20.255.255
    - My private network configuration is:
    node 1:
    e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
    inet 192.168.0.1 netmask ffffff00 broadcast 192.168.0.255
    ether 0:c:29:3a:45:b3
    node 2:
    e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
    inet 192.168.0.2 netmask ffffff00 broadcast 192.168.0.255
    ether 0:c:29:2b:db:9a
    - My storage solution is:
    - 3 virtual shared SCSI hard disks ( each 500 MB in size )
    - My log files are:
    - /var/adm/messages
    - doesn't report much only the following:
    Nov 12 10:57:05 saucer nfs4cbd[328]: [ID 867284 daemon.notice] nfsv4 cannot determine local hostname binding for transport tcp6 - delegations will not be available on this transport
    Nov 12 10:57:21 saucer savecore: [ID 570001 auth.error] reboot after panic: forced crash dump initiated at user request
    Nov 12 10:57:21 saucer savecore: [ID 748169 auth.error] saving system crash dump in /var/crash/saucer/*.2
    Nov 12 10:57:41 saucer root: [ID 702911 user.error] Oracle Cluster Ready Services disabled by administrator.
    Nov 12 10:57:54 saucer rootnex: [ID 349649 kern.info] xsvc0 at root
    Nov 12 10:57:54 saucer genunix: [ID 936769 kern.info] xsvc0 is /xsvc
    - The ocssd.log file for node1 indicates that node2 was evicted after being flagged as "impending reconfig". Details are:
    [    CSSD]2008-11-12 10:55:43.700 [15] >TRACE: clssnmPollingThread: node saucer (2) is impending reconfig
    [    CSSD]2008-11-12 10:55:43.700 [15] >WARNING: clssnmPollingThread: node saucer (2) at 90% heartbeat fatal, eviction in 0.973 seconds
    [    CSSD]2008-11-12 10:55:44.679 [15] >TRACE: clssnmPollingThread: node saucer (2) is impending reconfig
    [    CSSD]2008-11-12 10:55:44.679 [15] >TRACE: clssnmPollingThread: Eviction started for node saucer (2), flags 0x000d, state 3, wt4c 0
    [    CSSD]2008-11-12 10:55:44.690 [17] >TRACE: clssnmDoSyncUpdate: Initiating sync 3
    [    CSSD]2008-11-12 10:55:44.690 [17] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (27000)ms
    [    CSSD]2008-11-12 10:55:44.691 [17] >TRACE: clssnmSetupAckWait: Ack message type (11)
    [    CSSD]2008-11-12 10:55:44.691 [17] >TRACE: clssnmSetupAckWait: node(1) is ALIVE
    [    CSSD]2008-11-12 10:55:44.691 [17] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
    [    CSSD]2008-11-12 10:55:44.691 [17] >TRACE: clssnmSendSync: syncSeqNo(3)
    - node2 ocssd.log does not indicate the problem. See below for details:
    [    CSSD]2008-11-12 10:52:34.731 [11] >TRACE: clssgmClientConnectMsg: Connect from con(da8410) proc(dab900) pid() proto(10:2:1:1)
    [    CSSD]2008-11-12 10:53:37.305 [11] >TRACE: clssgmClientConnectMsg: Connect from con(da8410) proc(dab900) pid() proto(10:2:1:1)
    [    CSSD]2008-11-12 10:54:40.515 [11] >TRACE: clssgmClientConnectMsg: Connect from con(da8410) proc(dab900) pid() proto(10:2:1:1)
    [    CSSD]2008-11-12 11:18:09.997 >USER: Oracle Database 10g CSS Release 10.2.0.3.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
    [    CSSD]2008-11-12 11:18:09.997 >USER: CSS daemon log for node saucer, number 2, in cluster crs
    [  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=saucerDBG_CSSD))
    [    CSSD]2008-11-12 11:18:10.016 [1] >TRACE: clssscmain: local-only set to false
    [    CSSD]2008-11-12 11:18:10.031 [1] >TRACE: clssnmReadNodeInfo: added node 1 (flying) to cluster
    [    CSSD]2008-11-12 11:18:10.042 [1] >TRACE: clssnmReadNodeInfo: added node 2 (saucer) to cluster
    [    CSSD]2008-11-12 11:18:10.057 [5] >TRACE: clssnm_skgxnmon: skgxn init failed
    [    CSSD]2008-11-12 11:18:10.057 [1] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
    - ORACLE VERIFY: cluvfy was run on node2 resulting with the following:
    bash-3.00$ ./cluvfy comp ocr -n all -verbose
    Verifying OCR integrity
    Checking OCR integrity...
    Checking the absence of a non-clustered configuration...
    All nodes free of non-clustered, local-only configurations.
    Uniqueness check for OCR device passed.
    Checking the version of OCR...
    OCR of correct Version "2" exists.
    Checking data integrity of OCR...
    Data integrity check for OCR passed.
    OCR integrity check passed.
    Verification of OCR integrity was successful.
    bash-3.00$
    Thanks
    jlem

  • Delete oracle RAC node - still appear on addNode script

    Hello ,
    I deleted one node from an Oracle RAC that had 3 nodes.
    I tried to re-install it, but it gives me the error below:
    The following error was encountered for new node . the node names rac3 are unique ...
    It still appears in the add node step! Why?
    [oracle@rac1 dev]$ /u01/crs/oracle/product/10.2.0/crs/bin/crs_stat -t
    Name           Type           Target    State     Host       
    ora....C1.inst application    ONLINE    OFFLINE              
    ora....C2.inst application    ONLINE    OFFLINE              
    ora.RAC.db     application    ONLINE    OFFLINE              
    ora....SM1.asm application    ONLINE    ONLINE    rac1       
    ora....C1.lsnr application    ONLINE    OFFLINE              
    ora.rac1.gsd   application    ONLINE    OFFLINE              
    ora.rac1.ons   application    ONLINE    ONLINE    rac1       
    ora.rac1.vip   application    ONLINE    ONLINE    rac1       
    ora....SM2.asm application    ONLINE    OFFLINE              
    ora....C2.lsnr application    ONLINE    OFFLINE              
    ora.rac2.gsd   application    ONLINE    OFFLINE              
    ora.rac2.ons   application    ONLINE    OFFLINE              
    ora.rac2.vip   application    ONLINE    ONLINE    rac1
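A listing like the one above is easier to scan if the rows whose Target is ONLINE but whose State is not are pulled out. A minimal Python sketch, assuming the whitespace-separated column layout of the `crs_stat -t` output shown in this post (the function name `offline_resources` is illustrative):

```python
def offline_resources(crs_stat_output):
    """Return names of resources whose Target is ONLINE but whose State is not."""
    problems = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a resource row.
        if len(fields) < 4 or fields[0] == "Name":
            continue
        name, _res_type, target, state = fields[:4]
        if target == "ONLINE" and state != "ONLINE":
            problems.append(name)
    return problems


# A few rows copied from the output above.
sample_output = """\
Name           Type           Target    State     Host
ora....C1.inst application    ONLINE    OFFLINE
ora.RAC.db     application    ONLINE    OFFLINE
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora.rac2.vip   application    ONLINE    ONLINE    rac1
"""

for resource in offline_resources(sample_output):
    print("target ONLINE but not running:", resource)
```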
    Can you please help me solve this issue?
    Thanks

    Hello ,
    Finally done. First I ran:
    /u01/crs/oracle/product/10.2.0/crs/install/rootdeletenode.sh rac3,3
    After that, the node no longer appeared in the output of:
    /u01/crs/oracle/product/10.2.0/crs/bin/olsnodes -n
    but the same error still appeared. So afterwards I ran:
    /u01/crs/oracle/product/10.2.0/crs/oui/bin/runInstaller -updateNodelist ORACLE_HOME=/u01/crs/oracle/product/10.2.0/crs/ "CLUSTER_NODES={rac1,rac2}" CRS=true
    locate racgons
    /u01/crs/oracle/product/10.2.0/crs/bin/racgons remove_config rac3
    Regards,
    Mohanad Awad

  • RAC on two nodes

    Hi,
    I need to implement Oracle RAC on two nodes. I am trying this for the first time, so I need some basic advice from you all about the basic requirements for installing RAC.
    Is any other software, besides the Oracle 11g setup, needed for installation?
    What are the basic things I should keep in mind before installing RAC?
    Thanks

    11g.DBA wrote:
    Need to implement Oracle RAC on two nodes.I am trying it first time so need some basic advice from you all regarding the basic requirements for installing RAC.
    Are there any more softwares,other than Oracle11g setup, needed for installation?
    On Linux, you need a certified distro. When installing Grid Infrastructure, it verifies the s/w components and lists the ones it needs (such as libaio, the GNU C compiler, etc.). yum install fixes that (on RHEL-based distros).
    What are the basic things I should keep in mind before installing RAC.
    RAC is only ever as good as the h/w infrastructure it runs on. Specifically the I/O fabric layer and the Interconnect layer. If these are shoddy, not redundant, slow in terms of bandwidth, or slow in terms of latency, then you WILL have a RAC that fails to perform, fails to scale, and is unable to provide high availability and redundancy.

  • Oracle RAC 10g - Application connect directly to database IP address

    Hi,
    I am a developer and do not have much knowledge of Oracle administration. Sorry if I don't use the terms correctly.
    We have a vendor application using Oracle RAC on two nodes (node1/vip1, node2/vip2). Our application was configured to use a JDBC connection string (not TNS; someone told me that's bad practice, but it's how our vendor's consultant set it up). The connection string is configured to point to the VIP1 and VIP2 hostnames.
    When I look at the list of connections using netstat, I see connections established to vip1's IP address as well as node2's IP address (no vip2 or node1 IP).
    1) Should the application connect only to the VIP IP addresses and not to the server IP addresses?
    2) Because our JDBC entry contains only the VIP1 and VIP2 hostnames, is it normal that the application can resolve the NODE2 IP address? (We looked at all of the application's configuration files, and we are sure that the application has no knowledge of the node1/node2 IP addresses or hostnames.)
    3) Is it normal for a VIP hostname to be resolved to the database server's IP address and not the VIP IP address?
    4) I read that a VIP address is mapped to the other node's MAC when its node fails. Could this happen because of a VIP misconfiguration?
    Edited by: user644523 on Aug 19, 2010 1:34 PM

    Hi buddy,
    When I look at the list of connections using netstat, I am seeing the connection was established to vip1's ip address as well as node2's ip address (no vip2 or node1 ip).
    Please show us what you are seeing.
    1) Should the application just only connect to VIP ip address and not to server ip address?
    Yes.
    2) Because our JDBC entry only contains VIP1 and VIP2 hostname, does it normal that the application can resolve the NODE2 ipaddress? ( we look on all application's configuration files, we are sure that application does not have knowledge of node1/node2 ip address or hostname)
    No, it's not. It should use the VIP up to release 11.1, and the SCAN on release 11.2.
    3) Is this a normal VIP hostname to be resolved to database ip address and not VIP ip address?
    No, it's not. It should be resolved to the IP address configured for the VIP.
    4) I read about VIP address that it will be mapped to the other node's MAC when the node fail, could this happen because VIP misconfiguration?
    We have to check that. Get the nodeapps config (ask the DBA for the output of "srvctl config nodeapps -n <nodename> -a") for all nodes and check whether the client machine is resolving the name to the right IP address. (A good start, I guess.)
    Regards,
    Cerreia
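The client-side half of that check (is the client machine resolving each VIP name to the right address?) could be sketched in Python as below. The hostnames and addresses are hypothetical placeholders; the expected values would come from the nodeapps configuration the DBA provides. The resolver is passed in as a parameter so the demo runs without touching real DNS.

```python
import socket


def check_vip_resolution(expected, resolve=socket.gethostbyname):
    """Return {hostname: what_it_resolved_to} for every name that does not
    resolve to its expected VIP address. An empty dict means all matched."""
    mismatches = {}
    for hostname, expected_ip in expected.items():
        try:
            actual = resolve(hostname)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            actual = f"unresolved ({exc})"
        if actual != expected_ip:
            mismatches[hostname] = actual
    return mismatches


# Hypothetical names and addresses, for illustration only.
expected_vips = {"node1-vip": "10.0.0.11", "node2-vip": "10.0.0.12"}

# Stand-in resolver for the demo: node2-vip wrongly resolves to a host IP.
fake_dns = {"node1-vip": "10.0.0.11", "node2-vip": "10.0.0.2"}
print(check_vip_resolution(expected_vips, resolve=fake_dns.__getitem__))
```

On the real client, calling `check_vip_resolution(expected_vips)` without the `resolve` argument uses the system resolver.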

  • Oracle Rac 11GR2 installation issues;

    Hi fellows. I wonder if you could help me with an issue:
    I have this Oracle RAC with 2 nodes, using OS Oracle Linux 6.3 with architecture x86_64.
    The installation was doing ok, but on "Prerequisite Checks" i had a problem:
    "Device Checks for ASM"
    Device Checks for ASM - This is a pre-check to verify if the specified devices meet the requirements for configuration through the Oracle Universal Storage Manager Configuration Assistant.  Error:
    "/dev/oracleasm/disks/OCR1" is not shared  - Cause: Cause Of Problem Not Available  - Action: User Action Not Available
      Check Failed on Nodes: [cmsora02,  cmsora01] 
    Verification result of failed node: cmsora02
    Details:
    PRVF-5149 : WARNING: Storage "/dev/oracleasm/disks/OCR1" is not shared on all nodes  - Cause:   - Action: 
    Verification result of failed node: cmsora01
    Details:
    PRVF-5149 : WARNING: Storage "/dev/oracleasm/disks/OCR1" is not shared on all nodes  - Cause:   - Action: 
    But the thing is, they are shared.
    Here is the evidence:
    [root@cmsora01:/root]$ oracleasm listdisks
    DAT1
    FRA1
    OCR1
    OCR2
    OCR3
    OCR4
    OCR5
    REDA1
    REDB1
    And
    [root@cmsora02:/root]$ oracleasm listdisks
    DAT1
    FRA1
    OCR1
    OCR2
    OCR3
    OCR4
    OCR5
    REDA1
    REDB1
    I don't know if this proves anything, but the disks are readable from both nodes.
    Has anyone run into this situation before?
    Any hint would be helpful.
    Thanks in Advance.
    Regards.
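The comparison of the two `oracleasm listdisks` outputs above can be done mechanically. A minimal Python sketch follows (the function name `missing_labels` is illustrative). Note the limitation: matching labels only show that each node sees disks with those names, not that both nodes see the same physical device, so this does not by itself explain or rule out the PRVF-5149 warning.

```python
def missing_labels(per_node):
    """Given {node: [labels...]}, return {node: [labels that other nodes
    list but this node does not]}. Nodes with nothing missing are omitted."""
    all_labels = set()
    for labels in per_node.values():
        all_labels.update(labels)
    result = {}
    for node, labels in per_node.items():
        missing = sorted(all_labels - set(labels))
        if missing:
            result[node] = missing
    return result


# Labels copied from the two listings above.
labels = ["DAT1", "FRA1", "OCR1", "OCR2", "OCR3", "OCR4", "OCR5", "REDA1", "REDB1"]
print(missing_labels({"cmsora01": labels, "cmsora02": list(labels)}))
```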

    Check the following links to see if they help you out:
    http://www.oracler.net/?p=66
    http://nnarimanov.blogspot.in/2012/11/asm-disk-permissions-in-rhel6.html

  • Oracle rac o2cb

    I am setting up Oracle RAC on 2 nodes.
    Red Hat 5.5 x64 on vSphere 4, ESX 4.0, FC disks.
    I have 1 OCR disk, 1 voting disk and 3 ASM disks, all on VMware raw device mappings.
    The OCR and voting disks are on OCFS2 block devices. I installed Grid Infrastructure, and the cluster is online and loaded.
    But I cannot stop the o2cb service.
    [root@rac1 ~]# /etc/init.d/o2cb stop
    Stopping O2CB cluster ocfs2: Failed
    Unable to stop cluster as heartbeat region still active
    [root@rac2 /]# /etc/init.d/o2cb stop
    Stopping O2CB cluster ocfs2: Failed
    Unable to stop cluster as heartbeat region still active
    Because of that I cannot power off my server. I run poweroff, but when the ocfs2 cluster fails to stop, the machine reboots instead.
    The question is: why can't I stop the o2cb service?

    [root@rac1 ~]# umount /ocr
    umount: /ocr: device is busy
    umount: /ocr: device is busy
    [root@rac1 ~]# umount /voting/
    umount: /voting: device is busy
    umount: /voting: device is busy
    [root@rac1 ~]# /etc/init.d/o2cb offline ocfs2
    Stopping O2CB cluster ocfs2: Failed
    Unable to stop cluster as heartbeat region still active
    [root@rac1 ~]# /etc/init.d/o2cb force-offline ocfs2
    Stopping O2CB cluster ocfs2: Failed
    Unable to stop cluster as heartbeat region still active
    [root@rac1 ~]# mounted.ocfs2 -d
    Device FS Stack UUID Label
    /dev/sdc1 ocfs2 o2cb DBB0548511E540649EF06D375F2B128B /OCR
    /dev/sdd1 ocfs2 o2cb 6955A6CFC0AB4926B85A6830C70344F0 /VOTING
    Still nothing.

  • Veritas SF Oracle RAC modules after applying a CRS 10g patchset

    Problem Description
    This note explains that, when configuring SF (Storage Foundation) Oracle RAC, the SF Oracle RAC Veritas libraries must be installed into ORACLE_HOME before applying an Oracle CRS patchset or similar.
    In addition, when the Oracle CRS patch version is raised, the required Veritas components such as the Veritas skgxp module may be left out of the installation process, or the Veritas library files may be overwritten, which leads to errors.
    With that in mind, this note looks at the causes of, and solutions for, errors that can occur when installing or upgrading the Oracle CRS stack after Veritas Storage Foundation has been installed.
    Error symptoms
    The following error occurs when the Veritas libraries are overwritten during installation of the CRS 10.2.0.3.0 patchset.
    Veritas clusterware 5.0 is not recognized by Oracle because the Veritas libraries were overwritten by the CRS 10.2.0.3.0 patchset installation.
    The cssd.log shows:
    [ CSSD]2007-11-08 03:28:02.603 [5] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
    [ CSSD]2007-11-08 03:28:02.603 [1] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
    [ CSSD]2007-11-08 03:28:02.604 [1] >TRACE: clssnmNMInitialize: misscount set to (30), impending reconfig threshold set to (26)
    The cssd.log should show this:
    [CSSD]2007-09-20 14:14:06.008 [5] >TRACE: clssnm_skgxninit: initialized skgxn version (2/0/Veritas Cluster Server MM <<== USING VERITAS SKGXN LIBRARY
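The two cssd.log signatures above can be told apart with a small script. This is a minimal Python sketch that relies only on the two message fragments quoted in this note ("Using vacuous skgxn monitor" and "clssnm_skgxninit: initialized"); the function name and the returned labels are illustrative.

```python
def skgxn_status(log_lines):
    """Classify which skgxn monitor the most recent CSSD start-up reported.

    Returns "vendor" if the last relevant message is the
    'clssnm_skgxninit: initialized' line, "vacuous" if it is the
    'Using vacuous skgxn monitor' line, and "unknown" otherwise.
    """
    status = "unknown"
    for line in log_lines:
        if "clssnm_skgxninit: initialized" in line:
            status = "vendor"
        elif "Using vacuous skgxn monitor" in line:
            status = "vacuous"
    return status


# Lines copied from the failing case quoted in this note.
failing = [
    "[ CSSD]2007-11-08 03:28:02.603 [5] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1",
    "[ CSSD]2007-11-08 03:28:02.603 [1] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor",
]
print(skgxn_status(failing))
```

A "vacuous" result after a CRS patchset installation matches the overwritten-library symptom described here.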
    Changes
    After the CRS patchset is installed, the Veritas library files are overwritten.
    Explanation
    Environment :
    Oracle Server - Enterprise Edition - Version: 10.2.0.3.0
    OS : Solaris Operating System (SPARC 64-bit)
    This example is based on SunOS.
    The solution may differ slightly depending on the OS.
    Cause
    After the changes introduced by the new CRS patch, the Veritas SFRAC modules are no longer recognized.
    Installing and Configuring SF Oracle RAC Software
    The storage and clusterware procedures provided by Veritas are omitted from this document.
    The RAC software installation is performed within the session that runs the installsfrac script from the install CD.
    The SF Oracle RAC 5.0 installation document provided by Veritas includes the following.
    Only the steps for installing the SF Oracle RAC components are described below.
    # cd /cdrom/storage_foundation_for_oracle_rac
    # ./installsfrac -configure
    Note: Do not run root.sh yet, but return to the installsfrac session from step 15.
    19 In the installsfrac session, press Return. The installsfrac utility now verifies
    the database software installation, copies the SF Oracle RAC libraries to $ORACLE_HOME, and relinks Oracle on each node in the cluster.
    After the installsfrac script has been run as above, the installer copies the 5.0 SF Oracle RAC library files to ORACLE_HOME and relinks the Oracle library files.
    The detailed procedure can be checked step by step at the following site:
    http://ftp.support.veritas.com/pub/support/products/DBE_Advanced_Cluster_for_Oracle_RAC/288502.pdf
    [ Reference ]
    Performing Post-upgrade tasks for SF Oracle RAC 5.0 MP1.
    To Relink Oracle 10g R1 or R2 using the installer.
    To invoke installsfrac, assume the stage is located as follows and run it:
    # cd /opt/VRTS/install
    # ./installsfrac -configure
    This is the Oracle environment information verification step:
    Oracle Unix User : oracle
    Oracle Unix Group : oinstall
    Oracle Clusterware (CRS) Home: /app/oracle/orahome
    Oracle Release: 10.2
    Oracle Patch Level: 0.1
    Oracle Base: /app/oracle
    Oracle Home: /app/oracle/orahome
    Is this information correct: [y,n,q] (y)
    Verifying binaries in /app/oracle/orahome on galaxy ...ok
    Verifying binaries in /app/oracle/orahome on nebula ...ok
    Copying SFRAC libskgxn on galaxy ......................ok
    Copying SFRAC libskgxn on nebula ......................ok
    Copying SFRAC ODM library on galaxy ...................ok
    Copying SFRAC ODM library on nebula ...................ok
    Copying SFRAC libskgxp on galaxy ......................ok
    Copying SFRAC libskgxp on nebula ......................ok
    Relinking Oracle on galaxy ............................ok
    Relinking Oracle on nebula ............................ok
    Oracle Relinking is now complete.
    Solution Description
    Article
    http://seer.entsupport.symantec.com/docs/288502.htm
    The following is the procedure for manually relinking Oracle 10g to resolve the problem after the CRS patchset has been applied.
    Method 1. Relinking Oracle 10g (using the command line)
    [ 10g R1 ]
    For Oracle 10gR1, enter one set of the following commands:
    a. For 32-bit Oracle:
    # cp /opt/VRTSvcs/rac/lib/libskgxn2_32.so /opt/ORCLcluster/rac/lib/libskgxn2.so
    $ cp /opt/VRTSvcs/rac/lib/libskgxp10_ver23_32.so $ORACLE_HOME/lib32/libskgxp.so
    $ ln -s /usr/lib/libodm.so libodm10.so
    b. For 64-bit Oracle:
    # cp /opt/VRTSvcs/rac/lib/libskgxn2_64.so /opt/ORCLcluster/rac/lib/libskgxn2.so
    $ cp /opt/VRTSvcs/rac/lib/libskgxp10_ver23_64.so $ORACLE_HOME/lib32/libskgxp.so
    $ ln -s /usr/lib/amd64/libodm.so libodm10.so
    [ 10g R2 ]
    For Oracle 10gR2, enter one set of the following commands:
    a. For 32-bit Oracle:
    # cp /opt/VRTSvcs/rac/lib/libskgxn2_32.so /opt/ORCLcluster/rac/lib/libskgxn2.so
    $ cp /opt/VRTSvcs/rac/lib/libskgxp10_ver25_32.so $ORACLE_HOME/lib32/libskgxp.so
    b. For 64-bit Oracle:
    # cp /opt/VRTSvcs/rac/lib/libskgxn2_64.so /opt/ORCLcluster/rac/lib/libskgxn2.so
    $ cp /opt/VRTSvcs/rac/lib/libskgxp10_ver25_64.so $ORACLE_HOME/lib32/libskgxp.so
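The manual relink commands above follow a simple naming pattern: the bit width selects the `_32` or `_64` suffix, and the Oracle release selects `ver23` (10gR1) or `ver25` (10gR2). A minimal Python sketch of that mapping follows. It only builds the source file names and copies nothing; the function name `sfrac_libs` is hypothetical.

```python
# Hypothetical helper: derive the SF Oracle RAC library file names used in the
# manual relink steps from the Oracle release and bit width. Builds strings only.
VCS_LIB_DIR = "/opt/VRTSvcs/rac/lib"  # source directory quoted in the steps above
SKGXP_VERSION = {"10gR1": "ver23", "10gR2": "ver25"}


def sfrac_libs(release: str, bits: int) -> dict:
    if release not in SKGXP_VERSION:
        raise ValueError(f"unsupported release: {release}")
    if bits not in (32, 64):
        raise ValueError(f"unsupported bit width: {bits}")
    return {
        "libskgxn": f"{VCS_LIB_DIR}/libskgxn2_{bits}.so",
        "libskgxp": f"{VCS_LIB_DIR}/libskgxp10_{SKGXP_VERSION[release]}_{bits}.so",
    }


if __name__ == "__main__":
    for name, path in sfrac_libs("10gR2", 64).items():
        print(name, "->", path)
```

Keeping the release-to-version table in one place makes it harder to copy a `ver23` library into a 10gR2 home by mistake.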
    Method 2. Relinking Oracle 10g (Installer)
    The problem can also be resolved by following the "Relinking Oracle 10g after upgrading SF Oracle RAC" section of the article above.
    Here the two node names are assumed to be galaxy and nebula.
    1. Invoke installsfrac once again:
    # cd /opt/VRTS/install
    # ./installsfrac -configure
    2. Enter the system names when prompted:
    Enter the system names separated by spaces on which to configure
    SFRAC: galaxy nebula
    3. Navigate to the "Install and Relink Oracle" menu.
    a. Select the appropriate Oracle 10g version (3):
    1) Oracle 10gR1
    2) Oracle 10gR2
    b. Select "Relink Oracle" (3) from the menu:
    1) Install Oracle Clusterware (CRS)
    2) Install Oracle RDBMS server
    3) Relink Oracle
    b) [Go to previous menu]
    c. From the menu displayed, enter the required information. For example:
    Enter Oracle UNIX user name: (oracle) oracle
    Enter Oracle UNIX group name: [b] (oinstall) oinstall
    Enter Oracle base directory: [b] /app/oracle
    Enter absolute path of CRS Home directory: [b] /app/crshome
    Enter absolute path of Database Home directory: [b] /app/oracle/orahome
    Enter Oracle Bits (64/32) [b] (64) 64
    d. Confirm your responses in the verification screen. The installer copies the SF 5.0 Oracle RAC libraries to /opt/ORCLcluster, where it expects libskgxn.
    Oracle environment information verification
    Oracle Unix User: oracle
    Oracle Unix Group: oinstall
    Oracle Clusterware (CRS) Home: /app/crshome
    Oracle Release: 10.2
    Oracle Bits: 64
    Oracle Base: /app/oracle
    Oracle Home: /app/oracle/orahome
    Is this information correct? [y,n,q] (y)
    galaxy
    Copying /opt/VRTSvcs/rac/lib/libskgxn2_64.so
    /opt/ORCLcluster/lib/libskgxn2.so ........... success
    nebula
    Copying /opt/VRTSvcs/rac/lib/libskgxn2_64.so
    /opt/ORCLcluster/lib/libskgxn2.so .............. success
    galaxy
    Copying /opt/VRTSvcs/rac/lib/libskgxp10_ver25_64.so to
    /app/oracle/orahome/lib/libskgxp10.so ........... success
    Removing /oracle/10g/lib/libodm10.so ............ success
    Linking /opt/VRTSodm/lib/amd64/libodm.so /app/oracle/orahome/
    lib/libodm10.so ... success
    Setting permissions oracle:oinstall /app/oracle/orahome/lib/
    libskgxp10.so ... success
    nebula
    Copying /opt/VRTSvcs/rac/lib/libskgxp10_ver25_64.so to
    /app/oracle/orahome/lib/libskgxp10.so ........... success
    Removing /oracle/10g/lib/libodm10.so ............ success
    Linking /opt/VRTSodm/lib/amd64/libodm.so /app/oracle/orahome/
    lib/libodm10.so ... success
    Setting permissions oracle:oinstall /app/oracle/orahome/lib/
    libskgxp10.so ... success
    e. Enter "q" at the next prompt to leave the installer now that CRS setup
    tasks are complete.
    4. Bring the CSSD resource online. Enter:
    # hares -online cssd -sys galaxy
    # hares -online cssd -sys nebula
    5. Confirm that CRS is online. Enter:
    $CRS_HOME/bin/crs_stat -t
    6. Bring online the oracle resources configured under VCS. If they're directly controlled by CRS, you may run the CRS commands to start the instance.
    [ Note ]
    The following additionally covers the case where the CRS stack does not come up after a CRS patch is applied on an HP server running Veritas.
    For reference, Veritas provides a tool that checks node status, and there is a document stating that init.cssd in CRS must be patched so that it uses that tool.
    Veritas also provides the patch for this.
    Before applying a CRS patchset or a cumulative patch for CRS, the init.cssd patch for SFRAC must be applied.
    Symptoms
    1. postrootpatch.sh hangs when applying a patch[set]
    2. prerootpatch.sh hangs when rolling back a patch[set]
    Details are given in the following document:
    http://seer.entsupport.symantec.com/docs/281875.htm
    Late Breaking News (LBN) - Updates to the Release Notes for Veritas Storage Foundation (tm) and High Availability Solutions 5.0 and 5.0 Maintenance Pack 1 on HP-UX 11iv2 and cross references to product documentation
    Environment
    Veritas on HP server
    This patch applies only to HP-UX 11i.
    Preventive measure
    Before you run the root.sh script, you need to add the init.cssd.patch.
    a. Open another window on the system where you are running the installer
    b. Log in as superuser
    c. Change to the directory where the patch is to be copied:
    For Oracle 10gR1:
    # cd $CRS_HOME/css/admin
    # cp /opt/VRTSvcs/rac/patch/init.cssd-10gR1.patch .
    For Oracle 10gR2:
    # cd $CRS_HOME/css/admin
    # cp /opt/VRTSvcs/rac/patch/init.cssd-10gR2.patch .
    d. Run the following command to install the patch:
    For Oracle 10gR1:
    # patch < init.cssd-10gR1.patch init.cssd
    For Oracle 10gR2:
    # patch < init.cssd-10gR2.patch init.cssd
    e. Run the root.sh script. For example:
    # cd $CRS_HOME
    # ./root.sh
    This starts the CRS daemons on the node where you enter the command.
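The preventive steps above differ between the two releases only in the patch file name. As a dry-run sketch (nothing is executed), the Python below assembles the command sequence for one release. The function name `init_cssd_patch_commands` and its `crs_home` parameter are hypothetical; the paths follow the steps above.

```python
# Dry-run sketch: build (not run) the init.cssd patch commands listed above.
# The function name and the crs_home argument are illustrative assumptions.
PATCH_DIR = "/opt/VRTSvcs/rac/patch"


def init_cssd_patch_commands(crs_home: str, release: str) -> list:
    if release not in ("10gR1", "10gR2"):
        raise ValueError(f"unsupported release: {release}")
    patch = f"init.cssd-{release}.patch"
    return [
        f"cd {crs_home}/css/admin",    # step c: directory that holds init.cssd
        f"cp {PATCH_DIR}/{patch} .",   # step c: copy the release-specific patch
        f"patch < {patch} init.cssd",  # step d: apply the patch to init.cssd
        f"cd {crs_home}",              # step e: run root.sh afterwards
        "./root.sh",
    ]


if __name__ == "__main__":
    print("\n".join(init_cssd_patch_commands("/app/crshome", "10gR2")))
```

Printing the commands first gives a chance to review them as superuser before running anything on the node.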
    References
    http://ftp.support.veritas.com/pub/support/products/DBE_Advanced_Cluster_for_Oracle_RAC/288502.pdf
    http://ftp.support.veritas.com/pub/support/products/DBE_Advanced_Cluster_for_Oracle_RAC/283979.pdf
    http://seer.entsupport.symantec.com/docs/281875.htm
    <Note:467753.1> Title : Veritas clusterware 5.0 not recognized by Oracle
    due to the fact that Veritas libraries over written with crs 10.2.0.3 patchset installation

  • Enterprise User Security (EUS) with Oracle RAC database

    Hi all,
    I'm having a problem configuring centralized AAA on Oracle OID for an Oracle RAC database.
    My environment is:
    1) Oracle OID 10g (192.168.15.245 - rh4oidserver.klab.it)
    2) Oracle RAC database 11g
    I successfully configured a standalone Oracle database to authenticate users against the centralized OID repository, but I'm running into problems doing the same with RAC.
    In detail:
    1) Oracle RAC works correctly, and internal users (SYS, Oracle, etc.) are correctly authenticated and authorized against the database.
    2) Oracle RAC registers itself in OID (see attached snapshot).
    3) When I run sqlplus to connect to Oracle RAC as an OID user, I get the following error: ORA-28030 Server encountered problems accessing LDAP directory service
    Using a sniffer, I can see a reset message after the SSL handshake (SSL v3 encrypted alert), but I don't understand the root cause.
    The hosts file on the RAC server is:
    # Do not remove the following line, or various programs
    # that require network functionality will fail.
    127.0.0.1          localhost.localdomain localhost
    ::1          localhost6.localdomain6 localhost6
    # Public
    192.168.15.177          orclrac1.klab.it orclrac1
    192.168.15.178 orclrac2.klab.it orclrac2
    #Private
    192.168.1.100          orclrac1-priv.klab.it orclrac1-priv
    192.168.1.105 orclrac2-priv.klab.it orclrac2-priv
    #Virtual
    192.168.15.88 orclrac1-vip.klab.it orclrac1-vip
    192.168.15.96 orclrac2-vip.klab.it orclrac2-vip
    92.168.15.184 openfiler.klab.it openfiler
    192.168.1.90 openfiler-priv.klab.it openfiler-priv
    192.168.15.246     acti.klab.it acti
    #192.168.1.245 rh4oidserver.klab.it rh4oidserver
    192.168.15.245 rh4oidserver.klab.it rh4oidserver
    tnsnames.ora is:
    # tnsnames.ora Network Configuration File: /u01/app/oracle/product/11.1.0/db_1/network/admin/tnsnames.ora
    # Generated by Oracle configuration tools.
    RACDB1 =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = orclrac1-vip)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = racdb.klab.it)
          (INSTANCE_NAME = racdb1)
        )
      )
    RACDB =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = orclrac1-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = orclrac2-vip)(PORT = 1521))
        (LOAD_BALANCE = yes)
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = racdb.klab.it)
        )
      )
    LISTENERS_RACDB =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = orclrac1-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = orclrac2-vip)(PORT = 1521))
      )
    RACDB2 =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = orclrac2-vip)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = racdb.klab.it)
          (INSTANCE_NAME = racdb2)
        )
      )
    ldap.ora is:
    # ldap.ora Network Configuration File: /u01/app/oracle/product/11.1.0/db_1/network/admin/ldap.ora
    # Generated by Oracle configuration tools.
    DIRECTORY_SERVERS= (rh4oidserver.klab.it:389:636)
    DEFAULT_ADMIN_CONTEXT = "dc=dbtest101,dc=klab,dc=it"
    DIRECTORY_SERVER_TYPE = OID
    sqlnet.ora is:
    # sqlnet.ora.orclrac1 Network Configuration File: /u01/app/oracle/product/11.1.0/db_1/network/admin/sqlnet.ora.orclrac1
    # Generated by Oracle configuration tools.
    NAMES.DIRECTORY_PATH= (LDAP,TNSNAMES)
    WALLET_LOCATION =
      (SOURCE =
        (METHOD = FILE)
        (METHOD_DATA =
          (DIRECTORY = /u01/app/oracle/admin/racdb)
        )
      )
    listener.ora is:
    # listener.ora.orclrac1 Network Configuration File: /u01/app/oracle/product/11.1.0/db_1/network/admin/listener.ora.orclrac1
    # Generated by Oracle configuration tools.
    LISTENER_ORCLRAC1 =
      (DESCRIPTION_LIST =
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = TCP)(HOST = orclrac1-vip)(PORT = 1521)(IP = FIRST))
          (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.15.177)(PORT = 1521)(IP = FIRST))
        )
      )
    LISTENER_ORCLRAC2 =
      (DESCRIPTION_LIST =
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = TCP)(HOST = orclrac1-vip)(PORT = 1521)(IP = FIRST))
          (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.15.178)(PORT = 1521)(IP = FIRST))
        )
      )
    Thanks in advance for any help or suggestions.
    Antonio

    Hello bipkary,
    What version are you using?
    The following link tells you everything about EUS in Oracle 10g R2:
    http://download.oracle.com/docs/cd/B19306_01/network.102/b14269/toc.htm
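Since the sniffer trace in the question points at the SSL handshake, it can help to confirm which host and ports `ldap.ora` actually advertises. Each `DIRECTORY_SERVERS` entry has the form `host:port[:sslport]`. The minimal Python sketch below parses the value shown in the question; the function name `parse_directory_servers` is hypothetical, and the sketch does not contact the directory.

```python
# Sketch: split an ldap.ora DIRECTORY_SERVERS value into host / port / SSL port.
# Each entry has the form host:port[:sslport]; the helper name is an assumption.
def parse_directory_servers(value: str) -> list:
    inner = value.strip().strip("()")
    servers = []
    for entry in inner.split(","):
        parts = entry.strip().split(":")
        if len(parts) < 2:
            raise ValueError(f"malformed entry: {entry!r}")
        servers.append({
            "host": parts[0],
            "port": int(parts[1]),
            "ssl_port": int(parts[2]) if len(parts) > 2 else None,
        })
    return servers


if __name__ == "__main__":
    print(parse_directory_servers("(rh4oidserver.klab.it:389:636)"))
```

With the host and SSL port extracted, name resolution and reachability of that port can then be checked separately from each RAC node.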

  • With MSSQ, clients get a TPEOS error when an Oracle RAC node reboots

    Hi all,
    In our Tuxedo application, if we use Single Server, Single Queue mode, the application keeps working when any Oracle RAC node is rebooted, and clients get correct results. With MSSQ (Multiple Servers, Single Queue), the application also works as long as the Oracle RAC nodes are up. But if we reboot any Oracle RAC node, the client program can continue to run and gets correct results, but it also always gets TPEOS errors. In that situation the server receives the client's request, but the client does not receive the server's reply, only a TPEOS error.
    Our environment is:
    Oracle RAC 10g 10.2.0.4, two instances (rac1 and rac2), and two DTP services (s1 and s2), both with TAF set to BASIC
    Tuxedo 10gR3, two nodes, working in MP mode, using XA to access the Oracle RAC database; some services are transactional and some are not
    OS is Linux AS4 U5, 64-bit
    Service programs use OCI
    Has anyone encountered this problem?

    Hi, and thank you first of all.
    The ULOG file contains only failover information and no other error message, and there is no other error on the client side either.
    Without MSSQ, the SERVERS section of the ubb file is:
    *SERVERS
    DEFAULT:
    CLOPT="-A "
    sinUpdate_server SRVGRP=GROUP11 SRVID=80 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinUpdate_server SRVGRP=GROUP12 SRVID=160 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP11 SRVID=240 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP12 SRVID=320 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP11 SRVID=360 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP12 SRVID=400 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP11 SRVID=520 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP12 SRVID=560 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP11 SRVID=600 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP12 SRVID=640 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP11 SRVID=700 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP12 SRVID=740 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP11 SRVID=800 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP12 SRVID=840 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP11 SRVID=1 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP12 SRVID=60 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinUpdate_server SRVGRP=GROUP13 SRVID=83 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinUpdate_server SRVGRP=GROUP14 SRVID=164 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP13 SRVID=243 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP14 SRVID=324 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP13 SRVID=363 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP14 SRVID=404 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP13 SRVID=523 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP14 SRVID=564 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP13 SRVID=603 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP14 SRVID=644 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP13 SRVID=703 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP14 SRVID=744 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP13 SRVID=803 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP14 SRVID=844 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP13 SRVID=13 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP14 SRVID=64 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    WSL SRVGRP=GROUP11 SRVID=1000
    CLOPT="-A -- -n//120.3.8.237:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP12 SRVID=1001
    CLOPT="-A -- -n//120.3.8.238:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP13 SRVID=1003
    CLOPT="-A -- -n//120.3.8.237:7203 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP14 SRVID=1004
    CLOPT="-A -- -n//120.3.8.238:7204 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    With MSSQ, the SERVERS section of the ubb file is:
    *SERVERS
    DEFAULT:
    CLOPT="-A -p 1,60:1,30"
    sinUpdate_server SRVGRP=GROUP11 SRVID=80 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate11 REPLYQ=Y
    sinUpdate_server SRVGRP=GROUP12 SRVID=160 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate12 REPLYQ=Y
    sinCount_server SRVGRP=GROUP11 SRVID=240 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount11 REPLYQ=Y
    sinCount_server SRVGRP=GROUP12 SRVID=320 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount12 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP11 SRVID=360 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelec11 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP12 SRVID=400 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelect12 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP11 SRVID=520 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert11 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP12 SRVID=560 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert12 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP11 SRVID=600 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete11 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP12 SRVID=640 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete12 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP11 SRVID=700 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl11 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP12 SRVID=740 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl12 REPLYQ=Y
    lockselect_server SRVGRP=GROUP11 SRVID=800 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect11 REPLYQ=Y
    lockselect_server SRVGRP=GROUP12 SRVID=840 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect12 REPLYQ=Y
    #mulup_server SRVGRP=GROUP11 SRVID=1 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup11 REPLYQ=Y
    #mulup_server SRVGRP=GROUP12 SRVID=60 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup12 REPLYQ=Y
    sinUpdate_server SRVGRP=GROUP13 SRVID=83 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate13 REPLYQ=Y
    sinUpdate_server SRVGRP=GROUP14 SRVID=164 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate14 REPLYQ=Y
    sinCount_server SRVGRP=GROUP13 SRVID=243 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount13 REPLYQ=Y
    sinCount_server SRVGRP=GROUP14 SRVID=324 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount14 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP13 SRVID=363 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelec13 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP14 SRVID=404 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelect14 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP13 SRVID=523 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert13 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP14 SRVID=564 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert14 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP13 SRVID=603 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete13 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP14 SRVID=644 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete14 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP13 SRVID=703 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl13 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP14 SRVID=744 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl14 REPLYQ=Y
    lockselect_server SRVGRP=GROUP13 SRVID=803 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect13 REPLYQ=Y
    lockselect_server SRVGRP=GROUP14 SRVID=844 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect14 REPLYQ=Y
    #mulup_server SRVGRP=GROUP13 SRVID=13 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup13 REPLYQ=Y
    #mulup_server SRVGRP=GROUP14 SRVID=64 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup14 REPLYQ=Y
    WSL SRVGRP=GROUP11 SRVID=1000
    CLOPT="-A -- -n//120.3.8.237:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP12 SRVID=1001
    CLOPT="-A -- -n//120.3.8.238:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP13 SRVID=1003
    CLOPT="-A -- -n//120.3.8.237:7203 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP14 SRVID=1004
    CLOPT="-A -- -n//120.3.8.238:7204 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    Is there any error in the ubb files above, or are we using MSSQ incorrectly?
    Looking forward to your answer, thanks.
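One thing that can be checked mechanically in ubb files like the ones above is that server IDs do not collide. When MAX is set, Tuxedo assigns IDs from SRVID up to SRVID + MAX - 1, and IDs must be unique within a group. The Python sketch below parses a few server entries and reports overlapping ranges per group. It is only a sanity check, it does not diagnose the TPEOS error itself, and the function names are hypothetical.

```python
# Sketch: detect overlapping server-ID ranges per group in *SERVERS entries.
# Assumes one "name KEY=VALUE ..." entry per line, as in the listings above.
from collections import defaultdict


def parse_server_line(line: str) -> dict:
    name, *pairs = line.split()
    entry = {"name": name}
    for pair in pairs:
        key, _, value = pair.partition("=")
        entry[key] = value
    return entry


def srvid_overlaps(lines: list) -> list:
    by_group = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and continuation lines such as CLOPT=...
        if not line or line.startswith("#") or "SRVID=" not in line:
            continue
        e = parse_server_line(line)
        first = int(e["SRVID"])
        last = first + int(e.get("MAX", 1)) - 1  # IDs run SRVID .. SRVID+MAX-1
        by_group[e["SRVGRP"]].append((first, last, e["name"]))
    clashes = []
    for group, ranges in by_group.items():
        ranges.sort()
        # After sorting by start, any overlap shows up between neighbours.
        for (f1, l1, n1), (f2, l2, n2) in zip(ranges, ranges[1:]):
            if f2 <= l1:
                clashes.append((group, n1, n2))
    return clashes


if __name__ == "__main__":
    sample = [
        "sinUpdate_server SRVGRP=GROUP11 SRVID=80 MIN=10 MAX=30 RQADDR=sinUpdate11 REPLYQ=Y",
        "sinCount_server SRVGRP=GROUP11 SRVID=240 MIN=10 MAX=30 RQADDR=sinCount11 REPLYQ=Y",
        "WSL SRVGRP=GROUP11 SRVID=1000",
    ]
    print(srvid_overlaps(sample))
```

The same parsed entries could be grouped by RQADDR to confirm that each MSSQ set contains only one server executable.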

  • Oracle RAC nodes reboot when the preferred controller fails

    When we disconnect both fiber cables from preferred controller A, or pull the controller A card out of the disk array (IBM DS 4300), both servers reboot after 90 seconds.
    During this time the complete RAC network is out of service for approximately 5 minutes. After the reboot, both servers come back up with both instances without any manual intervention.
    This is a critical issue for us because we are losing high availability. Please let us know how we can resolve it.
    Details of the network:
    1. Software: Oracle 10g Release 2
    2. OS: Red Hat Linux 3 (kernel version 2.4.21-27.ELsmp)
    3. Shared storage: IBM DS 4300
    4. Multipathing driver: RDAC (rdac-LINUX-09.00 A5.13)
    5. Nodes: IBM 346
    6. Database on ASM
    7. The preferred controller for the ASM, OCR and voting disks is A.
    8. Hangcheck timer value is 210 seconds.
    9. Both servers have 2 HBA ports. The first HBA port is connected to controller A and the second HBA port to controller B of the SAN disk array.
    As I understand it,
    the voting disk resides in the disk array, and controller A is the preferred owner of the voting disk LUN. When I disconnect both fiber cables from preferred controller A, the Clusterware software on both nodes tries to contact the voting disk. When the nodes cannot reach the voting disk within a specific time period, they reboot.
    I ran the controller failure test both with the Oracle RAC software and without Oracle. Without Oracle it works fine, because in that case the disk array waits approximately 300 seconds before changing the preferred controller from A to B.
    With Oracle, however, the Clusterware software reboots both nodes before the controller can shift from A to B.
    So, in conclusion, someone with a good understanding of Oracle Clusterware on Linux and the IBM RDAC multipath driver could help me.
    When Oracle RAC is installed on Linux, the hangcheck timer must be configured.
    Oracle recommends 180 seconds.
    This means that if one node hangs, the second node waits 180 seconds; if the situation is not resolved within 180 seconds, it reboots the hung node.
    I think the hangcheck timer configuration is required only on Linux.
    Configuration file:
    cat >> /etc/rc.d/rc.local << EOF
    modprobe hangcheck-timer hangcheck_tick=15 hangcheck_margin=60
    EOF

    Sorry, the hangcheck timer setting is actually:
    Configuration file:
    cat >> /etc/rc.d/rc.local << EOF
    modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
    EOF
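The hangcheck-timer module resets a node when the kernel has been unresponsive for longer than hangcheck_tick + hangcheck_margin seconds, so the two settings quoted above give different thresholds. The small Python sketch below does that arithmetic and compares it with the figures from the post (reboot after about 90 seconds, controller failover after about 300 seconds). The function name `hangcheck_threshold` is hypothetical.

```python
# Sketch: the hangcheck-timer reset threshold is hangcheck_tick + hangcheck_margin.
def hangcheck_threshold(tick_s: int, margin_s: int) -> int:
    return tick_s + margin_s


if __name__ == "__main__":
    observed_reboot_s = 90       # reboot delay reported in the post
    controller_failover_s = 300  # approximate failover time reported in the post
    for tick, margin in [(15, 60), (30, 180)]:
        t = hangcheck_threshold(tick, margin)
        print(f"tick={tick} margin={margin}: threshold {t}s, "
              f"expires before failover: {t < controller_failover_s}, "
              f"expires before observed reboot: {t < observed_reboot_s}")
```

With tick=30 and margin=180 the threshold is 210 seconds, which matches the hangcheck value listed in the post. Because the reboot was observed at about 90 seconds, this suggests, without proving it, that some other timeout fires first.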

  • Details regarding Oracle RAC One Node

    Hi,
    I have been searching Google for information on Oracle RAC One Node, but I couldn't find exact details. I also tried Metalink. Could you please provide a link or Note ID for Oracle RAC One Node that covers what it is, how to set it up, and so on?
    Thanks

    There are a couple of notes:
    http://www.oracle.com/us/products/database/options/rac-one-node/overview/index.html
    http://www.oracle.com/technetwork/products/clustering/overview/ds-rac-one-node-11gr2-185089.pdf
    http://download.oracle.com/docs/cd/E11882_01/rac.112/e16795/onenode.htm#BABGAJGH
    http://download.oracle.com/docs/cd/E11882_01/install.112/e17214/racinstl.htm#CIHGGAAE
    http://download.oracle.com/docs/cd/E11882_01/server.112/e17157/architectures.htm#CJAJEAGH
    Aman....
