When one node reboots the other node in RAC

Hi Friends,
I faced a situation where one node of a RAC cluster was rebooted by the other node. This happened because of fluctuation on the network interconnect link.
Sep 13 16:23:48 kkvs1a su: [ID 810491 auth.crit] 'su admin' failed for wipro1 on /dev/pts/3
Sep 14 00:22:17 kkvs1a ixgbe: [ID 611667 kern.info] NOTICE: ixgbe3: link down
Sep 14 00:22:21 kkvs1a ixgbe: [ID 611667 kern.info] NOTICE: ixgbe3: link up, , full duplex
Sep 14 00:22:31 kkvs1a ixgbe: [ID 611667 kern.info] NOTICE: ixgbe1: link down
Sep 14 00:22:31 kkvs1a ixgbe: [ID 611667 kern.info] NOTICE: ixgbe3: link down
/opt/oracle/product/10.2.0/crs/log/node1/alertkk1a.log
==============================================
2013-09-14 00:22:05.180
[cssd(12561)]CRS-1612:node kk1b (2) at 50% heartbeat fatal, eviction in 14.251 seconds
2013-09-14 00:22:12.180
[cssd(12561)]CRS-1611:node kk1b (2) at 75% heartbeat fatal, eviction in 7.251 seconds
2013-09-14 00:22:13.180
[cssd(12561)]CRS-1611:node kk1b (2) at 75% heartbeat fatal, eviction in 6.251 seconds
2013-09-14 00:22:17.179
[cssd(12561)]CRS-1610:node kk1b (2) at 90% heartbeat fatal, eviction in 2.251 seconds
2013-09-14 00:22:18.180
[cssd(12561)]CRS-1610:node kkvs1b (2) at 90% heartbeat fatal, eviction in 1.251 seconds
This clearly shows that CSSD on node kkvs1a issued node eviction warnings for node kkvs1b.
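For anyone who wants to pull these countdown warnings out of a CRS alert log, here is a minimal Python sketch. It assumes the line format shown in the excerpt above; the function name parse_cssd_warnings is only an illustration, not part of any Oracle tool.

```python
import re

# Pattern for the CSSD heartbeat warnings (CRS-1610/1611/1612) as they
# appear in the alert log excerpt; the format is assumed from that excerpt.
CSSD_RE = re.compile(
    r"CRS-(?P<code>161[0-2]):\s*node (?P<node>\S+) \((?P<num>\d+)\) "
    r"at (?P<pct>\d+)% heartbeat fatal, eviction in (?P<secs>[\d.]+) seconds"
)

def parse_cssd_warnings(text):
    """Return (code, node, percent, seconds_to_eviction) for each warning."""
    return [
        (m["code"], m["node"], int(m["pct"]), float(m["secs"]))
        for m in CSSD_RE.finditer(text)
    ]

sample = """\
[cssd(12561)]CRS-1612:node kk1b (2) at 50% heartbeat fatal, eviction in 14.251 seconds
[cssd(12561)]CRS-1610:node kk1b (2) at 90% heartbeat fatal, eviction in 2.251 seconds
"""

for row in parse_cssd_warnings(sample):
    print(row)
```

Sorting or filtering the returned tuples by node makes it easy to see which node was being counted down and how close it came to eviction.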
I got the following messages on the instance that was rebooted:
ASM alert log:
Sat Sep 14 00:22:25 IST 2013
Error: KGXGN aborts the instance (6)
Sat Sep 14 00:22:25 IST 2013
Errors in file /opt/oracle/admin/+ASM/bdump/+asm2_lmon_8527.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
A network fluctuation shouldn't cause a reboot like this. Why did Oracle design it this way? Is this a bug? My Oracle version is 10.2.0.5.0.
Could you tell me the other possible situations in which one RAC instance reboots another RAC instance?

What you are describing is the expected behaviour: if your interconnect fails, you will have a node eviction. Releases < 11.2.0.2 evict a node by reboot, which can fix the problem: the NIC may come up correctly when the machine re-starts. Releases >= 11.2.0.2 can often evict without a re-boot. But either way, if your interconnect goes down, a node must be evicted to prevent uncoordinated disc writes.
If you are interested, you can find some discussion and demos of this in a series of webcasts I've recorded,
Free Oracle Database Tutorials for Administration and Developers
If you really don't like this behaviour and the problems are transient, you can try raising the CSS MISSCOUNT parameter.
John Watson
Oracle Certified Master DBA
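
To judge whether an interconnect flap was short enough that a larger misscount might have ridden it out, one can measure how long each link stayed down in the system log. The following is a rough Python sketch, assuming the Solaris syslog format from the original post. The function name link_outages and the year argument are illustrative assumptions (syslog timestamps carry no year).

```python
import re
from datetime import datetime

# Matches lines such as:
# "Sep 14 00:22:17 kkvs1a ixgbe: [ID 611667 kern.info] NOTICE: ixgbe3: link down"
LINK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ \S+ .*"
    r"NOTICE: (?P<nic>\w+): link (?P<state>up|down)"
)

def link_outages(lines, year=2013):
    """Return (nic, seconds_down) for every completed down -> up transition."""
    down_since = {}
    outages = []
    for line in lines:
        m = LINK_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        nic = m["nic"]
        if m["state"] == "down":
            # Remember only the first "down" of an outage.
            down_since.setdefault(nic, ts)
        elif nic in down_since:
            outages.append((nic, (ts - down_since.pop(nic)).total_seconds()))
    return outages

log = [
    "Sep 14 00:22:17 kkvs1a ixgbe: [ID 611667 kern.info] NOTICE: ixgbe3: link down",
    "Sep 14 00:22:21 kkvs1a ixgbe: [ID 611667 kern.info] NOTICE: ixgbe3: link up, , full duplex",
    "Sep 14 00:22:31 kkvs1a ixgbe: [ID 611667 kern.info] NOTICE: ixgbe1: link down",
    "Sep 14 00:22:31 kkvs1a ixgbe: [ID 611667 kern.info] NOTICE: ixgbe3: link down",
]
print(link_outages(log))
```

Links that go down and never come back within the excerpt are not reported. The durations can then be compared with the misscount value configured on the cluster.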

Similar Messages

  • How to drag and drop user from one node to other node.

    Dear All,
    How do I drag and drop a user from one node to another node? I tried but had no success.
    What precautions should be taken?
    Can anybody kindly explain it?
    Thank you.

    Hello, if you got this message, you have already created the BP.
    You no longer need USERS_GEN; that transaction is used only for the first action, when you create the user in R/3 and then pass this user to EBP in the organizational structure.
    Now you have to:
    1) Go to PPOMA_BBP
    2) Double click on organizational unit that you want to put this user (purchasing organization or purchasing group box for example)
    3) Select assign button in the top of the functions in the transaction
    4) Click on incorporates -- position
    5) Put userID that you want to add in this organizational unit
    6) Click Save
    Thanks
    Rosa

  • How can i add selected element from one node to other node

    Hi All
    I have below requirement.
    In the view there is an Available Languages ItemList box on the left, a Selected Languages ItemList box on the right, and Add and Remove buttons in the middle.
    The user selects languages in the Available Languages ItemList box and clicks Add, which adds them to the Selected Languages ItemList box, and vice versa with the Remove button.
    1) I have created 2 nodes, AvlLang and SelLang; both contain two attributes, Key and Value.
    2) The AvlLang node is bound to a Data Dictionary table. Values are coming through fine in the Available Languages ItemList box.
    3) How can I add the selected language to the SelLang node?
    Please provide a code snippet showing how to achieve this.
    BR
    X- CW

    Hi Carlin,
    Find below code to copy selected record from one node to another.
    Here I am copying it_lips node into pack_mat node.
    " Declarations. The pack_mat variables were used but not declared in the
    " original post; their generated type names are assumed here.
    DATA: wa_temp        TYPE REF TO if_wd_context_element,
          lt_temp        TYPE wdr_context_element_set,
          lo_nd_it_lips  TYPE REF TO if_wd_context_node,
          lo_nd_pack_mat TYPE REF TO if_wd_context_node,
          ls_it_lips     TYPE wd_this->element_it_lips,
          ls_pack_mat    TYPE wd_this->element_pack_mat,
          lt_pack_mat    TYPE wd_this->elements_pack_mat.
    " Get the source node and the elements the user selected in it
    lo_nd_it_lips = wd_context->path_get_node( path = `ZRETURN_DEL_CHANGE.CHANGING_3.IT_LIPS` ).
    lt_temp = lo_nd_it_lips->get_selected_elements( ).
    " Get the target node
    lo_nd_pack_mat = wd_context->get_child_node( name = wd_this->wdctx_pack_mat ).
    " Copy the needed attributes of each selected element into the target table
    LOOP AT lt_temp INTO wa_temp.
      wa_temp->get_static_attributes( IMPORTING static_attributes = ls_it_lips ).
      ls_pack_mat-vgbel = ls_it_lips-vgbel.
      ls_pack_mat-vgpos = ls_it_lips-vgpos.
      APPEND ls_pack_mat TO lt_pack_mat.
      CLEAR ls_pack_mat.
    ENDLOOP.
    " Bind the collected rows to the target node, replacing its current content
    lo_nd_pack_mat->bind_table( new_items            = lt_pack_mat
                                set_initial_elements = abap_true ).
    Cheers,
    Kris.

  • SC 3.1 and Oracle 10g RAC: instance goes down when rebooting other node

    I have Sun cluster 3.1 with Oracle 10gR2 on Solaris 10 Sparc. Thanks to this forum now that my cluster seems fine with a database running. However I still have one problem: when I reboot node1, the instance on node2 also disappears. The instance on the node2 will make itself alive once node1 comes back. This happens also for the instance on node1 if I reboot node2.
    The interconnect cables are direct cross-over cable.
    Any input is appreciated,
    Luke

    Although I am not Tim, I can anticipate his answer, as he gave it three topics back in this forum. You cannot mount UFS on top of shared SVM. It does not work, as you can see with your own configuration. The only shared filesystem that works for RAC is shared QFS. The doc
    http://docs.sun.com/app/docs/doc/819-0583/6n30h62v7?a=view
    has all the details.
    If you need a shared filesystem for your binaries or whatever, you have to use UFS/PxFS but that sits on top of normal SVM and not shared SVM.
    Hartmut

  • Referencing the Tracker node in other nodes

    I know I can use the MatchMove or Stabilize nodes alone but really, how do I reference the tracks that I create with the Tracker node in non-tracking nodes such as Move2D, Rotate, Pan? I have used linking tracks to QuickShapes and RotoShapes, this turned out to be easy. Please do not laugh, I cannot find how to use tracks from the Tracker with other types of nodes. Advice would be highly appreciated and promptly marked as "Helpful" or "Solved", I know how to do this either.

    Not really sure what you want to do. Trackers are
    good -- you can find all sorts of things to do with
    tracker information - but if you can't get it in a
    node it's gonna be useless to you.
    Thanks Cap'n. It all began when a colleague (who uses mainly Windows compositing applications) attached a pair of huge feathered wings to a character in a movie we are working on, sort of Travolta's "Michael" from the 90-s. He used Shake's MatchMove and we all said great but he wanted to try the same effect with the Tracker node and I was silly to suggest that this would be possible in Shake because I seem to be the only one in this outfit who thinks he can navigate the manual. The manual says, quote, the Tracker node is used to create tracks that are referenced either in other tracking nodes or in non-tracking nodes such as Move2D, Rotate, and so on, unquote. This is page 750 in Chapter 25 but I still cannot find how to implement this in practice.
    No bets were placed and the discussion was purely academical but I feel a bit embarrassed that I cannot prove my point. Never mind, I will tell them I was wrong and wipe the egg off my face. I will leave the question unanswered for a while in the hope that another stubborn newbie will do the job for me and post an answer.

  • I have two MBPs on one wifi network; when one connects, the other disconnects. Can't keep them both connected. Any advice?

    Hello,
    We have two MBPs on an Asus router (N56U) and we find that only one computer can connect at a time.  Within a few seconds of the other connecting, the one that was connected disconnects.  When we try to log on again, we're told that the connection timed out.  Or it "connects" but there is no internet.  Any advice?
    Thank you!
    Christian

    Hi Silvergc,
    Thank you for this.  I agree:  the problem seems to be either the router or the modem.  I now have an Airport Extreme and am hoping this solves the problem.
    Only now I have another problem--the airport utility can't find the airport extreme if I plug in an external harddrive into the airport extreme.  I should start another thread for this, yes?
    Thanks again Silvergc.
    Yours,
    Christian

  • HT3384 Template will not open; Pages does not respond when one chooses template other than blank

    The Pages template system crashes when I attempt to choose a template other than blank.  How do I fix this problem, so that I can successfully choose any template that I desire?

    Is it all templates or just a few? It could be a corrupt font or duplicate fonts or an out-dated font. Launch Font Book & validate fonts & check for duplicates.

  • Failover did not happen when one node went down!!! PLEASE HELP

    Hi gurus,
    Yesterday a disaster struck my RAC database. We have a two-node cluster on 10.2.0.2, with the nodes located at different sites. Yesterday the power suddenly went down and one of the network switches failed and was destroyed. Node 1 of the RAC database was connected to that switch, but failover to node 2 did not happen, although when one node goes down the other should be available for all of node 1's sessions/connections.
    When I tried to ping/telnet node 1, it did not respond because the switch was down, so the network guys connected the cables to another available switch. When I then connected to node 1, it showed the "Oracle is not available" message.
    When I tried the other node, it was the same, but I did not see any error in the alert log file. Then my TL restarted both nodes and the database became available.
    I am very confused about why the failover did not happen and how the database went down. PLEASE suggest how to identify what happened. Thanks & Regards

    Thanks for your reply,
    After the network switch was replaced, we connected to both nodes and found that the instances were down, with no reason given in the alert log file. We just restarted both instances; the database then came up and the clients connected to both instances, with equal sessions on each. I want to know whether the failover can be done on the application side or whether it should be done on the database side, i.e. in the tnsnames.ora file with the required parameters, as in our scenario there is no failover configuration in the tnsnames.ora file.
    Thanks & Regards

  • 3 node Cluster Booting 1 node boots the other node

    Hi,
    I have built a 3-node cluster on Solaris x86 05/9/ u7 using Sun Cluster 3.2 u1 (IBM SVC 4.2.6.1 supports only 3.2 u1).
    Configuration is successful, but when I shut down one node the other node also shuts down, and when they come up they panic, claiming a reservation conflict.
    Has anybody come across a situation like this?
    Any help appreciated!
    Thanks in advance
    Raj

    what happens to the third node during this?
    what are you using as a quorum device and what's the status of quorum prior to the shutdown?

  • Logicalhostname IP wont failover when one member of the cluster dies

    Hi There,
    I've set up a failover cluster with 2 servers. The cluster IP is set up as a logicalhostname and each server has two network cards configured as IPMP groups.
    I can test the IPMP failover on each server by failing a network card and checking that the IP address fails over.
    I can test that the logicalhostname fails over by switching the resource group from one node to the other
    BUT
    If I drop one member of the cluster the failover fails
    Nov 4 15:09:06 nova cl_runtime: NOTICE: clcomm: Path nova:qfe2 - gambit:qfe2 errors during initiation
    Nov 4 15:09:06 nova cl_runtime: WARNING: Path nova:ce1 - gambit:bge1 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    Nov 4 15:09:06 nova cl_runtime: WARNING: Path nova:qfe2 - gambit:qfe2 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    Nov 4 15:09:08 nova Cluster.PNM: PNM daemon system error: SIOCLIFADDIF failed.: Network is down
    Nov 4 15:09:08 nova Cluster.PNM: production can't plumb 130.159.17.1.
    Nov 4 15:09:08 nova SC[SUNW.LogicalHostname,test-vle,vle1,hafoip_prenet_start]: IPMP logical interface configuration operation failed with <-1>.
    Nov 4 15:09:08 nova Cluster.RGM.rgmd: Method <hafoip_prenet_start> failed on resource <vle1> in resource group <test-vle>, exit code <1>, time used: 0% of timeout <300 seconds>
    Nov 4 15:09:08 nova ip: TCP_IOC_ABORT_CONN: local = 130.159.017.001:0, remote = 000.000.000.000:0, start = -2, end = 6
    Nov 4 15:09:08 nova ip: TCP_IOC_ABORT_CONN: aborted 0 connection
    scswitch: Resource group test-vle failed to start on chosen node and may fail over to other node(s)
    Any ideas would be appreciated as I dont understand how it all fails over correctly if the cluster is up but fails when one member is down.

    Hi,
    looking at the messages, the problem seems to be with the network setup on nova. I would suggest to try to configure the logical IP on nova manually to see if that works. If that does not it should tell you where the problem is.
    Or are you saying that manually switching the RG works, but when a node dies and cluster switches the RG it doesn't. That would be strange.
    You should also post the status of your network on nova in the failure case. There might be something wrong with your IPMP setup. Or has the public net failed completely when you killed the other node?
    Regards
    Hartmut

  • SC 3.2 Solaris 10 x86. When one node reboot, the other one does also

    Configured a two node cluster with a EMC clariion san (Raid 6) for holding a zpool and use as quorum device.
    When one node goes down, the other one does also.
    There seems to be a problem with the quorum.
    I can not understand or figure out what actually goes wrong.
    When starting up:
    Booting as part of a cluster
    NOTICE: CMM: Node cnode01 (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node cnode02 (nodeid = 2) with votecount = 1 added.
    NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d1s2) added; votecount = 1, bitmask of nodes with configured paths = 0x3.
    NOTICE: clcomm: Adapter nge3 constructed
    NOTICE: clcomm: Adapter nge2 constructed
    NOTICE: CMM: Node cnode01: attempting to join cluster.
    NOTICE: nge3: link down
    NOTICE: nge2: link down
    NOTICE: nge3: link up 1000Mbps Full-Duplex
    NOTICE: nge2: link up 1000Mbps Full-Duplex
    NOTICE: nge3: link down
    NOTICE: nge2: link down
    NOTICE: nge3: link up 1000Mbps Full-Duplex
    NOTICE: nge2: link up 1000Mbps Full-Duplex
    NOTICE: CMM: Node cnode02 (nodeid: 2, incarnation #: 1248284052) has become reachable.
    NOTICE: clcomm: Path cnode01:nge2 - cnode02:nge2 online
    NOTICE: clcomm: Path cnode01:nge3 - cnode02:nge3 online
    NOTICE: CMM: Cluster has reached quorum.
    NOTICE: CMM: Node cnode01 (nodeid = 1) is up; new incarnation number = 1248284001.
    NOTICE: CMM: Node cnode02 (nodeid = 2) is up; new incarnation number = 1248284052.
    NOTICE: CMM: Cluster members: cnode01 cnode02.
    NOTICE: CMM: node reconfiguration #1 completed.
    NOTICE: CMM: Node cnode01: joined cluster.
    ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
    /dev/rdsk/c2t0d0s5 is clean
    Reading ZFS config: done.
    obtaining access to all attached disks
    cnode01 console login:
    Then this on the second node:
    Booting as part of a cluster
    NOTICE: CMM: Node cnode01 (nodeid = 1) with votecount = 1
    NOTICE: CMM: Node cnode02 (nodeid = 2) with votecount = 1
    NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d1s2) added; votecount = 1, bitmask of nodes with configured paths = 0x3.
    NOTICE: clcomm: Adapter nge3 constructed
    NOTICE: clcomm: Adapter nge2 constructed
    NOTICE: CMM: Node cnode02: attempting to join cluster.
    NOTICE: CMM: Node cnode01 (nodeid: 1, incarnation #: 1248284001) has become reachable.
    NOTICE: clcomm: Path cnode02:nge2 - cnode01:nge2 online
    NOTICE: clcomm: Path cnode02:nge3 - cnode01:nge3 online
    WARNING: CMM: Issuing a NULL Preempt failed on quorum device /dev/did/rdsk/d1s2 with error 2.
    NOTICE: CMM: Cluster has reached quorum.
    NOTICE: CMM: Node cnode01 (nodeid = 1) is up; new incarnation number = 1248284001.
    NOTICE: CMM: Node cnode02 (nodeid = 2) is up; new incarnation number = 1248284052.
    NOTICE: CMM: Cluster members: cnode01 cnode02.
    NOTICE: CMM: node reconfiguration #1 completed.
    NOTICE: CMM: Node cnode02: joined cluster.
    NOTICE: CCR: Waiting for repository synchronization to finish.
    WARNING: CMM: Issuing a NULL Preempt failed on quorum device /dev/did/rdsk/d1s2 with error 2.
    ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
    /dev/rdsk/c2t0d0s5 is clean
    Reading ZFS config: done.
    obtaining access to all attached disks
    cnode02 console login:
    But when the first node reboots, these messages appear on the second node:
    Jul 22 19:24:48 cnode02 genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0
    Jul 22 19:30:57 cnode02 nge: [ID 812601 kern.notice] NOTICE: nge3: link down
    Jul 22 19:30:57 cnode02 nge: [ID 812601 kern.notice] NOTICE: nge2: link down
    Jul 22 19:30:59 cnode02 nge: [ID 812601 kern.notice] NOTICE: nge3: link up 1000Mbps Full-Duplex
    Jul 22 19:31:00 cnode02 nge: [ID 812601 kern.notice] NOTICE: nge2: link up 1000Mbps Full-Duplex
    Jul 22 19:31:06 cnode02 genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path cnode02:nge2 - cnode01:nge2 being drained
    Jul 22 19:31:06 cnode02 scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x0
    Jul 22 19:31:06 cnode02 genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path cnode02:nge3 - cnode01:nge3 being drained
    Jul 22 19:31:11 cnode02 nge: [ID 812601 kern.notice] NOTICE: nge3: link down
    Jul 22 19:31:12 cnode02 genunix: [ID 414208 kern.warning] WARNING: QUORUM_GENERIC: quorum preempt error in CMM: Error 5 --- QUORUM_GENERIC Tkown ioctl failed on quorum device /dev/did/rdsk/d1s2.
    Jul 22 19:31:12 cnode02 cl_dlpitrans: [ID 624622 kern.notice] Notifying cluster that this node is panicking
    Jul 22 19:31:12 cnode02 unix: [ID 836849 kern.notice]
    Jul 22 19:31:12 cnode02 ^Mpanic[cpu3]/thread=ffffffff8b5c06e0:
    Jul 22 19:31:12 cnode02 genunix: [ID 265925 kern.notice] CMM: Cluster lost operational quorum; aborting.
    Jul 22 19:31:12 cnode02 unix: [ID 100000 kern.notice]
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651b40 genunix:vcmn_err+13 ()
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651b50 cl_runtime:__1cZsc_syslog_msg_log_no_args6FpviipkcpnR__va_list_element__nZsc_syslog_msg_status_enum__+24 ()
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651c30 cl_runtime:__1cCosNsc_syslog_msgDlog6MiipkcE_nZsc_syslog_msg_status_enum__+9d ()
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651e20 cl_haci:__1cOautomaton_implbAstate_machine_qcheck_state6M_nVcmm_automaton_event_t__+3bc ()
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651e60 cl_haci:__1cIcmm_implStransitions_thread6M_v_+de ()
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651e70 cl_haci:__1cIcmm_implYtransitions_thread_start6Fpv_v_+b ()
    Jul 22 19:31:12 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651ed0 cl_orb:cllwpwrapper+106 ()
    Jul 22 19:31:13 cnode02 genunix: [ID 655072 kern.notice] fffffe8002651ee0 unix:thread_start+8 ()
    Jul 22 19:31:13 cnode02 unix: [ID 100000 kern.notice]
    Jul 22 19:31:13 cnode02 genunix: [ID 672855 kern.notice] syncing file systems...
    Jul 22 19:31:13 cnode02 genunix: [ID 733762 kern.notice] 1
    Jul 22 19:31:34 cnode02 last message repeated 20 times
    Jul 22 19:31:35 cnode02 genunix: [ID 622722 kern.notice] done (not all i/o completed)
    Jul 22 19:31:36 cnode02 genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c2t0d0s1, offset 3436511232, content: kernel
    Jul 22 19:31:45 cnode02 genunix: [ID 409368 kern.notice] ^M100% done: 136950 pages dumped, compression ratio 4.77,
    Jul 22 19:31:45 cnode02 genunix: [ID 851671 kern.notice] dump succeeded
    Jul 22 19:33:18 cnode02 genunix: [ID 540533 kern.notice] ^M

    Hi,
    the problem lies in the error message around the quorum device. The SC documentation, specifically the Sun Cluster Error Messages Guide at http://docs.sun.com/app/docs/doc/820-4681 explains this as follows:
    414208 QUORUM_GENERIC: quorum preempt error in CMM: Error %d --- QUORUM_GENERIC Tkown ioctl failed on quorum device %s.
    Description:
    This node encountered an error when issuing a QUORUM_GENERIC Take Ownership operation on a quorum device. This error indicates that the node was unsuccessful in preempting keys from the quorum device, and the partition to which it belongs was preempted. If a cluster is divided into two or more disjoint subclusters, one of these must survive as the operational cluster. The surviving cluster forces the other subclusters to abort by gathering enough votes to grant it majority quorum. This action is called "preemption of the losing subclusters".
    Solution:
    Other related messages identify the quorum device where the error occurred. If an EACCES error occurs, the QUORUM_GENERIC command might have failed because of the SCSI3 keys on the quorum device. Scrub the SCSI3 keys off the quorum device and reboot the preempted nodes.
    You should try to follow this advice. I would propose choosing a different QD before trying to do this, if you have one available. Is it possible that this LUN has been in use by a different cluster?
    To scrub SCSI3 keys you should use the scsi command in /usr/cluster/lib/sc: ./scsi -c inkeys -d <device> to check for the existence of keys, and ...-c scrub.. to remove any SCSI3 keys.
    Regards
    Hartmut
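
    The vote arithmetic behind the "Cluster lost operational quorum" panic can be sketched in a few lines of Python. This is only a simplified model of majority quorum, assuming the vote counts from the boot log above (one vote per node plus one for the quorum device); the function name has_quorum is an illustration, not a Sun Cluster interface.

```python
def has_quorum(votes_held, total_votes):
    """A partition keeps running only if it holds a strict majority of votes."""
    return votes_held > total_votes // 2

# Assumed from the boot log: cnode01 (1) + cnode02 (1) + quorum device d1 (1).
TOTAL_VOTES = 3

# Surviving node that also acquires the quorum device vote.
print(has_quorum(2, TOTAL_VOTES))  # True

# Surviving node that fails to take ownership of the quorum device.
print(has_quorum(1, TOTAL_VOTES))  # False
```

    Under this model, a surviving node that cannot take ownership of the quorum device is left with one vote out of three, which is consistent with the panic seen on cnode02 when cnode01 reboots.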

  • When we create  one service as preferred on both nodes in  two node RAC

    How should the listener, tnsnames.ora and listener file be configured when we create one service as preferred on both nodes in a two-node RAC? (I don't need load balancing here; I just want to create the service as preferred on both nodes.)
    Please, someone help me with this.

    Thanks a lot Sebastain for your reply.
    I am using version 10.2.0.4, and below is the tns entry from my client side.
    M4AMPRD_TEST=
      (DESCRIPTION=
        (ADDRESS=(PROTOCOL=TCP)(HOST=153.88.184.228)(PORT=1521))
        (ADDRESS=(PROTOCOL=TCP)(HOST=153.88.184.229)(PORT=1521))
        (FAILOVER=ON)
        (CONNECT_DATA=
          (SERVICE_NAME=M4AMPRD_TEST)
          (FAILOVER_MODE=
            (TYPE=SELECT)
            (METHOD=BASIC)
            (RETRIES=20)
            (DELAY=5))))
    service creation: srvctl add service -d M4AMPRD -s M4AMPRD_TEST -r M4AMPRD1,M4AMPRD2
    But when I connect to the database using the above service from the client, sometimes it works fine and sometimes it fails. Please see the log below, with timings, for how it behaves.
    SQL> set time on
    18:39:46 SQL> CONN m4owner/iamm4amdev!@M4AMPRD_TEST
    Connected.
    18:39:48 SQL> CONN m4owner/iamm4amdev!@M4AMPRD_TEST
    ERROR:
    ORA-12545: Connect failed because target host or object does not exist
    Warning: You are no longer connected to ORACLE.
    18:39:52 SQL> CONN m4owner/iamm4amdev!@M4AMPRD_TEST
    Connected.
    18:39:53 SQL> CONN m4owner/iamm4amdev!@M4AMPRD_TEST
    Connected.
    18:39:55 SQL> CONN m4owner/iamm4amdev!@M4AMPRD_TEST
    Connected.
    18:39:57 SQL> CONN m4owner/iamm4amdev!@M4AMPRD_TEST
    ERROR:
    ORA-12545: Connect failed because target host or object does not exist
    Warning: You are no longer connected to ORACLE.
    18:39:59 SQL> CONN m4owner/iamm4amdev!@M4AMPRD_TEST
    Connected.
    Thanks for your help in advance
    Anil Vejendla..

  • SC 3.2 nodes reboot when i reboot the first one

    I have created a cluster with two nodes and a quorum (shared file system) between the two nodes, but when I try to reboot one node the second one reboots too. I have Solaris 10 and Sun Cluster 3.2. The error on the console is
    WARNING: /scsi_vhci/ssd@g600a0b80005a82cf0000031c498ca848 (ssd25):
    offline or reservation conflict
    WARNING: /scsi_vhci/ssd@g600a0b80005a82cf0000038e49941b19 (ssd26):
    offline or reservation conflict
    WARNING: /scsi_vhci/ssd@g600a0b80005a82cf0000039449941c7d (ssd27):
    offline or reservation conflict
    WARNING: /scsi_vhci/ssd@g600a0b80005a82cf0000039149941bd1 (ssd29):
    offline or reservation conflict
    WARNING: /scsi_vhci/ssd@g600a0b80005a82cf0000039749941cab (ssd30):
    offline or reservation conflict
    WARNING: /scsi_vhci/ssd@g600a0b80005ab794000005ff4a564a48 (ssd47):
    offline or reservation conflict
    Update_drv failed to re-read did.conf file for did driver. Will retry once again.
    Update_drv failed to re-read did.conf file for did driver after 1 retry. Will try devfsadm.
    Devfsadm successfully configured did devices.
    WARNING: /scsi_vhci/ssd@g600a0b80005a82cf0000031c498ca848 (ssd25):
    offline or reservation conflict
    WARNING: /scsi_vhci/ssd@g600a0b80005a82cf0000038e49941b19 (ssd26):
    offline or reservation conflict
    WARNING: /scsi_vhci/ssd@g600a0b80005a82cf0000039449941c7d (ssd27):
    offline or reservation conflict
    WARNING: /scsi_vhci/ssd@g600a0b80005a82cf0000039149941bd1 (ssd29):
    offline or reservation conflict
    WARNING: /scsi_vhci/ssd@g600a0b80005a82cf0000039749941cab (ssd30):
    offline or reservation conflict
    WARNING: /scsi_vhci/ssd@g600a0b80005ab794000005ff4a564a48 (ssd47):
    offline or reservation conflict
    Update_drv failed to re-read did.conf file for did driver. Will retry once again.
    Update_drv failed to re-read did.conf file for did driver after 1 retry. Will try devfsadm.
    Devfsadm successfully configured did devices.
    Mohyi

    A couple more questions:
    - does clq status show that the quorum vote is counted correctly?
    - what kind of storage are you using
    - are these newly created LUNs that you are using or is it possible that these have been used before by other hosts or clusters?
    - any interesting error messages in the log files - /var/adm/messages
    - what is the panic string of the other node that reboots?
    I do not think that the did related message is relevant in this context.

  • Strange issue in Oracle ASM on Two node RAC where in one ASM node shows all diskgroup while other node shows  missing node.

    We have Oracle database 11gR1 in a RAC configuration with Oracle ASM. Recently our database server crashed and we are trying to restore services.
    Using snapshot technology (Business Copy) we had synced all our disks at the storage level. After this, when we try to start the ASM instance on node 1 it comes up and shows all diskgroups, but on the other node it throws an error about a missing diskgroup.
    ORA-15032: not all alterations performed
    ORA-15040: diskgroup is incomplete
    ORA-15042: ASM disk "5" is missing
    Expert please share your views.
    Thanks,
    Tushar

    The I/O fabric layer on the other node failed to mount all storage LUNs - resulting in ASM being unable to mount a diskgroup as there are missing disks in that group.
    Rebooting is exactly what could be needed to reset the h/w and infrastructure used by that node, in order for it to see all the storage disks again. As node 1 sees all storage disks (and is working), the disk itself on the storage system is intact and usable.
    What is the o/s? What is the fabric layer? What is used on o/s for dealing with the I/O fabric layer?

  • What happens when one RAC node's public NIC is down?

    Dear all,
    There's a two-node RAC in my office. From my observation, when one RAC node's public NIC is down, the "crs_stat -t" command doesn't show any resource as OFFLINE. So at that moment, if I use sqlplus to connect to my RAC db, it will still choose the node1 or node2 instance randomly, right? And if I am assigned to the instance on the node whose public NIC is down, will my sqlplus fail to connect to the db?
    So, can we say that Oracle RAC can't provide HA if one node's public NIC is down? Or is there another solution to this issue? Any suggestion would be appreciated.

    A public NIC being down on one node means that only the other node can take the load; in your case that will be Node 2. CRS may be unable to record the status, in which case there may be failed connections to the affected node.
