11gR1 1 node won't join the cluster after reboot.

This is a high level description of a problem.
We usually run a two node cluster.
This week we had an issue where one node needed to be taken down. It became unresponsive, and upon reboot the other node no longer functioned correctly.
So one node was left running until the maintenance window.
Apparently, when it's brought back up, it has the MAC of the second node in the ARP cache.
This leads to node1 not being able to join the cluster.
I've seen workarounds that involve refreshing the ARP cache, but is there anything else to this?
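
To confirm the stale-entry theory before flushing anything, the cached MACs can be compared against the MACs each node is known to have. The following is a minimal sketch, assuming Linux-style `arp -an` output; the IP and MAC values in the usage note are hypothetical placeholders, not values from this cluster.

```python
import re

# Matches Linux-style `arp -an` lines (an assumed format), e.g.
#   ? (10.0.0.2) at 00:11:22:33:44:55 [ether] on eth1
# Lines with "<incomplete>" instead of a MAC do not match and are skipped.
ARP_LINE = re.compile(
    r"\((?P<ip>\d+\.\d+\.\d+\.\d+)\) at (?P<mac>[0-9A-Fa-f:]{17})"
)


def parse_arp_table(text):
    """Return {ip: mac} (MACs lower-cased) from `arp -an` style text."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.search(line)
        if match:
            table[match.group("ip")] = match.group("mac").lower()
    return table


def stale_entries(table, expected):
    """Return {ip: (cached_mac, expected_mac)} for entries that disagree.

    `expected` maps each node IP to the MAC it should have; IPs that are
    not cached at all are ignored.
    """
    stale = {}
    for ip, mac in expected.items():
        cached = table.get(ip)
        if cached is not None and cached != mac.lower():
            stale[ip] = (cached, mac.lower())
    return stale
```

Feeding it the captured output of `arp -an` together with a dictionary such as `{"10.0.0.1": "00:11:22:33:44:55"}` (hypothetical) returns only the mismatching addresses, which are the entries worth deleting with `arp -d`.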

Similar Messages

  • After patching the node, the node is not joining the cluster.

    Dear All,
    We have a two-node Sun Cluster with the release below:
    Sun Cluster 3.2u1 for Solaris 10 sparc
    Copyright 2008 Sun Microsystems, Inc. All Rights Reserved.
    The nodes are:
    Node Name          Status
    scrbdomdefrm005    Online
    scrbdomderue005    Offline
    We are patching the nodes with the 2Q 2009 quarterly patches. We patched node scrbdomderue005 first, following the steps below.
    1) Our root d0 has d1 (c0t0d0s0) and d2 (c1t0d0s0).
    2) We detached d2 from d0 and ran metaclear d2.
    3) Mounted c1t0d0s0 on /mnt.
    4) Used patchadd -R /mnt to patch the server. While patching we got only one error: patch 126106-27 needs to be installed in noncluster mode.
    5) Switched the RGs from node scrbdomderue005 to scrbdomdefrm005.
    6) Shut down scrbdomderue005, booted it from c1t0d0s0 in noncluster single-user mode, and installed patch 126106-27, which was successful.
    7) Shut down scrbdomderue005, booted it from c1t0d0s0 in cluster mode, and we are getting the following error.
    Booting as part of a cluster
    NOTICE: CMM: Node scrbdomdefrm005 (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node scrbdomderue005 (nodeid = 2) with votecount = 1 added.
    WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d5s2 with error 1.
    NOTICE: clcomm: Adapter nxge7 constructed
    NOTICE: clcomm: Adapter nxge3 constructed
    NOTICE: CMM: Node scrbdomderue005: attempting to join cluster.
    NOTICE: nxge3: xcvr addr:0x0a - link is up 1000 Mbps full duplex
    NOTICE: nxge7: xcvr addr:0x0a - link is up 1000 Mbps full duplex
    WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d5s2 with error 1.
    NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
    NOTICE: clcomm: Path scrbdomderue005:nxge7 - scrbdomdefrm005:nxge7 errors during initiation
    NOTICE: clcomm: Path scrbdomderue005:nxge3 - scrbdomdefrm005:nxge3 errors during initiation
    WARNING: Path scrbdomderue005:nxge7 - scrbdomdefrm005:nxge7 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    WARNING: Path scrbdomderue005:nxge3 - scrbdomdefrm005:nxge3 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    exit from console.
    We are able to boot the node scrbdomderue005 in noncluster mode successfully; please check the details below.
    scrbdomderue005:/# uname -a
    SunOS scrbdomderue005 5.10 Generic_138888-07 sun4u sparc SUNW,SPARC-Enterprise
    scrbdomderue005:/#
    Before patching the server scrbdomderue005, the kernel version was:
    SunOS scrbdomderue005 5.10 Generic_137111-07 sun4u sparc SUNW,SPARC-Enterprise
    If I boot scrbdomderue005 from d1 (c0t0d0s0), the server joins the cluster without issue.
    Could anyone please tell me what the problem could be and how to resolve it?

    Hi
    It could be because you have installed patch 138888. It has problems with nxge interfaces used as interconnect.
    Rgds
    Carsten

  • Node fails to join the cluster

    We are observing a problem where a node, after getting restarted, fails to join the cluster.
    We run two Coherence clusters across three boxes. Each box runs 8 Java processes: 4 processes of one cluster and 4 processes of the other. They all run as Windows NT services. Sometimes a node goes down and gets restarted, but then it fails to join the cluster with the following exception:
    "com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=8, Name=DistributedFIIndicativeCacheWithPublishingCacheStore, Type=DistributedCache"
    Has anyone experienced and addressed such a problem? If required, I can provide exact details of the cluster setup.
    -Bharat

    Hi Bharat,
    This may be caused by a stuck or slow DistributedService thread on one of your nodes. Please log into http://support.oracle.com and take a look at [Note 845363.1|https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=845363.1] for more details. Additionally, consider upgrading to Coherence 3.5 as it includes the [Service Guardian for deadlock detection/resolution|http://blackbeanbag.net/wp/2009/07/20/coherence-3-5-service-guardian-deadlock-detection/].
    Thanks,
    Patrick

  • Node failed to join the cluster because it could not send and receive failure detection network messages

    One of my customers has a Windows Server 2008 R2 cluster for an Exchange 2010 Mailbox Database Availability Group.  Lately, they've been having problems with one of their nodes (the one node that is on a different subnet in a different datacenter) where
    their Exchange databases aren't replicating.  While looking into this issue it seems that the problem is the Network Manager isn't started because the cluster service is failing.  Since the issue seems to be with the cluster service, and not Exchange,
    I'm asking here. 
    When the cluster service starts, it appears to start working, but within a few minutes the following is logged in the system event log.
    FailoverClustering
    1572
    Critical
    Cluster Virtual Adapter
    Node 'nodename' failed to join the cluster because it could not send and receive failure detection network messages with other cluster nodes. ...
    It seems that the problem is with the 169.254 address on the cluster virtual adapter.  An entry in the cluster.log file says: Aborting connection because NetFT route to node nodename on virtual IP 169.254.1.44:3343 has failed to come up.
    In my experience, you never have to mess with the cluster virtual adapter.  I'm not sure what happened here, but I doubt it has been modified.  I need the cluster to communicate with its other nodes on our routed 10. network.  I've never experienced
    this before and found little in my searches on the subject.  Any idea how I can fix this?
    Thanks,
    Joe
    Joseph M. Durnal MCM: Exchange 2010 MCITP: Enterprise Messaging Administrator, Exchange 2010 MCITP: Enterprise Messaging Administrator, MCITP: Enterprise Administrator

    Hi,
    I suspect an issue with communication on UDP port 3343. Please confirm that the firewall rules for port 3343 are set on all the nodes and that connections are enabled for all firewall profiles, or otherwise confirm connectivity between all the nodes.
    Run ipconfig /flushdns on all the nodes to clear their DNS resolver caches, then confirm that the entries on your DNS server are correct.
    A similar issue is covered in this article:
    Exchange 2010 DAG - NetworkManager has not yet been initialized
    https://blogs.technet.com/b/dblanch/archive/2012/03/05/exchange-2010-dag-networkmanager-has-not-yet-been-initialized.aspx?Redirected=true
    Hope this helps.

  • ISE PSN node won't join cluster

    Hi All,
    Has anyone seen an issue where a PSN can't join the cluster?
    We join the PSN node:
    - Node is registered successfully (sync in progress)
    - 1hr later - Replication to node failed.
    - Replication Sync failed due to Secondary Database is down
    I have a customer where the admin node and PSN are separated by a firewall.
    We allow in both directions
    Admin <--> PSN
    ICMP
    HTTPS
    1521
    Firewall not showing drops.
    DNS and NTP are ok.
    Current topology is 1 PSN, 1 Admin node.
    Works fine in our test lab, but not in the customer's environment.
    Cheers
    Peter.

    You will probably need more ports opened between the PSN and the network, in addition to your rules between Admin and PSN. You might want to add syslog UDP 20514 as well.
    Also, what type of firewall are you using? If it is an ASA, what happens if you run packet tracer and/or a packet capture? Is the flow allowed through, and do you see the packets in the capture?
    Last but not least, can you confirm that the DB service is running on the secondary node? From the CLI, run "show application status ise". If it is not running, either restart the node or just issue "application start ise".
    Thank you for rating!
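
Since the firewall shows no drops, a quick way to double-check the allowed TCP ports from the PSN side is to attempt a connection to each one. This is only a sketch: the hostname in the usage note is a hypothetical placeholder, the port list (443 for HTTPS, 1521) comes from the rules quoted in the question, and a successful TCP connect does not prove that database replication itself works.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


def check_ports(host, ports, timeout=3.0):
    """Return {port: bool} for every port in `ports`."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, `check_ports("admin-node.example.com", [443, 1521])` (hostname hypothetical) run from the PSN side reports which of the two TCP ports is reachable through the firewall; UDP ports such as syslog cannot be verified this way.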

  • Managed server not able to join the cluster

    Hi
    I have two storage-enabled Coherence servers on two different machines. These two are able to form the cluster without any problem. I also have two managed servers. When I start one, it joins the cluster without any issue, but when I start the other one (the fourth member), it does not join the cluster. Only one managed server joins the cluster. I am getting the following error.
    2011-12-22 15:39:26.940/356.798 Oracle Coherence GE 3.6.0.4 <Info> (thread=[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Loaded cache configuration from "file:/u02/oracle/admin/atddomain/atdcluster/ATD/config/atd-client-cache-config.xml"
    2011-12-22 15:39:26.943/356.801 Oracle Coherence GE 3.6.0.4 <D4> (thread=[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): TCMP bound to /172.23.34.91:8190 using SystemSocketProvider
    2011-12-22 15:39:57.909/387.767 Oracle Coherence GE 3.6.0.4 <Warning> (thread=Cluster, member=n/a): This Member(Id=0, Timestamp=2011-12-22 15:39:26.944, Address=172.23.34.91:8190, MachineId=39242, Location=site:dev.icd,machine:appsoad2-web2,process:24613, Role=WeblogicServer) has been attempting to join the cluster at address 231.1.1.50:7777 with TTL 4 for 30 seconds without success; this could indicate a mis-configured TTL value, or it may simply be the result of a busy cluster or active failover.
    2011-12-22 15:39:57.909/387.767 Oracle Coherence GE 3.6.0.4 <Warning> (thread=Cluster, member=n/a): Received a discovery message that indicates the presence of an existing cluster:
    Message "NewMemberAnnounceWait"
    FromMember=Member(Id=2, Timestamp=2011-12-22 15:22:56.607, Address=172.23.34.74:8090, MachineId=39242, Location=site:dev.icd,machine:appsoad4,process:23937,member:CoherenceServer2, Role=WeblogicWeblogicCacheServer)
    FromMessageId=0
    Internal=false
    MessagePartCount=1
    PendingCount=0
    MessageType=9
    ToPollId=0
    Poll=null
    Packets
    [000]=Broadcast{PacketType=0x0DDF00D2, ToId=0, FromId=2, Direction=Incoming, ReceivedMillis=15:39:57.909, MessageType=9, ServiceId=0, MessagePartCount=1, MessagePartIndex=0, Body=0}
    Service=ClusterService{Name=Cluster, State=(SERVICE_STARTED, STATE_ANNOUNCE), Id=0, Version=3.6}
    ToMemberSet=null
    NotifySent=false
    ToMember=Member(Id=0, Timestamp=2011-12-22 15:39:26.944, Address=172.23.34.91:8190, MachineId=39242, Location=site:dev.icd,machine:appsoad2-web2,process:24613, Role=WeblogicServer)
    SeniorMember=Member(Id=1, Timestamp=2011-12-22 15:22:53.032, Address=172.23.34.73:8090, MachineId=39241, Location=site:dev.icd,machine:appsoad3,process:19339,member:CoherenceServer1, Role=WeblogicWeblogicCacheServer)
    2011-12-22 15:40:02.915/392.773 Oracle Coherence GE 3.6.0.4 <Warning> (thread=Cluster, member=n/a): Received a discovery message that indicates the presence of an existing cluster:
    Message "NewMemberAnnounceWait"
    FromMember=Member(Id=2, Timestamp=2011-12-22 15:22:56.607, Address=172.23.34.74:8090, MachineId=39242, Location=site:dev.icd,machine:appsoad4,process:23937,member:CoherenceServer2, Role=WeblogicWeblogicCacheServer)
    FromMessageId=0
    Internal=false
    MessagePartCount=1
    PendingCount=0
    MessageType=9
    ToPollId=0
    Poll=null
    Packets

    Hi,
    By default, Coherence uses a multicast protocol to discover other nodes when forming a cluster. Since you are having difficulty establishing a cluster via multicast, please run a multicast test to see whether multicast is configured properly.
    http://wiki.tangosol.com/display/COH32UG/Multicast+Test
    Make sure you are using the same configuration files across the cluster members; all members must specify the same cluster name in order to be allowed to join the cluster.
    <cluster-name system-property="tangosol.coherence.cluster">xxx</cluster-name>
    I would suggest using the unicast-listener with well-known-addresses instead of the multicast-listener.
    http://wiki.tangosol.com/display/COH32UG/well-known-addresses
    Add entries like the ones below to your tangosol override XML:
    <well-known-addresses>
    <socket-address id="1">
    <address>172.23.34.91</address>
    <port>8190</port>
    </socket-address>
    <socket-address id="2">
    <address>172.23.34.74</address>
    <port>8090</port>
    </socket-address>
    </well-known-addresses>
    This list is used by all other nodes to find their way into the cluster without the use of multicast, so at least one well-known node must be running for other nodes to be able to join.
    Hope this helps!!
    Thanks,
    Ashok.
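
When the member list changes often, the well-known-addresses fragment can be generated rather than edited by hand. The sketch below is only an illustration: it uses the element names and the two address/port pairs from the reply above, and the helper names `build_wka` and `render` are made up for this example. Where the fragment belongs inside the override file should be checked against the Coherence documentation for your version.

```python
import xml.etree.ElementTree as ET


def build_wka(members):
    """Build a <well-known-addresses> element from (address, port) pairs."""
    root = ET.Element("well-known-addresses")
    for index, (address, port) in enumerate(members, start=1):
        entry = ET.SubElement(root, "socket-address", id=str(index))
        ET.SubElement(entry, "address").text = address
        ET.SubElement(entry, "port").text = str(port)
    return root


def render(root):
    """Serialize the element to a string, pretty-printed where supported."""
    if hasattr(ET, "indent"):  # ET.indent was added in Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Address/port pairs taken from the reply above.
    print(render(build_wka([("172.23.34.91", 8190), ("172.23.34.74", 8090)])))
```

Generating the fragment from a single list makes it easier to keep the same set of well-known addresses on every member, which matters because all members must agree on the configuration.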

  • Client Application without joining the cluster

    Hi All,
    Is it possible for a client application to access the cache within a Coherence cluster? The client application is not part of the cluster, and it didn't start with any cache config files or anything else.
    The client application just uses:
    NamedCache cache = CacheFactory.getCache("VirtualCache");
    If a client application starts with a cache-config file, it will also join the cluster. In that case, will the JVM of the client app also be loaded with the distributed/replicated cache contents?
    Please clarify my doubts.
    Regards
    Srinivas.

    The only clean way of NOT joining the cluster is to connect via Extend.
    You can join the cluster, and specify LocalStorage=False parameter, however, that is only applicable for distributed cache. Replicated cache data still exists on every node. A bigger issue in my mind, is that your node will be actively managing membership of other members in the cluster, and that can become a problem.
    Timur

  • I have a new Macbook Pro. It won't join the wireless networks that my old Macbook Pro joined; only some of them (I'm talking open networks). What can I do, please?

    I have a new Macbook Pro. It can see, but won't join the open wireless networks that my existing Macbook Pro joins. What can I do, please?

    Mac OS X 10.7 Help: Troubleshoot a network printer

  • Safari won't sync the bookmark - After Sync the bookmarks stays empty

    Safari won't sync the bookmark - After Sync the bookmarks stays empty
    I have tried replacing what is on the ME server over my current iMac , but no luck there.
    Is there a way of reseting the bookmark?

    Did I sound completely looney? Not a single suggestion.
    For anyone interested or NOT
    I fixed the problem by updating the firmware on my Airport Extreme. It's true, that was the problem.

  • I have a MacBookPro with Yosemite and Safari won't recognize the router after sleep unless I restart the computer.

    I have a MacBookPro with Yosemite and Safari won't recognize the router after sleep unless I restart the computer.
    OS X 10.10.1. 
    Safari Version 8.0 (10600.1.25.1)
    WiFi router connects with mail but not Safari

    See
    iOS: Device not recognized in iTunes for Windows
    - I would start with
    Removing and Reinstalling iTunes, QuickTime, and other software components for Windows XP
    or
    Removing and reinstalling iTunes and other software components for Windows Vista, Windows 7, or Windows 8
    However, after you remove the Apple software components, also remove the iCloud Control Panel via the Programs and Features app in the Windows Control Panel. Then reinstall all the Apple software components.
    - New cable and different USB port
    - Run this and see if the results help with determine the cause
    iTunes for Windows: Device Sync Tests
    - Try on another computer to help determine if computer or iPod problem

  • Public Interface not responding after second node is started in the cluster

    Hi
    Has anyone ever experienced the public interface not responding between nodes in the cluster (ping, ssh, scp) after the second nodeapps is started in the cluster?
    This is a new install, so all I have installed so far is the base release of CRS 10.2.0. This is on Solaris 10. The vipca failed during the installation; however, I was able to proceed and manually add the nodeapps using srvctl add nodeapps -n -o -A.
    It seems that after the second node is started I lose all connectivity to the public interfaces and to my default gateway.
    Also, I sometimes get the following messages after I try to stop the nodeapps and start them back up.
    CRS-1006: No more members to consider
    CRS-0215: Could not start resource 'ora.node1.vip'.
    Any suggestions on where I should start troubleshooting?
    Thanks

    Do you have a default GW?
    Can the node connect to the GW?
    Check metalink
    CRS-0215: Could not start resource 'ora..vip' [ID 356535.1]
    CRS-1006: No more members to consider when starting service [ID 465364.1]
    Good Luck

  • Can the node's UID be different when re-joining the cluster?

    Let's say that we have a split brain scenario and when both islands meet again one of them is "marked as invalid" and is restarted. Upon cluster restart will the restarted nodes get new UID values?
    Thanks,
    -- Gato

    Gato,
    Absolutely! The UID is the member's identity and if the cluster service restarts (within a given Java process), it will be always assigned a new and unique UID.
    Regards,
    Gene

  • My iMac won't join the correct network

    This is a new iMac with Lion.  The network is ok because all my other devices work fine with it.  The network is in the list and when I select it my mac joins without problem, but won't join it automatically.  I've also tried to move my network up in the list but that doesn't seem to work either.

    Mine is the only network listed in the Preferred Networks list in the advanced settings.  The "Remember networks this computer has joined" option is checked.
    The problem is that when I wake my computer up, it won't find the network.  I updated my airport extreme last week and this problem started after that.  However, none of the other devices using this network are having any problems.

  • New iPod Unable to join the network after inputting password

    Got my iPod this am..
    looked too good to touch no pun intended.
    I went through the instructions, interrupted by an offer to update, which I accepted.
    I then saw that the iPod was offline and went to set up the connection.
    I do have a Time Capsule, but I just use it as an external drive. I use Ethernet connections between it and my Netopia router/modem, which provides my Wi-Fi for another computer.
    I have a WPA-PSK security on the Netopia.
    My Wi Fi shows up on available networks, I am asked for my password but then I get a message "Unable to join the network"
    I must admit that my password is long (and a bit racy) my fingers a bit big, but I know that I have got it right on at least a few occasions.
    I have tried entering the details on "Other network" with the same result.
    Could my WPA-PSK be a problem?
    I would be most grateful for any suggestions.
    Various forms of this problem have appeared on this forum, but I think mine is a bit different.
    Thank you.

    Thank you.
    Went through that pretty well.
    Finally solved it in 2 steps.
    1. reset my modem/router to factory settings.
    Odd situation, wireless settings, but dependent on ethernet connection.
    2. read a post from Hong Kong re strange network settings.
    On his advice, into network, changed location from Automatic to one of my choosing.
    Bingo! everything came good.
    Totally wireless connection. iPod jumped to it. Isn't it great when everything comes together?
    I still wonder why it didn't take the signal from the WiFi part of my Modem/Router?
    Probably just as well as it sorted my Time Capsule connections out, probably for the first time since I bought one back in early 2008, and its recent replacement when it joined the epidemic of sudden death recently.
    Cheers

  • 11gr1 DB console not showing the cluster instances

    Hi All,
    11.1.0.7 DB with DB console deployed.
    1). EM upload agent command works but on DB console, I am not able to see the cluster instances...
    any config file I need to review?
    I don't see any errors....the emctl dbconsole commands all work fine.

    Hi,
    Run the following command to check if the cluster instances have been discovered:
    <Database Home>/bin/emctl config agent listtargets
    If necessary, see Support note 578011.1 for instructions on using emca to add an instance to DB control.
    Regards,
    - Loc
