Both cluster nodes rebooted

There is a two-node cluster running an Oracle RAC database. Yesterday both nodes rebooted at the same time (less than a few seconds apart). I don't know whether it was caused by Oracle CRS or by the servers themselves.
Here is the log:
/var/log/messages in node 1
Dec 8 15:14:38 dc01locs01 kernel: 493 RAIDarray.mppdcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18469446 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
Dec 8 15:14:38 dc01locs01 kernel: 493 RAIDarray.mppdcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18469448 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
Dec 8 15:17:20 dc01locs01 syslogd 1.4.1: restart.
Dec 8 15:17:20 dc01locs01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 8 15:17:20 dc01locs01 kernel: Linux version 2.6.18-128.7.1.0.1.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Mon Aug 24 14:07:09 EDT 2009
Dec 8 15:17:20 dc01locs01 kernel: Command line: ro root=/dev/vg00/root rhgb quiet crashkernel=128M@16M
Dec 8 15:17:20 dc01locs01 kernel: BIOS-provided physical RAM map:
ocssd.log in node 1
CSSD2009-12-08 15:14:33.467 1134680384 >TRACE: clssgmDispatchCMXMSG: msg type(13) src(2) dest(1) size(123) tag(00000000) incarnation(148585637)
CSSD2009-12-08 15:14:33.468 1134680384 >TRACE: clssgmHandleDataInvalid: grock HB+ASM, member 2 node 2, birth 1
CSSD2009-12-08 15:19:00.217 >USER: Copyright 2009, Oracle version 11.1.0.7.0
CSSD2009-12-08 15:19:00.217 >USER: CSS daemon log for node dc01locs01, number 1, in cluster ocsprodrac
clsdmtListening to (ADDRESS=(PROTOCOL=ipc)(KEY=dc01locs01DBG_CSSD))
CSSD2009-12-08 15:19:00.235 1995774848 >TRACE: clssscmain: Cluster GUID is 79db6803afc7df32ffd952110f22702c
CSSD2009-12-08 15:19:00.239 1995774848 >TRACE: clssscmain: local-only set to false
/var/log/messages in node 2
Dec 8 15:14:38 dc01locs02 kernel: 493 RAIDarray.mppdcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18561465 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
Dec 8 15:14:38 dc01locs02 kernel: 493 RAIDarray.mppdcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18561463 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
Dec 8 15:17:14 dc01locs02 syslogd 1.4.1: restart.
Dec 8 15:17:14 dc01locs02 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 8 15:17:14 dc01locs02 kernel: Linux version 2.6.18-128.7.1.0.1.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Mon Aug 24 14:07:09 EDT 2009
Dec 8 15:17:14 dc01locs02 kernel: Command line: ro root=/dev/vg00/root rhgb quiet crashkernel=128M@16M
Dec 8 15:17:14 dc01locs02 kernel: BIOS-provided physical RAM map:
ocssd.log in node 2
CSSD2009-12-08 15:14:35.450 1264081216 >TRACE: clssgmExecuteClientRequest: Received data update request from client (0x2aaaac065a00), type 1
CSSD2009-12-08 15:14:36.909 1127713088 >TRACE: clssgmDispatchCMXMSG: msg type(13) src(1) dest(1) size(123) tag(00000000) incarnation(148585637)
CSSD2009-12-08 15:14:36.909 1127713088 >TRACE: clssgmHandleDataInvalid: grock HB+ASM, member 1 node 1, birth 0
CSSD2009-12-08 15:18:55.047 >USER: Copyright 2009, Oracle version 11.1.0.7.0
clsdmtListening to (ADDRESS=(PROTOCOL=ipc)(KEY=dc01locs02DBG_CSSD))
CSSD2009-12-08 15:18:55.047 >USER: CSS daemon log for node dc01locs02, number 2, in cluster ocsprodrac
CSSD2009-12-08 15:18:55.071 3628915584 >TRACE: clssscmain: Cluster GUID is 79db6803afc7df32ffd952110f22702c
CSSD2009-12-08 15:18:55.077 3628915584 >TRACE: clssscmain: local-only set to false

Hi!
I suppose this one is fairly clear: both nodes logged I/O failures against the same device, 'RAIDarray.mppdcsgswsst6140:1:0:2' (the shared RAID array), at 15:14:38, right before the reboots. When shared storage fails, every server attached to that array can go down at the same time, which matches what you saw here.
This does not look like an Oracle problem. Good luck!
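To confirm this from the logs, a rough check (assuming the default 11.1 CRS home log layout and that the MPP/RDAC multipath driver writes to syslog, as it does above) is to look for storage path errors just before 15:14 and for any CSS eviction or reboot messages around the same time:
grep -i "Cmnd failed\|mpp_status" /var/log/messages
grep -iE "clssnm|eviction|reboot" $ORA_CRS_HOME/log/`hostname -s`/cssd/ocssd.log | tail -50
If ocssd.log simply stops at 15:14 on both nodes with no eviction messages, the reboot most likely came from below Oracle (storage, multipath, or hardware) rather than from CRS.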

Similar Messages

  • Oracle Cluster Node Reboots Abruptly

    One of our RAC 11gR2 cluster nodes rebooted abruptly. We found the following error in the Grid home alert log and in ocssd.log:
    [cssd(6014)]CRS-1611:Network communication with node mumchora12 (1) missing for 75% of timeout interval.  Removal of this node from cluster in 6.190 seconds
    We need to find the root cause of this node reboot. Kindly assist.
    OS Version : RHEL 5.8
    GRID : 11.2.0.2
    Database : 11.2.0.2.10

    Hi,
    By looking at the logs, this seems to be a private interconnect problem. I would suggest you refer to the MetaLink (My Oracle Support) note on the same issue:
    Node reboot or eviction: How to check if your private interconnect CRS can transmit network heartbeats [ID 1445075.1]
    Hope it helps you identify the root cause of the node eviction.
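    As a rough first check of the private interconnect (the interface name and peer IP below are placeholders; paths assume the default 11.2 Grid home log layout), you can confirm which interface CRS uses and look for the CRS-1611 heartbeat warnings:
    $GRID_HOME/bin/oifcfg getif                          # shows which interface is flagged cluster_interconnect
    ping -c 5 -I eth1 <peer-private-ip>                  # basic reachability over the private NIC
    grep "CRS-1611\|CRS-1612" $GRID_HOME/log/`hostname -s`/alert`hostname -s`.log
    grep -i clssnmPollingThread $GRID_HOME/log/`hostname -s`/cssd/ocssd.log | tail -20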
    Thanks

  • Cluster node reboots after network failure

    hi all,
    The Sun Cluster 3.1 8/05 configuration with 2 nodes (E2900) was working fine, with no errors reported by sccheck.
    Yesterday one node rebooted reporting a network failure; the errors in the messages file are:
    Jan 17 08:00:36 PRD in.mpathd[221]: [ID 594170 daemon.error] NIC failure detected on ce0 of group sc_ipmp0
    Jan 17 08:00:36 PRD Cluster.PNM: [ID 890413 daemon.notice] sc_ipmp0: state transition from OK to DOWN.
    Jan 17 08:00:47 PRD Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource PROD status on node PRD change to R_FM_DEGRADED
    Jan 17 08:00:47 PRD Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource PROD status msg on node PRD change to <IPMP Failure.>
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group CFS state on node PRD change to RG_PENDING_OFFLINE
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_MON_STOPPING
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_monitor_stop> for resource <PROD>, resource group <CFS>, timeout <300> seconds
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hafoip_monitor_stop> completed successfully for resource <PROD>, resource group <CFS>, time used: 0% of timeout <300 seconds>
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_ONLINE_UNMON
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_STOPPING
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_stop> for resource <PROD>, resource group <CFS>, timeout <300> seconds
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource PROD status on node PRD change to R_FM_UNKNOWN
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource PROD status msg on node PRD change to <Stopping>
    Jan 17 08:00:51 PRD ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 172.016.005.025:0, remote = 000.000.000.000:0, start = -2, end = 6
    Jan 17 08:00:51 PRD ip: [ID 302654 kern.notice] TCP_IOC_ABORT_CONN: aborted 53 connections
    What can be the reason for the reboot?
    Is there any way to avoid this and get only a failover?
    rgds

    What is in that resource group? The cause is probably something with Failover_mode=HARD set. Check the manual reference section for this. The option would be to set Failover_mode=SOFT.
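    For reference, a hedged sketch of changing that property in Sun Cluster 3.1 (the resource name PROD comes from the log above; verify the exact syntax against your release's scrgadm man page first):
    scrgadm -c -j PROD -y Failover_mode=SOFT      # change the resource's Failover_mode property
    scrgadm -pvv | grep Failover_mode             # confirm the new value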
    Tim
    ---

  • Cluster node reboots repeatedly

    We have a 2-node 10.1.0.3 cluster. We had a problem with an HBA card for the Fibre Channel connection to the SAN, and after replacing it, one of the cluster nodes keeps rebooting itself right after the cluster processes start up.
    We have had this issue once before and Support suggested the following. However, the same solution is not working this time around. Any ideas?
    Check that the output of the Unix command hostname is node1.
    Please rename the cssnorun file in the /etc/oracle/scls_scr/node1/root directory. Please issue "touch /etc/oracle/scls_scr/node1/root/crsdboot" and also change the permission and ownership of the file to match those on node 2. Please check if there are any differences in permission, ownership, or group for any files or directory structure under /etc/oracle between the two nodes.
    Please reboot node 1 after this change and see if you run into the same problem.
    Please check if there are any /tmp/crsctl* files.

    Well, especially if you are on Linux RH4, the new controller card will have caused the device names to change. Check that out. It could be that you are no longer seeing your voting and OCR partitions. This can happen on other operating systems too if the devices now have new names because the controller card has changed.
    For Linux, try the man pages on udev and search for udev on OTN.
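    Before restarting CRS, a quick sanity check that Clusterware can still see its OCR and voting devices after the HBA swap (run as root; the device paths below are only examples):
    /sbin/scsi_id -g -u -s /block/sdb         # RHEL 4/5 syntax; prints the disk WWID so you can compare nodes
    ls -l /dev/raw/raw*                       # if OCR/voting are bound to raw devices, confirm they still exist
    ocrcheck                                  # verifies the OCR location is readable
    crsctl query css votedisk                 # lists the configured voting disks (10.2 onwards)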
    Regards

  • Cluster node reboot and Quick Migration of VMs instead of Live Migration...

    Hi to all,
    how can one configure a Windows Server 2012 multi-node failover cluster so that VMs are migrated via Live Migration and NOT via Quick Migration when one node of the failover cluster is rebooted?
    Thanks in advance
    Joerg

    Hi Aidan,
    only for the record:
    We get the requested functionality - live migrate all VMs on reboot without first pausing the cluster - when we do the following:
    Change the value of HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\PreshutdownOrder
    from the default
    vmms
    wuauserv
    gpsvc
    trustedinstall
    to
    clussvc
    vmms
    wuauserv
    gpsvc
    trustedinstall
    Now the cluster service stops first when we trigger a reboot, and all VMs migrate as configured by the MoveTypeThreshold cluster setting.
    Greetings
    Joerg

  • Automatic reboots in cluster nodes

    Hi all,
    I have installed Sun Cluster 3.3 on Intel x86 machines in VMware. I have 2 nodes.
    Both nodes reboot automatically or hang after some time.
    Can you please tell me the cause and how to troubleshoot it?
    The memory assigned to each node in VMware is 1300 MB.

    So first I should point out that this is not an officially supported configuration which means there may be any number of issues that exist with this configuration. Having said that, I know that some people have made use of similar sorts of configurations.
    To get a root cause, you need to look at the message logs (/var/adm/messages) for both nodes. See if there is anything to do with either loss of quorum or heartbeat tick timeouts. Both of those can lead to node panics. Once you have that information, it will be easier to search for a potential resolution.
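    A minimal way to narrow it down, assuming both nodes still get far enough to write logs, is to pull the cluster-related failures out of the messages file on each node:
    grep -iE "quorum|heartbeat|panic|reservation" /var/adm/messages | tail -40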
    Tim
    ---

  • File Being processed in two cluster nodes

    Hi ,
    We have two cluster nodes, and when my adapter picks up a file, the file gets processed on both cluster nodes.
    I believe the file should be processed on either one of the cluster nodes, but not on both.
    Has anyone faced this kind of situation in a project with multiple cluster nodes?
    Thanks,
    Chandra.

    Hi Chandra
      Did you get a chance to see this post? It may help:
        Processing in Multiple Cluster Nodes
    Regards,
    Sandeep

  • Changing Cluster node hostname

    Dear all
    Can I change the hostname of a box in a cluster environment?
    Regards
    DR

    According to SysAdmin magazine (it's not on their site, but in the May 2006 edition) you can change the hostnames of cluster nodes by performing the following:
    Reboot the cluster nodes into non-cluster mode (reboot -- -x)
    Change the hostname of the system (nodenames, hosts, etc.)
    Change the hostname on all nodes within the files under /etc/cluster/ccr
    Regenerate the checksums for each file changed using ccradm -i /etc/cluster/ccr/FILENAME -o
    Reboot every cluster node into the cluster.
    I have no idea if this works, but if it does then let me know.
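    A rough sketch of that procedure on Solaris (the hostname files are the standard Solaris ones, "oldname" is a placeholder, and the ccradm step is exactly the one quoted above, so treat the whole thing as unsupported):
    reboot -- -x                                         # boot the node outside the cluster
    vi /etc/nodename /etc/inet/hosts /etc/hostname.*     # change the hostname in the usual Solaris files
    grep -l oldname /etc/cluster/ccr/*                   # list the CCR files that still reference the old name
    # edit those files, regenerate each checksum with ccradm as described above, then reboot into the cluster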

  • OES2 SP2a cluster node freeze

    Hi all.
    I have a 3 node cluster based on OES2 SP2a, fully patched. There are a couple of resources: Master_IP and an NSS volume.
    The cluster is virtualized on ESXi 4.1, fully patched, and vmware-tools are installed and up to date.
    If I do an "rcnetwork stop" on a node, it remains with no network for about 20 seconds, and then freezes. It does not reboot. It only freezes. The resource fails over correctly, but the server remains hung.
    This behaviour is the same on a server with a cluster resource on it and on a server with no cluster resource on it. It always hangs.
    The correct behaviour should be a reboot, shouldn't it?
    Any hints?
    Thanks in advance.

    The node does not reboot because ....
    9.11 Preventing a Cluster Node Reboot after a Node Shutdown
    If LAN connectivity is lost between a cluster node and the other nodes in the cluster, it is possible that the lost node will be automatically shut down by the other cluster nodes. This is normal cluster operating behavior, and it prevents the lost node from trying to load cluster resources because it cannot detect the other cluster nodes. By default, cluster nodes are configured to reboot after an automatic shutdown.
    On certain occasions, you might want to prevent a downed cluster node from rebooting so you can troubleshoot problems.
    Section 9.11.1, OES 2 SP2 with Patches and Later
    Section 9.11.2, OES 2 SP2 Release Version and Earlier
    9.11.1 OES 2 SP2 with Patches and Later
    Beginning in the OES 2 SP2 Maintenance Patch for May 2010, the Novell Cluster Services reboot behavior conforms to the kernel panic setting for the Linux operating system. By default the kernel panic setting is set for no reboot after a node shutdown.
    You can set the kernel panic behavior in the /etc/sysctl.conf file by adding a kernel.panic command line. Set the value to 0 for no reboot after a node shutdown. Set the value to a positive integer value to indicate that the server should be rebooted after waiting the specified number of seconds. For information about the Linux sysctl, see the Linux man pages on sysctl and sysctl.conf.
    1. As the root user, open the /etc/sysctl.conf file in a text editor.
    2. If the kernel.panic token is not present, add it:
    kernel.panic = 0
    3. Set the kernel.panic value to 0 or to a positive integer value, depending on the desired behavior.
    No Reboot: To prevent an automatic cluster reboot after a node shutdown, set the kernel.panic token to a value of 0. This allows the administrator to determine what caused the kernel panic condition before manually rebooting the server. This is the recommended setting.
    kernel.panic = 0
    Reboot: To allow a cluster node to reboot automatically after a node shutdown, set the kernel.panic token to a positive integer value that represents the number of seconds to delay the reboot.
    kernel.panic = <seconds>
    For example, to wait 1 minute (60 seconds) before rebooting the server, specify the following:
    kernel.panic = 60
    4. Save your changes.
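    If you want the new value active without waiting for a reboot, standard sysctl usage should work on OES 2:
    sysctl -w kernel.panic=0       # apply immediately
    sysctl -p                      # re-read /etc/sysctl.conf
    cat /proc/sys/kernel/panic     # confirm the running value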
    9.11.2 OES 2 SP2 Release Version and Earlier
    In the OES 2 SP2 release version and earlier, you can modify the /opt/novell/ncs/bin/ldncs file for the cluster to trigger the server to not automatically reboot after a shutdown.
    1. Open the /opt/novell/ncs/bin/ldncs file in a text editor.
    2. Find the following line:
    echo -n $TOLERANCE > /proc/sys/kernel/panic
    3. Replace $TOLERANCE with a value of 0 to cause the server to not automatically reboot after a shutdown.
    4. After editing the ldncs file, you must reboot the server to cause the change to take effect.

  • Node does not join cluster upon reboot

    Hi Guys,
    I have two servers [Sun Fire X4170] clustered together using Solaris Cluster 3.3 for an Oracle database. They are connected to shared storage, which is a Dell EqualLogic [iSCSI] array. Lately, I have run into a weird problem: both nodes come up fine and join the cluster when rebooted together; however, when I reboot just one of the nodes, it does not join the cluster and shows the following errors.
    This is happening on both nodes [if I reboot only one node at a time]. But if I reboot both nodes at the same time, they successfully join the cluster and everything runs fine.
    Below is the output from the node that I rebooted; it did not join the cluster and produced the following errors. The other node is running fine with all the services.
    In order to get out of this situation, I have to reboot both nodes together.
    # dmesg output #
    Apr 23 17:37:03 srvhqon11 ixgbe: [ID 611667 kern.info] NOTICE: ixgbe2: link down
    Apr 23 17:37:12 srvhqon11 iscsi: [ID 933263 kern.notice] NOTICE: iscsi connection(5) unable to connect to target SENDTARGETS_DISCOVERY
    Apr 23 17:37:12 srvhqon11 iscsi: [ID 114404 kern.notice] NOTICE: iscsi discovery failure - SendTargets (010.010.017.104)
    Apr 23 17:37:13 srvhqon11 iscsi: [ID 240218 kern.notice] NOTICE: iscsi session(9) iqn.2001-05.com.equallogic:0-8a0906-96cf73708-ef30000005e50a1b-sblprdbk online
    Apr 23 17:37:13 srvhqon11 scsi: [ID 583861 kern.info] sd11 at scsi_vhci0: unit-address g6090a0887073cf961b0ae505000030ef: g6090a0887073cf961b0ae505000030ef
    Apr 23 17:37:13 srvhqon11 genunix: [ID 936769 kern.info] sd11 is /scsi_vhci/disk@g6090a0887073cf961b0ae505000030ef
    Apr 23 17:37:13 srvhqon11 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
    Apr 23 17:37:13 srvhqon11 /scsi_vhci/disk@g6090a0887073cf961b0ae505000030ef (sd11): Command failed to complete (3) on path iscsi0/[email protected]:0-8a0906-96cf73708-ef30000005e50a1b-sblprdbk0001,0
    Apr 23 17:46:54 srvhqon11 svc.startd[11]: [ID 122153 daemon.warning] svc:/network/iscsi/initiator:default: Method or service exit timed out. Killing contract 41.
    Apr 23 17:46:54 srvhqon11 svc.startd[11]: [ID 636263 daemon.warning] svc:/network/iscsi/initiator:default: Method "/lib/svc/method/iscsid start" failed due to signal KILL.
    Apr 23 17:46:54 srvhqon11 svc.startd[11]: [ID 748625 daemon.error] network/iscsi/initiator:default failed repeatedly: transitioned to maintenance (see 'svcs -xv' for details)
    Apr 24 14:50:16 srvhqon11 svc.startd[11]: [ID 694882 daemon.notice] instance svc:/system/console-login:default exited with status 1
    root@srvhqon11 # svcs -xv
    svc:/system/cluster/loaddid:default (Oracle Solaris Cluster loaddid)
    State: offline since Tue Apr 23 17:46:54 2013
    Reason: Start method is running.
    See: http://sun.com/msg/SMF-8000-C4
    See: /var/svc/log/system-cluster-loaddid:default.log
    Impact: 49 dependent services are not running:
    svc:/system/cluster/bootcluster:default
    svc:/system/cluster/cl_execd:default
    svc:/system/cluster/zc_cmd_log_replay:default
    svc:/system/cluster/sc_zc_member:default
    svc:/system/cluster/sc_rtreg_server:default
    svc:/system/cluster/sc_ifconfig_server:default
    svc:/system/cluster/initdid:default
    svc:/system/cluster/globaldevices:default
    svc:/system/cluster/gdevsync:default
    svc:/milestone/multi-user:default
    svc:/system/boot-config:default
    svc:/system/cluster/cl-svc-enable:default
    svc:/milestone/multi-user-server:default
    svc:/application/autoreg:default
    svc:/system/basicreg:default
    svc:/system/zones:default
    svc:/system/cluster/sc_zones:default
    svc:/system/cluster/scprivipd:default
    svc:/system/cluster/cl-svc-cluster-milestone:default
    svc:/system/cluster/sc_svtag:default
    svc:/system/cluster/sckeysync:default
    svc:/system/cluster/rpc-fed:default
    svc:/system/cluster/rgm-starter:default
    svc:/application/management/common-agent-container-1:default
    svc:/system/cluster/scsymon-srv:default
    svc:/system/cluster/sc_syncsa_server:default
    svc:/system/cluster/scslmclean:default
    svc:/system/cluster/cznetd:default
    svc:/system/cluster/scdpm:default
    svc:/system/cluster/rpc-pmf:default
    svc:/system/cluster/pnm:default
    svc:/system/cluster/sc_pnm_proxy_server:default
    svc:/system/cluster/cl-event:default
    svc:/system/cluster/cl-eventlog:default
    svc:/system/cluster/cl-ccra:default
    svc:/system/cluster/ql_upgrade:default
    svc:/system/cluster/mountgfs:default
    svc:/system/cluster/clusterdata:default
    svc:/system/cluster/ql_rgm:default
    svc:/system/cluster/scqdm:default
    svc:/application/stosreg:default
    svc:/application/sthwreg:default
    svc:/application/graphical-login/cde-login:default
    svc:/application/cde-printinfo:default
    svc:/system/cluster/scvxinstall:default
    svc:/system/cluster/sc_failfast:default
    svc:/system/cluster/clexecd:default
    svc:/system/cluster/sc_pmmd:default
    svc:/system/cluster/clevent_listenerd:default
    svc:/application/print/server:default (LP print server)
    State: disabled since Tue Apr 23 17:36:44 2013
    Reason: Disabled by an administrator.
    See: http://sun.com/msg/SMF-8000-05
    See: man -M /usr/share/man -s 1M lpsched
    Impact: 2 dependent services are not running:
    svc:/application/print/rfc1179:default
    svc:/application/print/ipp-listener:default
    svc:/network/iscsi/initiator:default (?)
    State: maintenance since Tue Apr 23 17:46:54 2013
    Reason: Restarting too quickly.
    See: http://sun.com/msg/SMF-8000-L5
    See: /var/svc/log/network-iscsi-initiator:default.log
    Impact: This service is not running.
    ######## Cluster Status from working node ############
    root@srvhqon10 # cluster status
    === Cluster Nodes ===
    --- Node Status ---
    Node Name Status
    srvhqon10 Online
    srvhqon11 Offline
    === Cluster Transport Paths ===
    Endpoint1 Endpoint2 Status
    srvhqon10:igb3 srvhqon11:igb3 faulted
    srvhqon10:igb2 srvhqon11:igb2 faulted
    === Cluster Quorum ===
    --- Quorum Votes Summary from (latest node reconfiguration) ---
    Needed Present Possible
    2 2 3
    --- Quorum Votes by Node (current status) ---
    Node Name Present Possible Status
    srvhqon10 1 1 Online
    srvhqon11 0 1 Offline
    --- Quorum Votes by Device (current status) ---
    Device Name Present Possible Status
    d2 1 1 Online
    === Cluster Device Groups ===
    --- Device Group Status ---
    Device Group Name Primary Secondary Status
    --- Spare, Inactive, and In Transition Nodes ---
    Device Group Name Spare Nodes Inactive Nodes In Transistion Nodes
    --- Multi-owner Device Group Status ---
    Device Group Name Node Name Status
    === Cluster Resource Groups ===
    Group Name Node Name Suspended State
    ora-rg srvhqon10 No Online
    srvhqon11 No Offline
    nfs-rg srvhqon10 No Online
    srvhqon11 No Offline
    backup-rg srvhqon10 No Online
    srvhqon11 No Offline
    === Cluster Resources ===
    Resource Name Node Name State Status Message
    ora-listener srvhqon10 Online Online
    srvhqon11 Offline Offline
    ora-server srvhqon10 Online Online
    srvhqon11 Offline Offline
    ora-stor srvhqon10 Online Online
    srvhqon11 Offline Offline
    ora-lh srvhqon10 Online Online - LogicalHostname online.
    srvhqon11 Offline Offline
    nfs-rs srvhqon10 Online Online - Service is online.
    srvhqon11 Offline Offline
    nfs-stor-rs srvhqon10 Online Online
    srvhqon11 Offline Offline
    nfs-lh-rs srvhqon10 Online Online - LogicalHostname online.
    srvhqon11 Offline Offline
    backup-stor srvhqon10 Online Online
    srvhqon11 Offline Offline
    cluster: (C383355) No response from daemon on node "srvhqon11".
    === Cluster DID Devices ===
    Device Instance Node Status
    /dev/did/rdsk/d1 srvhqon10 Ok
    /dev/did/rdsk/d2 srvhqon10 Ok
    srvhqon11 Unknown
    /dev/did/rdsk/d3 srvhqon10 Ok
    srvhqon11 Unknown
    /dev/did/rdsk/d4 srvhqon10 Ok
    /dev/did/rdsk/d5 srvhqon10 Fail
    srvhqon11 Unknown
    /dev/did/rdsk/d6 srvhqon11 Unknown
    /dev/did/rdsk/d7 srvhqon11 Unknown
    /dev/did/rdsk/d8 srvhqon10 Ok
    srvhqon11 Unknown
    /dev/did/rdsk/d9 srvhqon10 Ok
    srvhqon11 Unknown
    === Zone Clusters ===
    --- Zone Cluster Status ---
    Name Node Name Zone HostName Status Zone Status
    Regards.

    Check if your global devices are mounted properly:
    #cat /etc/mnttab | grep -i global
    Check if the proper entries are there on both systems:
    #cat /etc/vfstab | grep -i global
    Give the output for the quorum devices:
    #scstat -q
    or
    #clquorum list -v
    Also check why your iSCSI initiator service is going offline unexpectedly:
    #vi /var/svc/log/network-iscsi-initiator:default.log
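    Once the underlying iSCSI problem is fixed, the initiator service also has to be taken out of maintenance before the cluster services can start; a typical SMF sequence would be (standard Solaris commands):
    svcadm clear svc:/network/iscsi/initiator:default    # clear the maintenance state
    svcs -x svc:/network/iscsi/initiator:default         # confirm it comes online
    svcs -d svc:/system/cluster/loaddid:default          # see what loaddid is still waiting on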

  • Operational Quorum and both nodes rebooting.

    I've experienced an issue that when I rip out the SCSI cables to shared storage (and the quorum device), both nodes panic and
    reboot. Is this expected behavior?
    It seems that it is understandable that the active node reboots, because it lost the disk-path and quorum device. But should
    the stand-by node reboot too?

    No problem.
    It's running S10 update 4 w/ SC 3.2.
    3120 JBOD attached to two T2000's, two-node cluster.
    I'm wondering if the stand-by node didn't see the quorum device when the active node's SCSI cables were pulled.
    We pulled the standby node's SCSI cables and reconnected them prior to pulling the active node's. The difference was that the stand-by node's /var/adm/messages log was filled with expected messages about a missing disk. The cables were re-attached to the stand-by node and then yanked out of the active node. This is when both nodes panicked.

  • Cluster node fails after testing removing both interconnects in a two node

    Hi,
    A cluster node panics and fails to join the cluster after a test in which both interconnects were removed in a two-node cluster. The cluster is up on one node, but the panicked node fails to rejoin the cluster, saying there is no sufficient quorum yet and both cluster interconnects have failed (even after reconnecting the interconnects). The quorum device used is a shared disk.
    Is this a bug?
    Any workaround or solution?
    The cluster is 3.2 on SPARC.
    Thanking you
    Ushas Symon

    Sounds like a networking problem to me. If the failed node genuinely can't communicate with the remaining node then it will not be allowed to join the cluster, hence the quorum message. I would suspect either:
    * Misconnected cables
    * A switch that has blocked or disabled the port
    * A failed auto-negotiation
    This is of course without knowing anything about what your network infrastructure actually is!
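    To check the interconnect from the surviving node after re-cabling, something along these lines should help (Sun Cluster 3.2 / Solaris 10 commands; interface names will differ):
    clinterconnect status              # or scstat -W; shows the state of each transport path
    dladm show-dev                     # link state and speed of each physical interface
    ifconfig -a | grep -A1 clprivnet   # the cluster's private network should still be plumbed on clprivnet0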
    Tim
    ---

  • After reboot cluster node went into maintanance mode (CONTROL-D)

    Hi there!
    I have configured a 2-node cluster on 2 x Sun Enterprise 220R and a StorEdge D1000.
    Each time I reboot any of the cluster nodes I get the following error during boot up:
    The / file system (/dev/rdsk/c0t1d0s0) is being checked.
    /dev/rdsk/c0t1d0s0: UNREF DIR I=35540 OWNER=root MODE=40755
    /dev/rdsk/c0t1d0s0: SIZE=512 MTIME=Jun 5 15:02 2006 (CLEARED)
    /dev/rdsk/c0t1d0s0: UNREF FILE I=1192311 OWNER=root MODE=100600
    /dev/rdsk/c0t1d0s0: SIZE=96 MTIME=Jun 5 13:23 2006 (RECONNECTED)
    /dev/rdsk/c0t1d0s0: LINK COUNT FILE I=1192311 OWNER=root MODE=100600
    /dev/rdsk/c0t1d0s0: SIZE=96 MTIME=Jun 5 13:23 2006 COUNT 0 SHOULD BE 1
    /dev/rdsk/c0t1d0s0: LINK COUNT INCREASING
    /dev/rdsk/c0t1d0s0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
    In maintenance mode I do:
    # fsck -y -F ufs /dev/rdsk/c0t1d0s0
    and it manages to correct the problem ... but the problem occurs again after each reboot on each cluster node!
    I have installed Sun Cluster 3.1 on Solaris 9 SPARC.
    How can I get rid of it?
    Any ideas?
    Brgds,
    Sergej

    Hi, I get this:
    112941-09 SunOS 5.9: sysidnet Utility Patch
    116755-01 SunOS 5.9: usr/snadm/lib/libadmutil.so.2 Patch
    113434-30 SunOS 5.9: /usr/snadm/lib Library and Differential Flash Patch
    112951-13 SunOS 5.9: patchadd and patchrm Patch
    114711-03 SunOS 5.9: usr/sadm/lib/diskmgr/VDiskMgr.jar Patch
    118064-04 SunOS 5.9: Admin Install Project Manager Client Patch
    113742-01 SunOS 5.9: smcpreconfig.sh Patch
    113813-02 SunOS 5.9: Gnome Integration Patch
    114501-01 SunOS 5.9: drmproviders.jar Patch
    112943-09 SunOS 5.9: Volume Management Patch
    113799-01 SunOS 5.9: solregis Patch
    115697-02 SunOS 5.9: mtmalloc lib Patch
    113029-06 SunOS 5.9: libaio.so.1 librt.so.1 and abi_libaio.so.1 Patch
    113981-04 SunOS 5.9: devfsadm Patch
    116478-01 SunOS 5.9: usr platform links Patch
    112960-37 SunOS 5.9: patch libsldap ldap_cachemgr libldap
    113332-07 SunOS 5.9: libc_psr.so.1 Patch
    116500-01 SunOS 5.9: SVM auto-take disksets Patch
    114349-04 SunOS 5.9: sbin/dhcpagent Patch
    120441-03 SunOS 5.9: libsec patch
    114344-19 SunOS 5.9: kernel/drv/arp Patch
    114373-01 SunOS 5.9: UMEM - abi_libumem.so.1 patch
    118558-27 SunOS 5.9: Kernel Patch
    115675-01 SunOS 5.9: /usr/lib/liblgrp.so Patch
    112958-04 SunOS 5.9: patch pci.so
    113451-11 SunOS 5.9: IKE Patch
    112920-02 SunOS 5.9: libipp Patch
    114372-01 SunOS 5.9: UMEM - llib-lumem patch
    116229-01 SunOS 5.9: libgen Patch
    116178-01 SunOS 5.9: libcrypt Patch
    117453-01 SunOS 5.9: libwrap Patch
    114131-03 SunOS 5.9: multi-terabyte disk support - libadm.so.1 patch
    118465-02 SunOS 5.9: rcm_daemon Patch
    113490-04 SunOS 5.9: Audio Device Driver Patch
    114926-02 SunOS 5.9: kernel/drv/audiocs Patch
    113318-25 SunOS 5.9: patch /kernel/fs/nfs and /kernel/fs/sparcv9/nfs
    113070-01 SunOS 5.9: ftp patch
    114734-01 SunOS 5.9: /usr/ccs/bin/lorder Patch
    114227-01 SunOS 5.9: yacc Patch
    116546-07 SunOS 5.9: CDRW DVD-RW DVD+RW Patch
    119494-01 SunOS 5.9: mkisofs patch
    113471-09 SunOS 5.9: truss Patch
    114718-05 SunOS 5.9: usr/kernel/fs/pcfs Patch
    115545-01 SunOS 5.9: nss_files patch
    115544-02 SunOS 5.9: nss_compat patch
    118463-01 SunOS 5.9: du Patch
    116016-03 SunOS 5.9: /usr/sbin/logadm patch
    115542-02 SunOS 5.9: nss_user patch
    116014-06 SunOS 5.9: /usr/sbin/usermod patch
    116012-02 SunOS 5.9: ps utility patch
    117433-02 SunOS 5.9: FSS FX RT Patch
    117431-01 SunOS 5.9: nss_nis Patch
    115537-01 SunOS 5.9: /kernel/strmod/ptem patch
    115336-03 SunOS 5.9: /usr/bin/tar, /usr/sbin/static/tar Patch
    117426-03 SunOS 5.9: ctsmc and sc_nct driver patch
    121319-01 SunOS 5.9: devfsadmd_mod.so Patch
    121316-01 SunOS 5.9: /kernel/sys/doorfs Patch
    121314-01 SunOS 5.9: tl driver patch
    116554-01 SunOS 5.9: semsys Patch
    112968-01 SunOS 5.9: patch /usr/bin/renice
    116552-01 SunOS 5.9: su Patch
    120445-01 SunOS 5.9: Toshiba platform token links (TSBW,Ultra-3i)
    112964-15 SunOS 5.9: /usr/bin/ksh Patch
    112839-08 SunOS 5.9: patch libthread.so.1
    115687-02 SunOS 5.9:/var/sadm/install/admin/default Patch
    115685-01 SunOS 5.9: sbin/netstrategy Patch
    115488-01 SunOS 5.9: patch /kernel/misc/busra
    115681-01 SunOS 5.9: usr/lib/fm/libdiagcode.so.1 Patch
    113032-03 SunOS 5.9: /usr/sbin/init Patch
    113031-03 SunOS 5.9: /usr/bin/edit Patch
    114259-02 SunOS 5.9: usr/sbin/psrinfo Patch
    115878-01 SunOS 5.9: /usr/bin/logger Patch
    116543-04 SunOS 5.9: vmstat Patch
    113580-01 SunOS 5.9: mount Patch
    115671-01 SunOS 5.9: mntinfo Patch
    113977-01 SunOS 5.9: awk/sed pkgscripts Patch
    122716-01 SunOS 5.9: kernel/fs/lofs patch
    113973-01 SunOS 5.9: adb Patch
    122713-01 SunOS 5.9: expr patch
    117168-02 SunOS 5.9: mpstat Patch
    116498-02 SunOS 5.9: bufmod Patch
    113576-01 SunOS 5.9: /usr/bin/dd Patch
    116495-03 SunOS 5.9: specfs Patch
    117160-01 SunOS 5.9: /kernel/misc/krtld patch
    118586-01 SunOS 5.9: cp/mv/ln Patch
    120025-01 SunOS 5.9: ipsecconf Patch
    116527-02 SunOS 5.9: timod Patch
    117155-08 SunOS 5.9: pcipsy Patch
    114235-01 SunOS 5.9: libsendfile.so.1 Patch
    117152-01 SunOS 5.9: magic Patch
    116486-03 SunOS 5.9: tsalarm Driver Patch
    121998-01 SunOS 5.9: two-key mode fix for 3DES Patch
    116484-01 SunOS 5.9: consconfig Patch
    116482-02 SunOS 5.9: modload Utils Patch
    117746-04 SunOS 5.9: patch platform/sun4u/kernel/drv/sparcv9/pic16f819
    121992-01 SunOS 5.9: fgrep Patch
    120768-01 SunOS 5.9: grpck patch
    119438-01 SunOS 5.9: usr/bin/login Patch
    114389-03 SunOS 5.9: devinfo Patch
    116510-01 SunOS 5.9: wscons Patch
    114224-05 SunOS 5.9: csh Patch
    116670-04 SunOS 5.9: gld Patch
    114383-03 SunOS 5.9: Enchilada/Stiletto - pca9556 driver
    116506-02 SunOS 5.9: traceroute patch
    112919-01 SunOS 5.9: netstat Patch
    112918-01 SunOS 5.9: route Patch
    112917-01 SunOS 5.9: ifrt Patch
    117132-01 SunOS 5.9: cachefsstat Patch
    114370-04 SunOS 5.9: libumem.so.1 patch
    114010-02 SunOS 5.9: m4 Patch
    117129-01 SunOS 5.9: adb Patch
    117483-01 SunOS 5.9: ntwdt Patch
    114369-01 SunOS 5.9: prtvtoc patch
    117125-02 SunOS 5.9: procfs Patch
    117480-01 SunOS 5.9: pkgadd Patch
    112905-02 SunOS 5.9: ippctl Patch
    117123-06 SunOS 5.9: wanboot Patch
    115030-03 SunOS 5.9: Multiterabyte UFS - patch mount
    114004-01 SunOS 5.9: sed Patch
    113335-03 SunOS 5.9: devinfo Patch
    113495-05 SunOS 5.9: cfgadm Library Patch
    113494-01 SunOS 5.9: iostat Patch
    113493-03 SunOS 5.9: libproc.so.1 Patch
    113330-01 SunOS 5.9: rpcbind Patch
    115028-02 SunOS 5.9: patch /usr/lib/fs/ufs/df
    115024-01 SunOS 5.9: file system identification utilities
    117471-02 SunOS 5.9: fifofs Patch
    118897-01 SunOS 5.9: stc Patch
    115022-03 SunOS 5.9: quota utilities
    115020-01 SunOS 5.9: patch /usr/lib/adb/ml_odunit
    113720-01 SunOS 5.9: rootnex Patch
    114352-03 SunOS 5.9: /etc/inet/inetd.conf Patch
    123056-01 SunOS 5.9: ldterm patch
    116243-01 SunOS 5.9: umountall Patch
    113323-01 SunOS 5.9: patch /usr/sbin/passmgmt
    116049-01 SunOS 5.9: fdfs Patch
    116241-01 SunOS 5.9: keysock Patch
    113480-02 SunOS 5.9: usr/lib/security/pam_unix.so.1 Patch
    115018-01 SunOS 5.9: patch /usr/lib/adb/dqblk
    113277-44 SunOS 5.9: sd and ssd Patch
    117457-01 SunOS 5.9: elfexec Patch
    113110-01 SunOS 5.9: touch Patch
    113077-17 SunOS 5.9: /platform/sun4u/kernal/drv/su Patch
    115006-01 SunOS 5.9: kernel/strmod/kb patch
    113072-07 SunOS 5.9: patch /usr/sbin/format
    113071-01 SunOS 5.9: patch /usr/sbin/acctadm
    116782-01 SunOS 5.9: tun Patch
    114331-01 SunOS 5.9: power Patch
    112835-01 SunOS 5.9: patch /usr/sbin/clinfo
    114927-01 SunOS 5.9: usr/sbin/allocate Patch
    119937-02 SunOS 5.9: inetboot patch
    113467-01 SunOS 5.9: seg_drv & seg_mapdev Patch
    114923-01 SunOS 5.9: /usr/kernel/drv/logindmux Patch
    117443-01 SunOS 5.9: libkvm Patch
    114329-01 SunOS 5.9: /usr/bin/pax Patch
    119929-01 SunOS 5.9: /usr/bin/xargs patch
    113459-04 SunOS 5.9: udp patch
    113446-03 SunOS 5.9: dman Patch
    116009-05 SunOS 5.9: sgcn & sgsbbc patch
    116557-04 SunOS 5.9: sbd Patch
    120241-01 SunOS 5.9: bge: Link & Speed LEDs flash constantly on V20z
    113984-01 SunOS 5.9: iosram Patch
    113220-01 SunOS 5.9: patch /platform/sun4u/kernel/drv/sparcv9/upa64s
    113975-01 SunOS 5.9: ssm Patch
    117165-01 SunOS 5.9: pmubus Patch
    116530-01 SunOS 5.9: bge.conf Patch
    116529-01 SunOS 5.9: smbus Patch
    116488-03 SunOS 5.9: Lights Out Management (lom) patch
    117131-01 SunOS 5.9: adm1031 Patch
    117124-12 SunOS 5.9: platmod, drmach, dr, ngdr, & gptwocfg Patch
    114003-01 SunOS 5.9: bbc driver Patch
    118539-02 SunOS 5.9: schpc Patch
    112837-10 SunOS 5.9: patch /usr/lib/inet/in.dhcpd
    114975-01 SunOS 5.9: usr/lib/inet/dhcp/svcadm/dhcpcommon.jar Patch
    117450-01 SunOS 5.9: ds_SUNWnisplus Patch
    113076-02 SunOS 5.9: dhcpmgr.jar Patch
    113572-01 SunOS 5.9: docbook-to-man.ts Patch
    118472-01 SunOS 5.9: pargs Patch
    122709-01 SunOS 5.9: /usr/bin/dc patch
    113075-01 SunOS 5.9: pmap patch
    113472-01 SunOS 5.9: madv & mpss lib Patch
    115986-02 SunOS 5.9: ptree Patch
    115693-01 SunOS 5.9: /usr/bin/last Patch
    115259-03 SunOS 5.9: patch usr/lib/acct/acctcms
    114564-09 SunOS 5.9: /usr/sbin/in.ftpd Patch
    117441-01 SunOS 5.9: FSSdispadmin Patch
    113046-01 SunOS 5.9: fcp Patch
    118191-01 gtar patch
    114818-06 GNOME 2.0.0: libpng Patch
    117177-02 SunOS 5.9: lib/gss module Patch
    116340-05 SunOS 5.9: gzip and Freeware info files patch
    114339-01 SunOS 5.9: wrsm header files Patch
    122673-01 SunOS 5.9: sockio.h header patch
    116474-03 SunOS 5.9: libsmedia Patch
    117138-01 SunOS 5.9: seg_spt.h
    112838-11 SunOS 5.9: pcicfg Patch
    117127-02 SunOS 5.9: header Patch
    112929-01 SunOS 5.9: RIPv2 Header Patch
    112927-01 SunOS 5.9: IPQos Header Patch
    115992-01 SunOS 5.9: /usr/include/limits.h Patch
    112924-01 SunOS 5.9: kdestroy kinit klist kpasswd Patch
    116231-03 SunOS 5.9: llc2 Patch
    116776-01 SunOS 5.9: mipagent patch
    117420-02 SunOS 5.9: mdb Patch
    117179-01 SunOS 5.9: nfs_dlboot Patch
    121194-01 SunOS 5.9: usr/lib/nfs/statd Patch
    116502-03 SunOS 5.9: mountd Patch
    113331-01 SunOS 5.9: usr/lib/nfs/rquotad Patch
    113281-01 SunOS 5.9: patch /usr/lib/netsvc/yp/ypbind
    114736-01 SunOS 5.9: usr/sbin/nisrestore Patch
    115695-01 SunOS 5.9: /usr/lib/netsvc/yp/yppush Patch
    113321-06 SunOS 5.9: patch sf and socal
    113049-01 SunOS 5.9: luxadm & liba5k.so.2 Patch
    116663-01 SunOS 5.9: ntpdate Patch
    117143-01 SunOS 5.9: xntpd Patch
    113028-01 SunOS 5.9: patch /kernel/ipp/flowacct
    113320-06 SunOS 5.9: patch se driver
    114731-08 SunOS 5.9: kernel/drv/glm Patch
    115667-03 SunOS 5.9: Chalupa platform support Patch
    117428-01 SunOS 5.9: picl Patch
    113327-03 SunOS 5.9: pppd Patch
    114374-01 SunOS 5.9: Perl patch
    115173-01 SunOS 5.9: /usr/bin/sparcv7/gcore /usr/bin/sparcv9/gcore Patch
    114716-02 SunOS 5.9: usr/bin/rcp Patch
    112915-04 SunOS 5.9: snoop Patch
    116778-01 SunOS 5.9: in.ripngd patch
    112916-01 SunOS 5.9: rtquery Patch
    112928-03 SunOS 5.9: in.ndpd Patch
    119447-01 SunOS 5.9: ses Patch
    115354-01 SunOS 5.9: slpd Patch
    116493-01 SunOS 5.9: ProtocolTO.java Patch
    116780-02 SunOS 5.9: scmi2c Patch
    112972-17 SunOS 5.9: patch /usr/lib/libssagent.so.1 /usr/lib/libssasnmp.so.1 mibiisa
    116480-01 SunOS 5.9: IEEE 1394 Patch
    122485-01 SunOS 5.9: 1394 mass storage driver patch
    113716-02 SunOS 5.9: sar & sadc Patch
    115651-02 SunOS 5.9: usr/lib/acct/runacct Patch
    116490-01 SunOS 5.9: acctdusg Patch
    117473-01 SunOS 5.9: fwtmp Patch
    116180-01 SunOS 5.9: geniconvtbl Patch
    114006-01 SunOS 5.9: tftp Patch
    115646-01 SunOS 5.9: libtnfprobe shared library Patch
    113334-03 SunOS 5.9: udfs Patch
    115350-01 SunOS 5.9: ident_udfs.so.1 Patch
    122484-01 SunOS 5.9: preen_md.so.1 patch
    117134-01 SunOS 5.9: svm flasharchive patch
    116472-02 SunOS 5.9: rmformat Patch
    112966-05 SunOS 5.9: patch /usr/sbin/vold
    114229-01 SunOS 5.9: action_filemgr.so.1 Patch
    114335-02 SunOS 5.9: usr/sbin/rmmount Patch
    120443-01 SunOS 5.9: sed core dumps on long lines
    121588-01 SunOS 5.9: /usr/xpg4/bin/awk Patch
    113470-02 SunOS 5.9: winlock Patch
    119211-07 NSS_NSPR_JSS 3.11: NSPR 4.6.1 / NSS 3.11 / JSS 4.2
    118666-05 J2SE 5.0: update 6 patch
    118667-05 J2SE 5.0: update 6 patch, 64bit
    114612-01 SunOS 5.9: ANSI-1251 encodings file errors
    114276-02 SunOS 5.9: Extended Arabic support in UTF-8
    117400-01 SunOS 5.9: ISO8859-6 and ISO8859-8 iconv symlinks
    113584-16 SunOS 5.9: yesstr, nostr nl_langinfo() strings incorrect in S9
    117256-01 SunOS 5.9: Remove old OW Xresources.ow files
    112625-01 SunOS 5.9: Dcam1394 patch
    114600-05 SunOS 5.9: vlan driver patch
    117119-05 SunOS 5.9: Sun Gigabit Ethernet 3.0 driver patch
    117593-04 SunOS 5.9: Manual Page updates for Solaris 9
    112622-19 SunOS 5.9: M64 Graphics Patch
    115953-06 Sun Cluster 3.1: Sun Cluster sccheck patch
    117949-23 Sun Cluster 3.1: Core Patch for Solaris 9
    115081-06 Sun Cluster 3.1: HA-Sun One Web Server Patch
    118627-08 Sun Cluster 3.1: Manageability and Serviceability Agent
    117985-03 SunOS 5.9: XIL 1.4.2 Loadable Pipeline Libraries
    113896-06 SunOS 5.9: en_US.UTF-8 locale patch
    114967-02 SunOS 5.9: FDL patch
    114677-11 SunOS 5.9: International Components for Unicode Patch
    112805-01 CDE 1.5: Help volume patch
    113841-01 CDE 1.5: answerbook patch
    113839-01 CDE 1.5: sdtwsinfo patch
    115713-01 CDE 1.5: dtfile patch
    112806-01 CDE 1.5: sdtaudiocontrol patch
    112804-02 CDE 1.5: sdtname patch
    113244-09 CDE 1.5: dtwm patch
    114312-02 CDE1.5: GNOME/CDE Menu for Solaris 9
    112809-02 CDE:1.5 Media Player (sdtjmplay) patch
    113868-02 CDE 1.5: PDASync patch
    119976-01 CDE 1.5: dtterm patch
    112771-30 Motif 1.2.7 and 2.1.1: Runtime library patch for Solaris 9
    114282-01 CDE 1.5: libDtWidget patch
    113789-01 CDE 1.5: dtexec patch
    117728-01 CDE1.5: dthello patch
    113863-01 CDE 1.5: dtconfig patch
    112812-01 CDE 1.5: dtlp patch
    113861-04 CDE 1.5: dtksh patch
    115972-03 CDE 1.5: dtterm libDtTerm patch
    114654-02 CDE 1.5: SmartCard patch
    117632-01 CDE1.5: sun_at patch for Solaris 9
    113374-02 X11 6.6.1: xpr patch
    118759-01 X11 6.6.1: Font Administration Tools patch
    117577-03 X11 6.6.1: TrueType fonts patch
    116084-01 X11 6.6.1: font patch
    113098-04 X11 6.6.1: X RENDER extension patch
    112787-01 X11 6.6.1: twm patch
    117601-01 X11 6.6.1: libowconfig.so.0 patch
    117663-02 X11 6.6.1: xwd patch
    113764-04 X11 6.6.1: keyboard patch
    113541-02 X11 6.6.1: XKB patch
    114561-01 X11 6.6.1: X splash screen patch
    113513-02 X11 6.6.1: platform support for new hardware
    116121-01 X11 6.4.1: platform support for new hardware
    114602-04 X11 6.6.1: libmpg_psr patch
    Is there a bundle to install, or do I have to install each patch separately?

  • 2 node rac cluster - continuous reboot

    I have semi-successfully installed Oracle Clusterware on 2 nodes; I had trouble with the last screen, running root.sh and orainstRoot.sh.
    Now what I have is a continuous reboot of the 2 nodes. I have tried restarting them at exactly the same time, but one node reboots as soon as you log in to it, and the other node just reboots at some stage.
    So, how do I resolve this?
    I am using Openfiler 2.2 and Enterprise Linux 5.0, installing Oracle 11.1.0.6.
    I am following the instructions as posted on the otn.oracle.com website.
    My last step was to deinstall the Clusterware software as it had issues during the final stage of starting ONS and GND (or whatever). I had to reboot after removing the software, and that is when the continuous reboot cycle started.
    Any help appreciated.
    THIS IS JUST A DEMO SYSTEM, but I would still like to get it working as quickly as possible.

    I think your system reboots because CRS (Oracle Clusterware) has a problem. You should check the logs under $ORA_CRS_HOME/log/<nodename>/ (look for heartbeat messages).
    If the reboots started when you removed the software, make sure you stopped the RAC processes first:
    ps -ef | grep init | grep crs
    Anyway, if you don't want the RAC processes to start when you restart the server, you should comment out these lines in the /etc/inittab file:
    #h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
    #h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
    #h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
    Or disable CRS instead:
    10gR1:
    /etc/init.d/init.crs disable
    /etc/init.d/init.crs stop
    10gR2 and above:
    $ORA_CRS_HOME/bin/crsctl disable crs
    $ORA_CRS_HOME/bin/crsctl stop crs
    About your system rebooting while clustered: you should check the logs and the heartbeats (interconnect and disk, i.e. the OCR file and voting file). Anyway, contact Oracle Support.
    Finally, once you have RAC installed, if you find the system rebooting, you should disable CRS and then investigate the problem (check the logs under $ORA_CRS_HOME/log/<nodename>/crsd/).
    Good Luck
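    As a postscript: to see why CRS keeps forcing the reboot on 11.1, a rough place to look before disabling anything (paths assume the default CRS home layout):
    grep -i "Rebooting for cluster integrity" /var/log/messages      # message typically logged by init.cssd when it reboots the node
    grep -iE "clssnm|fatal|eviction" $ORA_CRS_HOME/log/`hostname -s`/cssd/ocssd.log | tail -50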

  • OrainstRoot.sh: Failure to promote local gpnp setup to other cluster nodes

    I'm trying to build a 2-node cluster and everything appeared to be going swimmingly until the end of the first node's run of the orainstRoot.sh script.
    The following is the end of the output:
    Disk Group OCR_VOTE created successfully.
    clscfg: -install mode specified
    Successfully accumulated necessary OCR keys.
    Creating OCR keys for user 'root', privgrp 'root'..
    Operation successful.
    CRS-4256: Updating the profile
    Successful addition of voting disk 4e3f692529584f8bbf7f16146bd90346.
    Successful addition of voting disk 728bed918cf54f6cbf904d37638c674b.
    Successful addition of voting disk 8ac20793405d4fdcbfcafc7e311f877d.
    Successfully replaced voting disk group with +OCR_VOTE.
    CRS-4256: Updating the profile
    CRS-4266: Voting file(s) successfully replaced
    ## STATE File Universal Id File Name Disk group
    1. ONLINE 4e3f692529584f8bbf7f16146bd90346 (ORCL:VOTE01) [OCR_VOTE]
    2. ONLINE 728bed918cf54f6cbf904d37638c674b (ORCL:VOTE02) [OCR_VOTE]
    3. ONLINE 8ac20793405d4fdcbfcafc7e311f877d (ORCL:VOTE03) [OCR_VOTE]
    Located 3 voting disk(s).
    Failed to rmtcopy "/tmp/fileLgKPGV" to "/u01/app/11.2.0/grid/gpnp/manifest.txt" for nodes {ilprevzedb01,ilprevzedb02}, rc=256
    Failed to rmtcopy "/u01/app/11.2.0/grid/gpnp/ilprevzedb01/profiles/peer/profile.xml" to "/u01/app/11.2.0/grid/gpnp/profiles/peer/profile.xml" for nodes {ilprevzedb01,ilprevzedb02}, rc=256
    rmtcopy aborted
    Failed to promote local gpnp setup to other cluster nodes at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 6504.
    /u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed
    Has anyone run into this problem and found a solution?
    Thanks in advance!

    Ok, for everyone out there, I resolved the issue. Hopefully this will help others encountering the same problem.
    It turns out that when the OS was installed, the iptables firewall was enabled. This will cause havoc with the installer scripts.
    My first inkling should have been when the installer stalled at 65% trying to copy home directories between nodes, the first time I ran through the installer.
    At that time, Googling around suggested that iptables might be the problem, and indeed it was running, so I just did a 'service iptables stop' WITHOUT REBOOTING THE NODES and re-ran the installer.
    Well, it looks as though NOT REBOOTING THE NODES doesn't quite cut it. I then did a 'chkconfig iptables off' and REBOOTED BOTH NODES.
    Oracle support simply provided me with: How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation (Doc ID 942166.1), which didn't really work all that well, lots of failures, errors, etc. So I just deleted the 11.2.0 directory and tried running the installer again.
    This time the install went through without problems.
    Thanks!
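    For the record, the firewall change that worked here boils down to the following on both nodes (standard RHEL/OEL 5 commands), followed by a reboot of both nodes before re-running the installer:
    service iptables stop         # stop the firewall now
    chkconfig iptables off        # keep it off across reboots
    chkconfig --list iptables     # verify all runlevels show off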
