NFS cluster node crashed

Hi all, we have a 2-node cluster running Solaris 10 11/06 and Sun Cluster 3.2.
Recently, we were asked to NFS-mount a directory from an external Linux host on node 1 of the cluster (i.e. node 1 of the cluster is the NFS client; the Linux server is the NFS server).
A few days later, early on a Sunday morning, the Linux server developed a high load and was very slow to log into. Around the same time, node 1 of the cluster rebooted. Was this reboot of node 1 a coincidence? I'm not sure.
Does anyone have ideas or suggestions about this situation (e.g. did the slow response of the Linux NFS server cause node 1 of the cluster to reboot? Is the external NFS mount a bad idea)?
Stewart

Hi,
your assumption sounds very unreasonable. But without any hard facts like
- the panic string
- contents of /var/adm/messages at time of crash
- configuration information
- etc.
it is impossible to tell.
Regards
Hartmut
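
The facts Hartmut asks for can be pulled out of a saved copy of /var/adm/messages with a small filter. The following is a minimal Python sketch, not a Sun Cluster tool: the function name find_crash_clues, the keyword list and the sample lines are all illustrative assumptions, so adjust them to what your own system logs.

```python
# Keywords are assumptions for illustration only.
KEYWORDS = ("panic", "not responding", "reboot")


def find_crash_clues(lines, keywords=KEYWORDS):
    """Return the log lines that contain any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [line.rstrip("\n") for line in lines
            if any(k in line.lower() for k in lowered)]


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the poster's system.
    sample = [
        "Jan  7 03:10:01 node1 sendmail[123]: starting daemon",
        "Jan  7 03:12:44 node1 nfs: NFS server linuxhost not responding still trying",
        "Jan  7 03:15:02 node1 unix: panic[cpu0]/thread=2a100047cc0:",
    ]
    for hit in find_crash_clues(sample):
        print(hit)
```

In practice you would pass the open file instead of the sample list and narrow the result to the minutes before the reboot.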

Similar Messages

  • Cluster Node Crashes

    I'm not sure this is the proper forum for this post, if it's not please feel free to move it.
    The situation I'm facing is this:
    My company has clusters set up across North America with our software that utilizes the Oracle database. 90% of the time everything functions exactly as it is supposed to. However, it is the other 10% of sites that I am here to ask about.
    Our clusters are set up in a dual-server environment in which the two servers basically act as a single server. The application runs on one server and the database runs on another, and in the case of problems, either can be failed over to run both sets of services on a single server (basic, I realize). At certain sites we are unable to run services on one of the nodes. When they are run as they are supposed to, every so often (at some sites a matter of minutes/hours, at others it can be a couple of weeks) they will BSOD.
    I fully understand what the blue screen is. The minidump shows that it's the orafencedrv.sys stop, where the Oracle database shuts down a node after loss of communications in order to prevent corruption of the database. This is a great feature and I'm grateful for it; however, it has caused us many headaches in diagnosing what is actually causing the drop in communications.
    The interconnect and the public IP are both hooked up over a single switch but they operate on different subnets. Could operating on a single switch be part of the problem?
    Could the problem be that the switches are being overloaded with traffic causing temporary packet losses between the two nodes, which I know is enough to have Oracle BSOD a node?
    Below I'm posting one of the dumps listed in the CSSD log when the node crashes, hopefully this will provide some sort of information as to what is happening.
    If any other information is needed, please feel free to let me know. Thanks for your help in advance.
    [    CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: clssnmvDiskKillCheck: Aborting, evicted by node 1, sync 13, stamp 99832890,
    [    CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: ###################################
    [    CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: clssscExit: CSSD aborting
    [    CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: ###################################
    [    CSSD]--- DUMP GROCK STATE DB ---
    [    CSSD]----------
    [    CSSD] type 2, Id 3, Name = (crs_version)
    [    CSSD] flags: 0x0
    [    CSSD] grant: count=0, type 0, wait 0
    [    CSSD] Member Count =2, master 0
    [    CSSD] . . . . .
    [    CSSD] memberNo =0, seq 5
    [    CSSD] flags = 0x0, granted 0
    [    CSSD] refCnt = 1
    [    CSSD] nodeNum = 2, nodeBirth 6
    [    CSSD] privateDataSize = 0
    [    CSSD] publicDataSize = 0
    [    CSSD] . . . . .
    [    CSSD] memberNo =1, seq 11
    [    CSSD] flags = 0x0, granted 0
    [    CSSD] refCnt = 1
    [    CSSD] nodeNum = 1, nodeBirth 12
    [    CSSD] privateDataSize = 0
    [    CSSD] publicDataSize = 0
    [    CSSD]----------
    [    CSSD]----------
    [    CSSD] type 2, Id 2, Name = (ocr_STLRZOPRCL)
    [    CSSD] flags: 0x0
    [    CSSD] grant: count=0, type 0, wait 0
    [    CSSD] Member Count =2, master 2
    [    CSSD] . . . . .
    [    CSSD] memberNo =2, seq 5
    [    CSSD] flags = 0x0, granted 0
    [    CSSD] refCnt = 1
    [    CSSD] nodeNum = 2, nodeBirth 6
    [    CSSD] privateDataSize = 0
    [    CSSD] publicDataSize = 32
    [    CSSD] . . . . .
    [    CSSD] memberNo =1, seq 11
    [    CSSD] flags = 0x0, granted 0
    [    CSSD] refCnt = 1
    [    CSSD] nodeNum = 1, nodeBirth 12
    [    CSSD] privateDataSize = 0
    [    CSSD] publicDataSize = 32
    [    CSSD]----------
    [    CSSD]----------
    [    CSSD] type 3, Id 15, Name = (_ORA_CRS_MEMBER_stlrzoprcl1)
    [    CSSD] flags: 0x0
    [    CSSD] grant: count=1, type 3, wait 1
    [    CSSD] Member Count =1, master -3
    [    CSSD] . . . . .
    [    CSSD] memberNo =0, seq 0
    [    CSSD] flags = 0x12, granted 0
    [    CSSD] refCnt = 1
    [    CSSD] nodeNum = 1, nodeBirth 12
    [    CSSD] privateDataSize = 0
    [    CSSD] publicDataSize = 0
    [    CSSD]----------
    [    CSSD]----------
    [    CSSD] type 3, Id 15, Name = (_ORA_CRS_MEMBER_stlrzoprcl2)
    [    CSSD] flags: 0x0
    [    CSSD] grant: count=1, type 3, wait 1
    [    CSSD] Member Count =1, master -3
    [    CSSD] . . . . .
    [    CSSD] memberNo =0, seq 0
    [    CSSD] flags = 0x12, granted 1
    [    CSSD] refCnt = 1
    [    CSSD] nodeNum = 2, nodeBirth 6
    [    CSSD] privateDataSize = 0
    [    CSSD] publicDataSize = 0
    [    CSSD]----------
    [    CSSD]----------
    [    CSSD] type 2, Id 4, Name = (CRSDMAIN)
    [    CSSD] flags: 0x0
    [    CSSD] grant: count=0, type 0, wait 0
    [    CSSD] Member Count =2, master 2
    [    CSSD] . . . . .
    [    CSSD] memberNo =2, seq 5
    [    CSSD] flags = 0x0, granted 0
    [    CSSD] refCnt = 1
    [    CSSD] nodeNum = 2, nodeBirth 6
    [    CSSD] privateDataSize = 128
    [    CSSD] publicDataSize = 128
    [    CSSD] . . . . .
    [    CSSD] memberNo =1, seq 11
    [    CSSD] flags = 0x0, granted 0
    [    CSSD] refCnt = 1
    [    CSSD] nodeNum = 1, nodeBirth 12
    [    CSSD] privateDataSize = 128
    [    CSSD] publicDataSize = 128
    [    CSSD]----------
    [    CSSD]----------
    [    CSSD] type 2, Id 1, Name = (EVMDMAIN)
    [    CSSD] flags: 0x0
    [    CSSD] grant: count=0, type 0, wait 0
    [    CSSD] Member Count =2, master 2
    [    CSSD] . . . . .
    [    CSSD] memberNo =2, seq 5
    [    CSSD] flags = 0x0, granted 0
    [    CSSD] refCnt = 1
    [    CSSD] nodeNum = 2, nodeBirth 6
    [    CSSD] privateDataSize = 508
    [    CSSD] publicDataSize = 504
    [    CSSD] . . . . .
    [    CSSD] memberNo =1, seq 11
    [    CSSD] flags = 0x0, granted 0
    [    CSSD] refCnt = 1
    [    CSSD] nodeNum = 1, nodeBirth 12
    [    CSSD] privateDataSize = 508
    [    CSSD] publicDataSize = 504
    [    CSSD]----------
    [    CSSD]--- END OF GROCK STATE DUMP ---
    [    CSSD]------- End Dump -------
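
When comparing many of these dumps across sites, it can help to extract just the eviction line (timestamp, evicting node, sync and stamp) from each CSSD log. The following is a rough Python sketch that assumes the line layout shown in the dump above; the function name parse_eviction is hypothetical and the pattern has only been checked against that one line.

```python
import re

# Pattern modelled on the single eviction line in the dump above.
EVICT_RE = re.compile(
    r"\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+).*"
    r"evicted by node (?P<node>\d+), sync (?P<sync>\d+), stamp (?P<stamp>\d+)"
)


def parse_eviction(line):
    """Return eviction details as a dict, or None if the line does not match."""
    match = EVICT_RE.search(line)
    if match is None:
        return None
    return {
        "timestamp": match.group("ts"),
        "evicted_by": int(match.group("node")),
        "sync": int(match.group("sync")),
        "stamp": int(match.group("stamp")),
    }


if __name__ == "__main__":
    line = ("[    CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: "
            "clssnmvDiskKillCheck: Aborting, evicted by node 1, "
            "sync 13, stamp 99832890,")
    print(parse_eviction(line))
```

Collecting these records per site gives a timeline of evictions that can be lined up against switch or network logs.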

    Hi user10508733
    Seems to be your first post, welcome to this forum!!
    What is the OS (a blue screen, so that should be Windows?) and what is the release of your CRS and RDBMS? Hopefully not 10.1.x; if it is, please patch it to 10.2.0.4.
    There seem to be a lot of CRS bugs before 10.2.0.3; see this list:
    Doc ID:      Note:391116.1
    Subject:      10.2.0.3 Patch Set - List of Bug Fixes by Problem Type
    let us know what's the result
    thanks

  • Local NFS / LDAP on cluster nodes

    Hi,
    I have a 2-node cluster (3.2 1/09) on Solaris 10 U8, providing NFS (/home) and LDAP for clients. I would like to configure LDAP and NFS clients on each cluster node, so they share user information with the rest of the machines.
    I assume the right way to do this is to configure the cluster nodes the same as other clients, using the HA Logical Hostnames for the LDAP and NFS server; this way, there's always a working LDAP and NFS server for each node. However, what happens if both nodes reboot at once (for example, power failure)? As the first node boots, there is no working LDAP or NFS server, because it hasn't been started yet. Will this cause the boot to fail and require manual intervention, or will the cluster boot without NFS and LDAP clients enabled, allowing me to fix it later?

    Thanks. In that case, is it safe to configure the NFS-exported filesystem as a global mount, and symlink e.g. "/home" -> "/global/home", so home directories are accessible via the normal path on both nodes? (I understand global filesystems have worse performance, but this would just be for administrators logging in with their LDAP accounts.)
    For LDAP, my concern is that if svc:/network/ldap/client:default fails during startup (because no LDAP server is running yet), it might prevent the cluster services from starting, even though all names required by cluster are available from /etc.

  • Node crashes when enabling RDS for private interconnect.

    OS: oel6.3 - 2.6.39-300.17.2.el6uek.x86_64
    Grid and DB: 11.2.0.3.4
    This is a two node Standard Edition cluster.
    The node crashes upon restart of clusterware after following the instructions from note:751343.1 (RAC Support for RDS Over Infiniband) to enable RDS.
    The cluster is running fine using ipoib for the cluster_interconnect.
    1) As the ORACLE_HOME/GI_HOME owner, stop all resources (database, listener, ASM etc) that's running from the home. When stopping database, use NORMAL or IMMEDIATE option.
    2) As root, if relinking 11gR2 Grid Infrastructure (GI) home, unlock GI home: GI_HOME/crs/install/rootcrs.pl -unlock
    3) As the ORACLE_HOME/GI_HOME owner, go to ORACLE_HOME/GI_HOME and cd to rdbms/lib
    4) As the ORACLE_HOME/GI_HOME owner, issue "make -f ins_rdbms.mk ipc_rds ioracle"
    5) As root, if relinking 11gR2 Grid Infrastructure (GI) home, lock GI home: GI_HOME/crs/install/rootcrs.pl -patch
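
The relink steps above can be written down as an ordered plan so that the unlock and lock steps are not forgotten for a Grid Infrastructure home. This is only a dry-run sketch in Python: it prints the commands quoted in the steps and executes nothing, step 1 (stopping resources) is left as a manual action, and the function name rds_relink_plan and the example home path are hypothetical.

```python
def rds_relink_plan(home, is_grid_home):
    """Return (run_as, command) pairs in the order given in the steps above."""
    plan = [("owner", "stop all resources running from this home (manual step)")]
    if is_grid_home:
        plan.append(("root", f"{home}/crs/install/rootcrs.pl -unlock"))
    plan.append(("owner",
                 f"cd {home}/rdbms/lib && make -f ins_rdbms.mk ipc_rds ioracle"))
    if is_grid_home:
        plan.append(("root", f"{home}/crs/install/rootcrs.pl -patch"))
    return plan


if __name__ == "__main__":
    # The path below is a made-up example, not the poster's GI_HOME.
    for run_as, command in rds_relink_plan("/u01/app/11.2.0/grid", True):
        print(f"[{run_as}] {command}")
```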
    It appears to abend when ASM tries to start, with the message below on the console.
    I have a service request open for this issue, but I am hoping someone may have seen this and has some way around it.
    Thanks
    Alan
    kernel BUG at net/rds/ib_send.c:547!
    invalid opcode: 0000 [#1] SMP
    CPU 2
    Modules linked in: 8021q garp stp llc iptable_filter ip_tables nfs lockd
    fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand powernow_k8
    freq_table mperf rds_rdma rds_tcp rds ib_ipoib rdma_ucm ib_ucm ib_uverbs
    ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa sr_mod cdrom microcode
    serio_raw pcspkr ghes hed k10temp hwmon amd64_edac_mod edac_core
    edac_mce_amd i2c_piix4 i2c_core sg igb dca mlx4_ib ib_mad ib_core
    mlx4_en mlx4_core ext4 mbcache jbd2 usb_storage sd_mod crc_t10dif ahci
    libahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
    scsi_wait_scan]
    Pid: 4140, comm: kworker/u:1 Not tainted 2.6.39-300.17.2.el6uek.x86_64
    #1 Supermicro BHDGT/BHDGT
    RIP: 0010:[<ffffffffa02db829>] [<ffffffffa02db829>]
    rds_ib_xmit+0xa69/0xaf0 [rds_rdma]
    RSP: 0018:ffff880fb84a3c50 EFLAGS: 00010202
    RAX: ffff880fbb694000 RBX: ffff880fb3e4e600 RCX: 0000000000000000
    RDX: 0000000000000030 RSI: ffff880fbb6c3a00 RDI: ffff880fb058a048
    RBP: ffff880fb84a3d30 R08: 0000000000000fd0 R09: ffff880fbb6c3b90
    R10: 0000000000000000 R11: 000000000000001a R12: ffff880fbb6c3a00
    R13: ffff880fbb6c3a00 R14: 0000000000000000 R15: ffff880fb84a3d90
    FS: 00007fd0a3a56700(0000) GS:ffff88101e240000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000002158ca2 CR3: 0000000001783000 CR4: 00000000000406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kworker/u:1 (pid: 4140, threadinfo ffff880fb84a2000, task
    ffff880fae970180)
    Stack:
    0000000000012200 0000000000012200 ffff880f00000000 0000000000000000
    000000000000e5b0 ffffffff8115af81 ffffffff81b8d6c0 ffffffffa02b2e12
    00000001bf272240 ffffffff81267020 ffff880fbb6c3a00 0000003000000002
    Call Trace:
    [<ffffffff8115af81>] ? __kmalloc+0x1f1/0x200
    [<ffffffffa02b2e12>] ? rds_message_alloc+0x22/0x90 [rds]
    [<ffffffff81267020>] ? sg_init_table+0x30/0x50
    [<ffffffffa02b2db2>] ? rds_message_alloc_sgs+0x62/0xa0 [rds]
    [<ffffffffa02b31e4>] ? rds_message_map_pages+0xa4/0x110 [rds]
    [<ffffffffa02b4f3b>] rds_send_xmit+0x38b/0x6e0 [rds]
    [<ffffffff81089d53>] ? cwq_activate_first_delayed+0x53/0x100
    [<ffffffffa02b6040>] ? rds_recv_worker+0xc0/0xc0 [rds]
    [<ffffffffa02b6075>] rds_send_worker+0x35/0xc0 [rds]
    [<ffffffff81089fd6>] process_one_work+0x136/0x450
    [<ffffffff8108bbe0>] worker_thread+0x170/0x3c0
    [<ffffffff8108ba70>] ? manage_workers+0x120/0x120
    [<ffffffff810907e6>] kthread+0x96/0xa0
    [<ffffffff81515544>] kernel_thread_helper+0x4/0x10
    [<ffffffff81090750>] ? kthread_worker_fn+0x1a0/0x1a0
    [<ffffffff81515540>] ? gs_change+0x13/0x13
    Code: ff ff e9 b1 fe ff ff 48 8b 0d b4 54 4b e1 48 89 8d 70 ff ff ff e9
    71 ff ff ff 83 bd 7c ff ff ff 00 0f 84 f4 f5 ff ff 0f 0b eb fe <0f> 0b
    eb fe 44 8b 8d 48 ff ff ff 41 b7 01 e9 51 f6 ff ff 0f 0b
    RIP [<ffffffffa02db829>] rds_ib_xmit+0xa69/0xaf0 [rds_rdma]
    RSP <ffff880fb84a3c50>
    Initializing cgroup subsys cpuset
    Initializing cgroup subsys cpu
    Linux version 2.6.39-300.17.2.el6uek.x86_64
    ([email protected]) (gcc version 4.4.6 20110731 (Red
    Hat 4.4.6-3) (GCC) ) #1 SMP Wed Nov 7 17:48:36 PST 2012
    Command line: ro root=UUID=5ad1a268-b813-40da-bb76-d04895215677
    rd_DM_UUID=ddf1_stor rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD
    SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us numa=off
    console=ttyS1,115200n8 irqpoll maxcpus=1 nr_cpus=1 reset_devices
    cgroup_disable=memory mce=off memmap=exactmap memmap=538K@64K
    memmap=130508K@770048K elfcorehdr=900556K memmap=72K#3668608K
    memmap=184K#3668680K
    BIOS-provided physical RAM map:
    BIOS-e820: 0000000000000100 - 0000000000096800 (usable)
    BIOS-e820: 0000000000096800 - 00000000000a0000 (reserved)
    BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
    BIOS-e820: 0000000000100000 - 00000000dfe90000 (usable)
    BIOS-e820: 00000000dfe9e000 - 00000000dfea0000 (reserved)
    BIOS-e820: 00000000dfea0000 - 00000000dfeb2000 (ACPI data)
    BIOS-e820: 00000000dfeb2000 - 00000000dfee0000 (ACPI NVS)
    BIOS-e820: 00000000dfee0000 - 00000000f0000000 (reserved)
    BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
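
For a service request or a bug search, the useful parts of the console output above are the BUG location and the faulting function. A small Python sketch that pulls them out is shown below; it assumes the "kernel BUG at file:line!" and single-line "RIP [<addr>] func+0xoff/0xsize [module]" layouts seen in this trace, and the name summarize_oops is hypothetical.

```python
import re

BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# Matches the single-line "RIP [<addr>] func+0xoff/0xsize [module]" form.
RIP_RE = re.compile(
    r"RIP\s+\[<[0-9a-f]+>\]\s+(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)"
    r"/0x(?P<size>[0-9a-f]+)\s+\[(?P<module>\w+)\]"
)


def summarize_oops(text):
    """Extract BUG location and faulting function from pasted console text."""
    summary = {}
    bug = BUG_RE.search(text)
    if bug:
        summary["source_file"] = bug.group("file")
        summary["source_line"] = int(bug.group("line"))
    rip = RIP_RE.search(text)
    if rip:
        summary["function"] = rip.group("func")
        summary["offset"] = int(rip.group("off"), 16)
        summary["module"] = rip.group("module")
    return summary


if __name__ == "__main__":
    console = (
        "kernel BUG at net/rds/ib_send.c:547!\n"
        "RIP [<ffffffffa02db829>] rds_ib_xmit+0xa69/0xaf0 [rds_rdma]\n"
    )
    print(summarize_oops(console))
```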

    I believe OFED version is 1.5.3.3 but I am not sure if this is correct.
    We have not added any third-party drivers. All that has been done to add InfiniBand to our build is
    a yum groupinstall "Infiniband Support".
    I have not tried rds-stress, but rds-ping works fine and rds-info seems fine.
    A service request has been opened but so far I have had better response here.
    oracle@blade1-6:~> rds-info
    RDS IB Connections:
    LocalAddr RemoteAddr LocalDev RemoteDev
    10.10.0.116 10.10.0.119 fe80::25:90ff:ff07:df1d fe80::25:90ff:ff07:e0e5
    TCP Connections:
    LocalAddr LPort RemoteAddr RPort HdrRemain DataRemain SentNxt ExpectUna SeenUna
    Counters:
    CounterName Value
    conn_reset 5
    recv_drop_bad_checksum 0
    recv_drop_old_seq 0
    recv_drop_no_sock 1
    recv_drop_dead_sock 0
    recv_deliver_raced 0
    recv_delivered 18
    recv_queued 18
    recv_immediate_retry 0
    recv_delayed_retry 0
    recv_ack_required 4
    recv_rdma_bytes 0
    recv_ping 14
    send_queue_empty 18
    send_queue_full 0
    send_lock_contention 0
    send_lock_queue_raced 0
    send_immediate_retry 0
    send_delayed_retry 0
    send_drop_acked 0
    send_ack_required 3
    send_queued 32
    send_rdma 0
    send_rdma_bytes 0
    send_pong 14
    page_remainder_hit 0
    page_remainder_miss 0
    copy_to_user 0
    copy_from_user 0
    cong_update_queued 0
    cong_update_received 1
    cong_send_error 0
    cong_send_blocked 0
    ib_connect_raced 4
    ib_listen_closed_stale 0
    ib_tx_cq_call 6
    ib_tx_cq_event 6
    ib_tx_ring_full 0
    ib_tx_throttle 0
    ib_tx_sg_mapping_failure 0
    ib_tx_stalled 16
    ib_tx_credit_updates 0
    ib_rx_cq_call 33
    ib_rx_cq_event 38
    ib_rx_ring_empty 0
    ib_rx_refill_from_cq 0
    ib_rx_refill_from_thread 0
    ib_rx_alloc_limit 0
    ib_rx_credit_updates 0
    ib_ack_sent 4
    ib_ack_send_failure 0
    ib_ack_send_delayed 0
    ib_ack_send_piggybacked 0
    ib_ack_received 3
    ib_rdma_mr_alloc 0
    ib_rdma_mr_free 0
    ib_rdma_mr_used 0
    ib_rdma_mr_pool_flush 8
    ib_rdma_mr_pool_wait 0
    ib_rdma_mr_pool_depleted 0
    ib_atomic_cswp 0
    ib_atomic_fadd 0
    iw_connect_raced 0
    iw_listen_closed_stale 0
    iw_tx_cq_call 0
    iw_tx_cq_event 0
    iw_tx_ring_full 0
    iw_tx_throttle 0
    iw_tx_sg_mapping_failure 0
    iw_tx_stalled 0
    iw_tx_credit_updates 0
    iw_rx_cq_call 0
    iw_rx_cq_event 0
    iw_rx_ring_empty 0
    iw_rx_refill_from_cq 0
    iw_rx_refill_from_thread 0
    iw_rx_alloc_limit 0
    iw_rx_credit_updates 0
    iw_ack_sent 0
    iw_ack_send_failure 0
    iw_ack_send_delayed 0
    iw_ack_send_piggybacked 0
    iw_ack_received 0
    iw_rdma_mr_alloc 0
    iw_rdma_mr_free 0
    iw_rdma_mr_used 0
    iw_rdma_mr_pool_flush 0
    iw_rdma_mr_pool_wait 0
    iw_rdma_mr_pool_depleted 0
    tcp_data_ready_calls 0
    tcp_write_space_calls 0
    tcp_sndbuf_full 0
    tcp_connect_raced 0
    tcp_listen_closed_stale 0
    RDS Sockets:
    BoundAddr BPort ConnAddr CPort SndBuf RcvBuf Inode
    0.0.0.0 0 0.0.0.0 0 131072 131072 340441
    RDS Connections:
    LocalAddr RemoteAddr NextTX NextRX Flg
    10.10.0.116 10.10.0.119 33 38 --C
    Receive Message Queue:
    LocalAddr LPort RemoteAddr RPort Seq Bytes
    Send Message Queue:
    LocalAddr LPort RemoteAddr RPort Seq Bytes
    Retransmit Message Queue:
    LocalAddr LPort RemoteAddr RPort Seq Bytes
    10.10.0.116 0 10.10.0.119 40549 32 0
    oracle@blade1-6:~> cat /etc/rdma/rdma.conf
    # Load IPoIB
    IPOIB_LOAD=yes
    # Load SRP module
    SRP_LOAD=no
    # Load iSER module
    ISER_LOAD=no
    # Load RDS network protocol
    RDS_LOAD=yes
    # Should we modify the system mtrr registers? We may need to do this if you
    # get messages from the ib_ipath driver saying that it couldn't enable
    # write combining for the PIO buffs on the card.
    # Note: recent kernels should do this for us, but in case they don't, we'll
    # leave this option
    FIXUP_MTRR_REGS=no
    # Should we enable the NFSoRDMA service?
    NFSoRDMA_LOAD=yes
    NFSoRDMA_PORT=2050
    oracle@blade1-6:~> /etc/init.d/rdma status
    Low level hardware support loaded:
         mlx4_ib
    Upper layer protocol modules:
         rds_rdma ib_ipoib
    User space access modules:
         rdma_ucm ib_ucm ib_uverbs ib_umad
    Connection management modules:
         rdma_cm ib_cm iw_cm
    Configured IPoIB interfaces: none
    Currently active IPoIB interfaces: ib0
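
The rds-info counter list above is long and mostly zeros, so filtering it down to the nonzero entries makes it easier to compare two nodes or two runs. A minimal Python sketch follows; it assumes the "Counters:" heading and "name value" layout of the paste above, the function names are hypothetical, and it makes no claim about what any individual counter means.

```python
def parse_counters(text):
    """Collect 'name value' pairs listed under the 'Counters:' heading."""
    counters = {}
    in_counters = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Counters:":
            in_counters = True
            continue
        if not in_counters:
            continue
        if line.endswith(":"):
            break  # next section heading, e.g. "RDS Sockets:"
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def nonzero(counters):
    """Keep only the counters whose value is not zero."""
    return {name: value for name, value in counters.items() if value}


if __name__ == "__main__":
    # A few lines copied from the rds-info output above.
    sample = """Counters:
CounterName Value
conn_reset 5
send_queue_full 0
ib_tx_stalled 16
RDS Sockets:
BoundAddr BPort ConnAddr CPort SndBuf RcvBuf Inode
"""
    print(nonzero(parse_counters(sample)))
```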

  • SCVMM losing connection to cluster nodes

    Hey guys'n girls, I hope this is the right forum for this question. I already opened a ticket at MS support as well because it's impacting our production environment indirectly, but even after a week there's been no contact. Losing faith in MS support there.
    The problem we're having with SCVMM is that a host enters the 'needs attention' state, with a WinRM error 0x80338126. I guess it has something to do with the network or with Kerberos, and I've found some info on it, but I still haven't been able to solve it. Do you guys have any ideas?
    Problem summary:
    We are seeing an issue on our new hyper-v platform. The platform should have been in production last week, but this issue is delaying our project as we can't seem to get it stable.
    The problem we are experiencing is that SCVMM loses the connection to some of the Hyper-V nodes, not one specific node. Last week it happened to two nodes, and today it happened to another node. I see issues with WinRM, and I suspect it has something to do with Kerberos. See the bottom of this post for background details and software versions.
    The host gets the status 'needs attention', and if you look at the status of the machine, WinRM gives an error. The error is:
    Error (2916)
    VMM is unable to complete the request. The connection to the agent cc1-hyp-10.domaincloud1.local was lost.
    WinRM: URL: [http://cc1-hyp-10.domaincloud1.local:5985], Verb: [ENUMERATE], Resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_Service], Filter: [select * from Win32_Service where Name="WinRM"]
    Unknown error (0x80338126)
    Recommended Action
    Ensure that the Windows Remote Management (WinRM) service and the VMM agent are installed and running and that a firewall is not blocking HTTP/HTTPS traffic. Ensure that VMM server is able to communicate with cc1-hyp-10.domaincloud1.local over WinRM by successfully running the following command:
     winrm id -r:cc1-hyp-10.domaincloud1.local
    This problem can also be caused by a Windows Management Instrumentation (WMI) service crash. If the server is running Windows Server 2008 R2, ensure that KB 982293 (http://support.microsoft.com/kb/982293) is installed on it.
    If the error persists, restart cc1-hyp-10.domaincloud1.local and then try the operation again.
    Refer to http://support.microsoft.com/kb/2742275 for more details.
    Doing a simple test from the VMM server to the problematic cluster node shows this error:
    PS C:\> hostname
    CC1-VMM-01
    PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
    WSManFault
        Message = WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this
    computer. By default, the WinRM firewall exception for public profiles limits access to remote computers within the same local subnet.
    Error number:  -2144108250 0x80338126
    WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this computer. By default, the WinRM
    firewall exception for public profiles limits access to remote computers within the same local subnet.
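
The two numbers in the error above are the same 32-bit value: -2144108250 is the signed reading and 0x80338126 the unsigned hexadecimal one. A short Python sketch of the conversion follows (the function name to_hex32 is hypothetical); it is handy when one tool logs the decimal form and a search engine only knows the hex form.

```python
def to_hex32(signed_code):
    """Format a signed 32-bit error number as unsigned hexadecimal."""
    return f"0x{signed_code & 0xFFFFFFFF:08X}"


def to_signed32(hex_code):
    """Convert a hex string such as '0x80338126' back to a signed 32-bit int."""
    value = int(hex_code, 16)
    return value - (1 << 32) if value >= (1 << 31) else value


if __name__ == "__main__":
    print(to_hex32(-2144108250))
    print(to_signed32("0x80338126"))
```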
    I CAN connect from other hosts to this problematic cluster node:
    PS C:\> hostname
    CC1-HYP-16
    PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
    IdentifyResponse
        ProtocolVersion =
    http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
        ProductVendor = Microsoft Corporation
        ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
        SecurityProfiles
            SecurityProfileName =
    http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
    And I can connect from the vmm server to all other cluster nodes:
    PS C:\> hostname
    CC1-VMM-01
    PS C:\> winrm id -r:cc1-hyp-11.domaincloud1.local
    IdentifyResponse
        ProtocolVersion =
    http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
        ProductVendor = Microsoft Corporation
        ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
        SecurityProfiles
            SecurityProfileName =
    http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
    So at this point only the test from the cc1-vmm-01 to cc1-hyp-10 seems to be problematic.
    I followed the steps in the page https://support.microsoft.com/kb/2742275 (which is referred to above). I tried the VMMCA, but I can't really get it working the way I want, or it seems to give outdated recommendations.
    I tried checking for duplicate SPNs by running setspn -x on affected machines. No results (although I do not understand what an SPN is or how it works). I rebuilt the performance counters.
    I tried setting 'sc config winrm type= own' as described in [http://blinditandnetworkadmin.blogspot.nl/2012/08/kb-how-to-troubleshoot-needs-attention.html].
    If I reboot this cc1-hyp-10 machine, it will start working perfectly again. However, then I can't troubleshoot the issue, and it will happen again.
    I want this problem to be solved, so vmm never loses connection to the hypervisors it's managing again!
    Background information:
    We've set up a platform with Hyper-V to run a VM workload. The platform consists of the following hardware:
    2 Dell R620's with 32GB of RAM, running hyper-v to virtualize the cloud management layer (DC's, VMM, SQL). These machines are called cc1-hyp-01 and cc1-hyp-02. They run the management vm's like cc1-dc-01/02, cc1-sql-01, cc1-vmm-01, etc. The names are self-explanatory.
    The VMM machine is NOT clustered.
    8 Dell M620 blades with 320GB of RAM, running hyper-v to virtualize the customer workload. The machines are called cc1-hyp-10 through cc1-hyp-17. They are in a cluster.
    2 Equallogic units form a SAN (premium storage), and we have a Dell R515 running iscsi target (budget storage).
    We have Dell Force10 switches and Cisco C3750X switches to connect everything together (mostly 10GB links).
    All hosts run Windows Server 2012 R2 Datacenter edition. The VMM server runs System Center Virtual Machine Manager 2012 R2.
    All the latest Windows updates are installed on every host. There are no firewalls between any host (vmm and hypervisors) at this level. Windows firewalls are all disabled. No antivirus software is installed, no symantec software is installed.
    The only non-standard software that is installed is the Dell Host Integration Tools 4.7.1, Dell Openmanage Server Administrator, and some small stuff like 7-zip, bginfo, net-snap, etc.
    The SCVMM service is running under the domain account DOMAINCLOUD1\scvmm. This machine is in the local administrators group of each cluster node.
    On top of this cloud layer we're running the tenant layer with a lot of vm's for a specific customer (although they are all off now).

    I think I found the culprit: after an hour of analyzing Wireshark dumps I found that the VMM server had jumbo frames enabled on the management interface to the hosts (and the underlying infrastructure does not). Now my winrm commands have started working again.
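
An MTU mismatch like this can be checked with simple arithmetic: a packet that may not be fragmented only gets through if it fits the smallest MTU on the path, and the largest ICMP echo payload for a given MTU is the MTU minus the 20-byte IPv4 header (without options) and the 8-byte ICMP header. The Python sketch below illustrates this; the 9000 and 1500 values are assumed typical jumbo and standard MTUs, not figures from the post, and the function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fits_path(packet_size, link_mtus):
    """True if an unfragmentable packet fits every link on the path."""
    return packet_size <= min(link_mtus)


if __name__ == "__main__":
    sender_mtu = 9000          # assumed jumbo-frame setting on the sender
    path = [9000, 1500, 1500]  # assumed MTUs of the links in between
    print(max_ping_payload(sender_mtu), max_ping_payload(min(path)))
    print(fits_path(sender_mtu, path))
```

On Windows, a don't-fragment ping with an explicit size (ping -f -l <payload> <host>) is one way to test such a payload size against a real path.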

  • Processing in  Multiple Cluster Nodes

    Hi All,
    In our PI system we have 2 Java nodes due to some requirement. When the communication channel runs and we check the message log, in one Cluster node we have a successful message. In other Cluster Node we have an error message that says "File not found".
    The file processing is completing successfully on one cluster node, but I wanted to know if there is any way to suppress the processing of the same file by the same channel on the other node, for example some setting in administration or IB where we can get this done.
    Is there any way to get this done by some setting?
    Thanks,
    Rashmi.

    Hello!
    As per note #801926, please set the clusterSyncMode parameter on Advanced tab of the communication channel with LOCK value.
    And also check the entries 4 and 48 of the FAQ note #821267:
    4. FTP Sender File Processing in Cluster Environment
    48. File System(NFS) File Sender Processing in Cluster Environment
    Best regards,
    Lucas
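
The general idea behind a lock-based sync mode is that each node tries to claim a file before processing it, and only the node whose claim succeeds goes ahead. The Python sketch below illustrates that idea with an exclusively created lock file. It is not how the SAP adapter implements clusterSyncMode; the function name try_claim and the ".lock" suffix are hypothetical, and exclusive-create behaviour on shared filesystems such as NFS depends on the setup.

```python
import os
import tempfile


def try_claim(path):
    """Try to claim `path` for processing; True only for the first claimant."""
    lock_path = path + ".lock"
    try:
        # O_CREAT | O_EXCL fails if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "order.xml")
        print("node A claims:", try_claim(target))
        print("node B claims:", try_claim(target))
```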

  • Services not starting after a node crash

    hi
    We have a 3-node cluster and one of the nodes crashed today. The services did not get relocated to another node, and when we try to manually stop/start/relocate the service we get the following error:
    srvctl stop service -d BCB -s BCB_J2EE -f
    PRCD-1085 : Failed to stop service BCB_J2EE
    PRCR-1065 : Failed to stop resource ora.BCB.BCB_j2ee.svc
    CRS-2533: Server 'bcb528' is down. Unable to perform the operation on 'ora.BCB.BCB_j2ee.svc'
    Has anyone seen this before?
    Thx
    JJ

    this is what i can find in log
    [   CRSPE][60] Server [bcb528] is unreachable. Stopping the sequencer for: bcbCRON 1 1
    2011-02-28 08:15:21.778: [   CRSPE][60] Sequencer for [bcbCRON 1 1] has completed with error: CRS-2533: Server 'bcb528' is down. Unable to perform the operation on 'bcbCRON'
    2011-02-28 08:15:21.778: [   CRSPE][60] Required instruction failed in op: START of [bcbCRON 1 1] on [bcb529] : 105247290
    2011-02-28 08:15:21.781: [UiServer][62] Container [ Name: ORDER
    MESSAGE:
    TextMessage[CRS-2533: Server 'bcb528' is down. Unable to perform the operation on 'bcbCRON']
    MSGTYPE:
    TextMessage[1]
    OBJID:
    TextMessage[bcbCRON 1 1]
    WAIT:
    TextMessage[0]
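
To see at a glance which resources are blocked by which down server, the CRS-2533 messages can be grouped by server name. The following Python sketch assumes the message wording shown in the output above; the function name blocked_resources is hypothetical.

```python
import re

CRS_2533 = re.compile(
    r"CRS-2533: Server '(?P<server>[^']+)' is down\. "
    r"Unable to perform the operation on '(?P<resource>[^']+)'"
)


def blocked_resources(lines):
    """Map each down server to the sorted list of resources it blocks."""
    found = {}
    for line in lines:
        for match in CRS_2533.finditer(line):
            found.setdefault(match.group("server"), set()).add(
                match.group("resource"))
    return {server: sorted(names) for server, names in found.items()}


if __name__ == "__main__":
    log = [
        "CRS-2533: Server 'bcb528' is down. Unable to perform the "
        "operation on 'ora.BCB.BCB_j2ee.svc'",
        "TextMessage[CRS-2533: Server 'bcb528' is down. Unable to perform "
        "the operation on 'bcbCRON']",
    ]
    print(blocked_resources(log))
```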

  • Hyper-V Failover Cluster Node Corruption

    Dear All,
    Some of my nodes are showing abnormal behavior. They are restarting every now and then. I had updated the cluster nodes, but all updates were OS-specific; there were no hardware-specific updates.
    I have analyzed the crash dumps and found that the following is causing the crash:
    page_fault_in_nonpaged_area
    Does anyone have any idea about this?
    Thanks in advance.

    Hi ,
    What is the OS of the cluster node?
    Did you try removing the protection client for troubleshooting?
    If it is a 2008 R2 cluster, please refer to this thread:
    http://social.technet.microsoft.com/Forums/en-US/32ab6a85-6002-4c3c-97ea-27cb1091e9b3/windows-cluster-server-is-getting-restarted?forum=winservergen
    Hope it helps
    Best Regards
    Elton Ji

  • Hyper-V Guest Cluster Node Failing Regularly

    Hi,
    We currently have a 4-node Server 2012 R2 cluster which hosts, among other things, a 3-node guest cluster running a single clustered file service.
    Around once a week, the guest cluster node that is currently hosting the clustered file service will fail. It's as if the VM is blue screening. That in itself is fairly annoying, and I'll be doing all the updates and checking the event log for clues as to the cause.
    The problem then is that whichever physical cluster node is hosting the VM when it fails will not unlock some of the VM's files. The virtual machine configuration lists as Online Pending. This means that the failed VM cannot be restarted on any other cluster node. The only fix is to drain the physical host it failed on, and reboot.
    Looking for suggestions on how to fix the following.
    1. Crashing guest file cluster node
    2. Failed VM with shared VHDX requiring physical host reboot.
    Event messages for the physical host that was hosting the failed VM, in the order that they occurred:
    Hyper-V-Worker: Event ID 18590 - 'FS-03' has encountered a fatal error.  The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x9E, ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, ErrorCode3: 0xA, ErrorCode4: 0x0.  If the problem persists, contact Product Support for the guest operating system.  (Virtual machine ID 36166B47-D003-4E51-AFB5-7B967A3EFD2D)
    FailoverClustering: Event ID 1069 - Cluster resource 'Virtual Machine FS-03' of type 'Virtual Machine' in clustered role 'FS-03' failed.
    Hyper-V-High-Availability: Event ID 21128 - 'Virtual Machine FS-03' failed to shutdown the virtual machine during the resource termination. The virtual machine will be forcefully stopped.
    Hyper-V-High-Availability: Event ID 21110 - 'Virtual Machine FS-03' failed to terminate.
    Hyper-V-VMMS: Event ID 20108 - The Virtual Machine Management Service failed to start the virtual machine '36166B47-D003-4E51-AFB5-7B967A3EFD2D': The group or resource is not in the correct state to perform the requested operation. (0x8007139F).
    Hyper-V-High-Availability: Event ID 21107 - 'Virtual Machine FS-03' failed to start.
    FailoverClustering: Event ID 1205 - The Cluster service failed to bring clustered role 'FS-03' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
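
When the same crash recurs weekly, it can help to pull the error codes out of each Event ID 18590 message and compare them between incidents. The sketch below does that for the message quoted above. It assumes, as is commonly the case for this event, that ErrorCode0 carries the guest's bugcheck code and ErrorCode1 to ErrorCode4 its parameters; the function name is made up for illustration.

```python
import re

# Text of the Hyper-V-Worker Event ID 18590 message as quoted in the post above.
EVENT_TEXT = (
    "'FS-03' has encountered a fatal error.  The guest operating system "
    "reported that it failed with the following error codes: "
    "ErrorCode0: 0x9E, ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, "
    "ErrorCode3: 0xA, ErrorCode4: 0x0."
)


def parse_error_codes(text):
    """Return the ErrorCodeN values from an 18590 message as a list of ints."""
    pairs = re.findall(r"ErrorCode(\d+):\s*(0x[0-9A-Fa-f]+)", text)
    # Sort by the index N so the list order matches ErrorCode0..ErrorCode4.
    ordered = sorted(pairs, key=lambda pair: int(pair[0]))
    return [int(value, 16) for _, value in ordered]


codes = parse_error_codes(EVENT_TEXT)
print([hex(code) for code in codes])
```

If ErrorCode0 is indeed the bugcheck code, 0x9E corresponds to what Microsoft documents as USER_MODE_HEALTH_MONITOR, so the guest's own memory dump would be the next place to look.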

    Hi,
    I haven't found a similar issue. Does your cluster pass cluster validation? Are all your Hyper-V hosts compatible with Server 2012 R2? Have you tried disabling all your AV software and firewalls? Please rerun Storage validation on the cluster during non-production hours; the cluster validation report should quickly point to the issue.
    More information:
    Cluster
    http://technet.microsoft.com/en-us/library/dd581778(v=ws.10).aspx
    Hope this helps.

  • ASM disk busy 99% only on one cluster node

    Hello,
    We have a three-node Oracle RAC cluster. Our DBAs called us and said they are getting OEM critical alerts for an ASM disk on one node only. I checked, and the SAN-attached drive does not show the same high utilization on either of the other two nodes. I checked the hardware and it seems fine. If the issue were with the SAN-attached disk itself, we would expect to see the same errors on all three nodes, since they share the same disks.
    The system crashed last week (alert dump in the +asm directories), and the disk has been busy ever since. I asked whether the DBA had reviewed the ADDM reports; he said he had, and that there were no suspicious-looking entries that would lead us to the root cause. CPU utilization is fine. I am not sure where to look at this point, and any help pointing me in the right direction would be appreciated. They do use RMAN; could there be a backup running against those disks on one node only? Has anyone seen this before?
    Thank you,
    Benita Ulisano
    Unix/SAN Team
    Chicago Public Schools

    Hi Harish,
    Thank you for responding. To answer your question: yes, the disks are all of the same spec and are shared among the three cluster nodes. The ASM disk sdw1 is the one with the issue.
    Problem node: coefsdb02
    Three nodes in the RAC cluster: coefsdb01, coefsdb02, coefsdb03
    iostat results for the same disk on all three nodes:
    Node       Device  rrqm/s  wrqm/s  r/s   w/s   rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm   %util
    coefsdb01  sdw1    0.00    1.71    0.12  0.58  1.27    18.78   28.63     0.01      13.38  1.75    0.12
    coefsdb02  sdw1    0.11    0.02    4.00  0.62  305.84  21.72   70.93     2.96      12.58  211.95  97.88
    coefsdb03  sdw1    0.21    0.01    4.70  0.33  224.05  13.52   47.22     0.05      10.11  6.15    3.09
    The dba(s) run RMAN backups, but only on coefsdb01.
    Benita
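
To make the comparison between nodes explicit, here is a minimal sketch that flags any node whose figure for a given iostat column is far above the median of the three. The numbers are copied from the iostat output above; the function name and the factor of 5 are arbitrary choices for illustration, not a recommended threshold.

```python
# %util, await and svctm for sdw1 on each node, copied from the iostat output above.
SAMPLES = {
    "coefsdb01": {"await": 13.38, "svctm": 1.75, "util": 0.12},
    "coefsdb02": {"await": 12.58, "svctm": 211.95, "util": 97.88},
    "coefsdb03": {"await": 10.11, "svctm": 6.15, "util": 3.09},
}


def outlier_nodes(samples, metric="util", factor=5.0):
    """Return nodes whose metric exceeds `factor` times the median across nodes."""
    values = sorted(sample[metric] for sample in samples.values())
    median = values[len(values) // 2]  # middle value; exact for an odd node count
    return sorted(
        node for node, sample in samples.items() if sample[metric] > factor * median
    )


print(outlier_nodes(SAMPLES))           # nodes with unusually high %util
print(outlier_nodes(SAMPLES, "svctm"))  # nodes with unusually high service time
print(outlier_nodes(SAMPLES, "await"))  # nodes with unusually high await
```

With these figures only coefsdb02 stands out on %util and svctm, while await is similar on all three nodes, which matches the observation that the problem is local to one node rather than to the shared disk.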

  • After reboot cluster node went into maintenance mode (CONTROL-D)

    Hi there!
    I have configured a 2-node cluster on 2 x Sun Enterprise 220R and a StorEdge D1000.
    Each time I reboot either of the cluster nodes, I get the following error during boot:
    The / file system (/dev/rdsk/c0t1d0s0) is being checked.
    /dev/rdsk/c0t1d0s0: UNREF DIR I=35540 OWNER=root MODE=40755
    /dev/rdsk/c0t1d0s0: SIZE=512 MTIME=Jun 5 15:02 2006 (CLEARED)
    /dev/rdsk/c0t1d0s0: UNREF FILE I=1192311 OWNER=root MODE=100600
    /dev/rdsk/c0t1d0s0: SIZE=96 MTIME=Jun 5 13:23 2006 (RECONNECTED)
    /dev/rdsk/c0t1d0s0: LINK COUNT FILE I=1192311 OWNER=root MODE=100600
    /dev/rdsk/c0t1d0s0: SIZE=96 MTIME=Jun 5 13:23 2006 COUNT 0 SHOULD BE 1
    /dev/rdsk/c0t1d0s0: LINK COUNT INCREASING
    /dev/rdsk/c0t1d0s0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
    In maintenance mode I run:
    # fsck -y -F ufs /dev/rdsk/c0t1d0s0
    and it corrects the problem ... but the problem occurs again after every reboot, on each cluster node!
    I have installed Sun Cluster 3.1 on Solaris 9 SPARC.
    How can I get rid of it?
    Any ideas?
    Brgds,
    Sergej

    Hi, I get this:
    112941-09 SunOS 5.9: sysidnet Utility Patch
    116755-01 SunOS 5.9: usr/snadm/lib/libadmutil.so.2 Patch
    113434-30 SunOS 5.9: /usr/snadm/lib Library and Differential Flash Patch
    112951-13 SunOS 5.9: patchadd and patchrm Patch
    114711-03 SunOS 5.9: usr/sadm/lib/diskmgr/VDiskMgr.jar Patch
    118064-04 SunOS 5.9: Admin Install Project Manager Client Patch
    113742-01 SunOS 5.9: smcpreconfig.sh Patch
    113813-02 SunOS 5.9: Gnome Integration Patch
    114501-01 SunOS 5.9: drmproviders.jar Patch
    112943-09 SunOS 5.9: Volume Management Patch
    113799-01 SunOS 5.9: solregis Patch
    115697-02 SunOS 5.9: mtmalloc lib Patch
    113029-06 SunOS 5.9: libaio.so.1 librt.so.1 and abi_libaio.so.1 Patch
    113981-04 SunOS 5.9: devfsadm Patch
    116478-01 SunOS 5.9: usr platform links Patch
    112960-37 SunOS 5.9: patch libsldap ldap_cachemgr libldap
    113332-07 SunOS 5.9: libc_psr.so.1 Patch
    116500-01 SunOS 5.9: SVM auto-take disksets Patch
    114349-04 SunOS 5.9: sbin/dhcpagent Patch
    120441-03 SunOS 5.9: libsec patch
    114344-19 SunOS 5.9: kernel/drv/arp Patch
    114373-01 SunOS 5.9: UMEM - abi_libumem.so.1 patch
    118558-27 SunOS 5.9: Kernel Patch
    115675-01 SunOS 5.9: /usr/lib/liblgrp.so Patch
    112958-04 SunOS 5.9: patch pci.so
    113451-11 SunOS 5.9: IKE Patch
    112920-02 SunOS 5.9: libipp Patch
    114372-01 SunOS 5.9: UMEM - llib-lumem patch
    116229-01 SunOS 5.9: libgen Patch
    116178-01 SunOS 5.9: libcrypt Patch
    117453-01 SunOS 5.9: libwrap Patch
    114131-03 SunOS 5.9: multi-terabyte disk support - libadm.so.1 patch
    118465-02 SunOS 5.9: rcm_daemon Patch
    113490-04 SunOS 5.9: Audio Device Driver Patch
    114926-02 SunOS 5.9: kernel/drv/audiocs Patch
    113318-25 SunOS 5.9: patch /kernel/fs/nfs and /kernel/fs/sparcv9/nfs
    113070-01 SunOS 5.9: ftp patch
    114734-01 SunOS 5.9: /usr/ccs/bin/lorder Patch
    114227-01 SunOS 5.9: yacc Patch
    116546-07 SunOS 5.9: CDRW DVD-RW DVD+RW Patch
    119494-01 SunOS 5.9: mkisofs patch
    113471-09 SunOS 5.9: truss Patch
    114718-05 SunOS 5.9: usr/kernel/fs/pcfs Patch
    115545-01 SunOS 5.9: nss_files patch
    115544-02 SunOS 5.9: nss_compat patch
    118463-01 SunOS 5.9: du Patch
    116016-03 SunOS 5.9: /usr/sbin/logadm patch
    115542-02 SunOS 5.9: nss_user patch
    116014-06 SunOS 5.9: /usr/sbin/usermod patch
    116012-02 SunOS 5.9: ps utility patch
    117433-02 SunOS 5.9: FSS FX RT Patch
    117431-01 SunOS 5.9: nss_nis Patch
    115537-01 SunOS 5.9: /kernel/strmod/ptem patch
    115336-03 SunOS 5.9: /usr/bin/tar, /usr/sbin/static/tar Patch
    117426-03 SunOS 5.9: ctsmc and sc_nct driver patch
    121319-01 SunOS 5.9: devfsadmd_mod.so Patch
    121316-01 SunOS 5.9: /kernel/sys/doorfs Patch
    121314-01 SunOS 5.9: tl driver patch
    116554-01 SunOS 5.9: semsys Patch
    112968-01 SunOS 5.9: patch /usr/bin/renice
    116552-01 SunOS 5.9: su Patch
    120445-01 SunOS 5.9: Toshiba platform token links (TSBW,Ultra-3i)
    112964-15 SunOS 5.9: /usr/bin/ksh Patch
    112839-08 SunOS 5.9: patch libthread.so.1
    115687-02 SunOS 5.9:/var/sadm/install/admin/default Patch
    115685-01 SunOS 5.9: sbin/netstrategy Patch
    115488-01 SunOS 5.9: patch /kernel/misc/busra
    115681-01 SunOS 5.9: usr/lib/fm/libdiagcode.so.1 Patch
    113032-03 SunOS 5.9: /usr/sbin/init Patch
    113031-03 SunOS 5.9: /usr/bin/edit Patch
    114259-02 SunOS 5.9: usr/sbin/psrinfo Patch
    115878-01 SunOS 5.9: /usr/bin/logger Patch
    116543-04 SunOS 5.9: vmstat Patch
    113580-01 SunOS 5.9: mount Patch
    115671-01 SunOS 5.9: mntinfo Patch
    113977-01 SunOS 5.9: awk/sed pkgscripts Patch
    122716-01 SunOS 5.9: kernel/fs/lofs patch
    113973-01 SunOS 5.9: adb Patch
    122713-01 SunOS 5.9: expr patch
    117168-02 SunOS 5.9: mpstat Patch
    116498-02 SunOS 5.9: bufmod Patch
    113576-01 SunOS 5.9: /usr/bin/dd Patch
    116495-03 SunOS 5.9: specfs Patch
    117160-01 SunOS 5.9: /kernel/misc/krtld patch
    118586-01 SunOS 5.9: cp/mv/ln Patch
    120025-01 SunOS 5.9: ipsecconf Patch
    116527-02 SunOS 5.9: timod Patch
    117155-08 SunOS 5.9: pcipsy Patch
    114235-01 SunOS 5.9: libsendfile.so.1 Patch
    117152-01 SunOS 5.9: magic Patch
    116486-03 SunOS 5.9: tsalarm Driver Patch
    121998-01 SunOS 5.9: two-key mode fix for 3DES Patch
    116484-01 SunOS 5.9: consconfig Patch
    116482-02 SunOS 5.9: modload Utils Patch
    117746-04 SunOS 5.9: patch platform/sun4u/kernel/drv/sparcv9/pic16f819
    121992-01 SunOS 5.9: fgrep Patch
    120768-01 SunOS 5.9: grpck patch
    119438-01 SunOS 5.9: usr/bin/login Patch
    114389-03 SunOS 5.9: devinfo Patch
    116510-01 SunOS 5.9: wscons Patch
    114224-05 SunOS 5.9: csh Patch
    116670-04 SunOS 5.9: gld Patch
    114383-03 SunOS 5.9: Enchilada/Stiletto - pca9556 driver
    116506-02 SunOS 5.9: traceroute patch
    112919-01 SunOS 5.9: netstat Patch
    112918-01 SunOS 5.9: route Patch
    112917-01 SunOS 5.9: ifrt Patch
    117132-01 SunOS 5.9: cachefsstat Patch
    114370-04 SunOS 5.9: libumem.so.1 patch
    114010-02 SunOS 5.9: m4 Patch
    117129-01 SunOS 5.9: adb Patch
    117483-01 SunOS 5.9: ntwdt Patch
    114369-01 SunOS 5.9: prtvtoc patch
    117125-02 SunOS 5.9: procfs Patch
    117480-01 SunOS 5.9: pkgadd Patch
    112905-02 SunOS 5.9: ippctl Patch
    117123-06 SunOS 5.9: wanboot Patch
    115030-03 SunOS 5.9: Multiterabyte UFS - patch mount
    114004-01 SunOS 5.9: sed Patch
    113335-03 SunOS 5.9: devinfo Patch
    113495-05 SunOS 5.9: cfgadm Library Patch
    113494-01 SunOS 5.9: iostat Patch
    113493-03 SunOS 5.9: libproc.so.1 Patch
    113330-01 SunOS 5.9: rpcbind Patch
    115028-02 SunOS 5.9: patch /usr/lib/fs/ufs/df
    115024-01 SunOS 5.9: file system identification utilities
    117471-02 SunOS 5.9: fifofs Patch
    118897-01 SunOS 5.9: stc Patch
    115022-03 SunOS 5.9: quota utilities
    115020-01 SunOS 5.9: patch /usr/lib/adb/ml_odunit
    113720-01 SunOS 5.9: rootnex Patch
    114352-03 SunOS 5.9: /etc/inet/inetd.conf Patch
    123056-01 SunOS 5.9: ldterm patch
    116243-01 SunOS 5.9: umountall Patch
    113323-01 SunOS 5.9: patch /usr/sbin/passmgmt
    116049-01 SunOS 5.9: fdfs Patch
    116241-01 SunOS 5.9: keysock Patch
    113480-02 SunOS 5.9: usr/lib/security/pam_unix.so.1 Patch
    115018-01 SunOS 5.9: patch /usr/lib/adb/dqblk
    113277-44 SunOS 5.9: sd and ssd Patch
    117457-01 SunOS 5.9: elfexec Patch
    113110-01 SunOS 5.9: touch Patch
    113077-17 SunOS 5.9: /platform/sun4u/kernal/drv/su Patch
    115006-01 SunOS 5.9: kernel/strmod/kb patch
    113072-07 SunOS 5.9: patch /usr/sbin/format
    113071-01 SunOS 5.9: patch /usr/sbin/acctadm
    116782-01 SunOS 5.9: tun Patch
    114331-01 SunOS 5.9: power Patch
    112835-01 SunOS 5.9: patch /usr/sbin/clinfo
    114927-01 SunOS 5.9: usr/sbin/allocate Patch
    119937-02 SunOS 5.9: inetboot patch
    113467-01 SunOS 5.9: seg_drv & seg_mapdev Patch
    114923-01 SunOS 5.9: /usr/kernel/drv/logindmux Patch
    117443-01 SunOS 5.9: libkvm Patch
    114329-01 SunOS 5.9: /usr/bin/pax Patch
    119929-01 SunOS 5.9: /usr/bin/xargs patch
    113459-04 SunOS 5.9: udp patch
    113446-03 SunOS 5.9: dman Patch
    116009-05 SunOS 5.9: sgcn & sgsbbc patch
    116557-04 SunOS 5.9: sbd Patch
    120241-01 SunOS 5.9: bge: Link & Speed LEDs flash constantly on V20z
    113984-01 SunOS 5.9: iosram Patch
    113220-01 SunOS 5.9: patch /platform/sun4u/kernel/drv/sparcv9/upa64s
    113975-01 SunOS 5.9: ssm Patch
    117165-01 SunOS 5.9: pmubus Patch
    116530-01 SunOS 5.9: bge.conf Patch
    116529-01 SunOS 5.9: smbus Patch
    116488-03 SunOS 5.9: Lights Out Management (lom) patch
    117131-01 SunOS 5.9: adm1031 Patch
    117124-12 SunOS 5.9: platmod, drmach, dr, ngdr, & gptwocfg Patch
    114003-01 SunOS 5.9: bbc driver Patch
    118539-02 SunOS 5.9: schpc Patch
    112837-10 SunOS 5.9: patch /usr/lib/inet/in.dhcpd
    114975-01 SunOS 5.9: usr/lib/inet/dhcp/svcadm/dhcpcommon.jar Patch
    117450-01 SunOS 5.9: ds_SUNWnisplus Patch
    113076-02 SunOS 5.9: dhcpmgr.jar Patch
    113572-01 SunOS 5.9: docbook-to-man.ts Patch
    118472-01 SunOS 5.9: pargs Patch
    122709-01 SunOS 5.9: /usr/bin/dc patch
    113075-01 SunOS 5.9: pmap patch
    113472-01 SunOS 5.9: madv & mpss lib Patch
    115986-02 SunOS 5.9: ptree Patch
    115693-01 SunOS 5.9: /usr/bin/last Patch
    115259-03 SunOS 5.9: patch usr/lib/acct/acctcms
    114564-09 SunOS 5.9: /usr/sbin/in.ftpd Patch
    117441-01 SunOS 5.9: FSSdispadmin Patch
    113046-01 SunOS 5.9: fcp Patch
    118191-01 gtar patch
    114818-06 GNOME 2.0.0: libpng Patch
    117177-02 SunOS 5.9: lib/gss module Patch
    116340-05 SunOS 5.9: gzip and Freeware info files patch
    114339-01 SunOS 5.9: wrsm header files Patch
    122673-01 SunOS 5.9: sockio.h header patch
    116474-03 SunOS 5.9: libsmedia Patch
    117138-01 SunOS 5.9: seg_spt.h
    112838-11 SunOS 5.9: pcicfg Patch
    117127-02 SunOS 5.9: header Patch
    112929-01 SunOS 5.9: RIPv2 Header Patch
    112927-01 SunOS 5.9: IPQos Header Patch
    115992-01 SunOS 5.9: /usr/include/limits.h Patch
    112924-01 SunOS 5.9: kdestroy kinit klist kpasswd Patch
    116231-03 SunOS 5.9: llc2 Patch
    116776-01 SunOS 5.9: mipagent patch
    117420-02 SunOS 5.9: mdb Patch
    117179-01 SunOS 5.9: nfs_dlboot Patch
    121194-01 SunOS 5.9: usr/lib/nfs/statd Patch
    116502-03 SunOS 5.9: mountd Patch
    113331-01 SunOS 5.9: usr/lib/nfs/rquotad Patch
    113281-01 SunOS 5.9: patch /usr/lib/netsvc/yp/ypbind
    114736-01 SunOS 5.9: usr/sbin/nisrestore Patch
    115695-01 SunOS 5.9: /usr/lib/netsvc/yp/yppush Patch
    113321-06 SunOS 5.9: patch sf and socal
    113049-01 SunOS 5.9: luxadm & liba5k.so.2 Patch
    116663-01 SunOS 5.9: ntpdate Patch
    117143-01 SunOS 5.9: xntpd Patch
    113028-01 SunOS 5.9: patch /kernel/ipp/flowacct
    113320-06 SunOS 5.9: patch se driver
    114731-08 SunOS 5.9: kernel/drv/glm Patch
    115667-03 SunOS 5.9: Chalupa platform support Patch
    117428-01 SunOS 5.9: picl Patch
    113327-03 SunOS 5.9: pppd Patch
    114374-01 SunOS 5.9: Perl patch
    115173-01 SunOS 5.9: /usr/bin/sparcv7/gcore /usr/bin/sparcv9/gcore Patch
    114716-02 SunOS 5.9: usr/bin/rcp Patch
    112915-04 SunOS 5.9: snoop Patch
    116778-01 SunOS 5.9: in.ripngd patch
    112916-01 SunOS 5.9: rtquery Patch
    112928-03 SunOS 5.9: in.ndpd Patch
    119447-01 SunOS 5.9: ses Patch
    115354-01 SunOS 5.9: slpd Patch
    116493-01 SunOS 5.9: ProtocolTO.java Patch
    116780-02 SunOS 5.9: scmi2c Patch
    112972-17 SunOS 5.9: patch /usr/lib/libssagent.so.1 /usr/lib/libssasnmp.so.1 mibiisa
    116480-01 SunOS 5.9: IEEE 1394 Patch
    122485-01 SunOS 5.9: 1394 mass storage driver patch
    113716-02 SunOS 5.9: sar & sadc Patch
    115651-02 SunOS 5.9: usr/lib/acct/runacct Patch
    116490-01 SunOS 5.9: acctdusg Patch
    117473-01 SunOS 5.9: fwtmp Patch
    116180-01 SunOS 5.9: geniconvtbl Patch
    114006-01 SunOS 5.9: tftp Patch
    115646-01 SunOS 5.9: libtnfprobe shared library Patch
    113334-03 SunOS 5.9: udfs Patch
    115350-01 SunOS 5.9: ident_udfs.so.1 Patch
    122484-01 SunOS 5.9: preen_md.so.1 patch
    117134-01 SunOS 5.9: svm flasharchive patch
    116472-02 SunOS 5.9: rmformat Patch
    112966-05 SunOS 5.9: patch /usr/sbin/vold
    114229-01 SunOS 5.9: action_filemgr.so.1 Patch
    114335-02 SunOS 5.9: usr/sbin/rmmount Patch
    120443-01 SunOS 5.9: sed core dumps on long lines
    121588-01 SunOS 5.9: /usr/xpg4/bin/awk Patch
    113470-02 SunOS 5.9: winlock Patch
    119211-07 NSS_NSPR_JSS 3.11: NSPR 4.6.1 / NSS 3.11 / JSS 4.2
    118666-05 J2SE 5.0: update 6 patch
    118667-05 J2SE 5.0: update 6 patch, 64bit
    114612-01 SunOS 5.9: ANSI-1251 encodings file errors
    114276-02 SunOS 5.9: Extended Arabic support in UTF-8
    117400-01 SunOS 5.9: ISO8859-6 and ISO8859-8 iconv symlinks
    113584-16 SunOS 5.9: yesstr, nostr nl_langinfo() strings incorrect in S9
    117256-01 SunOS 5.9: Remove old OW Xresources.ow files
    112625-01 SunOS 5.9: Dcam1394 patch
    114600-05 SunOS 5.9: vlan driver patch
    117119-05 SunOS 5.9: Sun Gigabit Ethernet 3.0 driver patch
    117593-04 SunOS 5.9: Manual Page updates for Solaris 9
    112622-19 SunOS 5.9: M64 Graphics Patch
    115953-06 Sun Cluster 3.1: Sun Cluster sccheck patch
    117949-23 Sun Cluster 3.1: Core Patch for Solaris 9
    115081-06 Sun Cluster 3.1: HA-Sun One Web Server Patch
    118627-08 Sun Cluster 3.1: Manageability and Serviceability Agent
    117985-03 SunOS 5.9: XIL 1.4.2 Loadable Pipeline Libraries
    113896-06 SunOS 5.9: en_US.UTF-8 locale patch
    114967-02 SunOS 5.9: FDL patch
    114677-11 SunOS 5.9: International Components for Unicode Patch
    112805-01 CDE 1.5: Help volume patch
    113841-01 CDE 1.5: answerbook patch
    113839-01 CDE 1.5: sdtwsinfo patch
    115713-01 CDE 1.5: dtfile patch
    112806-01 CDE 1.5: sdtaudiocontrol patch
    112804-02 CDE 1.5: sdtname patch
    113244-09 CDE 1.5: dtwm patch
    114312-02 CDE1.5: GNOME/CDE Menu for Solaris 9
    112809-02 CDE:1.5 Media Player (sdtjmplay) patch
    113868-02 CDE 1.5: PDASync patch
    119976-01 CDE 1.5: dtterm patch
    112771-30 Motif 1.2.7 and 2.1.1: Runtime library patch for Solaris 9
    114282-01 CDE 1.5: libDtWidget patch
    113789-01 CDE 1.5: dtexec patch
    117728-01 CDE1.5: dthello patch
    113863-01 CDE 1.5: dtconfig patch
    112812-01 CDE 1.5: dtlp patch
    113861-04 CDE 1.5: dtksh patch
    115972-03 CDE 1.5: dtterm libDtTerm patch
    114654-02 CDE 1.5: SmartCard patch
    117632-01 CDE1.5: sun_at patch for Solaris 9
    113374-02 X11 6.6.1: xpr patch
    118759-01 X11 6.6.1: Font Administration Tools patch
    117577-03 X11 6.6.1: TrueType fonts patch
    116084-01 X11 6.6.1: font patch
    113098-04 X11 6.6.1: X RENDER extension patch
    112787-01 X11 6.6.1: twm patch
    117601-01 X11 6.6.1: libowconfig.so.0 patch
    117663-02 X11 6.6.1: xwd patch
    113764-04 X11 6.6.1: keyboard patch
    113541-02 X11 6.6.1: XKB patch
    114561-01 X11 6.6.1: X splash screen patch
    113513-02 X11 6.6.1: platform support for new hardware
    116121-01 X11 6.4.1: platform support for new hardware
    114602-04 X11 6.6.1: libmpg_psr patch
    Is there a bundle to install, or do I have to install each patch separately?
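
A listing this long is easier to work with once it is parsed, for example to compare patch revisions between the two cluster nodes. The following is only a sketch: it assumes each line has the form "NNNNNN-RR synopsis" as in the listing above, uses four lines copied from it as sample input, and the function name is invented for illustration.

```python
import re

# A few lines copied from the patch listing above.
LISTING = """\
118558-27 SunOS 5.9: Kernel Patch
113277-44 SunOS 5.9: sd and ssd Patch
115953-06 Sun Cluster 3.1: Sun Cluster sccheck patch
117949-23 Sun Cluster 3.1: Core Patch for Solaris 9
"""

PATCH_LINE = re.compile(r"^(\d{6})-(\d{2})\s+(.*)$")


def parse_patches(listing):
    """Map patch base ID -> (revision, synopsis), keeping the highest revision seen."""
    patches = {}
    for line in listing.splitlines():
        match = PATCH_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not an "NNNNNN-RR synopsis" line
        base, rev, synopsis = match.group(1), int(match.group(2)), match.group(3)
        if base not in patches or rev > patches[base][0]:
            patches[base] = (rev, synopsis)
    return patches


patches = parse_patches(LISTING)
# Base IDs of the Sun Cluster patches in the sample.
cluster = sorted(b for b, (_, s) in patches.items() if s.startswith("Sun Cluster"))
print(cluster)
print(patches["118558"])
```

Running the same parser over the listing from each node and comparing the two dictionaries would show any patch whose revision differs between them.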

  • Re-installing Hyper-V 2012 R2 cluster node

    We have four HP BL460 Gen8 servers forming a Hyper-V cluster, running Windows Server 2012 R2 Datacenter.
    Storage is provided by a two-node 3PAR StoreServ 7400.
    All network and FC connections are managed by HP Virtual Connect.
    One of the four nodes crashed during an HP SPP upgrade, which left the OS unbootable.
    I managed to revive the OS by running multiple check disks and by manually restoring registry hives from backup via the recovery console of Windows 7 installation media.
    After the recovery there were still some issues with the filesystem: corrupted, orphaned and missing files here and there.
    Now I want to re-install the OS from scratch to make sure everything works correctly and to avoid any future errors.
    What I need to know is whether best practice is to re-install the OS with a new computer name, or to drop the current OS to a workgroup, re-install it and join the AD domain with the same computer name. I've already evicted the node from the Hyper-V cluster, but the server is still running as a member server in AD.
    Any other things I should take into consideration before doing the re-installation?
    Thanks in advance!

    I agree that after a major problem it is much safer to rebuild the system. It sounds like you have the node back up and running, so I would evict it from the cluster and then remove it from the domain. Rebuild it and you can use the same name, because those two actions will clean up its 'footprints'.
    If the machine were not running, you would still evict the node from the cluster, but you would need to go into Active Directory to delete the computer account. Then rebuild.
    . : | : . : | : . tim

  • NFS  Cluster

    Hi gurus,
    Is it supported to configure more than one resource group that uses the NFS resource?
    That is, to have more than one instance of NFS services.
    Thanks in advance,
    AB

    Yes, absolutely it is supported. Doing so allows you to balance NFS services across the cluster. HOWEVER, there is one rule you must obey and that is that you can only share an NFS mount point from one cluster node at any one time.
    So, suppose you had a set of home directories that you wanted to share from /failover/export/home, then rather than sharing that from both cluster nodes (in two resource groups), you would break the shares up into two pieces. For example, /failover/export/home/a_to_m and /failover/export/home/n_to_z. That way, you can share one from one node and the other from the other node using two separate resource groups.
    One other rule that Solaris Cluster users need to be aware of: You cannot share an HA-NFS mount point from within the cluster to another cluster node. If you do, you risk deadlocks.
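
The a_to_m / n_to_z split described above can be sketched as a simple lookup in which every share path belongs to exactly one resource group. The two share paths come from the example; the resource group names and function names are hypothetical and only illustrate the idea.

```python
# Each share path is owned by exactly one resource group, so no mount point is
# ever shared from two nodes at once. The group names are made up.
SHARES = {
    "/failover/export/home/a_to_m": "nfs-rg-1",
    "/failover/export/home/n_to_z": "nfs-rg-2",
}


def share_for_user(username):
    """Return the share that would hold a user's home directory, by first letter."""
    first = username[0].lower()
    if not (first.isascii() and first.isalpha()):
        raise ValueError(f"cannot place user {username!r}")
    if first <= "m":
        return "/failover/export/home/a_to_m"
    return "/failover/export/home/n_to_z"


def home_path(username):
    """Return the full home directory path and the resource group serving it."""
    share = share_for_user(username)
    return f"{share}/{username}", SHARES[share]


print(home_path("benita"))
print(home_path("stewart"))
```

Because the mapping is a function of the user name alone, each home directory lives under one share, and each share can fail over with its own resource group independently of the other.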
    Hope that helps,
    Tim
    ---

  • Cluster node is hung but not killed

    Hello,
    One of two SC3.2 cluster nodes hung under heavy load, probably due to low memory.
    One of the visible symptoms was these error messages:
    Jan 29 17:48:56 node2 genunix: [ID 661778 kern.warning] WARNING: clcomm: memory low: freemem 0xff8
    Jan 29 17:49:15 node2 genunix: [ID 661778 kern.warning] WARNING: clcomm: memory low: freemem 0xff6
    The problem is that the node wasn't killed, and the whole cluster (it's an HA-NFS active/passive configuration) became non-functional.
    What can be done to prevent such a situation?
    TIA,
    -- leon
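
To put the warnings in perspective, the hexadecimal freemem values can be converted into an approximate amount of free memory. This sketch assumes that freemem is reported in pages, as the Solaris kernel variable of that name is, and that the base page size is 8 KiB (typical for sun4u SPARC; x86 would use 4 KiB). Both are assumptions, not something the log states.

```python
import re

# The two warnings quoted in the post above.
LOG_LINES = [
    "Jan 29 17:48:56 node2 genunix: [ID 661778 kern.warning] WARNING: clcomm: memory low: freemem 0xff8",
    "Jan 29 17:49:15 node2 genunix: [ID 661778 kern.warning] WARNING: clcomm: memory low: freemem 0xff6",
]

PAGE_SIZE = 8192  # assumption: 8 KiB base page (sun4u SPARC); use 4096 on x86


def freemem_pages(line):
    """Return the freemem value from a clcomm warning as an int, or None."""
    match = re.search(r"memory low: freemem (0x[0-9a-fA-F]+)", line)
    return int(match.group(1), 16) if match else None


for line in LOG_LINES:
    pages = freemem_pages(line)
    mib = pages * PAGE_SIZE / (1024 * 1024)
    print(f"{pages} pages, about {mib:.1f} MiB free")
```

Under those assumptions, both warnings correspond to roughly 32 MiB of free memory, which is consistent with a node that is still running but starved.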

    Those error messages seem to indicate that at least some part of the system was
    working enough to complain that more memory is needed.
    Can you tell us a bit more about the exact problem you were experiencing? You mention
    that there was heavy load, which indicates perhaps that there was lots of IO going on
    on the system? If so, that is opposite of "hung" which, to me, means that the system is
    not able to perform any useful work at all.
    Perhaps the system was merely very slow, because of lack of memory and very heavy
    load?
    It is possible that the lack of memory is not because of lots of load, but because of a
    bug in the system (a daemon which is leaking memory, perhaps?). However, in that
    case, doing a "prstat" should help you find if that is the case. Otherwise, start with
    memory analysis on your system and try to figure out what is consuming memory.
    Assuming that, as you suggested, the problem is heavy load on the system, read on...
    You mention that HA-NFS is the application running on the system. If you want the system to
    fail over in cases of extreme load and slowness, you can configure HA-NFS to do so by
    reducing its timeouts etc. However, please realize that after the failover to another node,
    (or restart on the local node), the client load would resume, the system would again become
    slow, and you haven't really achieved anything.
    You ask, "What can be done?" I would say adding more memory would be a start. But
    I personally suspect that you are running into a bug in the system somewhere which is
    causing this slowness. If so, let us start with figuring that out by looking closely at the
    system. Do a prstat, note the size of processes on the system and rule out any user level
    processes. Next, look at the filesystem in use by NFS, and make sure that is working
    fine (you are able to create files etc.), look at the CPU/disk usage to rule out "maxed out
    CPU or disk usage" as the cause of the slowness/almost_hungness....
    HTH,
    -ashu

  • Unable to failover the services in active-active cluster node

    Hi,
    I am applying the SP2 patch for SQL Server 2008 R2 in an active-active cluster. We have 3 services in the cluster: node 1 is the preferred owner of 2 of them and node 2 is the preferred owner of 1. When I try to move the service from node 2 to node 1, I get the errors below:
    DCOM was unable to communicate with the computer XXXXXXXXX using any of the configured protocols.
    The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server XXXXXXXXX. The target name used was RPCSS/XXXXXX. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal
    name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using
    a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server
    name is not fully qualified, and the target domain (XXXXXX) is different from the client domain (XXXXXXX), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
    The Cluster service failed to bring clustered service or application 'CHCROCHC045' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
    Cluster resource 'SQL Server (CHCROCHC045)' in clustered service or application 'CHCROCHC045' failed.
    Any input on resolving this issue is appreciated, as I cannot proceed with patching.
    BR
    PGR
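
The Kerberos message above asks that the target SPN be registered on one account only. As a rough illustration of that check, the sketch below looks for SPNs that appear under more than one account in a list of registrations. The data is entirely hypothetical (none of the names come from the post), and on a real domain the built-in `setspn -X` duplicate search would be the usual tool.

```python
from collections import defaultdict

# Hypothetical (SPN, account) registrations, e.g. exported from a directory query.
# None of these names come from the post; they only illustrate the check.
REGISTRATIONS = [
    ("MSSQLSvc/sqlnode1.example.com:1433", "EXAMPLE\\svc_sql"),
    ("MSSQLSvc/sqlnode2.example.com:1433", "EXAMPLE\\svc_sql"),
    ("MSSQLSvc/sqlnode1.example.com:1433", "EXAMPLE\\old_svc_sql"),
]


def duplicate_spns(registrations):
    """Return {spn: sorted accounts} for SPNs registered on more than one account."""
    owners = defaultdict(set)
    for spn, account in registrations:
        # Compare case-insensitively so "MSSQLSvc/..." and "mssqlsvc/..." match.
        owners[spn.lower()].add(account.lower())
    return {spn: sorted(accts) for spn, accts in owners.items() if len(accts) > 1}


duplicates = duplicate_spns(REGISTRATIONS)
print(duplicates)
```

Any SPN reported here would be a candidate for the "registered on an account other than the account the target service is using" condition named in the error text.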

    Hi PGR,
    As the issue is more related to Windows Server, I would recommend posting it in the Windows Server forums for better support.
    In addition, below are some articles on troubleshooting the error "DCOM was unable to communicate with the computer XXXXXXXXX using any of the configured protocols" for your reference.
    Event ID 10009 — COM Remote Service Availability
    How to troubleshoot DCOM 10009 error logged in system event?
    Thanks,
    Lydia Zhang
    TechNet Community Support
