Cluster Node Crashes

I'm not sure this is the proper forum for this post; if it's not, please feel free to move it.
The situation I'm facing is this:
My company has clusters set up across North America running our software, which uses the Oracle database. At about 90% of sites everything functions exactly as it is supposed to. However, it is the other 10% of sites that I am here to ask about.
Our clusters are set up in a dual-server environment that basically acts as a single server. The application runs on one server and the database runs on another, and in the case of problems, either can be failed over to run both sets of services on a single server (basic, I realize). At certain sites we are unable to run services on one of the nodes. When services are run as intended, the node will BSOD every so often (at some sites within minutes or hours, at others after a couple of weeks).
I fully understand what the blue screen is. The minidump shows that it's the orafencedrv.sys stop, where the Oracle database shuts down a node after loss of communications in order to prevent corruption of the database. This is a great feature and I'm grateful for it; however, it has caused us many headaches in diagnosing what is actually causing the drop in communications.
The interconnect and the public IP are both hooked up over a single switch but they operate on different subnets. Could operating on a single switch be part of the problem?
Could the problem be that the switches are being overloaded with traffic causing temporary packet losses between the two nodes, which I know is enough to have Oracle BSOD a node?
Below I'm posting one of the dumps written to the CSSD log when the node crashes; hopefully this will provide some information about what is happening.
If any other information is needed, please feel free to let me know. Thanks for your help in advance.
[    CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: clssnmvDiskKillCheck: Aborting, evicted by node 1, sync 13, stamp 99832890,
[    CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: ###################################
[    CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: clssscExit: CSSD aborting
[    CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: ###################################
[    CSSD]--- DUMP GROCK STATE DB ---
[    CSSD]----------
[    CSSD] type 2, Id 3, Name = (crs_version)
[    CSSD] flags: 0x0
[    CSSD] grant: count=0, type 0, wait 0
[    CSSD] Member Count =2, master 0
[    CSSD] . . . . .
[    CSSD] memberNo =0, seq 5
[    CSSD] flags = 0x0, granted 0
[    CSSD] refCnt = 1
[    CSSD] nodeNum = 2, nodeBirth 6
[    CSSD] privateDataSize = 0
[    CSSD] publicDataSize = 0
[    CSSD] . . . . .
[    CSSD] memberNo =1, seq 11
[    CSSD] flags = 0x0, granted 0
[    CSSD] refCnt = 1
[    CSSD] nodeNum = 1, nodeBirth 12
[    CSSD] privateDataSize = 0
[    CSSD] publicDataSize = 0
[    CSSD]----------
[    CSSD]----------
[    CSSD] type 2, Id 2, Name = (ocr_STLRZOPRCL)
[    CSSD] flags: 0x0
[    CSSD] grant: count=0, type 0, wait 0
[    CSSD] Member Count =2, master 2
[    CSSD] . . . . .
[    CSSD] memberNo =2, seq 5
[    CSSD] flags = 0x0, granted 0
[    CSSD] refCnt = 1
[    CSSD] nodeNum = 2, nodeBirth 6
[    CSSD] privateDataSize = 0
[    CSSD] publicDataSize = 32
[    CSSD] . . . . .
[    CSSD] memberNo =1, seq 11
[    CSSD] flags = 0x0, granted 0
[    CSSD] refCnt = 1
[    CSSD] nodeNum = 1, nodeBirth 12
[    CSSD] privateDataSize = 0
[    CSSD] publicDataSize = 32
[    CSSD]----------
[    CSSD]----------
[    CSSD] type 3, Id 15, Name = (_ORA_CRS_MEMBER_stlrzoprcl1)
[    CSSD] flags: 0x0
[    CSSD] grant: count=1, type 3, wait 1
[    CSSD] Member Count =1, master -3
[    CSSD] . . . . .
[    CSSD] memberNo =0, seq 0
[    CSSD] flags = 0x12, granted 0
[    CSSD] refCnt = 1
[    CSSD] nodeNum = 1, nodeBirth 12
[    CSSD] privateDataSize = 0
[    CSSD] publicDataSize = 0
[    CSSD]----------
[    CSSD]----------
[    CSSD] type 3, Id 15, Name = (_ORA_CRS_MEMBER_stlrzoprcl2)
[    CSSD] flags: 0x0
[    CSSD] grant: count=1, type 3, wait 1
[    CSSD] Member Count =1, master -3
[    CSSD] . . . . .
[    CSSD] memberNo =0, seq 0
[    CSSD] flags = 0x12, granted 1
[    CSSD] refCnt = 1
[    CSSD] nodeNum = 2, nodeBirth 6
[    CSSD] privateDataSize = 0
[    CSSD] publicDataSize = 0
[    CSSD]----------
[    CSSD]----------
[    CSSD] type 2, Id 4, Name = (CRSDMAIN)
[    CSSD] flags: 0x0
[    CSSD] grant: count=0, type 0, wait 0
[    CSSD] Member Count =2, master 2
[    CSSD] . . . . .
[    CSSD] memberNo =2, seq 5
[    CSSD] flags = 0x0, granted 0
[    CSSD] refCnt = 1
[    CSSD] nodeNum = 2, nodeBirth 6
[    CSSD] privateDataSize = 128
[    CSSD] publicDataSize = 128
[    CSSD] . . . . .
[    CSSD] memberNo =1, seq 11
[    CSSD] flags = 0x0, granted 0
[    CSSD] refCnt = 1
[    CSSD] nodeNum = 1, nodeBirth 12
[    CSSD] privateDataSize = 128
[    CSSD] publicDataSize = 128
[    CSSD]----------
[    CSSD]----------
[    CSSD] type 2, Id 1, Name = (EVMDMAIN)
[    CSSD] flags: 0x0
[    CSSD] grant: count=0, type 0, wait 0
[    CSSD] Member Count =2, master 2
[    CSSD] . . . . .
[    CSSD] memberNo =2, seq 5
[    CSSD] flags = 0x0, granted 0
[    CSSD] refCnt = 1
[    CSSD] nodeNum = 2, nodeBirth 6
[    CSSD] privateDataSize = 508
[    CSSD] publicDataSize = 504
[    CSSD] . . . . .
[    CSSD] memberNo =1, seq 11
[    CSSD] flags = 0x0, granted 0
[    CSSD] refCnt = 1
[    CSSD] nodeNum = 1, nodeBirth 12
[    CSSD] privateDataSize = 508
[    CSSD] publicDataSize = 504
[    CSSD]----------
[    CSSD]--- END OF GROCK STATE DUMP ---
[    CSSD]------- End Dump -------
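
For comparing crashes across sites, the eviction line at the top of a dump like the one above can be pulled out mechanically. This is only a minimal sketch: the regular expression is written against the exact wording of this one log, and other CSS versions may format the line differently. The function name `find_evictions` is made up for illustration.

```python
import re

# Pattern for the CSSD eviction line exactly as it appears in the dump above.
# Assumption: other Clusterware versions may word this line differently.
EVICT_RE = re.compile(
    r"\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+\[\d+\]\s+>ERROR:\s+"
    r"clssnmvDiskKillCheck: Aborting, evicted by node (?P<node>\d+), "
    r"sync (?P<sync>\d+), stamp (?P<stamp>\d+)"
)


def find_evictions(lines):
    """Return one dict per eviction line found in an iterable of log lines."""
    events = []
    for line in lines:
        match = EVICT_RE.search(line)
        if match:
            events.append({
                "timestamp": match.group("ts"),
                "evicted_by_node": int(match.group("node")),
                "sync": int(match.group("sync")),
                "stamp": int(match.group("stamp")),
            })
    return events


# Two lines copied from the dump above; only the first is an eviction line.
sample = [
    "[    CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: clssnmvDiskKillCheck: Aborting, evicted by node 1, sync 13, stamp 99832890,",
    "[    CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: clssscExit: CSSD aborting",
]
for event in find_evictions(sample):
    print(event)
```

Collecting these timestamps per site and lining them up against switch or NIC logs for the same moments might show whether the evictions coincide with traffic spikes.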

Hi user10508733
This seems to be your first post, welcome to this forum!
What is the OS (a blue screen should mean Windows?) and what are the releases of your CRS and RDBMS? Hopefully not 10.1x.x.x; if so, please patch to 10.2.0.4.
There seem to be a lot of CRS bugs before 10.2.0.3; see this list:
Doc ID: Note:391116.1
Subject: 10.2.0.3 Patch Set - List of Bug Fixes by Problem Type
Let us know the result.
Thanks

Similar Messages

  • NFS cluster node crashed

    Hi all, we have a 2-node cluster running Solaris 10 11/06 and Sun Cluster 3.2.
    Recently, we were asked to nfs mount on node 1 of the cluster, a directory from an external Linux host (ie node 1 of the cluster is the nfs client; the linux server is the nfs server).
    A few days later, early on a Sunday morning, the linux server developed a high load and was very slow to log into. Around the same time, node 1 of the cluster rebooted. Was this reboot of node 1 a coincidence? I'm not sure.
    Anyone got ideas/suggestions about this situation (eg the slow response of the nfs linux server caused node 1 of the cluster to reboot; the external nfs mount is a bad idea)?
    Stewart

    Hi,
    your assumption sounds very unreasonable. But without any hard facts like
    - the panic string
    - contents of /var/adm/messages at time of crash
    - configuration information
    - etc.
    it is impossible to tell.
    Regards
    Hartmut

  • SCVMM losing connection to cluster nodes

    Hey guys'n girls, I hope this is the right forum for this question. I already opened a ticket at MS support as well because it's impacting our production environment indirectly, but even after a week there's been no contact. Losing faith in MS support there.
    The problem we're having with SCVMM is that a host enters the 'needs attention' state, with a WinRM error 0x80338126. I guess it has something to do with the network or with Kerberos, and I've found some info on it, but I still haven't been able to solve it. Do you guys have any ideas?
    Problem summary:
    We are seeing an issue on our new hyper-v platform. The platform should have been in production last week, but this issue is delaying our project as we can't seem to get it stable.
    The problem we are experiencing is that SCVMM loses the connection to some of the Hyper-V nodes. Not one specific node. Last week it happened to two nodes, and today it happened to another node. I see issues with WinRM, and I suspect it has something to do with Kerberos. See the bottom of this post for background details and software versions.
    The host gets the status 'needs attention', and if you look at the status of the machine, WinRM gives an error. The error is:
    Error (2916)
    VMM is unable to complete the request. The connection to the agent cc1-hyp-10.domaincloud1.local was lost.
    WinRM: URL: [http://cc1-hyp-10.domaincloud1.local:5985], Verb: [ENUMERATE], Resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_Service], Filter: [select * from Win32_Service where Name="WinRM"]
    Unknown error (0x80338126)
    Recommended Action
    Ensure that the Windows Remote Management (WinRM) service and the VMM agent are installed and running and that a firewall is not blocking HTTP/HTTPS traffic. Ensure that VMM server is able to communicate with cc1-hyp-10.domaincloud1.local over WinRM by successfully running the following command:
     winrm id -r:cc1-hyp-10.domaincloud1.local
    This problem can also be caused by a Windows Management Instrumentation (WMI) service crash. If the server is running Windows Server 2008 R2, ensure that KB 982293 (http://support.microsoft.com/kb/982293) is installed on it.
    If the error persists, restart cc1-hyp-10.domaincloud1.local and then try the operation again.
    Refer to http://support.microsoft.com/kb/2742275 for more details.
    Doing a simple test from the VMM server to the problematic cluster node shows this error:
    PS C:\> hostname
    CC1-VMM-01
    PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
    WSManFault
        Message = WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this computer. By default, the WinRM firewall exception for public profiles limits access to remote computers within the same local subnet.
    Error number:  -2144108250 0x80338126
    WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this computer. By default, the WinRM firewall exception for public profiles limits access to remote computers within the same local subnet.
    I CAN connect from other hosts to this problematic cluster node:
    PS C:\> hostname
    CC1-HYP-16
    PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
    IdentifyResponse
        ProtocolVersion = http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
        ProductVendor = Microsoft Corporation
        ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
        SecurityProfiles
            SecurityProfileName = http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
    And I can connect from the vmm server to all other cluster nodes:
    PS C:\> hostname
    CC1-VMM-01
    PS C:\> winrm id -r:cc1-hyp-11.domaincloud1.local
    IdentifyResponse
        ProtocolVersion = http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
        ProductVendor = Microsoft Corporation
        ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
        SecurityProfiles
            SecurityProfileName = http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
    So at this point only the test from the cc1-vmm-01 to cc1-hyp-10 seems to be problematic.
    I followed the steps on the page https://support.microsoft.com/kb/2742275 (which is referred to above). I tried the VMMCA, but I can't really get it working the way I want, or it seems to give outdated recommendations.
    I tried checking for duplicate SPNs by running setspn -x on the affected machines. No results (although I do not understand what an SPN is or how it works). I rebuilt the performance counters.
    I tried setting 'sc config winrm type= own' as described in [http://blinditandnetworkadmin.blogspot.nl/2012/08/kb-how-to-troubleshoot-needs-attention.html].
    If I reboot this cc1-hyp-10 machine, it will start working perfectly again. However, then I can't troubleshoot the issue, and it will happen again.
    I want this problem to be solved, so vmm never loses connection to the hypervisors it's managing again!
    Background information:
    We've set up a platform with Hyper-V to run a VM workload. The platform consists of the following hardware:
    2 Dell R620's with 32GB of RAM, running hyper-v to virtualize the cloud management layer (DC's, VMM, SQL). These machines are called cc1-hyp-01 and cc1-hyp-02. They run the management vm's like cc1-dc-01/02, cc1-sql-01, cc1-vmm-01, etc. The names are self-explanatory.
    The VMM machine is NOT clustered.
    8 Dell M620 blades with 320GB of RAM, running hyper-v to virtualize the customer workload. The machines are called cc1-hyp-10 through cc1-hyp-17. They are in a cluster.
    2 Equallogic units form a SAN (premium storage), and we have a Dell R515 running iscsi target (budget storage).
    We have Dell Force10 switches and Cisco C3750X switches to connect everything together (mostly 10GB links).
    All hosts run Windows Server 2012 R2 Datacenter edition. The VMM server runs System Center Virtual Machine Manager 2012 R2.
    All the latest Windows updates are installed on every host. There are no firewalls between any host (vmm and hypervisors) at this level. Windows firewalls are all disabled. No antivirus software is installed, no symantec software is installed.
    The only non-standard software that is installed is the Dell Host Integration Tools 4.7.1, Dell Openmanage Server Administrator, and some small stuff like 7-zip, bginfo, net-snap, etc.
    The SCVMM service is running under the domain account DOMAINCLOUD1\scvmm. This account is in the local administrators group of each cluster node.
    On top of this cloud layer we're running the tenant layer with a lot of vm's for a specific customer (although they are all off now).

    I think I found the culprit. After an hour of analyzing Wireshark dumps I found that the VMM server had jumbo frames enabled on the management interface to the hosts, while the underlying infrastructure does not. After fixing that, my winrm commands started working again.
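
A mismatch like the jumbo frames one described above can often be confirmed without a packet capture by sending pings with the don't-fragment flag set (on Windows, `ping -f -l <size> <host>`) and seeing which payload sizes get through. The sketch below only does the arithmetic for choosing `<size>`: the largest ICMP payload that fits in one frame is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. The MTU values 1500 and 9000 are assumptions for illustration; the post does not say which sizes were configured.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def path_supports(sender_mtu, hop_mtus):
    """True if every hop can carry a full-size packet from the sender."""
    return all(hop >= sender_mtu for hop in hop_mtus)


# Assumed values: a standard 1500-byte MTU and a 9000-byte jumbo MTU.
print(max_ping_payload(1500))   # 1472
print(max_ping_payload(9000))   # 8972

# A jumbo-enabled sender behind a standard-MTU switch: large packets with
# the don't-fragment flag set cannot get through, small ones still can.
print(path_supports(9000, [9000, 1500, 9000]))  # False
```

That would also be consistent with the symptom in this thread, where small requests kept working while larger WinRM responses apparently did not.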

  • Services not starting after a node crash

    hi
    We have a 3-node cluster and one of the nodes crashed today. The services did not get relocated to another node, and when we try to manually stop/start/relocate a service we get the following error:
    srvctl stop service -d BCB -s BCB_J2EE -f
    PRCD-1085 : Failed to stop service BCB_J2EE
    PRCR-1065 : Failed to stop resource ora.BCB.BCB_j2ee.svc
    CRS-2533: Server 'bcb528' is down. Unable to perform the operation on 'ora.BCB.BCB_j2ee.svc'
    Has anyone seen this before?
    Thx
    JJ

    This is what I can find in the log:
    [   CRSPE][60] Server [bcb528] is unreachable. Stopping the sequencer for: bcbCRON 1 1
    2011-02-28 08:15:21.778: [   CRSPE][60] Sequencer for [bcbCRON 1 1] has completed with error: CRS-2533: Server 'bcb528' is down. Unable to perform the operation on 'bcbCRON'
    2011-02-28 08:15:21.778: [   CRSPE][60] Required instruction failed in op: START of [bcbCRON 1 1] on [bcb529] : 105247290
    2011-02-28 08:15:21.781: [UiServer][62] Container [ Name: ORDER
    MESSAGE:
    TextMessage[CRS-2533: Server 'bcb528' is down. Unable to perform the operation on 'bcbCRON']
    MSGTYPE:
    TextMessage[1]
    OBJID:
    TextMessage[bcbCRON 1 1]
    WAIT:
    TextMessage[0]

  • Hyper-V Failover Cluster Node Corruption

    Dear All,
    Some of my nodes are showing abnormal behavior. They are restarting every now and then. I had updated the cluster nodes, but all updates were OS specific; there was nothing related to hardware.
    I have analyzed the crash dumps and found that the following is causing the crash:
    page_fault_in_nonpaged_area
    Does anyone have any idea about this?
    Thanks in advance.

    Hi ,
    What is the OS of the cluster node ?
    Did you try removing the protection client for troubleshooting?
    If it is a 2008R2 cluster , please refer to this thread :
    http://social.technet.microsoft.com/Forums/en-US/32ab6a85-6002-4c3c-97ea-27cb1091e9b3/windows-cluster-server-is-getting-restarted?forum=winservergen
    Hope it helps
    Best Regards
    Elton Ji

  • Hyper-V Guest Cluster Node Failing Regularly

    Hi,
    We currently have a 4-node Server 2012 R2 cluster which hosts, among other things, a 3-node guest cluster running a single clustered file service.
    Around once a week, the guest cluster node that is currently hosting the clustered file service will fail. It's as if the VM is blue screening. That in itself is fairly annoying, and I'll be doing all the updates and checking the event log for clues as to the cause.
    The problem then is that whichever physical cluster node is hosting the VM when it fails will not unlock some of the VM's files. The virtual machine configuration lists as Online Pending. This means that the failed VM cannot be restarted on any other cluster node. The only fix is to drain the physical host it failed on and reboot it.
    Looking for suggestions on how to fix the following.
    1. Crashing guest file cluster node
    2. Failed VM with shared VHDX requiring physical host reboot.
    Event messages for the physical host that was hosting the failed VM, in the order that they occurred:
    Hyper-V-Worker: Event ID 18590 - 'FS-03' has encountered a fatal error.  The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x9E, ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, ErrorCode3: 0xA, ErrorCode4: 0x0.  If the problem persists, contact Product Support for the guest operating system.  (Virtual machine ID 36166B47-D003-4E51-AFB5-7B967A3EFD2D)
    FailoverClustering: Event ID 1069 - Cluster resource 'Virtual Machine FS-03' of type 'Virtual Machine' in clustered role 'FS-03' failed.
    Hyper-V-High-Availability: Event ID 21128 - 'Virtual Machine FS-03' failed to shutdown the virtual machine during the resource termination. The virtual machine will be forcefully stopped.
    Hyper-V-High-Availability: Event ID 21110 - 'Virtual Machine FS-03' failed to terminate.
    Hyper-V-VMMS: Event ID 20108 - The Virtual Machine Management Service failed to start the virtual machine '36166B47-D003-4E51-AFB5-7B967A3EFD2D': The group or resource is not in the correct state to perform the requested operation. (0x8007139F).
    Hyper-V-High-Availability: Event ID 21107 - 'Virtual Machine FS-03' failed to start.
    FailoverClustering: Event ID 1205 - The Cluster service failed to bring clustered role 'FS-03' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
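
The error codes in the Event ID 18590 message above are easier to look up once converted. The sketch below just extracts the five values and prints them in hex and decimal; the function name `extract_error_codes` is made up for illustration. As far as I know, ErrorCode0 is the guest's Windows bug check code and the remaining four are its parameters, and 0x9E is documented as USER_MODE_HEALTH_MONITOR, but that mapping is an assumption worth checking against the bug check reference.

```python
import re

# Text copied from the Event ID 18590 message above.
EVENT_TEXT = (
    "'FS-03' has encountered a fatal error.  The guest operating system reported "
    "that it failed with the following error codes: ErrorCode0: 0x9E, "
    "ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, ErrorCode3: 0xA, ErrorCode4: 0x0."
)


def extract_error_codes(text):
    """Map each ErrorCodeN index to its value as an integer."""
    pairs = re.findall(r"ErrorCode(\d): (0x[0-9A-Fa-f]+)", text)
    return {int(index): int(value, 16) for index, value in pairs}


codes = extract_error_codes(EVENT_TEXT)
for index in sorted(codes):
    # Print hex and decimal side by side, e.g. 0x3c is 60.
    print(f"ErrorCode{index}: {codes[index]:#x} ({codes[index]})")
```

If the assumed layout holds, ErrorCode2 (0x3C, decimal 60) would be a timeout in seconds, which might point at the guest cluster's health monitoring rather than at a driver fault.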

    Hi,
    I haven't found a similar issue. Does your cluster pass cluster validation? Are all your Hyper-V hosts compatible with Server 2012 R2? Have you tried disabling all your AV software and firewalls? Please rerun storage validation on the cluster during non-production hours; the cluster validation report should quickly locate the issue.
    More information:
    Cluster
    http://technet.microsoft.com/en-us/library/dd581778(v=ws.10).aspx
    Hope this helps.

  • ASM disk busy 99% only on one cluster node

    Hello,
    We have a three-node Oracle RAC cluster. Our DBAs called us and said they are getting OEM critical alerts for an ASM disk on one node only. I checked, and the SAN-attached drive does not show the same high utilization on either of the other two nodes. I checked the hardware and it seems fine. If the issue were with the SAN-attached disk, we would be seeing the same errors on all three nodes, since they share the same disks. The system crashed last week (alert dump in the +asm directories), and the disk has been busy ever since. I asked if the DBA had reviewed the ADDM reports; he said he had, and that there were no suspicious-looking entries that would lead us to the root cause. CPU utilization is fine. I am not sure where to look at this point, and any help pointing me in the right direction would be appreciated. They do use RMAN; could there be a backup running against those disks on only one node? Has anyone ever seen this before?
    Thank you,
    Benita Ulisano
    Unix/SAN Team
    Chicago Public Schools

    Hi Harish,
    Thank you for responding. To answer your question, yes, the disks are all of the same spec and are shared among the three cluster nodes. The ASM disk sdw1 is the one with the issue.
    Problem Node: coefsdb02
    three nodes in RAC cluster
    coefsdb01, coefsdb02, coefsdb03
    iostat results for all three nodes - same disk
    Node      Device rrqm/s wrqm/s  r/s  w/s rsec/s wsec/s avgrq-sz avgqu-sz await  svctm %util
    coefsdb01 sdw1     0.00   1.71 0.12 0.58   1.27  18.78    28.63     0.01 13.38   1.75  0.12
    coefsdb02 sdw1     0.11   0.02 4.00 0.62 305.84  21.72    70.93     2.96 12.58 211.95 97.88
    coefsdb03 sdw1     0.21   0.01 4.70 0.33 224.05  13.52    47.22     0.05 10.11   6.15  3.09
    The dba(s) run RMAN backups, but only on coefsdb01.
    Benita
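
If the same comparison has to be repeated while the problem is being chased, the three iostat rows can be checked with a few lines of script. This is a rough sketch using the numbers posted above; the 90% threshold and the function names are arbitrary choices for illustration, not anything iostat or OEM defines.

```python
# Column names from the iostat header posted above (everything after "Device:").
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rsec/s", "wsec/s",
           "avgrq-sz", "avgqu-sz", "await", "svctm", "%util"]

# The sdw1 row from each node, copied from the post.
ROWS = {
    "coefsdb01": "sdw1 0.00 1.71 0.12 0.58 1.27 18.78 28.63 0.01 13.38 1.75 0.12",
    "coefsdb02": "sdw1 0.11 0.02 4.00 0.62 305.84 21.72 70.93 2.96 12.58 211.95 97.88",
    "coefsdb03": "sdw1 0.21 0.01 4.70 0.33 224.05 13.52 47.22 0.05 10.11 6.15 3.09",
}


def parse_row(row):
    """Split one iostat row into (device, {column: value})."""
    device, *values = row.split()
    return device, dict(zip(COLUMNS, map(float, values)))


def busy_nodes(rows, threshold=90.0):
    """Nodes whose %util is at or above the (assumed) threshold."""
    return [node for node, row in rows.items()
            if parse_row(row)[1]["%util"] >= threshold]


print(busy_nodes(ROWS))  # ['coefsdb02']
for node, row in ROWS.items():
    stats = parse_row(row)[1]
    print(node, "r/s:", stats["r/s"], "svctm:", stats["svctm"])
```

The per-node printout makes one thing stand out: coefsdb02 and coefsdb03 issue a similar number of reads per second, yet the reported service time on coefsdb02 is far higher.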

  • Re-installing Hyper-V 2012 R2 cluster node

    We have four HP BL460 Gen8 servers acting as part of a Hyper-V cluster, running Windows Server 2012 R2 Datacenter.
    Storage is provided by two node 3PAR StoreServ 7400.
    All network and fc connections are managed by HP Virtual Connect.
    One of the four nodes crashed during an HP SPP upgrade, which resulted in a non-booting OS.
    I managed to get the OS alive by running multiple check disks and by manually restoring registry hives from backup via Windows 7 installation media's recovery console.
    After the recovery there were still some issues with filesystem. Corrupted, orphaned and missing files here and there.
    Now I want to re-install the OS from scratch to make sure everything will work correctly and to avoid any future errors.
    What I need to know is whether the best practice is to re-install the OS with a new computer name, or whether I should drop the current OS to a workgroup, re-install it and join the AD domain with the same computer name. I've already evicted the node from the Hyper-V cluster, but the server is still running as a member server in AD.
    Any other things I should take into consideration before doing the re-installation?
    Thanks in advance!

    I agree that after a major problem it is much safer to rebuild the system.  It sounds like you have the node rebuilt, so I would evict it from the cluster and then remove it from the domain. Rebuild it and you can use the same name because those two actions will clean up its 'footprints'.
    If the machine were not running, you would still evict the node from the cluster, but you would need to go into Active Directory to delete the computer account.  Then rebuild.
    . : | : . : | : . tim

  • Node crashes when enabling RDS for private interconnect.

    OS: oel6.3 - 2.6.39-300.17.2.el6uek.x86_64
    Grid and DB: 11.2.0.3.4
    This is a two node Standard Edition cluster.
    The node crashes upon restart of clusterware after following the instructions from note:751343.1 (RAC Support for RDS Over Infiniband) to enable RDS.
    The cluster is running fine using ipoib for the cluster_interconnect.
    1) As the ORACLE_HOME/GI_HOME owner, stop all resources (database, listener, ASM etc) that's running from the home. When stopping database, use NORMAL or IMMEDIATE option.
    2) As root, if relinking 11gR2 Grid Infrastructure (GI) home, unlock GI home: GI_HOME/crs/install/rootcrs.pl -unlock
    3) As the ORACLE_HOME/GI_HOME owner, go to ORACLE_HOME/GI_HOME and cd to rdbms/lib
    4) As the ORACLE_HOME/GI_HOME owner, issue "make -f ins_rdbms.mk ipc_rds ioracle"
    5) As root, if relinking 11gR2 Grid Infrastructure (GI) home, lock GI home: GI_HOME/crs/install/rootcrs.pl -patch
    Looks to abend when asm tries to start with the message below on the console.
    I have a service request open for this issue, but I am hoping someone may have seen this and has some way around it.
    Thanks
    Alan
    kernel BUG at net/rds/ib_send.c:547!
    invalid opcode: 0000 [#1] SMP
    CPU 2
    Modules linked in: 8021q garp stp llc iptable_filter ip_tables nfs lockd
    fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand powernow_k8
    freq_table mperf rds_rdma rds_tcp rds ib_ipoib rdma_ucm ib_ucm ib_uverbs
    ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa sr_mod cdrom microcode
    serio_raw pcspkr ghes hed k10temp hwmon amd64_edac_mod edac_core
    edac_mce_amd i2c_piix4 i2c_core sg igb dca mlx4_ib ib_mad ib_core
    mlx4_en mlx4_core ext4 mbcache jbd2 usb_storage sd_mod crc_t10dif ahci
    libahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
    scsi_wait_scan]
    Pid: 4140, comm: kworker/u:1 Not tainted 2.6.39-300.17.2.el6uek.x86_64
    #1 Supermicro BHDGT/BHDGT
    RIP: 0010:[<ffffffffa02db829>] [<ffffffffa02db829>]
    rds_ib_xmit+0xa69/0xaf0 [rds_rdma]
    RSP: 0018:ffff880fb84a3c50 EFLAGS: 00010202
    RAX: ffff880fbb694000 RBX: ffff880fb3e4e600 RCX: 0000000000000000
    RDX: 0000000000000030 RSI: ffff880fbb6c3a00 RDI: ffff880fb058a048
    RBP: ffff880fb84a3d30 R08: 0000000000000fd0 R09: ffff880fbb6c3b90
    R10: 0000000000000000 R11: 000000000000001a R12: ffff880fbb6c3a00
    R13: ffff880fbb6c3a00 R14: 0000000000000000 R15: ffff880fb84a3d90
    FS: 00007fd0a3a56700(0000) GS:ffff88101e240000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000002158ca2 CR3: 0000000001783000 CR4: 00000000000406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kworker/u:1 (pid: 4140, threadinfo ffff880fb84a2000, task
    ffff880fae970180)
    Stack:
    0000000000012200 0000000000012200 ffff880f00000000 0000000000000000
    000000000000e5b0 ffffffff8115af81 ffffffff81b8d6c0 ffffffffa02b2e12
    00000001bf272240 ffffffff81267020 ffff880fbb6c3a00 0000003000000002
    Call Trace:
    [<ffffffff8115af81>] ? __kmalloc+0x1f1/0x200
    [<ffffffffa02b2e12>] ? rds_message_alloc+0x22/0x90 [rds]
    [<ffffffff81267020>] ? sg_init_table+0x30/0x50
    [<ffffffffa02b2db2>] ? rds_message_alloc_sgs+0x62/0xa0 [rds]
    [<ffffffffa02b31e4>] ? rds_message_map_pages+0xa4/0x110 [rds]
    [<ffffffffa02b4f3b>] rds_send_xmit+0x38b/0x6e0 [rds]
    [<ffffffff81089d53>] ? cwq_activate_first_delayed+0x53/0x100
    [<ffffffffa02b6040>] ? rds_recv_worker+0xc0/0xc0 [rds]
    [<ffffffffa02b6075>] rds_send_worker+0x35/0xc0 [rds]
    [<ffffffff81089fd6>] process_one_work+0x136/0x450
    [<ffffffff8108bbe0>] worker_thread+0x170/0x3c0
    [<ffffffff8108ba70>] ? manage_workers+0x120/0x120
    [<ffffffff810907e6>] kthread+0x96/0xa0
    [<ffffffff81515544>] kernel_thread_helper+0x4/0x10
    [<ffffffff81090750>] ? kthread_worker_fn+0x1a0/0x1a0
    [<ffffffff81515540>] ? gs_change+0x13/0x13
    Code: ff ff e9 b1 fe ff ff 48 8b 0d b4 54 4b e1 48 89 8d 70 ff ff ff e9
    71 ff ff ff 83 bd 7c ff ff ff 00 0f 84 f4 f5 ff ff 0f 0b eb fe <0f> 0b
    eb fe 44 8b 8d 48 ff ff ff 41 b7 01 e9 51 f6 ff ff 0f 0b
    RIP [<ffffffffa02db829>] rds_ib_xmit+0xa69/0xaf0 [rds_rdma]
    RSP <ffff880fb84a3c50>
    Initializing cgroup subsys cpuset
    Initializing cgroup subsys cpu
    Linux version 2.6.39-300.17.2.el6uek.x86_64
    ([email protected]) (gcc version 4.4.6 20110731 (Red
    Hat 4.4.6-3) (GCC) ) #1 SMP Wed Nov 7 17:48:36 PST 2012
    Command line: ro root=UUID=5ad1a268-b813-40da-bb76-d04895215677
    rd_DM_UUID=ddf1_stor rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD
    SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us numa=off
    console=ttyS1,115200n8 irqpoll maxcpus=1 nr_cpus=1 reset_devices
    cgroup_disable=memory mce=off memmap=exactmap memmap=538K@64K
    memmap=130508K@770048K elfcorehdr=900556K memmap=72K#3668608K
    memmap=184K#3668680K
    BIOS-provided physical RAM map:
    BIOS-e820: 0000000000000100 - 0000000000096800 (usable)
    BIOS-e820: 0000000000096800 - 00000000000a0000 (reserved)
    BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
    BIOS-e820: 0000000000100000 - 00000000dfe90000 (usable)
    BIOS-e820: 00000000dfe9e000 - 00000000dfea0000 (reserved)
    BIOS-e820: 00000000dfea0000 - 00000000dfeb2000 (ACPI data)
    BIOS-e820: 00000000dfeb2000 - 00000000dfee0000 (ACPI NVS)
    BIOS-e820: 00000000dfee0000 - 00000000f0000000 (reserved)
    BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)

    I believe the OFED version is 1.5.3.3, but I am not sure if this is correct.
    We have not added any third-party drivers. All that has been done to add InfiniBand to our build is a yum groupinstall of Infiniband support.
    I have not tried rds-stress, but rds-ping works fine and rds-info seems fine.
    A service request has been opened but so far I have had better response here.
    oracle@blade1-6:~> rds-info
    RDS IB Connections:
    LocalAddr RemoteAddr LocalDev RemoteDev
    10.10.0.116 10.10.0.119 fe80::25:90ff:ff07:df1d fe80::25:90ff:ff07:e0e5
    TCP Connections:
    LocalAddr LPort RemoteAddr RPort HdrRemain DataRemain SentNxt ExpectUna SeenUna
    Counters:
    CounterName Value
    conn_reset 5
    recv_drop_bad_checksum 0
    recv_drop_old_seq 0
    recv_drop_no_sock 1
    recv_drop_dead_sock 0
    recv_deliver_raced 0
    recv_delivered 18
    recv_queued 18
    recv_immediate_retry 0
    recv_delayed_retry 0
    recv_ack_required 4
    recv_rdma_bytes 0
    recv_ping 14
    send_queue_empty 18
    send_queue_full 0
    send_lock_contention 0
    send_lock_queue_raced 0
    send_immediate_retry 0
    send_delayed_retry 0
    send_drop_acked 0
    send_ack_required 3
    send_queued 32
    send_rdma 0
    send_rdma_bytes 0
    send_pong 14
    page_remainder_hit 0
    page_remainder_miss 0
    copy_to_user 0
    copy_from_user 0
    cong_update_queued 0
    cong_update_received 1
    cong_send_error 0
    cong_send_blocked 0
    ib_connect_raced 4
    ib_listen_closed_stale 0
    ib_tx_cq_call 6
    ib_tx_cq_event 6
    ib_tx_ring_full 0
    ib_tx_throttle 0
    ib_tx_sg_mapping_failure 0
    ib_tx_stalled 16
    ib_tx_credit_updates 0
    ib_rx_cq_call 33
    ib_rx_cq_event 38
    ib_rx_ring_empty 0
    ib_rx_refill_from_cq 0
    ib_rx_refill_from_thread 0
    ib_rx_alloc_limit 0
    ib_rx_credit_updates 0
    ib_ack_sent 4
    ib_ack_send_failure 0
    ib_ack_send_delayed 0
    ib_ack_send_piggybacked 0
    ib_ack_received 3
    ib_rdma_mr_alloc 0
    ib_rdma_mr_free 0
    ib_rdma_mr_used 0
    ib_rdma_mr_pool_flush 8
    ib_rdma_mr_pool_wait 0
    ib_rdma_mr_pool_depleted 0
    ib_atomic_cswp 0
    ib_atomic_fadd 0
    iw_connect_raced 0
    iw_listen_closed_stale 0
    iw_tx_cq_call 0
    iw_tx_cq_event 0
    iw_tx_ring_full 0
    iw_tx_throttle 0
    iw_tx_sg_mapping_failure 0
    iw_tx_stalled 0
    iw_tx_credit_updates 0
    iw_rx_cq_call 0
    iw_rx_cq_event 0
    iw_rx_ring_empty 0
    iw_rx_refill_from_cq 0
    iw_rx_refill_from_thread 0
    iw_rx_alloc_limit 0
    iw_rx_credit_updates 0
    iw_ack_sent 0
    iw_ack_send_failure 0
    iw_ack_send_delayed 0
    iw_ack_send_piggybacked 0
    iw_ack_received 0
    iw_rdma_mr_alloc 0
    iw_rdma_mr_free 0
    iw_rdma_mr_used 0
    iw_rdma_mr_pool_flush 0
    iw_rdma_mr_pool_wait 0
    iw_rdma_mr_pool_depleted 0
    tcp_data_ready_calls 0
    tcp_write_space_calls 0
    tcp_sndbuf_full 0
    tcp_connect_raced 0
    tcp_listen_closed_stale 0
    RDS Sockets:
    BoundAddr BPort ConnAddr CPort SndBuf RcvBuf Inode
    0.0.0.0 0 0.0.0.0 0 131072 131072 340441
    RDS Connections:
    LocalAddr RemoteAddr NextTX NextRX Flg
    10.10.0.116 10.10.0.119 33 38 --C
    Receive Message Queue:
    LocalAddr LPort RemoteAddr RPort Seq Bytes
    Send Message Queue:
    LocalAddr LPort RemoteAddr RPort Seq Bytes
    Retransmit Message Queue:
    LocalAddr LPort RemoteAddr RPort Seq Bytes
    10.10.0.116 0 10.10.0.119 40549 32 0
    oracle@blade1-6:~> cat /etc/rdma/rdma.conf
    # Load IPoIB
    IPOIB_LOAD=yes
    # Load SRP module
    SRP_LOAD=no
    # Load iSER module
    ISER_LOAD=no
    # Load RDS network protocol
    RDS_LOAD=yes
    # Should we modify the system mtrr registers? We may need to do this if you
    # get messages from the ib_ipath driver saying that it couldn't enable
    # write combining for the PIO buffs on the card.
    # Note: recent kernels should do this for us, but in case they don't, we'll
    # leave this option
    FIXUP_MTRR_REGS=no
    # Should we enable the NFSoRDMA service?
    NFSoRDMA_LOAD=yes
    NFSoRDMA_PORT=2050
    oracle@blade1-6:~> /etc/init.d/rdma status
    Low level hardware support loaded:
         mlx4_ib
    Upper layer protocol modules:
         rds_rdma ib_ipoib
    User space access modules:
         rdma_ucm ib_ucm ib_uverbs ib_umad
    Connection management modules:
         rdma_cm ib_cm iw_cm
    Configured IPoIB interfaces: none
    Currently active IPoIB interfaces: ib0
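
    A long counter dump like the one above is easier to scan when only the non-zero counters that hint at trouble are pulled out. The following is a minimal sketch, assuming the counters are available as plain "name value" lines; the list of suspect name fragments is an assumption made for illustration, not an official classification of RDS counters.

```python
import re

# Name fragments treated as "worth a look"; this list is an assumption.
SUSPECT_MARKERS = ("drop", "stalled", "failure", "full", "depleted", "error")


def parse_counters(text):
    """Parse 'name value' lines into a dict, ignoring any other lines."""
    counters = {}
    for line in text.splitlines():
        match = re.fullmatch(r"\s*([a-z0-9_]+)\s+(\d+)\s*", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def suspicious(counters):
    """Return non-zero counters whose names contain a suspect marker."""
    return {
        name: value
        for name, value in counters.items()
        if value > 0 and any(marker in name for marker in SUSPECT_MARKERS)
    }


# A few lines copied from the dump above.
sample = """
    recv_drop_no_sock 1
    recv_drop_dead_sock 0
    send_queue_full 0
    ib_tx_stalled 16
    ib_ack_send_failure 0
    ib_rdma_mr_pool_flush 8
"""

flagged = suspicious(parse_counters(sample))
for name, value in sorted(flagged.items()):
    print(f"{name} = {value}")
```

    Section headers and address rows in the dump do not match the pattern, so the whole output can be passed in unchanged. A non-zero value here is only a pointer for further investigation, not proof of a fault.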

  • Unable to failover the services in active-active cluster node

    Hi,
    I am applying the SP2 patch for SQL Server 2008 R2 in an active-active cluster. We have 3 services in the cluster: node 1 is the preferred owner of 2 of them and node 2 is the preferred owner of 1. When I try to move the service from node 2 to node 1, I get the errors below:
    DCOM was unable to communicate with the computer XXXXXXXXX using any of the configured protocols.
    The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server XXXXXXXXX. The target name used was RPCSS/XXXXXX. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal
    name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using
    a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server
    name is not fully qualified, and the target domain (XXXXXX) is different from the client domain (XXXXXXX), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
    The Cluster service failed to bring clustered service or application 'CHCROCHC045' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
    Cluster resource 'SQL Server (CHCROCHC045)' in clustered service or application 'CHCROCHC045' failed.
    Any input to resolve this issue is appreciated, as I cannot proceed with patching.
    BR
    PGR
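
    The Kerberos error text above says the target SPN must be registered on one account only. As a rough illustration of that check, the sketch below looks for SPNs that appear under more than one account. The account names, host names and the input list are all hypothetical; on a real domain the registrations would come from the directory (the Windows `setspn -X` command searches for duplicate SPNs). Comparing names case-insensitively is also an assumption of this sketch.

```python
from collections import defaultdict


def duplicate_spns(registrations):
    """Return {spn: [accounts]} for every SPN registered on 2+ accounts.

    `registrations` is an iterable of (spn, account) pairs. Names are
    lower-cased before comparison (an assumption of this sketch).
    """
    owners = defaultdict(set)
    for spn, account in registrations:
        owners[spn.lower()].add(account.lower())
    return {
        spn: sorted(accounts)
        for spn, accounts in owners.items()
        if len(accounts) > 1
    }


# Hypothetical registrations, for illustration only.
registrations = [
    ("MSSQLSvc/sqlnode.example.com:1433", "EXAMPLE\\svc_sql"),
    ("MSSQLSvc/sqlnode.example.com:1433", "EXAMPLE\\old_svc_sql"),
    ("HOST/sqlnode.example.com", "EXAMPLE\\SQLNODE$"),
]

duplicates = duplicate_spns(registrations)
for spn, accounts in duplicates.items():
    print(spn, "is registered on:", ", ".join(accounts))
```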

    Hi PGR,
    As the issue is more related to Windows Server, I would recommend posting it in the
    Windows Server forums for better support.
    In addition, below are some articles about troubleshooting the error "DCOM was unable to communicate with the computer XXXXXXXXX using any of the configured protocols" for your reference.
    Event ID 10009 — COM Remote Service Availability
    How to troubleshoot DCOM 10009 error logged in system event?
    Thanks,
    Lydia Zhang
    TechNet Community Support

  • Error while getting cluster node subtree

    Hi,
      We are on SP15.
    The console logs show the following error
    log generation timestamp : 2006_01_17_at_17_14_05
    java.rmi.RemoteException: Error while getting cluster node subtree of :name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster=""; nested exception is:
         com.sap.engine.services.jmx.exception.MBeanServerClusterException: Exception during invocation of remote MBeanServer method, target node: 2053400
         at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImpl.getClusterNodeSubTree(ConvenienceEngineAdministratorImpl.java:242)
         at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImplp4_Skel.dispatch(ConvenienceEngineAdministratorImplp4_Skel.java:99)
         at com.sap.engine.services.rmi_p4.DispatchImpl._runInternal(DispatchImpl.java:304)
         at com.sap.engine.services.rmi_p4.DispatchImpl._run(DispatchImpl.java:193)
         at com.sap.engine.services.rmi_p4.server.P4SessionProcessor.request(P4SessionProcessor.java:122)
         at com.sap.engine.core.service630.context.cluster.session.ApplicationSessionMessageListener.process(ApplicationSessionMessageListener.java:33)
         at com.sap.engine.core.cluster.impl6.session.MessageRunner.run(MessageRunner.java:41)
         at com.sap.engine.core.thread.impl3.ActionObject.run(ActionObject.java:37)
         at java.security.AccessController.doPrivileged(Native Method)
         at com.sap.engine.core.thread.impl3.SingleThread.execute(SingleThread.java:100)
         at com.sap.engine.core.thread.impl3.SingleThread.run(SingleThread.java:170)
    Caused by: com.sap.engine.services.jmx.exception.MBeanServerClusterException: Exception during invocation of remote MBeanServer method, target node: 2053400
         at com.sap.engine.services.jmx.ClusterInterceptor.invoke(ClusterInterceptor.java:816)
         at com.sap.pj.jmx.server.interceptor.MBeanServerInterceptorChain.invoke(MBeanServerInterceptorChain.java:330)
         at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImpl.getClusterNodeSubTree(ConvenienceEngineAdministratorImpl.java:239)
         ... 10 more
    Caused by: com.sap.engine.services.jmx.exception.JmxConnectorException: Unable to de-serialize request parameters, message [ JMX request (java) v1.0 len: 345 |  src: cluster target-node: 2053400 req: invoke params-number: 4 params-bytes: 0 | :name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster="" null null null ]
         at com.sap.engine.services.jmx.MBeanServerConnectionImpl.invokeMbsInternal(MBeanServerConnectionImpl.java:680)
         at com.sap.engine.services.jmx.MBeanServerConnectionImpl.invoke(MBeanServerConnectionImpl.java:467)
         at com.sap.engine.services.jmx.MBeanServerConnectionSecurityWrapper.invoke(MBeanServerConnectionSecurityWrapper.java:221)
         at com.sap.engine.services.jmx.ClusterInterceptor.invoke(ClusterInterceptor.java:813)
         ... 12 more
    Caused by: javax.management.InstanceNotFoundException: MBean with name com.sap.default:name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster=XD1 not found in repository
         at com.sap.pj.jmx.server.MBeanServerImpl.getClassLoaderFor(MBeanServerImpl.java:1408)
         at com.sap.pj.jmx.server.interceptor.MBeanServerWrapperInterceptor.getClassLoaderFor(MBeanServerWrapperInterceptor.java:455)
         at com.sap.engine.services.jmx.CompletionInterceptor.getClassLoaderFor(CompletionInterceptor.java:567)
         at com.sap.pj.jmx.server.interceptor.BasicMBeanServerInterceptor.getClassLoaderFor(BasicMBeanServerInterceptor.java:438)
         at com.sap.jmx.provider.ProviderInterceptor.getClassLoaderFor(ProviderInterceptor.java:330)
         at com.sap.engine.services.jmx.RedirectInterceptor.getClassLoaderFor(RedirectInterceptor.java:501)
         at com.sap.pj.jmx.server.interceptor.MBeanServerInterceptorChain.getClassLoaderFor(MBeanServerInterceptorChain.java:443)
         at com.sap.engine.services.jmx.RequestMessage.readParams(RequestMessage.java:523)
         at com.sap.engine.services.jmx.RequestMessage.getParams(RequestMessage.java:578)
         at com.sap.engine.services.jmx.MBeanServerInvoker.invokeMbs(MBeanServerInvoker.java:106)
         at com.sap.engine.services.jmx.JmxServiceConnectorServer.receiveWait(JmxServiceConnectorServer.java:173)
         at com.sap.engine.core.service630.context.cluster.message.MessageListenerWrapper.process(MessageListenerWrapper.java:81)
         at com.sap.engine.core.cluster.impl6.ms.MSListenerThread.run(MSListenerThread.java:47)
         at com.sap.engine.frame.core.thread.Task.run(Task.java:64)
         at com.sap.engine.core.thread.impl6.SingleThread.execute(SingleThread.java:78)
         at com.sap.engine.core.thread.impl6.SingleThread.run(SingleThread.java:148)
    java.lang.NullPointerException
         at com.sap.engine.services.adminadapter.gui.ClusterView.addGlobalDispatcherServiceProperties(ClusterView.java:455)
         at com.sap.engine.services.adminadapter.gui.ClusterView.createGlobalTrees(ClusterView.java:508)
         at com.sap.engine.services.adminadapter.gui.ClusterView.access$1200(ClusterView.java:29)
         at com.sap.engine.services.adminadapter.gui.ClusterView$4.run(ClusterView.java:420)
    java.rmi.RemoteException: Error while getting cluster node subtree of :name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster=""; nested exception is:
         com.sap.engine.services.jmx.exception.MBeanServerClusterException: Exception during invocation of remote MBeanServer method, target node: 2053400
         at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImpl.getClusterNodeSubTree(ConvenienceEngineAdministratorImpl.java:242)
         at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImplp4_Skel.dispatch(ConvenienceEngineAdministratorImplp4_Skel.java:99)
         at com.sap.engine.services.rmi_p4.DispatchImpl._runInternal(DispatchImpl.java:304)
         at com.sap.engine.services.rmi_p4.DispatchImpl._run(DispatchImpl.java:193)
         at com.sap.engine.services.rmi_p4.server.P4SessionProcessor.request(P4SessionProcessor.java:122)
         at com.sap.engine.core.service630.context.cluster.session.ApplicationSessionMessageListener.process(ApplicationSessionMessageListener.java:33)
         at com.sap.engine.core.cluster.impl6.session.MessageRunner.run(MessageRunner.java:41)
         at com.sap.engine.core.thread.impl3.ActionObject.run(ActionObject.java:37)
         at java.security.AccessController.doPrivileged(Native Method)
         at com.sap.engine.core.thread.impl3.SingleThread.execute(SingleThread.java:100)
         at com.sap.engine.core.thread.impl3.SingleThread.run(SingleThread.java:170)
    Caused by: com.sap.engine.services.jmx.exception.MBeanServerClusterException: Exception during invocation of remote MBeanServer method, target node: 2053400
         at com.sap.engine.services.jmx.ClusterInterceptor.invoke(ClusterInterceptor.java:816)
         at com.sap.pj.jmx.server.interceptor.MBeanServerInterceptorChain.invoke(MBeanServerInterceptorChain.java:330)
         at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImpl.getClusterNodeSubTree(ConvenienceEngineAdministratorImpl.java:239)
         ... 10 more
    Caused by: com.sap.engine.services.jmx.exception.JmxConnectorException: Unable to de-serialize request parameters, message [ JMX request (java) v1.0 len: 345 |  src: cluster target-node: 2053400 req: invoke params-number: 4 params-bytes: 0 | :name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster="" null null null ]
         at com.sap.engine.services.jmx.MBeanServerConnectionImpl.invokeMbsInternal(MBeanServerConnectionImpl.java:680)
         at com.sap.engine.services.jmx.MBeanServerConnectionImpl.invoke(MBeanServerConnectionImpl.java:467)
         at com.sap.engine.services.jmx.MBeanServerConnectionSecurityWrapper.invoke(MBeanServerConnectionSecurityWrapper.java:221)
         at com.sap.engine.services.jmx.ClusterInterceptor.invoke(ClusterInterceptor.java:813)
         ... 12 more
    Caused by: javax.management.InstanceNotFoundException: MBean with name com.sap.default:name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster=XD1 not found in repository
         at com.sap.pj.jmx.server.MBeanServerImpl.getClassLoaderFor(MBeanServerImpl.java:1408)
         at com.sap.pj.jmx.server.interceptor.MBeanServerWrapperInterceptor.getClassLoaderFor(MBeanServerWrapperInterceptor.java:455)
         at com.sap.engine.services.jmx.CompletionInterceptor.getClassLoaderFor(CompletionInterceptor.java:567)
         at com.sap.pj.jmx.server.interceptor.BasicMBeanServerInterceptor.getClassLoaderFor(BasicMBeanServerInterceptor.java:438)
         at com.sap.jmx.provider.ProviderInterceptor.getClassLoaderFor(ProviderInterceptor.java:330)
         at com.sap.engine.services.jmx.RedirectInterceptor.getClassLoaderFor(RedirectInterceptor.java:501)
         at com.sap.pj.jmx.server.interceptor.MBeanServerInterceptorChain.getClassLoaderFor(MBeanServerInterceptorChain.java:443)
         at com.sap.engine.services.jmx.RequestMessage.readParams(RequestMessage.java:523)
         at com.sap.engine.services.jmx.RequestMessage.getParams(RequestMessage.java:578)
         at com.sap.engine.services.jmx.MBeanServerInvoker.invokeMbs(MBeanServerInvoker.java:106)
         at com.sap.engine.services.jmx.JmxServiceConnectorServer.receiveWait(JmxServiceConnectorServer.java:173)
         at com.sap.engine.core.service630.context.cluster.message.MessageListenerWrapper.process(MessageListenerWrapper.java:81)
         at com.sap.engine.core.cluster.impl6.ms.MSListenerThread.run(MSListenerThread.java:47)
         at com.sap.engine.frame.core.thread.Task.run(Task.java:64)
         at com.sap.engine.core.thread.impl6.SingleThread.execute(SingleThread.java:78)
         at com.sap.engine.core.thread.impl6.SingleThread.run(SingleThread.java:148)
    java.lang.NullPointerException
         at com.sap.engine.services.adminadapter.gui.ClusterView.addGlobalDispatcherServiceProperties(ClusterView.java:455)
         at com.sap.engine.services.adminadapter.gui.ClusterView.createGlobalTrees(ClusterView.java:508)
         at com.sap.engine.services.adminadapter.gui.ClusterView.access$1200(ClusterView.java:29)
         at com.sap.engine.services.adminadapter.gui.ClusterView$4.run(ClusterView.java:420)
    Any clue what this is?
    rgds

    Got the same error
    + /usr/java14_64/bin/java -showversion -Duser.language=en -DP4ClassLoad=P4Connection -Dp4Cache=clean -jar go.jar
    java version "1.4.2"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2)
    Classic VM (build 1.4.2, J2RE 1.4.2 IBM AIX 5L for PowerPC (64 bit JVM) build caix64142ifx-20061222 (ifix 113727: SR7 + 112603) (JIT enabled: jitc))
    java.lang.NullPointerException
            at com.sap.engine.services.adminadapter.gui.ClusterView$4.run(ClusterView.java:405)
    Need some help!
    Bernard

  • INS-40925 - One or more nodes have interfaces not configured with a subnet that is common across all cluster nodes.

    Hi All,
    I am facing the below error while installing Oracle RAC in Silent Mode.
    SEVERE: There are no common subnets represented by network interfaces across all cluster nodes.
    SEVERE: [FATAL] [INS-40925] One or more nodes have interfaces not configured with a subnet that is common across all cluster nodes.
       CAUSE: Not all nodes have network interfaces that are configured on subnets that are common to all nodes in the cluster.
       ACTION: Ensure all cluster nodes have a public interface defined with the same subnet accessible by all nodes in the cluster.
    My /etc/hosts is given below.
    127.0.0.1        localhost    localhost.localdomain
    #Public
    192.168.1.101      rac1        rac1.localdomain
    192.168.1.102    rac2        rac2.localdomain
    #Private
    192.168.2.101    rac1-priv    rac1-priv.localdomain
    192.168.2.102    rac2-priv    rac2-priv.localdomain
    #Virtual
    192.168.1.103      rac1-vip    rac1-vip.localdomain
    192.168.1.104    rac2-vip    rac2-vip.localdomain
    #SCAN
    192.168.1.105    rac-scan    rac-scan.localdomain
    Could you please help me get rid of the INS-40925 error? Any ideas?

    Hi Ramesh,
    Please find the result of ifconfig -a from both nodes RAC1 & RAC2.
    ifconfig -a in RAC1
    [oracle@rac1 Desktop]$ ifconfig -a
    eth0      Link encap:Ethernet  HWaddr 08:00:27:17:7A:D5
              inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fe17:7ad5/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:102 errors:0 dropped:0 overruns:0 frame:0
              TX packets:48 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:25472 (24.8 KiB)  TX bytes:3322 (3.2 KiB)
              Interrupt:19 Base address:0xd020
    eth1      Link encap:Ethernet  HWaddr 08:00:27:C0:AC:DB
              inet addr:192.168.2.101  Bcast:192.168.2.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fec0:acdb/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:4 errors:0 dropped:0 overruns:0 frame:0
              TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:240 (240.0 b)  TX bytes:816 (816.0 b)
              Interrupt:16 Base address:0xd240
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:56 errors:0 dropped:0 overruns:0 frame:0
              TX packets:56 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:6394 (6.2 KiB)  TX bytes:6394 (6.2 KiB)
    virbr0    Link encap:Ethernet  HWaddr 52:54:00:CC:BD:FB
              inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
              UP BROADCAST MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
    virbr0-nic Link encap:Ethernet  HWaddr 52:54:00:CC:BD:FB
              BROADCAST MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:500
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
    ifconfig -a in RAC2
    [oracle@rac2 Desktop]$ ifconfig -a
    eth0      Link encap:Ethernet  HWaddr 08:00:27:C9:38:82
              inet addr:192.168.1.102  Bcast:192.168.1.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fec9:3882/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:122 errors:0 dropped:0 overruns:0 frame:0
              TX packets:59 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:32617 (31.8 KiB)  TX bytes:5157 (5.0 KiB)
              Interrupt:19 Base address:0xd020
    eth1      Link encap:Ethernet  HWaddr 08:00:27:90:B5:A0
              inet addr:192.168.2.102  Bcast:192.168.2.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fe90:b5a0/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:4 errors:0 dropped:0 overruns:0 frame:0
              TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:240 (240.0 b)  TX bytes:746 (746.0 b)
              Interrupt:16 Base address:0xd240
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:56 errors:0 dropped:0 overruns:0 frame:0
              TX packets:56 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:6390 (6.2 KiB)  TX bytes:6390 (6.2 KiB)
    virbr0    Link encap:Ethernet  HWaddr 52:54:00:CC:BD:FB
              inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
              UP BROADCAST MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
    virbr0-nic Link encap:Ethernet  HWaddr 52:54:00:CC:BD:FB
              BROADCAST MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:500
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
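
    The INS-40925 message is about subnets that are common to all nodes, so one way to read the two ifconfig listings above is to compute the subnet of each interface and intersect the sets per node. The sketch below does that with the addresses and netmasks copied from the output above, using Python's standard `ipaddress` module. It only illustrates the comparison and is not how the Oracle installer performs its check. It also lists addresses that appear on more than one node, since identical addresses on different nodes may be worth a closer look.

```python
import ipaddress

# (interface, address, netmask) per node, copied from the ifconfig output above.
NODES = {
    "rac1": [
        ("eth0", "192.168.1.101", "255.255.255.0"),
        ("eth1", "192.168.2.101", "255.255.255.0"),
        ("virbr0", "192.168.122.1", "255.255.255.0"),
    ],
    "rac2": [
        ("eth0", "192.168.1.102", "255.255.255.0"),
        ("eth1", "192.168.2.102", "255.255.255.0"),
        ("virbr0", "192.168.122.1", "255.255.255.0"),
    ],
}


def subnets(interfaces):
    """Return the set of networks the given interfaces belong to."""
    return {
        ipaddress.ip_interface(f"{addr}/{mask}").network
        for _, addr, mask in interfaces
    }


def common_subnets(nodes):
    """Return the networks that every node has an interface on."""
    return set.intersection(*(subnets(ifaces) for ifaces in nodes.values()))


def duplicate_addresses(nodes):
    """Return {address: [(node, interface), ...]} for addresses seen twice or more."""
    seen = {}
    for node, ifaces in nodes.items():
        for name, addr, _ in ifaces:
            seen.setdefault(addr, []).append((node, name))
    return {addr: owners for addr, owners in seen.items() if len(owners) > 1}


for network in sorted(common_subnets(NODES)):
    print("common subnet:", network)
for addr, owners in duplicate_addresses(NODES).items():
    print("address used on several nodes:", addr, owners)
```

    With the values above, the public and private subnets do show up on both nodes, and the virbr0 address is identical on rac1 and rac2. Whether that shared virbr0 address is related to the installer error is not established by this thread.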

  • CAU: One or more errors occurred while checking the status of Windows Firewall on the cluster nodes

    Cluster with 2 hosts 2012 R2
    Scheduled CAU fails with:
    CAU run {4EFE116C-AB49-456D-8EED-F7EDC764DA49} on cluster Cluster1 failed. Error Message:One or more errors occurred while checking the status of Windows Firewall on the cluster nodes. Review the errors for more information on how to resolve the problems.
    Error Code:-2146233088 Stack:   at MS.Internal.ClusterAwareUpdating.Util.<CheckFirewallsAsync>d__3a.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.ClusterAwareUpdating.Commands.InvokeCauRunCommand.<_ProcessCluster>d__78.MoveNext()
    If I run CAU "Analyze Readiness", everything comes back as PASS.
    If I run CAU by hand on the same hosts with NO change to the system (not even a reboot), it finishes OK.
    Does anybody have any ideas?
    Thanks
    Seb

    Hi,
    In some cases, if the inbound Windows Firewall rule for the
    "Cluster-Aware Updating" service has been disabled, CAU cannot be used.
    More information:
    Starting with Cluster-Aware Updating: Self-Updating
    http://blogs.technet.com/b/filecab/archive/2012/05/17/starting-with-cluster-aware-updating-self-updating.aspx
    What is Cluster Aware Updating in Windows Server 2012? (Part 1)
    http://blogs.technet.com/b/mspfe/archive/2013/02/06/what-is-cluster-aware-updating-in-windows-server-2012.aspx
    Cluster-Aware Updating Overview
    http://technet.microsoft.com/en-us/library/hh831694.aspx
    Hope this helps.

  • What are the preferred methods for backing up a cluster node bootdisk?

    Hi,
    I would like to use flarcreate to backup the bootdisks for each of the nodes in my cluster... but I cannot see this method mentioned in any cluster documentation...
    Has anybody used flash backups for cluster nodes before (and more importantly - successfully restored a cluster node from a flash image..?)
    Thanks very much,
    Trevor

    Hi, some background on this: I need to patch some production cluster nodes, and obviously would like to back up the rootdisk of each node before doing this.
    What I really need is some advice about the best method to back up and patch my cluster nodes (with a recovery method as well).
    The Sun documentation for this says to use ufsdump, which I have used in the past, but will FLAR do the same job? Has anyone had experience using FLAR to restore a cluster node?
    Or does someone have other solutions for patching the nodes? Maybe offline my root mirror (SVM), patch the root disk and, barring any major problems, online the mirror again?
    Cheers, Trevor

  • File Being processed in two cluster nodes

    Hi ,
    We have two cluster nodes, and when my adapter picks up a file, the file gets processed on both cluster nodes.
    I believe the file should be processed on one of the cluster nodes, not on both.
    Has anyone faced this kind of situation in a project with multiple cluster nodes?
    Thanks,
    Chandra.

    Hi Chandra
      Did you get a chance to see this post? It may help:
        Processing in  Multiple Cluster Nodes
    Regards,
    Sandeep
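
    As a generic illustration of the expectation in the question (one file, one node), the sketch below shows a common pattern in which each node tries to claim a file by renaming it, and only the node whose rename succeeds goes on to process it. This is not the adapter's actual mechanism or configuration; the function name, the `.processing` suffix and the node names are hypothetical. It also assumes both nodes see the same directory and that rename is atomic there, which holds for a local POSIX filesystem but should be verified for a shared or network filesystem.

```python
import os
import tempfile


def try_claim(path, node_id):
    """Try to claim `path` for `node_id` by renaming it.

    Returns the new path if this node won the claim, or None if the
    file was already taken by another node.
    """
    claimed = f"{path}.{node_id}.processing"
    try:
        os.rename(path, claimed)
    except FileNotFoundError:
        # Another node renamed the file first.
        return None
    return claimed


# Simulate two nodes racing for the same incoming file.
with tempfile.TemporaryDirectory() as inbox:
    incoming = os.path.join(inbox, "order_001.xml")
    with open(incoming, "w") as handle:
        handle.write("<order/>")

    first = try_claim(incoming, "node1")
    second = try_claim(incoming, "node2")
    first_won = first is not None and os.path.exists(first)
    second_won = second is not None

print("node1 claimed:", first_won)
print("node2 claimed:", second_won)
```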
