Microsoft Cluster node service failing automatically

Hello Experts,
We have NetWeaver 7.0 EHP 2 installed on Windows 2008 R2 for EP. It is installed in a cluster environment.
We have 2 cluster nodes, Host A and Host B, and 2 clustered services: one for the database and one for SCS. During a failover these 2 services move to the other node.
My problem is that the SCS cluster service goes offline automatically, which brings my entire EP production server down. When it goes down, I manually start the cluster service first and then the app server, and my EP system starts again.
Please suggest how I can find the root cause of the SCS service going offline, or how we can keep it always online.
Regards,

Hi Sunil,
I checked the dev_ms.old file; the log is below:
trc file: "dev_ms", trc level: 1, release: "720"
[Thr 7224] Fri Mar 21 14:05:02 2014
[Thr 7224] ms/http_max_clients = 500 -> 500
[Thr 7224] MsSSetTrcLog: trc logging active, max size = 52428800 bytes
systemid   562 (PC with Windows NT)
relno      7200
patchlevel 0
patchno    101
intno      20020600
make       multithreaded, Unicode, 64 bit, optimized
pid        9488
[Thr 7224] ***LOG Q01=> MsSInit, MSStart (Msg Server 1 9488) [msxxserv.c   2274]
[Thr 7224] Fri Mar 21 14:05:03 2014
[Thr 7224] load acl file = \\EP1SAPGRP\sapmnt\EP1\SYS\global\ms_acl_info.DAT
[Thr 7224] MsGetOwnIpAddr: my host addresses are :
[Thr 7224]   1 : [IP] HOST (HOSTNAME)
[Thr 7224]   2 : [127.0.0.1] FQDN (LOCALHOST)
[Thr 7224]   3 : [IP] FQDN (NILIST)
[Thr 7224]   4 : [IP] EPCLUSTER (NILIST)
[Thr 7224]   5 : [IP] EP1SAPGRP (NILIST)
[Thr 7224]   6 : [IP] EP1ORAGRP (NILIST)
[Thr 7224]   7 : [IP] FQDN (NILIST)
[Thr 7224]   8 : [IP] FQDN (NILIST)
[Thr 7224] MsHttpInit: full qualified hostname = NODE A
[Thr 7224] HTTP logging is switch off
[Thr 7224] set HTTP state to LISTEN
[Thr 7224] *** HTTP port 8110 state LISTEN ***
[Thr 7224] *** I listen to internal port 3910 (3910) ***
[Thr 7224] *** HTTP port 8110 state LISTEN ***
[Thr 7224] CUSTOMER KEY: ><
[Thr 7224] build version=720.2011.05.04
[Thr 7224] MsJ2EE_CheckLoggedInNode: logged in list is not initialized -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [114836600] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [114836600] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [128683700] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [128683700] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [128683700] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [128683751] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [128683751] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [128683751] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [139051900] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [139051900] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [139051900] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [114836650] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [114836650] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [114836650] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [139051951] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [139051951] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [139051951] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [139051950] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [139051950] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [139051950] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [128683750] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [128683750] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [128683750] into logged in list
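
The excerpt above only covers a message server start at 14:05:02, so the interesting part is usually what was written just before the previous stop. As a rough aid for going through a longer dev_ms / dev_ms.old file, here is a minimal Python sketch that attaches the most recent timestamp to every `***LOG` entry. It also looks for lines starting with `*** ERROR`; that marker is an assumption on my side, since no such line appears in the pasted trace. The function name is made up for illustration.

```python
import re
from datetime import datetime

# "[Thr 7224] Fri Mar 21 14:05:02 2014" style timestamp lines
STAMP = re.compile(r"^\[Thr \d+\] (\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4})$")
# "***LOG ..." entries as seen in the trace; "*** ERROR" is an assumed marker
ENTRY = re.compile(r"^\[Thr \d+\] (\*\*\* ?(?:LOG|ERROR).*)$")


def notable_entries(trace_text):
    """Return (timestamp, message) pairs for LOG/ERROR entries in a dev_ms trace."""
    current = None  # timestamp of the most recent timestamp line
    found = []
    for raw in trace_text.splitlines():
        line = raw.strip()
        stamp = STAMP.match(line)
        if stamp:
            current = datetime.strptime(stamp.group(1), "%a %b %d %H:%M:%S %Y")
            continue
        entry = ENTRY.match(line)
        if entry:
            found.append((current, entry.group(1)))
    return found


# Lines copied from the trace in this post
SAMPLE = """\
[Thr 7224] Fri Mar 21 14:05:02 2014
[Thr 7224] ms/http_max_clients = 500 -> 500
[Thr 7224] ***LOG Q01=> MsSInit, MSStart (Msg Server 1 9488) [msxxserv.c   2274]
[Thr 7224] Fri Mar 21 14:05:03 2014
[Thr 7224] *** HTTP port 8110 state LISTEN ***
"""

entries = notable_entries(SAMPLE)
for when, message in entries:
    print(when, message)
```

Comparing the timestamps found this way with the Windows failover cluster events from the same minutes may show whether the message server stopped first or the cluster took the resource offline first.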

Similar Messages

  • Cluster node addition fails on cleanup

    We already have a 2-node cluster set up:
    (2) HP BL460c G8 servers connected to a VNX5300 SAN (Nodes 1 & 2)
    Server 2012 Datacenter installed
    Quorum: Node + Disk
    All failover tests went perfectly and all VMs are healthy.
    Verification on the cluster shows some warnings but no failures.
    We have rebuilt a server (node 3), renamed it, and run a single-machine verification test to see whether it is suitable for clustering. It succeeded with minor warnings.
    We ran verification on all three machines and received the aforementioned warnings but no showstoppers. However, when trying to add the host to the cluster, we get the following error in the logs:
    WARN mscs::ListenerWorker::operator (): ERROR_TIMEOUT(1460)' because of '[FTI][Initiator] Aborting connection because NetFT route to node <machine name> on virtual IP fe80::cdf2:f6ea:5ce:5f9c:~3343~ has failed to come up.'
    This happens after the node is added to the cluster: the operation reports a failure during the cleanup processes and reverts everything. I have done all of this under my domain_admin account.
    Before and after the attempt to add the node, the NetFT adapter is in the media-disconnected state; during the attempts it does pull down a 169 address, as it is supposed to.
    Node 3 networking breakdown:
    The new host uses an Intel/HP NC365T quad-port adapter.
    port 1: Mgmt: static assignment, subnet 1
    port 2: VM net: static assignment, subnet 2
    port 3: Heartbeat: assigned via DHCP, subnet 1 pool (we have attempted the above with this disabled as well)
    NCU is not installed for the adapter, and bridging in Server 2012 is not enabled.
    I am at a loss and would appreciate any additional help, as I have spent 3 days researching this to try to find the cause.

    Hi,
    The error message mentions an IPv6 address. Have you enabled IPv6 networking for the cluster?
    Check the IPv6 network configuration on the 3rd node: is it enabled or disabled?
    When two or more cluster nodes are running IPv6 for heartbeat communications, any additional node that joins must also be running IPv6. If the node has IPv6 disabled, it will fail to join.
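
    To illustrate why the address in the error matters, here is a small Python sketch using the standard ipaddress module. It takes the address part of the NetFT endpoint from the log (without the trailing port) and reports the IP version and whether the address is link-local. The helper name is made up, and 169.254.1.1 is only a stand-in for the "169 address" mentioned in the question.

```python
import ipaddress


def describe_address(text):
    """Report the IP version and whether the address is link-local."""
    addr = ipaddress.ip_address(text)
    return {"version": addr.version, "link_local": addr.is_link_local}


# Address part of the NetFT virtual IP quoted in the error message
netft = describe_address("fe80::cdf2:f6ea:5ce:5f9c")
# Hypothetical example of an automatically assigned IPv4 address
apipa = describe_address("169.254.1.1")

print("NetFT address:", netft)
print("169.254.x.x address:", apipa)
```

    The NetFT route in the error is an IPv6 link-local one, which fits the point above that the joining node needs IPv6 enabled.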
    Also, if these cluster nodes have antivirus software installed, you may temporarily disable it and try joining the new node again.
    Check that and give us feedback for further troubleshooting. For more information, please refer to the following MS articles:
    Failover Cluster Creation Issue
    http://social.technet.microsoft.com/Forums/en-US/winserverClustering/thread/1ed1936d-6283-46cc-951d-9c236329b8be
    Failure to re-add rebuilt cluster node to Windows 2008 R2 Cluster: System error 1460 has occurred (0x000005b4). Timeout.
    http://social.technet.microsoft.com/Forums/en-US/winserverClustering/thread/a21e9a8e-9f68-4d83-a747-204000cda65a
    Hope this helps!
    Lawrence
    TechNet Community Support

  • Cluster node A failed

    Hello,
    We have a cluster configuration with SQL 2005 and Windows Server 2003. Node A was not able to reboot after a software upgrade; unfortunately recovery and repair didn't work, so we were left with no choice except to format and reinstall.
    Node A was evicted from the cluster and added again.
    Node B is working fine.
    Now, in order to restore the previous state, what should be done from a database and SAP point of view?
    ASCS and SCS are installed on the SAN (O drive). Enqueue replication for ASCS and SCS and the Central Instance are on local disk F.
    Thanks in advance
    Kapil

    Hi Kapil,
    If you have already deleted and recreated server node A, it is supposed to work. During the bootstrap process (at startup), all the applications and other necessary binary files will be synchronized from the database to the file system of server node A. Unless the new server node A was not created successfully, you don't need to make any other configuration.
    Best regards,
    Matheus

  • Sql server 2008 r2 setup support rules missing cluster node

    I get an error when installing SQL R2 on Windows 8: there is a problem with the cluster node, and the install fails.

    Hi ,
    For Microsoft SQL Server 2008 R2 on a computer that is running Windows 8, you must apply Microsoft SQL Server 2008 R2 Service Pack 1 or later after the initial setup is complete. To install a SQL Server failover cluster, you must create and configure a failover cluster instance by running SQL Server setup. For more information, see:
    Installing a SQL Server 2008 R2 Failover Cluster: http://msdn.microsoft.com/en-us/library/ms179410(v=sql.105).aspx
    In addition, as in the other post, could you please help us collect the detailed error message and the following error logs? They are very useful for our research.
    C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\LOG\Summary.txt.
    C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\<YYYYMMDD_HHMM>\Detail.txt
    Regards,
    Sofiya Li
    TechNet Community Support

  • OES2 SP2a cluster node freeze

    Hi all.
    I have a 3-node cluster based on OES2 SP2a, fully patched. There are a couple of resources: Master_IP and an NSS volume.
    The cluster is virtualized on ESXi 4.1, fully patched, and vmware-tools are installed and up to date.
    If I do an "rcnetwork stop" on a node, it remains without network for about 20 seconds and then freezes. It does not reboot; it only freezes. The resource is balanced correctly, but the server remains hung.
    This behaviour is the same on a server with a cluster resource on it and on a server with no cluster resource on it. It always hangs.
    The correct behaviour should be a reboot, shouldn't it?
    Any hints?
    Thanks in advance.

    The node does not reboot because ....
    9.11 Preventing a Cluster Node Reboot after a Node Shutdown
    If LAN connectivity is lost between a cluster node and the other nodes in the cluster, it is possible that the lost node will be automatically shut down by the other cluster nodes. This is normal cluster operating behavior, and it prevents the lost node from trying to load cluster resources because it cannot detect the other cluster nodes. By default, cluster nodes are configured to reboot after an automatic shutdown.
    On certain occasions, you might want to prevent a downed cluster node from rebooting so you can troubleshoot problems.
    Section 9.11.1, OES 2 SP2 with Patches and Later
    Section 9.11.2, OES 2 SP2 Release Version and Earlier
    9.11.1 OES 2 SP2 with Patches and Later
    Beginning in the OES 2 SP2 Maintenance Patch for May 2010, the Novell Cluster Services reboot behavior conforms to the kernel panic setting for the Linux operating system. By default the kernel panic setting is set for no reboot after a node shutdown.
    You can set the kernel panic behavior in the /etc/sysctl.conf file by adding a kernel.panic command line. Set the value to 0 for no reboot after a node shutdown. Set the value to a positive integer value to indicate that the server should be rebooted after waiting the specified number of seconds. For information about the Linux sysctl, see the Linux man pages on sysctl and sysctl.conf.
    1. As the root user, open the /etc/sysctl.conf file in a text editor.
    2. If the kernel.panic token is not present, add it.
       kernel.panic = 0
    3. Set the kernel.panic value to 0 or to a positive integer value, depending on the desired behavior.
       No Reboot: To prevent an automatic cluster reboot after a node shutdown, set the kernel.panic token to 0. This allows the administrator to determine what caused the kernel panic condition before manually rebooting the server. This is the recommended setting.
       kernel.panic = 0
       Reboot: To allow a cluster node to reboot automatically after a node shutdown, set the kernel.panic token to a positive integer value that represents the seconds to delay the reboot.
       kernel.panic = <seconds>
       For example, to wait 1 minute (60 seconds) before rebooting the server, specify the following:
       kernel.panic = 60
    4. Save your changes.
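
    If you would rather script steps 2 and 3 than edit by hand, the following is a minimal Python sketch. It works on the text of the file only: it replaces an existing active kernel.panic line or appends one if none is present, and leaves comments and other keys alone. The function name is made up for illustration, and reading and writing /etc/sysctl.conf as root is left to the caller.

```python
import re

# Matches an active "kernel.panic = ..." line, but not comments
# and not other keys such as kernel.panic_on_oops
PANIC_LINE = re.compile(r"^\s*kernel\.panic\s*=")


def set_kernel_panic(conf_text, seconds):
    """Return conf_text with exactly one 'kernel.panic = <seconds>' line."""
    if seconds < 0:
        raise ValueError("kernel.panic must be 0 or a positive integer")
    new_line = "kernel.panic = {}".format(seconds)
    out = []
    replaced = False
    for line in conf_text.splitlines():
        if PANIC_LINE.match(line):
            if not replaced:
                out.append(new_line)  # replace the first occurrence
                replaced = True
            # later duplicates are dropped
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)  # token was not present, so add it
    return "\n".join(out) + "\n"


before = "# cluster settings\nkernel.panic = 60\nkernel.panic_on_oops = 1\n"
after = set_kernel_panic(before, 0)
print(after)
```

    The file is normally read at boot; running `sysctl -p` as root is the usual way to load it without a reboot.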
    9.11.2 OES 2 SP2 Release Version and Earlier
    In the OES 2 SP2 release version and earlier, you can modify the /opt/novell/ncs/bin/ldncs file for the cluster so that the server does not automatically reboot after a shutdown.
    1. Open the /opt/novell/ncs/bin/ldncs file in a text editor.
    2. Find the following line:
       echo -n $TOLERANCE > /proc/sys/kernel/panic
    3. Replace $TOLERANCE with a value of 0 to cause the server to not automatically reboot after a shutdown.
    4. After editing the ldncs file, you must reboot the server for the change to take effect.

  • Server Locator Service Failed to Start

    Received Event ID 4380 on a single Exchange Server (once part of a DAG, but no more; the other servers and the DAG were removed). The error reads: 'net.tcp://huntsvr.huntelectric.local:64337/Exchange.HighAvailability'. Error A TCP error (10013: An attempt was made to access a socket in a way forbidden by its access permissions) occurred while listening on IP Endpoint=0.0.0.0:64337.
    The above error is paired with the following error as well:
    Event ID: 2121. The Microsoft Exchange Replication service failed to start the TCP listener. Error: System.Net.Sockets.SocketException: An attempt was made to access a socket in a way forbidden by its access permissions ..."
    The above two errors are joined by this error:
    Event ID: 4383. Microsoft Exchange Server Locator Service communication channel faulted. State: Faulted
    And the next error appears as well:
    Event ID: 4379. Microsoft Exchange Server Locator Service stopped.
    These 4 errors repeat every minute. Any suggestions of where to find a resolution?
    Thanks,
    Michael

    Hi Michael,
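    The above errors point to one of two possible causes:
    1. Another service is already using port 64337.
    2. Port 64337 is blocked by a firewall.
    I recommend you use the commands below to check whether port 64337 is used by a program. The last column of the netstat output is the PID of the owning process; pass that PID to tasklist to see which program it is.
    netstat -aon | findstr "64337"
    tasklist /FI "PID eq <PID from netstat>"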
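
    If the netstat output is long, the PID lookup can be scripted. Below is a minimal Python sketch that takes the text printed by `netstat -aon` and returns the PIDs whose local address uses the given port. The sample output and the PIDs in it are invented for illustration, and the function name is hypothetical.

```python
def pids_on_port(netstat_output, port):
    """Return the set of PIDs whose *local* address ends in :<port>."""
    pids = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with the protocol; skip headers and blank lines
        if len(fields) < 4 or fields[0] not in ("TCP", "UDP"):
            continue
        local_address = fields[1]
        if local_address.rsplit(":", 1)[-1] == str(port):
            pids.add(int(fields[-1]))  # PID is the last column
    return pids


# Invented sample in the layout of "netstat -aon" on Windows
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:64337          0.0.0.0:0              LISTENING       4321
  TCP    10.0.0.5:50123         10.0.0.9:64337         ESTABLISHED     999
  TCP    [::]:64337             [::]:0                 LISTENING       4321
"""

owners = pids_on_port(SAMPLE, 64337)
print(owners)
```

    Only rows where the port appears on the local side are counted, so an outgoing connection to someone else's port 64337 is not reported as the owner.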
    What's more, here is a thread for your reference.
    Exchange 2010 SP3 RU1 event id 4380 and 4383 logged every minute
    http://social.technet.microsoft.com/Forums/exchange/en-US/892721dc-f2cc-45da-ba5f-b24a9a2ef749/exchange-2010-sp3-ru1-event-id-4380-and-4383-logged-every-minute
    Hope it helps.
    If you need further assistance, please feel free to let me know.
    Best regards,
    Amy
    Amy Wang
    TechNet Community Support

  • WWW service is not able to start via Microsoft Failover Cluster generic service resource

    Environment
    Cluster Nodes = two
    Cluster Nodes OS = Windows 2008R2
    Application = IIS
    Query
    I created generic service resources for many Windows services under Microsoft Failover Cluster and they fail over successfully, but when I create a generic service resource for WWW, the WWW service cannot be brought online via Microsoft Failover Cluster. It gets stuck in Online Pending.
    I have noticed two things:
    1.) If the WWW service is set to manual and started on the passive node, and I manually restart the active node, then the WWW service successfully switches over to the standby/passive node. But if the WWW service is set to manual and not started on the standby/passive node, then the WWW service does not fail over.
    2.) If I kill the WWW service manually (as a test case) on the active node via this command (taskkill /f /pid XXXX), then the WWW service fails and does not fail over to the standby/passive node.
    Any comment will be appreciated. Thanks. Zahid Haseeb.

    The problem is resolved. I feel that it will be helpful to other people who may face the same problem, so I wrote a blog post on "How to configure IIS Web Site and Application Pool in Microsoft Failover Cluster" and mentioned almost all the activities I have done. Kindly see the resolution under the section "Configure some changes in Cluster Configuration" at the link below:
    http://zahidhaseeb.wordpress.com/2014/02/12/how-to-configure-iis-web-site-and-application-pool-in-microsoft-failover-cluster/
    Any comment will be appreciated. Thanks. Zahid Haseeb.

  • Hyper-V Guest Cluster Node Failing Regularly

    Hi,
    We currently have a 4-node Server 2012 R2 cluster which hosts, among other things, a 3-node guest cluster running a single clustered file service.
    Around once a week, the guest cluster node that is currently hosting the clustered file service fails. It's as if the VM is blue-screening. That in itself is fairly annoying, and I'll be installing all the updates and checking the event log for clues as to the cause.
    The problem then is that whichever physical cluster node is hosting the VM when it fails will not unlock some of the VM's files. The virtual machine configuration lists as Online Pending. This means that the failed VM cannot be restarted on any other cluster node. The only fix is to drain the physical host it failed on and reboot it.
    Looking for suggestions on how to fix the following:
    1. Crashing guest file cluster node
    2. Failed VM with shared VHDX requiring physical host reboot
    Event messages for the physical host that was hosting the failed VM, in the order that they occurred:
    Hyper-V-Worker: Event ID 18590 - 'FS-03' has encountered a fatal error. The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x9E, ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, ErrorCode3: 0xA, ErrorCode4: 0x0. If the problem persists, contact Product Support for the guest operating system. (Virtual machine ID 36166B47-D003-4E51-AFB5-7B967A3EFD2D)
    FailoverClustering: Event ID 1069 - Cluster resource 'Virtual Machine FS-03' of type 'Virtual Machine' in clustered role 'FS-03' failed.
    Hyper-V-High-Availability: Event ID 21128 - 'Virtual Machine FS-03' failed to shutdown the virtual machine during the resource termination. The virtual machine will be forcefully stopped.
    Hyper-V-High-Availability: Event ID 21110 - 'Virtual Machine FS-03' failed to terminate.
    Hyper-V-VMMS: Event ID 20108 - The Virtual Machine Management Service failed to start the virtual machine '36166B47-D003-4E51-AFB5-7B967A3EFD2D': The group or resource is not in the correct state to perform the requested operation. (0x8007139F).
    Hyper-V-High-Availability: Event ID 21107 - 'Virtual Machine FS-03' failed to start.
    FailoverClustering: Event ID 1205 - The Cluster service failed to bring clustered role 'FS-03' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

    Hi,
    I have not found a similar issue. Does your cluster pass cluster validation? Are all your Hyper-V hosts compatible with Server 2012 R2? Have you tried disabling all your AV software and firewall? Please rerun storage validation on the cluster during non-production hours; the cluster validation report should help locate the issue quickly.
    More information:
    Cluster
    http://technet.microsoft.com/en-us/library/dd581778(v=ws.10).aspx
    Hope this helps.

  • Sap service fails after switching to the second node in HA system.

    Hello all,
    We have installed an HA MSCS cluster installation for an ECC 6.0 (mySAP ERP 2005) system. The ASCS+SCS instance and the DB instance are the failover resources.
    When the ASCS+SCS (SAP cluster group) fails over to the next node, the services sap_<sid>_00 (ASCS) and sap_<sid>_01 (SCS) fail to start automatically, and we have to start the instance manually from the services console by entering the sapservice<sid> password.
    When we try to start the service without entering the password, the system reports that it is unable to start due to a logon failure; after we enter the password, it shows that the user has been granted the right to log on as a service.
    Please help resolve this issue.
    thx
    satyajit

    A change to the zpool_import() management of the zpool.cache file, as delivered by Solaris 10 kernel patches 137137-09 (for SPARC) or 137138-09 (for x86), might cause systems that have their shared ZFS (zfs(1M)) storage pools under the control of HAStoragePlus to be simultaneously imported on multiple cluster nodes. Importing a ZFS storage pool on multiple cluster nodes will result in pool corruption, which might cause data integrity issues or cause a cluster node to panic.
    To avoid this problem, install Solaris 10 patch 139579-02 (for SPARC) or 139580-02 (for x86) immediately after you install 137137-09 or 137138-09 but before you reboot the cluster nodes.
    Alternatively, only on the Solaris 10 5/08 OS, remove the affected patch before any ZFS pools are simultaneously imported to multiple cluster nodes. You cannot remove patch 137137-09 or 137138-09 from the Solaris 10 10/08 OS, because these patches are preinstalled on that release.

  • DAG - Backup failing on 1 DB only with error - The Microsoft Exchange Replication service VSS Writer instance ID failed with error code 80070020 when preparing for a backup of database 'DB012'

    Hi Board,
    I've searched across the board, TechNet and Symantec sites but did not find a hint about my problem.
    We run a 2-node DAG (Location1-Ex1-mb1, Location2-exc1-mb1) at SP2 RU4 patch level with 40 databases.
    For some time the backup of one, and only one, DB has been failing with these events, logged on the Mailbox server on which the passive DB is hosted.
    Log Name:      Application
    Source:        MSExchangeRepl
    Date:          28.09.2012 00:37:17
    Event ID:      2112
    Task Category: Exchange VSS Writer
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      Location1-Exc1-MB1
    Description: The Microsoft Exchange Replication service VSS Writer instance 1ab7d204-609a-4aea-b0a7-70afb0db38de failed with error code 80070020 when preparing for a backup of database 'DB012'.
    Followed by
    Log Name:      Application
    Source:        MSExchangeRepl
    Date:          01.10.2012 03:33:06
    Event ID:      2024
    Task Category: Exchange VSS Writer
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      Location1-Exc1-MB1
    Description: The Microsoft Exchange Replication service VSS Writer (Instance 42916d80-36c1-4f73-86d0-596d30226349) failed with error 80070020 when preparing for a backup.
    The backup application, Symantec Backup Exec 2010 R3, reports this error:
    Snapshot provider error (0xE000FED1): A failure occurred querying the Writer status.
    Check the Windows Event Viewer for details.
    Writer Name: Exchange Server, Writer ID: {76FE1AC4-15F7-4BCD-987E-8E1ACB462FB7}, Last error: The VSS Writer failed, but the operation can be retried (0x800423f3), State: Stable (1).
    Symantec suggests in http://www.symantec.com/business/support/index?page=content&id=TECH184095 restarting the MS Exchange Replication service, BUT the mentioned event ID 8229 isn't present on either of the two Mailbox servers.
    The affected database is active on the Location2-Exc1-Mb1 server and in an overall healthy state. During my research I found that on the Location2-Exc1-Mb1 server there are shadow copies present that have not been removed!
    This confuses me, since all backups are normally taken from the passive copy of a database.
    So my questions to the board are:
    * Is anyone facing similar issues?
    * Can someone explain why snapshots are present on the Mailbox server hosting the active database, whilst the errors are logged on the passive one?
    * Does someone know under which conditions shadow copies remain and aren't removed properly?
    * What can cause only 1 DB to face such issues?
    Any suggestion is welcome!
    BR
    Markus

    Hi Lenora,
    I increased the VSS / Exchange backup log levels to expert before starting. These are the things I've tried so far:
    - Backup from passive DB (forced within Symantec Backup Exec)
    - Backup from active DB (forced within Symantec Backup Exec)
    - Backup from passive DB without GRT enabled (forced within Symantec Backup Exec)
    - Backup from active DB without GRT enabled (forced within Symantec Backup Exec)
    All those attempts failed.
    But they brought some more details: the backup against the active DB states that there is still a backup in progress, and therefore this backup is cancelled by VSS.
    The solution was to restart the Exchange Replication service on the Mailbox server hosting the passive DB.
    Backups are working again on all DBs!
    Thanks for your replies.
    Best regards
    Markus

  • Unable to failover the services in active-active cluster node

    Hi,
    I am applying the SP2 patch for SQL Server 2008 R2 in an active-active cluster. We have 3 services in the cluster; node 1 is the preferred owner of 2 of them and node 2 is the preferred owner of 1. When I try to move the service from node 2 to node 1, I get the errors below:
    DCOM was unable to communicate with the computer XXXXXXXXX using any of the configured protocols.
    The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server XXXXXXXXX. The target name used was RPCSS/XXXXXX. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server name is not fully qualified, and the target domain (XXXXXX) is different from the client domain (XXXXXXX), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
    The Cluster service failed to bring clustered service or application 'CHCROCHC045' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
    Cluster resource 'SQL Server (CHCROCHC045)' in clustered service or application 'CHCROCHC045' failed.
    Any input to resolve this issue is appreciated, as I cannot proceed with patching.
    BR
    PGR

    Hi PGR,
    As the issue is more related to Windows Server, I would recommend posting it in the Windows Server forums for better support.
    In addition, below are some articles about troubleshooting the error "DCOM was unable to communicate with the computer XXXXXXXXX using any of the configured protocols" for your reference.
    Event ID 10009 — COM Remote Service Availability
    How to troubleshoot DCOM 10009 error logged in system event?
    Thanks,
    Lydia Zhang
    TechNet Community Support

  • Error: Halting this cluster node due to unrecoverable service failure

    Our cluster has experienced some sort of fault that has only become apparent today. The origin appears to have been nearly a month ago yet the symptoms have only just manifested.
    The node in question is a standalone instance running a DistributedCache service with local storage. It output the following to stdout on Jan-22:
    Coherence <Error>: Halting this cluster node due to unrecoverable service failure
    It finally failed today with OutOfMemoryError: Java heap space.
    We're running coherence-3.5.2.jar.
    Q1: It looks like this node failed on Jan-22 yet we did not notice. What is the best way to monitor node health?
    Q2: What might the root cause be for such a fault?
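
    On Q1, one low-tech option (a sketch only, not a Coherence feature) would be to scan the node's log output for the guardian timeout and halt messages quoted in this post and raise an alert when any appear. The Python below assumes the message wording stays as shown here; the function name is made up for illustration.

```python
import re

# "(due to soft timeout) of Guard{Daemon=DistributedCache}" and the hard variant
GUARD = re.compile(r"\(due to (soft|hard) timeout\) of Guard\{Daemon=([^}]+)\}")
HALT = "Halting this cluster node due to unrecoverable service failure"


def scan(lines):
    """Return (kind, daemon) tuples for guardian timeouts and halt messages."""
    alerts = []
    for line in lines:
        guard = GUARD.search(line)
        if guard:
            alerts.append((guard.group(1) + " timeout", guard.group(2)))
        elif HALT in line:
            alerts.append(("halt", None))
    return alerts


# Messages taken from this post, shortened to the relevant tail
SAMPLE = [
    "<Error> (thread=Cluster, member=33): Attempting recovery (due to soft timeout) of Guard{Daemon=DistributedCache}",
    "<Error> (thread=Cluster, member=33): Terminating guarded execution (due to hard timeout) of Guard{Daemon=DistributedCache}",
    "Coherence <Error>: Halting this cluster node due to unrecoverable service failure",
]

alerts = scan(SAMPLE)
for kind, daemon in alerts:
    print(kind, daemon)
```

    Something like this run periodically over stdout and the log file would have flagged the Jan-22 failure on the day it happened.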
    I found the following in the logs:
    2011-01-22 01:18:58,296 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:18:58.296/9910749.462 Oracle Coherence EE 3.5.2/463 <Error> (thread=Cluster, member=33): Attempting recovery (due to soft timeout) of Guard{Daemon=DistributedCache}
    2011-01-22 01:18:58,296 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:18:58.296/9910749.462 Oracle Coherence EE 3.5.2/463 <Error> (thread=Cluster, member=33): Attempting recovery (due to soft timeout) of Guard{Daemon=DistributedCache}
    2011-01-22 01:19:04,772 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:19:04.772/9910755.938 Oracle Coherence EE 3.5.2/463 <Error> (thread=Cluster, member=33): Terminating guarded execution (due to hard timeout) of Guard{Daemon=DistributedCache}
    2011-01-22 01:19:04,772 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:19:04.772/9910755.938 Oracle Coherence EE 3.5.2/463 <Error> (thread=Cluster, member=33): Terminating guarded execution (due to hard timeout) of Guard{Daemon=DistributedCache}
    2011-01-22 01:19:05,785 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:19:05.785/9910756.951 Oracle Coherence EE 3.5.2/463 <Error> (thread=Termination Thread, member=33): Full Thread Dump
    Thread[Reference Handler,10,system]
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:485)
    java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    Thread[DistributedCache,5,Cluster]
    java.nio.Bits.copyToByteArray(Native Method)
    java.nio.DirectByteBuffer.get(DirectByteBuffer.java:224)
    com.tangosol.io.nio.ByteBufferInputStream.read(ByteBufferInputStream.java:123)
    java.io.DataInputStream.readFully(DataInputStream.java:178)
    java.io.DataInputStream.readFully(DataInputStream.java:152)
    com.tangosol.util.Binary.readExternal(Binary.java:1066)
    com.tangosol.util.Binary.<init>(Binary.java:183)
    com.tangosol.io.nio.BinaryMap$Block.readValue(BinaryMap.java:4304)
    com.tangosol.io.nio.BinaryMap$Block.getValue(BinaryMap.java:4130)
    com.tangosol.io.nio.BinaryMap.get(BinaryMap.java:377)
    com.tangosol.io.nio.BinaryMapStore.load(BinaryMapStore.java:64)
    com.tangosol.net.cache.SerializationPagedCache$WrapperBinaryStore.load(SerializationPagedCache.java:1547)
    com.tangosol.net.cache.SerializationPagedCache$PagedBinaryStore.load(SerializationPagedCache.java:1097)
    com.tangosol.net.cache.SerializationMap.get(SerializationMap.java:121)
    com.tangosol.net.cache.SerializationPagedCache.get(SerializationPagedCache.java:247)
    com.tangosol.net.cache.AbstractSerializationCache$1.getOldValue(AbstractSerializationCache.java:315)
    com.tangosol.net.cache.OverflowMap$Status.registerBackEvent(OverflowMap.java:4210)
    com.tangosol.net.cache.OverflowMap.onBackEvent(OverflowMap.java:2316)
    com.tangosol.net.cache.OverflowMap$BackMapListener.onMapEvent(OverflowMap.java:4544)
    com.tangosol.util.MultiplexingMapListener.entryDeleted(MultiplexingMapListener.java:49)
    com.tangosol.util.MapEvent.dispatch(MapEvent.java:214)
    com.tangosol.util.MapEvent.dispatch(MapEvent.java:166)
    com.tangosol.util.MapListenerSupport.fireEvent(MapListenerSupport.java:556)
    com.tangosol.net.cache.AbstractSerializationCache.dispatchEvent(AbstractSerializationCache.java:338)
    com.tangosol.net.cache.AbstractSerializationCache.dispatchPendingEvent(AbstractSerializationCache.java:321)
    com.tangosol.net.cache.AbstractSerializationCache.removeBlind(AbstractSerializationCache.java:155)
    com.tangosol.net.cache.SerializationPagedCache.removeBlind(SerializationPagedCache.java:348)
    com.tangosol.util.AbstractKeyBasedMap$KeySet.remove(AbstractKeyBasedMap.java:556)
    com.tangosol.net.cache.OverflowMap.removeInternal(OverflowMap.java:1299)
    com.tangosol.net.cache.OverflowMap.remove(OverflowMap.java:380)
    com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$Storage.clear(DistributedCache.CDB:24)
    com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onClearRequest(DistributedCache.CDB:32)
    com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ClearRequest.run(DistributedCache.CDB:1)
    com.tangosol.coherence.component.net.message.requestMessage.DistributedCacheRequest.onReceived(DistributedCacheRequest.CDB:12)
    com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onMessage(Grid.CDB:9)
    com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onNotify(Grid.CDB:136)
    com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onNotify(DistributedCache.CDB:3)
    com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
    java.lang.Thread.run(Thread.java:619)
    Thread[Finalizer,8,system]
    java.lang.Object.wait(Native Method)
    java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
    java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
    java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
    Thread[PacketReceiver,7,Cluster]
    java.lang.Object.wait(Native Method)
    com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
    com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketReceiver.onWait(PacketReceiver.CDB:2)
    com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
    java.lang.Thread.run(Thread.java:619)
    Thread[RMI TCP Accept-0,5,system]
    java.net.PlainSocketImpl.socketAccept(Native Method)
    java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
    java.net.ServerSocket.implAccept(ServerSocket.java:453)
    java.net.ServerSocket.accept(ServerSocket.java:421)
    sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
    sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
    java.lang.Thread.run(Thread.java:619)
    Thread[PacketSpeaker,8,Cluster]
    java.lang.Object.wait(Native Method)
    com.tangosol.coherence.component.util.queue.ConcurrentQueue.waitForEntry(ConcurrentQueue.CDB:16)
    com.tangosol.coherence.component.util.queue.ConcurrentQueue.remove(ConcurrentQueue.CDB:7)
    com.tangosol.coherence.component.util.Queue.remove(Queue.CDB:1)
    com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketSpeaker.onNotify(PacketSpeaker.CDB:62)
    com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
    java.lang.Thread.run(Thread.java:619)
    Thread[Logger@9216774 3.5.2/463,3,main]
    java.lang.Object.wait(Native Method)
    com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
    com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
    java.lang.Thread.run(Thread.java:619)
    Thread[PacketListener1,8,Cluster]
    java.net.PlainDatagramSocketImpl.receive0(Native Method)
    java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
    java.net.DatagramSocket.receive(DatagramSocket.java:712)
    com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:20)
    com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
    com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
    com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
    java.lang.Thread.run(Thread.java:619)
    Thread[main,5,main]
    java.lang.Object.wait(Native Method)
    com.tangosol.net.DefaultCacheServer.main(DefaultCacheServer.java:79)
    com.networkfleet.cacheserver.Launcher.main(Launcher.java:122)
    Thread[Signal Dispatcher,9,system]
    Thread[RMI TCP Accept-41006,5,system]
    java.net.PlainSocketImpl.socketAccept(Native Method)
    java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
    java.net.ServerSocket.implAccept(ServerSocket.java:453)
    java.net.ServerSocket.accept(ServerSocket.java:421)
    sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
    sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
    java.lang.Thread.run(Thread.java:619)
    ThreadCluster
    java.lang.Object.wait(Native Method)
    com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
    com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:9)
    com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
    java.lang.Thread.run(Thread.java:619)
    Thread[TcpRingListener,6,Cluster]
    java.net.PlainSocketImpl.socketAccept(Native Method)
    java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
    java.net.ServerSocket.implAccept(ServerSocket.java:453)
    java.net.ServerSocket.accept(ServerSocket.java:421)
    com.tangosol.coherence.component.net.socket.TcpSocketAccepter.accept(TcpSocketAccepter.CDB:18)
    com.tangosol.coherence.component.util.daemon.TcpRingListener.acceptConnection(TcpRingListener.CDB:10)
    com.tangosol.coherence.component.util.daemon.TcpRingListener.onNotify(TcpRingListener.CDB:9)
    com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
    java.lang.Thread.run(Thread.java:619)
    Thread[PacketPublisher,6,Cluster]
    java.lang.Object.wait(Native Method)
    com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
    com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketPublisher.onWait(PacketPublisher.CDB:2)
    com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
    java.lang.Thread.run(Thread.java:619)
    Thread[RMI TCP Accept-0,5,system]
    java.net.PlainSocketImpl.socketAccept(Native Method)
    java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
    java.net.ServerSocket.implAccept(ServerSocket.java:453)
    java.net.ServerSocket.accept(ServerSocket.java:421)
    sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
    sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
    sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
    java.lang.Thread.run(Thread.java:619)
    Thread[PacketListenerN,8,Cluster]
    java.net.PlainDatagramSocketImpl.receive0(Native Method)
    java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
    java.net.DatagramSocket.receive(DatagramSocket.java:712)
    com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:20)
    com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
    com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
    com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
    java.lang.Thread.run(Thread.java:619)
    Thread[Invocation:Management,5,Cluster]
    java.lang.Object.wait(Native Method)
    com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
    com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:9)
    com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
    java.lang.Thread.run(Thread.java:619)
    Thread[DistributedCache:PofDistributedCache,5,Cluster]
    java.lang.Object.wait(Native Method)
    com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
    com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:9)
    com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
    java.lang.Thread.run(Thread.java:619)
    Thread[Invocation:Management:EventDispatcher,5,Cluster]
    java.lang.Object.wait(Native Method)
    com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
    com.tangosol.coherence.component.util.daemon.queueProcessor.Service$EventDispatcher.onWait(Service.CDB:7)
    com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
    java.lang.Thread.run(Thread.java:619)
    Thread[Termination Thread,5,Cluster]
    java.lang.Thread.dumpThreads(Native Method)
    java.lang.Thread.getAllStackTraces(Thread.java:1487)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    java.lang.reflect.Method.invoke(Method.java:597)
    com.tangosol.net.GuardSupport.logStackTraces(GuardSupport.java:791)
    com.tangosol.coherence.component.net.Cluster.onServiceFailed(Cluster.CDB:5)
    com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$Guard.terminate(Grid.CDB:17)
    com.tangosol.net.GuardSupport$2.run(GuardSupport.java:652)
    java.lang.Thread.run(Thread.java:619)
    2011-01-22 01:19:06,738 Coherence Logger@9216774 3.5.2/463 INFO 2011-01-22 01:19:06.738/9910757.904 Oracle Coherence EE 3.5.2/463 <Info> (thread=main, member=33): Restarting Service: DistributedCache
    2011-01-22 01:19:06,738 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:19:06.738/9910757.904 Oracle Coherence EE 3.5.2/463 <Error> (thread=main, member=33): Failed to restart services: java.lang.IllegalStateException: Failed to unregister: DistributedCache{Name=DistributedCache, State=(SERVICE_STARTED), LocalStorage=enabled, PartitionCount=257, BackupCount=1, AssignedPartitions=16, BackupPartitions=16}

    Hi
    It seems the problem in this case is the call to clear(), which tries to load every entry stored in the overflow scheme in order to emit potential cache events to listeners. This probably requires far more memory than the available Java heap, hence the OOM.
    Our recommendation in this case is to call destroy() instead, since this bypasses the event firing.
    /Charlie
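
To make the difference concrete, here is a toy model in Python. It is not Coherence code: `ToyOverflowStore`, `loads_needed` and the listener hook are hypothetical names invented for illustration. The only point it tries to show is that an entry-by-entry clear has to read every old value back in order to build the delete events, while dropping the whole store reads nothing.

```python
class ToyOverflowStore:
    """Toy stand-in for an overflow (off-heap/disk) store. Hypothetical, not Coherence."""

    def __init__(self, entries):
        self._backing = dict(entries)  # pretend these values live outside the heap
        self.loads = 0                 # number of values pulled back onto the heap
        self.listeners = []

    def _load(self, key):
        # Reading a value back is the expensive step in this model.
        self.loads += 1
        return self._backing[key]

    def clear(self):
        # Entry-by-entry removal: each delete event carries the old value,
        # so every value is loaded before it is removed.
        for key in list(self._backing):
            old_value = self._load(key)
            del self._backing[key]
            for listener in self.listeners:
                listener(key, old_value)

    def destroy(self):
        # Drop everything at once: no per-entry events, so no loads.
        self._backing.clear()
        self.listeners.clear()


def loads_needed(operation, n_entries=1000):
    """Return how many values had to be loaded to run 'clear' or 'destroy'."""
    store = ToyOverflowStore((i, b"x" * 16) for i in range(n_entries))
    store.listeners.append(lambda key, old_value: None)
    getattr(store, operation)()
    return store.loads
```

In this model `loads_needed("clear")` grows with the number of entries, while `loads_needed("destroy")` stays at zero, which mirrors the reasoning in the reply above.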

  • WDRuntimeException: Failed to create J2EE cluster node in SLD

    Hello,
    I am getting the error below, but to my knowledge I have everything set up properly. Let me briefly outline the setup (I am running everything locally for now and will move to remote later):
    WAS 6.4 SP12
    Set up JCo and tests fine
    Set up Visual Administrator / SLD Data Supplier / HTTP and CIM configured and seem to test fine
    Created SLD and it tests OK
    Created Technical Landscape
    I have noticed that in SP12, in the SLD config I actually have a NEW category called "System Landscape" above my "Technical Landscape" link. I have not seen this option in previous versions SP9 or SP11. Is it mandatory to configure this?
    Also, I created a model for Adaptive RFC and found the function I needed successfully.
    Anyway, here is the error when trying to deploy...
    com.sap.tc.webdynpro.services.exceptions.WDRuntimeException: Error while obtaining JCO connection.
         at com.sap.tc.webdynpro.services.datatypes.core.DataTypeBroker$1.fillSldConnection(DataTypeBroker.java:90)
    Caused by: com.sap.tc.webdynpro.services.sal.sl.api.WDSystemLandscapeException: Error while obtaining JCO connection.
    Caused by: com.sap.tc.webdynpro.services.exceptions.WDRuntimeException: Failed to create J2EE cluster node in SLD for 'J2E.SystemHome.bc347792': com.sap.lcr.api.cimclient.LcrException: CIM_ERR_NOT_FOUND: No such instance: SAP_J2EEEngineCluster.CreationClassName="SAP_J2EEEngineCluster",Name="J2E.SystemHome.bc347792"
    Any help will be appreciated!

    I figured it out, for those who may have a similar problem.
    Although I had created and tested my JCo connections properly and they were working fine, somehow, and I still don't know why, they went RED in the JCo Maintenance screen.
    I had to "create" them again and it works fine now.

  • SQL LOG Backup failed in one Cluster Node

    I have a two-node SQL Server failover cluster (node 01 and node 02) and have configured a SQL log backup job via SQL log shipping.
    When the SQL service is running on node 02, the backup job works without any issues. Once it moves to node 01, the job fails with the error below:
    Executed as user: <domain>\administrator. The process could not be created for step 1 of job 0xAC90A0F3623AE44285089E9EF53B12C7 (reason: The system cannot find the file specified).  The step failed.
    Does anyone have a fix for this?
    Thanks

    Does SQL Server Agent on both nodes run under the same domain account?
    Are you sure that the path location is correct on both nodes?
    Best Regards, Uri Dimant, SQL Server MVP
    http://sqlblog.com/blogs/uri_dimant/

  • Cluster node fails after testing removing both interconnects in a two node

    Hi,
    A cluster node panics and fails to join the cluster after a test in which both interconnects were removed in a two-node cluster. The cluster is up on one node, but the panicked node fails to rejoin, reporting that there is not sufficient quorum yet and that both cluster interconnects have failed (even after reconnecting the interconnects). The quorum device is a shared disk.
    Is this a bug?
    Is there any workaround or solution?
    The cluster is version 3.2 on SPARC.
    Thanking you
    Ushas Symon

    Sounds like a networking problem to me. If the failed node genuinely can't communicate with the remaining node then it will not be allowed to join the cluster, hence the quorum message. I would suspect either:
    * Misconnected cables
    * A switch that has blocked or disabled the port
    * A failed auto-negotiation
    This is of course without knowing anything about what your network infrastructure actually is!
    Tim
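
The quorum message can be illustrated with a little vote arithmetic. This is only a sketch under assumptions: it uses the usual majority-vote model for a two-node cluster with one shared quorum disk (one vote per node, one vote for the quorum device). The vote counts and the names `has_quorum`, `NODE_VOTES` and `QUORUM_DEVICE_VOTES` are illustrative and are not taken from the poster's configuration.

```python
def has_quorum(votes_held, total_votes):
    """A partition may form or join the cluster only with a strict majority of votes."""
    return votes_held > total_votes // 2


# Assumed vote counts for a two-node cluster with one shared quorum disk.
NODE_VOTES = 1
QUORUM_DEVICE_VOTES = 1
TOTAL_VOTES = 2 * NODE_VOTES + QUORUM_DEVICE_VOTES  # 3 votes configured in total

# The surviving node holds its own vote plus the quorum disk's vote.
surviving_node_votes = NODE_VOTES + QUORUM_DEVICE_VOTES

# A node that cannot reach its peer and does not own the quorum disk
# can only count its own vote.
isolated_node_votes = NODE_VOTES

print(has_quorum(surviving_node_votes, TOTAL_VOTES))  # True
print(has_quorum(isolated_node_votes, TOTAL_VOTES))   # False
```

Under these assumptions the isolated node holds 1 of 3 votes, so it keeps waiting for quorum until it can actually talk to the other node over an interconnect again.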
    ---
