Cluster service is requested to stop on all nodes when DNS is unavailable

Our 6-node Coherence cluster had been running fine for a few days. All Coherence nodes were requested to stop the cluster service when the DNS server was unavailable for a few minutes during scheduled maintenance. The cluster services didn't come back up until the DNS server was available again. Why would it need a DNS server when the cluster had already started and had been running fine for days?
Here’s the error message and thread dump from the logs:
2010-12-18 18:07:18.819/3464791.277 Oracle Coherence GE 3.6.0.3 <Error> (thread=IpMonitor, member=7): Detected hard timeout) of {WrapperGuardable Guard{Daemon=Cluster} Service=ClusterService{Name=Cluster, State=(SERVICE_STARTED, STATE_JOINED), Id=0, Version=3.6, OldestMemberId=5}}
2010-12-18 18:07:18.823/3464791.281 Oracle Coherence GE 3.6.0.3 <Error> (thread=Termination Thread, member=7): Full Thread Dump
Thread[Invocation:Management:EventDispatcher,5,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.Service$EventDispatcher.onWait(Service.CDB:7)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[Logger@9250962 3.6.0.3,3,main]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[Signal Dispatcher,9,system]
Thread[Finalizer,8,system]
java.lang.Object.wait(Native Method)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
Thread[Invocation:Management,5,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:6)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
ThreadCluster
java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:850)
java.net.InetAddress.getAddressFromNameService(InetAddress.java:1201)
java.net.InetAddress.getAllByName0(InetAddress.java:1154)
java.net.InetAddress.getAllByName(InetAddress.java:1084)
java.net.InetAddress.getAllByName(InetAddress.java:1020)
java.net.InetAddress.getByName(InetAddress.java:970)
java.net.InetSocketAddress.<init>(InetSocketAddress.java:124)
com.tangosol.net.ConfigurableAddressProvider$AddressHolder.getAddress(ConfigurableAddressProvider.java:426)
com.tangosol.net.ConfigurableAddressProvider$1.next(ConfigurableAddressProvider.java:167)
java.util.AbstractCollection.contains(AbstractCollection.java:89)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ClusterService.isWellKnown(ClusterService.CDB:5)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ClusterService.compareImportance(ClusterService.CDB:7)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ClusterService.getWitnessMemberSet(ClusterService.CDB:49)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ClusterService.verifyMemberLeft(ClusterService.CDB:91)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ClusterService.onNotifyTcmpTimeout(ClusterService.CDB:11)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ClusterService$NotifyTcmpTimeout.onReceived(ClusterService.CDB:1)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onMessage(Grid.CDB:11)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onNotify(Grid.CDB:33)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ClusterService.onNotify(ClusterService.CDB:3)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[main,5,main]
java.lang.Object.wait(Native Method)
com.tangosol.net.DefaultCacheServer.monitorServices(DefaultCacheServer.java:270)
com.tangosol.net.DefaultCacheServer.startAndMonitor(DefaultCacheServer.java:56)
com.tangosol.net.DefaultCacheServer.main(DefaultCacheServer.java:197)
Thread[PacketReceiver,7,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketReceiver.onWait(PacketReceiver.CDB:2)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[PacketSpeaker,8,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.queue.ConcurrentQueue.waitForEntry(ConcurrentQueue.CDB:16)
com.tangosol.coherence.component.util.queue.ConcurrentQueue.remove(ConcurrentQueue.CDB:7)
com.tangosol.coherence.component.util.Queue.remove(Queue.CDB:1)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketSpeaker.onNotify(PacketSpeaker.CDB:21)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[Termination Thread,6,Cluster]
java.lang.Thread.dumpThreads(Native Method)
java.lang.Thread.getAllStackTraces(Thread.java:1487)
com.tangosol.net.GuardSupport.logStackTraces(GuardSupport.java:810)
com.tangosol.coherence.component.net.Cluster$DefaultFailurePolicy.onGuardableTerminate(Cluster.CDB:4)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$WrapperGuardable.terminate(Grid.CDB:1)
com.tangosol.net.GuardSupport$Context$2.run(GuardSupport.java:677)
java.lang.Thread.run(Thread.java:619)
Thread[Reference Handler,10,system]
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
Thread[PacketPublisher,6,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketPublisher.onWait(PacketPublisher.CDB:2)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[DistributedCache,5,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:6)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[IpMonitor,6,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.IpMonitor.onWait(IpMonitor.CDB:4)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[PacketListener1P,8,Cluster]
java.net.PlainDatagramSocketImpl.receive0(Native Method)
java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
java.net.DatagramSocket.receive(DatagramSocket.java:725)
com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:22)
com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:1)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:20)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[PacketListener1,8,Cluster]
java.net.PlainDatagramSocketImpl.receive0(Native Method)
java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
java.net.DatagramSocket.receive(DatagramSocket.java:725)
com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:22)
com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:1)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:20)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
2010-12-18 18:07:18.823/3464791.281 Oracle Coherence GE 3.6.0.3 <Warning> (thread=Termination Thread, member=7): Terminating Guard{Daemon=Cluster}
2010-12-18 18:07:18.823/3464791.281 Oracle Coherence GE 3.6.0.3 <Error> (thread=StopService, member=7): Requested to stop cluster service.
2010-12-18 18:07:18.826/3464791.284 Oracle Coherence GE 3.6.0.3 <D5> (thread=DistributedCache, member=7): Service DistributedCache left the cluster
2010-12-18 18:07:18.826/3464791.284 Oracle Coherence GE 3.6.0.3 <D5> (thread=Invocation:Management, member=7): Service Management left the cluster
2010-12-18 18:07:24.904/3464797.362 Oracle Coherence GE 3.6.0.3 <Error> (thread=main, member=7): Failed to restart services: com.tangosol.net.RequestTimeoutException: Timeout while waiting for cluster to stop.
2010-12-18 18:07:33.915/3464806.373 Oracle Coherence GE 3.6.0.3 <Error> (thread=main, member=7): Failed to restart services: com.tangosol.net.RequestTimeoutException: Timeout while waiting for cluster to stop.
2010-12-18 18:07:42.924/3464815.382 Oracle Coherence GE 3.6.0.3 <Error> (thread=main, member=7): Failed to restart services: com.tangosol.net.RequestTimeoutException: Timeout while waiting for cluster to stop.
2010-12-18 18:07:51.936/3464824.394 Oracle Coherence GE 3.6.0.3 <Error> (thread=main, member=7): Failed to restart services: com.tangosol.net.RequestTimeoutException: Timeout while waiting for cluster to stop.

The log file shows that the list of addresses is formed from IP addresses, but they are configured using hostnames in the override file.
Here's the log entry:
WellKnownAddressList(Size=2,
WKA{Address=165.X.X.XX7, Port=8088}
WKA{Address=165.X.X.XX8, Port=8088}
Here's the configuration from tangosol-coherence-override-prod.xml:
<well-known-addresses>
<socket-address id="1">
<address system-property="tangosol.coherence.wka">serverA</address>
<port system-property="tangosol.coherence.wka.port">8088</port>
</socket-address>
<socket-address id="2">
<address system-property="tangosol.coherence.wka">serverB</address>
<port system-property="tangosol.coherence.wka.port">8088</port>
</socket-address>
</well-known-addresses>
Thanks,
Ramesh
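A possible reading of the thread dump: the Cluster thread is not waiting on the cluster network at all. It is blocked in java.net.Inet6AddressImpl.lookupAllHostAddr, reached from ConfigurableAddressProvider$AddressHolder.getAddress through new InetSocketAddress(...), while ClusterService.isWellKnown checks membership against the WKA list. That suggests each well-known address is rebuilt from the configured hostname, and the standard InetSocketAddress(String, int) constructor resolves names through InetAddress.getByName. The JVM caches successful lookups only for a limited time (controlled by the networkaddress.cache.ttl security property; on Sun/Oracle JDKs typically 30 seconds when no security manager is installed), so even after days of uptime a fresh lookup can still go to DNS, stall long enough for the guardian's hard timeout, and then keep the restart from completing. The sketch below only illustrates the underlying java.net behaviour under that assumption; WkaAddressDemo and its methods are hypothetical names, not Coherence APIs.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.security.Security;

// Hypothetical illustration of java.net name resolution; not Coherence code.
public class WkaAddressDemo {

    // Builds a socket address from a configured host string, as the stack trace
    // suggests happens during isWellKnown(). This constructor always calls
    // InetAddress.getByName(host): for a hostname without a valid cached entry that
    // means a name-service (DNS) lookup, which can block; if the lookup fails, the
    // address is left unresolved instead of throwing.
    static InetSocketAddress fromConfig(String host, int port) {
        return new InetSocketAddress(host, port);
    }

    // For an IP literal, InetAddress.getByName only checks the address format;
    // no name-service lookup is made, so a DNS outage cannot stall this call.
    static InetAddress fromLiteral(String ipLiteral) throws UnknownHostException {
        return InetAddress.getByName(ipLiteral);
    }

    // createUnresolved never contacts DNS; resolution (and its failure handling)
    // is left to the caller, at a time of the caller's choosing.
    static InetSocketAddress deferred(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) throws Exception {
        InetSocketAddress literal = fromConfig("127.0.0.1", 8088);
        System.out.println("literal resolved? " + !literal.isUnresolved());

        InetSocketAddress lazy = deferred("serverA", 8088);
        System.out.println("deferred resolved? " + !lazy.isUnresolved());

        // Prints the explicitly configured cache TTL, or null if it is not set
        // (in which case the JDK's built-in default applies).
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

If this reading is right, configuring the WKA entries as IP literals rather than hostnames would keep this code path away from DNS, at the cost of having to update the configuration whenever a server's address changes.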

Similar Messages

The cluster service terminated, error 7024, cannot create a file when that file already exists

I have a test two-node failover cluster running Windows Server 2012 R2.
As of last night, the Cluster service on one of the two nodes has been down with this error:
The Cluster Service service terminated with the following service-specific error:
Cannot create a file when that file already exists.
EventID 7024
The Cluster service waits 60 seconds, tries to start, and the same error occurs again.
Any idea where to look to identify which file this error refers to, or how to go about identifying the root cause and finding a solution?
Thank you.
samb

Hi Yeswanth,
Then you can try the "Add Counter" option. This creates a new file each time with the same name, but with a counter appended to the end of the file name indicating how many times it has been created.
You can also specify the counter format: once you select this option, fill in the Format and Step fields accordingly.
Will this work for you?
Regards
Ashmi

Stop, start all nodes.

To shut down the database instances on all nodes in a clustered environment, I use:
srvctl stop/start database -d dbname
Likewise, what is the best way to do this for ASM and the cluster?
Thanks!

There is no command to start up from a single node, or a group command. Correct?
Yes, that's correct.

EAR file is not deployed on all nodes when using SDM/Visual admin

Hi
We have a high-availability portal landscape with multiple application servers. When we deploy our custom applications (.EAR) using either SDM or Visual Administrator, the file is always deployed only onto the Central Instance, and end users sometimes see a blank screen because the load balancer routes the request to another node (a non-Central Instance).
Any helpful answer will be awarded points!
Thanks
Lakshmi

Hi Lakshmi,
Restarting the SAP System should synchronize the components among the application servers.
You can also check whether any of the components need to be updated in the deployment overview at System Administration -> Support -> Support Desk -> Portal Runtime -> Deployment Overview.
This link might be of help.
http://help.sap.com/saphelp_nw70/helpdata/en/f7/71b842b714b211e10000000a155106/frameset.htm
Regards,
Abhishek

The Cluster service is shutting down because quorum was lost

Hi, we recently experienced the above issue, and after looking for explanations I haven't been able to find a satisfying answer in other posts about it.
Our problem is as follows:
Two-node Windows Server 2008 R2 cluster running SQL Server 2012.
Each node is an HP BL460c running in an HP C7000 blade chassis.
We were updating the FlexFabric cards in one of the chassis. The other chassis had been patched the previous week with no problems.
During the update, the FlexFabric cards, which carry the Ethernet and FC connections, reboot, so before work began all active cluster services were failed over to the node in the chassis not being worked on. Despite this, the Cluster service shut down on this one particular cluster. All other clusters running across these two chassis continued to run as expected.
As others have reported, we saw the following errors in the System log:
1564: File share witness resource 'File Share Witness' failed to arbitrate for the file share
1069: Cluster resource 'File Share Witness' in clustered service or application 'Cluster Group' failed.
1172: The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
However, we can't understand what could cause this when the service was running on the node in the chassis that was not being updated, especially since the same update was performed the week before with no issues. How can both nodes lose connectivity to the File Share Witness at the same time?
Cluster Validation tests run fine and don't highlight any issues. The file share witness is accessible from both servers.

Hi,
Please confirm that you have installed the updates listed in "Recommended hotfixes and updates for Windows Server 2008 R2 SP1 Failover Clusters", especially the following:
The network location profile changes from "Domain" to "Public" in Windows 7 or in Windows Server 2008 R2
http://support.microsoft.com/kb/2524478/EN-US
A hotfix is available that adds two new cluster control codes to help you determine which cluster node is blocking a GUM update in Windows Server 2008 R2 and Windows Server 2012
http://support.microsoft.com/kb/2779069/EN-US
Hope this helps.

Cluster services UNKNOWN state

Hi,
I have a two-node cluster database. I have some doubts:
If the cluster services go into the UNKNOWN state on the first node, will existing connections fail over to the second node?
Will new connections still try to connect to the first node?

user2017273 wrote:
> Hi,
> I have a two-node cluster database. I have some doubts.
Quit doubting and TEST it for yourself. Actually reading the documentation will also help.
> If the cluster services go into the UNKNOWN state on the first node, will existing connections fail over to the second node?
Maybe...
> Will new connections still try to connect to the first node?
If node x is down, any connection attempt should go to the remaining nodes.

Error in coherence-- stopping cluster service.

I found the following error in one of my Coherence server log files. Can someone explain what it means?
Coherence Logger@9272718 3.4.2/411 ERROR 2009-06-01 16:08:31.396/1217.130 Oracle Coherence GE 3.4.2/411 <Error> (thread=Cluster, member=3): Received cluster heartbeat from the senior Member(Id=7, Timestamp=2009-04-24 12:29:25.802, Address=xx.xxx.xx.xxx:8093, MachineId=55400, Location=machine:server72,process:11324, Role=WeblogicServer) that does not contain this Member(Id=3, Timestamp=2009-06-01 15:48:09.18, Address=xx.xxx.xxx.xx:8091, MachineId=47428, Location=site:ops.company.org,machine:cohserverbox1,process:14401, Role=CoherenceServer); stopping cluster service.
Thanks Much

Hi,
This error essentially means what it says: the process received a cluster heartbeat that did not include the process as a member of the cluster. The process therefore stops its cluster service and will attempt to join the cluster again when appropriate. There are a few reasons why the senior member may not have included the process in its heartbeat. Based on the timestamps and roles, I would first want to confirm the intent to cluster these processes. If the intent is not to cluster these processes, I would adjust their configurations appropriately (e.g. use a distinct port) to form separate clusters. If the intent is to cluster these processes and the error (with the timestamp spread) reproduces, I would want to examine the network topology and look for reasons the members are being dropped from the cluster.
Regards,
Harv
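The rule Harv describes can be modelled in a few lines. This is a deliberately simplified, hypothetical Java model (HeartbeatCheck, Action and the integer member-id sets are invented for illustration; they are not Coherence classes and ignore everything else the real cluster service does): a member looks for its own id in the member set carried by the senior member's heartbeat and, if it is missing, stops its cluster service so that it can later try to rejoin.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the heartbeat rule described above; not Coherence code.
public class HeartbeatCheck {

    enum Action { CONTINUE, STOP_AND_REJOIN }

    // seniorMemberSet: member ids listed in the senior member's heartbeat.
    // selfId: this process's own member id.
    static Action onSeniorHeartbeat(Set<Integer> seniorMemberSet, int selfId) {
        // If the senior member no longer lists this process, the rest of the cluster
        // treats it as gone, so the process stops its cluster service and rejoins later.
        return seniorMemberSet.contains(selfId) ? Action.CONTINUE : Action.STOP_AND_REJOIN;
    }

    public static void main(String[] args) {
        // Ids loosely based on the log above: senior member 7's heartbeat omits member 3.
        Set<Integer> heartbeat = new HashSet<>(Arrays.asList(1, 2, 7));
        System.out.println(onSeniorHeartbeat(heartbeat, 3)); // STOP_AND_REJOIN
        System.out.println(onSeniorHeartbeat(heartbeat, 7)); // CONTINUE
    }
}
```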

Pre-check for cluster services setup was unsuccessful on all the nodes.

Hi,
When I run the fixup script, I get the output below. If I run cluvfy again, I get another fixup script. What exactly should I do?
[root@rac-1 grid1]# sh /tmp/CVU_11.2.0.1.0_grid1/runfixup.sh
Response file being used is :/tmp/CVU_11.2.0.1.0_grid1/fixup.response
Enable file being used is :/tmp/CVU_11.2.0.1.0_grid1/fixup.enable
Log file location: /tmp/CVU_11.2.0.1.0_grid1/orarun.log
uid=1100(grid1) gid=1000(oinstall) groups=1000(oinstall),1100(dba),1200(asmdba),1300(asmadmin),1202(asmoper)
grid1     hard    nproc    16384
Value of MAX PROCESSES HARDLIMIT in response file is not greater than value in/etc/security/limits.conf. Hence not changing it.
grid1     hard    nofile   65536
Value of FILE OPEN MAX HARDLIMIT in response file is not greater than value in /etc/security/limits.conf.Hence not changing it.
uid=1100(grid1) gid=1000(oinstall) groups=1000(oinstall),1100(dba),1200(asmdba),1300(asmadmin),1202(asmoper)
[root@rac-1 grid1]#
Performing pre-checks for cluster services setup
Checking node reachability...
Check: Node reachability from node "rac-1"
Destination Node                      Reachable?
rac-2                                 yes
rac-1                                 yes
Result: Node reachability check passed from node "rac-1"
Checking user equivalence...
Check: User equivalence for user "grid1"
Node Name                             Comment
rac-2                                 passed
rac-1                                 passed
Result: User equivalence check passed for user "grid1"
Checking node connectivity...
Checking hosts config file...
Node Name     Status                    Comment
rac-2         passed
rac-1         passed
Verification of the hosts config file successful
Interface information for node "rac-2"
Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
eth0   192.168.1.3     192.168.1.0     0.0.0.0         192.168.1.1     00:1D:72:39:3A:E4 1500
virbr0 192.168.122.1   192.168.122.0   0.0.0.0         192.168.1.1     00:00:00:00:00:00 1500
eth1   192.168.181.20 192.168.181.0   0.0.0.0         192.168.1.1     00:00:00:00:00:00 1500
Interface information for node "rac-1"
Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
eth0   192.168.1.2     192.168.1.0     0.0.0.0         192.168.1.1     00:00:E8:F7:02:B0 1500
eth1   192.168.181.10 192.168.181.0   0.0.0.0         192.168.1.1     00:26:18:59:EE:49 1500
virbr0 192.168.122.1   192.168.122.0   0.0.0.0         192.168.1.1     00:00:00:00:00:00 1500
Check: Node connectivity of subnet "192.168.1.0"
Source                          Destination                     Connected?
rac-2:eth0                      rac-1:eth0                      yes
Result: Node connectivity passed for subnet "192.168.1.0" with node(s) rac-2,rac-1
Check: TCP connectivity of subnet "192.168.1.0"
Source                          Destination                     Connected?
rac-1:192.168.1.2               rac-2:192.168.1.3               passed
Result: TCP connectivity check passed for subnet "192.168.1.0"
Check: Node connectivity of subnet "192.168.122.0"
Source                          Destination                     Connected?
rac-2:virbr0                    rac-1:virbr0                    yes
Result: Node connectivity passed for subnet "192.168.122.0" with node(s) rac-2,rac-1
Check: TCP connectivity of subnet "192.168.122.0"
Result: TCP connectivity check failed for subnet "192.168.122.0"
Check: Node connectivity of subnet "192.168.181.0"
Source                          Destination                     Connected?
rac-2:eth1                      rac-1:eth1                      yes
Result: Node connectivity passed for subnet "192.168.181.0" with node(s) rac-2,rac-1
Check: TCP connectivity of subnet "192.168.181.0"
Source                          Destination                     Connected?
rac-1:192.168.181.10            rac-2:192.168.181.20            passed
Result: TCP connectivity check passed for subnet "192.168.181.0"
Interfaces found on subnet "192.168.1.0" that are likely candidates for VIP are:
rac-2 eth0:192.168.1.3
rac-1 eth0:192.168.1.2
Interfaces found on subnet "192.168.122.0" that are likely candidates for a private interconnect are:
rac-2 virbr0:192.168.122.1
rac-1 virbr0:192.168.122.1
Interfaces found on subnet "192.168.181.0" that are likely candidates for a private interconnect are:
rac-2 eth1:192.168.181.20
rac-1 eth1:192.168.181.10
Result: Node connectivity check passed
Check: Total memory
Node Name     Available                 Required                  Comment
rac-2         1.96GB (2050416.0KB)      1.5GB (1572864.0KB)       passed
rac-1         1.96GB (2058984.0KB)      1.5GB (1572864.0KB)       passed
Result: Total memory check passed
Check: Available memory
Node Name     Available                 Required                  Comment
rac-2         1.7GB (1780600.0KB)       50MB (51200.0KB)          passed
rac-1         1.56GB (1636896.0KB)      50MB (51200.0KB)          passed
Result: Available memory check passed
Check: Swap space
Node Name     Available                 Required                  Comment
rac-2         4GB (4194296.0KB)         2.93GB (3075624.0KB)      passed
rac-1         4GB (4192956.0KB)         2.95GB (3088476.0KB)      passed
Result: Swap space check passed
Check: Free disk space for "rac-2:/tmp"
Path              Node Name     Mount point   Available     Required      Comment
/tmp              rac-2         /             24.03GB       1GB           passed
Result: Free disk space check passed for "rac-2:/tmp"
Check: Free disk space for "rac-1:/tmp"
Path              Node Name     Mount point   Available     Required      Comment
/tmp              rac-1         /             16.54GB       1GB           passed
Result: Free disk space check passed for "rac-1:/tmp"
Check: User existence for "grid1"
Node Name     Status                    Comment
rac-2         exists                    passed
rac-1         exists                    passed
Result: User existence check passed for "grid1"
Check: Group existence for "oinstall"
Node Name     Status                    Comment
rac-2         exists                    passed
rac-1         exists                    passed
Result: Group existence check passed for "oinstall"
Check: Group existence for "dba"
Node Name     Status                    Comment
rac-2         exists                    passed
rac-1         exists                    passed
Result: Group existence check passed for "dba"
Check: Membership of user "grid1" in group "oinstall" [as Primary]
Node Name         User Exists   Group Exists User in Group Primary       Comment
rac-2             yes           yes           yes           yes           passed
rac-1             yes           yes           yes           yes           passed
Result: Membership check for user "grid1" in group "oinstall" [as Primary] passed
Check: Membership of user "grid1" in group "dba"
Node Name         User Exists   Group Exists User in Group Comment
rac-2             yes           yes           no            failed
rac-1             yes           yes           yes           passed
Result: Membership check for user "grid1" in group "dba" failed
Check: Run level
Node Name     run level                 Required                  Comment
rac-2         5                         3,5                       passed
rac-1         5                         3,5                       passed
Result: Run level check passed
Check: Hard limits for "maximum open file descriptors"
Node Name         Type          Available     Required      Comment
rac-2             hard          65536         65536         passed
rac-1             hard          65536         65536         passed
Result: Hard limits check passed for "maximum open file descriptors"
Check: Soft limits for "maximum open file descriptors"
Node Name         Type          Available     Required      Comment
rac-2             soft          1024          1024          passed
rac-1             soft          65536         1024          passed
Result: Soft limits check passed for "maximum open file descriptors"
Check: Hard limits for "maximum user processes"
Node Name         Type          Available     Required      Comment
rac-2             hard          16384         16384         passed
rac-1             hard          16384         16384         passed
Result: Hard limits check passed for "maximum user processes"
Check: Soft limits for "maximum user processes"
Node Name         Type          Available     Required      Comment
rac-2             soft          2047          2047          passed
rac-1             soft          16384         2047          passed
Result: Soft limits check passed for "maximum user processes"
Check: System architecture
Node Name     Available                 Required                  Comment
rac-2         x86_64                    x86_64                    passed
rac-1         x86_64                    x86_64                    passed
Result: System architecture check passed
Check: Kernel version
Node Name     Available                 Required                  Comment
rac-2         2.6.18-92.el5             2.6.18                    passed
rac-1         2.6.18-164.el5            2.6.18                    passed
WARNING:
PRVF-7524 : Kernel version is not consistent across all the nodes.
Kernel version = "2.6.18-164.el5" found on nodes: rac-1.
Kernel version = "2.6.18-92.el5" found on nodes: rac-2.
Result: Kernel version check passed
Check: Kernel parameter for "semmsl"
Node Name     Configured                Required                  Comment
rac-2         250                       250                       passed
rac-1         250                       250                       passed
Result: Kernel parameter check passed for "semmsl"
Check: Kernel parameter for "semmns"
Node Name     Configured                Required                  Comment
rac-2         32000                     32000                     passed
rac-1         32000                     32000                     passed
Result: Kernel parameter check passed for "semmns"
Check: Kernel parameter for "semopm"
Node Name     Configured                Required                  Comment
rac-2         100                       100                       passed
rac-1         100                       100                       passed
Result: Kernel parameter check passed for "semopm"
Check: Kernel parameter for "semmni"
Node Name     Configured                Required                  Comment
rac-2         142                       128                       passed
rac-1         142                       128                       passed
Result: Kernel parameter check passed for "semmni"
Check: Kernel parameter for "shmmax"
Node Name     Configured                Required                  Comment
rac-2         1049812992                536870912                 passed
rac-1         4398046511104             536870912                 passed
Result: Kernel parameter check passed for "shmmax"
Check: Kernel parameter for "shmmni"
Node Name     Configured                Required                  Comment
rac-2         4096                      4096                      passed
rac-1         4096                      4096                      passed
Result: Kernel parameter check passed for "shmmni"
Check: Kernel parameter for "shmall"
Node Name     Configured                Required                  Comment
rac-2         3279547                   2097152                   passed
rac-1         1073741824                2097152                   passed
Result: Kernel parameter check passed for "shmall"
Check: Kernel parameter for "file-max"
Node Name     Configured                Required                  Comment
rac-2         6815744                   6815744                   passed
rac-1         6815744                   6815744                   passed
Result: Kernel parameter check passed for "file-max"
Check: Kernel parameter for "ip_local_port_range"
Node Name     Configured                Required                  Comment
rac-2         between 9000 & 65500      between 9000 & 65500      passed
rac-1         between 9000 & 65500      between 9000 & 65500      passed
Result: Kernel parameter check passed for "ip_local_port_range"
Check: Kernel parameter for "rmem_default"
Node Name     Configured                Required                  Comment
rac-2         262144                    262144                    passed
rac-1         4194304                   262144                    passed
Result: Kernel parameter check passed for "rmem_default"
Check: Kernel parameter for "rmem_max"
Node Name     Configured                Required                  Comment
rac-2         4194304                   4194304                   passed
rac-1         4194304                   4194304                   passed
Result: Kernel parameter check passed for "rmem_max"
Check: Kernel parameter for "wmem_default"
Node Name     Configured                Required                  Comment
rac-2         262144                    262144                    passed
rac-1         262144                    262144                    passed
Result: Kernel parameter check passed for "wmem_default"
Check: Kernel parameter for "wmem_max"
Node Name     Configured                Required                  Comment
rac-2         1048576                   1048576                   passed
rac-1         1048576                   1048576                   passed
Result: Kernel parameter check passed for "wmem_max"
Check: Kernel parameter for "aio-max-nr"
Node Name     Configured                Required                  Comment
rac-2         3145728                   1048576                   passed
rac-1         3145728                   1048576                   passed
Result: Kernel parameter check passed for "aio-max-nr"
Check: Package existence for "ocfs2-tools-1.2.7"
Node Name     Available                 Required                  Comment
rac-2         ocfs2-tools-1.2.7-1.el5   ocfs2-tools-1.2.7         passed
rac-1         ocfs2-tools-1.4.2-1.el5   ocfs2-tools-1.2.7         passed
Result: Package existence check passed for "ocfs2-tools-1.2.7"
Check: Package existence for "make-3.81"
Node Name     Available                 Required                  Comment
rac-2         make-3.81-3.el5           make-3.81                 passed
rac-1         make-3.81-3.el5           make-3.81                 passed
Result: Package existence check passed for "make-3.81"
Check: Package existence for "binutils-2.17.50.0.6"
Node Name     Available                 Required                  Comment
rac-2         binutils-2.17.50.0.6-6.el5 binutils-2.17.50.0.6      passed
rac-1         binutils-2.17.50.0.6-12.el5 binutils-2.17.50.0.6      passed
Result: Package existence check passed for "binutils-2.17.50.0.6"
Check: Package existence for "gcc-4.1.2"
Node Name     Available                 Required                  Comment
rac-2         gcc-4.1.2-42.el5          gcc-4.1.2                 passed
rac-1         gcc-4.1.2-46.el5          gcc-4.1.2                 passed
Result: Package existence check passed for "gcc-4.1.2"
Check: Package existence for "libaio-0.3.106 (i386)"
Node Name     Available                 Required                  Comment
rac-2         libaio-0.3.106-3.2 (i386) libaio-0.3.106 (i386)     passed
rac-1         libaio-0.3.106-3.2 (i386) libaio-0.3.106 (i386)     passed
Result: Package existence check passed for "libaio-0.3.106 (i386)"
Check: Package existence for "libaio-0.3.106 (x86_64)"
Node Name     Available                 Required                  Comment
rac-2         libaio-0.3.106-3.2 (x86_64) libaio-0.3.106 (x86_64)   passed
rac-1         libaio-0.3.106-3.2 (x86_64) libaio-0.3.106 (x86_64)   passed
Result: Package existence check passed for "libaio-0.3.106 (x86_64)"
Check: Package existence for "glibc-2.5-24 (i686)"
Node Name     Available                 Required                  Comment
rac-2         glibc-2.5-24 (i686)       glibc-2.5-24 (i686)       passed
rac-1         glibc-2.5-42 (i686)       glibc-2.5-24 (i686)       passed
Result: Package existence check passed for "glibc-2.5-24 (i686)"
Check: Package existence for "glibc-2.5-24 (x86_64)"
Node Name     Available                 Required                  Comment
rac-2         glibc-2.5-24 (x86_64)     glibc-2.5-24 (x86_64)     passed
rac-1         glibc-2.5-42 (x86_64)     glibc-2.5-24 (x86_64)     passed
Result: Package existence check passed for "glibc-2.5-24 (x86_64)"
Check: Package existence for "compat-libstdc++-33-3.2.3 (i386)"
Node Name     Available                 Required                  Comment
rac-2         compat-libstdc++-33-3.2.3-61 (i386) compat-libstdc++-33-3.2.3 (i386) passed
rac-1         compat-libstdc++-33-3.2.3-61 (i386) compat-libstdc++-33-3.2.3 (i386) passed
Result: Package existence check passed for "compat-libstdc++-33-3.2.3 (i386)"
Check: Package existence for "compat-libstdc++-33-3.2.3 (x86_64)"
Node Name     Available                 Required                  Comment
rac-2         compat-libstdc++-33-3.2.3-61 (x86_64) compat-libstdc++-33-3.2.3 (x86_64) passed
rac-1         compat-libstdc++-33-3.2.3-61 (x86_64) compat-libstdc++-33-3.2.3 (x86_64) passed
Result: Package existence check passed for "compat-libstdc++-33-3.2.3 (x86_64)"
Check: Package existence for "elfutils-libelf-0.125 (x86_64)"
Node Name     Available                 Required                  Comment
rac-2         elfutils-libelf-0.125-3.el5 (x86_64) elfutils-libelf-0.125 (x86_64) passed
rac-1         elfutils-libelf-0.137-3.el5 (x86_64) elfutils-libelf-0.125 (x86_64) passed
Result: Package existence check passed for "elfutils-libelf-0.125 (x86_64)"
Check: Package existence for "elfutils-libelf-devel-0.125"
Node Name     Available                 Required                  Comment
rac-2         elfutils-libelf-devel-0.125-3.el5 elfutils-libelf-devel-0.125 passed
rac-1         elfutils-libelf-devel-0.137-3.el5 elfutils-libelf-devel-0.125 passed
Result: Package existence check passed for "elfutils-libelf-devel-0.125"
Check: Package existence for "glibc-common-2.5"
Node Name     Available                 Required                  Comment
rac-2         glibc-common-2.5-24       glibc-common-2.5          passed
rac-1         glibc-common-2.5-42       glibc-common-2.5          passed
Result: Package existence check passed for "glibc-common-2.5"
Check: Package existence for "glibc-devel-2.5 (i386)"
Node Name     Available                 Required                  Comment
rac-2         glibc-devel-2.5-24 (i386) glibc-devel-2.5 (i386)    passed
rac-1         glibc-devel-2.5-42 (i386) glibc-devel-2.5 (i386)    passed
Result: Package existence check passed for "glibc-devel-2.5 (i386)"
Check: Package existence for "glibc-devel-2.5 (x86_64)"
Node Name     Available                 Required                  Comment
rac-2         glibc-devel-2.5-24 (x86_64) glibc-devel-2.5 (x86_64) passed
rac-1         glibc-devel-2.5-42 (x86_64) glibc-devel-2.5 (x86_64) passed
Result: Package existence check passed for "glibc-devel-2.5 (x86_64)"
Check: Package existence for "glibc-headers-2.5"
Node Name     Available                 Required                  Comment
rac-2         glibc-headers-2.5-24      glibc-headers-2.5         passed
rac-1         glibc-headers-2.5-42      glibc-headers-2.5         passed
Result: Package existence check passed for "glibc-headers-2.5"
Check: Package existence for "gcc-c++-4.1.2"
Node Name     Available                 Required                  Comment
rac-2         gcc-c++-4.1.2-42.el5      gcc-c++-4.1.2             passed
rac-1         gcc-c++-4.1.2-46.el5      gcc-c++-4.1.2             passed
Result: Package existence check passed for "gcc-c++-4.1.2"
Check: Package existence for "libaio-devel-0.3.106 (i386)"
Node Name     Available                 Required                  Comment
rac-2         libaio-devel-0.3.106-3.2 (i386) libaio-devel-0.3.106 (i386) passed
rac-1         libaio-devel-0.3.106-3.2 (i386) libaio-devel-0.3.106 (i386) passed
Result: Package existence check passed for "libaio-devel-0.3.106 (i386)"
Check: Package existence for "libaio-devel-0.3.106 (x86_64)"
Node Name     Available                 Required                  Comment
rac-2         libaio-devel-0.3.106-3.2 (x86_64) libaio-devel-0.3.106 (x86_64) passed
rac-1         libaio-devel-0.3.106-3.2 (x86_64) libaio-devel-0.3.106 (x86_64) passed
Result: Package existence check passed for "libaio-devel-0.3.106 (x86_64)"
Check: Package existence for "libgcc-4.1.2 (i386)"
Node Name     Available                 Required                  Comment
rac-2         libgcc-4.1.2-42.el5 (i386) libgcc-4.1.2 (i386)       passed
rac-1         libgcc-4.1.2-46.el5 (i386) libgcc-4.1.2 (i386)       passed
Result: Package existence check passed for "libgcc-4.1.2 (i386)"
Check: Package existence for "libgcc-4.1.2 (x86_64)"
Node Name     Available                 Required                  Comment
rac-2         libgcc-4.1.2-42.el5 (x86_64) libgcc-4.1.2 (x86_64)     passed
rac-1         libgcc-4.1.2-46.el5 (x86_64) libgcc-4.1.2 (x86_64)     passed
Result: Package existence check passed for "libgcc-4.1.2 (x86_64)"
Check: Package existence for "libstdc++-4.1.2 (i386)"
Node Name     Available                 Required                  Comment
rac-2         libstdc++-4.1.2-42.el5 (i386) libstdc++-4.1.2 (i386)    passed
rac-1         libstdc++-4.1.2-46.el5 (i386) libstdc++-4.1.2 (i386)    passed
Result: Package existence check passed for "libstdc++-4.1.2 (i386)"
Check: Package existence for "libstdc++-4.1.2 (x86_64)"
Node Name     Available                 Required                  Comment
rac-2         libstdc++-4.1.2-42.el5 (x86_64) libstdc++-4.1.2 (x86_64) passed
rac-1         libstdc++-4.1.2-46.el5 (x86_64) libstdc++-4.1.2 (x86_64) passed
Result: Package existence check passed for "libstdc++-4.1.2 (x86_64)"
Check: Package existence for "libstdc++-devel-4.1.2 (x86_64)"
Node Name     Available                 Required                  Comment
rac-2         libstdc++-devel-4.1.2-42.el5 (x86_64) libstdc++-devel-4.1.2 (x86_64) passed
rac-1         libstdc++-devel-4.1.2-46.el5 (x86_64) libstdc++-devel-4.1.2 (x86_64) passed
Result: Package existence check passed for "libstdc++-devel-4.1.2 (x86_64)"
Check: Package existence for "sysstat-7.0.2"
Node Name     Available                 Required                  Comment
rac-2         sysstat-7.0.2-1.el5       sysstat-7.0.2

The oracle user and group should be identical on both nodes, but in your case they appear to differ.
Please check the user and group again.
Babu
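The check suggested above can be scripted. Below is a minimal Python sketch. It assumes you have captured the output of `id oracle` from each node (for example over ssh), and that `id` prints its usual `uid=N(name) gid=N(name) groups=N(name),...` format. The node names reuse rac-1/rac-2 from the output above, but the numeric IDs and group names in `sample` are hypothetical.

```python
import re

# One "number(name)" entry, e.g. "501(oracle)" or "502(dba)".
_ENTRY = re.compile(r"(\d+)\(([^)]*)\)")


def parse_id_output(line):
    """Parse `id <user>` output into uid/gid tuples and a set of group tuples.

    Assumes the default format, e.g.
    'uid=501(oracle) gid=501(oinstall) groups=501(oinstall),502(dba)'.
    Other fields (such as an SELinux 'context=') are ignored."""
    result = {}
    for field in line.split():
        key, _, value = field.partition("=")
        entries = [(int(num), name) for num, name in _ENTRY.findall(value)]
        if key in ("uid", "gid") and entries:
            result[key] = entries[0]
        elif key == "groups":
            result[key] = set(entries)
    return result


def compare_nodes(outputs):
    """Compare `id` output collected from several nodes.

    outputs maps node name -> id output line. Returns a list of mismatch
    descriptions, which is empty when uid, gid and group membership all agree."""
    parsed = {node: parse_id_output(text) for node, text in outputs.items()}
    nodes = sorted(parsed)
    reference = nodes[0]
    problems = []
    for node in nodes[1:]:
        for key in ("uid", "gid", "groups"):
            expected = parsed[reference].get(key)
            actual = parsed[node].get(key)
            if actual != expected:
                problems.append(f"{key} differs: {reference}={expected} {node}={actual}")
    return problems


# Hypothetical output collected by running `id oracle` on each node.
sample = {
    "rac-1": "uid=501(oracle) gid=501(oinstall) groups=501(oinstall),502(dba)",
    "rac-2": "uid=501(oracle) gid=502(dba) groups=502(dba),501(oinstall)",
}
for problem in compare_nodes(sample):
    print(problem)
```

With the sample data, the only mismatch reported is the primary group (gid) on rac-2. Group order is ignored, but numeric IDs are compared as well as names, because file ownership on shared storage is recorded by number.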

SAP Cluster service issue

Here is a description of the PRD cluster scenario (Windows 2008 + Oracle).
We have 2 nodes:
1. host-erpn01 (ASCS, Database instance, Enqueue and Dialog Instance installed)
2. host-erp02 (Central Instance, Dialog Instance and Enqueue installed)
When we move the "SAP SID" service from one node to another using the Failover Cluster Management tool, it fails, and we have to manually bring the "SAP SID cluster service" and "SAP SID cluster instance" online.
Both the service and the instance come online after this manual step. However, after some time the MMC console on node 2 shows the SAP instances hosted on node 1 with a red cross, and they report "cannot connect to sap service dcom interface error 800706BA".
We copied sapstartsrv.exe from the working directory of the ASCS instance into the CI executable directory, replacing the existing file.
Now disp+work is stopped for the CI instance. Also, the CI instance executable directory contains five files named sapstartsrv, i.e.
sapstartsrv.exe.new, sapstartsrv.exe.tmp, sapstartsrv.new, sapstartsrv.pdb and the actual sapstartsrv.exe file.
Here is sapstartsrv.log from the CI work directory on node 2:
trc file: "sapstartsrv.log", trc level: 0, release: "701"
pid 1968
Mon Oct 11 15:55:33 2010
SAP HA Trace: Build in SAP Microsoft Cluster library '701, patch 32, changelist 1046543' initialized
Initializing SAPControl Webservice
SapSSLInit failed => https support disabled
Starting WebService Named Pipe thread
Starting WebService thread
Webservice named pipe thread started, listening on port \\.\pipe\sapcontrol_01
Webservice thread started, listening on port 50113
GCCIA\csrvadmin is starting SAP System at 2010/10/11 16:09:07
SAP HA Trace: FindClusterResource: SAP resource not found [sapwinha.cpp, line 334]
SAP HA Trace: SAP_HA_FindSAPInstance returns: SAP_HA_NOT_CLUSTERED [sapwinha.cpp, line 907]
You can also view the other logs from the work directory dump at
http://s000.tinyupload.com/index.php?file_id=45384422007535688902
Now, when we try to start the SAPSID_00 service manually, it gives the error "The SAPSID_00 service failed to start due to the following error: The system cannot find the path specified."
Please advise.
Regards
Edited by: Tech GCCIA on Oct 11, 2010 3:27 PM
Edited by: Tech GCCIA on Oct 11, 2010 3:28 PM

Hi Sunil,
On node 1 there is no listener.trc in the /oracle_home/network/trace folder. Here is the listener.log file in case it is helpful.
TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 10:37:37
Copyright (c) 1991, 2007, Oracle. All rights reserved.
System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
Log messages written to D:\oracle\GCP\102\network\log\listener.log
Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
Trace level is currently 0
Started with pid=3116
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCP.WORLDipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCPipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=gccia-erpn01.gccia.com.sa)(PORT=1527)))
Listener completed notification to CRS on start
TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 11:59:37
Copyright (c) 1991, 2007, Oracle. All rights reserved.
System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
Log messages written to D:\oracle\GCP\102\network\log\listener.log
Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
Trace level is currently 0
Started with pid=5036
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCP.WORLDipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCPipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
Listener completed notification to CRS on start
TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
10-OCT-2010 12:00:31 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=60592)) * establish * GCP * 0
10-OCT-2010 12:00:31 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=60593)) * establish * GCP * 0
10-OCT-2010 12:00:31 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=60594)) * establish * GCP * 0
10-OCT-2010 12:00:31 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=60595)) * establish * GCP * 0
10-OCT-2010 12:00:31 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=60596)) * establish * GCP * 0
10-OCT-2010 13:01:19 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61336)) * establish * GCP * 0
10-OCT-2010 13:01:37 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61340)) * establish * GCP * 0
10-OCT-2010 13:01:37 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61341)) * establish * GCP * 0
10-OCT-2010 13:01:37 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61342)) * establish * GCP * 0
10-OCT-2010 13:01:37 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61343)) * establish * GCP * 0
10-OCT-2010 13:01:37 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61344)) * establish * GCP * 0
10-OCT-2010 13:08:27 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61485)) * establish * GCP * 0
10-OCT-2010 13:08:42 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61489)) * establish * GCP * 0
10-OCT-2010 13:08:42 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61490)) * establish * GCP * 0
10-OCT-2010 13:08:42 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61491)) * establish * GCP * 0
10-OCT-2010 13:08:42 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61492)) * establish * GCP * 0
10-OCT-2010 13:08:42 * (CONNECT_DATA=(SID=GCP)(GLOBAL_NAME=GCP.WORLD)(CID=(PROGRAM=D:\oracle\OFS\SRV\fs\fssvr\bin\FsSurrogate.exe)(HOST=GCCIA-ERPN01)(USER=csrvadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=61493)) * establish * GCP * 0
TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 13:09:57
Copyright (c) 1991, 2007, Oracle. All rights reserved.
System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
Log messages written to D:\oracle\GCP\102\network\log\listener.log
Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
Trace level is currently 0
Started with pid=2336
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCP.WORLDipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCPipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
Listener completed notification to CRS on start
TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 13:14:34
Copyright (c) 1991, 2007, Oracle. All rights reserved.
System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
Log messages written to D:\oracle\GCP\102\network\log\listener.log
Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
Trace level is currently 0
Started with pid=4948
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCP.WORLDipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCPipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
Listener completed notification to CRS on start
TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 13:38:12
Copyright (c) 1991, 2007, Oracle. All rights reserved.
System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
Log messages written to D:\oracle\GCP\102\network\log\listener.log
Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
Trace level is currently 0
Started with pid=2456
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCP.WORLDipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCPipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
Listener completed notification to CRS on start
TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 14:03:35
Copyright (c) 1991, 2007, Oracle. All rights reserved.
System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
Log messages written to D:\oracle\GCP\102\network\log\listener.log
Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
Trace level is currently 0
Started with pid=2756
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCP.WORLDipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCPipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
Listener completed notification to CRS on start
TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 10-OCT-2010 14:10:42
Copyright (c) 1991, 2007, Oracle. All rights reserved.
System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
Log messages written to D:\oracle\GCP\102\network\log\listener.log
Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
Trace level is currently 0
Started with pid=4812
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCP.WORLDipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\GCPipc)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
Listener completed notification to CRS on start
TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 11-OCT-2010 09:34:05
Copyright (c) 1991, 2007, Oracle. All rights reserved.
System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
Log messages written to D:\oracle\GCP\102\network\log\listener.log
Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
Trace level is currently 0
Started with pid=1920
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
Listener completed notification to CRS on start
TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
TNSLSNR for 64-bit Windows: Version 10.2.0.4.0 - Production on 11-OCT-2010 21:12:29
Copyright (c) 1991, 2007, Oracle. All rights reserved.
System parameter file is D:\oracle\GCP\102\network\admin\listener.ora
Log messages written to D:\oracle\GCP\102\network\log\listener.log
Trace information written to D:\oracle\GCP\102\network\trace\listener.trc
Trace level is currently 0
Started with pid=1952
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.11.13)(PORT=1527)))
Listener completed notification to CRS on start
TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE

The cluster resource host subsystem (RHS) stopped unexpectedly

I am getting the following error every night, and all my services get restarted. Please help.
Running Windows 2008 R2 Enterprise.
The cluster resource host subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually due to a problem in a resource DLL. Please determine which resource DLL is causing the issue and report the problem to the resource vendor.

Hi,
this hotfix contains the latest clusres.dll. Can you try it?
http://support.microsoft.com/kb/2854082
and check this list:
http://social.technet.microsoft.com/wiki/contents/articles/2008.list-of-cluster-hotfixes-for-windows-server-2008-r2.aspx
http://OpsMgr.ru/

Why am I getting ExchangeWebServices Inbox Error: Error, ErrorServerBusy. The server cannot service this request right now. Try again later

I recently switched my application that uses EWS from an on-premises Exchange Server to Exchange Online through Office 365.
The process worked just fine for several days, then I started getting the following errors:
Error accessing [USERNAME] email account.; ExchangeWebServices Inbox Error: Error, ErrorServerBusy, The server cannot service this request right now. Try again later.
This has been happening for the past 14 hours now.
I contacted my Office365 support team and they acted like they had never heard of the Exchange Web Services API, so no help there.
I can access the mailbox using the O365 web portal and I can access the mailbox account using the Outlook 2013 desktop client. The issue seems specific to EWS.
My program is a Windows service, written in VB.Net. It connects to EWS, goes to the user account inbox, iterates through the inbox extracting attachments from messages, then moves the messages to a saved folder below the inbox.
I created the wrapper for EWS that I reference in my project code using the following commands, run from an elevated VS2012 command prompt:
wsdl.exe /namespace:ExchangeWebServices /out:EWS.cs https://outlook.office365.com/ews/services.wsdl /username:[email protected] /password:p@ssw0rd
csc /out:EWS_E2K13_release /target:library EWS.cs
I bind to EWS in my class using the following code:
Imports System.Net
Imports ExchangeWebServices

Public Class Exchange2013WebServiceClass
    Private ExchangeBinding As New ExchangeServiceBinding

    Public Sub New(ByVal userEmail As String, ByVal userPassword As String, ByVal URL As String)
        ExchangeBinding.Credentials = New NetworkCredential(userEmail, userPassword)
        ExchangeBinding.Url = URL
    End Sub
The error that is logged gets triggered when my code makes a call to the following method;
Public Function GetInboxMessageIDs() As ArrayOfRealItemsType
    Dim returnInboxMessageIds As ArrayOfRealItemsType = Nothing
    Dim errMsg As String = String.Empty
    'Create the request and specify the traversal type.
    Dim FindItemRequest As FindItemType
    FindItemRequest = New FindItemType
    FindItemRequest.Traversal = ItemQueryTraversalType.Shallow
    'Define which item properties are returned in the response.
    Dim ItemProperties As ItemResponseShapeType
    ItemProperties = New ItemResponseShapeType
    ItemProperties.BaseShape = DefaultShapeNamesType.IdOnly
    'Add properties shape to the request.
    FindItemRequest.ItemShape = ItemProperties
    'Identify which folders to search to find items.
    Dim FolderIDArray(0) As DistinguishedFolderIdType
    FolderIDArray(0) = New DistinguishedFolderIdType
    FolderIDArray(0).Id = DistinguishedFolderIdNameType.inbox
    'Add folders to the request.
    FindItemRequest.ParentFolderIds = FolderIDArray
    Try
        'Send the request and get the response.
        Dim FindItemResponse As FindItemResponseType
        FindItemResponse = ExchangeBinding.FindItem(FindItemRequest)
        'Get the response messages.
        Dim ResponseMessage As ResponseMessageType()
        ResponseMessage = FindItemResponse.ResponseMessages.Items
        Dim FindItemResponseMessage As FindItemResponseMessageType
        If ResponseMessage(0).ResponseClass = ResponseClassType.Success Then
            FindItemResponseMessage = ResponseMessage(0)
            returnInboxMessageIds = FindItemResponseMessage.RootFolder.Item
        Else
            '' Server error
            Dim responseClassStr As String = [Enum].GetName(GetType(ExchangeWebServices.ResponseClassType), ResponseMessage(0).ResponseClass).ToString
            Dim responseCodeStr As String = [Enum].GetName(GetType(ExchangeWebServices.ResponseCodeType), ResponseMessage(0).ResponseCode).ToString
            Dim messageTextStr As String = ResponseMessage(0).MessageText.ToString
            Dim thisErrMsg As String = String.Format("ExchangeWebServices Inbox Error: {0}, {1}, {2}", responseClassStr, responseCodeStr, messageTextStr)
            errMsg = If(errMsg.Equals(String.Empty), String.Empty, errMsg & "; ") & thisErrMsg
        End If
    Catch ex As Exception
        'errMsg = String.Join("; ", errMsg, ex.Message)
        errMsg = If(errMsg.Equals(String.Empty), String.Empty, errMsg & "; ") & ex.Message
    End Try
    If Not errMsg.Equals(String.Empty) Then
        returnInboxMessageIds = Nothing
        Throw New System.Exception(errMsg)
    End If
    Return returnInboxMessageIds
End Function
Since the code worked just fine for several days and then suddenly stopped working with a server busy error, I suspect this is some kind of limit or throttling that EWS applies to the account. I process several thousand emails per day, in chunks of 300 at a time.
But I have no idea how to check whether any limits have been exceeded. I am nowhere close to my O365 mailbox size limit. Right now, there are over 4,000 messages in my inbox, and growing.
Thanks in advance for any ideas you can offer.
Dave

All the APIs (EWS, MAPI, ActiveSync, Remote PowerShell) are throttled on Office 365, based on what one particular user could reasonably do. If you haven't had a read of this already, I would recommend
http://msdn.microsoft.com/en-us/library/office/jj945066(v=exchg.150).aspx
You can't adjust, or even see, your current throttle usage, so you have to design your code to live within the default limits. If you're using one service account to access multiple mailboxes (or if that account is being used across multiple applications), that can cause problems. In this case EWS Impersonation is a good solution, as described in
http://blogs.msdn.com/b/exchangedev/archive/2012/04/19/more-throttling-changes-for-exchange-online.aspx (this basically means the Target Mailbox is charged instead of the Service Account).
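On the client side, living within the limits also means backing off and retrying when EWS answers ErrorServerBusy, instead of failing the whole run. Below is a minimal Python sketch of exponential backoff with jitter. It is not EWS-specific. `call` stands in for the FindItem request, and `ServerBusyError` is a hypothetical exception your wrapper would raise when ResponseCode is ErrorServerBusy. The attempt count and delays are illustrative assumptions, not Office 365 limits. The optional `back_off_ms` field models a server-supplied back-off hint, if your wrapper extracts one from the response.

```python
import random
import time


class ServerBusyError(Exception):
    """Hypothetical: raised by your EWS wrapper when ResponseCode is ErrorServerBusy."""

    def __init__(self, message, back_off_ms=None):
        super().__init__(message)
        self.back_off_ms = back_off_ms  # optional server-supplied hint, in ms


def call_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Run call(), retrying on ServerBusyError with exponential backoff.

    Attempt n (0-based) waits min(max_delay, base_delay * 2**n) seconds plus up to
    10% random jitter; a server-supplied back_off_ms hint takes precedence.
    The last ServerBusyError is re-raised once max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ServerBusyError as err:
            if attempt == max_attempts - 1:
                raise  # give up and let the caller log the failure
            if err.back_off_ms is not None:
                delay = err.back_off_ms / 1000.0
            else:
                delay = min(max_delay, base_delay * 2 ** attempt)
                delay += random.uniform(0, delay * 0.1)
            sleep(delay)
```

Passing `sleep` as a parameter keeps the function testable; in a real service it would simply stay `time.sleep`. Backoff only smooths over bursts, so it complements impersonation and paging rather than replacing them.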
Looking at the code, one thing I notice is that you don't appear to be paging the results of FindItems. Also, have you versioned your requests to Exchange2013? E.g. "When the value of the RequestServerVersion element indicates Exchange 2010 or an earlier version of Exchange, the server sends a failure response with error code ErrorServerBusy. If the value of the RequestServerVersion element indicates a version of Exchange starting with Exchange 2010 SP1 or Exchange Online, and the client is using paging, EWS may return a partial result set instead of an error."
To page FindItems correctly, you should use the IndexedPageViewType class and page the items no more than 1000 at a time, e.g. something like:
// esb is the ExchangeServiceBinding (Credentials and Url already set).
IndexedPageViewType indexedPageView = new IndexedPageViewType();
indexedPageView.BasePoint = IndexBasePointType.Beginning;
indexedPageView.Offset = 0;
indexedPageView.MaxEntriesReturned = 1000;
indexedPageView.MaxEntriesReturnedSpecified = true;
FindItemType findItemrequest = new FindItemType();
findItemrequest.Item = indexedPageView;
findItemrequest.ItemShape = new ItemResponseShapeType();
findItemrequest.ItemShape.BaseShape = DefaultShapeNamesType.IdOnly;
// Request a few extra properties on top of the item IDs.
BasePathToElementType[] beAdditionproperties = new BasePathToElementType[3];
PathToUnindexedFieldType SubjectField = new PathToUnindexedFieldType();
SubjectField.FieldURI = UnindexedFieldURIType.itemSubject;
beAdditionproperties[0] = SubjectField;
PathToUnindexedFieldType RcvdTime = new PathToUnindexedFieldType();
RcvdTime.FieldURI = UnindexedFieldURIType.itemDateTimeReceived;
beAdditionproperties[1] = RcvdTime;
PathToUnindexedFieldType ReadStatus = new PathToUnindexedFieldType();
ReadStatus.FieldURI = UnindexedFieldURIType.messageIsRead;
beAdditionproperties[2] = ReadStatus;
findItemrequest.ItemShape.AdditionalProperties = beAdditionproperties;
DistinguishedFolderIdType[] faFolderIDArray = new DistinguishedFolderIdType[1];
faFolderIDArray[0] = new DistinguishedFolderIdType();
faFolderIDArray[0].Mailbox = new EmailAddressType();
faFolderIDArray[0].Mailbox.EmailAddress = "[email protected]";
faFolderIDArray[0].Id = DistinguishedFolderIdNameType.inbox;
findItemrequest.ParentFolderIds = faFolderIDArray;
bool moreAvailible = false;
do
{
    FindItemResponseType frFindItemResponse = esb.FindItem(findItemrequest);
    if (frFindItemResponse.ResponseMessages.Items[0].ResponseClass == ResponseClassType.Success)
    {
        foreach (FindItemResponseMessageType firmtMessage in frFindItemResponse.ResponseMessages.Items)
        {
            // Items may be null when the page is empty, so guard before using it.
            ItemType[] pageItems = ((ArrayOfRealItemsType)firmtMessage.RootFolder.Item).Items ?? new ItemType[0];
            Console.WriteLine("Number of Items retrieved : " + pageItems.Length);
            // Keep paging until the server reports that this page includes the last item.
            moreAvailible = !firmtMessage.RootFolder.IncludesLastItemInRange;
            // Advance the offset by the number of items actually returned.
            ((IndexedPageViewType)findItemrequest.Item).Offset += pageItems.Length;
            Console.WriteLine("Offset : " + ((IndexedPageViewType)findItemrequest.Item).Offset);
            foreach (ItemType miMailboxItem in pageItems)
            {
                Console.WriteLine(miMailboxItem.Subject);
            }
        }
    }
    else
    {
        throw new Exception("error " + frFindItemResponse.ResponseMessages.Items[0].MessageText);
    }
} while (moreAvailible);
The support people should be able to help you as long as you can get past the first level. The EWS Managed API has a RequestId header that gets submitted with each request (see http://blogs.msdn.com/b/exchangedev/archive/2012/06/18/exchange-web-services-managed-api-1-2-1-now-released.aspx). In theory, they should be able to take this ID and use the logs to tell you more about why your request failed.
Cheers
Glen

Cluster multi-block requests were consuming significant database time

Hi,
DB : 10.2.0.4 RAC ASM
OS : AIX 5.2 64-bit
We are facing serious performance issues, and CPU idle time has dropped to 20%. Based on the AWR report, the Top 5 events show that the problem is on the cluster side. I have posted the AWR report from the first node below for your suggestions.
WORKLOAD REPOSITORY report for
DB Name DB Id Instance Inst Num Release RAC Host
PROD 1251728398 PROD1 1 10.2.0.4.0 YES msprod1
Snap Id Snap Time Sessions Curs/Sess
Begin Snap: 26177 26-Jul-11 14:29:02 142 37.7
End Snap: 26178 26-Jul-11 15:29:11 159 49.1
Elapsed: 60.15 (mins)
DB Time: 915.85 (mins)
Cache Sizes
~~~~~~~~~~~ Begin End
Buffer Cache: 23,504M 23,504M Std Block Size: 8K
Shared Pool Size: 27,584M 27,584M Log Buffer: 14,248K
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
Redo size: 28,126.82 2,675.18
Logical reads: 526,807.26 50,105.44
Block changes: 3,080.07 292.95
Physical reads: 962.90 91.58
Physical writes: 157.66 15.00
User calls: 1,392.75 132.47
Parses: 246.05 23.40
Hard parses: 11.03 1.05
Sorts: 42.07 4.00
Logons: 0.68 0.07
Executes: 930.74 88.52
Transactions: 10.51
% Blocks changed per Read: 0.58 Recursive Call %: 32.31
Rollback per transaction %: 9.68 Rows per Sort: 4276.06
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.87 Redo NoWait %: 100.00
Buffer Hit %: 99.84 In-memory Sort %: 99.99
Library Hit %: 98.25 Soft Parse %: 95.52
Execute to Parse %: 73.56 Latch Hit %: 99.51
Parse CPU to Parse Elapsd %: 9.22 % Non-Parse CPU: 99.94
Shared Pool Statistics Begin End
Memory Usage %: 68.11 71.55
% SQL with executions>1: 94.54 92.31
% Memory for SQL w/exec>1: 98.79 98.74
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~
Event                           Waits  Time (s)  Avg wait (ms)   %Total Call Time  Wait Class
CPU time                                 18,798                              34.2
gc cr multi block request  46,184,663    18,075              0               32.9  Cluster
gc buffer busy              2,468,308     6,897              3               12.6  Cluster
gc current block 2-way      1,826,433     4,422              2                8.0  Cluster
db file sequential read       142,632       366              3                0.7  User I/O
RAC Statistics DB/Inst: PROD/PROD1 Snaps: 26177-26178
Begin End
Number of Instances: 2 2
Global Cache Load Profile
~~~~~~~~~~~~~~~~~~~~~~~~~ Per Second Per Transaction
Global Cache blocks received: 14,112.50 1,342.26
Global Cache blocks served: 619.72 58.94
GCS/GES messages received: 2,099.38 199.68
GCS/GES messages sent: 23,341.11 2,220.01
DBWR Fusion writes: 3.43 0.33
Estd Interconnect traffic (KB) 122,826.57
Global Cache Efficiency Percentages (Target local+remote 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer access - local cache %: 97.16
Buffer access - remote cache %: 2.68
Buffer access - disk %: 0.16
Global Cache and Enqueue Services - Workload Characteristics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg global enqueue get time (ms): 0.6
Avg global cache cr block receive time (ms): 2.8
Avg global cache current block receive time (ms): 3.0
Avg global cache cr block build time (ms): 0.0
Avg global cache cr block send time (ms): 0.0
Global cache log flushes for cr blocks served %: 11.3
Avg global cache cr block flush time (ms): 1.7
Avg global cache current block pin time (ms): 0.0
Avg global cache current block send time (ms): 0.0
Global cache log flushes for current blocks served %: 0.0
Avg global cache current block flush time (ms): 4.1
Global Cache and Enqueue Services - Messaging Statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg message sent queue time (ms): 0.1
Avg message sent queue time on ksxp (ms): 2.4
Avg message received queue time (ms): 0.0
Avg GCS message process time (ms): 0.0
Avg GES message process time (ms): 0.0
% of direct sent messages: 6.27
% of indirect sent messages: 93.48
% of flow controlled messages: 0.25
Time Model Statistics DB/Inst: PROD/PROD1 Snaps: 26177-26178
-> Total time in database user-calls (DB Time): 54951s
-> Statistics including the word "background" measure background process
time, and so do not contribute to the DB time statistic
-> Ordered by % or DB time desc, Statistic name
Statistic Name Time (s) % of DB Time
sql execute elapsed time 54,618.2 99.4
DB CPU 18,798.1 34.2
parse time elapsed 494.3 .9
hard parse elapsed time 397.4 .7
PL/SQL execution elapsed time 38.6 .1
hard parse (sharing criteria) elapsed time 27.3 .0
sequence load elapsed time 5.0 .0
failed parse elapsed time 3.3 .0
PL/SQL compilation elapsed time 2.1 .0
inbound PL/SQL rpc elapsed time 1.2 .0
repeated bind elapsed time 0.8 .0
connection management call elapsed time 0.6 .0
hard parse (bind mismatch) elapsed time 0.3 .0
DB time 54,951.0 N/A
background elapsed time 1,027.9 N/A
background cpu time 518.1 N/A
Wait Class DB/Inst: PROD/PROD1 Snaps: 26177-26178
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc
Wait Class          Waits %Time-outs  Total Wait Time (s)  Avg wait (ms)  Waits/txn
Cluster        50,666,311         .0               30,236              1    1,335.4
User I/O          419,542         .0                  811              2       11.1
Network         4,824,383         .0                  242              0      127.2
Other             797,753       88.5                  208              0       21.0
Concurrency       212,350         .1                  121              1        5.6
Commit             16,215         .0                   53              3        0.4
System I/O         60,831         .0                   29              0        1.6
Application         6,069         .0                    6              1        0.2
Configuration         763       97.0                    0              0        0.0
The top 5 events on the second node are as follows:
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~
Event                           Waits  Time (s)  Avg wait (ms)   %Total Call Time  Wait Class
CPU time                                 25,959                              42.2
db file sequential read     2,288,168     5,587              2                9.1  User I/O
gc current block 2-way        822,985     2,232              3                3.6  Cluster
read by other session         345,338     1,166              3                1.9  User I/O
gc cr multi block request     991,270       831              1                1.4  Cluster
Each node has 95 GB of RAM; the SGA is 51 GB and the PGA is 14 GB.
Any input would be greatly appreciated.
Thanks,
Sunand
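Normalizing the Node 1 figures against DB time and the snapshot length makes the top events easier to compare. The worked arithmetic below uses only numbers copied from the report above; the function names are illustrative. One reading it suggests (a hedged interpretation, not a diagnosis): the average gc cr multi block request wait is well under a millisecond, so its 32.9% share comes from the sheer number of requests (roughly 12,800 per second) rather than from slow individual transfers, which is consistent with the ADDM recommendation later in the thread to look for plans that avoid object scans.

```python
# Figures copied from the Node 1 AWR excerpt (snapshots 26177-26178).
ELAPSED_S = 60.15 * 60          # "Elapsed: 60.15 (mins)"
DB_TIME_S = 54_951.0            # "DB time" in Time Model Statistics
BLOCK_SIZE_KB = 8               # "Std Block Size: 8K"
GC_BLOCKS_RECEIVED_PER_S = 14_112.50

# event name -> (waits, total time in seconds); CPU time has no wait count
TOP_EVENTS = {
    "CPU time": (None, 18_798),
    "gc cr multi block request": (46_184_663, 18_075),
    "gc buffer busy": (2_468_308, 6_897),
    "gc current block 2-way": (1_826_433, 4_422),
    "db file sequential read": (142_632, 366),
}


def average_active_sessions(db_time_s, elapsed_s):
    """DB time spread over wall-clock time: sessions busy in the database on average."""
    return db_time_s / elapsed_s


def event_breakdown(events, db_time_s, elapsed_s):
    """Per event: % of DB time (1 decimal), average wait in ms, waits per second."""
    rows = {}
    for name, (waits, time_s) in events.items():
        pct_db_time = round(100.0 * time_s / db_time_s, 1)
        avg_wait_ms = None if waits is None else 1000.0 * time_s / waits
        waits_per_s = None if waits is None else waits / elapsed_s
        rows[name] = (pct_db_time, avg_wait_ms, waits_per_s)
    return rows


def gc_receive_kb_per_s(blocks_per_s, block_size_kb):
    """Data blocks received over the interconnect, ignoring messaging overhead."""
    return blocks_per_s * block_size_kb


if __name__ == "__main__":
    print(f"Average active sessions: {average_active_sessions(DB_TIME_S, ELAPSED_S):.1f}")
    for name, (pct, avg_ms, per_s) in event_breakdown(TOP_EVENTS, DB_TIME_S, ELAPSED_S).items():
        detail = "" if avg_ms is None else f", avg {avg_ms:.2f} ms, {per_s:,.0f} waits/s"
        print(f"{name}: {pct}% of DB time{detail}")
    kb = gc_receive_kb_per_s(GC_BLOCKS_RECEIVED_PER_S, BLOCK_SIZE_KB)
    print(f"GC blocks received: ~{kb:,.0f} KB/s")
```

The recomputed percentages match the report's %Total Call Time column, and about 15 sessions were active on average over the hour.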

Hi Forstmann,
Thanks for your update.
I have also collected an ADDM report; here is an extract from the Node 1 report:
FINDING 1: 40% impact (22193 seconds)
Cluster multi-block requests were consuming significant database time.
RECOMMENDATION 1: SQL Tuning, 6% benefit (3313 seconds)
ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID
"59qd3x0jg40h1". Look for an alternative plan that does not use
object scans.
SYMPTOMS THAT LED TO THE FINDING:
SYMPTOM: Inter-instance messaging was consuming significant database
time on this instance. (55% impact [30269 seconds])
SYMPTOM: Wait class "Cluster" was consuming significant database
time. (55% impact [30271 seconds])
FINDING 3: 13% impact (7008 seconds)
Read and write contention on database blocks was consuming significant
database time.
NO RECOMMENDATIONS AVAILABLE
SYMPTOMS THAT LED TO THE FINDING:
SYMPTOM: Inter-instance messaging was consuming significant database
time on this instance. (55% impact [30269 seconds])
SYMPTOM: Wait class "Cluster" was consuming significant database
time. (55% impact [30271 seconds])
Any help would be appreciated.
Thanks,
Sunand

Service Ticket request failed

Hey,
Has anyone seen this "alert" coming from the domain controllers?
Service ticket request failed
I want to mark it as a false positive because I've investigated it,
but I'd rather go to the server team with a fix.

Yes, your understanding is correct. The recommended approach is to filter out all unneeded raw events at the reporting device itself.
This spares both the network and MARS unnecessary traffic. You can find more details about this error here:
http://support.microsoft.com/kb/824905
http://technet.microsoft.com/en-us/library/bb742435.aspx
Regards
Farrukh

Configure the ADMIN and CLUSTER service connections to be SSL

Can you configure the ADMIN and CLUSTER service connections to be SSL
rather than tcp?
I was wondering about the present or future ability to secure other
connection services with SSL. Can you now or are there future plans
to configure the ADMIN and CLUSTER service connections to be SSL
rather than tcp? I suppose I should add the PORTMAPPER to that list.
My primary interest is for an SSLCLUSTER service in the case where
two brokers are connected over a non-trusted network. It may
not be too difficult to secure all the services the same way, but
perhaps that is on the TODO list.
A related question is if there are plans to add SSL with client
authentication as a stronger authentication mechanism than 'simple'
username and password. I believe you could get the username from
the client certificate's DN and continue to use the same LDAP user
repository for access control. I think this is similar to the way
that BEA's Weblogic server does it.
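The idea in the paragraph above, deriving a username from the client certificate's DN and then checking it against the existing LDAP repository, can be sketched generically with Python's standard `ssl` module. This is not iMQ or WebLogic code: it assumes a server that requires client certificates, the function names and file paths are hypothetical, and the LDAP lookup itself is not shown.

```python
import ssl


def username_from_peercert(cert, attribute="commonName"):
    """Return the first value of `attribute` in the certificate subject.

    `cert` is a dict in the shape returned by ssl.SSLSocket.getpeercert():
    its "subject" is a tuple of RDNs, each a tuple of (name, value) pairs.
    """
    if not cert:
        raise ValueError("no client certificate was presented")
    for rdn in cert.get("subject", ()):
        for name, value in rdn:
            if name == attribute:
                return value
    raise LookupError(f"certificate subject has no {attribute!r}")


def make_server_context(certfile, keyfile, client_ca_file):
    """Server-side TLS context that requires and verifies client certificates.

    The file paths are placeholders; nothing here is iMQ configuration.
    """
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=client_ca_file)
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

After the TLS handshake, a server would call `username_from_peercert(conn.getpeercert())` and use the result as the key for its existing access-control lookup.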
Finally should it be possible to deploy the HTTP tunnel servlet to
a webserver (such as iPlanet Web Server) configured to do SSL with
client authentication as a work-around to get stronger authentication
with the current release of the product? Or am I perhaps missing some
obvious and important detail? :) I guess I would like to know it's been
done already or is at least possible before I try and do it myself.

3 scenarios involving SSL are:
1: JMS client <------- SSL -------> iMQ broker
2: iMQ admin <------- SSL -------> iMQ broker
3: iMQ broker <------- SSL -------> iMQ broker (i.e. clusters)
(1) is currently supported in iMQ 2.0
(2) and (3) are not supported in iMQ 2.0. There are no concrete plans yet to support
them in the near future, but we'll definitely consider doing it if we
hear a lot of demand for it.
]A related question is if there are plans to add SSL with client
]authentication as a stronger authentication mechanism than 'simple'
]username and password. I believe you could get the username from
]the client certificate's DN and continue to use the same LDAP user
]repository for access control. I think this is similar to the way
]that BEA's Weblogic server does it.
This is on our todo list, but due to other more pressing issues we
have not been able to address it. We will continue to keep it
on our potential list of new features.
Sorry if I sound pretty wishy-washy in my responses above, but the fact
is that the things you mentioned above had to take a backseat
to other more critical features. That and the usual time/resource
constraints caused them not to be implemented.
]Finally should it be possible to deploy the HTTP tunnel servlet to
]a webserver (such as iPlanet Web Server) configured to do SSL with
]client authentication as a work-around to get stronger authentication
]with the current release of the product? Or am I perhaps missing some
]obvious and important detail? :) I guess I would like to know it's been
]done already or is at least possible before I try and do it myself.
Yes, this should be possible (although I don't believe we've tried it here).
The client authentication here is really only between the JMS client and the
web server (not between the tunnel servlet and the iMQ broker) and should
be similar in setup to any other java application talking to iPlanet Web
Server.

Why are virtual interfaces added to the ManagementOS not visible to the Cluster service?

Hello All,
I'm starting this new thread since the previous one was answered by our friend Udo. In short, my problem is as follows; the diagram should be enough to explain what I'm trying to achieve. I've set up this lab to learn Hyper-V clustering with 2 nodes running Hyper-V
Server 2012. Both nodes have 3 physical NICs: one in each node is dedicated to managing the node, and the other two are used to create a NIC team. On top of that NIC team, a virtual switch is created with -AllowManagementOS $False.
Next, I created the following virtual interfaces in the host partition and plugged them into the virtual switch created on top of the teamed interface. These virtual interfaces are meant to carry the various networks.
For the SAN, I'm running a Linux VM with an iSCSI target server, and the clustering service has no problem with it. All tests pass.
The problem is that when those virtual interfaces are added to the hosts, they do not appear as available networks
to the cluster service; instead, it only shows the management NIC as an available network.
This makes it difficult to understand how to set up a cluster of 2 Hyper-V Server nodes. Can someone help, please?
Regards,
Shahzad.

Shahzad,
I've read this thread a couple of times and I don't think I'm clear on the exact question you're asking.
When the clustering service goes out to look for "Networks", what it does is scan the IP addresses on each node. Every time it finds an IP in a unique subnet, that subnet is listed as a network. It can't see virtual switches and doesn't care about
virtual vs. teamed vs. physical adapters or anything like that. It's just looking at IP addresses. This is why I'm confused when you say, "it won't show virtual interfaces available as networks". "Networks" in this context are IP subnets.
I'm not aware of any context where a singular interface would be treated like a network.
If you've got virtual adapters attached to the management operating system
and have assigned IPs to them, the cluster should have discovered those networks. If you have multiple adapters on the same node using IPs in the same subnet, that network will only appear once and the cluster service will only use
one adapter from that subnet on that node. The one it picked will be visible on the "Network Connections" tab at the bottom of Failover Cluster Manager when you're on the Networks section.
Eric Siron Altaro Hyper-V Blog
I am an independent blog contributor, not an Altaro employee. I am solely responsible for the content of my posts.
"Every relationship you have is in worse shape than you think."
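Eric's description of network discovery (reduce each node's IP addresses to subnets, list each distinct subnet once, use one adapter per node per subnet) can be modeled in a few lines with Python's `ipaddress` module. This is a simplified sketch, not the Failover Clustering implementation: which adapter the real service picks when a node has several in one subnet is not modeled (the sketch just keeps the first), and the addresses are made-up examples.

```python
import ipaddress
from collections import defaultdict


def discover_networks(node_addresses):
    """Group each node's addresses by subnet.

    node_addresses maps a node name to a list of "ip/prefix" strings.
    Returns {subnet: {node: ip}}; a subnet appears once no matter how many
    adapters use it, and only the first address per node per subnet is kept.
    """
    networks = defaultdict(dict)
    for node, addresses in node_addresses.items():
        for addr in addresses:
            iface = ipaddress.ip_interface(addr)
            networks[iface.network].setdefault(node, str(iface.ip))
    return dict(networks)


if __name__ == "__main__":
    example = {
        "Node1": ["192.168.10.11/24", "192.168.10.12/24", "10.20.0.11/24"],
        "Node2": ["192.168.10.21/24", "10.20.0.21/24"],
    }
    for subnet, members in discover_networks(example).items():
        print(subnet, members)
```

Node1's two adapters in 192.168.10.0/24 produce a single network entry, matching Eric's point that "networks" here are subnets, not interfaces.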
Hello Eric and friends,
Eric, I really appreciate your interest in the issue, and yes, I agree with you when you said... "When the clustering service goes out to look for "Networks",
what it does is scan the IP addresses on each node. Every time it finds an IP in a unique subnet, that subnet is listed as a network. It can't see virtual switches and doesn't care about virtual vs. teamed vs. physical adapters or anything like that. It's
just looking at IP addresses. This is why I'm confused when you say, "it won't show virtual interfaces available as networks". "Networks" in this context are IP subnets. I'm not aware of any context where a singular interface would be treated
like a network."
By networks I meant to say subnets. Let me explain what I've configured so far:
Node 1 and Node 2 each have 3 NICs, and all 3 NICs on each node are plugged into the same switch.
Node1: 131.107.0.50/24
Node2: 131.107.0.150/24
A Core Domain controller VM running on Node 1: 131.107.0.200/24
A JUMPBOX (WS 2012 R2 Std.) VM running on Node 1: 131.107.0.100/24
A Linux SAN VM running on Node 2: 10.1.1.100/8
I planned to configure the following networks:
(1) Cluster traffic: 10.0.0.50/24 (IP given to the virtual interface for cluster traffic on Node1)
Cluster traffic: 10.0.0.150/24 (IP given to the virtual interface for cluster traffic on Node2)
(2) SAN traffic: 10.1.1.50/8 (IP given to the virtual interface for SAN traffic on Node1)
SAN traffic: 10.1.1.150/8 (IP given to the virtual interface for SAN traffic on Node2)
Note: The cluster service has no problem accessing the SAN VM (10.1.1.100) over this network; it validates the SAN settings and comes back OK. This indicates that the virtual interface is working fine.
(3) Migration traffic: 172.168.0.50/8 (IP given to the virtual interface for migration traffic on Node1)
Migration traffic: 172.168.0.150/8 (IP given to the virtual interface for migration traffic on Node2)
All these networks (virtual interfaces) are made available through two virtual switches, which are configured EXACTLY identically on both Node1 and Node2.
After the cluster validation steps finish (all OK), the Create Cluster wizard shows only one network: the network of the physical Layer 2 switch, i.e. 131.107.0.0/24.
I wonder why it won't show the other networks (10.0.0.0/8, 10.1.1.0/8 and 172.168.0.0/8).
Regards,
Shahzad
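One quick check on an address plan like the one above is to let Python's `ipaddress` module compute the subnet each planned address actually falls in and flag roles whose subnets overlap. This is only a sketch of that check using the addresses from the post (reading the Node2 management address as 131.107.0.150); it does not establish why the wizard lists a single network. With the masks as written, 10.1.1.50/8 falls in 10.0.0.0/8, which contains the 10.0.0.0/24 cluster subnet.

```python
import ipaddress
from itertools import combinations

# Addresses as planned in the post (Node1, Node2 for each role)
PLANNED = {
    "Management": ["131.107.0.50/24", "131.107.0.150/24"],
    "Cluster": ["10.0.0.50/24", "10.0.0.150/24"],
    "SAN": ["10.1.1.50/8", "10.1.1.150/8"],
    "Migration": ["172.168.0.50/8", "172.168.0.150/8"],
}


def role_subnets(plan):
    """Map each role to the set of subnets its addresses belong to."""
    return {role: {ipaddress.ip_interface(a).network for a in addrs}
            for role, addrs in plan.items()}


def overlapping_roles(plan):
    """Return pairs of roles whose subnets overlap."""
    subnets = role_subnets(plan)
    clashes = []
    for (role_a, nets_a), (role_b, nets_b) in combinations(subnets.items(), 2):
        if any(a.overlaps(b) for a in nets_a for b in nets_b):
            clashes.append((role_a, role_b))
    return clashes


if __name__ == "__main__":
    for role, nets in role_subnets(PLANNED).items():
        print(role, sorted(str(n) for n in nets))
    print("Overlapping roles:", overlapping_roles(PLANNED))
```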
