DAG Sporadic Entire Server DB Fail Over

Hi,
I have been having this issue for a while now. I have two physical Exchange servers in a DAG, both on Exchange 2013 CU1. Randomly, every few days and at various times, Server1 will fail all of its databases over to Server2. I'll redistribute them, and then later Server2 will fail all of its databases over to Server1. In short, both servers have at times failed their databases over.
I started with this: http://technet.microsoft.com/en-us/library/dd351258(v=exchg.150).aspx which led me to set up monitoring of the Microsoft-Exchange-ManagedAvailability logs. I can tell you that replication tests work fine and that all the databases are healthy.
My monitoring turned up the following errors. In this example "EX0001" was the server that failed all of its databases over to "EX0002". It seems pretty clear to me that Exchange Managed Availability is finding an issue with EWS, attempting to recycle the MSExchangeServicesAppPool app pool, and being unable to do so because of "Throttling", so it fails the databases over. That's my best guess. The problem is that I don't know how to fix this. I've run through troubleshooting the EWS health set and nothing really turns up: http://technet.microsoft.com/en-us/library/ms.exch.scom.ews.protocol(v=exchg.150).aspx
EX0001
1011
Microsoft-Exchange-ManagedAvailability
Recovery
Microsoft-Exchange-ManagedAvailability/RecoveryActionLogs
5/22/2014 7:06:43 AM
Warning (Info)
1520183
NT AUTHORITY\SYSTEM
RecycleApplicationPool-MSExchangeServicesAppPool-EWSSelfTestRestart: Throttling rejected the operation
EX0001
4
Microsoft-Exchange-ManagedAvailability
Monitoring
Microsoft-Exchange-ManagedAvailability/Monitoring
5/22/2014 7:17:27 AM
Error (Info)
8287
NT AUTHORITY\SYSTEM
The EWS.Protocol health set has detected a problem on EX0001 beginning at 5/22/2014 10:55:12 AM (UTC). The health manager is reporting that recycling the MSExchangeServicesAppPool
app pool has failed to restore health and it has tried to fail over active copies of local databases to a healthy server. Attempts to auto-recover from this condition have failed and requires Administrator attention. Details below: <b>MachineName:</b>
EX0001 <b>ServiceName:</b> EWS.Protocol <b>ResultName:</b> EWSSelfTestProbe/MSExchangeServicesAppPool <b>Error:</b> System.Exception: System.Exception: >>> PRIMARY ENDPOINT VERIFICATION EwsUrl=https://localhost:444/ews/exchange.asmx
UserName/Password=HealthMailbox663889950a344102878cede289222a46@domain.local/xGAVmP[^jn{qGgOx0Jtx:4X+-j@?d%XM?@7yErsoFF[_#u[%LcX=0hPzMln#1PiQ/7z?14rJJs8Dc)AYLi0F9mU)bMpL_gj{Q3*[Yt1:UgX=:CkQc=[Xuagz%Od=|@tt AuthMethod=CAFE ConvertId (Attempt #0) Status=The
request failed. The operation has timed out ConvertId (Attempt #0) Latency=59521.1327 ConvertId (Attempt #1) Status=iteration 1; 55.427003 seconds elapsed ConvertId (Attempt #1) Latency=55427.003 at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSCommon.RetrySoapActionAndThrow(Action
operation, String soapAction, ExchangeServiceBase service) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.ExecuteEWSCall(String endPoint, String operation, Boolean verifyAffinity) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.DoWorkInternal(CancellationToken
cancellationToken) <b>Exception:</b> System.Exception: System.Exception: System.Exception: >>> PRIMARY ENDPOINT VERIFICATION EwsUrl=https://localhost:444/ews/exchange.asmx
UserName/Password=HealthMailbox663889950a344102878cede289222a46@domain.local/xGAVmP[^jn{qGgOx0Jtx:4X+-j@?d%XM?@7yErsoFF[_#u[%LcX=0hPzMln#1PiQ/7z?14rJJs8Dc)AYLi0F9mU)bMpL_gj{Q3*[Yt1:UgX=:CkQc=[Xuagz%Od=|@tt AuthMethod=CAFE ConvertId (Attempt #0) Status=The
request failed. The operation has timed out ConvertId (Attempt #0) Latency=59521.1327 ConvertId (Attempt #1) Status=iteration 1; 55.427003 seconds elapsed ConvertId (Attempt #1) Latency=55427.003 at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSCommon.RetrySoapActionAndThrow(Action
operation, String soapAction, ExchangeServiceBase service) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.ExecuteEWSCall(String endPoint, String operation, Boolean verifyAffinity) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.DoWorkInternal(CancellationToken
cancellationToken) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSCommon.ThrowError(Object key, Object exceptiondata, String logDetails) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.DoWorkInternal(CancellationToken
cancellationToken) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.RunEWSGenericProbe(CancellationToken cancellationToken) at Microsoft.Exchange.WorkerTaskFramework.WorkItem.Execute(CancellationToken joinedToken) at Microsoft.Exchange.WorkerTaskFramework.WorkItem.<>c__DisplayClass2.<StartExecuting>b__0()
at System.Threading.Tasks.Task.Execute() <b>ExecutionContext:</b> EWSGenericProbeError:Exception=System.Exception: System.Exception: >>> PRIMARY ENDPOINT VERIFICATION EwsUrl=https://localhost:444/ews/exchange.asmx
UserName/Password=HealthMailbox663889950a344102878cede289222a46@domain.local/xGAVmP[^jn{qGgOx0Jtx:4X+-j@?d%XM?@7yErsoFF[_#u[%LcX=0hPzMln#1PiQ/7z?14rJJs8Dc)AYLi0F9mU)bMpL_gj{Q3*[Yt1:UgX=:CkQc=[Xuagz%Od=|@tt AuthMethod=CAFE ConvertId (Attempt #0) Status=The
request failed. The operation has timed out ConvertId (Attempt #0) Latency=59521.1327 ConvertId (Attempt #1) Status=iteration 1; 55.427003 seconds elapsed ConvertId (Attempt #1) Latency=55427.003 at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSCommon.RetrySoapActionAndThrow(Action
operation, String soapAction, ExchangeServiceBase service) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.ExecuteEWSCall(String endPoint, String operation, Boolean verifyAffinity) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon
<b>FailureContext:</b> <b>ResultType:</b> Failed <b>IsNotified:</b> False <b>DeploymentId:</b> 0 <b>RetryCount:</b> 0 <b>ExtensionXml:</b> <b>Version:</b> <b>StateAttribute1:</b>
EWS <b>StateAttribute2:</b> Unknown <b>StateAttribute3:</b> <b>StateAttribute4:</b> <b>StateAttribute5:</b> <b>StateAttribute6:</b> 0 <b>StateAttribute7:</b> 0 <b>StateAttribute8:</b>
0 <b>StateAttribute9:</b> 0 <b>StateAttribute10:</b> 0 <b>StateAttribute11:</b> <b>StateAttribute12:</b> <b>StateAttribute13:</b> <b>StateAttribute14:</b> <b>StateAttribute14:</b>
<b>StateAttribute16:</b> 0 <b>StateAttribute17:</b> 0 <b>StateAttribute18:</b> 0 <b>StateAttribute19:</b> 0 <b>StateAttribute20:</b> 120011 <b>StateAttribute21:</b> [000.000] EWSCommon
start: 5/22/2014 11:13:13 AM [000.000] Configuring EWScommon [000.000] Probe time limit: 120000ms, HTTP timeout: 59500ms, RetryCount: 1 [000.047] using authN: CAFE
[email protected] xGAVmP[^jn{qGgOx0Jtx:4X+-j@?d%XM?@7yErsoFF[_#u[%LcX=0hPzMln#1PiQ/7z?14rJJs8Dc)AYLi0F9mU)bMpL_gj{Q3*[Yt1:UgX=:CkQc=[Xuagz%Od=|@tt
[000.047] using HTTP request timeout: 59500 ms [000.047] action iteration 0 [000.047] starting (total time left 119954 ms) [059.568] action threw Microsoft.Exchange.WebServices.Data.ServiceRequestException: The request failed. The operation has timed out [064.584]
action iteration 1 [064.584] starting (total time left 55416 ms) [120.011] action wait timed out [120.011] action threw System.TimeoutException: iteration 1; 55.427003 seconds elapsed <b>StateAttribute22:</b> <b>StateAttribute23:</b>
<b>StateAttribute24:</b> <b>StateAttribute25:</b> <b>PoisonedCount:</b> 0 <b>ExecutionId:</b> 32395373 <b>ExecutionStartTime:</b> 5/22/2014 11:13:13 AM <b>ExecutionEndTime:</b> 5/22/2014
11:15:13 AM <b>ResultId:</b> 253233015 <b>SampleValue:</b> 0 ------------------------------------------------------------------------------- States of all monitors within the health set: Note: Data may be stale. To get current data,
run: Get-ServerHealth -Identity 'EX0001' -HealthSet 'EWS.Protocol' State Name TargetResource HealthSet AlertValue ServerComponent ----- ---- -------------- --------- ---------- --------------- NotApplicable EWSSelfTestMonitor MSExchangeServicesAppPool EWS.Protocol
Unhealthy None NotApplicable EWSDeepTestMonitor DG01DB15 EWS.Protocol Unhealthy None NotApplicable PrivateWorkingSetWarningThresholdExc... msexchangeservicesapppool EWS.Protocol Healthy None NotApplicable ProcessProcessorTimeErrorThresholdEx... msexchangeservicesapppool
EWS.Protocol Healthy None NotApplicable ExchangeCrashEventErrorThresholdExce... msexchangeservicesapppool EWS.Protocol Healthy None States of all health sets: Note: Data may be stale. To get current data, run: Get-HealthReport -Identity 'EX0001' State HealthSet
AlertValue LastTransitionTime MonitorCount ----- --------- ---------- ------------------ ------------ NotApplicable Autodiscover.Protocol Healthy 3/8/2014 12:46:17 AM 4 NotApplicable ActiveSync.Protocol Healthy 3/8/2014 1:15:35 AM 7 NotApplicable ActiveSync
Healthy 3/8/2014 2:08:15 AM 3 NotApplicable EDS Healthy 5/22/2014 5:19:41 AM 13 NotApplicable ECP Healthy 3/8/2014 1:15:27 AM 3 NotApplicable EventAssistants Healthy 5/22/2014 5:48:56 AM 28 NotApplicable EWS.Protocol Unhealthy 5/22/2014 7:07:12 AM 5 NotApplicable
FIPS Healthy 5/21/2014 10:24:01 PM 18 NotApplicable AD Healthy 2/23/2014 10:42:29 PM 10 NotApplicable OWA.Protocol.Dep Healthy 5/22/2014 5:19:40 AM 1 NotApplicable Monitoring Unhealthy 5/22/2014 5:35:31 AM 9 Online HubTransport Unhealthy 5/22/2014 5:19:43
AM 138 NotApplicable DataProtection Healthy 5/22/2014 7:08:02 AM 201 NotApplicable AntiSpam Healthy 5/22/2014 5:19:40 AM 4 NotApplicable Network Healthy 5/21/2014 10:36:54 PM 1 NotApplicable OWA.Protocol Healthy 3/8/2014 1:15:34 AM 5 NotApplicable MailboxMigration
Healthy 3/8/2014 12:46:18 AM 4 NotApplicable MRS Healthy 3/8/2014 12:44:35 AM 9 NotApplicable MailboxTransport Healthy 5/22/2014 5:19:41 AM 57 NotApplicable PublicFolders Healthy 5/21/2014 10:44:15 PM 4 NotApplicable RPS Healthy 2/23/2014 11:38:33 PM 1 NotApplicable
Outlook.Protocol Healthy 4/22/2014 11:04:18 AM 3 NotApplicable UserThrottling Healthy 5/22/2014 5:51:13 AM 7 NotApplicable SiteMailbox Healthy 3/8/2014 2:10:53 AM 3 NotApplicable UM.Protocol Healthy 5/22/2014 5:19:41 AM 17 NotApplicable Store Healthy 5/22/2014
5:19:43 AM 225 NotApplicable MSExchangeCertificateDeplo... Disabled 1/1/0001 12:00:00 AM 2 NotApplicable DAL Healthy 8/2/2013 12:59:03 AM 16 NotApplicable Search Healthy 5/22/2014 5:37:18 AM 269 Online EWS.Proxy Healthy 5/5/2014 1:34:08 AM 1 Online RPS.Proxy
Healthy 5/5/2014 1:34:38 AM 13 Online OAB.Proxy Healthy 5/5/2014 1:34:37 AM 1 Online ECP.Proxy Healthy 5/5/2014 1:34:17 AM 4 Online OWA.Proxy Healthy 5/5/2014 1:34:25 AM 2 Online Outlook.Proxy Healthy 5/5/2014 1:34:08 AM 1 Online Autodiscover.Proxy Healthy
5/5/2014 1:34:08 AM 1 Online ActiveSync.Proxy Healthy 5/5/2014 1:34:35 AM 1 Online RWS.Proxy Healthy 5/5/2014 1:34:18 AM 10 NotApplicable Autodiscover Healthy 5/21/2014 10:24:01 PM 2 Online FrontendTransport Healthy 5/15/2014 12:49:31 AM 11 NotApplicable EWS
Unhealthy 5/22/2014 7:06:01 AM 2 NotApplicable OWA Healthy 2/23/2014 11:37:56 PM 1 NotApplicable Outlook Healthy 3/8/2014 12:45:14 AM 5 Online UM.CallRouter Healthy 5/22/2014 5:19:41 AM 7 NotApplicable RemoteMonitoring Healthy 8/2/2013 12:58:03 AM 1 NotApplicable
POP.Protocol Healthy 5/20/2014 9:22:12 AM 5 NotApplicable IMAP.Protocol Healthy 5/20/2014 9:22:21 AM 5 Online POP.Proxy Healthy 3/7/2014 1:31:10 PM 1 Online IMAP.Proxy Healthy 3/7/2014 1:31:10 PM 1 NotApplicable IMAP Healthy 5/20/2014 9:23:32 AM 2 NotApplicable
POP Healthy 5/20/2014 9:17:18 AM 2 NotApplicable Antimalware Healthy 5/15/2014 8:33:13 AM 8 NotApplicable FfoQuarantine Healthy 8/2/2013 12:58:20 AM 1 Online Transport Healthy 5/22/2014 5:38:00 AM 9 NotApplicable Security Healthy 3/8/2014 12:46:09 AM 3 NotApplicable
Datamining Healthy 3/8/2014 12:45:44 AM 3 NotApplicable Provisioning Healthy 3/8/2014 12:45:40 AM 3 NotApplicable ProcessIsolation Healthy 3/8/2014 12:47:05 AM 12 NotApplicable TransportSync Healthy 3/8/2014 12:45:37 AM 3 NotApplicable MessageTracing Healthy
3/8/2014 12:44:56 AM 3 NotApplicable CentralAdmin Healthy 3/8/2014 12:45:12 AM 3 NotApplicable OAB Healthy 8/2/2013 1:02:27 AM 3 NotApplicable Calendaring Healthy 8/2/2013 1:02:07 AM 3 NotApplicable PushNotifications.Protocol Healthy 2/23/2014 10:46:17 PM
3 NotApplicable Ediscovery.Protocol Healthy 5/21/2014 10:38:16 PM 1 NotApplicable HDPhoto Healthy 5/6/2014 9:36:25 AM 1 NotApplicable Clustering Healthy 3/8/2014 12:45:34 AM 4 NotApplicable DiskController Healthy 4/22/2014 2:51:30 AM 1 NotApplicable MailboxSpace
Healthy 5/22/2014 6:16:51 AM 96 NotApplicable FreeBusy Healthy 5/22/2014 5:32:54 AM 1 Note: Subsequent detected alerts are suppressed until the health set is healthy again.
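The probe trace in StateAttribute21 above can be read mechanically. The following is a minimal Python sketch, not an Exchange tool: the names `TRACE`, `ENTRY`, and `attempt_durations` are made up for illustration, and the trace text is an excerpt copied from the event. It pairs each "action iteration" entry with the "action threw" entry that ends it and reports how long each EWS attempt ran before failing.

```python
import re

# Excerpt of the probe trace from StateAttribute21 in the event above.
TRACE = (
    "[000.047] using HTTP request timeout: 59500 ms "
    "[000.047] action iteration 0 "
    "[000.047] starting (total time left 119954 ms) "
    "[059.568] action threw Microsoft.Exchange.WebServices.Data."
    "ServiceRequestException: The request failed. The operation has timed out "
    "[064.584] action iteration 1 "
    "[064.584] starting (total time left 55416 ms) "
    "[120.011] action wait timed out "
    "[120.011] action threw System.TimeoutException: iteration 1; "
    "55.427003 seconds elapsed"
)

# Each entry starts with an elapsed-seconds stamp such as "[059.568]".
ENTRY = re.compile(r"\[(\d{3}\.\d{3})\]\s*(.*?)(?=\s*\[\d{3}\.\d{3}\]|$)", re.S)


def attempt_durations(trace):
    """Return (iteration, seconds) for every attempt that ended in 'action threw'."""
    results = []
    current = None  # iteration number of the attempt in progress
    started = None  # elapsed seconds when that attempt started
    for stamp, text in ENTRY.findall(trace):
        seconds = float(stamp)
        if text.startswith("action iteration"):
            current = int(text.split()[-1])
            started = seconds
        elif text.startswith("action threw") and current is not None:
            results.append((current, round(seconds - started, 3)))
            current = None
    return results


for iteration, seconds in attempt_durations(TRACE):
    print(f"attempt {iteration}: failed after {seconds:.3f} s")
```

On this excerpt the two durations line up with the Latency values reported in the event (59521.1327 ms and 55427.003 ms). Both attempts ended in a timeout rather than an error response, and together they used up the 120000 ms probe time limit.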

Hi,
Based on the error message, throttling rejected the operation. I recommend running Get-ThrottlingPolicy | fl to view the EWS settings in the throttling policy.
You can modify the default throttling policy and set the basic settings for EWS. Then restart the Microsoft Exchange Throttling service and recycle the MSExchangeServicesAppPool app pool to check the result.
For more information about the EWS throttling, you can refer to the following articles.
EWS throttling in Exchange
http://msdn.microsoft.com/en-us/library/office/jj945066(v=exchg.150).aspx
EWS Best Practices: Understand Throttling Policies
http://blogs.msdn.com/b/mstehle/archive/2010/11/09/ews-best-practices-understand-throttling-policies.aspx
Best regards,
Belinda
Belinda Ma
TechNet Community Support

Similar Messages

  • Which role do I need DFS or File server on fail over cluster server 2012 R2?

    What I want to achieve is to share all my user data files in a central location and have them highly available all the time, whether it's a general share or folder redirection data. But I'm a bit confused. I have a failover cluster set up
    on Server 2012, and now I would like to add DFS as a role, but then there is another role called File Server that does virtually the same thing as DFS: it creates a namespace share that can still be accessed even if one of the nodes goes down. My thinking is
    that DFS replicates between two physical locations, whereas a failover cluster works slightly differently, and the File Server role does pretty much the same thing except for replicating data from one drive to another. What do you suggest I do, or
    did I get the concept wrong like a noob?

    DFS and Failover Clustering for file shares provide a similar end result for file access, but they are significantly different implementations.
    Clustering provides high availability to files by presenting shared access to a set of files served from a cluster.  With Windows Server 2012, Microsoft added the ability to create a Scale-Out File Server that even allows all nodes of the cluster to serve access to
    the files for a higher level of performance and other great things.  The bottom line with Failover Clusters for files is that there is a single copy of the file presented from the cluster.
    DFS, on the other hand, provides high availability to files by making a copy in two or more locations and presenting a namespace that allows access to the file through any of the network paths.  DFS works very
    well for files that are primarily read-only.  When you get into a situation where there is a lot of updating of the shared files, DFS is not a very good solution.  There are ways to implement DFS for read/write files, but it generally requires a
    good knowledge of how the files are used and how you want to manage them.
    The key to answering your question comes in your first sentence "I want to share all my user data files in a central location and to be highly available all the time".  My initial reaction to this is that central location means Failover Cluster
    - there is only a single copy of the file.  However, "all the time" can be compromised by network failures to the central site.  Remote sites would not have access if they can't access the central site.  DFS provides the ability to
    have copies remotely, but then if you allow updating at multiple sites, you have to manage the merging of the changes, among other things.
    . : | : . : | : . tim

  • Is my installation of SQL Server Fail Over cluster correct?

    I built a 2-node SQL Server 2012 failover cluster but had some problems during installation, so I wanted to know whether the steps I performed below are correct.
    Hardware
    Node1 192.168.1.10
    Node2 192.168.1.11
    Added following entries in DNS
    cluster.domain.local 192.168.1.12 (for Windows Cluster)
    msdtc.domain.local 192.168.1.13 (for MSDTC)
    sql.domain.local 192.168.1.14 (for SQL Server Cluster)
    Cluster Storage
    Disk1 (for Quorum)
    Disk2 (for MSDTC)
    Disk3 (for SQL Server)
    Now comes the installation. I am performing all these steps as DOMAIN ADMIN.
    1. First I installed clustering role on both nodes
    2. Then I ran fail over validation wizard on Node1 adding both nodes which went fine (there were some warnings)
    3. Then I made a Windows Cluster on Node1 using these two nodes. I gave the name and IP to this cluster which I wrote above i.e. cluster.domain.local 192.168.1.12
    4. Cluster was created and both nodes are UP.
    Now I want to ask a question here. Is it best practice to perform the above operation using DOMAIN ADMIN? Or if I use a standard domain user account with local admin rights, will it work? If not then exactly what rights are required to perform this operation.
    5. Then I installed "Application Server" role on both Node1 and Node2 and also added "Distributed Transaction" feature
    6. Then I right clicked on Windows Cluster I created and added a new role/feature which is "DTC"
    7. I gave it the same name which I wrote above i.e. msdtc.domain.local 192.168.1.13
    8. MSDTC was created, but when it tried to bring its service up, it threw an error. Upon investigation it turned out that the Windows Cluster cluster.domain.local doesn't have the proper rights to create some objects in AD. I didn't know what rights to give, so I gave it full
    permission, and after that when I created MSDTC again, the service came up fine.
    So I want to know what rights cluster.domain.local requires to create MSDTC.
    Am I doing OK so far?

    Hello,
    >>Then I made a Windows Cluster on Node1 using these two nodes. I gave the name and IP to this cluster which I wrote above i.e. cluster.domain.local 192.168.1.10
    I assume that IP was a physical node IP and the Windows cluster IP was 192.168.1.12; I suppose you must have given .12 as the Windows cluster IP. .10 and .11 are the physical nodes in the cluster, but .12 is the cluster IP. Correct me if I am wrong.
    Did you do a failover and failback to check whether the cluster is configured correctly? If not, please do.
    >>Then I ran fail over validation wizard on Node1 adding both nodes which went fine (there were some warnings)
    Please resolve the warnings as well, as they might cause issues. It may not be possible every time, but try to make sure cluster validation is free of errors and warnings.
    >>Now I want to ask a question here. Is it best practice to perform the above operation using DOMAIN ADMIN?
    You can do it with a domain admin account, as this is required to create the Cluster Name Object (CNO) in the domain and a local account might not have that right, so I would say it's OK.
    >>I gave it the same name which I wrote above i.e. msdtc.domain.local 192.168.1.11
    Again, this IP is the Node2 IP, so how can you give it to MSDTC? Use the link below for reference:
    http://blogs.msdn.com/b/cindygross/archive/2009/02/22/how-to-configure-dtc-for-sql-server-in-a-windows-2008-cluster.aspx
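The back-and-forth above is about which address belongs to which name. As an illustration only, here is a small Python sketch that checks an address plan for duplicates before the cluster roles are created. The names and addresses in `PLAN` are copied from the question; `find_conflicts` and `bad_plan` are hypothetical names introduced for this example and are not part of any Microsoft tool.

```python
import ipaddress

# Address plan as listed in the question (names and addresses copied from the post).
PLAN = {
    "Node1": "192.168.1.10",
    "Node2": "192.168.1.11",
    "cluster.domain.local": "192.168.1.12",
    "msdtc.domain.local": "192.168.1.13",
    "sql.domain.local": "192.168.1.14",
}


def find_conflicts(plan):
    """Return {address: [names]} for every address assigned to more than one name."""
    owners = {}
    for name, address in plan.items():
        # ip_address() raises ValueError on a malformed address.
        owners.setdefault(ipaddress.ip_address(address), []).append(name)
    return {str(ip): names for ip, names in owners.items() if len(names) > 1}


# The plan as posted has no duplicate addresses.
print(find_conflicts(PLAN))

# Hypothetical mistake: giving MSDTC the address that already belongs to Node2.
bad_plan = {**PLAN, "msdtc.domain.local": "192.168.1.11"}
print(find_conflicts(bad_plan))
```

The check only compares the addresses written in the plan. It does not query DNS or the cluster, so it cannot tell what was actually typed into the wizard.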

  • SQL 2005 mirroring - time taken to fail over?

    I'm looking for an easy way to 'measure' the amount of time it takes our principal Database Server to fail over to our mirror Server in which all regular service will be resumed.
    Any help would be hugely appreciated 
    Thanks guys
    ras

    http://technet.microsoft.com/en-us/library/ms187465.aspx
    Best Regards,Uri Dimant SQL Server MVP,
    http://sqlblog.com/blogs/uri_dimant/

  • SQL Server 2014 Always on HA takes 8-14 seconds to fail over. Application side timeouts occur

    Hi All,
    I have a very similar post in the SQL Server 2014 forums too (https://social.technet.microsoft.com/Forums/sqlserver/en-US/adb5e338-907e-4405-aa62-d3ea93c7a98a/sql-server-2014-always-on-ha-takes-814-seconds-to-fail-over-application-side-timeouts-occur?forum=sqldisasterrecovery) -
    advice in the end was to post a question here.
    SQL Server Nodes, 2014 (12.0.2480.0)
    1 Share witness (on separate subnet)
    1 Cluster
    1 Listener
    I have been testing the response time to failovers – both manual (right-click, fail over in SSMS) and Automatic (shut down the primary host). The way I am testing response is to have a SSMS query running on my desktop, connected to the listener querying
    a small table and hit execute.
    The Query response time, from execute to receiving the result, has been between 8 and 14 seconds based on my testing. My previous experience (in a separate environment) showed around 2 second fail over times in a very similar configuration.
    Availability DB is 200Mb and is not actively used. The nodes are synchronised.
    SQL Server Hosts: Windows 2012, 2 cpu, 8gb RAM.
    Questions:
    1: It’s a big question but what should I expect for a ‘normal’ fail over time. Keep in mind this scenario is about as simple as it gets.
    2: As it stands, an 8 to 14 second ‘outage’ could cause some applications to time out. Or am I being unreasonable? I am seeing the very simple query in SSMS time out with this:
    Msg 983, Level 14, State 1, Line 2
    Unable to access availability database 'DATABASE' because the database replica is not in the PRIMARY or SECONDARY role. Connections to
    an availability database is permitted only when the database replica is in the PRIMARY or SECONDARY role. Try the operation again later.
    Cluster logs are long - this section accounts for 8 seconds of the 11 second outage I experienced. I can supply the full log if required. Also this log is just the 2 cluster nodes, I removed the witness share to make sure it was as simple as possible.
    00001090.00002128::2015/02/25-03:05:08.255 INFO  [GEM] Node 2: Deleting [1:65 , 1:71] (both included) as it has been ack'd by every node
    00001ee4.00002130::2015/02/25-03:05:10.107 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
    00001090.00002128::2015/02/25-03:05:11.888 INFO  [GEM] Node 2: Deleting [1:72 , 1:73] (both included) as it has been ack'd by every node
    00001090.00002698::2015/02/25-03:05:11.889 INFO  [GUM] Node 2: Processing RequestLock 2:49
    00001090.00002128::2015/02/25-03:05:11.890 INFO  [GUM] Node 2: Processing GrantLock to 2 (sent by 1 gumid: 67)
    00001090.00002698::2015/02/25-03:05:11.890 INFO  [GUM] Node 2: executing request locally, gumId:68, my action: /dm/update, # of updates: 1
    00001090.00002128::2015/02/25-03:05:12.890 INFO  [GEM] Node 2: Deleting [1:74 , 1:74] (both included) as it has been ack'd by every node
    00001ee4.00002130::2015/02/25-03:05:15.107 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
    00001090.00002128::2015/02/25-03:05:16.988 INFO  [GUM] Node 2: Processing RequestLock 1:28
    Thanks in advance.
    Keegan

    Hi Keegan,
    From this log, what I can see is that "Sending request Netname" took up the time.
    Could you please tell us the network configuration of the cluster nodes?
    If I recall correctly, it is recommended to keep only the TCP/IP protocol and disable NetBIOS over TCP/IP for the "Private Network", and also not to configure DNS, WINS, or a default gateway for the "Private Network":
    https://support.microsoft.com/kb/258750?wa=wsignin1.0
    After that, please test again.
    Best Regards,
    Elton JI
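One way to see where the time goes in a cluster log excerpt like the one in the question is to compute the gap between consecutive entries. The Python sketch below is an illustration, not a Microsoft utility: `LOG` holds the nine lines from the post with the message text shortened, and `entry_gaps` is a hypothetical helper name.

```python
from datetime import datetime

# Lines copied from the cluster log excerpt in the question; message text is shortened.
LOG = """\
00001090.00002128::2015/02/25-03:05:08.255 INFO  [GEM] Node 2: Deleting [1:65 , 1:71]
00001ee4.00002130::2015/02/25-03:05:10.107 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig
00001090.00002128::2015/02/25-03:05:11.888 INFO  [GEM] Node 2: Deleting [1:72 , 1:73]
00001090.00002698::2015/02/25-03:05:11.889 INFO  [GUM] Node 2: Processing RequestLock 2:49
00001090.00002128::2015/02/25-03:05:11.890 INFO  [GUM] Node 2: Processing GrantLock to 2
00001090.00002698::2015/02/25-03:05:11.890 INFO  [GUM] Node 2: executing request locally, gumId:68
00001090.00002128::2015/02/25-03:05:12.890 INFO  [GEM] Node 2: Deleting [1:74 , 1:74]
00001ee4.00002130::2015/02/25-03:05:15.107 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig
00001090.00002128::2015/02/25-03:05:16.988 INFO  [GUM] Node 2: Processing RequestLock 1:28
"""


def entry_gaps(log):
    """Return [(seconds_since_previous_entry, line)] for every entry after the first."""
    gaps = []
    previous = None
    for line in log.splitlines():
        if "::" not in line:
            continue  # skip anything that is not a "pid.tid::timestamp ..." entry
        stamp = line.split("::", 1)[1].split()[0]
        when = datetime.strptime(stamp, "%Y/%m/%d-%H:%M:%S.%f")
        if previous is not None:
            gaps.append(((when - previous).total_seconds(), line))
        previous = when
    return gaps


gaps = entry_gaps(LOG)
total = sum(seconds for seconds, _ in gaps)
longest, line = max(gaps, key=lambda item: item[0])
print(f"excerpt spans {total:.3f} s; longest gap {longest:.3f} s before:")
print(line)
```

A gap only shows that nothing was logged in between. It does not prove that the entry following the gap caused the delay, so treat the result as a pointer for where to look in the full log.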

  • Failover cluster server - File Server role is clustered - Shadow copies do not seem to travel to other node when failing over

    Hi,
    New to 2012 and implementing a clustered environment for our File Services role.  Have got to a point where I have successfully configured the Shadow copy settings.
    Have a large (15tb) disk.  S:
    Have a VSS drive (volume shadow copy drive) V:
    Have successfully configured through Windows Explorer the Shadow copy settings.
    Created dependencies in the Failover Cluster Server console whereby S: depends on V:
    However, when I fail over the resource and browse the Client Access Point share, there are no entries under the "Previous Versions" tab.
    When I visit the S: drive in Windows Explorer and open the Shadow Copy dialogue box, there are entries showing the times and dates of the shadow copies that were run on the original node.  So the disk knows about the shadow copies that were run on the
    original node, but the "Previous Versions" tab has no entries to display.
    This is on a 2012 server (NOT the R2 version).
    Can anyone explain what might be the reason?  Do I have an "issue" or is this by design?
    All help appreciated!
    Kathy
    Kathleen Hayhurst Senior IT Support Analyst

    Hi,
    Please first check the requirements in following article:
    Using Shadow Copies of Shared Folders in a server cluster
    http://technet.microsoft.com/en-us/library/cc779378(v=ws.10).aspx
    Cluster-managed shadow copies can only be created in a single quorum device cluster on a disk with a Physical Disk resource. In a single node cluster or majority node set cluster without a shared cluster disk, shadow copies can only be created and managed
    locally.
    You cannot enable Shadow Copies of Shared Folders for the quorum resource, although you can enable Shadow Copies of Shared Folders for a File Share resource.
    The recurring scheduled task that generates volume shadow copies must run on the same node that currently owns the storage volume.
    The cluster resource that manages the scheduled task must be able to fail over with the Physical Disk resource that manages the storage volume.

  • Http cluster servlet not failing over when no answer received from server

    I am using WebLogic 510 sp9. I have a WebLogic server proxying all requests to a WebLogic cluster using the httpclusterservlet.
    When I kill the WebLogic process servicing my request, I see the next request get failed over to the secondary server, and all my session information has been replicated. In short, I see the behavior I expect.
    However, when I either disconnect the primary server from the network or just switch this server off, I just get a message back to the browser: "unable to connect to servers".
    I don't really understand why the behaviour should be different. I would expect both to fail over in the same manner. Does the cluster servlet only handle TCP reset failures?
    Has anybody else experienced this, or does anyone have any ideas?
    Thanks
    r.troon

    I think I might have found the answer......
    The AD objects for the clusters had been moved from the Computers OU into a newly created OU. I'm suspecting that the cluster node computer objects didn't have perms to the cluster object within that OU and that was causing the issue. I know I've seen cluster
    object issues before when moving to a new OU.
    All has started working again for the moment so I now just need to investigate what permissions I need on the new OU so that I can move the cluster object in.

  • Weblogic Admin server fail over

    Hi,
    Please let me know if there is any official documentation from Oracle on admin server failover for versions 8.x, 9.x and 10.x.

    I am not sure there is such a thing as WebLogic Admin Server failover.
    For Managed Server failover, please read
    http://download.oracle.com/docs/cd/E12840_01/wls/docs103/cluster/failover.html

  • How to add a cloud machine as a node to existing windows fail over cluster having on-premise node in Windows server 2008 R2

    Hi All,
    We have a Windows failover cluster with one Windows machine on the local network as one of its nodes.
    I want to add a virtual machine hosted on Microsoft Azure as another node of this existing cluster.
    Please suggest how to do this.
    Thanking all in advance,
    Raghvendra

    Before you even start working on the SQL side, you will need to create a Windows Server 2008 R2 cluster with no shared storage.  You can actually test that in-house.  Create a VM running 2008 R2 and cluster it with your physical (from your description,
    I am assuming physical) 2008 R2 machine. Create it with a file share witness for quorum. Then configure your environment to see that it works as expected.
    Once you know how to configure the cluster between physical and VM with a file share witness, build it to Azure.  The location of the FSW gets to be an interesting choice.  To have a FSW in Azure means that you will need another VM in Azure to
    host the file share, meaning you have two quorum votes in Azure and one in-house.  Or, you could create a file share witness on an in-house system, giving you two quorum votes in-house and one in Azure.
    In the FSW in Azure scenario, if you have a loss of the in-house server, automatic failover occurs because two quorum votes exist in Azure.  With FSW in-house, depending on the loss you have in-house, you might have to force quorum to get the Azure
    single-node cluster to run.  Loss of access to Azure reverses those scenarios.  Neither one is optimal, but it does provide some level of recoverability.
    . : | : . : | : . tim
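The vote counting in the reply above can be written out as simple majority arithmetic. This Python sketch is a simplified model under stated assumptions: each node and the file share witness carries one vote, a whole site is lost at once, and quorum means a strict majority of all configured votes. The function name `has_quorum` and the site labels are made up for the example.

```python
def has_quorum(votes_by_site, failed_site):
    """True if the surviving votes are a strict majority of all configured votes."""
    total = sum(votes_by_site.values())
    surviving = total - votes_by_site.get(failed_site, 0)
    return surviving > total / 2


# Azure holds one node plus the file share witness; one node is in-house.
fsw_in_azure = {"azure": 2, "in-house": 1}
# Azure holds one node; in-house holds one node plus the file share witness.
fsw_in_house = {"azure": 1, "in-house": 2}

for label, layout in (("FSW in Azure", fsw_in_azure), ("FSW in-house", fsw_in_house)):
    for lost in ("in-house", "azure"):
        state = "keeps quorum" if has_quorum(layout, lost) else "loses quorum"
        print(f"{label}: lose {lost} -> cluster {state}")
```

Under these assumptions the cluster keeps running only when the lost site held the single vote, which matches the point that neither placement of the witness is optimal. A partial in-house failure that takes out the node but not the witness is not covered by this model.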

  • Fail-over options for Standalone Print Server

    Our organization recently set up three 2012 R2 print servers to handle 3 separate sites.  Each print server contains only the printers within its site, which makes each server a standalone print server.  I'm concerned about not
    having a fail-over plan in place in the event that one of my servers fails.  Does anyone have any fail-over suggestions?

    Hi Thomas,
    If we want to create a clustered print server, we need to create another print server in the same site.
    For detailed information, please refer to the link below:
    https://technet.microsoft.com/en-us/library/cc771091.aspx
    Best Regards.
    Steven Lee

  • Two witnesses in a SQL Server fail over group

    Is it possible to have two witnesses in a SQL Server Always on Availability Group Fail Over Cluster? Our goal is to have redundant witnesses in an Azure availability set.
    Thanks,
    Mike

    AlwaysOn uses Windows Failover Clustering for quorum.  See, e.g., Understanding Quorum Configurations in a Failover Cluster.
    You can do this, but with Dynamic Quorum it's probably not helpful.  If you lose your witness vote, the cluster will adjust the quorum requirements.
    David
    David http://blogs.msdn.com/b/dbrowne/

  • Server Pool Master fails and cannot fail over to another VM Server

    Dear All,
    Oracle VM 2.2.2
    I have 2 VM Servers connected to a Storage 6140 Array. In VM Manager I enabled HA on the server pool, and then on all virtual machines.
    - VM Server 1 has role as Server Pool Master, Utility Server, Virtual Machine Server and has virtual machines running
    - VM Server 2 has role as Utility Server, Virtual Machine Server and has virtual machines running.
    I tried shutting down VM Server 1, which acts as Server Pool Master, but the Server Pool Master role did not fail over to VM Server 2, and the status of both servers became Unreachable.
    In particular, none of the virtual machines are accessible.
    Please kindly advise.
    Thanks and regards,
    Heng

    Thanks Avi, I'll find and read that document. And thanks also for elaborating about the Utility Server.
    After reading the followups to my original question, I tried to think of possible server "layouts" in a HA environment.
    1) "N" servers in the pool, one of them is Pool Master, Utility Server AND VM Guests Server at the same time. Maybe this will be the preferred server for smaller, quicker VMs.
    2) "N" servers in the pool, one is Pool Master AND Utility Server, but has no VM guests running on it
    3) "N" servers in the pool, one is the Pool Master, another one is the Utility Server (none of them has VMs running on them), and finally a number of VM Guest servers
    Let's take case 1. If the Pool Master & Utility server fails, given that it has VM guests running on it as well, I understand from your explanation that I'll be ANYWAY able to manually "live migrate" the guests somewhere else, using VM Manager. Is this correct?
    If it's correct, then it's just a question of how much money I want to spend to have dedicated servers for different tasks, JUST FOR BETTER PERFORMANCE REASONS. Do you agree? And especially: do YOU have dedicated Pool Masters (just to figure out your "real" approach to the problem :-) )
    I feel that I still miss something, the picture is not completely clear to me. The fact is, that I'm now testing on my new bladesystem, but for now I put up one single blade. Testing HA will be the next step. I was just trying to get a few things sorted out in advance, but there is still something that I'm missing, as I was saying...
    Looking forward to your next reply, thanx again
    Rob

  • Fail over is not happening in Weblogic JSP Server

    Hi..
    We have 6 Weblogic instances running as application server (EJB) and 4 Weblogic
    instances running as web server (JSP). We have configured one cluster for EJB
    servers and one cluster for JSP servers. In front-end we are using four Apache
    servers to proxy the request to Weblogic JSP cluster. In my httpd.conf file I
    have configured with the Weblogic cluster. I can see the requests are going in
    all the servers and believe the cluster is working fine in terms of load balancing
    (round-robin). The clients are accessing the servers using CSS (Cisco Load Balancer).
    But when we test the fail-over in the cluster, we are facing problems. Let me
    explain the scenarios of the fail-over test:
    1.     The load was generated by the Load Generator
    2.     While under load, we shut down one Apache server. There were
    some failed transactions, but the servers immediately became stable. So fail-over is
    happening at this stage.
    3.     When I shut down one EJB instance, again after some failed transactions, the
    transactions became stable.
    4.     But when I shut down one JSP instance, the transactions immediately failed,
    the requests were not able to fail over to another JSP server, and the number of failed transactions
    increased.
    So I guess there is some problem in the proxy plug-in configuration, so that
    when I shut down one JSP server, requests are still being sent to that JSP server
    by the Apache proxy plug-in.
    I have read various queries posted in the newsgroups and found some information
    about configuring session and cookie information in the weblogic.xml file. However,
    I'm not sure which configuration settings need to be made in the weblogic.xml
    and httpd.conf files. Kindly help me to resolve the problem. I would appreciate
    your response.
    ===============================================================
    My httpd.conf file plug-in configuration:
    ###WebLogic Proxy Directives. If proxying to a WebLogic Cluster see WebLogic Documentation.
    <IfModule mod_weblogic.c>
    WebLogicCluster X.X.X.X1:7001,X.X.X.X2:7001,X.X.X.X3:7001,X.X.X.X4:7001
    MatchExpression *.jsp
    </IfModule>
    <Location /apollo>
    SetHandler weblogic-handler
    DynamicServerList ON
    HungServerRecoverSecs 600
    ConnectTimeoutSecs 40
    ConnectRetrySecs 2
    </Location>
    ==============================================================
    Thanks in advance,
    Siva.

    Hi,
    I can see that bug 13703600 has already been fixed in 12.1.2, but if you still see the same problem, please raise a ticket with Oracle Support.
    Regards,
    Kal
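    For readers following the symptom described in the question (requests still being sent to a JSP server that has been shut down), here is a purely conceptual Python sketch of round-robin selection that skips a server marked as failed until a retry window has passed. It is not the WebLogic plug-in's actual algorithm; the class `RoundRobinProxy`, the `retry_after_secs` parameter and the server names are all hypothetical and do not correspond to any plug-in directive.

```python
import itertools
from typing import Dict, List


class RoundRobinProxy:
    """Conceptual model only: round-robin that skips recently failed servers."""

    def __init__(self, servers: List[str], retry_after_secs: float) -> None:
        self.servers = servers
        self.retry_after_secs = retry_after_secs  # hypothetical retry window
        self.failed_at: Dict[str, float] = {}
        self._cycle = itertools.cycle(servers)

    def mark_failed(self, server: str, now: float) -> None:
        """Record the time at which a request to this server failed."""
        self.failed_at[server] = now

    def pick(self, now: float) -> str:
        """Return the next server in rotation that is not inside its retry window."""
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            failed = self.failed_at.get(server)
            if failed is None or now - failed >= self.retry_after_secs:
                self.failed_at.pop(server, None)  # give the server another chance
                return server
        raise RuntimeError("no server available")


proxy = RoundRobinProxy(["jsp1", "jsp2", "jsp3"], retry_after_secs=600)
proxy.mark_failed("jsp2", now=0)
picks = [proxy.pick(now=10) for _ in range(4)]
print(picks)
```

    In this model the failed server receives no traffic during the window, which is the behaviour the fail-over test expected to see.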

  • ACE 4710 - 'reverse proxy' in front of serverfarm - fail-over/sorry server design issue

    Hi All,
    I'm working on a specific config and have an issue in the backup farm/fail-over/sorry server area.
    The customer wants the following:
    They have an existing serverfarm with X web servers, and they want a single server to act as a reverse-proxy in front of the farm,
    so that all traffic goes through that server, which then forwards the request to the original serverfarm.
    The problem in my design is in the fail-over. If I configure the reverse-proxy server in a new serverfarm and use the original (web servers) farm as backup, it has fail-over, but if the reverse-proxy AND the original serverfarm fail, there is no nice way to get the users on a sorry server.
    I could give the original serverfarm's rservers a 'backup standby' server, but that won't give the desired effect either.
    For maintenance they first take 50% of the servers offline and switch to the other 50% after that, so users would then see a sorry page even if there were operational servers left in the farm.
    The 4710's are running routed mode, and the farms use Sticky Cookie, and also some http URL & Cookie matching is done.
    Anyone have an idea how to build this?
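    To make the desired behaviour concrete, the selection logic I'm after can be written as a small Python sketch. This only models the requirement (reverse proxy first, then the web farm, then the sorry server); it is not ACE configuration, and the farm names and the `pick_farm` helper are invented for the illustration.

```python
from typing import Dict, List, Optional


def pick_farm(priority: List[str], healthy: Dict[str, int]) -> Optional[str]:
    """Return the first farm in priority order that has at least one healthy server."""
    for farm in priority:
        if healthy.get(farm, 0) > 0:
            return farm
    return None


# Hypothetical farm names, in the order traffic should prefer them.
PRIORITY = ["reverse_proxy", "web_farm", "sorry"]

# Reverse proxy down, half of the web farm in maintenance: web farm should serve.
during_maintenance = pick_farm(PRIORITY, {"reverse_proxy": 0, "web_farm": 2, "sorry": 1})

# Reverse proxy and every web server down: only now should users see the sorry page.
total_outage = pick_farm(PRIORITY, {"reverse_proxy": 0, "web_farm": 0, "sorry": 1})

print(during_maintenance, total_outage)
```

    The point is that the sorry server should only be reached when the whole web farm is down, not when an individual rserver is taken out of service.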

    Hi,
    This needs additional testing, but as I understand it, if you list the backups in this order, the last backup server will be chosen first.
    In your case it would be: RSERVER1 >> backup sorry server >> backup web content.
    As per the example below:
    I put test2 as the first backup server and test1 as the second backup server, but if you look at the first part, it took rserver test1 as the first backup.
    serverfarm host 1313-GIN-GWAP-SDC-80
      rserver RSERVER1
        backup-rserver test1
        inservice
      rserver test1
        inservice standby
      rserver test2
        inservice standby
    regards,
    Ajay Kumar

  • Problems with LDAP Server fail-over

    Our Xsan is installed with 12 FCP, 2 MDC Xserves and 2 LDAP Xserves for fail-over.
    The MDC fail-over runs well, but the LDAP fail-over has problems.
    The first time, we unplugged the power cord of one Xserve and the other LDAP server took over successfully, but FCP users needed 15 minutes to log in again. That's unacceptable.
    The fail-over has never succeeded since then.
    That means that once the LDAP server goes down and the backup LDAP server does not take over, we lose everything related to user login.
    Can anybody help? Thanks a lot.

    I believe you can enter both LDAP servers in the client configuration for LDAP access. (Even though you shouldn't have to)
    IP failover is not the issue, your LDAP configuration is.
    Start at page 90 and work through this document to make sure you have the clients set up properly.
    http://manuals.info.apple.com/en/MacOSXSrvr10.3_OpenDirectoryAdmin.pdf
