DAG Sporadic Entire Server DB Fail Over
Hi,
I have been having this issue for a while now. I have two physical Exchange servers in a DAG, both on Exchange 2013 CU1. Randomly, every few days and at various times, Server1 will fail all of its databases over to Server2. I'll redistribute them, and then, say, Server2 will fail all of its databases over to Server1. In short, both servers have at times failed their databases over.
I started with this: http://technet.microsoft.com/en-us/library/dd351258(v=exchg.150).aspx which led me to set up monitoring of the Microsoft-Exchange-ManagedAvailability logs. I can tell you that the replication tests work fine, and the health of all the databases is fine.
My monitoring turned up the following errors. In this example, "EX0001" was the server that failed all of its databases over to "EX0002". It seems pretty clear to me that Exchange Managed Availability is finding an issue with EWS, attempting to recycle the MSExchangeServicesAppPool app pool, and cannot due to "Throttling", so it fails the DBs over. That's my best guess. The problem is I don't know how to fix this. I've run through troubleshooting the EWS health set and nothing really turns up: http://technet.microsoft.com/en-us/library/ms.exch.scom.ews.protocol(v=exchg.150).aspx
EX0001
1011
Microsoft-Exchange-ManagedAvailability
Recovery
Microsoft-Exchange-ManagedAvailability/RecoveryActionLogs
5/22/2014 7:06:43 AM
Warning (Info)
1520183
NT AUTHORITY\SYSTEM
RecycleApplicationPool-MSExchangeServicesAppPool-EWSSelfTestRestart: Throttling rejected the operation
EX0001
4
Microsoft-Exchange-ManagedAvailability
Monitoring
Microsoft-Exchange-ManagedAvailability/Monitoring
5/22/2014 7:17:27 AM
Error (Info)
8287
NT AUTHORITY\SYSTEM
The EWS.Protocol health set has detected a problem on EX0001 beginning at 5/22/2014 10:55:12 AM (UTC). The health manager is reporting that recycling the MSExchangeServicesAppPool
app pool has failed to restore health and it has tried to fail over active copies of local databases to a healthy server. Attempts to auto-recover from this condition have failed and requires Administrator attention. Details below: <b>MachineName:</b>
EX0001 <b>ServiceName:</b> EWS.Protocol <b>ResultName:</b> EWSSelfTestProbe/MSExchangeServicesAppPool <b>Error:</b> System.Exception: System.Exception: >>> PRIMARY ENDPOINT VERIFICATION EwsUrl=https://localhost:444/ews/exchange.asmx
UserName/Password=HealthMailbox663889950a344102878cede289222a46@domain.local/xGAVmP[^jn{qGgOx0Jtx:4X+-j@?d%XM?@7yErsoFF[_#u[%LcX=0hPzMln#1PiQ/7z?14rJJs8Dc)AYLi0F9mU)bMpL_gj{Q3*[Yt1:UgX=:CkQc=[Xuagz%Od=|@tt AuthMethod=CAFE ConvertId (Attempt #0) Status=The
request failed. The operation has timed out ConvertId (Attempt #0) Latency=59521.1327 ConvertId (Attempt #1) Status=iteration 1; 55.427003 seconds elapsed ConvertId (Attempt #1) Latency=55427.003 at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSCommon.RetrySoapActionAndThrow(Action
operation, String soapAction, ExchangeServiceBase service) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.ExecuteEWSCall(String endPoint, String operation, Boolean verifyAffinity) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.DoWorkInternal(CancellationToken
cancellationToken) <b>Exception:</b> System.Exception: System.Exception: System.Exception: >>> PRIMARY ENDPOINT VERIFICATION EwsUrl=https://localhost:444/ews/exchange.asmx
UserName/Password=HealthMailbox663889950a344102878cede289222a46@domain.local/xGAVmP[^jn{qGgOx0Jtx:4X+-j@?d%XM?@7yErsoFF[_#u[%LcX=0hPzMln#1PiQ/7z?14rJJs8Dc)AYLi0F9mU)bMpL_gj{Q3*[Yt1:UgX=:CkQc=[Xuagz%Od=|@tt AuthMethod=CAFE ConvertId (Attempt #0) Status=The
request failed. The operation has timed out ConvertId (Attempt #0) Latency=59521.1327 ConvertId (Attempt #1) Status=iteration 1; 55.427003 seconds elapsed ConvertId (Attempt #1) Latency=55427.003 at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSCommon.RetrySoapActionAndThrow(Action
operation, String soapAction, ExchangeServiceBase service) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.ExecuteEWSCall(String endPoint, String operation, Boolean verifyAffinity) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.DoWorkInternal(CancellationToken
cancellationToken) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSCommon.ThrowError(Object key, Object exceptiondata, String logDetails) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.DoWorkInternal(CancellationToken
cancellationToken) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.RunEWSGenericProbe(CancellationToken cancellationToken) at Microsoft.Exchange.WorkerTaskFramework.WorkItem.Execute(CancellationToken joinedToken) at Microsoft.Exchange.WorkerTaskFramework.WorkItem.<>c__DisplayClass2.<StartExecuting>b__0()
at System.Threading.Tasks.Task.Execute() <b>ExecutionContext:</b> EWSGenericProbeError:Exception=System.Exception: System.Exception: >>> PRIMARY ENDPOINT VERIFICATION EwsUrl=https://localhost:444/ews/exchange.asmx
UserName/Password=HealthMailbox663889950a344102878cede289222a46@domain.local/xGAVmP[^jn{qGgOx0Jtx:4X+-j@?d%XM?@7yErsoFF[_#u[%LcX=0hPzMln#1PiQ/7z?14rJJs8Dc)AYLi0F9mU)bMpL_gj{Q3*[Yt1:UgX=:CkQc=[Xuagz%Od=|@tt AuthMethod=CAFE ConvertId (Attempt #0) Status=The
request failed. The operation has timed out ConvertId (Attempt #0) Latency=59521.1327 ConvertId (Attempt #1) Status=iteration 1; 55.427003 seconds elapsed ConvertId (Attempt #1) Latency=55427.003 at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSCommon.RetrySoapActionAndThrow(Action
operation, String soapAction, ExchangeServiceBase service) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.ExecuteEWSCall(String endPoint, String operation, Boolean verifyAffinity) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon
<b>FailureContext:</b> <b>ResultType:</b> Failed <b>IsNotified:</b> False <b>DeploymentId:</b> 0 <b>RetryCount:</b> 0 <b>ExtensionXml:</b> <b>Version:</b> <b>StateAttribute1:</b>
EWS <b>StateAttribute2:</b> Unknown <b>StateAttribute3:</b> <b>StateAttribute4:</b> <b>StateAttribute5:</b> <b>StateAttribute6:</b> 0 <b>StateAttribute7:</b> 0 <b>StateAttribute8:</b>
0 <b>StateAttribute9:</b> 0 <b>StateAttribute10:</b> 0 <b>StateAttribute11:</b> <b>StateAttribute12:</b> <b>StateAttribute13:</b> <b>StateAttribute14:</b> <b>StateAttribute14:</b>
<b>StateAttribute16:</b> 0 <b>StateAttribute17:</b> 0 <b>StateAttribute18:</b> 0 <b>StateAttribute19:</b> 0 <b>StateAttribute20:</b> 120011 <b>StateAttribute21:</b> [000.000] EWSCommon
start: 5/22/2014 11:13:13 AM [000.000] Configuring EWScommon [000.000] Probe time limit: 120000ms, HTTP timeout: 59500ms, RetryCount: 1 [000.047] using authN: CAFE
[email protected] xGAVmP[^jn{qGgOx0Jtx:4X+-j@?d%XM?@7yErsoFF[_#u[%LcX=0hPzMln#1PiQ/7z?14rJJs8Dc)AYLi0F9mU)bMpL_gj{Q3*[Yt1:UgX=:CkQc=[Xuagz%Od=|@tt
[000.047] using HTTP request timeout: 59500 ms [000.047] action iteration 0 [000.047] starting (total time left 119954 ms) [059.568] action threw Microsoft.Exchange.WebServices.Data.ServiceRequestException: The request failed. The operation has timed out [064.584]
action iteration 1 [064.584] starting (total time left 55416 ms) [120.011] action wait timed out [120.011] action threw System.TimeoutException: iteration 1; 55.427003 seconds elapsed <b>StateAttribute22:</b> <b>StateAttribute23:</b>
<b>StateAttribute24:</b> <b>StateAttribute25:</b> <b>PoisonedCount:</b> 0 <b>ExecutionId:</b> 32395373 <b>ExecutionStartTime:</b> 5/22/2014 11:13:13 AM <b>ExecutionEndTime:</b> 5/22/2014
11:15:13 AM <b>ResultId:</b> 253233015 <b>SampleValue:</b> 0
-------------------------------------------------------------------------------
States of all monitors within the health set:
Note: Data may be stale. To get current data, run: Get-ServerHealth -Identity 'EX0001' -HealthSet 'EWS.Protocol'
State Name TargetResource HealthSet AlertValue ServerComponent
----- ---- -------------- --------- ---------- ---------------
NotApplicable EWSSelfTestMonitor MSExchangeServicesAppPool EWS.Protocol Unhealthy None
NotApplicable EWSDeepTestMonitor DG01DB15 EWS.Protocol Unhealthy None
NotApplicable PrivateWorkingSetWarningThresholdExc... msexchangeservicesapppool EWS.Protocol Healthy None
NotApplicable ProcessProcessorTimeErrorThresholdEx... msexchangeservicesapppool EWS.Protocol Healthy None
NotApplicable ExchangeCrashEventErrorThresholdExce... msexchangeservicesapppool EWS.Protocol Healthy None
States of all health sets:
Note: Data may be stale. To get current data, run: Get-HealthReport -Identity 'EX0001'
State HealthSet AlertValue LastTransitionTime MonitorCount
----- --------- ---------- ------------------ ------------
NotApplicable Autodiscover.Protocol Healthy 3/8/2014 12:46:17 AM 4
NotApplicable ActiveSync.Protocol Healthy 3/8/2014 1:15:35 AM 7
NotApplicable ActiveSync Healthy 3/8/2014 2:08:15 AM 3
NotApplicable EDS Healthy 5/22/2014 5:19:41 AM 13
NotApplicable ECP Healthy 3/8/2014 1:15:27 AM 3
NotApplicable EventAssistants Healthy 5/22/2014 5:48:56 AM 28
NotApplicable EWS.Protocol Unhealthy 5/22/2014 7:07:12 AM 5
NotApplicable FIPS Healthy 5/21/2014 10:24:01 PM 18
NotApplicable AD Healthy 2/23/2014 10:42:29 PM 10
NotApplicable OWA.Protocol.Dep Healthy 5/22/2014 5:19:40 AM 1
NotApplicable Monitoring Unhealthy 5/22/2014 5:35:31 AM 9
Online HubTransport Unhealthy 5/22/2014 5:19:43 AM 138
NotApplicable DataProtection Healthy 5/22/2014 7:08:02 AM 201
NotApplicable AntiSpam Healthy 5/22/2014 5:19:40 AM 4
NotApplicable Network Healthy 5/21/2014 10:36:54 PM 1
NotApplicable OWA.Protocol Healthy 3/8/2014 1:15:34 AM 5
NotApplicable MailboxMigration Healthy 3/8/2014 12:46:18 AM 4
NotApplicable MRS Healthy 3/8/2014 12:44:35 AM 9
NotApplicable MailboxTransport Healthy 5/22/2014 5:19:41 AM 57
NotApplicable PublicFolders Healthy 5/21/2014 10:44:15 PM 4
NotApplicable RPS Healthy 2/23/2014 11:38:33 PM 1
NotApplicable Outlook.Protocol Healthy 4/22/2014 11:04:18 AM 3
NotApplicable UserThrottling Healthy 5/22/2014 5:51:13 AM 7
NotApplicable SiteMailbox Healthy 3/8/2014 2:10:53 AM 3
NotApplicable UM.Protocol Healthy 5/22/2014 5:19:41 AM 17
NotApplicable Store Healthy 5/22/2014 5:19:43 AM 225
NotApplicable MSExchangeCertificateDeplo... Disabled 1/1/0001 12:00:00 AM 2
NotApplicable DAL Healthy 8/2/2013 12:59:03 AM 16
NotApplicable Search Healthy 5/22/2014 5:37:18 AM 269
Online EWS.Proxy Healthy 5/5/2014 1:34:08 AM 1
Online RPS.Proxy Healthy 5/5/2014 1:34:38 AM 13
Online OAB.Proxy Healthy 5/5/2014 1:34:37 AM 1
Online ECP.Proxy Healthy 5/5/2014 1:34:17 AM 4
Online OWA.Proxy Healthy 5/5/2014 1:34:25 AM 2
Online Outlook.Proxy Healthy 5/5/2014 1:34:08 AM 1
Online Autodiscover.Proxy Healthy 5/5/2014 1:34:08 AM 1
Online ActiveSync.Proxy Healthy 5/5/2014 1:34:35 AM 1
Online RWS.Proxy Healthy 5/5/2014 1:34:18 AM 10
NotApplicable Autodiscover Healthy 5/21/2014 10:24:01 PM 2
Online FrontendTransport Healthy 5/15/2014 12:49:31 AM 11
NotApplicable EWS Unhealthy 5/22/2014 7:06:01 AM 2
NotApplicable OWA Healthy 2/23/2014 11:37:56 PM 1
NotApplicable Outlook Healthy 3/8/2014 12:45:14 AM 5
Online UM.CallRouter Healthy 5/22/2014 5:19:41 AM 7
NotApplicable RemoteMonitoring Healthy 8/2/2013 12:58:03 AM 1
NotApplicable POP.Protocol Healthy 5/20/2014 9:22:12 AM 5
NotApplicable IMAP.Protocol Healthy 5/20/2014 9:22:21 AM 5
Online POP.Proxy Healthy 3/7/2014 1:31:10 PM 1
Online IMAP.Proxy Healthy 3/7/2014 1:31:10 PM 1
NotApplicable IMAP Healthy 5/20/2014 9:23:32 AM 2
NotApplicable POP Healthy 5/20/2014 9:17:18 AM 2
NotApplicable Antimalware Healthy 5/15/2014 8:33:13 AM 8
NotApplicable FfoQuarantine Healthy 8/2/2013 12:58:20 AM 1
Online Transport Healthy 5/22/2014 5:38:00 AM 9
NotApplicable Security Healthy 3/8/2014 12:46:09 AM 3
NotApplicable Datamining Healthy 3/8/2014 12:45:44 AM 3
NotApplicable Provisioning Healthy 3/8/2014 12:45:40 AM 3
NotApplicable ProcessIsolation Healthy 3/8/2014 12:47:05 AM 12
NotApplicable TransportSync Healthy 3/8/2014 12:45:37 AM 3
NotApplicable MessageTracing Healthy 3/8/2014 12:44:56 AM 3
NotApplicable CentralAdmin Healthy 3/8/2014 12:45:12 AM 3
NotApplicable OAB Healthy 8/2/2013 1:02:27 AM 3
NotApplicable Calendaring Healthy 8/2/2013 1:02:07 AM 3
NotApplicable PushNotifications.Protocol Healthy 2/23/2014 10:46:17 PM 3
NotApplicable Ediscovery.Protocol Healthy 5/21/2014 10:38:16 PM 1
NotApplicable HDPhoto Healthy 5/6/2014 9:36:25 AM 1
NotApplicable Clustering Healthy 3/8/2014 12:45:34 AM 4
NotApplicable DiskController Healthy 4/22/2014 2:51:30 AM 1
NotApplicable MailboxSpace Healthy 5/22/2014 6:16:51 AM 96
NotApplicable FreeBusy Healthy 5/22/2014 5:32:54 AM 1
Note: Subsequent detected alerts are suppressed until the health set is healthy again.
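The timing in the probe trace (StateAttribute21) can be checked with a little arithmetic. Below is a minimal Python sketch that uses only the numbers printed in the trace; the variable and function names are illustrative and not part of Exchange. It suggests that both ConvertId attempts ran until they timed out, so EWS did not answer at all for roughly two minutes.

```python
# Numbers copied from the EWSCommon trace in the event above.
PROBE_LIMIT_MS = 120000   # "Probe time limit: 120000ms"
HTTP_TIMEOUT_MS = 59500   # "HTTP timeout: 59500ms"

# (label, start offset in ms, end offset in ms), read from the [sss.mmm] stamps.
attempts = [
    ("iteration 0", 47, 59568),      # started [000.047], threw at [059.568]
    ("iteration 1", 64584, 120011),  # started [064.584], timed out at [120.011]
]

def summarize(attempts, limit_ms):
    """Return elapsed time and remaining probe budget for each attempt."""
    rows = []
    for label, start, end in attempts:
        rows.append({
            "label": label,
            "elapsed_ms": end - start,
            "budget_left_at_start_ms": limit_ms - start,
        })
    return rows

summary = summarize(attempts, PROBE_LIMIT_MS)
for row in summary:
    print(row)
```

The elapsed times agree with the logged latencies (59521.1327 ms and 55427.003 ms), and the second attempt's budget matches the logged "total time left 55416 ms". The first budget differs from the logged 119954 ms by one millisecond, presumably because the printed stamps are rounded.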
Hi,
Based on the error message, throttling rejected the operation. I recommend running Get-ThrottlingPolicy | fl to view the EWS settings in the throttling policy.
You can modify the default throttling policy and adjust the basic EWS settings. Then restart the Microsoft Exchange Throttling service and recycle the MSExchangeServicesAppPool app pool to check the result.
For more information about EWS throttling, see the following articles.
EWS throttling in Exchange
http://msdn.microsoft.com/en-us/library/office/jj945066(v=exchg.150).aspx
EWS Best Practices: Understand Throttling Policies
http://blogs.msdn.com/b/mstehle/archive/2010/11/09/ews-best-practices-understand-throttling-policies.aspx
Best regards,
Belinda
Belinda Ma
TechNet Community Support
Similar Messages
-
Which role do I need, DFS or File Server, on a Server 2012 R2 failover cluster?
What I want to achieve is to share all my user data files in a central location and have them highly available all the time, whether it's a general share or folder redirection data. But I'm a bit confused: I have a failover cluster set up on Server 2012, and I would like to add DFS as a role, but there is another role called File Server that seems to do virtually the same thing as DFS. That is, it creates a namespace share that can be accessed even if one of the nodes goes down. My current thinking is that DFS replicates between two physical locations, while a failover cluster works slightly differently, and with the File Server role it does pretty much the same thing except for replicating data from one drive to another. What do you suggest I do, or did I get the concept wrong like a noob?
DFS and Failover Clustering for file shares provide a similar end result for file access, but they are significantly different implementations.
Clustering provides high availability to files by presenting shared access to a set of files served from a cluster. With Windows Server 2012, Microsoft added the ability to create a Scale-Out File Server that even allows all nodes of the cluster to serve access to the files for a higher level of performance and other great things. The bottom line with failover clusters for files is that there is a single copy of the file presented from the cluster.
DFS, on the other hand, provides high availability to files by making a copy in two or more locations and presenting a namespace that allows access to the file through any of the network paths. DFS works very well for files that are primarily read-only. When you get into a situation where there is a lot of updating of the shared files, DFS is not a very good solution. There are ways to implement DFS for read/write files, but it generally requires a good knowledge of how the files are used and how you want to manage them.
The key to answering your question comes in your first sentence: "I want to share all my user data files in a central location and to be highly available all the time". My initial reaction is that a central location means a failover cluster, where there is only a single copy of the file. However, "all the time" can be compromised by network failures to the central site. Remote sites would not have access if they can't reach the central site. DFS provides the ability to have copies remotely, but then if you allow updating at multiple sites, you have to manage the merging of the changes, among other things.
. : | : . : | : . tim -
Is my installation of SQL Server Fail Over cluster correct?
I built a 2-node SQL Server 2012 failover cluster but had some problems during installation, so I want to know whether the steps I performed below are correct.
Hardware
Node1 192.168.1.10
Node2 192.168.1.11
Added following entries in DNS
cluster.domain.local 192.168.1.12 (for Windows Cluster)
msdtc.domain.local 192.168.1.13 (for MSDTC)
sql.domain.local 192.168.1.14 (for SQL Server Cluster)
Cluster Storage
Disk1 (for Quorum)
Disk2 (for MSDTC)
Disk3 (for SQL Server)
Now comes the installation. I am performing all these steps as DOMAIN ADMIN.
1. First I installed the clustering role on both nodes.
2. Then I ran the failover validation wizard on Node1, adding both nodes, which went fine (there were some warnings).
3. Then I created a Windows cluster on Node1 using these two nodes. I gave this cluster the name and IP I wrote above, i.e. cluster.domain.local 192.168.1.12.
4. The cluster was created and both nodes are UP.
Now I want to ask a question here. Is it best practice to perform the above operation as DOMAIN ADMIN? Or will it work if I use a standard domain user account with local admin rights? If not, exactly what rights are required to perform this operation?
5. Then I installed the "Application Server" role on both Node1 and Node2 and also added the "Distributed Transaction" feature.
6. Then I right-clicked the Windows cluster I created and added a new role/feature, which is "DTC".
7. I gave it the name I wrote above, i.e. msdtc.domain.local 192.168.1.13.
8. MSDTC was created, but when it tried to bring its service up, it threw an error. Upon investigation it turned out that the Windows cluster cluster.domain.local didn't have proper rights to create some objects in AD. I didn't know what rights to give, so I gave it full permission, and after that when I created MSDTC again, the service came up fine.
So I want to know what rights cluster.domain.local requires to create MSDTC.
Am I doing good so far?
Hello,
>>Then I made a Windows Cluster on Node1 using these two nodes. I gave the name and IP to this cluster which I wrote above i.e. cluster.domain.local 192.168.1.10
I suppose this IP was a physical node IP and the Windows cluster IP was 192.168.1.12, which you must have given as the Windows cluster IP. .10 and .11 are the physical nodes in the cluster, but .12 is the cluster IP. Correct me if I am wrong.
Did you perform a failover and failback to check whether the cluster is configured correctly? If not, please do so.
>>Then I ran fail over validation wizard on Node1 adding both nodes which went fine (there were some warnings)
Please resolve the warnings as well, as they might cause issues. This may not hold every time, but make sure cluster validation is free of errors and warnings.
>>Now I want to ask a question here. Is it best practice to perform the above operation using DOMAIN ADMIN?
You can do it with a domain admin account. Rights to create the cluster name object (CNO) in the domain are required, and a local account might not have them, so I would say it's OK.
>>I gave it the same name which I wrote above i.e. msdtc.domain.local
192.168.1.11
Again, this is Node2's IP. How can you give it to MSDTC? Use the link below for reference:
http://blogs.msdn.com/b/cindygross/archive/2009/02/22/how-to-configure-dtc-for-sql-server-in-a-windows-2008-cluster.aspx
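The point about cluster, MSDTC and SQL network names each needing an address that no physical node already uses can be checked mechanically. A minimal Python sketch, using the addresses listed in the question; the `find_conflicts` helper and the dictionary layout are assumptions made for illustration, not part of any cluster tooling:

```python
import ipaddress
from collections import defaultdict

# Addresses as listed in the question; the keys are just labels.
plan = {
    "Node1": "192.168.1.10",
    "Node2": "192.168.1.11",
    "cluster.domain.local": "192.168.1.12",
    "msdtc.domain.local": "192.168.1.13",
    "sql.domain.local": "192.168.1.14",
}

def find_conflicts(plan):
    """Return {ip: [names]} for every address assigned to more than one name."""
    owners = defaultdict(list)
    for name, addr in plan.items():
        owners[ipaddress.ip_address(addr)].append(name)
    return {str(ip): names for ip, names in owners.items() if len(names) > 1}

conflicts = find_conflicts(plan)
print(conflicts or "no duplicate addresses")

# The situation the reply warns about: MSDTC given Node2's address.
bad_plan = {**plan, "msdtc.domain.local": "192.168.1.11"}
bad_conflicts = find_conflicts(bad_plan)
print(bad_conflicts)
```

With the plan as originally posted there are no duplicates, which suggests the addresses quoted in the reply may simply have been misread.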
-
SQL 2005 mirroring - time taken to fail over?
I'm looking for an easy way to measure the amount of time it takes our principal database server to fail over to our mirror server, after which all regular service is resumed.
Any help would be hugely appreciated.
Thanks guys
ras
http://technet.microsoft.com/en-us/library/ms187465.aspx
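One simple way to measure this from the client side is to poll the database in a loop and time the gap between the first failed probe and the next successful one. The following is a minimal Python sketch of that idea, not a SQL Server feature: `measure_outage` and its parameters are hypothetical names, and `probe` stands for whatever small check you choose (for example a trivial query through your usual driver). The demo at the bottom runs against a simulated clock rather than a real server.

```python
import time

def measure_outage(probe, clock=time.monotonic, sleep=time.sleep,
                   interval=0.1, max_wait=120.0):
    """Poll `probe` until it fails, then until it succeeds again.

    `probe` is any zero-argument callable returning True when the
    database answers and False (or raising) when it does not.
    Returns the outage length in seconds, or None if no complete
    failure/recovery cycle is seen within `max_wait`.
    """
    start = clock()
    down_since = None
    while clock() - start < max_wait:
        try:
            ok = bool(probe())
        except Exception:
            ok = False
        now = clock()
        if not ok and down_since is None:
            down_since = now          # first failed probe
        elif ok and down_since is not None:
            return now - down_since   # first success after the failure
        sleep(interval)
    return None

# Simulated run: the service is down between t=3s and t=8s on a fake clock.
class FakeClock:
    def __init__(self):
        self.t = 0.0
    def now(self):
        return self.t
    def sleep(self, seconds):
        self.t += seconds

fake = FakeClock()
outage = measure_outage(lambda: not (3.0 <= fake.t < 8.0),
                        clock=fake.now, sleep=fake.sleep, interval=0.5)
print(outage)
```

The result is only as precise as the polling interval, and it measures what a client sees (including reconnect time), which is usually the number that matters for application timeouts.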
Best Regards, Uri Dimant, SQL Server MVP
http://sqlblog.com/blogs/uri_dimant/ -
SQL Server 2014 Always on HA takes 8-14 seconds to fail over. Application side timeouts occur
Hi All,
I have a very similar post in the SQL Server 2014 forums too (https://social.technet.microsoft.com/Forums/sqlserver/en-US/adb5e338-907e-4405-aa62-d3ea93c7a98a/sql-server-2014-always-on-ha-takes-814-seconds-to-fail-over-application-side-timeouts-occur?forum=sqldisasterrecovery) -
advice in the end was to post a question here.
SQL Server Nodes, 2014 (12.0.2480.0)
1 Share witness (on separate subnet)
1 Cluster
1 Listener
I have been testing the response time to failovers, both manual (right-click, fail over in SSMS) and automatic (shut down the primary host). To test the response, I have an SSMS query on my desktop, connected to the listener and querying a small table, and I hit execute.
The query response time, from execute to receiving the result, has been between 8 and 14 seconds in my testing. My previous experience (in a separate environment) showed failover times of around 2 seconds in a very similar configuration.
The availability DB is 200 MB and is not actively used. The nodes are synchronised.
SQL Server hosts: Windows 2012, 2 CPUs, 8 GB RAM.
Questions:
1: It’s a big question, but what should I expect for a ‘normal’ failover time? Keep in mind this scenario is about as simple as it gets.
2: As it stands, an 8 to 14 second ‘outage’ could cause some applications to time out. Or am I being unreasonable? I am seeing the very simple query in SSMS time out with this:
Msg 983, Level 14, State 1, Line 2
Unable to access availability database 'DATABASE' because the database replica is not in the PRIMARY or SECONDARY role. Connections to
an availability database is permitted only when the database replica is in the PRIMARY or SECONDARY role. Try the operation again later.
Cluster logs are long - this section accounts for 8 seconds of the 11 second outage I experienced. I can supply the full log if required. Also this log is just the 2 cluster nodes, I removed the witness share to make sure it was as simple as possible.
00001090.00002128::2015/02/25-03:05:08.255 INFO [GEM] Node 2: Deleting [1:65 , 1:71] (both included) as it has been ack'd by every node
00001ee4.00002130::2015/02/25-03:05:10.107 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
00001090.00002128::2015/02/25-03:05:11.888 INFO [GEM] Node 2: Deleting [1:72 , 1:73] (both included) as it has been ack'd by every node
00001090.00002698::2015/02/25-03:05:11.889 INFO [GUM] Node 2: Processing RequestLock 2:49
00001090.00002128::2015/02/25-03:05:11.890 INFO [GUM] Node 2: Processing GrantLock to 2 (sent by 1 gumid: 67)
00001090.00002698::2015/02/25-03:05:11.890 INFO [GUM] Node 2: executing request locally, gumId:68, my action: /dm/update, # of updates: 1
00001090.00002128::2015/02/25-03:05:12.890 INFO [GEM] Node 2: Deleting [1:74 , 1:74] (both included) as it has been ack'd by every node
00001ee4.00002130::2015/02/25-03:05:15.107 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
00001090.00002128::2015/02/25-03:05:16.988 INFO [GUM] Node 2: Processing RequestLock 1:28
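To see where the time goes in an excerpt like the one above, the timestamps can be parsed and the gaps between consecutive entries listed. A minimal Python sketch, assuming the `threadid::YYYY/MM/DD-HH:MM:SS.mmm` prefix seen in these lines; the messages are shortened copies of the quoted log and the helper names are illustrative only:

```python
from datetime import datetime

# The nine log lines quoted above, with the messages shortened.
log_lines = """\
00001090.00002128::2015/02/25-03:05:08.255 INFO [GEM] Node 2: Deleting [1:65 , 1:71]
00001ee4.00002130::2015/02/25-03:05:10.107 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig
00001090.00002128::2015/02/25-03:05:11.888 INFO [GEM] Node 2: Deleting [1:72 , 1:73]
00001090.00002698::2015/02/25-03:05:11.889 INFO [GUM] Node 2: Processing RequestLock 2:49
00001090.00002128::2015/02/25-03:05:11.890 INFO [GUM] Node 2: Processing GrantLock to 2
00001090.00002698::2015/02/25-03:05:11.890 INFO [GUM] Node 2: executing request locally
00001090.00002128::2015/02/25-03:05:12.890 INFO [GEM] Node 2: Deleting [1:74 , 1:74]
00001ee4.00002130::2015/02/25-03:05:15.107 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig
00001090.00002128::2015/02/25-03:05:16.988 INFO [GUM] Node 2: Processing RequestLock 1:28
""".splitlines()

def parse(line):
    """Split one cluster log line into (timestamp, message)."""
    _, rest = line.split("::", 1)
    stamp, message = rest.split(" ", 1)
    return datetime.strptime(stamp, "%Y/%m/%d-%H:%M:%S.%f"), message.strip()

events = [parse(line) for line in log_lines]
span = (events[-1][0] - events[0][0]).total_seconds()

# Gap between each entry and the next, with the messages on either side.
gaps = [((b[0] - a[0]).total_seconds(), a[1], b[1])
        for a, b in zip(events, events[1:])]
largest = max(gaps, key=lambda g: g[0])

print(f"total span: {span:.3f} s")
print(f"largest gap: {largest[0]:.3f} s, before: {largest[2]}")
```

This only shows how the roughly 8.7 seconds are distributed between logged events; it does not by itself say which component was being waited on.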
Thanks in advance.
Keegan
Hi Keegan,
From this log excerpt, what I can see is that "Sending request Netname" is where the time was spent.
Could you please tell us the network configuration of the cluster nodes?
If I recall correctly, it is recommended to keep only the TCP/IP protocol and disable NetBIOS over TCP/IP on the "Private Network", and also not to configure DNS, WINS or a default gateway for the "Private Network":
https://support.microsoft.com/kb/258750?wa=wsignin1.0
After that, please test again.
Best Regards,
Elton JI
-
Hi,
New to 2012 and implementing a clustered environment for our File Services role. I have got to the point where I have successfully configured the Shadow Copy settings.
I have a large (15 TB) disk, S:
I have a VSS drive (volume shadow copy drive), V:
I have successfully configured the Shadow Copy settings through Windows Explorer.
I created dependencies in the Failover Cluster Manager console whereby S: depends on V:
However, when I fail over the resource and browse the Client Access Point share, there are no entries under the "Previous Versions" tab.
When I visit the S: drive in Windows Explorer and open the Shadow Copy dialogue box, there are entries showing the times and dates of the shadow copies run on the original node. So the disk knows about the shadow copies that were run on the original node, but the "Previous Versions" tab has no entries to display.
This is on a 2012 server (NOT the R2 version).
Can anyone explain what might be the reason? Do I have an "issue" or is this by design?
All help appreciated!
Kathy
Kathleen Hayhurst Senior IT Support Analyst
Hi,
Please first check the requirements in following article:
Using Shadow Copies of Shared Folders in a server cluster
http://technet.microsoft.com/en-us/library/cc779378(v=ws.10).aspx
Cluster-managed shadow copies can only be created in a single quorum device cluster on a disk with a Physical Disk resource. In a single node cluster or majority node set cluster without a shared cluster disk, shadow copies can only be created and managed
locally.
You cannot enable Shadow Copies of Shared Folders for the quorum resource, although you can enable Shadow Copies of Shared Folders for a File Share resource.
The recurring scheduled task that generates volume shadow copies must run on the same node that currently owns the storage volume.
The cluster resource that manages the scheduled task must be able to fail over with the Physical Disk resource that manages the storage volume.
-
Http cluster servlet not failing over when no answer received from server
I am using weblogic 510 sp9. I have a weblogic server proxying all requests to
a weblogic cluster using the httpclusterservlet.
When I kill the weblogic process servicing my request, I see the next request
get failed over to the secondary server and all my session information has been
replicated. In short I see the behavior I expect.
r.troon
However, when I either disconnect the primary server from the network or just
switch this server off, I just get a message back
to the browser - "unable to connect to servers".
I don't really understand why the behaviour should be different. I would expect
both to fail over in the same manner. Does the cluster servlet only handle TCP
reset failures?
Has anybody else experienced this or have any ideas?
Thanks
I think I might have found the answer......
The AD objects for the clusters had been moved from the Computers OU into a newly created OU. I'm suspecting that the cluster node computer objects didn't have perms to the cluster object within that OU and that was causing the issue. I know I've seen cluster
object issues before when moving to a new OU.
All has started working again for the moment so I now just need to investigate what permissions I need on the new OU so that I can move the cluster object in. -
Weblogic Admin server fail over
Hi,
Please let me know if there is official documentation from Oracle on Admin Server failover for versions 8.x, 9.x and 10.x.
I am not sure there is such a thing as WebLogic Admin Server failover.
For Managed Server failover, please read
http://download.oracle.com/docs/cd/E12840_01/wls/docs103/cluster/failover.html -
Hi All,
We have a Windows failover cluster with one Windows machine on the local network as one of its nodes.
I want to add a virtual machine hosted on Microsoft Azure as another node of this existing cluster.
Please suggest how to do this.
Thanking all in advance,
Raghvendra
Before you even start working on the SQL side, you will need to create a Windows Server 2008 R2 cluster with no shared storage. You can actually test that in-house. Create a VM running 2008 R2 and cluster it with your physical (from your description,
I am assuming physical) 2008 R2 machine. Create it with a file share witness for quorum. Then configure your environment to see that it works as expected.
Once you know how to configure the cluster between physical and VM with a file share witness, build it to Azure. The location of the FSW gets to be an interesting choice. To have a FSW in Azure means that you will need another VM in Azure to
host the file share, meaning you have two quorum votes in Azure and one in-house. Or, you could create a file share witness on an in-house system, giving you two quorum votes in-house and one in Azure.
In the FSW in Azure scenario, if you have a loss of the in-house server, automatic failover occurs because two quorum votes exist in Azure. With FSW in-house, depending on the loss you have in-house, you might have to force quorum to get the Azure
single-node cluster to run. Loss of access to Azure reverses those scenarios. Neither one is optimal, but it does provide some level of recoverability.
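The vote counting behind those two scenarios can be written out. This is a minimal Python sketch under simplifying assumptions: a plain majority rule with one vote per node and one for the file share witness, loss of an entire site at once, and no dynamic quorum adjustment. The function names are made up for illustration.

```python
def has_quorum(votes_up, votes_total):
    """Simple majority rule: more than half of all votes must be reachable."""
    return votes_up > votes_total // 2

def surviving_side_has_quorum(fsw_location, failed_site):
    """One cluster node per site, plus a file share witness (FSW) at one site."""
    sites = {"in-house": 1, "azure": 1}   # one node vote per site
    sites[fsw_location] += 1              # the witness adds a vote to its site
    total = sum(sites.values())
    survivors = total - sites[failed_site]
    return has_quorum(survivors, total)

for fsw in ("azure", "in-house"):
    for failed in ("in-house", "azure"):
        outcome = ("keeps quorum" if surviving_side_has_quorum(fsw, failed)
                   else "needs forced quorum")
        print(f"FSW in {fsw}, {failed} site lost -> surviving side {outcome}")
```

With three votes in total, whichever site holds the witness can survive the loss of the other site automatically, while the site with a single vote cannot, which matches the trade-off described above.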
. : | : . : | : . tim -
Fail-over options for Standalone Print Server
Our organization recently set up three Windows Server 2012 R2 print servers to handle three separate sites. Each print server contains only the printers within its site, which makes each server a standalone print server. I'm concerned about not having a failover plan in place in the event one of my servers should fail. Does anyone have any failover suggestions?
Hi Thomas,
If we want to create a clustered print server, we need to create another print server in the same site.
For detailed information, please refer to the link below:
https://technet.microsoft.com/en-us/library/cc771091.aspx
Best Regards.
Steven Lee -
Two witnesses in a SQL Server fail over group
Is it possible to have two witnesses in a SQL Server Always on Availability Group Fail Over Cluster? Our goal is to have redundant witnesses in an Azure availability set.
Thanks,
Mike
AlwaysOn uses Windows Failover Clustering for quorum. See, e.g., Understanding Quorum Configurations in a Failover Cluster.
You can do this, but with Dynamic Quorum it's probably not helpful. If you lose your witness vote, the cluster will adjust the quorum requirements.
David
http://blogs.msdn.com/b/dbrowne/ -
Server Pool Master fails and cannot fail over to another VM Server
Dear All,
Oracle VM 2.2.2
I have 2 VM Servers connected to a Storage 6140 Array, and in VM Manager I enabled HA on the server pool and then on all virtual machines.
- VM Server 1 has the roles Server Pool Master, Utility Server and Virtual Machine Server, and has virtual machines running.
- VM Server 2 has the roles Utility Server and Virtual Machine Server, and has virtual machines running.
I tried shutting down VM Server 1, which acts as Server Pool Master, but the Server Pool Master role did not fail over to VM Server 2, and the status of both servers became Unreachable.
In particular, none of the virtual machines are accessible.
Please kindly give advice on this.
Thanks and regards,
Heng
Thanks Avi, I'll find and read that document. And thanks also for elaborating about the Utility Server.
After reading the followups to my original question, I tried to think of possible server "layouts" in a HA environment.
1) "N" servers in the pool, one of them is Pool Master, Utility Server AND VM Guests Server at the same time. Maybe this will be the preferred server for smaller, quicker VMs.
2) "N" servers in the pool, one is Pool Master AND Utility Server, but has no VM guests running on it
3) "N" servers in the pool, one is the Pool Master, another one is the Utility Server (none of them has VMs running on them), and finally a number of VM Guest servers
Let's take case 1. If the Pool Master & Utility server fails, given that it has VM guests running on it as well, I understand from your explanation that I'll be ANYWAY able to manually "live migrate" the guests somewhere else, using VM Manager. Is this correct?
If it's correct, then it's just a question of how much money I want to spend to have dedicated servers for different tasks, JUST FOR BETTER PERFORMANCE. Do you agree? And especially: do YOU have dedicated Pool Masters (just to figure out your "real" approach to the problem :-) )
I feel that I still miss something, the picture is not completely clear to me. The fact is, that I'm now testing on my new bladesystem, but for now I put up one single blade. Testing HA will be the next step. I was just trying to get a few things sorted out in advance, but there is still something that I'm missing, as I was saying...
Looking forward to your next reply, thanks again
Rob -
Fail over is not happening in Weblogic JSP Server
Hi..
We have 6 Weblogic instances running as application server (EJB) and 4 Weblogic
instances running as web server (JSP). We have configured one cluster for EJB
servers and one cluster for JSP servers. In front-end we are using four Apache
servers to proxy the request to Weblogic JSP cluster. In my httpd.conf file I
have configured with the Weblogic cluster. I can see the requests are going in
all the servers and believe the cluster is working fine in terms of load balancing
(round-robin). The clients are accessing the servers using CSS (Cisco Load Balancer).
But when we test the fail-over in the cluster, we are facing problems. Let me
explain the scenarios of the fail-over test:
1. The load was generated by the Load Generator
2. With the load running, we shut down one Apache server. There were some failed
transactions, but the servers became stable almost immediately, so fail-over
is happening at this stage.
3. When I shut down one EJB instance, the transactions again became stable after
some failed transactions.
4. But when I shut down one JSP instance, transactions failed immediately, the
requests did not fail over to another JSP server, and the number of failed
transactions increased.
So I guess there is some problem in the proxy plug-in configuration, so that
when I shut down one JSP server, requests are still being sent to that JSP server
by the Apache proxy plug-in.
I have read various queries posted in the newsgroups and found some information
about configuring session and cookie information in the Weblogic.xml file. However,
I'm not sure which settings need to be configured in the Weblogic.xml
and httpd.conf files. Kindly help me to resolve the problem. I would appreciate
your response.
===============================================================
My httpd.conf file plug-in configuration:
###WebLogic Proxy Directives. If proxying to a WebLogic Cluster see WebLogic Documentation.
<IfModule mod_weblogic.c>
WebLogicCluster X.X.X.X1:7001,X.X.X.X2:7001,X.X.X.X3:7001,X.X.X.X4:7001
MatchExpression *.jsp
</IfModule>
<Location /apollo>
SetHandler weblogic-handler
DynamicServerList ON
HungServerRecoverSecs 600
ConnectTimeoutSecs 40
ConnectRetrySecs 2
</Location>
==============================================================
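For reference, the timing values in the configuration above can be worked through with a little arithmetic. This is only a sketch under an assumption: as I understand the plug-in documentation, the number of connection attempts is ConnectTimeoutSecs divided by ConnectRetrySecs, and HungServerRecoverSecs is how long the plug-in waits for a response before treating a server as dead. The helper name is made up for illustration.

```python
def connect_attempts(connect_timeout_secs: int, connect_retry_secs: int) -> int:
    """Approximate number of connection attempts before the plug-in gives up.

    Assumption: attempts = ConnectTimeoutSecs / ConnectRetrySecs.
    """
    if connect_retry_secs <= 0:
        raise ValueError("ConnectRetrySecs must be positive")
    return connect_timeout_secs // connect_retry_secs


# Values copied from the <Location /apollo> block above.
settings = {
    "ConnectTimeoutSecs": 40,
    "ConnectRetrySecs": 2,
    "HungServerRecoverSecs": 600,
}

attempts = connect_attempts(
    settings["ConnectTimeoutSecs"], settings["ConnectRetrySecs"]
)
hung_wait_minutes = settings["HungServerRecoverSecs"] / 60

print(f"connection attempts: {attempts}")
print(f"wait on an unresponsive server: {hung_wait_minutes:.0f} minutes")
```

If that reading is right, a server that accepts connections but stops responding could hold requests for a long time with these values, which may be worth checking during the fail-over test.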
Thanks in advance,
Siva.
Hi,
I can see that bug 13703600 was already fixed in 12.1.2. If you still see the same problem, please raise a ticket with Oracle Support.
Regards,
Kal -
ACE 4710 - 'reverse proxy' in front of serverfarm - fail-over/sorry server design issue
Hi All,
I'm working on a specific config and have an issue in the backup farm/fail-over/sorry server area.
The customer wants the following:
They have an existing serverfarm with X web servers, they want a single server to act as a reverse-proxy in front of the farm.
So that all traffic goes through that server, which then forwards the request to the original serverfarm.
The problem in my design is the fail-over. If I configure the reverse-proxy server in a new serverfarm and use the original (web servers) farm as backup, I get fail-over, but if the reverse proxy AND the original serverfarm fail, there is no nice way to get the users onto a sorry server.
I could give the original serverfarm's rservers a 'backup standby' server, but that won't give the desired effect either.
For maintenance they first take 50% of the servers offline and then switch to the other 50%, so users would see a sorry page even if there were operational servers left in the farm.
The 4710s are running in routed mode, the farms use Sticky Cookie, and some HTTP URL & cookie matching is also done.
Anyone have an idea how to build this?
Hi,
It needs additional testing, but as per my understanding, if you put the backups in this order then the last backup server will be chosen first.
In your case it will be like "RSERVER1 >> backup sorry server >> backup web content".
As per the below example:
I put test2 as the first backup server and test1 as the second backup server, but if you look at the first part, it took rserver test1 as the first backup.
serverfarm host 1313-GIN-GWAP-SDC-80
  rserver RSERVER1
    backup-rserver test1
    inservice
  rserver test1
    inservice standby
  rserver test2
    inservice standby
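To reason about which server ends up taking traffic, the backup chain can be modelled in a few lines. This is a rough sketch, not the ACE's actual selection logic: the function name is hypothetical, and the link from test1 to test2 is an assumption added for illustration (the config above only shows RSERVER1 backed up by test1).

```python
from typing import Dict, Optional


def pick_rserver(
    start: str,
    backup_of: Dict[str, str],
    operational: Dict[str, bool],
) -> Optional[str]:
    """Walk the backup chain from `start` and return the first operational server."""
    seen = set()
    current: Optional[str] = start
    while current is not None and current not in seen:
        if operational.get(current, False):
            return current
        seen.add(current)  # guard against a backup loop
        current = backup_of.get(current)
    return None  # nothing in the chain is up


# RSERVER1 -> test1 comes from the config; test1 -> test2 is assumed.
backup_of = {"RSERVER1": "test1", "test1": "test2"}

all_up = {"RSERVER1": True, "test1": True, "test2": True}
primary_down = {"RSERVER1": False, "test1": True, "test2": True}
only_last_up = {"RSERVER1": False, "test1": False, "test2": True}

print(pick_rserver("RSERVER1", backup_of, all_up))
print(pick_rserver("RSERVER1", backup_of, primary_down))
print(pick_rserver("RSERVER1", backup_of, only_last_up))
```

Putting the sorry server at the end of such a chain is the idea being suggested; whether the ACE honours the order this way still needs testing on the device.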
regards,
Ajay Kumar -
Problems with LDAP Server fail-over
Our Xsan is installed with 12 FCP, 2 MDC Xserves and 2 LDAP Xserves for fail-over.
The 2 MDC fail-over runs well, but the 2 LDAP fail-over has problems.
The first time, we unplugged the power cord of one Xserve and the other LDAP server took over successfully, but FCP users' re-login took 15 minutes. That's unacceptable.
The fail-over has never succeeded since then.
That means once the LDAP server goes down and the backup LDAP server does not take over, we lose everything related to user login.
Can anybody help? Thanks a lot.
I believe you can enter both LDAP servers in the client configuration for LDAP access. (Even though you shouldn't have to)
IP failover is not the issue, your LDAP configuration is.
Start at page 90 and work through this document to make sure you have the clients set up properly.
http://manuals.info.apple.com/en/MacOSXSrvr10.3_OpenDirectoryAdmin.pdf