SQL Server failover cluster switch fails because the passive node automatically reassigns the drive letter

I switched the SQL Server resource group to the standby node. An exception occurred while the disk resource was being brought online on the passive node. The disk resource that SQL Server depends on originally had the drive letter 'K:', but when the disk came online it was automatically reassigned a new drive letter, 'H:', so the SQL Server resource could not be brought online. After I manually changed the drive letter back to 'K:' on the passive node, it worked. So my question is: why did it not use the original drive letter, and why was a new one assigned? What could cause this? A mount point? Part of the cluster log follows:
00001cbc.000004e0::2015/03/12-14:41:11.377 WARN [RES] Physical Disk <FltLowestPrice_K>: OnlineThread: Failed to set volguid \??\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}. Error: 183.
00001cbc.000004e0::2015/03/12-14:41:11.377 INFO [RES] Physical Disk <FltLowestPrice_K>: Found 2 mount points for device \Device\Harddisk8\Partition2
00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO [RES] Physical Disk: PNP: Update volume exit, status 1168
00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO [RES] Physical Disk: PNP: Updating volume
\\?\STORAGE#Volume#{1a8ddb8e-fe43-11e2-b7c5-6c3be5a5cdca}#0000000008100000#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}
00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO [RES] Physical Disk: PNP: Update volume exit, status 5023
00001cbc.000004e0::2015/03/12-14:41:11.377 ERR [RES] Physical Disk: Failed to get volname for drive H:\, status 2
00001cbc.000004e0::2015/03/12-14:41:11.377 INFO [RES] Physical Disk <FltLowestPrice_K>: VolumeIsNtfs: Volume
\\?\GLOBALROOT\Device\Harddisk8\Partition2\ has FS type NTFS
00001cbc.000004e0::2015/03/12-14:41:11.377 INFO [RES] Physical Disk: Volume
\\?\GLOBALROOT\Device\Harddisk8\Partition2\ has FS type NTFS
00001cbc.000004e0::2015/03/12-14:41:11.377 INFO [RES] Physical Disk: MountPoint H:\ points to volume
\\?\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}\
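
One way to triage an excerpt like the log above is to pull the level, the resource name, and any trailing Win32 status code out of each line. The following is a minimal Python sketch, not a cluster tool: the regular expressions and the names `parse_line`, `WIN32_CODES` and `SAMPLE` are assumptions made for illustration, and the code table covers only the four codes that appear in this excerpt.

```python
import re

# Win32 error codes that appear in the excerpt (names as in winerror.h).
WIN32_CODES = {
    2: "ERROR_FILE_NOT_FOUND",
    183: "ERROR_ALREADY_EXISTS",
    1168: "ERROR_NOT_FOUND",
    5023: "ERROR_INVALID_STATE",
}

# Layout assumed from the excerpt:
#   pid.tid::date-time LEVEL [RES] Physical Disk <resource>: message
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<time>\S+)\s+(?P<level>WARN|INFO|ERR)\s+\[RES\]\s+"
    r"Physical Disk(?:\s+<(?P<resource>[^>]+)>)?:\s*(?P<message>.*)$"
)
# A status code at the end of the message: "Error: 183." or "status 2".
CODE_RE = re.compile(r"(?:Error:|status)\s+(\d+)\.?\s*$")


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    code = CODE_RE.search(entry["message"])
    entry["code"] = int(code.group(1)) if code else None
    entry["code_name"] = WIN32_CODES.get(entry["code"])
    return entry


# Four lines copied from the excerpt (raw strings keep the backslashes).
SAMPLE = [
    r"00001cbc.000004e0::2015/03/12-14:41:11.377 WARN [RES] Physical Disk <FltLowestPrice_K>: OnlineThread: Failed to set volguid \??\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}. Error: 183.",
    r"00001cbc.000004e0::2015/03/12-14:41:11.377 INFO [RES] Physical Disk <FltLowestPrice_K>: Found 2 mount points for device \Device\Harddisk8\Partition2",
    r"00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO [RES] Physical Disk: PNP: Update volume exit, status 1168",
    r"00001cbc.000004e0::2015/03/12-14:41:11.377 ERR [RES] Physical Disk: Failed to get volname for drive H:\, status 2",
]

for raw in SAMPLE:
    entry = parse_line(raw)
    if entry and entry["code"] is not None:
        print(entry["level"], entry["resource"], entry["code"], entry["code_name"])
```

Read this way, the excerpt shows the K: disk resource getting "already exists" (183) when it sets the volume GUID, and then failing to resolve a volume name through H:\ (2). That is consistent with a stale drive-letter or mount-point assignment on the passive node, but it is an interpretation of the log, not a confirmed diagnosis.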

Sounds like you have a cluster hive that is out of date or bad, or some registry settings that are incorrect. You'll want to have this question transferred to the Windows forum, as that's really what you're asking about.
-Sean

Similar Messages

Service Accounts for Reporting Service in SQL Server Failover Cluster setup

I am setting up 2 Report Services (SSRS) in SQL Failover Clustering (Version: 2012SP1) on Windows 2012, as part of scale out architecture.
There are 2 options to configure the service account for SSRS:
Option 1) Using domain accounts, as what I have done for DB Engine and SQL Agent.
Option 2) accept the default, which is virtual account for SSRS. Per documentation URL:
http://msdn.microsoft.com/en-us/library/ms143504.aspx
Which is the recommended one? Is it option 2?
There is a security note at the above URL as well, but it does not clearly say that option 1 is not recommended.
Security Note: Always run SQL Server services by using the lowest possible user rights. Use a MSA or virtual account when possible. When MSA and virtual accounts are not possible, use a specific low-privilege user account or domain account instead
of a shared account for SQL Server services. Use separate accounts for different SQL Server services. Do not grant additional permissions to the SQL Server service account or the service groups. Permissions will be granted through group membership or granted
directly to a service SID, where a service SID is supported.
Thanks very much for your help!

Hi Luo Donghua,
In a SQL Server Failover Cluster Instance, in my experience either option works well. If you use a virtual account for SQL Server Reporting Services, note that virtual accounts in Windows Server 2008 R2 and Windows 7 are managed local accounts that simplify service administration. The virtual account is auto-managed, and it can access the network in a domain environment.
Of course, you can also use domain accounts in your cluster.
Just make sure your service account is set up here, or that it is using a proper built-in account. For more information, see: http://ermahblerg.com/2012/11/08/cluster-ssrs-in-2008/
Thanks,
Sofiya Li

SQL Server Failover Cluster Questions

Dear All,
I am building a two-node failover cluster on SQL Server 2012 SP1 (inside Hyper-V as a guest cluster) and want clarification on a few things that I am facing.
1.  I am receiving MSDTC Warning. I can go ahead and create the cluster, but want to understand whether this MSDTC is to be configured as a role on the cluster or not. I plan to run SCVMM, SCOM, Orchestrator and Windows Azure Pack Databases
and Reports through it so in such a scenario, do I need MSDTC? If yes, how much should be the size of the MSDTC Drive? Is following process correct?
http://www.sqlnotebook.info/configure-msdtc-on-windows-cluster-2012/
2. During first-node configuration, one needs to provide the "SQL CLUSTER RESOURCE GROUP NAME". Does it have any bearing on how other servers access the cluster for databases and logs, or is it just how the cluster resource group would be named? Is it required for every instance that is created inside the cluster? Just to be clear, so one can name it according to the instance name.
3. During instance creation, one needs to provide the "SQL Server Network Name". As stated above, I plan to run SCVMM, SCOM, Orchestrator and Windows Azure Pack databases and reports through it, so would I be required to provide this for all instances that I create, or is it only required once in the cluster?
4. During the instance creation, one needs to provide the features required for installation i.e. instance features and shared features. As stated above, I plan to run SCVMM, SCOM, Orchestrator and Windows Azure Pack Databases and Reports through
it, so which features should be selected? so that there is less workload on the server.
5. Every instance uses TempDB for the databases it hosts. What would be the best practice with respect to TempDB: one TempDB LUN shared by all instances on the servers, or each instance having its own TempDB LUN? What should be the ideal size of the TempDB LUN?
6. Should all the disks required for databases and logs be added to the cluster? Should they be added as normal disks or as CSV volumes?
Thanks in advance.

Hello,
1. You can run the Microsoft Distributed Transaction Coordinator (MSDTC) service as a clustered resource on a failover cluster for increased reliability, based on the failover capabilities of the clustered servers. Refer to the MSDTC section of the following article to determine whether the MSDTC cluster resource must be created.
Reference: http://msdn.microsoft.com/en-us/library/ms189910.aspx#MSDTC
2. The cluster resource group is where the SQL Server failover cluster resources are placed. Each clustered SQL Server instance belongs to a failover cluster resource group. For example, if you configure a two-node SQL Server cluster, each clustered instance on the two nodes belongs to the same cluster resource group.
You can change the cluster resource group name, but note that the following names are reserved and already used as resource group names: Available Storage, Cluster Group.
3. Each clustered SQL Server instance is assigned a virtual network name and IP address, which client applications use to connect to it.
4. I am not familiar with SCVMM, SCOM, or Orchestrator, but you should install the Database Engine Services and the SQL Server management tools. If you want to use SQL Server Reporting Services, you can install it, but the Report Server service cannot participate in a failover cluster.
5. You can use isolated disks for the user databases and tempdb of each clustered SQL Server instance.
6. Yes. You should use cluster disks that are added to Cluster Shared Volumes to host the data and log files of the databases.
http://www.pythian.com/blog/how-to-install-a-clustered-sql-server-2012-instance-step-by-step-part-1/
Regards,
Fanny Liu

New SQL Server Failover Cluster Installation - No disk is available to select in section "Cluster Disk Selection"

Hello Everyone,
I badly need your help with a problem I am facing.
I am doing a new SQL Server failover cluster installation on a virtual server that is part of a failover cluster. I can complete all the steps successfully, but when I reach the point where I am supposed to select the shared disk to include in the SQL Server cluster resource group, I don't find any disk in the list (as you can see in the figure below).
I have already created a two-node failover cluster and added 3 disks (1 as a witness in the quorum and 2 others as available storage).
No roles have been created, both nodes are available, and there is 1 network in the cluster.
The message says: "The search for mount points failed. Error: the system cannot find the path specified". What does this mean, and how can I solve this issue?
Thanks in advance for your support; I look forward to your feedback.
Mark as answer if it was an answer to your question. Please don't hesitate to ask for any further help.

Dear Ashwin,
I have granted the privileges mentioned in the link you provided, as listed below:
  Act as Part of the Operating System = SeTcbPrivilege
  Bypass Traverse Checking = SeChangeNotifyPrivilege
  Lock Pages In Memory = SeLockMemoryPrivilege
  Log on as a Batch Job = SeBatchLogonRight
  Log on as a Service = SeServiceLogonRight
  Replace a Process Level Token = SeAssignPrimaryTokenPrivilege
Granting these privileges to the domain account I am using to install SQL Server did not solve the problem.

Can a SQL Server Failover Cluster Instance (FCI) be Implemented Between Two Hyper-V Hosted Virtual Machines?

I haven't had the opportunity to implement a SQL Server Failover Cluster Instance (FCI) for over 10 years and that was done with two physical, identical database servers way back in the day of Windows Server 2003 and SQL Server 2000 (old school).
Can a SQL Server 2008 R2 Failover Cluster Instance (FCI) be implemented between two Hyper-V hosted virtual machines? The environment in question already has Windows Server 2012 R2 Hyper-V hosts in place, so I'm just looking to see if this is even
possible and/or supported when utilizing virtual machines.
The client in question is currently using SQL Server 2008 R2 instances running on Win2008R2, Win2012, and Win2012R2, but I'd also be interested how this can be done or not with SQL Server 2012 or 2014 as well. Thanks in advance.
Bill Thacker

Yes, it can be done with Hyper-V guests. In fact, with Windows Server 2012 R2 Hyper-V, guests can use the Shared VHDX feature for shared storage used by Windows clusters. The guests can run Windows Server 2008 and higher provided that the Hyper-V Integration
Services are installed to support Shared VHDX. The only challenge here is making the Hyper-V hosts highly available as well, running it on WSFC.
Edwin Sarmiento SQL Server MVP | Microsoft Certified Master

SQL Server cannot authenticate using Kerberos because the Service Principal Name (SPN) is missing, misplaced, or duplicated

We are getting this below alert message, while using SCOM 2012 R2. Anybody have any idea how to resolve this on the SQL box ?
Thx...
SQL Server cannot authenticate using Kerberos because the Service Principal Name (SPN) is missing, misplaced, or duplicated.
Service Account: NT Service\MSSQL$SQLEXPRESS
Missing SPNs:
Misplaced SPNs: MSSQLSvc/mysqlbox.com:SQLEXPRESS - sqldbadmin
Duplicate SPNs:
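
The "Misplaced SPNs" line above pairs an SPN with the account it is currently registered on. The Python sketch below only splits that line and prints `setspn` commands for a human to review; it does not run them. The helper names and the target account `MYSQLBOX$` are hypothetical. The assumption behind the placeholder is that a service running under a virtual account (`NT Service\...`) reaches the network as the computer account, so the SPN would normally belong on the computer object; verify that against your own environment before changing anything.

```python
def parse_misplaced_spn(line):
    """Split 'Misplaced SPNs: <spn> - <account>' into its parts."""
    _, _, rest = line.partition(":")
    spn, sep, account = rest.strip().rpartition(" - ")
    if not sep:
        raise ValueError("expected '<spn> - <account>' after the label")
    service_class, _, target = spn.partition("/")
    host, _, instance = target.partition(":")
    return {
        "spn": spn,
        "account": account,
        "service_class": service_class,
        "host": host,
        "instance": instance,
    }


def suggest_commands(parsed, correct_account):
    """Build setspn commands for review: remove from the wrong account,
    then add to the intended one (-S checks for duplicates first)."""
    return [
        "setspn -D {spn} {account}".format(**parsed),
        "setspn -S {} {}".format(parsed["spn"], correct_account),
    ]


alert = "Misplaced SPNs: MSSQLSvc/mysqlbox.com:SQLEXPRESS - sqldbadmin"
parsed = parse_misplaced_spn(alert)
# "MYSQLBOX$" is a placeholder for the server's computer account.
for command in suggest_commands(parsed, "MYSQLBOX$"):
    print(command)
```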

To fix this issue, check the links below:
http://support.microsoft.com/kb/2443457/EN-US
http://www.scomgod.com/?p=155

DSC, SQL Server 2012 Enterprise sp2 x64, SQL Server Failover Cluster Install not succeeding

Summary: DSC fails to fully install the SQL Server 2012 Failover Cluster, but the identical code snippet below run in powershell ise with administrator credentials works perfectly as does running the SQL server install interface.
In order to develop DSC configurations, I have set up a Windows Server 2012 R2 failover cluster in VMware Workstation v10 consisting of 3 nodes. All have the same Windows Server 2012 version and have been fully patched via Microsoft Updates.
The cluster properly fails over on command and the cluster validates. Powershell 4.0 is being used as installed in windows.
PDC
Node1
Node2
The DSC script builds up the parameters to setup.exe for SQL Server. Here is the cmd that gets built...
$cmd2 = "C:\SOFTWARE\SQL\Setup.exe /Q /ACTION=InstallFailoverCluster /INSTANCENAME=MSSQLSERVER /INSTANCEID=MSSQLSERVER /IACCEPTSQLSERVERLICENSETERMS /UpdateEnabled=false /IndicateProgress=false /FEATURES=SQLEngine,FullText,SSMS,ADV_SSMS,BIDS,IS,BC,CONN,BOL /SECURITYMODE=SQL /SAPWD=password#1 /SQLSVCACCOUNT=SAASLAB1\sql_services /SQLSVCPASSWORD=password#1 /SQLSYSADMINACCOUNTS=`"SAASLAB1\sql_admin`" `"SAASLAB1\sql_services`" `"SAASLAB1\cubara01`" /AGTSVCACCOUNT=SAASLAB1\sql_services /AGTSVCPASSWORD=password#1 /ISSVCACCOUNT=SAASLAB1\sql_services /ISSVCPASSWORD=password#1 /ISSVCSTARTUPTYPE=Automatic /FAILOVERCLUSTERDISKS=MountRoot /FAILOVERCLUSTERGROUP='SQL Server (MSSQLSERVER)' /FAILOVERCLUSTERNETWORKNAME=SQLClusterLab1 /FAILOVERCLUSTERIPADDRESSES=`"IPv4;192.168.100.15;LAN;255.255.255.0`" /INSTALLSQLDATADIR=M:\SAN\SQLData\MSSQLSERVER /SQLUSERDBDIR=M:\SAN\SQLData\MSSQLSERVER /SQLUSERDBLOGDIR=M:\SAN\SQLLogs\MSSQLSERVER /SQLTEMPDBDIR=M:\SAN\SQLTempDB\MSSQLSERVER /SQLTEMPDBLOGDIR=M:\SAN\SQLTempDB\MSSQLSERVER /SQLBACKUPDIR=M:\SAN\Backups\MSSQLSERVER > C:\Logs\sqlInstall-log.txt "
Invoke-Expression $cmd2
When I run this specific command in Powershell ISE running as administrator, logged in as domain account that is in the Node1's administrators group and has domain administrative authority, it works perfectly fine and sets up the initial node properly.
When I use the EXACT SAME code above pasted into my custom DSC resource, as a test with a known successful install, run with the same user as above, it does NOT completely install the cluster properly. It still installs 17 applications
related to SQL Server and seems to properly configure everything except the cluster. The Failover Cluster Manager shows that the SQL Server Role will not come on line and the SQL Server Agent Role is not created.
The code is run on Node1 so the setup folder is local to Node1.
The ConfigurationFile.ini files for the two types of installs are identical.
Summary.txt does have issues..
Feature:                       Database Engine Services
Status:                        Failed: see logs for details
Reason for failure:            An error occurred during the setup process of the feature.
Next Step:                     Use the following information to resolve the error, uninstall this feature, and then run the setup process again.
Component name:                SQL Server Database Engine Services Instance Features
Component error code:          0x86D8003A
Error description:             The cluster resource 'SQL Server' could not be brought online. Error: There was a failure to call cluster code from a provider. Exception message: Generic failure. Status code: 5023. Description: The group or resource is not in the correct state to perform the requested operation.
It feels like this is a security issue with DSC or an issue with the setup in SQL Server, but please note I have granted administrators group and domain administrators authority. The nodes were built with the same login. Windows firewall
is completely disabled.
Please let me know if any more detail is required.

Hi Lydia,
Thanks for your interest and help.
I tried "Option 3 (recommended)" and that did not help.
The issue I encounter with the fail-over cluster only occurs when trying to install with DSC!
Using the SQL Server Install wizard, Command Prompt and even in Powershell by invoking the setup.exe all work perfectly.
So, to reiterate, this issue only occurs while running in the context of DSC.
I am using the same domain login with Domain Admin Security and locally the account has Administrators group credentials. The SQL Server Service account also has Administrators Group Credentials.

Disable Remote Registry service for SQL Server failover cluster

Hi,
Is it safe to disable the Remote Registry service on the two nodes of a SQL Server failover cluster?
Operating system: Microsoft Server 2008 R2 Enterprise Edition (64-bit)
Database software: Microsoft SQL Server 2008 R2 Enterprise Edition (64-bit), Failover Clustering
Thank you.
Regards,
Gan

Hi Pradeep,
You are right. I tested and verified that the GUI of cluster (Failover Cluster Manager) was not working properly after disabling Remote Registry service. Thank you :)
Regards,
Gan

Install BizTalk 2013 R2 + SQL Server on cluster - MSDTC Failure

Hi, Folks.
I'm trying to install BTS 2013 R2 using SQL Server on a cluster. I've successfully configured SSO (on the same BTS server), BRE and Group; however, when I try to install Runtime there is an error:
The Microsoft Distributed Transaction Coordinator (MSDTC) may not be configured correctly. Ensure that the MSDTC service is running and DTC network access is allowed on the BizTalk, SQL and SSO Master servers. For more information, see "MSDTC Configuration
settings required for BizTalk Server" in the BizTalk Server Help.
Internal error: "New transaction cannot enlist in the specified transaction coordinator. "
I ran DTCPing between my BTS server and the MSDTC cluster server, and the test runs fine. I don't have any firewall between those servers. I then checked the DTC settings on both sides; they are configured properly, according to Microsoft:
MSDTC Cluster settings
BTS Server settings
Then I looked in Event Viewer and found a warning message from SSO every 30 seconds while BTS Config is trying to install BTS Runtime:
Could not access the SSO database. If this condition persists, the SSO service will go offline.
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding..
SQL Error code: 0xFFFFFFFE
I "googled" it and found that this issue is generally related to the BizTalk user's permissions on the database server; however, my user has high privileges on all the DB servers that make up my cluster.
All servers (3 in the DB cluster and 1 for BTS) run Windows 2012 R2 64-bit, my SQL Server version is 2014, and the BTS user and related groups belong to my domain. I really don't understand what's going on.

Hi, Shankycheil.
My 3 SQL cluster nodes shared the same CID, so I reconfigured two nodes, rebooted each server and installed MSDTC again. After that, DTCPing stopped showing the CID warning.
The MSDTC cluster resides on the same servers as the SQL Server cluster.
Then, following Ashwin Prabhu's suggestion, I unconfigured and reconfigured all the BTS items; however, at BTS Group I got the same error.
Looking at my BTS MSDTC trace file, I see a timeout error while the BTS Group configuration is running:
pid=4888 ;tid=3424 ;time=03/02/2015-15:18:16.560 ;seq=11 ;eventid=TRACING_STARTED ;;"TM
Identifier='(null) '" ;"MSDTC is resuming the tracing of long - lived transactions"
pid=4888 ;tid=3424 ;time=03/02/2015-15:18:17.145 ;seq=12 ;eventid=TRANSACTION_BEGUN ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae
;"TM Identifier='(null) '" ;"transaction has begun, description :'<NULL>'"
pid=4888 ;tid=3424 ;time=03/02/2015-15:18:17.145 ;seq=13 ;eventid=TRANSACTION_PROPOGATED_TO_CHILD_NODE ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae
;"TM Identifier='(null) '" ;"transaction propagated to 'xxxxx' as transaction child node
#1"
pid=4888 ;tid=4964 ;time=03/02/2015-15:23:40.801 ;seq=14 ;eventid=ABORT_DUE_TO_TRANSACTION_TIMER_EXPIRED ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae
;"TM Identifier='(null) '" ;"transaction timeout expired"
pid=4888 ;tid=4964 ;time=03/02/2015-15:23:40.801 ;seq=15 ;eventid=TRANSACTION_ABORTING
;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae ;"TM Identifier='(null) '"
;"transaction is aborting"
pid=4888 ;tid=4964 ;time=03/02/2015-15:23:40.801 ;seq=16 ;eventid=CHILD_NODE_ISSUED_ABORT ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae
;"TM Identifier='(null) '" ;"abort request issued to transaction child node
#1 'xxxxx'"
pid=4888 ;tid=4308 ;time=03/02/2015-15:23:40.801 ;seq=17 ;eventid=CHILD_NODE_ACKNOWLEDGED_ABORT ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae
;"TM Identifier='(null) '" ;"received acknowledgement of abort request from
transaction child node #1 'xxxxx'"
pid=4888 ;tid=4308 ;time=03/02/2015-15:23:40.801 ;seq=18 ;eventid=TRANSACTION_ABORTED ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae
;"TM Identifier='(null) '" ;"transaction has been aborted"
pid=4888 ;tid=4308 ;time=03/02/2015-15:23:41.674 ;seq=19 ;eventid=TRANSACTION_PROPAGATION_FAILED_TRANSACTION_NOT_FOUND ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae
;"TM Identifier='(null) '" ;"failed to propagate transaction to child node 'xxxxx' because
the transaction could not be found. Some possible reasons include, client might have already called commit or transaction might have got aborted due to timeout."
pid=4888 ;tid=4308 ;time=03/02/2015-15:23:41.675 ;seq=20 ;eventid=TRANSACTION_PROPAGATION_FAILED_TRANSACTION_NOT_FOUND ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae
;"TM Identifier='(null) '" ;"failed to propagate transaction to child node 'xxxxx' because
the transaction could not be found. Some possible reasons include, client might have already called commit or transaction might have got aborted due to timeout."
My remote MSDTC server has plenty of capacity, and I don't see any resource-hungry process at that time that would explain this timeout error.
Finally, I tried running "msdtc -tmMappingSet"
and "msdtc.exe -tmMappingView"
on the BTS server, but I got an error message from msdtc.exe:
Error occurred while trying to perform the above operation. Check the trace file for more information
I don't see any error in the trace, only an error event in Event Viewer saying "Invalid command line arguments.". Must this configuration be done on the BTS server side or on my MSDTC cluster?
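
To see how long the transaction in the trace above stayed open before the timer expired, the trace lines can be parsed and the two timestamps subtracted. This is a small Python sketch under assumptions: fields are taken to be separated by `;`, the timestamp is read as month/day/year (the day and month order does not affect a same-day difference), and the function names are made up for illustration. The two sample lines are the `seq=12` and `seq=14` entries from the trace, with the wrapped text joined back onto one line.

```python
from datetime import datetime

TRACE = """
pid=4888 ;tid=3424 ;time=03/02/2015-15:18:17.145 ;seq=12 ;eventid=TRANSACTION_BEGUN ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae ;"TM Identifier='(null) '" ;"transaction has begun, description :'<NULL>'"
pid=4888 ;tid=4964 ;time=03/02/2015-15:23:40.801 ;seq=14 ;eventid=ABORT_DUE_TO_TRANSACTION_TIMER_EXPIRED ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae ;"TM Identifier='(null) '" ;"transaction timeout expired"
"""


def parse_trace_line(line):
    """Collect the simple key=value fields of one MSDTC trace line."""
    fields = {}
    for part in line.split(";"):
        key, sep, value = part.strip().partition("=")
        # Skip the quoted free-text fields; keep plain identifiers only.
        if sep and key.isidentifier():
            fields[key] = value.strip()
    fields["time"] = datetime.strptime(fields["time"], "%m/%d/%Y-%H:%M:%S.%f")
    return fields


def seconds_until_timer_abort(lines):
    """Seconds from TRANSACTION_BEGUN to the timer-expired abort of the
    same tx_guid, or None if no such pair is present."""
    begun = {}
    for line in lines:
        fields = parse_trace_line(line)
        guid = fields.get("tx_guid")
        if fields["eventid"] == "TRANSACTION_BEGUN":
            begun[guid] = fields["time"]
        elif (fields["eventid"] == "ABORT_DUE_TO_TRANSACTION_TIMER_EXPIRED"
              and guid in begun):
            return (fields["time"] - begun[guid]).total_seconds()
    return None


elapsed = seconds_until_timer_abort(TRACE.strip().splitlines())
print("seconds between begin and timer abort:", elapsed)
```

On these two lines the difference is about 323.7 seconds, so the transaction sat for more than five minutes after being propagated to the child node before the timer aborted it.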

The lease timeout between availability group and the Windows Server Failover Cluster has expired

Hi,
I am having some issues where I get a lease timeout from time to time. I have a Windows 2012 failover cluster with 2 nodes and 2 SQL 2012 AlwaysOn availability groups. Both nodes are physical machines, and each node is the primary for one AG.
From what I understand, if the HealthCheckTimeout is exceeded without the signal exchange, the lease is declared 'expired' and the SQL Server resource DLL reports that the SQL Server availability group no longer 'looks alive' to the Windows cluster manager. Here are the properties I have set up, which are the default settings:
LeaseTimeout - 20000
HealthCheckTimeout - 30000
VerboseLogging - 0
FailureConditionLevel – 3
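
As a rough aid for reading these values: they are in milliseconds, and Microsoft's failover policy documentation describes the resource DLL as receiving `sp_server_diagnostics` results at an interval of one third of HealthCheckTimeout. The Python sketch below only does that arithmetic on the settings listed above; the function name is made up for illustration, and the one-third rule is taken from that documentation, not from this thread.

```python
# Settings as listed in the post (timeouts are in milliseconds).
settings = {
    "LeaseTimeout": 20000,
    "HealthCheckTimeout": 30000,
    "FailureConditionLevel": 3,
}


def diagnostics_interval_ms(health_check_timeout_ms):
    """Interval at which sp_server_diagnostics results are expected,
    assuming the documented one-third-of-HealthCheckTimeout rule."""
    return health_check_timeout_ms // 3


interval = diagnostics_interval_ms(settings["HealthCheckTimeout"])
print("lease timeout (s):", settings["LeaseTimeout"] / 1000)
print("health check timeout (s):", settings["HealthCheckTimeout"] / 1000)
print("expected diagnostics interval (s):", interval / 1000)
```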
Here are the events that occur in the Application Event Viewer:
Event ID 19407:
The lease between availability group 'AG_NAME' and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover
Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster.
Event ID 35285:
The recovery LSN (120881:37533:1) was identified for the database with ID 32. This is an informational message only. No user action is required.
The SQL Server logs are too long to post in this box, but I can send them on request.
The AG is setup to failover automatically but it did not failover. I am trying to figure out why the lease timed out. Thanks.

From what I've been able to find out, this is due to an issue with the procedure sp_server_diagnostics. It sounds like the cluster is expecting this procedure to regularly log good status "Clean" in the log files, but the procedure is designed not
to flood the logs with "Clean" messages, so only reports changes, and does not make an entry when the last status was "Clean" and the current status is "Clean". The result is that the cluster looks to be unresponsive. However, once it initiates
the failover, the primary machine responds, since it was never really down, and the failover operation stops.
The end result is that there really never is a failover, but the database becomes unavailable for a few minutes while this is resolved.
I'm going to try setting the cluster's failure condition level to 2 (instead of 3) and see if that prevents the down time.
blogs.msdn.com/b/sql_pfe_blog/archive/2013/04/08/sql-2012-alwayson-availability-groups-automatic-failover-doesn-t-occur-or-does-it-a-look-at-the-logs.aspx

How to Perform Forced Manual Failover of Availability Group (SQL Server) and WSFC (Windows Server Failover Cluster)

I have a scenario with three nodes running Windows Server 2012 Standard, each hosting an instance of SQL Server 2012 Enterprise, that participate in a single Windows Server Failover Cluster (WSFC) spanning two data centers.
If the nodes in the primary data center become unavailable due to a data center outage, how can I automatically gain access to the node in the secondary (disaster recovery) data center with a script?
I want to write a script that checks the primary data center by pinging an IP address every 5 or 10 minutes.
If that IP address does not respond, the script should perform a forced manual failover of the availability group (SQL Server) and the WSFC.
Can you please guide me in writing a script for automatic failover in case of a primary data center outage?
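
The check-then-act loop described in this question could be sketched as below. This is a hedged Python outline, not a tested failover tool: `ping_once`, `monitor`, the threshold of three consecutive failures, and the 300-second interval are all assumptions chosen for illustration. The sketch only decides when an outage callback should fire; the callback itself is left to the operator, because forcing quorum on the surviving node and running `ALTER AVAILABILITY GROUP ... FORCE_FAILOVER_ALLOW_DATA_LOSS` on the DR replica can lose data and should not be triggered by a single missed ping.

```python
import platform
import subprocess
import time

# T-SQL an operator would run on the DR replica; it can lose data.
FORCE_FAILOVER_TSQL = "ALTER AVAILABILITY GROUP [{ag}] FORCE_FAILOVER_ALLOW_DATA_LOSS;"


def ping_once(host, timeout_s=5):
    """Return True if a single ping to host exits with code 0.
    Note: on Windows, ping can exit 0 for some 'unreachable' replies."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def monitor(probe, on_outage, threshold=3, interval_s=300,
            max_checks=None, sleep=time.sleep):
    """Call probe() every interval_s seconds. After `threshold`
    consecutive failures, call on_outage() once and return True.
    Return False if max_checks is reached without an outage."""
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            failures = 0  # any success resets the streak
        else:
            failures += 1
            if failures >= threshold:
                on_outage()
                return True
        sleep(interval_s)
    return False
```

A call might look like `monitor(lambda: ping_once("10.0.0.10"), alert_operator)`, where the IP address and `alert_operator` are placeholders. Requiring several consecutive failures is a deliberate choice to avoid acting on a transient network blip.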

Please post your question on failover clusters in the cluster forum. They will explain how this works and point you at scripts.
You should also look in the Gallery for cluster management scripts.

How to Perform Forced Manual Failover of Availability Group (SQL Server) and WSFC (Windows Server Failover Cluster) with scripting

I have a scenario with three nodes running Windows Server 2012 Standard, each hosting an instance of SQL Server 2012 Enterprise, that participate in a single Windows Server Failover Cluster (WSFC) spanning two data centers.
If the nodes in the primary data center become unavailable due to a data center outage, how can I automatically gain access to the node in the secondary (disaster recovery) data center with a script?
I want to write a script that checks the primary data center by pinging an IP address every 5 or 10 minutes.
If that IP address does not respond, the script should perform a forced manual failover of the availability group (SQL Server) and the WSFC.
Can you please guide me in writing a script for automatic failover in case of a primary data center outage?

You are trying to implement manually what should be happening automatically in the cluster. If the primary SQL Server becomes unavailable in the data center, it should fail over to the secondary SQL Server automatically. Is that not working?
You also might want to run this configuration by some SQL experts. I am not a SQL expert, but if you have both hosts in the data center in a cluster, there is no need for replication between those two nodes as they would be accessing
the database from some form of shared storage. Then it looks like you are trying to implement Always On to the DR site. I'm not sure you can mix both types of failover in a single configuration.
FYI, it would make more sense to establish a file share witness in your DR site instead of placing a third node in the data center for Node Majority quorum.
. : | : . : | : . tim

DPM failing SQL backups due to error: "the SQL Server instance refused a connection to the protection agent. (ID 30172 Details: Internal error code: 0x80990F85)

I ran across this error starting on 6/4/2011 and have been unable to find the root of the problem. In our environment, we have a DPM 2010 server dedicated to backing up our entire SQL environment (about 45 SQL Servers total). All of the SQL environment is backing up fine except for one SQL cluster application. This particular SQL instance is part of a 6-node failover cluster with 6 SQL instances distributed among the nodes. The other 5 SQL instances in the cluster are backing up fine; only one instance is failing. The DPM Alerts section shows this error when attempting a SQL backup of one of the databases on this SQL instance:
Affected area: KEN-PROD-VDB001\POSREPL1\master
Occurred since: 6/11/2011 11:00:56 PM
Description: Recovery point creation jobs for SQL Server 2008 database KEN-PROD-VDB001\POSREPL1\master on SQL Server (POSREPL1) - Store Settings.ken-prod-cl004.aarons.aaronrents.com have been failing. The number of failed recovery point creation jobs = 4.
If the datasource protected is SharePoint, then click on the Error Details to view the list of databases for which recovery point creation failed. (ID 3114)
The DPM job failed for SQL Server 2008 database KEN-PROD-VDB001\POSREPL1\master on SQL Server (POSREPL1) - Store Settings.ken-prod-cl004.aarons.aaronrents.com because the SQL Server instance refused a connection to the protection agent. (ID 30172 Details:
Internal error code: 0x80990F85)
More information
Recommended action: This can happen if the SQL Server process is overloaded, or running short of memory. Please ensure that you are able to successfully run transactions against the SQL database in question and then retry the failed job.
Create a recovery point...
Resolution: To dismiss the alert, click below
Inactivate alert
I have checked the cluster node this particular SQL instance is running on using Perfmon and the machine is nowhere near capacity on CPU, memory, network, or Disk I/O.  I have failed this SQL Application to another node in the cluster and
receive the same error (this other node has another clustered SQL application on it that is actively running as well as backing up fine). The only thing that I am aware of that has changed is that we installed SP2 for SQL 2008 about 2 weeks prior
to when the failures started to occur. However, we updated all six clustered SQL Instances at the same time and only this one is having this issue so I don't believe that caused the problem. We are running SQL 2008 SP2 (version 10.0.4000.0)
on all clustered instances along with DPM 2010 (version 3.0.7696.0) on this particular DPM server that has the issue.
One last thing, I have also noticed errors in the event log pertaining to the same SQL backups that are failing (but the time stamps are not concurrent with each backup attempt):
Log Name:      Application
Source:        MSDPM
Date:          6/13/2011 1:09:12 AM
Event ID:      4223
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      KEN-PROD-BS002.aarons.aaronrents.com
Description:
The description for Event ID 4223 from source MSDPM cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.
If the event originated on another computer, the display information had to be saved with the event.
The following information was included with the event:
DPM writer was unable to snapshot the replica of KEN-PROD-VDB001\POSREPL1\model. This may be due to:
1) No valid recovery points present on the replica.
2) Failure of the last express full backup job for the datasource.
3) Failure while deleting the invalid incremental recovery points on the replica.
Problem Details:
<DpmWriterEvent><__System><ID>30</ID><Seq>1833</Seq><TimeCreated>6/13/2011 5:09:12 AM</TimeCreated><Source>f:\dpmv3_rtm\private\product\tapebackup\dpswriter\vssfunctionality.cpp</Source><Line>815</Line><HasError>True</HasError></__System><DetailedCode>-2147212300</DetailedCode></DpmWriterEvent>
the message resource is present but the message is not found in the string/message table
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
    <Provider Name="MSDPM" />
    <EventID Qualifiers="0">4223</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2011-06-13T05:09:12.000000000Z" />
    <EventRecordID>68785</EventRecordID>
    <Channel>Application</Channel>
    <Computer>KEN-PROD-BS002.aarons.aaronrents.com</Computer>
    <Security />
</System>
<EventData>
    <Data>DPM writer was unable to snapshot the replica of KEN-PROD-VDB001\POSREPL1\model. This may be due to:
1) No valid recovery points present on the replica.
2) Failure of the last express full backup job for the datasource.
3) Failure while deleting the invalid incremental recovery points on the replica.
Problem Details:
<DpmWriterEvent><__System><ID>30</ID><Seq>1833</Seq><TimeCreated>6/13/2011 5:09:12 AM</TimeCreated><Source>f:\dpmv3_rtm\private\product\tapebackup\dpswriter\vssfunctionality.cpp</Source><Line>815</Line><HasError>True</HasError></__System><DetailedCode>-2147212300</DetailedCode></DpmWriterEvent>
</Data>
    <Binary>3C00440070006D005700720069007400650072004500760065006E0074003E003C005F005F00530079007300740065006D003E003C00490044003E00330030003C002F00490044003E003C005300650071003E0031003800330033003C002F005300650071003E003C00540069006D00650043007200650061007400650064003E0036002F00310033002F003200300031003100200035003A00300039003A0031003200200041004D003C002F00540069006D00650043007200650061007400650064003E003C0053006F0075007200630065003E0066003A005C00640070006D00760033005F00720074006D005C0070007200690076006100740065005C00700072006F0064007500630074005C0074006100700065006200610063006B00750070005C006400700073007700720069007400650072005C00760073007300660075006E006300740069006F006E0061006C006900740079002E006300700070003C002F0053006F0075007200630065003E003C004C0069006E0065003E003800310035003C002F004C0069006E0065003E003C004800610073004500720072006F0072003E0054007200750065003C002F004800610073004500720072006F0072003E003C002F005F005F00530079007300740065006D003E003C00440065007400610069006C006500640043006F00640065003E002D0032003100340037003200310032003300300030003C002F00440065007400610069006C006500640043006F00640065003E003C002F00440070006D005700720069007400650072004500760065006E0074003E00</Binary>
</EventData>
</Event>
Any help would be greatly appreciated!

I don't know if this helps, but I also noticed another peculiar issue related to this problem. If I go to "Modify protection group", then expand the cluster, then expand all six nodes in the cluster, five of them show "All SQL Servers"
and allow me to expand the SQL instance and show all databases. When I expand the node that is having the backup problem, it doesn't even show that SQL exists on the node, when in fact it does.
I would also like to add that the databases on this node that will not back up are running fine. They run hundreds of transactions daily, so we know SQL itself is OK. Even though it is a busy SQL Server, there are plenty of available resources, as
the SQL buffer and memory counters show the node is not under duress.
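
For anyone searching on this error: the DetailedCode in the event above is a signed 32-bit number, while most documentation lists these codes in unsigned hex. A minimal Python sketch of the conversion follows (the function names are mine, not part of DPM). It also shows that the <Binary> element is just the same DpmWriterEvent XML stored as UTF-16LE text written out in hex. As far as I know, 0x800423F4 is listed in the VSS headers as VSS_E_WRITERERROR_NONRETRYABLE, but please verify that against the VSS documentation.

```python
def to_hresult(detailed_code: int) -> str:
    """Return a signed 32-bit status code as unsigned 8-digit hex."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned 32-bit.
    return f"0x{detailed_code & 0xFFFFFFFF:08X}"


def decode_event_binary(hex_text: str) -> str:
    """Decode an event <Binary> payload, assuming it is UTF-16LE text as hex."""
    return bytes.fromhex(hex_text).decode("utf-16-le")


if __name__ == "__main__":
    # DetailedCode copied from the event above.
    print(to_hresult(-2147212300))
    # First eight bytes of the <Binary> payload from the event above.
    print(decode_event_binary("3C00440070006D00"))
```

Searching on the hex form of the code usually turns up more results than the negative decimal value.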

SQL Server Failover Clustering error

Hi All,
I am currently setting up a 2-node SQL Server cluster, but I am getting an error when doing the failover test from Node2 to Node1.
Here is a quick overview of what I have so far.
1. Set up the failover cluster for both nodes: public and private network, cluster disks for Quorum, MSDTC and SQL, etc.
2. Ran configuration validation before creating the cluster. The validation report completed successfully with no errors/warnings.
3. Created the cluster, created the MSDTC cluster and installed SQL Server on both nodes.
Now I am running failover tests to check whether cluster resources will fail over from Node1 to Node2 and from Node2 to Node1.
Failover Test: Active Node is Node1.
1. Disable Public network on Node1.
2. Failover to Node2 -> successful
3. Enable Public network on Node1.
Problem:
After the failover to Node2, I tried to fail the resources back from Node2 to Node1 by disabling the public network on Node2 (which is the active node after the failover from Node1 to Node2), but the cluster resources won't fail back to Node1.
Failback from Node2 to Node1 -> failed
1. Disable Public network on Node2.
2. Failback to Node1 -> failed
- Cluster Name and Cluster IP -> failed
- SQL cluster group (SQL name, SQL IP address, Analysis, SQL server and SQL Server Agent) -> failed
- MSDTC cluster group -> failed back successfully to Node1
3. Enable Public network on Node 2.
4. Manually online Cluster Group and SQL cluster group
I tried to bring the Cluster Group and the SQL cluster group online manually, but they cannot come online unless I enable the public network on Node2. I checked the cluster event log and found errors with Event IDs 1077, 1069, and 1205.
Here are some of the logs on the cluster events.
Event ID 1069: The Cluster service failed to bring clustered service or application 'SQL_Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
Event ID 1205: Cluster resource 'SQL IP Address 2 (db-vip)' in clustered service or application 'SQL_Group' failed.
Has anyone experienced the same issue before? I would appreciate it if someone could point me in the right direction to resolve it.
Thanks in advance for your feedback.
BTW, failover and failback work perfectly when I reboot the active node. Resources fail over successfully from Node1 to Node2 and vice versa when I reboot the server.
Thanks again.
Regards,
Ivan

Hi,
In the failback test above, after "Disable Public network on Node2", the two cluster nodes should still be able to communicate over the private heartbeat network. Check whether you can still ping Node1 at that point.
Also check Event Viewer on Node1: did it log Event ID 1069? Refer to this article to troubleshoot the issue:
Event ID 1069 — Clustered Service or Application Availability
http://technet.microsoft.com/en-us/library/cc756225(WS.10).aspx
Check and correct any problems with the application or service associated with the resource. For example, if the application or service cannot start within reasonable time limits, diagnose and resolve the issues that cause slow starting.
Check and correct any problems with cables or cluster-related devices.
Adjust properties for the resource in the cluster configuration, especially the value for Pending timeout for the resource. This value must allow enough time for the associated application or service to start. For more information, see "Viewing the value
for Pending timeout for a resource."
To view the value for Pending timeout for a resource:
To open the failover cluster snap-in, click Start, click Administrative Tools, and then click Failover Cluster Management. If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.
In the Failover Cluster Management snap-in, if the cluster you want to manage is not displayed, in the console tree, right-click Failover Cluster Management, click Manage a Cluster, and then select or specify the cluster that you want.
If the console tree is collapsed, expand the tree under the cluster you want to manage, and then expand Services and Applications.
In the console tree, click a clustered service or application.
In the center pane, if you cannot see the clustered resource that you want to view, expand one or more visible resources until you see the clustered resource.
Right-click the resource you want to view, click Properties, and then click the Policies tab.
Under Pending timeout, view the setting. Make sure it allows enough time for the associated application or service to start.
For more information please refer to following MS articles:
Network failure detection and recovery in a two-node Windows Server 2000 cluster
http://technet.microsoft.com/en-us/library/cc756225(WS.10).aspx
Hope this helps!
Lawrence
TechNet Community Support

Does a SQL Server client application need to be modified to benefit from running on a SQL Server 2012 cluster?

I have a client application written in C++ that interacts with a SQL Server database. My question is whether I need to change the client application code so that it benefits from running against a SQL Server 2012 cluster.
To elaborate: suppose my application has called an API to execute a SQL query, and while the query is executing, the SQL Server node (part of the cluster) goes down. As I understand it, the cluster ensures that another node takes over from the node that went down. Is this transition transparent to the
client application, or does my client application need to open a new connection and execute the query again?

Hello,
As Shanky posted above, when you connect to a database in an availability group and specify the availability group listener in the connection string, a failover of the availability group breaks the original connection, and your application
must open a new connection after the failover.
So, when connecting to an availability group, consider increasing the connection timeout and implementing connection retry logic to increase the probability of a successful connection.
Reference: SqlClient Support for High Availability, Disaster Recovery
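The retry logic mentioned above can be sketched roughly as follows. This is only an illustration in Python of the general shape; the same structure carries over to a C++ client. The names `run_with_retry` and `operation` are hypothetical, and `operation` stands for whatever your application does to open a fresh connection and run the query. It is assumed here that a dropped connection surfaces as `ConnectionError`; a real driver will raise its own error type, so adjust the `except` clause accordingly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    operation: Callable[[], T],
    attempts: int = 5,
    first_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying with exponential backoff on ConnectionError.

    operation is a hypothetical callable that opens a NEW connection and
    runs the query; a connection broken by a failover cannot be reused.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise  # Out of retries: let the caller see the failure.
            sleep(delay)  # Give the failover time to complete.
            delay *= 2  # Back off: 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

Note that blindly re-running a statement is only safe if it is idempotent or ran inside a transaction that was never committed, so the application still has to decide what is safe to repeat.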
Regards,
Fanny Liu
TechNet Community Support
