SQL 2012 AlwaysOn cluster IP not moving after failover, causing database to be read-only

SQL Server Cluster Name: SQLDAG01
SQL Server Cluster IP: 10.0.0.50
Cluster Listener IP: 10.0.0.60
Node 1 Name: SQL01
Node 1 IP: 10.0.0.51
Node 2 Name: SQL02
Node 2 IP: 10.0.0.52
Everything is fine when SQL01 is the primary. When failing over to SQL02, everything looks fine in the dashboard, but for some reason the cluster IP, 10.0.0.50, is stuck on node 1. The databases are configured to allow secondary read access. When executing a query against SQLDAG01, I get an error that the database is in read-only mode. Connectivity tests verify that SQLDAG01 (10.0.0.50) connects to SQL01 even though SQL02 is now the primary.
I've been Googling this for the better part of the day with no luck. Any suggestions? Is there a PowerShell command to force the cluster IP to move to the active node, or something similar? Also, I'm performing the failover as recommended, from Management Studio connected to the secondary node.

This was the answer: it had been set up to use the cluster name instead of the application name. Whoever installed SharePoint connected it to SBTSQLDAG01 instead of SHAREPOINT01. Once we changed SharePoint to connect to SHAREPOINT01, the failover worked as expected. We did have a secondary issue with the ARP cache and had to install the hotfix from http://support.microsoft.com/kb/2582281 to resolve it. One of the SharePoint app servers was failing to ping the SQL node after a failover; the ARP entry was stuck pointing to the previous node. This article helped a lot in resolving that: http://blog.serverfault.com/2011/05/11/windows-2008-and-broken-arp/
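As a quick interim check before the hotfix is in place, you can flush the stale entry by hand on the affected app server. These are standard Windows commands run from an elevated prompt; the address below is just this thread's example cluster IP.

```powershell
# Delete the stale ARP entry for the clustered IP
# (10.0.0.50 is the example IP from this thread -- substitute your own).
arp -d 10.0.0.50

# Or flush the entire ARP/neighbor cache:
netsh interface ip delete arpcache
```

If the ping starts working right after the flush but breaks again on the next failover, that confirms it's the ARP issue the hotfix addresses.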
One thing I did notice is that the SQL failover wizard does not move the cluster groups "Available Storage" and "Cluster Group"; I had to move those through the command line after using the wizard. I'm going to provide the client with a PowerShell script that moves all cluster groups when they need to do a manual failover. This also happens to be why the SharePoint issue started: "Cluster Group" is what responds to the cluster name SBTSQLDAG01. Moving that group over to the node that has the active SQL cluster group also made it work properly, but using the application name is the correct method.
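Something along these lines is what I have in mind for that script — a minimal sketch, assuming the built-in FailoverClusters PowerShell module; the node name is a placeholder:

```powershell
# Move every cluster group (including "Cluster Group" and
# "Available Storage") to the node that should now be active.
# $TargetNode is a placeholder -- pass the new primary's name.
param([string]$TargetNode = "SQL02")

Import-Module FailoverClusters

Get-ClusterGroup |
    Where-Object { $_.OwnerNode -ne $TargetNode } |
    ForEach-Object { Move-ClusterGroup -Name $_.Name -Node $TargetNode }
```

Run it on either cluster node after the SQL failover wizard completes, and the core groups follow the availability group over.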
Thanks everyone for all your help. Although the nitpicking about terminology really didn't help; that was a pointless argument and we could have done without it. Yes, I know in 2008 it's called "Failover Cluster Manager" and MSCS is the 2003 term, but they're basically the same thing and we don't need to derail the conversation over it. Also, if you look at the screenshot below you can clearly see "AlwaysOn High Availability" in SQL Server Management Studio. That's what it's called in SQL, and that's where you do all the work. Trying to tell me it's "not a feature" is wrong, pointless, and asinine, and doesn't get us anywhere.
Sorry it took so long to get back, I was off the project for a couple weeks while they were resolving some SAN issues that caused the failover to happen in the first place.

Similar Messages

  • DPM 2012 SP1 and SharePoint 2013 on a SQL 2012 AlwaysOn AG

    I am trying to protect a new SharePoint Foundation 2013 farm with its databases stored on a SQL 2012 AlwaysOn Availability Group. I've run ConfigureSharePoint.exe -EnableSharePointProtection on my WFE, and SharePoint shows up in DPM when I try to add it to a protection group. But I get the error 32008 referenced here: http://support.microsoft.com/kb/970641 when trying to select the farm to back up.
    If I run ConfigureSharePoint.exe -ResolveAllSQLAliases on the WFE it doesn't return anything.
    I'm directly backing up databases on the SQL 2012 server, but not any of the SharePoint DBs.
    K

    Hi,
    I'm having some trouble understanding what I'm seeing in the writer output from the WFE server, specifically the machine names that the writer is reporting.
    If you look in the wfewriters.txt, what is the machine that ends in 02? Is that a physical machine or an alias?
    The machine ending in 06 is the secondary SQL Server, and that is what the SharePoint writer is pointing to.
    The WFE VSS SharePoint writer shows this for two content DBs:
     * WRITER "SharePoint Services Writer"
      - Writer ID   = {da452614-4858-5e53-a512-38aab25c61ad}
      - Writer instance ID = {c0c46d7b-1c01-44bd-8353-e3433f5b8f07}
      - Supports restore events = TRUE
      - Writer restore conditions = VSS_WRE_ALWAYS
      - Restore method = VSS_RME_RESTORE_AT_REBOOT_IF_CANNOT_REPLACE
      - Requires reboot after restore = FALSE
      - Excluded files:
    + Component "SharePoint Services Writer:\XXXX02\WSS_Content_at.contoso.local"
    - Name: WSS_Content_at.Contoso.local
    - Logical path: XXXX02
    - Full path: \XXXX02\WSS_Content_at.Contoso.local
    - Caption: Content Database WSS_Content_at.Contoso.local
    - Type: VSS_CT_DATABASE [1]
    - Is selectable: TRUE
    - Is top level: TRUE
    - Notify on backup complete: FALSE
    - Paths affected by this component:
    - Volumes affected by this component:
    - Component Dependencies:
    - Dependency to "{a65faa63-5ea8-4ebc-9dbd-a0c4db26912a}:\\YYYY06\YYYY06\WSS_Content_at.contoso.local"
    + Component "SharePoint Services Writer:\XXXX02\WSS_Content_fw.Contoso.local"
    - Name: WSS_Content_fw.contoso.local
    - Logical path: XXXX02
    - Full path: \XXXX02\WSS_Content_fw.Contoso.local
    - Caption: Content Database WSS_Content_fw.Contoso.local
    - Type: VSS_CT_DATABASE [1]
    - Is selectable: TRUE
    - Is top level: TRUE
    - Notify on backup complete: FALSE
    - Paths affected by this component:
    - Volumes affected by this component:
    - Component Dependencies:
    - Dependency to "{a65faa63-5ea8-4ebc-9dbd-a0c4db26912a}:\\YYYY06\YYYY06\WSS_Content_fw.contoso.local"
    Regards, Mike J. [MSFT]

  • SQL 2012 AlwaysOn AG with 3 Nodes

    First, what would be the best option of quorum/quorum witness for my SQL 2012 AlwaysOn group?
    I have setup the following:
    SQL 2012 AlwaysOn with 3 nodes.
    1 primary
    2 secondaries - one of them is read-only (for reporting and backup)
    Dynamics CRM and Reporting DBs are running on them.
    How can I achieve the highest availability and DR with my SQL? Do I even need to create a quorum witness? What kind of quorum (node majority with a network share)?
    Second, is it a good idea to run the backups on that read-only node? If I set up the maintenance plan on this third read-only node with full and transactional backups, I should be able to restore without any issues?
    Third, I have another DB (SharePoint) that needs to be moved over to this SQL AlwaysOn AG, but I'm wondering if I can restore it on the secondary node to utilize the unused resources. Is that possible, or not recommended?

    Are the replicas on the same network subnet? If they are, then you already have quorum with node majority (an odd number of votes.) Since you are already paying for an additional license to run read-only workloads on your readable secondary, you can take backups on this replica. Full database backups on a secondary must be COPY_ONLY, but you can take regular log backups, which will truncate the log on the primary replica. In theory, you will be able to restore the backups taken from your readable secondary, but you should test (hence why I mentioned "in theory.") I wouldn't move my SharePoint databases to this SQL Server Availability Group, mainly because of the different requirements between the two applications - SharePoint and Dynamics CRM. MAXDOP = 1 and disabling index maintenance and statistics updates on SharePoint databases are just a few of them. Besides, I doubt that the recovery objectives and service level agreements for those two applications are the same within your organization.
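    As a sketch of what those backups on the readable secondary might look like (assuming the SQLPS/SqlServer PowerShell module; the instance, database, and paths below are placeholders):

    ```powershell
    Import-Module SqlServer   # or SQLPS on older installs

    # Placeholder names -- replace with your readable secondary and database.
    $inst = "SQL03\REPORTING"
    $db   = "CRM_MSCRM"

    # Full backups taken on a secondary replica must be copy-only.
    Backup-SqlDatabase -ServerInstance $inst -Database $db `
        -BackupFile "D:\Backups\$db-full.bak" -CopyOnly

    # Log backups on a secondary are regular (not copy-only) and
    # truncate the log across the availability group.
    Backup-SqlDatabase -ServerInstance $inst -Database $db `
        -BackupAction Log -BackupFile "D:\Backups\$db-log.trn"
    ```

    As noted above, do a test restore of the full-plus-log chain before relying on it.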
    Edwin Sarmiento SQL Server MVP | Microsoft Certified Master
    Blog |
    Twitter | LinkedIn
    SQL Server High Availability and Disaster Recovery Deep Dive Course

  • Will SCCM 2012 R2 , SCOM 2012 R2 and SCDPM R2 2012 support SQL 2012 AlwaysOn Availability Group ?

    Will SCCM 2012 R2 , SCOM 2012 R2 and SCDPM R2 2012 support SQL 2012 AlwaysOn Availability Group ?

    Hi,
    It is listed here:
    http://technet.microsoft.com/en-us/library/dn281933.aspx
    Configuration Manager 2012 R2 and DPM 2012 R2 do not support AlwaysOn.
    regards,
    Jörgen
    -- My System Center blog ccmexec.com -- Twitter
    @ccmexec

  • SQL 2012 AlwaysOn upgrade to SQL 2014

    Anyone have some info regarding an in-place upgrade from SQL 2012 AlwaysOn to SQL 2014?
    -Arnstein

    Hi,
    Please check the following articles:
    Upgrade and Update of Availability Group Servers with Minimal Downtime and Data Loss
    https://msdn.microsoft.com/en-us/library/dn178483.aspx
    Thanks.

  • Enqueue replication server does not terminate after failover

    Hi,
    We are trying to setup high availability of enqueue server where in we have running enqueue server on node-A and ERS on node-B all the time.
    Whenever enqueue is stopped on node-A, it automatically fails over to node-B, but after replication of the lock table, enqueue does not terminate the ERS running on node-B, and as a result our enqueue and ERS both keep running on the same host (failover node-B), which should not be the case.
    We haven't configured polling in that scenario. SAP Note 1018968 describes the same behaviour; however, it is applicable only for versions 640 and 700.
    Ideally, when the enqueue server switches to node-B, it should terminate the ERS on the same node after replication, and then the HA software would take care of its restart on node-A.
    We have ERS running version 701; could anyone please let me know if the same behaviour applies to version 701 as well?
    Or is there any additional configuration to be done to make it work?
    Thanks in advance.
    Cheers !!!
    Ashish

    Hi Naveed,
    Stopping ERS is supposed to be taken care of by SAP only, and not the HA software.
    Once ERS stops on node-B, a fault is reported, and as a result the HA software will restart the ERS on node-A.
    Please refer to a section of SAP Note 1018968 - 'Enqueue replication server does not terminate after failover'
    "Therefore, the cluster software must only organise the restart of the replication server and does not need to do anything for the shutdown."
    Another blog about the same:
    http://www.symantec.com/connect/blogs/veritas-cluster-server-sap  
    - After the successful lock table takeover, the Enqueue Replication Server will fault on this node (initiated by SAP). Veritas Cluster Server recognizes this failure and initiates a failover to a remaining node to create SAP Enqueue redundancy again. The Enqueue Replication Server will receive the complete Enqueue table from the Enqueue Server (SCS) and later Enqueue lock updates in a synchronous fashion.
    So it is nothing to do with the HA software; it is SAP which should control the ERS on node-B.
    Cheers !!!
    Ashish

  • My imac will not load after I enter my password. The only thing I get is the arrow from my mouse on top of a blank white screen.  Can anyone tell me what this is?  I've restarted and turned off several times.  I left it on in this state for 8 hours and

    My iMac will not load after I enter my password. The only thing I get is the arrow from my mouse on top of a blank white screen. Can anyone tell me what this is? I've restarted and turned it off several times. I left it on in this state for 8 hours hoping it would reload and work. No luck.

    Hello KCC4ME,
    You may try booting your Mac in Safe Boot, as it can resolve many issues that may prevent a successful login.
    OS X: What is Safe Boot, Safe Mode?
    http://support.apple.com/kb/HT1564
    If a Safe Boot allows you to successfully log in, you may have issues with one or more login items (while the following article is labelled as a Mavericks article, it is applicable to earlier versions of the Mac OS as well).
    OS X Mavericks: If you think you have incompatible login items
    http://support.apple.com/kb/PH14201
    Cheers,
    Allen

  • SQL Developer Blocked and is not allowed to connect to database ORA-200001

    SQL Developer is blocked and is not allowed to connect to the database; it gives ORA-200001.
    I found on the net that a DBA can write triggers which can deny connections to the database from certain applications.
    So I want a way to change the application name, so that when it connects to the database, V$SESSION will have a different value (other than "SQL Developer") in the PROGRAM and/or MODULE column (which I think is what the DBA uses to restrict the connection).
    One more way: I am using a JDBC URL to connect to the database, and in Java we can change the properties of the connection to change the PROGRAM in V$SESSION, but I am not a Java expert, so I don't know how and where to make the changes.
    Either way, my aim is to connect to the database such that V$SESSION will have a different value (other than "SQL Developer") in the PROGRAM and/or MODULE column.

    This is not a system configuration or credential issue.
    This is a check put in place by the DBA using a logon trigger, to check for certain users logging in with certain applications.
    Only some users, using a particular username, are getting this error when they try to log on using Oracle SQL Developer.
    The same users, when using a different username (some generic usernames created to access the database), are able to log in.
    Similarly, if they log in using SQL Navigator, they are able to log in with both their own and the generic username.
    Moreover, of all the database instances, this is only happening on some of them, and all of them are development instances.
    The following will help you understand:
    USER | DB USERNAME | DATABASE | APPLICATION | ACTION
    X | X | DB1 | Oracle SQL Developer | Blocked
    X | G (generic) | DB1 | Oracle SQL Developer | Login success
    X | X | DB1 | SQL Navigator/SQL*Plus | Login success
    X | G (generic) | DB1 | SQL Navigator/SQL*Plus | Login success
    X | X | DB2 | Oracle SQL Developer | Login success
    X | G (generic) | DB2 | Oracle SQL Developer | Login success
    X | X | DB2 | SQL Navigator/SQL*Plus | Login success
    X | G (generic) | DB2 | SQL Navigator/SQL*Plus | Login success
    I just want to bypass this check, which I think uses the V$SESSION columns PROGRAM and/or MODULE to identify the application used by a particular user.
    If I can override these values somewhere in Oracle SQL Developer before logging in to the DB, then I can pass this check and log in to the database.
    Edited by: user13430736 on Jun 21, 2011 4:05 AM
    Edited by: user13430736 on Jun 21, 2011 4:12 AM

  • Reports not Generated after moving SCCM 2012 SP1 DB on SQL 2012 SP1 cluster

    Hi all,
    I have moved my SCCM 2012 SP1 DB to a SQL 2012 SP1 failover cluster; this cluster is a two-node cluster. The DB was moved fine and the DB configuration in SCCM Site Maintenance was also done successfully. After the migration process, SCCM was working fine; I created a few collections & packages, then checked the DB on the cluster & they were updated successfully.
    But when I tried to install the report server on my SCCM machine, again there was no instance displayed after entering the cluster name.
    I searched this & came across this post:
    http://social.technet.microsoft.com/Forums/en-US/configmanagergeneral/thread/4479e73e-8a19-4c7e-9418-b36770656b9b/
    which says to install the report server on the cluster node machine which is currently running the Reporting Service. I tried to install the report server & the instance was detected this time; it was installed successfully.
    But when I tried to generate reports I got this error:
    Permissions are fine, as I have added SCCMadmin & the SCCM machine to the local admins of both the nodes.
    Please Help.
    Thanks,
    Pranay.

    Yes, I know this is an old post, but I'm trying to clean them up. Did you solve this problem, and if so, what was the solution?
    If you look at your second screenshot, remote errors are not enabled. Use this blog to enable them:
    http://www.enhansoft.com/blog/enabling-remote-errors-in-sql
    If changing the data source on the report fixed it for a single report, it means that your shared data source is having problems. Reset the username and password within the CM12 console and this should fix it for every report.
    Garth Jones | My blogs: Enhansoft and
    Old Blog site | Twitter:
    @GarthMJ

  • SQL 2012 AlwaysON High Availability for SharePoint 2013

    Our company has 2 web front-end servers for SharePoint 2013 and one SQL 2012 database server.
    We got one more server, and we don't have shared storage, so we need to configure AlwaysOn.
    There are some questions:
    1- The database server is in production, so how much downtime is required to achieve AlwaysOn?
    2- What changes need to be made on the production server?
    3- What steps should be followed while configuring the new database server?
    Regards,

    Hi Genius1985,
    According to your description, you want to configure a SQL Server 2012 AlwaysOn Availability Group for your database, right?
    Question 1: The database server is in production, so how much downtime is required to achieve AlwaysOn?
    There is no fixed downtime for AlwaysOn; it depends on the configuration of the AlwaysOn Availability Group, and normally it can be several seconds or minutes. To understand why there is still downtime for SQL Server with Microsoft clustering, please refer to the following article:
    http://www.mssqltips.com/sqlservertip/1882/understanding-why-there-is-still-downtime-for-sql-server-with-microsoft-clustering/
    Questions 2 and 3: What changes need to be made on the production server? What steps should be followed while configuring the new database server?
    Since AlwaysOn Availability Groups require a Windows Server Failover Cluster, we first need to add the Windows Failover Cluster Feature to all the machines running the SQL Server instances that we will configure as replicas.
    Once the Windows Server Failover Cluster has been created, we need to proceed with enabling the AlwaysOn Availability Groups feature in SQL Server 2012.  This needs to be done on all of the SQL Server instances that we will configure as replicas in our
    Availability Group.
    For more details about Step-By-Step: Creating a SQL Server 2012 AlwaysOn Availability Group, please refer to the following article:
    http://blogs.technet.com/b/canitpro/archive/2013/08/20/step-by-step-creating-a-sql-server-2012-alwayson-availability-group.aspx
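    A rough PowerShell sketch of those two steps (the instance name below is a placeholder; this assumes the ServerManager and SQLPS modules available on Windows Server 2012 with SQL Server 2012):

    ```powershell
    # Step 1: add the Failover Clustering feature on every replica host.
    Import-Module ServerManager
    Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

    # Step 2: enable the AlwaysOn Availability Groups feature on each
    # SQL Server instance (this restarts the SQL Server service).
    Import-Module SQLPS -DisableNameChecking
    Enable-SqlAlwaysOn -ServerInstance "SQLNODE1" -Force
    ```

    Repeat step 2 for each instance that will host a replica, then create the cluster and the availability group as the article walks through.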
    If you have any question, please feel free to let me know.
    Regards,
    Jerry Li

  • SQL 2012 AlwaysOn Dual Data Centre (an instance in each data centre with a secondary in each other respectively)

    Hi, hopefully someone will be able to find my scenario interesting enough to comment on!
    We have two instances of SQL; for this example I will call them 'L' and 'J'. We also have two data centres; for this example I will call them 'D1' and 'D2'. We are attempting to create a new solution and our hardware budget is rather large. The directive from the company is that they want to be able to run either instance from either data centre, preferably with the primary for each separated. So, for example:
    Instance 'L' will sit in data centre 'D1' with the ability to move to 'D2', and...
    Instance 'J' will sit in data centre 'D2' with the ability to move to 'D1' on request.
    My initial idea was to create a 6-node cluster - 3-nodes in each data centre. Let's name these D1-1, D1-2, D1-3 and D2-1, D2-2, D2-3 to signify which data centre they sit in.
    'L' could then sit on (for example) D1-1, with the option to move to D1-2 (synchronously) or D2-1/D2-2 (asynchronously), and...
    'J' could sit on D2-3, with D2-2 as a synchronous secondary and D1-3,D1-2 as the asynchronous secondaries.
    Our asynchronous secondaries in this solution are our full DR options, our synchronous secondaries are our DR option without moving to another data centre site. The synchronous secondaries will be set up as automatic fail-over partners.
    In theory, that may seem like a good approach. But when I took it to the proof-of-concept stage, we had issues with quorum...
    Because there are three nodes on each side of the fence (3 in each data centre), neither side has the 'majority' (the number of votes required to take control of the cluster). To get around this, we used WSFC with Node and File Share Majority, with the file share sitting in the D1 data centre. Now the D1 data centre has 4 votes in total, and D2 only has 3.
    This is a great setup if one of our data centres was defined as the 'primary', but the business requirement is to have two primary data centres, with the ability to fail over to one another.
    In the proof of concept, I tested the theory by building the example solution and dropping the connection which divides the two data centres. It caused the data centre with the file share to stay online (as it had the majority), but the other data centre lost its availability group listeners. SQL Server stayed online, just not via the AG listener's name; i.e. we could connect via the hostnames, rather than the shared 'virtual' name.
    So I guess really I'm wondering, did anyone else have any experience of this type of setup? or any adjustments that can be made to the example solution, or the quorum settings in order to provide a nice outcome?

    So if all nodes lost connectivity to the file share, there is a total of 6 votes still visible to each node. Think of people holding up their hands, where each node can see each hand. If the link between the two sites then went down, each node on each side would only see 3 hands being held up. Since the quorum maximum is 7 votes, the majority needed to be seen by a node is 4. So in that scenario, every node would realize it had lost majority and would go offline from the cluster.
    Remember that the quorum maximum (and therefore the majority) never changes *unless* YOU change node weight. Failures just mean there is one less vote that can be cast, but the required majority remains the same.
    Thanks for the compliment, by the way - very kind! I am presuming from your tag that you might be based in the UK. If so, and you are ever nearby, make sure you drop by and say hello! I'll be talking at the London SQL UG two weeks from today if you are around.
    Regards,
    Mark Broadbent.
    Contact me through (twitter|blog|SQLCloud)
    Please click "Propose As Answer" if a post solves your problem
    or "Vote As Helpful" if a post has been useful to you
    Come and see me at the
    PASS Summit 2012

  • SQL 2012 AlwaysOn discoveries broken in 6.4.1.0?

    It looks like version 6.4.1.0 of the SQL 2012 MP broke discovery of AlwaysOn (the AlwaysOn Seed). The old version had this for the registry key (note the InstanceName-based property reference; it's missing from the new one):
    SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.$Target/Property[Type="SQLServer!Microsoft.SQLServer.ServerRole"]/InstanceName$\MSSQLServer\HADR\HADR_Enabled
    And then this is 6.4.1.0:
    SOFTWARE\Microsoft\Microsoft SQL Server\$Target/Property[Type="SQLServer!Microsoft.SQLServer.DBEngine"]/InstanceID$\MSSQLServer\HADR\HADR_Enabled
    The discovery works fine for our management group that's using the old version, but it's not working for our MG using the new version. Is anyone else running into this?
    "Fear disturbs your concentration"

    Hi,
    SQL Server Management Pack version 6.4.0.0 improves performance of AlwaysOn discovery.
    Appendix: AlwaysOn Management Pack Contents
    http://technet.microsoft.com/en-us/library/dn456931.aspx
    Please ensure your Management Server has the latest update installed. Also check whether there are any related events.
    Niki Han
    TechNet Community Support

  • CS4/Win: markers are not moved after resizing a clip in timeline

    Hello,
    when I am resizing a clip in the timeline, the markers after this clip in the same timeline are not moved, too.
    Bug or feature?
    If it is a bug: still ...

    cmeira wrote:
    Ok, thanks.
    But there is a question left: bug or feature? I can't see an advantage of this behavior ... and no solution ...
    Is the behavior the same in CS5, too?
    Neither; it is as designed.
    CS5 is exactly the same.

  • Cluster does not work after a while

    Hi :
              I have 2 clustered EJB server, the IP address is 192.168.0.226 and
              192.168.0.227.
              A servlet server is calling these two EJB servers with
              t3://192.168.0.226,192.168.0.227:7001
              All three machine use weblogic 5.1 sp9, Win2K Advanced Server.
              If I restart the weblogic service, the cluster works fine, but after a while
              the cluster does not work: all the client requests go to only one machine, in
              most cases 227.
              It doesn't help even if I restart the service on the lost server; I have to
              restart all the servers.
              The weblogic.log on the lost server looks like this:
              Thu Apr 26 00:04:16 GMT 2001:<I> <RJVM> Signaling
              peer -6817319611378695685S192.168.0.204:[7001,7001,7002,7002,7001,-1] gone:
              weblogic.rjvm.PeerGoneException:
              - with nested exception:
              [java.io.EOFException]
              Thu Apr 26 00:04:47 GMT 2001:<I> <RJVM> Signaling
              peer -2123734719233546013S192.168.0.227:[7001,7001,7002,7002,7001,-1] gone:
              weblogic.rjvm.PeerGoneException:
              - with nested exception:
              [java.io.EOFException]
              Thu Apr 26 00:14:16 GMT 2001:<E> <ConMan> Attempt to sendMsg using a closed
              connection
              Thu Apr 26 00:14:16 GMT 2001:<E> <RJVM> Exception on send :
              weblogic.rmi.ConnectException: Attempt to sendMsg using a closed connection
              Please help me
              thanks
              andrew
              

    Are you running with any proxy? Is the servlet server also the static httpd
              server?
              What type of network equipment are you using? If it's high-end (Cisco), make sure
              all ports are set to 100/Full.
              Also, try setting the SendDelay slightly higher (e.g. 25, 30, 35). Experiment
              with different values.
              weblogic.cluster.multicastSendDelay=25
              andrew wrote:
              > Mike, thanks for reply
              > The 192.168.0.204 is the servlet server.
              > I changed the NICs to use 100Mbps/Full duplex. It looks better: the error
              > msgs are less frequent than before, but it still happened.
              > Any suggestion?
              > thanks
              > andrew
              >
              > "Mike Kincer" <[email protected]> wrote in message
              > news:[email protected]...
              > > Which box is 192.168.0.204 ??
              > > It is communicating via multicast, which I would say is suspect.
              > > Otherwise, I'd say you have some network issues.
              > > Make sure all switch ports and all NICS are NOT set to "auto" config.
              > Select
              > > 100Mbps/Full duplex on all ports.
              > >
              /\/\i|<e
              Mike Kincer
              Solutions Developer/Engineer
              Atlas Commerce "ebusiness evolved"
              Office phone: +1-607-741-8877
              mailto:[email protected] [http://www.atlascommerce.com]
              

  • Server 2012 NFS - 10 minutes to resume after failover?

    I've got a Server 2012 Core cluster running an HA file server role with the new NFS service. The role has two associated clustered disks. When I fail the role between nodes, it takes 10-12 minutes for the NFS service to come back online - it'll sit in 'Online
    Pending' for several minutes, then transition to 'Failed', then finally come 'Online'. I've looked at the NFS event logs from the time period of failover and they look slightly odd. For example, in a failover at 18:41, I see this in the Admin log:
    Time      EventID  Description
    18:41:59  1076     Server for NFS successfully started virtual server {fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}
    18:50:47  2000     A new NFS share was created. Path:Y:\msd_build, Alias:msd_build, ShareFlags:0xC0AE00, Encoding:7, SecurityFlavorFlags:0x2, UnmappedUid:4294967294, UnmappedGid:4294967294
    18:50:47  2000     A new NFS share was created. Path:Z:\eas_build, Alias:eas_build, ShareFlags:0xC0AE00, Encoding:7, SecurityFlavorFlags:0x2, UnmappedUid:4294967294, UnmappedGid:4294967294
    18:50:47  2002     A previously shared NFS folder was unshared. Path:Y:\msd_build, Alias:msd_build
    18:50:47  2002     A previously shared NFS folder was unshared. Path:Z:\eas_build, Alias:eas_build
    18:50:47  1078     NFS virtual server {fc4bf5c0-c2c9-430f-8c44-4220ff6655bd} is stopped
    18:51:47  1076     Server for NFS successfully started virtual server {fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}
    18:51:47  2000     A new NFS share was created. Path:Y:\msd_build, Alias:msd_build, ShareFlags:0xC0AE00, Encoding:7, SecurityFlavorFlags:0x2, UnmappedUid:4294967294, UnmappedGid:4294967294
    18:51:47  2000     A new NFS share was created. Path:Z:\eas_build, Alias:eas_build, ShareFlags:0xC0AE00, Encoding:7, SecurityFlavorFlags:0x2, UnmappedUid:4294967294, UnmappedGid:4294967294
    In the Operational log, I see this:
     Time      EventID  Description
     18:41:51  1108     Server for NFS received an arrival notification for volume \Device\HarddiskVolume11.
     18:41:51  1079     NFS virtual server successfully created volume \Device\HarddiskVolume11 (ResolvedPath \Device\HarddiskVolume11\, VolumeId {69d0efca-c067-11e1-bbc5-005056925169}).
     18:41:58  1108     Server for NFS received an arrival notification for volume \Device\HarddiskVolume9.
     18:41:58  1079     NFS virtual server successfully created volume \Device\HarddiskVolume9 (ResolvedPath \Device\HarddiskVolume9\, VolumeId {c5014a4a-d0b8-11e1-bbcb-005056925167}).
     18:41:59  1079     NFS virtual server successfully created volume \Pfs\Volume{fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}\ (ResolvedPath \Pfs\Volume{fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}\, VolumeId {fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}).
     18:41:59  1105     Server for NFS started volume \Pfs\Volume{fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}\ (ResolvedPath \Pfs\Volume{fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}\, VolumeId {fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}).
     18:41:59  1079     NFS virtual server successfully created volume \DosDevices\Y:\ (ResolvedPath \Device\HarddiskVolume9\, VolumeId {c5014a4a-d0b8-11e1-bbcb-005056925167}).
     18:44:06  1116     Server for NFS discovered volume Z: (ResolvedPath \Device\HarddiskVolume11\, VolumeId {69d0efca-c067-11e1-bbc5-005056925169}) and added it to the known volume table.
     18:50:47  1116     Server for NFS discovered volume Y: (ResolvedPath \Device\HarddiskVolume9\, VolumeId {c5014a4a-d0b8-11e1-bbcb-005056925167}) and added it to the known volume table.
     18:50:47  1081     NFS virtual server successfully destroyed volume \DosDevices\Y:\.
     18:50:47  1105     Server for NFS started volume Y: (ResolvedPath \Device\HarddiskVolume9\, VolumeId {c5014a4a-d0b8-11e1-bbcb-005056925167}).
     18:50:47  1105     Server for NFS started volume Z: (ResolvedPath \Device\HarddiskVolume11\, VolumeId {69d0efca-c067-11e1-bbc5-005056925169}).
     18:50:48  1106     Server for NFS stopped volume \Pfs\Volume{fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}\ (ResolvedPath \Pfs\Volume{fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}\, VolumeId {fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}).
     18:50:48  1081     NFS virtual server successfully destroyed volume \Pfs\Volume{fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}\.
     18:51:47  1079     NFS virtual server successfully created volume \Pfs\Volume{fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}\ (ResolvedPath \Pfs\Volume{fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}\, VolumeId {fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}).
     18:51:47  1105     Server for NFS started volume \Pfs\Volume{fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}\ (ResolvedPath \Pfs\Volume{fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}\, VolumeId {fc4bf5c0-c2c9-430f-8c44-4220ff6655bd}).
    From this, I'm not sure what's going on between 18:41:59 and 18:44:06, between 18:44:06 and 18:50:47 or between 18:50:48 and 18:51:47. What's the NFS volume discovery doing and why does it take so long?
    Does anyone have any thoughts as to where I could start looking to work out what's happening here? Is there any tracing that can be enabled for the NFS services to indicate what's going on?
    Thanks in advance!
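     One way to get a clearer view of those gaps (a sketch, not something I've verified on 2012 Core; the NFS channel name below is an assumption, so enumerate the actual channels first) is to pull the NFS event channels around the failover window with Get-WinEvent and read them oldest-first:

     ```powershell
     # List the NFS-related event channels on this node (exact names vary
     # by build - verify before querying)
     Get-WinEvent -ListLog "*nfs*" | Select-Object LogName, RecordCount

     # Pull everything the NFS server logged around the failover window,
     # sorted oldest-first, so the long silent gaps stand out.
     # "Microsoft-Windows-ServicesForNFS-Server/Operational" is an assumed
     # channel name - substitute whatever -ListLog returned above.
     Get-WinEvent -FilterHashtable @{
         LogName   = "Microsoft-Windows-ServicesForNFS-Server/Operational"
         StartTime = (Get-Date "18:40")
         EndTime   = (Get-Date "19:00")
     } | Sort-Object TimeCreated |
         Format-Table TimeCreated, Id, Message -AutoSize -Wrap
     ```

     Running the same query against the Admin channel and the cluster log (Get-ClusterLog) for the same window may show what the resource host was doing during the quiet periods.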

     I was able to get some downtime this afternoon, so I tried:
     • deleting the NFS share in question and recreating it
     • deleting all the NFS shares on the clustered file server (thus removing the NFS Server resource) and recreating them
     • deleting all the NFS shares and the ._nfs folder from all the associated drives and recreating them
     • deleting the clustered file server altogether, shutting down the cluster, starting it back up and recreating the file server.
     None of these made any difference - this particular NFS resource still took about 10 minutes to return to service. I'm therefore supposing it's some aspect of the disk or the data on it that NFS is taking a long time to enumerate, but it's annoying that
     I don't have any visibility into what's going on. I might try asking this same question in the MS partner forums to see if I get any answers there...
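     For anyone wanting to repeat the delete-and-recreate steps above, they can be scripted with the NfsShare cmdlets that ship with the Server for NFS role (a sketch only; the share names and paths are taken from the event log above, and any clustering-specific parameters would need to match your file server resource):

     ```powershell
     # Run on the node that currently owns the clustered file server.

     # Record the existing shares before touching anything
     $shares = Get-NfsShare
     $shares | Format-Table Name, Path

     # Remove them all (removing the last share is what takes the
     # NFS Server resource out of the clustered role)
     $shares | ForEach-Object { Remove-NfsShare -Name $_.Name -Confirm:$false }

     # Recreate the two shares seen in the Admin log
     New-NfsShare -Name "msd_build" -Path "Y:\msd_build"
     New-NfsShare -Name "eas_build" -Path "Z:\eas_build"
     ```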
