Automatic failover to an alternate Directory Server
We have two messaging servers (Server A and Server B), each running its own LDAP server, with replication enabled between the two.
To use automatic failover to an alternate Directory Server, I configured Server A like this:
configutil -o local.ugldaphost -v "serverA serverB"
To test the configuration, I stopped the LDAP server on Server A and ran imsimta test -rewrite -debug, which worked fine. But when I tried to authenticate a user over POP3 or HTTP, it failed with the error "Authentication Server is temporarily unavailable".
Is there any reason why it's not failing over for HTTP, POP3 and IMAP authentication?
Sorry, I forgot to include the version information:
Sun Java(tm) System Messaging Server 6.1 HotFix 0.01 (built Jun 24 2004)
libimta.so 6.1 HotFix 0.01 (built 17:31:31, Jun 24 2004)
SunOS test.abc.com 5.9 Generic_112234-03 i86pc i386 i86pc
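As a rough illustration of what a space-separated host list such as "serverA serverB" implies for a client, here is a minimal Python sketch that tries each host in order and returns the first one that accepts a TCP connection. It is not how Messaging Server implements failover; the function name first_reachable, the default port 389 and the 3-second timeout are assumptions made for the example.

```python
import socket
from typing import Callable, Optional, Sequence


def first_reachable(hosts: Sequence[str], port: int = 389,
                    timeout: float = 3.0,
                    connect: Optional[Callable] = None) -> Optional[str]:
    """Return the first host that accepts a TCP connection, or None.

    `connect` is injectable so the logic can be exercised without a network;
    by default it is socket.create_connection.
    """
    connect = connect or socket.create_connection
    for host in hosts:
        try:
            conn = connect((host, port), timeout)
        except OSError:
            continue  # host is down or unreachable; try the next one
        conn.close()
        return host
    return None


# Hypothetical usage with the host list from the post:
# first_reachable("serverA serverB".split())
```

A check like this, run from the messaging host while the LDAP server on Server A is stopped, would at least confirm that Server B is reachable on the LDAP port before looking at why the POP3/HTTP authentication path does not use it.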
Similar Messages
-
How failover works with SunONE Directory Server?
Assume that I set up two masters using the multimaster scheme.
When one master fails or goes down, how does the client know about, or get routed to, the other master?
For full redundancy, at the application level:
-redundant storage (raid, san,nas)
-multiple connectors to this storage (fiber,ethernet...)
-multiple LDAP servers (multimaster, replica's)
-multiple LDAP proxy servers
-redundant switches/routers (vrrp, ...)
-loadbalanced by redundant interconnected loadbalancers (level7)
All this helps with non-persistent connections. If applications use connection pooling (for performance reasons), you have to verify their behaviour: some applications only create the pool at startup, but if the pooled connections break, the application should reconnect. -
I plan to set up failover between two directory servers, both running Java System Directory Server 6 Enterprise Edition. I am not sure whether replication is the right solution for failover. How does failover work? All my systems run Solaris 9, and I already have one Directory Server 6 instance as the LDAP server and one native Solaris LDAP client as a test client.
thanks,
--xinhuan
Thanks for your information.
I still don't understand what the proxy server would be doing. If I put both directory server IPs in the client-side configuration file, will the client automatically connect to the other server when one is down, given that I am using the native Solaris LDAP client? Why is it necessary to put a proxy server in front of the two master servers? I don't actually need load balancing, but I do need failover. If I don't use the proxy server, does failover happen automatically or does it need human intervention?
thanks,
--xinhuan -
Automatic failover doesn't failback to the first server if the second server is lost.
Hi Everybody,
We use database mirroring a lot in our product solutions, and we have recently seen strange behaviour in our failover tests with SQL 2008 R2.
We have 2 servers running Windows 2008 R2 Standard and SQL 2008 R2 Standard SP2 (let's call them DB1 and DB2).
We also have a witness workstation running SQL 2008 Express on Windows 7.
A database on DB1 is mirrored to DB2 in "safety full" mode, with a witness. At this stage, the database is principal on DB1 and mirror on DB2.
To test automatic failover, we first restart the DB1 server, which holds the database in the principal role.
After a few seconds, the database on DB2 becomes principal, which is normal and exactly what we want.
After a few minutes, DB1 comes back online and its database takes the mirror role (still OK). At this stage, the database is principal on DB2 and mirror on DB1.
When the monitoring application shows that the mirror is synchronized and that both servers are connected to the witness, we restart DB2 to trigger an automatic failover to DB1.
What we see is that DB1 never takes the principal role, and its database stays in the mirror role.
In the DB1 Errorlog, I only see these 2 lines when DB2 disappears, with no other message related to the mirroring session:
2014-01-22 08:57:26.91 spid43s Starting up database 'Test123'.
2014-01-22 08:57:26.95 spid43s Bypassing recovery for database 'Test123' because it is marked as a mirror database, which cannot be recovered. This is an informational message only. No user action is required.
When DB2 comes back online, the database on DB2 keeps its principal status and the database on DB1 stays mirror.
What is really strange is that if I restart DB2 once again directly after that, DB1 fails over normally and the database on DB1 takes the principal role after a few seconds, without any configuration changes between the two restarts.
The DB1 errorlog then shows:
2014-01-22 09:00:37.53 spid29s Error: 1474, Severity: 16, State: 1.
2014-01-22 09:00:37.53 spid29s Database mirroring connection error 4 'An error occurred while receiving data: '64(The specified network name is no longer available.)'.' for 'TCP://DB2:5022'.
2014-01-22 09:00:37.53 spid18s Database mirroring is inactive for database 'Test123'. This is an informational message only. No user action is required.
2014-01-22 09:00:42.37 spid32s The mirrored database "Test123" is changing roles from "MIRROR" to "PRINCIPAL" due to Auto Failover.
2014-01-22 09:00:42.39 spid32s Recovery is writing a checkpoint in database 'Test123' (7). This is an informational message only. No user action is required.
2014-01-22 09:00:42.39 spid32s Recovery completed for database Test123 (database ID 7) in 78 second(s) (analysis 0 ms, redo 0 ms, undo 7 ms.) This is an informational message only. No user action is required.
So, to summarize:
- a first failover from DB1 to DB2 always works
- then, a restart of DB2 never fails over to DB1
- a second restart of DB2 always fails over to DB1
This happens pretty much systematically on one of our server pairs.
Is there any explanation for this, or any idea where I can look to find the reason for this strange behavior?
Thanks a lot for your help
Seb
Thank you Tom
But I have already checked that and included the Errorlog extracts in my original post.
When DB01 disappears for the first time, there is nothing in the DB01 ERRORLOG (it is restarting :-) )
AND no particular error message in the DB02 ERRORLOG (nothing related to the fact that DB01 is not reachable anymore!)
Only these two lines:
2014-01-22 08:57:26.91 spid43s Starting up database 'Test123'.
2014-01-22 08:57:26.95 spid43s Bypassing recovery for database 'Test123' because it is marked as a mirror database, which cannot be recovered. This is an informational message only. No user action is required.
So my main question remains: why doesn't DB02 detect that DB01 disappears (the first time only), and why doesn't the failover mechanism trigger the failover?
Thank you
Seb -
Hi,
We have a cluster with two hosts (Host01 and Host02) replicated to another server (Replica01).
To test automatic failover to the replica server (Replica01), we unplugged the power cables from Host01 and Host02.
The VMs on the replica server are still off. Why don't the VMs start up automatically on the replica server?
Ramy Shaker
"overall there is no automatic failover in Hyper-V"
Of course there is. It's enabled by Failover Clustering. This is a totally separate technology from Hyper-V Replica.
There is no automatic start-up in Hyper-V Replica because it is not designed to detect a split-brain condition, where the same virtual machine is running in multiple locations simultaneously. The replica site has no way to know why it can't reach the primary system anymore. It might just be because someone unplugged a network cable. If the primary's virtual machines are still running and the replica decides to spin up its copies, you will have many troubles.
Eric Siron Altaro Hyper-V Blog
I am an independent blog contributor, not an Altaro employee. I am solely responsible for the content of my posts.
"Every relationship you have is in worse shape than you think." -
Directory Server SMF tripping over itself (crosspost)
I've posted this question in the SMF related forum too, so if replies could go there, that would be handy: [http://forums.sun.com/thread.jspa?messageID=10940406]
We have a working instance of DSEE6.3.1 under Solaris 10 managed via SMF (using the manifest generated by dsadm/dscfg -- I forget which).
# svcs -a | grep ldap-user
online 10:47:08 svc:/application/sun/ds:ds--data-ldap-user-instance
After a forced shutdown, DSEE starts up and does a self-recovery (as it should). When that's complete, the slapd process is running and the startup script exits with status 221 (i.e. not 0), even though slapd is running.
SMF notices that it's !0 and tries to restart DSEE... by issuing another start. This second start then exits almost immediately saying "slapd already running" but this time exits with 0 -- are we ok? No... cos SMF then notices that all the processes it just started have gone away so it calls "stop" followed by another "start".
This is where it gets a bit hazy as it looks like DSEE never shut down cleanly again so the whole process repeats itself ad infinitum (although I suspect that's a separate issue). :-(
I guess what I'm asking is -- is there a way to stop SMF from doing that: perhaps treat exit=221 as non-fatal and perform a service check?
Log file below:
[ Feb 26 21:40:42 Enabled. ]
[ Feb 26 21:40:50 Executing start method ("/opt/SUNWdsee/ds6/bin/dsadm start --exec /data/ldap/user/instance
Failed to start Directory Server instance '/data/ldap/user/instance'
Waiting for Directory Server instance '/data/ldap/user/instance' to start...
Waiting for Directory Server instance '/data/ldap/user/instance' to start...
Waiting for Directory Server instance '/data/ldap/user/instance' to start...
Waiting for Directory Server instance '/data/ldap/user/instance' to start...
Directory Server instance '/data/ldap/user/instance' has detected a disorderly shutdown or a change in cache size
Recovery phase is starting, this may take a while...
Waiting for Directory Server instance '/data/ldap/user/instance' to start...
Waiting for Directory Server instance '/data/ldap/user/instance' to start...
Waiting for Directory Server instance '/data/ldap/user/instance' to start...
Waiting for Directory Server instance '/data/ldap/user/instance' to start...
Waiting for Directory Server instance '/data/ldap/user/instance' to start...
Waiting for Directory Server instance '/data/ldap/user/instance' to start...
Waiting for Directory Server instance '/data/ldap/user/instance' to start...
ns-slapd wrote the following lines in the error log (/data/ldap/user/instance/logs/errors):
##[26/Feb/2010:22:00:07 +0000] - Sun-Java(tm)-System-Directory/6.3.1 B2008.1121.0156 (64-bit) starting up
##[26/Feb/2010:22:00:09 +0000] - WARNING<20488> - Backend Database - conn=-1 op=-1 msgId=-1 - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
##[26/Feb/2010:22:01:38 +0000] - Database recovery is 0% complete.
##[26/Feb/2010:22:01:51 +0000] - Database recovery is 100% complete.
##[26/Feb/2010:22:01:59 +0000] - WARNING<20805> - Backend Database - conn=-1 op=0 msgId=-1 - search is not indexed base='cn=changelog' filter='(replicationcsn>=4b87f656000000000000)' scope='sub'
[ Feb 26 22:02:17 Method "start" exited with status 221 ]
[ Feb 26 22:02:17 Executing start method ("/opt/SUNWdsee/ds6/bin/dsadm start --exec /data/ldap/user/instance
Directory Server instance '/data/ldap/user/instance' is already running (pid: 352)
[ Feb 26 22:02:18 Method "start" exited with status 0 ]
[ Feb 26 22:02:18 Stopping because all processes in service exited. ]
[ Feb 26 22:02:18 Executing stop method ("/opt/SUNWdsee/ds6/bin/dsadm stop --exec /data/ldap/user/instance")
Directory Server instance '/data/ldap/user/instance' stopped
[ Feb 26 22:02:20 Method "stop" exited with status 0 ]
[ Feb 26 22:02:20 Executing start method ("/opt/SUNWdsee/ds6/bin/dsadm start --exec /data/ldap/user/instance
Failed to start Directory Server instance '/data/ldap/user/instance'
.......................... repeat ........................
Well, one way around it is to write your own start script and manage the exit codes yourself.
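A start wrapper that manages the exit code itself could follow the logic sketched below in Python: run the real start command, and if it exits non-zero, check whether the server process is in fact up before reporting failure. This is only an outline under assumptions: start_with_check and the is_running callback are hypothetical names, and how you detect a live slapd (pid file, port probe, LDAP search) is left to you.

```python
import subprocess
from typing import Callable, Sequence


def start_with_check(start_cmd: Sequence[str],
                     is_running: Callable[[], bool],
                     run: Callable[[Sequence[str]], int] = subprocess.call) -> int:
    """Run the start command and normalise its exit status.

    Returns 0 if the command succeeded, or if it failed but the service
    is nevertheless running; otherwise returns the original exit status.
    """
    rc = run(start_cmd)
    if rc == 0:
        return 0
    if is_running():
        # e.g. status 221 after a long recovery while slapd is actually up
        return 0
    return rc


# Hypothetical usage with the command shown in the SMF log:
# rc = start_with_check(
#     ["/opt/SUNWdsee/ds6/bin/dsadm", "start", "--exec",
#      "/data/ldap/user/instance"],
#     is_running=my_slapd_check)   # my_slapd_check is a placeholder
```

Whether SMF then keeps the instance online also depends on which processes it tracks for the service, so treat this as a starting point for the wrapper rather than a complete fix.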
I have some doubts about the autorestart configuration of DS, especially in a case like this where the server seems to be crashing. Realistically, automatically restarting a crashed server can leave you worse off. Your data may be corrupt, and the process may eventually stay up (especially if you work around the current issue), but the DS is not really healthy and it needs an administrator to investigate what's wrong with it. It may also return inconsistent or simply bad data to clients. All in all, I would prefer an instance in such a state to stay down and trigger alarms, assuming it has failover peers that can take on its workload. -
Open Directory Server "not responding"
This is strange, and I'm not sure what if anything is wrong...
My server is an OD Master. LDAP, Password Server, and Kerberos all report running. AFP authentication is set to Kerberos (only). Authenticated directory binding is enabled. Client computers are bound to the directory server. They connect via AFP, a ticket is created (viewable in Ticket Viewer), everything works fine (apparently).
However... in System Preferences/Accounts/Login Options, there's a red dot (not Leica) next to the directory server IP, and if I click on Edit it says "The server is not responding". This is the case for all client computers, not just one. Not sure when it started; when I set it up they were all green of course.
So, what does this "server is not responding" mean? Given that clients can do everything they need to do, can/should I consider this a non-issue?
Thanks Classic and Chris. Good questions.
The server isn't behaving as expected. Following Classic's suggestion, I tried binding without SSL. I didn't expect it to work, I thought SSL was required. (Under OD Settings/Policies/Binding, "Encrypt all packets (requires SSL or Kerberos)" is checked.) But with SSL unchecked, I was prompted for diradmin username/password. I entered the correct credentials, but they were rejected. So I tried leaving the credentials blank. That bound the client to the directory successfully (green dot). But "Enable authenticated directory binding" is checked.
With the green dot, I tried connecting to the server over AFP, but could not. Only when I manually copied in the Kerberos file was I able to successfully connect to AFP. (Shouldn't the Kerberos file be created automatically at some point?)
So, clearly something is wrong with SSL, and also perhaps with my settings. (The server should only allow binding with authentication and over SSL, but it does not, and it does allow unauthenticated binding without SSL.)
OD Overview confirms that Kerberos is running. Not connected to an AD domain (nor should be).
Running the kadmin.local command gives me a very long list of items that look like e.g. service/[email protected] or service/LKDC:[email protected]. One of the services listed is "afpserver". (There are also listings for a number of services that aren't run on the server.)
AFP is restricted to two groups; the username I'm using for AFP connections is a member of one of those groups. -
I have a scenario with three nodes running Windows Server 2012 Standard, each hosting an instance of SQL Server 2012 Enterprise, that participate in a single Windows Server Failover Cluster (WSFC) spanning two data centers.
If the nodes in the primary data center become unavailable because of a data center outage, how can I automatically bring up the node in the secondary disaster recovery data center with a script?
I want to write a script that checks the primary data center by pinging an IP address every 5 or 10 minutes.
If that IP does not respond, the script should perform a forced manual failover of the availability group (SQL Server) and the WSFC (Windows Server Failover Cluster).
Can you please guide me in writing a script for automatic failover in case of a primary data center outage?
Please post your question on failover clusters in the cluster forum. They will explain how this works and point you at scripts.
You should also look in the Gallery for cluster management scripts.
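For what the polling part of such a script might look like, here is a minimal Python sketch: it probes at a fixed interval and only triggers an action after several consecutive failures, so a single dropped ping does not cause a failover. The names watch, probe and on_outage, the 300-second interval and the threshold of 3 are all assumptions; the actual probe and the forced-failover step are deliberately left as callbacks you would supply.

```python
import time
from typing import Callable, Optional


def watch(probe: Callable[[], bool],
          on_outage: Callable[[], None],
          interval_s: float = 300.0,
          failures_needed: int = 3,
          sleep: Callable[[float], None] = time.sleep,
          max_checks: Optional[int] = None) -> bool:
    """Poll `probe`; call `on_outage` once after N consecutive failures.

    Returns True if on_outage was triggered, False if max_checks ran out.
    """
    consecutive = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            consecutive = 0          # any success resets the counter
        else:
            consecutive += 1
            if consecutive >= failures_needed:
                on_outage()          # e.g. start the forced failover procedure
                return True
        sleep(interval_s)
    return False
```

Note that a failed ping from the DR site cannot distinguish a real data center outage from a network break between the sites, and a forced failover of an availability group can lose data, so many teams keep a human decision in the on_outage step.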
¯\_(ツ)_/¯ -
I have a scenario with three nodes running Windows Server 2012 Standard, each hosting an instance of SQL Server 2012 Enterprise, that participate in a single Windows Server Failover Cluster (WSFC) spanning two data centers.
If the nodes in the primary data center become unavailable because of a data center outage, how can I automatically bring up the node in the secondary disaster recovery data center with a script?
I want to write a script that checks the primary data center by pinging an IP address every 5 or 10 minutes.
If that IP does not respond, the script should perform a forced manual failover of the availability group (SQL Server) and the WSFC (Windows Server Failover Cluster).
Can you please guide me in writing a script for automatic failover in case of a primary data center outage?
You are trying to implement manually what should be happening automatically in the cluster. If the primary SQL Server becomes unavailable in the data center, it should fail over to the secondary SQL Server automatically. Is that not working?
You also might want to run this configuration by some SQL experts. I am not a SQL expert, but if you have both hosts in the data center in a cluster, there is no need for replication between those two nodes, as they would be accessing the database from some form of shared storage. Then it looks like you are trying to implement Always On to the DR site. I'm not sure you can mix both types of failover in a single configuration.
FYI, it would make more sense to establish a file share witness in your DR site instead of placing a third node in the data center for Node Majority quorum.
. : | : . : | : . tim -
Problem in Publishing the certificate to directory server
I am having a problem publishing certificates. I am using iPlanet CMS 4.7 and iPlanet Directory Server 5.1.
In CMS > Certificate Manager > Publishing Module > Mapper,
the manuals describe two options for enabling publishing to the directory server:
1) create the entry automatically (default plug-in)
2) create the entry manually in the directory and use a mapper to map it.
I tried both ways. When the automatic create option is selected, it raises an error:
Failed to create the CA entry. There may be entries in the directory hierarchy which do not exist. Please create them manually.
I cannot figure out the problem; even if I create the certificate hierarchy in the directory server, it gives the same error. Can anyone figure out the problem so I can publish certificates? Please mail me the solution if anybody knows. Thank you
Hi,
1. Please open the original project in Captivate 3, i.e. the .cp file.
2. Go to menu "Audio > Audio Settings"
3. Change the bitrate to 96kbps or 64kbps
4. Change the Encoding frequency to 44 kHz
5. Save and close the project
6. Now open the same project in Captivate 5
7. Publish the project
Audio should play correctly now.
Hope this helps.
Regards,
mukul -
Binding to directory server vs. OD replica
Can someone explain the practical differences between binding a server to an OD master vs. being a replica of that OD master?
Why would I bind a server instead of making a replica? Seems like the replica would always be easier to admin and would provide the same function...?
I'm wondering why someone would do this. Why bind one server to another vs. making that second server a replica?
The real issue is whether this server is going to provide authentication services to other clients.
In addition to not wanting all your data on a single machine, if you have many client systems it may overwhelm a single directory server. For these reasons you may create a replica (or number of replicas) that keep in sync with the master server and have a complete copy of the entire Open Directory database (all users, machines, groups, etc.)
These replicas can then be used to provide authentication services to client systems, as well as provide failover for the client in case this machine goes away for any reason.
In contrast there's no need for every client system to have the entire directory. If you have many machines, the number of update messages that get passed around and need to be replicated to every machine on the network would be cumbersome, at best.
Then there's also an element of security - the directory should have some level of protection since it includes data about every user, including their password and other personal details. If you replicate this to every machine then any user on your network could poke around the data at their leisure. Contrast that with a typical client machine that only has the account credentials for the current user.
So for any network you should create one master and at least one replica. Client systems should point to a replica and should not be Open Directory replicas themselves. -
SQL 2005 mirroring : Abrupt Automatic failover
hi All,
We have a SQL 2005 SP4 mirroring setup of 15 DBs with Principal (P), Mirror (M) and Witness (W).
We have now seen abrupt DB failovers from P to M for some of the databases (yesterday it was 4 out of 15).
Errors were seen on Witness server as follows for all Dbs that failed over:
Date 07/01/2015 11:07:48 PM
Log SQL Server (Current - 08/01/2015 12:00:00 AM)
Source spid19s
Message
The mirroring connection to "TCP://<server.domain.com>:5022" has timed out for database "<DBName>" after 10 seconds without a response. Check the service and network connections.
Actions taken:
1. The Network and Firewall team reported that no errors were detected and that there was no network traffic between the witness server and the DB server during the DB auto failover period.
2. On the system side, we have verified that there were no hardware errors on either the VM or the SAN storage, and that no Symantec SQL backup jobs or antivirus scans were running during the DB auto failover period.
3. We did see a high amount of I/O activity on the P server around failover time. Some I/O errors similar to the one below were seen. Note, however, that these errors occurred not only for the DBs that failed over but also for others, including TEMPDB:
Date 07/01/2015 11:07:38 PM
Log SQL Server (Current - 08/01/2015 4:06:00 AM)
Source spid2s
Message
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [R:\SQLDATA\MSSQL.1\MSSQL\Data\<DBName>.mdf] in database [DBName] (5). The OS file handle is 0x000000000000095C. The offset of the latest long I/O is: 0x0000054ff22000
Questions:
1. I assumed that the Witness keeps polling P and M on the DB mirroring endpoints (in our case port 5022) to check that the DBs are online, but the Network team says there is no activity on that port. Is my understanding correct?
2. Is there any other possible reason for the DB failover?
Link referred:
http://dba.stackexchange.com/questions/22402/what-can-cause-a-mirroring-session-to-timeout-then-failover-sql-server-2005
http://msdn.microsoft.com/en-us/library/ms179344(v=sql.90).aspx
Any help is highly appreciated!!!
Regards,
Mandar
This is common with mirroring; it is not as resilient to changes as log shipping. Are you aware of the fact below? It is not directly related to your question:
If you plan to use high-safety mode with automatic failover, the normal load on each failover partner should be less than 50 percent of the CPU. If your work load overloads the CPU, a failover partner might be unable to ping the other server instances in the mirroring session. This causes an unnecessary failover. If you cannot keep the CPU usage under 50 percent, we recommend that you use either high-safety mode without automatic failover or high-performance mode.
Now to your problem:
The mirroring connection to "TCP://<server.domain.com>:5022" has timed out for database "<DBName>" after 10 seconds without a response. Check the service and network connections.
I would say there was a network dip lasting more than 10 seconds. Since the default failover timeout is 10 seconds, the witness concluded for a few databases that the principal could not be reached, and failover was initiated.
The network team is incorrect to say there was no dip (it's common for a NOC team not to take responsibility).
This Support Article is worth reading, especially the network part.
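The timeout in the log message can be pictured with a small Python sketch: a partner is treated as unreachable once more than the timeout period has passed since its last response. This is a simplified model for reasoning about the symptom, not SQL Server's internal logic; the function name timed_out and the idea of passing in response timestamps are assumptions for the example.

```python
from typing import Sequence


def timed_out(response_times: Sequence[float], now: float,
              timeout_s: float = 10.0) -> bool:
    """True if more than timeout_s seconds have passed since the last response.

    response_times are timestamps (in seconds) at which the partner answered.
    With no responses at all, the partner is treated as timed out.
    """
    if not response_times:
        return True
    return now - max(response_times) > timeout_s
```

Seen this way, anything that delays responses for longer than the timeout (a network dip, but also a stalled or overloaded server) produces the same message. If short stalls are expected, the mirroring timeout can be raised with ALTER DATABASE ... SET PARTNER TIMEOUT.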
MVP -
SQL 2012 Database Availability Group - Force Automatic Failover
Hi All,
I'd appreciate some help in understanding the following scenario in my test environment.
I have created a DAG with 2 replica servers (both of which are HyperV VM's running W2012 Std).
From a client PC in my test lab, I can connect to the virtual listener of my DAG and confirm via the "select @@servername" command that I am connecting to the primary replica server.
Using the Failover Wizard, I can easily move the primary instance between my 2 nodes, and the command above confirms that the primary replica server has changed. So far so good.
What I wanted to test was what would happen to my DAG in the event of a complete loss of power to the server acting as the primary replica. At first, I thought I would stop the SQL Server service on the primary server, but this did not result in my DAG failing over to the secondary replica. I have found that the only way I can do this is by shutting down the primary server in a controlled manner.
Is there any reason why either stopping the SQL Server service on the primary replica, or indeed forcing a power off of the primary replica, does not result in the DAG failing over to the secondary replica?
Thanks,
Bob
Hi,
First, I would confirm that by Database Availability Group you mean an AlwaysOn Availability Group.
How did you set the FailureConditionLevel?
Whether the diagnostic data and health information returned by sp_server_diagnostics warrants an automatic failover depends on the failure-condition level of the availability group. The failure-condition level specifies what failure conditions trigger an automatic failover. There are five failure-condition levels, which range from the least restrictive (level one) to the most restrictive (level five). For details about failure-condition levels, see:
http://msdn.microsoft.com/en-us/library/hh710061.aspx#FClevel
Two articles that may be helpful:
SQL 2012 AlwaysOn Availability groups Automatic Failover doesn’t occur or does it – A look at the logs
http://blogs.msdn.com/b/sql_pfe_blog/archive/2013/04/08/sql-2012-alwayson-availability-groups-automatic-failover-doesn-t-occur-or-does-it-a-look-at-the-logs.aspx
SQL Server 2012 AlwaysOn – Part 7 – Details behind an AlwaysOn Availability Group
http://blogs.msdn.com/b/saponsqlserver/archive/2012/04/24/sql-server-2012-alwayson-part-7-details-behind-an-alwayson-availability-group.aspx
Thanks.
Tracy Cai
TechNet Community Support
Hi,
Thanks for the reply.
It's an AlwaysOn Availability Group.
In my test lab, I have changed the quorum configuration to a file share witness and that has allowed an automatic failover when I turn the primary replica server off (rather than power it off).
I'll take a look at the links you provided.
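The effect of adding a file share witness can be shown with simple vote arithmetic. The Python sketch below is a simplified majority rule only: has_quorum is a made-up name, and the model ignores dynamic quorum adjustments that Windows Server 2012 can make.

```python
def has_quorum(votes_online: int, votes_total: int) -> bool:
    """A cluster keeps quorum only with a strict majority of votes online."""
    return votes_online * 2 > votes_total


# Two nodes, no witness: losing one node leaves 1 of 2 votes.
two_node_survives = has_quorum(1, 2)        # False
# Two nodes plus a file share witness: losing one node leaves 2 of 3 votes.
with_witness_survives = has_quorum(2, 3)    # True
```

Under this model, a two-node cluster without a witness cannot keep quorum when one node disappears abruptly, so there is no healthy cluster left to fail the availability group over to.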
Regards,
Bob
-
HACMP Clustering Script for SAP ECC 6.0 (SR1) - Automatic Failover-Oracle10
Hello,
I have installed SAP ECC 6.0 (SR1) on AIX 5.3 / Oracle 10g in an HACMP clustering environment. Manual failover is working fine. The ASCS and database instances are installed on a shared drive with a virtual IP and virtual name. The Central Instance and Dialog Instance are installed locally on Node A and Node B. I would like an HACMP clustering script (automatic failover script) for automation. Please help if you have one.
Thanks
Gautam Poddar
Here are HA stop and start scripts that you should be able to adapt to your particular circumstances. They are based on earlier versions of SAP / Oracle but should be a reasonable guide.
Script to start SAP is start_sap_prd
#!/bin/ksh
# Script: /usr/local/bin/cluster/start_sap_prd
# Comments: HACMP Application START script for PRD
# Show me obvious information in hacmp.out
banner "Starting"
banner "PRD SAP"
# Set the oracle and sap owner.
ORASID="PRD"
SAPADM="prdadm"
ORAUSR="oraprd"
VIRTUALHOST="vhost"
DEVHOST="vhostdev"
# Get the volume groups for this resource group
RG=$( /usr/es/sbin/cluster/utilities/cllsgrp | grep -i $ )
VG_LIST=$( /usr/es/sbin/cluster/utilities/cllsres -g $ | \
grep "VOLUME_GROUP=" | \
awk -F\" '{ print $2 }' )
# Check the transport directory is mounted.
if mount | grep -w "/usr/sap/trans"
then
print "Transport directory is already mounted."
else
cd /tmp
print "Attempting a background mount of the transport directory."
nohup mount -o intr,bg,soft :/usr/sap/trans1 /usr/sap/trans &
fi
#Start SAP and Oracle
#Start listener
su - $ -c "/rprd/oracle/PRD/920_64/bin/lsnrctl start"
rc=$?
if [ "$rc" -ne 0 ]
then
echo "ERROR: Listener failed to start\n"
fi
#Start Database
su - $ -c "/rprd/oracle/PRD/bin/start_database_PRD.sh"
sleep 20
# Standard sapstart script
su - $ -c "startsap $"
rc=$?
if [ "$rc" -ne 0 ]
then
echo "ERROR: Failed to start SAP\n"
fi
exit 0
Script to stop SAP is stop_sap_prd
#!/bin/ksh
# Script: /usr/local/bin/cluster/stop_sap_prd
# Dated: 01/11/06
# Application: Oracle/SAP
# Comments: HACMP Application STOP script for SAP / Oracle PRD
# Show me obvious information in hacmp.out
set -x
banner "stopping"
banner "PRD SAP"
# Set the oracle and sap owner.
ORASID="PRD"
SAPADM="prdadm"
ORAUSR="oraprd"
VIRTUALHOST="vhost"
#Stop SAP/Oracle
su - $ -c "stopsap $"
rc=$?
if [ "$rc" -ne 0 ]
then
echo "ERROR: Failed to stop SAP and Oracle\n"
fi
# Stop SAP collector and Oracle listener.
su - $ -c "/usr/sap/PRD/SYS/exe/run/saposcol -k"
rc=$?
if [ "$rc" -ne 0 ]
then
echo "ERROR: Failed to stop SAPOSCOL \n"
fi
su - $ -c "/rprd/oracle/PRD/920_64/bin/lsnrctl stop"
rc=$?
if [ "$rc" -ne 0 ]
then
echo "ERROR: Listener failed to stop\n"
fi
if mount | grep -w "/usr/sap/trans"
then
print "Transport directory is mounted."
/usr/es/sbin/cluster/events/utils/cl_nfskill -k -u /usr/sap/trans
sleep 1
/usr/es/sbin/cluster/events/utils/cl_nfskill -k -u /usr/sap/trans
sleep 1
umount -f /usr/sap/trans &
else
print "Transport directory is not mounted."
fi
exit 0
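Both scripts repeat the same pattern of running a command and then checking its exit status. One way to keep that check reliable is to save the status once, immediately after the command, and test the saved value. The sketch below shows this with a hypothetical helper named run_step. It is not part of HACMP or SAP, and the demo commands true and false stand in for the real start and stop commands.

```shell
#!/bin/sh
# Sketch: capture a command's exit status once and test the saved value.
# run_step is a hypothetical helper introduced only for illustration.
run_step() {
    label=$1
    shift
    "$@"
    rc=$?    # save immediately; any later command overwrites $?
    if [ "$rc" -ne 0 ]; then
        echo "ERROR: $label failed (rc=$rc)"
    fi
    return "$rc"
}

# Stand-in commands: "true" always succeeds, "false" always fails.
run_step "demo success" true
run_step "demo failure" false || echo "continuing after failure"
```

Because the helper returns the saved status, the calling script can decide per step whether to continue or stop.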