Automatic failover to an alternate Directory Server

We have two messaging servers (Server A and Server B), each running its own independent LDAP server, with replication enabled between the two.
To use automatic failover to an alternate Directory Server, I configured the following on Server A:
configutil -o local.ugldaphost -v "serverA serverB"
To test the configuration, I stopped the LDAP server on Server A and ran imsimta test -rewrite -debug, which worked fine. But when I tried to authenticate a user over POP3 or HTTP, it failed with the error "Authentication Server is temporarily unavailable".
Is there any reason why it's not failing over for HTTP, POP3 and IMAP authentication?

Sorry, I forgot to include the version information:
Sun Java(tm) System Messaging Server 6.1 HotFix 0.01 (built Jun 24 2004)
libimta.so 6.1 HotFix 0.01 (built 17:31:31, Jun 24 2004)
SunOS test.abc.com 5.9 Generic_112234-03 i86pc i386 i86pc
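
For readers unfamiliar with the idea behind a space-separated host list, the general client-side pattern is to try each server in order and use the first one that answers. The sketch below is only an illustration of that pattern; it is not how Messaging Server implements local.ugldaphost, and the function name and the (host, port) input format are assumptions made for the example.

```python
import socket


def first_reachable(servers, timeout=2.0):
    """Return the first (host, port) pair that accepts a TCP connection.

    `servers` is an ordered list of (host, port) tuples, most preferred
    first. Returns None when no server can be reached.
    """
    for host, port in servers:
        try:
            # A successful connect only proves the port is open, not that
            # the directory behind it is healthy.
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Refused, timed out, or unresolvable: try the next server.
            continue
    return None


# Example call (389 is the standard LDAP port):
# first_reachable([("serverA", 389), ("serverB", 389)])
```

A real LDAP client would also need to rebind and retry the failed operation after switching servers, which is exactly the part that can differ between the MTA and the POP3/IMAP/HTTP daemons.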

Similar Messages

  • How failover works with SunONE Directory Server?

    Assume that I set up two masters using the multi-master scheme.
    When one master fails or goes down, how does the client know about, or get routed to, the other master?

    For full redundancy:
    At the application level:
    - redundant storage (RAID, SAN, NAS)
    - multiple connectors to this storage (fiber, Ethernet...)
    - multiple LDAP servers (multi-master, replicas)
    - multiple LDAP proxy servers
    - redundant switches/routers (VRRP, ...)
    - load-balanced by redundant, interconnected load balancers (layer 7)
    All this helps with non-persistent connections. If applications use connection pooling (for performance reasons), you have to verify their behaviour. Some applications create this pool only at startup; if the pooled connections break, the application should reconnect.
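
The pooling caveat above can be sketched as a pool that replaces a broken connection instead of failing forever. This is a minimal illustration under stated assumptions: `connect` is a hypothetical factory supplied by the caller, and a broken connection is assumed to surface as Python's built-in ConnectionError. Real LDAP libraries signal failures in their own ways.

```python
class ReconnectingPool:
    """Tiny connection pool that reconnects when a pooled connection breaks."""

    def __init__(self, connect, size=2):
        # `connect` is a caller-supplied function returning a new connection.
        self._connect = connect
        self._idle = [connect() for _ in range(size)]

    def run(self, operation):
        """Run operation(conn), retrying once on a fresh connection."""
        conn = self._idle.pop() if self._idle else self._connect()
        try:
            result = operation(conn)
        except ConnectionError:
            # The pooled connection is dead (for example, the server behind
            # it went down). Discard it, open a new one, and retry once.
            conn = self._connect()
            result = operation(conn)
        # Only a connection that just worked goes back into the pool.
        self._idle.append(conn)
        return result
```

If `connect` itself goes through a load balancer or a host list, the retry naturally lands on a surviving server. An application that builds its pool once at startup and never does this retry is the case the reply warns about.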

  • Directory server 6 failover

    I plan to have failover capability between two directory servers; both are Java System Directory Server 6 Enterprise Edition. I am not sure whether replication is the right solution for failover. How does the failover work? All my systems are Solaris 9 systems, and I already have one Directory Server 6 instance as the LDAP server and one native Solaris LDAP client as a test client.
    thanks,
    --xinhuan

    Thanks for your information.
    I still don't understand what the proxy server would be doing. If I put the IPs of two directory servers in the client-side configuration file, will the client automatically connect to the other server when one server is down, given that I am using the native Solaris LDAP client? Why is it necessary to put a proxy server in front of the two master servers? I don't actually need load balancing, but I do need the failover feature. If I don't use the proxy server, does the failover happen automatically or does it need human intervention?
    thanks,
    --xinhuan

  • Automatic failover doesn't failback to the first server if the second server is lost.

    Hi Everybody,
    We use database mirroring a lot in our product solutions, and we have recently seen strange behaviour in our failover tests with SQL 2008 R2.
    We have two servers running Windows 2008 R2 Standard and SQL 2008 R2 Standard SP2 (let's call them DB1 and DB2).
    We also have a witness workstation running SQL 2008 Express on Windows 7.
    A database from DB1 is mirrored to DB2 in "safety full" mode, with witness. At this stage, the database is principal on DB1 and mirror on DB2
    To test the automatic failover, we first restart the DB1 server, which holds the database in the principal role.
    After a few seconds, the database on DB2 becomes principal, which is normal and exactly what we want.
    After a few minutes, DB1 comes back online and its database takes the mirror role (still OK). At this stage, the database is principal on DB2 and mirror on DB1.
    When the monitoring application shows that the mirror is synchronized and that both servers are connected to the witness, we restart DB2 to trigger an automatic failover to DB1.
    What we see is that DB1 never takes the principal role and its database stays in the mirror role.
    In the DB1 Errorlog, I only see these 2 lines when DB2 disappears, no other message related to the mirroring session.
    2014-01-22 08:57:26.91 spid43s     Starting up database 'Test123'.
    2014-01-22 08:57:26.95 spid43s     Bypassing recovery for database 'Test123' because it is marked as a mirror database, which cannot be recovered. This is an informational message only. No user action is required.
    When DB2 comes back online, the database on DB2 keeps its principal status and the database on DB1 stays mirror.
    What is really strange is that if I restart DB2 once again directly after that, DB1 fails over normally and the database on DB1 takes the principal role after a few seconds, without any configuration change between the two restarts.
    DB1 errorlog shows then :
    2014-01-22 09:00:37.53 spid29s     Error: 1474, Severity: 16, State: 1.
    2014-01-22 09:00:37.53 spid29s     Database mirroring connection error 4 'An error occurred while receiving data: '64(The specified network name is no longer available.)'.' for 'TCP://DB2:5022'.
    2014-01-22 09:00:37.53 spid18s     Database mirroring is inactive for database 'Test123'. This is an informational message only. No user action is required.
    2014-01-22 09:00:42.37 spid32s     The mirrored database "Test123" is changing roles from "MIRROR" to "PRINCIPAL" due to Auto Failover.
    2014-01-22 09:00:42.39 spid32s     Recovery is writing a checkpoint in database 'Test123' (7). This is an informational message only. No user action is required.
    2014-01-22 09:00:42.39 spid32s     Recovery completed for database Test123 (database ID 7) in 78 second(s) (analysis 0 ms, redo 0 ms, undo 7 ms.) This is an informational message only. No user action is required.
    To summarize:
    - a first failover from DB1 to DB2 always works
    - then, a restart of DB2 never fails over to DB1
    - a second restart of DB2 always fails over to DB1
    This is pretty much systematic on one of our server pairs.
    Is there any explanation for this, or any idea where I can look to find the reason for this strange behaviour?
    Thanks a lot for your help
    Seb

    Thank you Tom
    But I have already checked that and reported the Errorlog abstracts in my original post.
    When DB01 disappears for the first time, there is nothing in the DB01 ERRORLOG (it is restarting :-) )
    AND no particular error message in the DB02 ERRORLOG (nothing related to the fact that DB01 is not reachable anymore!)
    Only these two lines:
    2014-01-22 08:57:26.91 spid43s     Starting up database 'Test123'.
    2014-01-22 08:57:26.95 spid43s     Bypassing recovery for database 'Test123' because it is marked as a mirror database, which cannot be recovered. This is an informational message only. No user action is required.
    So my main question remains: why doesn't DB02 detect that DB01 has disappeared (the first time only), and why doesn't the failover mechanism trigger the failover?
    Thank you
    Seb
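
When chasing a case like this, it can help to write down the conditions under which a mirror is allowed to take over automatically, then check each one against the state at the moment of the restart. The sketch below is a simplified model of the documented rules for high-safety mode with a witness. It is not SQL Server's implementation, and the function and parameter names are made up for the example.

```python
def mirror_may_auto_failover(safety_full, synchronized,
                             mirror_sees_principal,
                             mirror_sees_witness,
                             witness_sees_principal):
    """Simplified model of when a mirror may promote itself.

    All arguments are booleans describing the session as seen at the
    moment the principal becomes unavailable.
    """
    # Automatic failover needs high-safety mode and a SYNCHRONIZED database.
    if not (safety_full and synchronized):
        return False
    # If either the mirror or the witness still reaches the principal,
    # the principal is not considered lost.
    if mirror_sees_principal or witness_sees_principal:
        return False
    # The mirror and the witness must still see each other (quorum).
    return mirror_sees_witness
```

In the scenario above, the open question is which input was false during the first restart of DB2: the synchronization state, or the link between DB1 and the witness.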

  • Unplanned automatic failover using Hyper-v Replica , why don't the VMs start up automatically on the replica server?

    Hi,
    We have a cluster with two hosts (Host01, Host02) replicated to another server (Replica01).
    To test automatic failover to the replica server (Replica01), we unplugged the power cables from Host01 and Host02.
    The VMs on the replica server are still off. Why don't the VMs start up automatically on the replica server?
    Ramy Shaker

    overall there is no automatic failover in Hyper-V
    Of course there is. It's enabled by Failover Clustering. This is a totally separate technology from Hyper-V Replica.
    There is no automatic start up in Hyper-V Replica because it is not designed to detect a split-brain condition where the same virtual machine is running in multiple locations simultaneously. The replica site has no way to know why it can't reach the primary system anymore. It might just be because someone unplugged a network cable. If the primary's virtual machines are still running and the replica decides to spin up its copies, you will have many troubles.
    Eric Siron Altaro Hyper-V Blog
    I am an independent blog contributor, not an Altaro employee. I am solely responsible for the content of my posts.
    "Every relationship you have is in worse shape than you think."

  • Directory Server SMF tripping over itself (crosspost)

    I've posted this question in the SMF related forum too, so if replies could go there, that would be handy: [http://forums.sun.com/thread.jspa?messageID=10940406]
    We have a working instance of DSEE6.3.1 under Solaris 10 managed via SMF (using the manifest generated by dsadm/dscfg -- I forget which).
    # svcs -a | grep ldap-user
    online         10:47:08 svc:/application/sun/ds:ds--data-ldap-user-instance
    After a forced shutdown, DSEE starts up and does a self-recovery (as it should). When that's complete, the slapd process is running and the startup script exits with status 221 (ie. Not 0) -- however slapd is running.
    SMF notices that it's !0 and tries to restart DSEE... by issuing another start. This second start then exits almost immediately saying "slapd already running" but this time exits with 0 -- are we ok? No... cos SMF then notices that all the processes it just started have gone away so it calls "stop" followed by another "start".
    This is where it gets a bit hazy as it looks like DSEE never shut down cleanly again so the whole process repeats itself ad infinitum (although I suspect that's a separate issue). :-(
    I guess what I'm asking is -- is there a way to stop SMF from doing that: perhaps treat exit=221 as non-fatal and perform a service check?
    Log file below:
    [ Feb 26 21:40:42 Enabled. ]
    [ Feb 26 21:40:50 Executing start method ("/opt/SUNWdsee/ds6/bin/dsadm start --exec /data/ldap/user/instance
    Failed to start Directory Server instance '/data/ldap/user/instance'
    Waiting for Directory Server instance '/data/ldap/user/instance' to start...
    Waiting for Directory Server instance '/data/ldap/user/instance' to start...
    Waiting for Directory Server instance '/data/ldap/user/instance' to start...
    Waiting for Directory Server instance '/data/ldap/user/instance' to start...
    Directory Server instance '/data/ldap/user/instance' has detected a disorderly shutdown or a change in cache size
    Recovery phase is starting, this may take a while...
    Waiting for Directory Server instance '/data/ldap/user/instance' to start...
    Waiting for Directory Server instance '/data/ldap/user/instance' to start...
    Waiting for Directory Server instance '/data/ldap/user/instance' to start...
    Waiting for Directory Server instance '/data/ldap/user/instance' to start...
    Waiting for Directory Server instance '/data/ldap/user/instance' to start...
    Waiting for Directory Server instance '/data/ldap/user/instance' to start...
    Waiting for Directory Server instance '/data/ldap/user/instance' to start...
    ns-slapd wrote the following lines in the error log (/data/ldap/user/instance/logs/errors):
    ##[26/Feb/2010:22:00:07 +0000] - Sun-Java(tm)-System-Directory/6.3.1 B2008.1121.0156 (64-bit) starting up
    ##[26/Feb/2010:22:00:09 +0000] - WARNING<20488> - Backend Database - conn=-1 op=-1 msgId=-1 -  Detected Disorderly Shutdown last time Directory Server was running, recovering database.
    ##[26/Feb/2010:22:01:38 +0000] - Database recovery is 0% complete.
    ##[26/Feb/2010:22:01:51 +0000] - Database recovery is 100% complete.
    ##[26/Feb/2010:22:01:59 +0000] - WARNING<20805> - Backend Database - conn=-1 op=0 msgId=-1 -  search is not indexed base='cn=changelog' filter='(replicationcsn>=4b87f656000000000000)' scope='sub'
    [ Feb 26 22:02:17 Method "start" exited with status 221 ]
    [ Feb 26 22:02:17 Executing start method ("/opt/SUNWdsee/ds6/bin/dsadm start --exec /data/ldap/user/instance
    Directory Server instance '/data/ldap/user/instance' is already running (pid: 352)
    [ Feb 26 22:02:18 Method "start" exited with status 0 ]
    [ Feb 26 22:02:18 Stopping because all processes in service exited. ]
    [ Feb 26 22:02:18 Executing stop method ("/opt/SUNWdsee/ds6/bin/dsadm stop --exec /data/ldap/user/instance")
    Directory Server instance '/data/ldap/user/instance' stopped
    [ Feb 26 22:02:20 Method "stop" exited with status 0 ]
    [ Feb 26 22:02:20 Executing start method ("/opt/SUNWdsee/ds6/bin/dsadm start --exec /data/ldap/user/instance
    Failed to start Directory Server instance '/data/ldap/user/instance'
    .......................... repeat ........................

    Well, one way around it is to write your own start script and manage the exit codes yourself.
    I have some doubts about the autorestart configuration of DS, especially in a case like this where the server seems to be crashing. Realistically, you can end up worse off if your server has crashed by automatically restarting it. Your data may be corrupt, and the process may eventually stay up (especially if you work around the current issue), but the DS is not really healthy and it does need an administrator to investigate what's wrong with it. It may also return inconsistent or simply bad data to clients. All in all, I would prefer an instance in such a state to stay down and trigger alarms, assuming it has failover peers that can take on its workload.
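
The "write your own start script and manage the exit codes yourself" suggestion could look roughly like the sketch below. It is an illustration only: `is_running` is a hypothetical health check supplied by the caller (for example, a test that the LDAP port answers), the meaning of exit status 221 is taken from the post rather than from documentation, and this addresses only the exit-status part of the problem, not how SMF tracks the service's processes.

```python
import subprocess


def start_and_verify(start_cmd, is_running):
    """Run the real start command and normalise its exit status.

    start_cmd  -- argument list for the underlying start command
    is_running -- hypothetical callable returning True when the service
                  is actually up
    Returns 0 when the service ends up running, otherwise the original
    non-zero exit status.
    """
    result = subprocess.run(start_cmd)
    if result.returncode == 0:
        return 0
    # The start command reported failure. Trust a direct health check
    # over the exit status, as the post describes slapd being up anyway.
    if is_running():
        return 0
    return result.returncode
```

As the reply points out, masking a non-zero status also hides real trouble, so a wrapper like this should log the original status and still alert an administrator after a recovery.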

  • Open Directory Server "not responding"

    This is strange, and I'm not sure what if anything is wrong...
    My server is an OD Master. LDAP, Password Server, and Kerberos all report running. AFP authentication is set to Kerberos (only). Authenticated directory binding is enabled. Client computers are bound to the directory server. They connect via AFP, a ticket is created (viewable in Ticket Viewer), everything works fine (apparently).
    However... in System Preferences/Accounts/Login Options, there's a red dot (not Leica) next to the directory server IP, and if I click on Edit it says "The server is not responding". This is the case for all client computers, not just one. Not sure when it started; when I set it up they were all green of course.
    So, what does this "server is not responding" mean? Given that clients can do everything they need to do, can/should I consider this a non-issue?

    Thanks Classic and Chris. Good questions.
    The server isn't behaving as expected. Following Classic's suggestion, I tried binding without SSL. I didn't expect it to work, I thought SSL was required. (Under OD Settings/Policies/Binding, "Encrypt all packets (requires SSL or Kerberos)" is checked.) But with SSL unchecked, I was prompted for diradmin username/password. I entered the correct credentials, but they were rejected. So I tried leaving the credentials blank. That bound the client to the directory successfully (green dot). But "Enable authenticated directory binding" is checked.
    With the green dot, I tried connecting to the server over AFP, but could not. Only when I manually copied in the Kerberos file was I able to successfully connect to AFP. (Shouldn't the Kerberos file be created automatically at some point?)
    So, clearly something is wrong with SSL, and also perhaps with my settings. (The server should only allow binding with authentication and over SSL, but it does not, and it does allow unauthenticated binding without SSL.)
    OD Overview confirms that Kerberos is running. Not connected to an AD domain (nor should be).
    Running the kadmin.local command gives me a very long list of items that look like e.g. service/[email protected] or service/LKDC:[email protected] One of the services listed is "afpserver". (There are also listings for a number of services that aren't run on the server.)
    AFP is restricted to two groups; the username I'm using for AFP connections is a member of one of those groups.

  • How to Perform Forced Manual Failover of Availability Group (SQL Server) and WSFC (Windows Server Failover Cluster)

    I have a scenario with three nodes running Server 2012 Standard, each with an instance of SQL Server 2012 Enterprise, participating in a single Windows Server Failover Cluster (WSFC) that spans two data centers.
    If the nodes in the primary data center become unavailable because of a data center outage, how can I automatically access the node in the secondary disaster recovery data center with a script?
    I want to write a script that checks the primary data center by pinging an IP address every 5 or 10 minutes.
    If that IP does not respond, the script should perform a forced manual failover of the availability group (SQL Server) and the WSFC (Windows Server Failover Cluster).
    Can you please guide me in writing a script for automatic failover in case of a primary data center outage?

    Please post your question on failover clusters in the cluster forum. They will explain how this works and point you at scripts.
    You should also look in the Gallery for cluster management scripts.
    ¯\_(ツ)_/¯

  • How to Perform Forced Manual Failover of Availability Group (SQL Server) and WSFC (Windows Server Failover Cluster) with scripting

    I have a scenario with three nodes running Server 2012 Standard, each with an instance of SQL Server 2012 Enterprise, participating in a single Windows Server Failover Cluster (WSFC) that spans two data centers.
    If the nodes in the primary data center become unavailable because of a data center outage, how can I automatically access the node in the secondary disaster recovery data center with a script?
    I want to write a script that checks the primary data center by pinging an IP address every 5 or 10 minutes.
    If that IP does not respond, the script should perform a forced manual failover of the availability group (SQL Server) and the WSFC (Windows Server Failover Cluster).
    Can you please guide me in writing a script for automatic failover in case of a primary data center outage?

    You are trying to implement manually what should be happening automatically in the cluster. If the primary SQL Server becomes unavailable in the data center, it should fail over to the secondary SQL Server automatically.  Is that not working?
    You also might want to run this configuration by some SQL experts.  I am not a SQL expert, but if you have both hosts in the data center in a cluster, there is no need for replication between those two nodes as they would be accessing the database from some form of shared storage.  Then it looks like you are trying to implement Always On to the DR site.  I'm not sure you can mix both types of failover in a single configuration.
    FYI, it would make more sense to establish a file share witness in your DR site instead of placing a third node in the data center for Node Majority quorum.
    . : | : . : | : . tim
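
For the polling part of the question only, a watchdog is usually written to act after several consecutive failed checks rather than a single missed ping, so that one dropped packet does not trigger anything. The sketch below shows that loop. `probe` and `on_outage` are hypothetical callables supplied by the caller, and the 300-second default interval mirrors the "every 5 minutes" in the question. The failover step itself is deliberately left out, since a forced failover of an availability group can lose data and the reply above argues the cluster should handle this on its own.

```python
import time


def watch(probe, on_outage, threshold=3, interval=300.0,
          max_checks=None, sleep=time.sleep):
    """Call on_outage() once `threshold` consecutive probes have failed.

    probe      -- callable returning True when the primary site answers
    on_outage  -- callable run once when the outage is confirmed
    max_checks -- optional cap on the number of probes (None = forever)
    sleep      -- injectable sleep function, handy for testing
    Returns True if on_outage was called, False if max_checks ran out.
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        # Any successful probe resets the failure streak.
        failures = 0 if probe() else failures + 1
        checks += 1
        if failures >= threshold:
            on_outage()
            return True
        sleep(interval)
    return False
```

A watchdog running in the secondary site has the same blind spot described in the Hyper-V Replica thread: it cannot tell a dead primary from a broken link between the sites.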

  • Problem in Publishing the certificate to directory server

    I am having a problem publishing the certificate. I am using iPlanet CMS 4.7 and iPlanet Directory Server 5.1.
    In CMS > Certificate Manager > Publishing Module > Mapper,
    the manuals give two options to enable publishing to the directory server:
    1) create the entry automatically (default plug-in)
    2) create the entry manually in the directory and use a mapper to map it.
    I tried both ways. When the automatic create option is selected, it raises an error:
    Failed to create the CA entry.There may be entries in the directory hierachy which do not exist.Please create them manually.
    I am not able to figure out the problem; even if I create the certificate hierarchy in the directory server, it gives the same error. Can anyone figure out the problem so that I can publish the certificate? Please mail me the solution if anybody knows. Thank you.

    Hi,
    1. Please open the original project in Captivate 3. i.e. the .cp file in Captivate 3
    2. Go to menu "Audio > Audio Settings"
    3. Change the bitrate to 96kbps or 64kbps
    4. Change the encoding frequency to 44 kHz
    5. Save and close the project
    6. Now open the same project in Captivate 5
    7. publish the project
    Audio should play correctly now..
    Hope this helps.
    Regards,
    mukul

  • Binding to directory server vs. OD replica

    Can someone explain the practical differences between binding a server to an OD master vs. being a replica of that OD master?
    Why would I bind a server instead of making a replica? Seems like the replica would always be easier to admin and would provide the same function...?

    I'm wondering why someone would do this. Why bind one server to another vs. making that second server a replica?
    The real issue is whether this server is going to provide authentication services to other clients.
    In addition to not wanting all your data on a single machine, if you have many client systems it may overwhelm a single directory server. For these reasons you may create a replica (or number of replicas) that keep in sync with the master server and have a complete copy of the entire Open Directory database (all users, machines, groups, etc.)
    These replicas can then be used to provide authentication services to client systems, as well as provide failover for the client in case this machine goes away for any reason.
    In contrast there's no need for every client system to have the entire directory. If you have many machines, the number of update messages that get passed around and need to be replicated to every machine on the network would be cumbersome, at best.
    Then there's also an element of security - the directory should have some level of protection since it includes data about every user, including their password and other personal details. If you replicate this to every machine then any user on your network could poke around the data at their leisure. Contrast that with a typical client machine that only has the account credentials for the current user.
    So for any network you should create one master and at least one replica. Client systems should point to a replica and should not be Open Directory replicas themselves.

  • SQL 2005 mirroring : Abrupt Automatic failover

    hi All, 
    We have a SQL 2005 SP4 mirroring setup of 15 DBs with Principal (P), Mirror (M) and Witness (W).
    We have now seen abrupt DB failovers from P to M for some of the databases (yesterday it was 4 out of 15).
    Errors were seen on Witness server as follows for all Dbs that failed over:
    Date 07/01/2015 11:07:48 PM
    Log SQL Server (Current - 08/01/2015 12:00:00 AM)
    Source spid19s
    Message
    The mirroring connection to "TCP://<server.domain.com>:5022" has timed out for database "<DBName>" after 10 seconds without a response.  Check the service and network connections.
    Actions taken:
    1. The network and firewall team reported that no errors were detected and that there was no network traffic between the witness server and the DB server during the DB auto failover period.
    2. On the system side, we have verified that no hardware errors were found on either the VM or the SAN storage, and that no Symantec SQL backup jobs or antivirus scans were running during the DB auto failover period.
    3. We did see a high amount of I/O activity on the P server around failover time. Some I/O errors similar to the one below were seen; note, however, that these errors were not only for the DBs that failed over but also for others, including TEMPDB:
    Date 07/01/2015 11:07:38 PM
    Log SQL Server (Current - 08/01/2015 4:06:00 AM)
    Source spid2s
    Message
    SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [R:\SQLDATA\MSSQL.1\MSSQL\Data\<DBName>.mdf] in database [DBName] (5).  The OS file handle is 0x000000000000095C.  The offset of the latest long I/O is: 0x0000054ff22000
    Questions:
    1. I assumed that the Witness keeps polling P and M on the DB mirroring endpoints (in our case port 5022) to check that the DBs are online, but the network team says there is no activity on that port. Is my understanding correct?
    2. Is there any other reason for DB failover ? 
    Link referred:  
    http://dba.stackexchange.com/questions/22402/what-can-cause-a-mirroring-session-to-timeout-then-failover-sql-server-2005
    http://msdn.microsoft.com/en-us/library/ms179344(v=sql.90).aspx
    Any help is highly appreciated!!!
    Regards,
    Mandar

    This is common with mirroring; it is not as resilient to changes as log shipping. Are you aware of the fact below, although it is not directly related to your question?
    If you plan to use high-safety mode with automatic failover, the normal load on each failover partner should be less than 50 percent of the CPU. If your workload overloads the CPU, a failover partner might be unable to ping the other server instances in the mirroring session. This causes an unnecessary failover. If you cannot keep the CPU usage under 50 percent, we recommend that you use either high-safety mode without automatic failover or high-performance mode.
    Now to your problem
    The mirroring connection to "TCP://<server.domain.com>:5022" has timed out for database "<DBName>" after 10 seconds without a response.  Check the service and network connections.
    I would say there was a network dip lasting more than 10 seconds. Since the default mirroring timeout is 10 seconds, the witness concluded for a few databases that the principal could not be reached, and it initiated a failover.
    The network team is incorrect to say there was no dip (it's common for a NOC team not to take responsibility).
    This Support Article is worth reading, especially the network part.
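
    The 10-second figure in the error above is the mirroring partner timeout. Raising it is one commonly used way to ride out short network dips; this is not something the reply prescribes, only a sketch of how it could be done. The helper name partner_timeout_sql, the 30-second value and the server name in the usage comment are assumptions; ALTER DATABASE ... SET PARTNER TIMEOUT is issued on the principal.

```shell
# Print the T-SQL that sets the mirroring partner timeout for one database.
# Usage: partner_timeout_sql <database> <seconds>
partner_timeout_sql() {
    db=$1
    secs=$2
    # Accept whole numbers only.
    case $secs in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    printf 'ALTER DATABASE [%s] SET PARTNER TIMEOUT %s;\n' "$db" "$secs"
}

# Hypothetical usage against the principal (server name is a placeholder):
#   sqlcmd -S principal.example.com -E -Q "$(partner_timeout_sql DBName 30)"
```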

  • SQL 2012 Database Availability Group - Force Automatic Failover

    Hi All,
    I'd appreciate some help in understanding the following scenario in my test environment.
    I have created a DAG with 2 replica servers (both of which are Hyper-V VMs running W2012 Std).
    From a client PC in my test lab, I can connect to the virtual listener of my DAG and confirm via the "select @@servername" command that I am connecting to the primary replica server.
    Using the Failover Wizard, I can easily move the primary role between my 2 nodes, and the command above confirms that the primary replica server has changed. So far so good.
    What I wanted to test was what would happen to my DAG in the event of a complete loss of power to the server acting as the primary replica. At first, I thought I would stop the SQL Server service on the primary server, but this did not result in my DAG failing over to the secondary replica. I have found that the only way I can trigger a failover is by shutting down the primary server in a controlled manner.
    Is there any reason why neither stopping the SQL Server service on the primary replica nor forcing a power-off of the primary replica results in the DAG failing over to the secondary replica?
    Thanks,
    Bob

    Hi,
    I would first like to verify that by Database Availability Group you mean an AlwaysOn Availability Group.
    How did you set the FailureConditionLevel?
    Whether the diagnostic data and health information returned by sp_server_diagnostics warrants an automatic failover depends on the failure-condition level of the availability group. The failure-condition level specifies what failure conditions trigger an automatic failover. There are five failure-condition levels, which range from the least restrictive (level one) to the most restrictive (level five). For details about failure-condition levels, see:
    http://msdn.microsoft.com/en-us/library/hh710061.aspx#FClevel
    Two articles that may be helpful:
    SQL 2012 AlwaysOn Availability groups Automatic Failover doesn’t occur or does it – A look at the logs
    http://blogs.msdn.com/b/sql_pfe_blog/archive/2013/04/08/sql-2012-alwayson-availability-groups-automatic-failover-doesn-t-occur-or-does-it-a-look-at-the-logs.aspx
    SQL Server 2012 AlwaysOn – Part 7 – Details behind an AlwaysOn Availability Group
    http://blogs.msdn.com/b/saponsqlserver/archive/2012/04/24/sql-server-2012-alwayson-part-7-details-behind-an-alwayson-availability-group.aspx
    Thanks.
    Tracy Cai
    TechNet Community Support
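
    The failure-condition level mentioned above is a property of the availability group and can be changed with ALTER AVAILABILITY GROUP. A small sketch that only builds the statement and rejects levels outside 1 to 5; the helper name fc_level_sql and the group name AG1 are hypothetical:

```shell
# Print T-SQL that sets the failure-condition level of an availability group.
# Usage: fc_level_sql <availability_group> <level 1-5>
fc_level_sql() {
    case $2 in
        [1-5])
            printf 'ALTER AVAILABILITY GROUP [%s] SET (FAILURE_CONDITION_LEVEL = %s);\n' "$1" "$2"
            ;;
        *)
            echo "level must be between 1 and 5" >&2
            return 1
            ;;
    esac
}

# Hypothetical usage:
#   fc_level_sql AG1 3
```

    The current setting can be read from the failure_condition_level column of sys.availability_groups.
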
    Hi,
    Thanks for the reply.
    It's an AlwaysOn Availability Group.
    In my test lab, I have changed the quorum configuration to a file share witness and that has allowed an automatic failover when I turn the primary replica server off (rather than power it off).
    I'll take a look at the links you provided.
    Regards,
    Bob
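
    One plausible reading of the result above is plain vote counting: a cluster stays up only while a strict majority of votes is online. A simplified sketch of that arithmetic, ignoring any vote adjustments the cluster itself may make (has_quorum is a hypothetical helper):

```shell
# Succeed (exit status 0) if the votes still online form a strict majority.
# Usage: has_quorum <total_votes> <votes_still_online>
has_quorum() {
    total=$1
    online=$2
    # A strict majority is more than half of all configured votes.
    needed=$(( total / 2 + 1 ))
    [ "$online" -ge "$needed" ]
}
```

    With two nodes and no witness, has_quorum 2 1 fails, so losing either node loses quorum; with a file share witness as a third vote, has_quorum 3 2 succeeds.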

  • HACMP Clustering Script for SAP ECC 6.0 (SR1) - Automatic Failover-Oracle10

    Hello,
    I have installed SAP ECC 6.0 (SR1) on AIX 5.3 / Oracle 10g in an HACMP clustering environment. Manual failover is working fine. The ASCS and database instances are installed on a shared drive with a virtual IP and virtual name. The Central Instance and Dialog Instance are installed locally on Node A and Node B. I am looking for an HACMP clustering script (automatic failover script) for automation. Please help if you have one.
    Thanks
    Gautam Poddar

    Here are HA stop and start scripts that you should be able to adapt for your particular circumstances. They are based on earlier versions of SAP / Oracle, but I assume they should be a reasonable guide.
    The script to start SAP is start_sap_prd:
    #!/bin/ksh
    # Script:         /usr/local/bin/cluster/start_sap_prd
    # Comments:       HACMP Application START script for PRD
    # Show me obvious information in hacmp.out
    banner "Starting"
    banner "PRD SAP"
    # Set the oracle and sap owner.
    ORASID="PRD"
    SAPADM="prdadm"
    ORAUSR="oraprd"
    VIRTUALHOST="vhost"
    DEVHOST="vhostdev"
    # NOTE: a bare "$" below is a variable reference whose name is missing from the post.
    # Get the volume groups for this resource group
    RG=$( /usr/es/sbin/cluster/utilities/cllsgrp | grep -i $ )
    VG_LIST=$( /usr/es/sbin/cluster/utilities/cllsres -g $ | \
            grep "VOLUME_GROUP=" | \
            awk -F\" '{ print $2 }' )
    # Check the transport directory is mounted.
    if mount | grep -w "/usr/sap/trans"
      then
            print "Transport directory is already mounted."
      else
            cd /tmp
            print "Attempting a background mount of the transport directory."
            nohup mount -o intr,bg,soft :/usr/sap/trans1 /usr/sap/trans &
    fi
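    #Start SAP and Oracle
    #Start listener
    # Quote the whole command so that "start" reaches lsnrctl rather than the shell.
    su - $ -c "/rprd/oracle/PRD/920_64/bin/lsnrctl start"
    rc=$?
    # Test the saved status; $? is already 0 again after the assignment.
    if [ $rc != 0 ]
      then
            echo "ERROR: Listener failed to start\n"
    fi
    #Start Database
    su - $ -c "/rprd/oracle/PRD/bin/start_database_PRD.sh"
    sleep 20
    # Standard sapstart script
    su - $ -c startsap $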
    rc=$?
    if [ $rc != 0 ]
    then
            echo "ERROR: Failed to start SAP\n"
    fi
    exit 0
    The script to stop SAP is stop_sap_prd:
    #!/bin/ksh
    # Script:       /usr/local/bin/cluster/stop_sap_prd
    # Dated:        01/11/06
    # Application:  Oracle/SAP
    # Comments:     HACMP Application STOP script for SAP / Oracle PRD
    set -x
    # Show me obvious information in hacmp.out
    banner "stopping"
    banner "PRD SAP"
    # Set the oracle and sap owner.
    ORASID="PRD"
    SAPADM="prdadm"
    ORAUSR="oraprd"
    VIRTUALHOST="vhost"
    #Stop SAP/Oracle
    su - $ -c stopsap $
    rc=$?
    # Test the saved status; $? is already 0 again after the assignment.
    if [ $rc != 0 ]
    then
            echo "ERROR: Failed to stop SAP and Oracle\n"
    fi
    # Stop SAP collector and Oracle listener.
    # Quote each command so that its arguments reach the command rather than the shell.
    su - $ -c "/usr/sap/PRD/SYS/exe/run/saposcol -k"
    rc=$?
    if [ $rc != 0 ]
    then
            echo "ERROR: Failed to stop SAPOSCOL \n"
    fi
    su - $ -c "/rprd/oracle/PRD/920_64/bin/lsnrctl stop"
    rc=$?
    if [ $rc != 0 ]
    then
            echo "ERROR: Listener failed to stop\n"
    fi
    if mount | grep -w "/usr/sap/trans"
      then
            print "Transport directory is mounted."
            /usr/es/sbin/cluster/events/utils/cl_nfskill -k -u /usr/sap/trans
            sleep 1
            /usr/es/sbin/cluster/events/utils/cl_nfskill -k -u /usr/sap/trans
            sleep 1
            umount -f /usr/sap/trans &
      else
            print "Transport directory is not mounted."
    fi
    exit 0
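
    The scripts above repeat the same pattern after each step: run a command, save its exit status, report a failure. A minimal sketch of that pattern as a reusable function, assuming a POSIX-style shell such as ksh (run_step is a hypothetical name, not an HACMP utility). The status is copied into a variable straight away because every later command, including the assignment itself, overwrites $?:

```shell
# Run a command, report a failure with its exit status, and pass the status on.
# Usage: run_step "<description>" command [args...]
run_step() {
    desc=$1
    shift
    "$@"
    rc=$?          # save immediately; the next command overwrites $?
    if [ "$rc" -ne 0 ]; then
        echo "ERROR: $desc failed (exit status $rc)"
    fi
    return "$rc"
}

# Hypothetical usage, with the account name taken from the scripts above:
#   run_step "Listener start" su - oraprd -c "/rprd/oracle/PRD/920_64/bin/lsnrctl start"
```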
