Proper steps to fail over to another host in a cluster

Hello,
Pardon my ignorance. What are the proper steps to force a failover to the standby host in a cluster with two nodes?
My secondary host is currently the active host for the cluster name. I would like to force it to fail over to the primary, which is acting as the standby. Thank you in advance.

Hi MS_Moron,
You can refer to the following KB to gracefully move the cluster resource to another node.
Test the Failover of a Clustered Service or Application
http://technet.microsoft.com/en-us/library/cc754577.aspx
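For a two-node Windows Server 2008/2008 R2 cluster like yours, the move can also be done from an elevated prompt (a sketch; substitute your actual service/application name and node name):

cluster group "YourServiceOrApplication" /move:PrimaryNode

or, with the FailoverClusters PowerShell module on 2008 R2:

Import-Module FailoverClusters
Move-ClusterGroup -Name "YourServiceOrApplication" -Node PrimaryNode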
I’m glad to be of help to you!

Similar Messages

  • Server Pool Master fails and cannot fail over to another VM Server

    Dear All,
    Oracle VM 2.2.2
    I have 2 VM Servers connected to a Storage 6140 array, and in VM Manager I enabled HA on the server pool, then on all virtual machines.
    - VM Server 1 has the roles Server Pool Master, Utility Server, and Virtual Machine Server, and has virtual machines running
    - VM Server 2 has the roles Utility Server and Virtual Machine Server, and has virtual machines running.
    I tried shutting down VM Server 1, which acts as Server Pool Master, but the Server Pool Master role does not fail over to VM Server 2; instead, the status of both servers becomes Unreachable.
    In particular, none of the virtual machines is accessible.
    Please kindly give advice on this.
    Thanks and regards,
    Heng

    Thanks Avi, I'll find and read that document. And thanks also for elaborating about the Utility Server.
    After reading the followups to my original question, I tried to think of possible server "layouts" in a HA environment.
    1) "N" servers in the pool, one of them is Pool Master, Utility Server AND VM Guests Server at the same time. Maybe this will be the preferred server for smaller, quicker VMs.
    2) "N" servers in the pool, one is Pool Master AND Utility Server, but has no VM guests running on it
    3) "N" servers in the pool, one is the Pool Master, another one is the Utility Server (none of them has VMs running on them), and finally a number of VM Guest servers
    Let's take case 1. If the Pool Master & Utility server fails, given that it has VM guests running on it as well, I understand from your explanation that I'll be ANYWAY able to manually "live migrate" the guests somewhere else, using VM Manager. Is this correct?
    If it's correct, then it's just a question of how much money I want to spend to have dedicated servers for different tasks, JUST FOR BETTER PERFORMANCE REASONS. Do you agree? And especially: do YOU have dedicated Pool Masters (just to figure out your "real" approach to the problem :-) )
    I feel that I still miss something, the picture is not completely clear to me. The fact is, that I'm now testing on my new bladesystem, but for now I put up one single blade. Testing HA will be the next step. I was just trying to get a few things sorted out in advance, but there is still something that I'm missing, as I was saying...
    Looking forward to your next reply, thanx again
    Rob

  • Extend-TCP client not failing over to another proxy after machine failure

    I have a configuration of three hosts: on A is the client; on B and C are a proxy and a cache instance each. I've defined an AddressProvider that returns the address of B and then C. The client just repeatedly calls the cache (read-only). The client configuration is:
    <?xml version="1.0"?>
    <cache-config
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://xmlns.oracle.com/coherence/coherence-cache-config"
    xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-cache-config
    coherence-cache-config.xsd">
    <caching-scheme-mapping>
    <cache-mapping>
    <cache-name>cache1</cache-name>
    <scheme-name>extend-near</scheme-name>
    </cache-mapping>
    </caching-scheme-mapping>
    <!-- Use ExtendTCP to connect to a proxy. -->
    <caching-schemes>
    <near-scheme>
    <scheme-name>extend-near</scheme-name>
    <front-scheme>
    <local-scheme>
    <high-units>1000</high-units>
    </local-scheme>
    </front-scheme>
    <back-scheme>
    <remote-cache-scheme>
    <scheme-ref>remote-cache1</scheme-ref>
    </remote-cache-scheme>
    </back-scheme>
    <invalidation-strategy>all</invalidation-strategy>
    </near-scheme>
    <remote-cache-scheme>
    <scheme-name>remote-cache1</scheme-name>
    <service-name>cache1ExtendedTcpProxyService</service-name>
    <initiator-config>
    <tcp-initiator>
    <remote-addresses>
    <address-provider>
    <class-name>com.foo.clients.Cache1AddressProvider</class-name>
    </address-provider>
    </remote-addresses>
    <connect-timeout>10s</connect-timeout>
    </tcp-initiator>
    <outgoing-message-handler>
    <request-timeout>5s</request-timeout>
    </outgoing-message-handler>
    </initiator-config>
    </remote-cache-scheme>
    </caching-schemes>
    If I shut down the proxy that the client is connected to on host B, failover occurs quickly by calling the AddressProvider. But if I shut down the network for host B (or drop the TCP port of the proxy on B) to simulate a machine failure, failover does not occur: the client simply keeps trying to contact B and dutifully times out after 5 seconds. It never asks the AddressProvider for another address.
    How do I get failover to kick in?
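    For context, the AddressProvider contract works like the minimal sketch below (a hypothetical implementation, not the actual com.foo.clients.Cache1AddressProvider; host names and port are placeholders). The initiator calls getNextAddress() for each candidate during a connection attempt, and a null return signals that the list is exhausted:

    import com.tangosol.net.AddressProvider;
    import java.net.InetSocketAddress;

    public class RoundRobinAddressProvider implements AddressProvider {
        // Candidate proxies, tried in order on each connection attempt.
        private final InetSocketAddress[] addrs = {
            new InetSocketAddress("hostB", 9099),   // placeholder port
            new InetSocketAddress("hostC", 9099)
        };
        private int next;

        // Called by the TCP initiator for each candidate address.
        public InetSocketAddress getNextAddress() {
            if (next >= addrs.length) {
                next = 0;      // reset for the next round of attempts
                return null;   // null signals "list exhausted"
            }
            return addrs[next++];
        }

        public void accept() {
            next = 0;          // connection succeeded; start fresh next time
        }

        public void reject(Throwable cause) {
            // Connection to the last returned address failed; the initiator
            // will call getNextAddress() again for the next candidate.
        }
    }

    Note that the provider is only consulted when the initiator knows its connection is gone; a silent machine failure can leave the TCP connection half-open, which is what the heartbeat settings in the reply below address.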

    Hello,
    If you are testing Coherence*Extend failover in the face of a network, machine, or NIC failure, you should enable Connection heartbeats on both the <tcp-initiator/> and <tcp-acceptor/>. For example:
    Client cache config:
    <remote-cache-scheme>
      <scheme-name>extend-direct</scheme-name>
      <service-name>ExtendTcpCacheService</service-name>
      <initiator-config>
        <tcp-initiator>
          <remote-addresses>
            <socket-address>
              <address system-property="tangosol.coherence.extend.address">localhost</address>
              <port system-property="tangosol.coherence.extend.port">9099</port>
            </socket-address>
          </remote-addresses>
          <connect-timeout>2s</connect-timeout>
        </tcp-initiator>
        <outgoing-message-handler>
          <heartbeat-interval>10s</heartbeat-interval>
          <heartbeat-timeout>5s</heartbeat-timeout>
          <request-timeout>15s</request-timeout>
        </outgoing-message-handler>
      </initiator-config>
    </remote-cache-scheme>

    Proxy cache config:
    <proxy-scheme>
      <scheme-name>example-proxy</scheme-name>
      <service-name>ExtendTcpProxyService</service-name>
      <thread-count system-property="tangosol.coherence.extend.threads">2</thread-count>
      <acceptor-config>
        <tcp-acceptor>
          <local-address>
            <address system-property="tangosol.coherence.extend.address">localhost</address>
            <port system-property="tangosol.coherence.extend.port">9099</port>
          </local-address>
        </tcp-acceptor>
        <outgoing-message-handler>
          <heartbeat-interval>10s</heartbeat-interval>
          <heartbeat-timeout>5s</heartbeat-timeout>
          <request-timeout>15s</request-timeout>
        </outgoing-message-handler>
      </acceptor-config>
      <autostart system-property="tangosol.coherence.extend.enabled">true</autostart>
    </proxy-scheme>

    This is because it may take the TCP/IP stack a considerable amount of time to detect that its peer is unavailable after a network, machine, or NIC failure (O/S dependent).
    Jason

  • How Front End pool deals with fail over to keep user state?

     Hello to all. I searched a lot of articles to understand how Lync 2010 keeps user state if a failure happens on a Front End pool node, but didn't find anything clear.
     I found some MS info about this topic: "The Front End Servers maintain transient information—such as logged-on state and control information for an IM, Web, or audio/video (A/V) conference—only for the duration of a user's session. This configuration is an advantage because in the event of a Front End Server failure, the clients connected to that server can quickly reconnect to another Front End Server that belongs to the same Front End pool."
     As I read it, the client uses DNS to reconnect to another Front End in the pool. When it reconnects to an available server, does the user lose what he/she was doing in the Lync client? Can the server that is now hosting the session recover all the "user's session data"? If so, how?
     Regards, EEOC.

    The presence information and other dynamic user data is stored in the RTCDYN database on the backend SQL database in a 2010 pool:
    http://blog.insidelync.com/2011/04/the-lync-server-databases/  If you fail over to another pool member, this pool member has access to the same data.
    Ongoing conversations and the like are cached at the workstation.
    SWC Unified Communications

  • Failing over Oracle connections in a pool

              Hi,
              This message is probably a bit out of context (I've already posted
              it to the JDBC group). I post here as well, since I guess it's
              the place where people have the most experience with clustering
              and HA. Original posting below...
              Could you please tell me whether, yes or no, connections to an
              Oracle database should fail over (when the database fails over
              to another machine)? I use Oracle's Transparent Application Failover
              (configured via Net8) with Weblogic 6 on Linux and Oracle 8.1.7
              on Solaris/SPARC.
              If this doesn't work in my configuration, is there any configuration
              where it should work? (Another version of Oracle, WLS, OS, ...)
              When I try TAF using the PetStore application, I get exceptions
              related to not being connected to the database.
              If TAF doesn't work with WebLogic, is there a way to work around
              the problem? Can I catch these exceptions and renew the connections
              in the pool? Or, what else is possible...?
              I'd appreciate any help. I'd like to demonstrate our HA product
              with WLS. If it doesn't work, I'll turn to iPlanet instead. Pity,
              I really like WLS!
              Thanks in advance for any help or advice!
              Regards, Frank Olsen
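              For reference, a tnsnames.ora entry with TAF enabled typically looks
              something like the sketch below (placeholder hosts and service name;
              verify the parameters against the Net8 documentation). After connecting,
              the FAILOVER_TYPE, FAILOVER_METHOD and FAILED_OVER columns of V$SESSION
              show whether TAF actually took effect:

              PETSTORE =
                (DESCRIPTION =
                  (ADDRESS_LIST =
                    (ADDRESS = (PROTOCOL = TCP)(HOST = db-primary)(PORT = 1521))
                    (ADDRESS = (PROTOCOL = TCP)(HOST = db-standby)(PORT = 1521))
                  )
                  (CONNECT_DATA =
                    (SERVICE_NAME = petstore)
                    (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 3))
                  )
                )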
              

              Hi (Frank ;-)
              I got carried away a bit too fast...
              Some more testing shows that it doesn't work in all cases:
              - when someone is trying to check out the shopping cart when the
              database fails (and fails over), I get exceptions once the database
              has restarted on the backup node
              - the exceptions are related to some transactions being rolled back,
              and to Oracle stating that it couldn't safely replay the transactions
              - browsing the categories still works fine
              - all access to the shopping cart and sign-in/sign-out causes
              time-outs and exceptions
              Any ideas what may cause this problem, please?
              Regards,
              Frank Olsen
              "Frank Olsen" <[email protected]> wrote:
              >
              >Hi,
              >
              >TAF worked with WLS 6 on NT with the Oracle 8.1.7 client!
              >
              >Has anyone tested it on Solaris/SPARC?
              >
              >Regards,
              >Frank Olsen
              >
              >
              >
              >"Frank Olsen" <[email protected]> wrote:
              >>
              >>Hi,
              >>
              >>Most of my question below is still valid (in particular
              >>concerning
              >>whether TAF should work with WLS on some or all platforms
              >>and
              >>versions).
              >>
              >>However, when I tested TAF with the Oracle client (sqlplus)
              >>there
              >>also was no failover of the (one) connection. I then
              >checked
              >>the
              >>`V$SESSION' view and the colums related to failover showed
              >>that
              >>TAF was not correctly configured. Strange because I copied
              >>the
              >>`tnsnames.ora' parameters from the Oracle documentation
              >>for TAF.
              >>
              >>Has anyone managed to configure and use TAF, with or
              >without
              >>WLS?!
              >>
              >>Thanks in advance for your help!
              >>
              >>Regards,
              >>Frank Olsen
              >>
              

  • Fail over not reliable

              When my database fails over, my weblogic 5.1 (sp10) cluster doesn't
              always reconnect when the DB comes back up. It just works for a minute
              or so, then freezes... no errors. I have tested the DB and it is fine;
              if I restart my app-servers they reconnect just fine. As it stands,
              I get about a 50% success ratio on the fail-over.
              I have the following setting for my connection Pool :
              weblogic.jdbc.connectionPool.ejbPool=\
              url=jdbc20:weblogic:oracle,\
              driver=weblogic.jdbc20.oci.Driver,\
              loginDelaySecs=0,\
              initialCapacity=5,\
              maxCapacity=35,\
              capacityIncrement=2,\
              allowShrinking=true,\
              shrinkPeriodMins=5,\
              refreshTestMinutes=1,\
              testConnsOnReserve=true,\
              testConnsOnRelease=true,\
              testTable=dual,\
              props=user=ZZZ;password=YYY;server=XXX;
              ANY help would be appreciated.
              Thanks,
              Jacques
              

    Have you tried resetting the connection pool manually when the database
              server fails over? You can do this via the weblogic.Admin Java program
              from the command line so it is scriptable. Typically, I recommend that
              you add this to your database fail-over scripts so that as soon as the
              database comes back up, the script invokes the command to reset the
              connection pool on each server...
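              For WLS 5.1 the invocation is something like the sketch below (verify
              the exact argument order for your service pack, and substitute your
              own URL, pool name, and admin credentials):

              java weblogic.Admin t3://appserver:7001 RESET_POOL ejbPool system password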
              Raj Alagumalai wrote:
              > Jacques,
              >
              > The value that you have set for refreshTestMinutes is very low.
              >
              > > refreshTestMinutes=1,\
              >
              >
              >
              > This will cause the server to refresh every connection not being used
              > every minute. I would suggest that you increase this value and enable
              > jdbc logging and test failover
              >
              > Thanks
              >
              >
              > Raj Alagumalai
              > Developer Relations Engineer
              > BEA Support
              >
              

  • Fail over is not happening in Weblogic JSP Server

    Hi..
    We have 6 Weblogic instances running as application server (EJB) and 4 Weblogic
    instances running as web server (JSP). We have configured one cluster for EJB
    servers and one cluster for JSP servers. In front-end we are using four Apache
    servers to proxy the request to Weblogic JSP cluster. In my httpd.conf file I
    have configured with the Weblogic cluster. I can see the requests are going in
    all the servers and believe the cluster is working fine in terms of load balancing
    (round-robin). The clients are accessing the servers using CSS (Cisco Load Balancer).
    But when we test the fail-over in the cluster, we are facing problems. Let me
    explain the scenarios of the fail-over test:
    1.     The load was generated by the Load Generator
    2.     When the load is there, we shut down one Apache server; even though there were
    some failed transactions, the servers immediately became stable again. So fail-over is
    happening at this stage.
    3.     When I shut down one EJB instance, again after some failed transactions, the
    transactions became stable.
    4.     But when I shut down one JSP instance, the transactions immediately failed; they
    are not failed over to another JSP server, and the number of failed transactions
    increased.
    So I guess there is some problem in the proxy plug-in configuration, such that
    when I shut down one JSP server, requests are still being sent to that JSP server
    by the Apache proxy plug-in.
    I have read various queries posted in the newsgroups and found some information
    about configuring session and cookie information in the weblogic.xml file (a typical
    session-replication snippet is sketched at the end of this post). However, I'm not sure
    exactly what configuration needs to be done in weblogic.xml and httpd.conf. Kindly help
    me to resolve the problem. I would appreciate your response.
    ===============================================================
    My httpd.conf file plug-in configuration:
    ###WebLogic Proxy Directives. If proxying to a WebLogic Cluster see WebLogic
    Documentation.
    <IfModule mod_weblogic.c>
    WebLogicCluster X.X.X.X1:7001,X.X.X.X2:7001,X.X.X.X3:7001,X.X.X.X4:7001
    MatchExpression *.jsp
    </IfModule>
    <Location /apollo>
    SetHandler weblogic-handler
    DynamicServerList ON
    HungServerRecoverSecs 600
    ConnectTimeoutSecs 40
    ConnectRetrySecs 2
    </Location>
    ==============================================================
    Thanks in advance,
    Siva.
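    For reference, in-memory session replication for a JSP cluster is normally switched on
    in weblogic.xml with a session-descriptor like the sketch below (6.x-era descriptor
    format; element names should be verified against your release):

    <weblogic-web-app>
      <session-descriptor>
        <session-param>
          <param-name>PersistentStoreType</param-name>
          <param-value>replicated</param-value>
        </session-param>
      </session-descriptor>
    </weblogic-web-app>

    Without replicated sessions, the proxy can still route a failed request to another JSP
    server, but the user's session state will not be there.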

    Hi,
    I can see that bug 13703600 has already been fixed in 12.1.2. If you still have the same problem, please raise a ticket with Oracle Support.
    Regards,
    Kal

  • Sun Directory Proxy 5.2 fail over?

    How do I set up Directory Proxy to fail over on the proxy end? I understand what happens if a directory server goes down, but what if the proxy goes down? How do I assure that the client will fail over to another proxy?

    There are at least three options, in order of least to most desirable:
    1. Have the client maintain a failover list of proxy servers. If it is unable to contact the first server in its local list, have it try the next server (see the sketch after this list).
    Pros: No additional network complexity or cost.
    Cons: May not be possible unless you have access to client code; if proxy server addresses change or additional proxies are added, all client configurations must be updated.
    2. Use round-robin DNS. Point the client to the round robin address and let DNS determine which proxy server should be contacted.
    Pros: Easy to implement on client side; load balances traffic equally across your proxy servers.
    Cons: A proxy server failure will cause your clients to fail 1/nth of the time (where n is the number of proxy servers in the RR configuration).
    3. Use an IP load balancing device (such as a Cisco Distributed Director or F5 BigIP) and have all clients point to the virtual IP of that device.
    Pros: Automatically takes failed proxy servers out of the pool; additional load balancing configurations can be maintained on the hardware load balancing device. If you have redundant load balancers, a client should nearly always be able to connect.
    Cons: Additional cost and network complexity
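    For option 1, if the clients are Java LDAP clients, the JNDI LDAP provider already implements a simple failover list: Context.PROVIDER_URL accepts a space-separated list of URLs, tried in order until one connection succeeds. A minimal sketch (host names are placeholders):

    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.NamingException;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.InitialDirContext;

    public class ProxyFailoverClient {
        public static void main(String[] args) throws NamingException {
            Hashtable<String, String> env = new Hashtable<String, String>();
            env.put(Context.INITIAL_CONTEXT_FACTORY,
                    "com.sun.jndi.ldap.LdapCtxFactory");
            // Space-separated URL list: each proxy is tried in order
            // until one accepts the connection.
            env.put(Context.PROVIDER_URL,
                    "ldap://proxy1.example.com:389 ldap://proxy2.example.com:389");
            DirContext ctx = new InitialDirContext(env);
            // ... perform searches as usual ...
            ctx.close();
        }
    }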

  • OCR and voting disks on ASM, problems in case of fail-over instances

    Hi everybody
    in case at your site you :
    - have an 11.2 fail-over cluster using Grid Infrastructure (CRS, OCR, voting disks),
    where you have yourself created additional CRS resources to handle single-node db instances,
    their listeners, their disks and so on (which are started only on one node at a time,
    and can fail over from that node and restart on another);
    - have put OCR and voting disks into an ASM diskgroup (as strongly suggested by Oracle);
    then you might have problems (as we had), because you might:
    - reach the max number of diskgroups handled by an ASM instance (only 63, above which you get ORA-15068);
    - experience delays (especially in case of multipath), find fake CRS resources, etc.,
    whenever you dismount disks from one node and mount them on another.
    So (if both conditions are true) you might be interested in this story,
    then please keep reading on for the boring details.
    One step backward (I'll try to keep it simple).
    Oracle Grid Infrastructure is mainly used by RAC db instances,
    which means that any db you create usually has one instance started on each node,
    and all instances access read / write the same disks from each node.
    So, ASM instance on each node will mount diskgroups in Shared Mode,
    because the same diskgroups are mounted also by other ASM instances on the other nodes.
    ASM instances have a spfile parameter CLUSTER_DATABASE=true (and this parameter implies
    that every diskgroup is mounted in Shared Mode, among other things).
    In this context, it is quite obvious that Oracle strongly recommends to put OCR and voting disks
    inside ASM: this (usually called CRS_DATA) will become diskgroup number 1
    and ASM instances will mount it before CRS starts.
    Then, additional diskgroup will be added by users, for DATA, REDO, FRA etc of each RAC db,
    and will be mounted later when a RAC db instance starts on the specific node.
    In case of fail-over cluster, where instances are not RAC type and there is
    only one instance running (on one of the nodes) at any time for each db, it is different.
    All diskgroups of db instances don't need to be mounted in Shared Mode,
    because they are used by one instance only at a time
    (on the contrary, they should be mounted in Exclusive Mode).
    Yet, if you follow Oracle advice and put OCR and voting inside ASM, then:
    - at installation OUI will start ASM instance on each node with CLUSTER_DATABASE=true;
    - the first diskgroup, which contains OCR and votings, will be mounted Shared Mode;
    - all other diskgroups, used by each db instance, will be mounted Shared Mode, too,
    even if you'll take care that they'll be mounted by one ASM instance at a time.
    At our site, for our three-nodes cluster, this fact has two consequences.
    One consequence is that we hit the ORA-15068 limit (max 63 diskgroups) earlier than expected:
    - none of the instances on this cluster are Production (only Test, Dev, etc.);
    - we planned to have usually 10 instances on each node, each of them with 3 diskgroups (DATA, REDO, FRA),
    so 30 diskgroups per node, for a total of 90 diskgroups (30 instances) on the cluster;
    - in case one node failed, the surviving two should get the resources of the failing node,
    in the worst case: one node with 60 diskgroups (20 instances), the other with 30 diskgroups (10 instances);
    - in case two nodes failed, the only surviving node would not be able to mount additional diskgroups
    (because of the limit of max 63 diskgroups mounted by an ASM instance), so all the others would remain
    unmounted and their db instances stopped (they are not Production instances).
    But it didn't work: since ASM has the parameter CLUSTER_DATABASE=true, you cannot mount 90 diskgroups;
    you can only mount 62 globally (once a diskgroup is mounted on one node, it is given a number between 2 and 63,
    and diskgroups mounted on other nodes cannot reuse that number).
    So as a matter of fact we can mount only 21 diskgroups (about 7 instances) on each node.
    The second consequence is that, every time our handmade CRS scripts dismount diskgroups
    from one node and mount them on another, there are delays in the range of seconds (especially with multipath).
    Also, we found in the CRS log that, whenever we mounted diskgroups (on one node only),
    additional fake resources of type ora*.dg were created on the fly behind the scenes,
    maybe to accommodate the fact that on the other nodes those diskgroups were left unmounted
    (once again, instances are single-node here, not RAC type).
    That's all.
    Did anyone go into similar problems?
    We opened a SR to Oracle asking about what options do we have here, and we are disappointed by their answer.
    Regards
    Oscar

    Hi Klaas-Jan
    - best practices require that online redo log files also be in a separate diskgroup, in case of ASM logical corruption (we are a little bit paranoid): in case the DATA dg gets corrupted, you can restore a Full backup plus Archived Redo Logs plus Online Redo Logs (otherwise you will stop at the latest Archived).
    So we have 3 diskgroups for each db instance: DATA, REDO, FRA.
    - in case of a fail-over cluster (active-passive), Oracle provides some templates of CRS scripts (in $CRS_HOME/crs/crs/public) that you edit and change at will; you might also create additional scripts for any additional resources you need (Oracle Agents, backup agents, file systems, monitoring tools, etc.).
    About our problem, the only solution is to move OCR and voting disks out of ASM and change the pfile of all ASM instances (parameter CLUSTER_DATABASE from true to false).
    Oracle's answers were a little bit odd:
    - first they told us to use Grid Standalone (without CRS, OCR, voting at all), but we told them that we needed a fail-over solution;
    - then they told us to use RAC One Node, which actually has some better features: in case of a planned fail-over it might be able to migrate
    client sessions without causing a reconnect (for SELECTs only, not in the case of a running transaction). But we already have a few fail-over clusters; we cannot change them all.
    So we plan to move OCR and voting disks onto block devices (we think that the other solution, which needs a Shared File System, would take longer).
    Thanks Marko for pointing us to the OCFS2 pros / cons.
    We asked Oracle for confirmation that this is supported; they said yes, but it is discouraged (and also, it doesn't work with OUI or ASMCA).
    Anyway, that's the simplest approach; this is a non-Prod cluster, so we'll start here, and if everything is fine, after a while we'll do it on the Prod ones too.
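    The parameter change itself is a one-liner per ASM instance (a sketch; it only takes effect at the next instance restart, and only makes sense once OCR and voting no longer live in ASM):

    SQL> ALTER SYSTEM SET cluster_database=FALSE SCOPE=SPFILE;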
    - Note 605828.1, paragraph 5, Configuring non-raw multipath devices for Oracle Clusterware 11g (11.1.0, 11.2.0) on RHEL5/OL5
    - Note 428681.1: OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE)
    -"Grid Infrastructure Install on Linux", paragraph 3.1.6, Table 3-2
    Oscar

  • Is my installation of SQL Server Fail Over cluster correct?

    I made a 2-node SQL Server 2012 fail-over cluster, but I had some problems during installation, so I wanted to know if the steps I performed below are correct.
    Hardware
    Node1 192.168.1.10
    Node2 192.168.1.11
    Added following entries in DNS
    cluster.domain.local 192.168.1.12 (for Windows Cluster)
    msdtc.domain.local 192.168.1.13 (for MSDTC)
    sql.domain.local 192.168.1.14 (for SQL Server Cluster)
    Cluster Storage
    Disk1 (for Quorum)
    Disk2 (for MSDTC)
    Disk3 (for SQL Server)
    Now comes the installation. I am performing all these steps as DOMAIN ADMIN.
    1. First I installed the clustering role on both nodes
    2. Then I ran the fail-over validation wizard on Node1, adding both nodes, which went fine (there were some warnings)
    3. Then I made a Windows Cluster on Node1 using these two nodes. I gave the cluster the name and IP which I wrote above, i.e. cluster.domain.local 192.168.1.12 (a scriptable equivalent of steps 2-3 is sketched at the end of this post)
    4. Cluster was created and both nodes are UP.
    Now I want to ask a question here. Is it best practice to perform the above operation using DOMAIN ADMIN? Or if I use a standard domain user account with local admin rights, will it work? If not then exactly what rights are required to perform this operation.
    5. Then I installed "Application Server" role on both Node1 and Node2 and also added "Distributed Transaction" feature
    6. Then I right clicked on Windows Cluster I created and added a new role/feature which is "DTC"
    7. I gave it the same name which I wrote above i.e. msdtc.domain.local 192.168.1.13
    8. MSDTC was created, but when it tried to bring its service up, it threw an error. Upon investigation, it turned out the Windows cluster cluster.domain.local didn't have proper rights to create some objects in AD. I didn't know what rights to give, so I gave it full
    permission, and after that, when I created MSDTC again, the service came up fine.
    So I want to know what rights cluster.domain.local requires to create MSDTC.
    Am I doing good so far?
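    For reference, steps 2-3 have a scriptable equivalent in the FailoverClusters PowerShell module (a sketch reusing the names and IP above):

    Test-Cluster -Node Node1, Node2
    New-Cluster -Name cluster -Node Node1, Node2 -StaticAddress 192.168.1.12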

    Hello,
    >>Then I made a Windows Cluster on Node1 using these two nodes. I gave the name and IP to this cluster which I wrote above i.e. cluster.domain.local 192.168.1.10
    I suppose that IP is a physical node IP; the Windows cluster IP was 192.168.1.12, and I suppose you must have given that as the Windows cluster IP. .10 and .11 are the physical nodes in the cluster, but .12 is the cluster IP. Correct me if I am wrong.
    Did you do a failover and failback to check whether the cluster is configured correctly? If not, please do so.
    >>Then I ran fail over validation wizard on Node1 adding both nodes which went fine (there were some warnings)
    Please remove the warnings as well; they might cause issues. That's not necessarily true every time, but make sure the cluster validation is free of errors and warnings.
    >>Now I want to ask a question here. Is it best practice to perform the above operation using DOMAIN ADMIN?
    You can do it with a domain admin account, as that right is required to create the Cluster Name Object (CNO) in the domain, and a local account might not have it, so I would say it's OK.
    >>I gave it the same name which I wrote above i.e. msdtc.domain.local 192.168.1.11
    Again, this IP is the Node 2 IP; how can you give it to MSDTC? Use the link below for reference:
    http://blogs.msdn.com/b/cindygross/archive/2009/02/22/how-to-configure-dtc-for-sql-server-in-a-windows-2008-cluster.aspx

  • NIC not failing Over in Cluster

    Hi there... I have configured a 2-node cluster with the SoFS role, for VM clustering and HA, using Windows Server 2012 Datacenter. The current setup: each host server has 3 NICs (2 with a default gateway set up (192.x.x.x); the 3rd NIC is for heartbeat (10.x.x.x)). Configured CSV
    (can also see the shortcut in C:\). Planning to set up a few VMs pointing to the disks in the 2 separate storage servers (1 NIC in 192.x.x.x), which also have 2 NICs in the 10.x.x.x network. I am able to install a VM and point its disk to the share in cluster volume
    1.
    I have created 2 VM switches on the 2 separate host servers (using Hyper-V Manager). When I test the functionality by taking Node 2 down, I can see the disk owner node changing to Node 1, but VM NIC 2 is not failing over automatically to VM NIC 1 (though I can
    see VM NIC 1 showing up unselected in the VM settings). When I go to VM Settings > Network Adapter, I get an error:
    An error occurred for resource VM "VM Name". Select the "information details" action to view events for this resource. The network adapter is configured to a switch which no longer exists, or a resource
    pool that has been deleted or renamed (with a configuration error in the "Virtual Switch" drop-down menu).
    Can you please let me know any resolution to fix this issue... Hoping to hear from you.
    VT

    Hi,
    From your description: "Another thing I would like to test... I also would like to bring a disk down (right now, I have 2 disks: CSV and one quorum disk) for that 2-node
    cluster. I was testing by bringing the CSV disk down; the VM didn't fail over." Are you trying to test the failover cluster now? If so, please refer to the following related KB:
    Test the Failover of a Clustered Service or Application
    http://technet.microsoft.com/en-us/library/cc754577.aspx
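    Separately, the error quoted above ("configured to a switch which no longer exists") usually means the virtual switch names differ between the two hosts: a VM can only fail over to a node that has a virtual switch with the same name. A sketch (switch and adapter names are placeholders), run identically on each host:

    New-VMSwitch -Name "External" -NetAdapterName "Ethernet" -AllowManagementOS $true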
    Hope this helps.

  • How to add a cloud machine as a node to existing windows fail over cluster having on-premise node in Windows server 2008 R2

    Hi All,
    We have a Windows fail-over cluster with one Windows machine on the local network as one of its nodes.
    I want to add a virtual cloud machine available on Microsoft Azure as another node to this existing cluster.
    Please suggest how to do this.
    Thanking all in advance,
    Raghvendra

    Before you even start working on the SQL side, you will need to create a Windows Server 2008 R2 cluster with no shared storage.  You can actually test that in-house.  Create a VM running 2008 R2 and cluster it with your physical (from your description,
    I am assuming physical) 2008 R2 machine. Create it with a file share witness for quorum. Then configure your environment to see that it works as expected.
    Once you know how to configure the cluster between physical and VM with a file share witness, build it in Azure.  The location of the FSW gets to be an interesting choice.  To have a FSW in Azure means that you will need another VM in Azure to
    host the file share, meaning you have two quorum votes in Azure and one in-house.  Or, you could create a file share witness on an in-house system, giving you two quorum votes in-house and one in Azure.
    In the FSW in Azure scenario, if you have a loss of the in-house server, automatic failover occurs because two quorum votes exist in Azure.  With FSW in-house, depending on the loss you have in-house, you might have to force quorum to get the Azure
    single-node cluster to run.  Loss of access to Azure reverses those scenarios.  Neither one is optimal, but it does provide some level of recoverability.
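    Either way, once the share exists, pointing the cluster at a file share witness is a one-line change (a sketch; the UNC path is a placeholder):

    Set-ClusterQuorum -NodeAndFileShareMajority "\\fileserver\ClusterFSW"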
    . : | : . : | : . tim

  • Automatic Site Fail Over

    Hi
    Here is my setup
    2 mailbox servers clustered with one DAG in Toronto, and 1 CAS server in Toronto.
    My question is: if I set up another Mailbox/CAS server in Vancouver, can I achieve automatic site failover to Vancouver if Toronto goes down?
    Is it recommended to do so? Or what would be a better site failover strategy?

    In simple words, automatic failover won't happen if Toronto goes down.
    First, read up on common Exchange 2010 DAG misconceptions.
    For a multi-site DAG, you need to enable DAC (Datacenter Activation Coordination) mode, which requires each DAG member to contact the other members to seek permission before it can mount databases. Since Toronto is down or there is a network problem, the remote site would not be able to
    mount databases.
    Failover to the local site is always automatic (preferred); failover to a remote site is manual (again, by design), because Exchange has no knowledge of your infrastructure and can't make a smart decision about whether to mount databases.
    Exchange does support mounting databases in the remote site, but it requires careful steps for failover and failback, as explained here:
    http://smtpport25.wordpress.com/2010/12/10/exchange-2010-dag-local-and-site-drfailover-and-fail-back/
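    Enabling DAC mode on an existing DAG is a single cmdlet (a sketch; DAG1 is a placeholder name):

    Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly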
    Hope this helps.
    Regards,
    Sarvesh Goel
    MCP, MCITP, MCTS, MCSA - Directory Services and Microsoft Exchange
    OK, but what is the reason automatic site failover will not occur?
    The reason why I'm asking is that another admin believes that if Vancouver is added to the DAG, then the databases should mount automatically when the other nodes fail.

  • Servlet fail-over problem

              I'm testing WebLogic clustering of servlets with in-memory replication
              on the Sun platform (wls 5.1 sp9), using the Apache plug-in.
              I did this test:
              - I configured a cluster of two servers
              - I simulated a hang situation in one of the two servers, by filling
              all execution threads with servlets doing Thread.sleep()
              - I tried to launch a request to the cluster (a JSP request), but my
              request timed out after ConnectTimeoutSecs.
              Looking at wlproxy.log, it seems that the cluster attempts to fail over
              to the secondary server (after HungServerRecoverSecs) but it doesn't
              respond; then it retries with the primary server and so on (waiting
              HungServerRecoverSecs every time for a response) until the timeout
              ConnectTimeoutSecs is reached.
              This is very strange because the secondary server is not hung; if I
              launch a request directly to it (specifying host:port in the URL),
              it responds to me.
              I have also tried to specify the parameter Idempotent ON, even though
              the default is ON, but with no result.
              Can anyone help me?
              

              I solved the problem by setting the parameter weblogic.system.servletThreadCount
              in the cluster properties file.
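              For reference, the entry is a single line in the per-cluster
              weblogic.properties file (the value below is illustrative; size it
              above the expected number of concurrently executing servlets):

              weblogic.system.servletThreadCount=50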
              Now another problem has arisen.
              When one of the servers of the cluster is in a hung state, the cluster
              fails over to the second server, but session information is lost.
              Can anyone help me?
              "Mike Reiche" <[email protected]> wrote:
              >
              >Don't take this as absolute gospel - it is just my understanding of how
              >things
              >work.
              >
              >Since the WL server is still alive, it will accept connections. This
              >takes ConnectTimeOutSecs
              >out of the picture.
              >
              >Now you're just left with HungRecoverSeconds. If the response takes longer
              >than
              >HungRecoverSeconds, then wlproxy will deem the request to have 'timed
              >out'. If
              >it is not Idempotent, that's it, you're done. If it is Idempotent, wlproxy
              >will
              >retry - on the other wl instance. From what you describe, the second
              >one should
              >work - unless of course the second WL is also backed up with Thread.sleep()
              >-
              >then after HungRecoverSeconds, the request will be resent to an available
              >WL instance.
              

  • Saving a website on the hard drive so that you can put it on another host

    I just made a website and want to save it to my hard drive just in case everything crashes or there is a dispute with the host and then I will have the website ready to put on another host. For example, if the host decides they don't like my content they might just pull it off the web. I have heard it happen to people and they had to find another host. I don't think it will happen to me but you never know and I just want to have a copy so that if there is any dispute, I can have the website saved and ready to put up on another hosting account.
    Can I just save it from the web using "Save file"? Will that save it in a form that's good enough for putting on another host? I can hire a designer to put it on the host if it takes a few extra programming steps, so long as I have the files saved .....
    Or do I have to do something complicated like special save programs like
    http://www.sitesucker.us/mac.html ?
    I use Scrapbook add-on on Firefox to save the webpages I like just in case the site gets pulled down in the future. Is that the kind of thing I need? If that is all I need, can I use Scrapbook to save my website?
    Will it save all the information to restore the website on another host exactly as it appears? My website has been coded with HTML/CSS. I don't think there is much PHP.
    I am not sure about databases. My website is simply selling a service so I don't keep a database I think or just a minimal one. The customer fills out a form and the information goes to my email address. And the information on my website doesn't change. It isn't a blog that I am adding information to constantly. It just has links for navigating to different pages on the same website. It doesn't have links that point to other websites. There are no fancy files like movies etc.
    I just want something to save my website so that I or an experienced web person can easily put it back up on another host.
    I use Snow Leopard 6.0 by the way. I haven't upgraded to Snow Leopard 6.3 because there were a lot of problems when I did that in the past. So I went back to Snow Leopard 6.0. (It's been good so far so I think I will stay with it.)

    I didn't make it. I gave the psd design to a web designer and gave him instructions to make it using CSS as I heard CSS web pages load faster. He said he wouldn't use database programming languages like php or MySquareL or whatever they're called because he said I didn't need them and he would charge me a lot more for that, so I said just make it with CSS/HTML. I think he's a PC guy, not a mac guy. He's a professional designer and he made my website from scratch from my designs, not with a website builder.
    He hasn't put it on the host yet though he's just about to. I did a search about backing up your website and one website said that some hosts allow you to backup the website using cPanel or something like that. I don't know whether my host has that or not. My host account is very cheap like $16 a year so I don't think there are any special features to the hosting account.
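    If your host does offer cPanel, a full backup from there is the most complete option, because it captures the server-side files directly. Failing that, a static HTML/CSS site like the one described can usually be mirrored from outside with wget (a sketch; replace the URL with your own site):

    wget --mirror --convert-links --page-requisites http://www.example.com/

    That downloads the pages plus the stylesheets and images they reference, and rewrites the links so the copy works locally or on another host.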
