SQL not caching on failed over node

Hi Friends,
We're running a SQL Server 2008 SP4 cluster, but when we're on the second node the queries are extremely slow, and I've noticed that SQL Server is not caching the data properly after the first warm-up run. Any thoughts or suggestions? Have you seen this problem before?
Thanks,
Patrick
Patrick Alexander

Hi Patrick,
According to your description, you are facing slow-running queries. Based on my research, the issue could be due to a lack of useful statistics or a lack of useful indexes, as discussed in the article linked below.
To troubleshoot the issue, you could follow these steps:
1. Use SQL Server Profiler to find the slow queries. SQL Server Profiler provides a graphical user interface to create, manage, analyze, and replay SQL traces, and it includes several built-in templates that define the events and columns to track. You could select the TSQL_Duration template and choose the save-to-file option to save the captured event information into a trace file (.trc) that you can analyze later or replay in SQL Server Profiler. For how to monitor a query using SQL Server Profiler, please refer to this article:
http://solutioncenter.apexsql.com/monitor-sql-server-queries-find-poor-performers-sql-server-profiler/
2. Analyze the performance of a slow-running query as described in Displaying Graphical Execution Plans (SQL Server Management Studio). The information gathered lets you determine how a query is executed by the SQL Server query optimizer and which indexes are being used. Using this information, you can determine whether performance can be improved by changing the indexes on the tables. For more information about how to use indexes to improve query performance, see General Index Design Guidelines.
3. You could create additional statistics or update the existing statistics. For more information about how to use statistics to improve query performance, please refer to this article:
https://technet.microsoft.com/en-us/library/ms190397(v=sql.105).aspx#CreateStatistics
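As a rough illustration of step 3, statistics can be refreshed with T-SQL such as the following (the database and table names here are placeholders, not from your environment):

```sql
-- Refresh all out-of-date statistics in the current database
USE YourDatabase;        -- placeholder name
EXEC sp_updatestats;

-- Or rebuild statistics for a single table with a full scan
UPDATE STATISTICS dbo.YourTable WITH FULLSCAN;   -- placeholder table
```

Running this after a failover (on the now-active node) rules out stale statistics as the cause of the slow plans.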
Regards,
Michelle Li

Similar Messages

  • License problem on fail-over node.

    Dear Gurus,
    I am receiving an error while applying the license to the fail-over server.
    SAPLICENSE (Release 700) ERROR ***
    ERROR: Can not set DbSl trace function
    DETAILS: DbSlControl(DBSL_CMD_IMP_FUNS_SET) failed with return code
    20
    RC-INFO: error loading dynamic db-library - check environment for:
    dbms_type = <db-type> (e.g. ora)
    DIR_LIBRARY = <path to db-dll>
    (e.g. /usr/sap/SID/SYS/exe/run)
    LD_LIBRARY_PATH = <path do db and sap libs>
    (e.g. /oracle/SID/lib)
    My production system number is 00000000031XXXXXXX, but the license key is for 00000000031XYYYYY.
    I am not able to log in to the fail-over node because of this.
    How can I resolve this problem ?
    Sachin

    Sachin,
    How are you applying the license - through SLICENSE? You are unable to log in on the failover node? Have you deleted the temporary license, or has it passed the 4-week period?
    If you have deleted the temporary license, apply the license through the Visual Administrator.
    If it has passed 4 weeks, set the date back and then apply the license.
    You must have applied for a license in OSS using the other system with the same SID. You have to copy the system number from the node you applied the license on, apply for a license under the same system in OSS, and give that system number.
    When you get the license, apply it as mentioned above.

  • BEA WebLogic 6.1 does not detect Oracle database fail-over

    Hi. We had Concurrency Strategy: Exclusive. We changed it to Database for performance reasons. Since that change, when we do an Oracle database fail-over, WebLogic 6.1 does not detect the fail-over and needs a restart.
    How can we resolve this?

    mt wrote:
    Hi. We had Concurrency Strategy: Exclusive. We changed it to Database for performance reasons. Since that change, when we do an Oracle database fail-over, WebLogic 6.1 does not detect the fail-over and needs a restart. How can we resolve this?
    Are your pools set to test connections at reserve time?
    Joe
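For reference, test-on-reserve is enabled on the pool definition in the WebLogic config.xml. A minimal sketch is shown below; the pool name, cluster target, URL, and capacities are made-up placeholders:

```xml
<!-- Sketch of a WebLogic JDBC pool with connection testing at reserve time.
     Names and sizes are placeholders, not taken from the thread. -->
<JDBCConnectionPool Name="oraclePool"
    Targets="mycluster"
    DriverName="oracle.jdbc.driver.OracleDriver"
    URL="jdbc:oracle:thin:@dbhost:1521:ORCL"
    InitialCapacity="1" MaxCapacity="20"
    TestConnectionsOnReserve="true"
    TestTableName="dual"/>
```

With TestConnectionsOnReserve set, each connection is validated against the test table before being handed to the application, so connections broken by a database fail-over are discarded instead of surfacing as errors.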

  • Load balancing not happening, but fail-over is, for read-only entity beans

              The following is the configuration:
              Two NT servers with WL5.1 SP9 hosting only EJBs (read-only entity beans).
              One client with WL5.1 SP9 running a servlet/Java application as the
              EJB client.
              I am trying to make a call like findByPrimaryKey on one of the
              entity beans. I can see that requests are always directed to only one
              of the servers. When I bring that server down, fail-over happens to
              the other server.
              Here are the settings I have in the ejb-jar.xml :
                        <entity>
                             <ejb-name>device.StartHome</ejb-name>
                             <home>com.wl.api.device.StartHome</home>
                             <remote>com.wl.api.device.StartRemote</remote>
                             <ejb-class>com.wl.server.device.StartImpl</ejb-class>
                             <persistence-type>Bean</persistence-type>
                             <prim-key-class>java.lang.Long</prim-key-class>
                             <reentrant>False</reentrant>
                             <resource-ref>
                                  <res-ref-name>jdbc/wlPool</res-ref-name>
                                  <res-type>javax.sql.DataSource</res-type>
                                  <res-auth>Container</res-auth>
                             </resource-ref>
                        </entity>
              Here are the settings I have in the weblogic-ejb-jar.xml.
              <weblogic-enterprise-bean>
                        <ejb-name>device.StartHome</ejb-name>
                        <caching-descriptor>
                             <max-beans-in-cache>50</max-beans-in-cache>
                             <cache-strategy>Read-Only</cache-strategy>
                             <read-timeout-seconds>900</read-timeout-seconds>
                        </caching-descriptor>
                        <reference-descriptor>
                             <resource-description>
                                  <res-ref-name>jdbc/wlPool</res-ref-name>
                                  <jndi-name>weblogic.jdbc.pool.wlPool</jndi-name>
                             </resource-description>
                        </reference-descriptor>
                        <enable-call-by-reference>False</enable-call-by-reference>
                        <jndi-name>device.StartHome</jndi-name>
                   </weblogic-enterprise-bean>
              Am I making any mistake in this?
              Anyone's help is appreciated.
              Thanks
              Suresh
              

    we are using 5.1
              "Gene Chuang" <[email protected]> wrote in message
              news:[email protected]...
              > Colocation optimization occurs if your client resides in the same
              container (and also in the same
              > EAR for 6.0) as your ejbs.
              >
              > Gene
              >
              > "Suresh" <[email protected]> wrote in message
              news:[email protected]...
              > > Ok... the ejb-call-by-reference set to true is making the call go
              > > to one server only. I am not sure why. I removed the property name
              > > and it works.
              > > Also I have one question: in our production environment, when I
              > > cache the EJB home it is not doing the load balancing. Can anyone
              > > help me with that? Thanks.
              > >
              > > Mike,
              > > From the sample pgm I sent, even from single client calls get load
              > > balanced.
              > >
              > > Suresh
              > >
              > >
              > > "Gene Chuang" <[email protected]> wrote in message
              > > news:[email protected]...
              > > > In WL, LoadBalancing will ONLY WORK if you reuse your EJBHome!
              > > > Take your StartEndPointHome lookup out of your for loop and see
              > > > if this fixes your problem.
              > > >
              > > > I've seen this discussion in ejb-interest, and some other vendor
              > > > (Borland, I believe it is) brings up an interesting point:
              > > > Clustering and LoadBalance is not in the J2EE specs, hence
              > > > implementation is totally up to the vendor. Weblogic loadbalances
              > > > from the remote interfaces (EJBObject, EJBHome, etc..), while
              > > > Borland loadbalances from JNDI Context lookup.
              > > >
              > > > Let me suggest a third implementation: loadbalance from BOTH
              > > > Context lookup as well as stub method invocation! Or create a
              > > > smart replica-aware list manager which persists on the client
              > > > thread (ThreadLocal) and is aware of lookup/invocation history.
              > > > Hence if I do the following in a client hitting a 3 node
              > > > cluster, I'll still get perfect round-robining regardless of
              > > > what I do on the client side:
              > > >
              > > > InitialContext ctxt = new InitialContext();
              > > > EJBHome myHome = ctxt.lookup(MY_BEAN);
              > > > myHome.findByPrimaryKey(pk); <== hits Node #1
              > > > myHome = ctxt.lookup(MY_BEAN);
              > > > myHome.findByPrimaryKey(pk); <== hits Node #2
              > > > myHome.findByPrimaryKey(pk); <== hits Node #3
              > > > myHome = ctxt.lookup(MY_BEAN);
              > > > myHome.findByPrimaryKey(pk); <== hits Node #1
              > > > ...
              > > >
              > > >
              > > > Gene
              > > >
              > > > "Suresh" <[email protected]> wrote in message
              > > news:[email protected]...
              > > > > Mike ,
              > > > >
              > > > > Do you have any reasons for the total number of machines to be 10.
              > > > >
              > > > > I tried with 7 machines.
              > > > >
              > > > >
              > > > > Here is my sample client java application running individually
              > > > > on each of the seven machines.
              > > > >
              > > > > StartEndPointHome =
              > > > > (StartEndPointHome)ctx.lookup("dev.StartEndPointHome");
              > > > > for(;;)
              > > > > {
              > > > > // logMsg(" --in loop "+currentTime);
              > > > > if (currentTime > nextRefereshTime)
              > > > > {
              > > > > logMsg("****- going to call");
              > > > > currentTime=getSystemTime();
              > > > > nextRefereshTime=currentTime+timeInterval;
              > > > > StartEndPointHome =
              > > > > (StartEndPointHome)ctx.lookup("dev.StartEndPointHome");
              > > > > long rndno=(long)(Math.random()*10)+range;
              > > > > logMsg(" going to call remotestub"+rndno);
              > > > > retVal = ((StartEndPointHome)getStartHome())
              > > > >     .findByNumber("pe"+rndno+"_mportal_dsk36.mportal.com");
              > > > >
              > > > > logMsg("**++- called stub");
              > > > > }
              > > > >
              > > > >
              > > > >
              > > > > The range value is different for each of the machines in the
              > > > > cluster.
              > > > >
              > > > > If the first request starts at srv1, all requests keep hitting
              > > > > the same server. If the first request starts at srv2, all
              > > > > requests keep hitting the same server.
              > > > >
              > > > > I have the following url, user and pwd values for the context:
              > > > >
              > > > > public static String url="t3://10.11.12.14,10.11.12.117:8000";
              > > > > public static String user="guest";
              > > > > public static String password="guest";
              > > > >
              > > > >
              > > > >
              > > > > It would be great if you could help me.
              > > > >
              > > > > Thanks
              > > > > suresh
              > > > >
              > > > >
              > > > > "Mike Reiche" <[email protected]> wrote in message
              > > > > news:[email protected]...
              > > > > >
              > > > > > If you have only one client don't be surprised if you only hit one
              > > server.
              > > > > Try
              > > > > > running ten different clients and see if the hit the same server.
              > > > > >
              > > > > > Mike
              

  • SQL 2005 cluster rejects SQL logins when in failed over state

    When a SQL 2005 SP4 instance on a Windows 2003 server cluster is failed over from Server_A to Server_B, it rejects all SQL Server logins; domain logins are OK. The message is "user is not associated with a trusted server connection", followed by the IP of the client. This is error 18452. Does anyone know how to fix this? Logins should work fine from both servers. We think this started just after installing SP4.
    DaveK

    Hello,
    The connection string is good; you're definitely using SQL auth.
    LoginMode on Server_B is REG_DWORD 0x00000001 (1); LoginMode on Server_A is REG_DWORD 0x00000002 (2). Looks like you are on to something. I will schedule another test failover. I assume a 2 is mixed mode? If so, why would SQL allow two different modes on each side of a cluster?
    You definitely have a registry replication issue, or at the very least a registry that isn't in sync with the cluster. This could happen for various reasons, none of which we'll probably find out about now, but nevertheless...
    A good test would be to set it to Windows-only on Node A, wait a minute, then set it back, and see if that replicates the registry setting across the nodes correctly - this is actually at the Windows level and doesn't have anything to do with SQL Server.
    SQL Server reads this value from the registry; it is not stored inside any database (read: nothing stored in the master database), so it's a per-machine setting. Since it's not set correctly on Node B, when SQL Server starts up it correctly reads that registry key and acts on it as it should. The culprit isn't SQL Server; it's Windows Clustering.
    Hopefully this makes a little more sense now. You can simply edit the registry setting on Node B to match Node A and fail over to B; everything should then work correctly. That doesn't help with a root-cause analysis, which definitely needs to be done, as who knows what else may not be correctly in sync.
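    The effective authentication mode can also be checked from T-SQL on each node, which avoids reading the registry by hand (this is a standard server property, not something specific to this thread):

```sql
-- 1 = Windows Authentication only, 0 = mixed mode (SQL + Windows logins)
SELECT SERVERPROPERTY('IsIntegratedSecurityOnly') AS WindowsAuthOnly;
```

    Running this after failing over to each node in turn shows directly whether the two sides really disagree.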
    Sean Gallardy | Blog |
    Twitter

  • Extension Mobility not working during fail over

    Hello All
    I am John. I have a cluster with two subscribers and one publisher, running 8.5.1. Yesterday one of my subscribers failed and everything failed over to the other subscriber. Everything but Extension Mobility seemed to work. I checked all my certs, I checked the services, and I checked the URL for the services; everything checked out correctly. The URL points to the publisher, and the publisher was available.
    What I saw during failover:
    When the Services button was pressed, I received the "Requesting" notice. After a while the message changed to "Host not found", and then the next message was "Select service".
    I restarted the Extension Mobility service with no effect. The other web applications (such as the corporate directory) all seem to be working correctly. Extension Mobility is the only service I have running.
    When the failover ended and my lost subscriber was available again, Extension Mobility and the services started working again.
    Any help or ideas would be greatly appreciated.
    Thank you
    John

    Hi John,
    As suggested by iptuser55, switch on the EM service on both nodes; this is the only way failover will work. There are two services in CUCM: Extension Mobility and EM App. EM App runs on all the nodes, and its job is to keep a list of the nodes with the EM service activated. When the top node in that list goes down, EM App points to the next node in the list, which acts as the failover.
    Here's the architecture of the EM service.
    http://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cucm/srnd/8x/uc8x/cmapps.html#wp1187822

  • WebLogic 7.0 SP1 JDBC multipooling does not appear to fail over consistently

    We are running WebLogic Platform 7.0 SP1 on a Sun server with Solaris 8.
    We have a connection multipool set up and associated with a data source.
    The multipool has two connection pools, each pointing to a separate but
    identical database. The multipool is created with the high-availability
    algorithm set.
    Both connection pools have the following settings:
    Connections tab:
        Initial Capacity: 0
        Maximum Capacity: 20
        Capacity Increment: 2
        Login Delay Seconds: 2
        Refresh Period: 15
        Allow Shrinking: Checked
        Shrink Period: 15
    Testing tab:
        Test Table Name: dual
        Test Reserved Connections: Checked
    When the first database goes down for a backup, we can see the exceptions thrown from the broken active connections, which would be expected. But we also notice a significant response-time increase (3-7 seconds), and we see continuing exceptions from what appears to be the application EJB still attempting to connect to the first instance. Only when a DISABLE_POOL is issued through the command-line interface to disable the first pool in the multipool does the response time drop back to normal and the exceptions stop. The exceptions don't happen for every query, but we cannot understand why test-on-reserve doesn't always seem to work.
    Any help would be appreciated.

    This is known behavior of the "high availability" algorithm for multipools.
    Whenever a multipool is asked for a connection, it will always try the first pool first (unless it has been disabled). The algorithm doesn't remember that it had trouble with the first pool and start with the second pool (e.g., put the first pool at the end of the circular list of pools to be tried). This is the way it was designed (in my opinion, this defies the principle of least astonishment, but not everyone agrees with me).
    We are planning to add a new algorithm in the next release that does keep state.
    The way it currently works, the delay depends on how long it takes to figure out that the first pool is not available, which in turn depends on the database client driver and the network connection (it could take anywhere from 100 ms to multiple seconds, or even minutes, to figure it out).
    "Joe Doyle" <[email protected]> wrote in message news:[email protected]...
    >
    We are running Weblogic Platform 7.0 SP1 on a Sun with Solaris 8 O/S
    We have a connection multipool set up associated with a Data Source.
    The multipool has two connection pools, each pointing to a seperate but
    Identical database. The multipool is created with the high availability algorithm
    set.
    both connection pools have the following settings:
    (Connections Tab) (Testing Tab)
    Initial Capacity:0 Test Table Name: dual
    Maximum Capacity: 20 Test Reserved Connections: Checked
    Capacity Increment: 2
    Login Delay Seconds: 2
    Refresh Period: 15
    Allow Shrinking: Checked
    Shrink Period: 15
    When the first database goes down for a backup, We can see the exceptions thrown
    from the
    broken acitve connections, which would be expected. But we also notice a significant
    response time increase (3-7 seconds) and we notice continuing exceptions from
    what appears to be the application EJB still attempting to connect to the first
    instance. It is only when a DISABLE_POOL is issued using the command line interface
    to disable the first pool in the multipool that the response time drops to normal
    and the exceptions stop. The exceptions don't happen for every query, but we cannot
    understand why the test-on-reserve doesn't seem to always work.
    Any help would be appreciated

  • SQL*Net fail-over setup to cope with a system failure in OPS

    Product: SQL*Net
    Written: 1997-11-20
    SQL*Net fail-over setup to cope with a system failure in OPS
    ==================================================================
    When one node of a multi-node OPS system fails, this is how users can
    connect to the fail-over system without modifying their applications.
    Prerequisites: SQL*Net V2.3.2.1.6 for Windows 95, or
    SQL*Net V2.3.2.1.8 for Windows,
    or later versions. The following describes the corresponding setup in
    the tnsnames.ora file.
    Example tnsnames.ora for SQL*Net 2.3 fail-over.
    When a user requests to connect to <alias>, sqlnet will attempt to
    connect to the address in the first description in the
    description_list, if that fails, it will try to connect to the address
    in the next description in the description_list.
    You can have as many descriptions in the description_list as needed.
    The client will only receive an error if all the descriptions fail.
    <alias>=
      (description_list=
        (description=
          (address_list=
            (address=
              (protocol=tcp)
              (host=<server 1>)
              (port=1521)))
          (connect_data=
            (sid=<sid1>)))
        (description=
          (address_list=
            (address=
              (protocol=tcp)
              (host=<server 2>)
              (port=1521)))
          (connect_data=
            (sid=<sid2>))))

  • Fail over is not happening in Weblogic JSP Server

    Hi..
    We have 6 Weblogic instances running as application server (EJB) and 4 Weblogic
    instances running as web server (JSP). We have configured one cluster for EJB
    servers and one cluster for JSP servers. In front-end we are using four Apache
    servers to proxy the request to Weblogic JSP cluster. In my httpd.conf file I
    have configured with the Weblogic cluster. I can see the requests are going in
    all the servers and believe the cluster is working fine in terms of load balancing
    (round-robin). The clients are accessing the servers using CSS (Cisco Load Balancer).
    But when we test the fail-over in the cluster, we are facing problems. Let me
    explain the scenarios of the fail-over test:
    1.     The load was generated by the Load Generator
    2.     When the load is there, we shut down one Apache server; even though there were some failed transactions, the servers immediately became stable again. So fail-over is happening at this stage.
    3.     When I shut down one EJB instance, again after some failed transactions, the transactions become stable.
    4.     But when I shut down one JSP instance, the transactions immediately fail, the requests are not failed over to another JSP server, and the number of failed transactions increases.
    So I guess there is some problem in the proxy plug-in configuration, such that when I shut down one JSP server, requests are still being sent to that JSP server by the Apache proxy plug-in.
    I have read various queries posted in the newsgroups and found some information about configuring session and cookie information in the weblogic.xml file. I'm not sure what configuration needs to be done in weblogic.xml and httpd.conf. Kindly help me to resolve the problem. I would appreciate your response.
    ===============================================================
    My httpd.conf file plug-in configuration:
    ###WebLogic Proxy Directives. If proxying to a WebLogic Cluster see WebLogic
    Documentation.
    <IfModule mod_weblogic.c>
    WebLogicCluster X.X.X.X1:7001,X.X.X.X2:7001,X.X.X.X3:7001,X.X.X.X4:7001
    MatchExpression *.jsp
    </IfModule>
    <Location /apollo>
    SetHandler weblogic-handler
    DynamicServerList ON
    HungServerRecoverSecs 600
    ConnectTimeoutSecs 40
    ConnectRetrySecs 2
    </Location>
    ==============================================================
    Thanks in advance,
    Siva.
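    If the JSP sessions are not replicated across the cluster, the proxy plug-in has no secondary server to fail a session over to, so the session descriptor in weblogic.xml is worth checking. A minimal sketch for classic WebLogic in-memory replication follows; treat the exact element names as an assumption for this WebLogic version:

```xml
<!-- Sketch: enable in-memory session replication for the web app
     (element names assumed from classic WebLogic session-descriptor) -->
<weblogic-web-app>
  <session-descriptor>
    <session-param>
      <param-name>PersistentStoreType</param-name>
      <param-value>replicated</param-value>
    </session-param>
  </session-descriptor>
</weblogic-web-app>
```

    With replication enabled, the plug-in can route on the primary/secondary server information carried in the session cookie when the primary JSP server goes down.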

    Hi,
    I can see that bug 13703600 was already fixed in 12.1.2. If you still hit the same problem, please raise a ticket with Oracle Support.
    Regards,
    Kal

  • Database fail-over problem after we changed the Concurrency Strategy

    Hi. We had Concurrency Strategy: Exclusive. We changed it to Database for performance reasons. Since that change, when we do an Oracle database fail-over, WebLogic 6.1 does not detect the fail-over and needs to be rebooted.
    How can we resolve this?

    Hi,
    It is just failing on one of the application servers. The developer wrote that when installing the CI, the local hostname is written into the database and the SDM. We will have to do a homogeneous system copy to change the name.
    The problem is that I used the virtual SAP group name for the CI and DI application servers; for SCS and ASCS we used virtual hostnames, which is OK according to the SAP developer.
    The Start and instance profiles were checked and everything was fine; only the dispatcher of the CI has problems when coming from Node B to Node A.
    Regards

  • Fail-over options for Standalone Print Server

    Our organization recently set up three Windows Server 2012 R2 print servers to handle three separate sites. Each print server contains only the printers within its site, which makes each one a standalone print server. I'm concerned about not having a fail-over plan in place in the event one of my servers should fail. Does anyone have any fail-over suggestions?

    Hi Thomas,
    If we want to create a clustered print server, we need to create another print server in the same site.
    For detailed information, please refer to the link below:
    https://technet.microsoft.com/en-us/library/cc771091.aspx
    Best Regards.
    Steven Lee Please remember to mark the replies as answers if they help and unmark them if they provide no help. If you have feedback for TechNet Support, contact [email protected]

  • 3-node cluster with 1 vInstance; vInstance cannot fail over to one specific node.

    I have a 3-node cluster, all nodes running Windows Server 2008 R2. Roughly once a month my vInstance becomes degraded and attempts to fail over. Everything is fine as long as it fails over to SQL01 or SQL02; however, if it attempts to fail over to SQL03, it does not come online.
    The quick resolution is to move it manually to SQL01 or SQL02. What could be causing it to fail every time on SQL03?
    A couple points:
    I did not build the environment.
    I am not a DBA.
    I only have general knowledge of SQL clustering.
    I always get two event IDs: 1069
    Cluster resource 'SQL Server (VSQL04)' in clustered service or application 'SQL Server (VSQL04)' failed.
    and then
    EVENT ID 1205
    The Cluster service failed to bring clustered service or application 'SQL Server (VSQL04)' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
    Where should I begin to look for issues?

    Here is the cluster event prior to the offline state. I will have to go check the cluster log.
    The Cluster service failed to bring clustered service or application 'SQL Server (VSQL04)' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
    I do not think this helps much; it just says a resource is in a failed state. You need to dig deeper and see which resource it is and why it did not come back online; that should be recorded in the cluster log and/or the event viewer.
    Hope it helps!
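    To actually pull the cluster log on Windows Server 2008 R2, commands along these lines can be run on one of the nodes (the destination folder is just an example):

```
:: Generate cluster.log on every node and copy them to a local folder (cmd)
cluster log /g /copy:"C:\ClusterLogs"

:: Or, from PowerShell, using the Failover Clustering module:
::   Import-Module FailoverClusters
::   Get-ClusterLog -Destination C:\ClusterLogs
```

    Searching the generated log for the time of the 1069/1205 events should show which resource on SQL03 failed to come online and why.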

  • Failover cluster server - File Server role is clustered - Shadow copies do not seem to travel to other node when failing over

    Hi,
    New to 2012 and implementing a clustered environment for our File Services role. I have got to the point where I have successfully configured the shadow copy settings through Windows Explorer.
    I have a large (15 TB) disk, S:, and a VSS (volume shadow copy) drive, V:.
    I created dependencies in the Failover Cluster Manager console whereby S: depends on V:.
    However, when I fail over the resource and browse the Client Access Point share, there are no entries under the "Previous Versions" tab.
    When I visit the S: drive in Windows Explorer and open the shadow copy dialogue box, there are entries showing the times and dates of the shadow copies taken on the original node. So the disk knows about the shadow copies that were taken on the original node, but the "Previous Versions" tab has no entries to display.
    This is on a 2012 server (NOT the R2 version).
    Can anyone explain what might be the reason? Do I have an "issue", or is this by design?
    All help appreciated!
    Kathy
    Kathleen Hayhurst Senior IT Support Analyst

    Hi,
    Please first check the requirements in following article:
    Using Shadow Copies of Shared Folders in a server cluster
    http://technet.microsoft.com/en-us/library/cc779378(v=ws.10).aspx
    Cluster-managed shadow copies can only be created in a single quorum device cluster on a disk with a Physical Disk resource. In a single node cluster or majority node set cluster without a shared cluster disk, shadow copies can only be created and managed
    locally.
    You cannot enable Shadow Copies of Shared Folders for the quorum resource, although you can enable Shadow Copies of Shared Folders for a File Share resource.
    The recurring scheduled task that generates volume shadow copies must run on the same node that currently owns the storage volume.
    The cluster resource that manages the scheduled task must be able to fail over with the Physical Disk resource that manages the storage volume.

  • 2 node Sun Cluster 3.2, resource groups not failing over.

    Hello,
    I am currently running two V490s connected to a Sun StorageTek 6540 array. After attempting to install the latest OS patches, the cluster seems nearly destroyed. I backed out the patches, and right now only one node can host the resource groups properly. The other node appears to take over the Veritas disk groups but will not mount them automatically. I have been working on this for over a month and have learned a lot and fixed a lot of other issues that came up, but the cluster is still not working properly. Here is some output.
    bash-3.00# clresourcegroup switch -n coins01 DataWatch-rg
    clresourcegroup: (C776397) Request failed because node coins01 is not a potential primary for resource group DataWatch-rg. Ensure that when a zone is intended, it is explicitly specified by using the node:zonename format.
    bash-3.00# clresourcegroup switch -z zcoins01 -n coins01 DataWatch-rg
    clresourcegroup: (C298182) Cannot use node coins01:zcoins01 because it is not currently in the cluster membership.
    clresourcegroup: (C916474) Request failed because none of the specified nodes are usable.
    bash-3.00# clresource status
    === Cluster Resources ===
    Resource Name      Node Name         State    Status Message
    ftp-rs             coins01:zftp01    Offline  Offline
                       coins02:zftp01    Offline  Offline - LogicalHostname offline.
    xprcoins           coins01:zcoins01  Offline  Offline
                       coins02:zcoins01  Offline  Offline - LogicalHostname offline.
    xprcoins-rs        coins01:zcoins01  Offline  Offline
                       coins02:zcoins01  Offline  Offline - LogicalHostname offline.
    DataWatch-hasp-rs  coins01:zcoins01  Offline  Offline
                       coins02:zcoins01  Offline  Offline
    BDSarchive-res     coins01:zcoins01  Offline  Offline
                       coins02:zcoins01  Offline  Offline
    I am really at a loss here. Any help appreciated.
    Thanks

    My advice is to open a service call, provided you have a service contract with Oracle. There is much more information required to understand that specific configuration and to analyse the various log files. This is beyond what can be done in this forum.
    From your description I can guess that you want to failover a resource group between non-global zones. And it looks like the zone coins01:zcoins01 is reported to not be in cluster membership.
    Obviously node coins01 needs to be a cluster member. If it is reported as online and has joined the cluster, then you need to verify if the zone zcoins01 is really properly up and running.
    Specifically, you need to verify that it reached the multi-user milestone and that all cluster-related SMF services are running correctly (i.e. verify "svcs -x" in the non-global zone).
    You mention Veritas disk groups. Note that VxVM disk groups are handled at the global cluster level (i.e. in the global zone). A VxVM disk group is not imported for a non-global zone. However, with SUNW.HAStoragePlus you can ensure that file systems on top of VxVM disk groups are mounted into a non-global zone. But again, more information would be required to see how you configured things and why they don't work as you expect.
    Regards
    Thorsten
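    As a side note, the flattened `clresource status` listing posted above can be tabulated with a short script when you need to check many resources at once. This is just an illustrative sketch; the column heuristic (node names contain a colon, resource names do not) is an assumption based on the sample in this thread:

    ```python
    # Summarize `clresource status` output: count how many resource/node
    # pairs are Offline. SAMPLE is copied from the listing in the thread.
    SAMPLE = """\
    ftp-rs coins01:zftp01 Offline Offline
    coins02:zftp01 Offline Offline - LogicalHostname offline.
    xprcoins coins01:zcoins01 Offline Offline
    coins02:zcoins01 Offline Offline - LogicalHostname offline.
    xprcoins-rs coins01:zcoins01 Offline Offline
    coins02:zcoins01 Offline Offline - LogicalHostname offline.
    DataWatch-hasp-rs coins01:zcoins01 Offline Offline
    coins02:zcoins01 Offline Offline
    BDSarchive-res coins01:zcoins01 Offline Offline
    coins02:zcoins01 Offline Offline
    """

    def parse_status(text):
        """Return a list of (resource, node, state) tuples."""
        rows, current = [], None
        for line in text.splitlines():
            parts = line.split()
            if not parts:
                continue
            if ':' not in parts[0]:      # a new resource starts this line
                current = parts[0]
                parts = parts[1:]
            rows.append((current, parts[0], parts[1]))
        return rows

    rows = parse_status(SAMPLE)
    offline = [r for r in rows if r[2] == 'Offline']
    print(f"{len(offline)} of {len(rows)} resource/node pairs are Offline")
    ```

    When every pair shows Offline like this, the problem is usually above the resource layer (node or zone membership), which matches Thorsten's advice to check the zone state first.
    
    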

  • VIP does not fail over to surviving nodes in Oracle 11.2.0.2 Grid Infrastructure

    Hi ,
    It is an 8 node 11.2.0.2 Grid Infrastructure cluster.
    When we pull both cables from the public NIC, the VIP does not fail over to the surviving nodes on 2 of the nodes, but on the remaining nodes in the same cluster the VIP fails over correctly. Please help me with this.
    If we remove the power from these servers, the VIP does fail over to the surviving nodes.
    The public NICs are bonded.
    grdoradr105:/apps/grid/grdhome/sh:+ASM5> ./crsstat.sh |grep -i vip |grep -i 101
    ora.grdoradr101.vip ONLINE OFFLINE
    grdoradr101:/apps/grid/grdhome:+ASM1> cat /proc/net/bonding/bond0
    Ethernet Channel Bonding Driver: v3.4.0-1 (October 7, 2008)
    Bonding Mode: fault-tolerance (active-backup)
    Primary Slave: None
    Currently Active Slave: eth0
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 0
    Down Delay (ms): 0
    Slave Interface: eth0
    MII Status: up
    Speed: 100 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 84:2b:2b:51:3f:1e
    Slave Interface: eth1
    MII Status: up
    Speed: 100 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 84:2b:2b:51:3f:20
    Thanks
    Bala

    Please check the MOS note below for this issue:
    1276737.1
    HTH
    Edited by: krishan on Jul 28, 2011 2:49 AM
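    When comparing the bonding state across nodes, it can help to reduce each `/proc/net/bonding/bond0` file to the fields that matter (active slave, MII status, link failure counts). A small sketch, using the bonding driver's standard key/value format; the sample text is the output posted above:

    ```python
    # Summarize a /proc/net/bonding/bondX file: report the currently
    # active slave and per-slave details.
    SAMPLE = """\
    Ethernet Channel Bonding Driver: v3.4.0-1 (October 7, 2008)
    Bonding Mode: fault-tolerance (active-backup)
    Primary Slave: None
    Currently Active Slave: eth0
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 0
    Down Delay (ms): 0
    Slave Interface: eth0
    MII Status: up
    Speed: 100 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 84:2b:2b:51:3f:1e
    Slave Interface: eth1
    MII Status: up
    Speed: 100 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 84:2b:2b:51:3f:20
    """

    def summarize_bond(text):
        """Return (active_slave, {slave: {field: value}})."""
        active, slaves, cur = None, {}, None
        for line in text.splitlines():
            key, _, val = line.partition(':')
            key, val = key.strip(), val.strip()
            if key == 'Currently Active Slave':
                active = val
            elif key == 'Slave Interface':
                cur = val
                slaves[cur] = {}
            elif cur is not None:
                slaves[cur][key] = val
        return active, slaves

    active, slaves = summarize_bond(SAMPLE)
    print("active slave:", active)
    for name, info in slaves.items():
        print(name, "MII:", info['MII Status'],
              "failures:", info['Link Failure Count'])
    ```

    A link failure count of 0 on both slaves after pulling both cables would suggest MII monitoring never saw the link drop on that node, which is worth checking alongside the MOS note.
    
    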
