Fail over not reliable

          When my database fails over, my WebLogic 5.1 (SP10) cluster doesn't always reconnect
          when the DB comes back up. It works for a minute or so, then freezes,
          with no errors. I have tested the DB and it is fine, and if I restart my app servers
          they reconnect just fine. As it stands, I get about a 50% success rate on
          failover.
          I have the following settings for my connection pool:
          weblogic.jdbc.connectionPool.ejbPool=\
          url=jdbc20:weblogic:oracle,\
          driver=weblogic.jdbc20.oci.Driver,\
          loginDelaySecs=0,\
          initialCapacity=5,\
          maxCapacity=35,\
          capacityIncrement=2,\
          allowShrinking=true,\
          shrinkPeriodMins=5,\
          refreshTestMinutes=1,\
          testConnsOnReserve=true,\
          testConnsOnRelease=true,\
          testTable=dual,\
          props=user=ZZZ;password=YYY;server=XXX;
          ANY help would be appreciated.
          Thanks,
          Jacques
          

Have you tried resetting the connection pool manually when the database
          server fails over? You can do this via the weblogic.Admin Java program
          from the command line so it is scriptable. Typically, I recommend that
          you add this to your database fail-over scripts so that as soon as the
          database comes back up, the script invokes the command to reset the
          connection pool on each server...
          Raj Alagumalai wrote:
          > Jacques,
          >
          > The value that you have set for refreshTestMinutes is very low.
          >
          > > refreshTestMinutes=1,\
          >
          >
          >
          > This will cause the server to refresh every unused connection
          > every minute. I would suggest that you increase this value, enable
          > JDBC logging, and test failover again.
          >
          > Thanks
          >
          >
          > Raj Alagumalai
          > Developer Relations Engineer
          > BEA Support
          >
          >
          >
          > jacques Vigeant wrote:
          >
          >> When my database fails over, my weblogic 5.1 (sp10) cluster doesn't
          >> always reconnect
          >> when the DB comes back up. It just works for a minute or so, then
          >> freezes....
          >> no errors... I have tested the DB and it is fine, if I restart my
          >> app-servers
          >> they reconnect just fine. As it stands, i get about a 50% success
          >> ratio on the
          >> fail-over.
          >> I have the following setting for my connection Pool :
          >> weblogic.jdbc.connectionPool.ejbPool=\
          >> url=jdbc20:weblogic:oracle,\
          >> driver=weblogic.jdbc20.oci.Driver,\
          >> loginDelaySecs=0,\
          >> initialCapacity=5,\
          >> maxCapacity=35,\
          >> capacityIncrement=2,\
          >> allowShrinking=true,\
          >> shrinkPeriodMins=5,\
          >> refreshTestMinutes=1,\
          >> testConnsOnReserve=true,\
          >> testConnsOnRelease=true,\
          >> testTable=dual,\
          >> props=user=ZZZ;password=YYY;server=XXX;
          >>
          >>
          >> ANY help would be appreciated.
          >> Thanks,
          >> Jacques
          >>
          >
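The suggestion above, to reset the pool on each server from the database fail-over script, could be sketched roughly as follows. This is only an illustration written for a current JDK: the pool name `ejbPool` comes from the original post, while the server URLs, the `system` user and the class name `PoolResetter` are hypothetical. The argument order for `weblogic.Admin ... RESET_POOL` is an assumption based on the 5.1-era usage and should be checked against the documentation for your release.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PoolResetter {
    // Builds the command line for one server. The argument order
    // (URL, RESET_POOL, pool name, user, password) is an assumption;
    // verify it against the weblogic.Admin usage text of your release.
    static List<String> resetCommand(String url, String pool, String user, String password) {
        return new ArrayList<>(Arrays.asList(
                "java", "weblogic.Admin", url, "RESET_POOL", pool, user, password));
    }

    // Runs the reset against every server and returns the URLs that failed,
    // so the calling fail-over script can retry or raise an alert.
    static List<String> resetAll(List<String> urls, String pool, String user, String password)
            throws InterruptedException {
        List<String> failed = new ArrayList<>();
        for (String url : urls) {
            try {
                Process p = new ProcessBuilder(resetCommand(url, pool, user, password))
                        .inheritIO()
                        .start();
                if (p.waitFor() != 0) {
                    failed.add(url); // command ran but reported an error
                }
            } catch (IOException e) {
                failed.add(url); // command could not be launched at all
            }
        }
        return failed;
    }
}
```

Calling `PoolResetter.resetAll(...)` with the t3 URLs of every cluster member once the database is back up keeps the reset scriptable, and the returned list shows which servers still need attention.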
          

Similar Messages

  • Failover cluster server - File Server role is clustered - Shadow copies do not seem to travel to other node when failing over

    Hi,
    New to 2012 and implementing a clustered environment for our File Services role.  Have got to a point where I have successfully configured the Shadow copy settings.
    Have a large (15tb) disk.  S:
    Have a VSS drive (volume shadow copy drive) V:
    Have successfully configured through Windows Explorer the Shadow copy settings.
    Created dependencies in the Failover Cluster Server console whereby S: depends on V:
    However, when I fail over the resource and browse the Client Access Point share, there are no entries under the "Previous Versions" tab.
    When I visit the S: drive in Windows Explorer and open the Shadow Copies dialog box, there are entries showing the times and dates of the shadow copies that were run on the original node. So the disk knows about the shadow copies that were run on the original node, but the "Previous Versions" tab has no entries to display.
    This is on Server 2012 (NOT the R2 version).
    Can anyone explain what might be the reason? Do I have an "issue" or is this by design?
    All help appreciated!
    Kathy
    Kathleen Hayhurst Senior IT Support Analyst

    Hi,
    Please first check the requirements in the following article:
    Using Shadow Copies of Shared Folders in a server cluster
    http://technet.microsoft.com/en-us/library/cc779378(v=ws.10).aspx
    Cluster-managed shadow copies can only be created in a single quorum device cluster on a disk with a Physical Disk resource. In a single node cluster or majority node set cluster without a shared cluster disk, shadow copies can only be created and managed locally.
    You cannot enable Shadow Copies of Shared Folders for the quorum resource, although you can enable Shadow Copies of Shared Folders for a File Share resource.
    The recurring scheduled task that generates volume shadow copies must run on the same node that currently owns the storage volume.
    The cluster resource that manages the scheduled task must be able to fail over with the Physical Disk resource that manages the storage volume.

  • Failed over to an Async replica and now the previous primary replica (now secondary) is in NOT SYNC state

    Hello All,
    Here is my situation :
    3 nodes in an AG configuration, and it's a multi-site cluster. Sync commit between 2 nodes in one DC and async commit to a node in the DR DC.
    The AG was failed over to the async replica at the DR site; all the databases came up fine and the application can connect using the listener.
    When I checked the state of the secondary databases, they were in NOT SYNC mode. Data movement was suspended automatically. I can resume data movement to fix the problem, but I was curious why they would be in NOT SYNC mode.
    Thanks in advance.
    Thank you,
    Anup
    Anup | Database Consultant | Blog: www.sqlsailor.com

    Hello Anup,
    The reason this happens is that a forced failover has to be used when moving to an async replica. It causes all other replicas to become suspended, because it is never known whether data loss will occur.
    It might not make sense right now, but think about a situation where the databases are not synchronized and failover is forced (it has to work in all situations). There may be a good bit of data on the primary replica that has not yet made it (or has only partially made it) to the async secondary. It wouldn't make sense to negotiate the primary back down (after all, it's the async one) and undo valid transactions. Suspension also allows a database snapshot or another method to be used on the old sync primary for DR purposes, to get those valid transactions and data out.
    BOL Doc:
    http://msdn.microsoft.com/en-us/library/hh213151.aspx#ForcedFailover
    Sean Gallardy

  • Thin Client connection not failing over

    I'm using the following thin client connection and the sessions do not fail over. When I test with SQL*Plus, the sessions do fail over. One difference I see between the two connections is that the thin connection shows NONE for failover_method and failover_type, while the SQL*Plus connection shows BASIC for failover_method and SELECT for failover_type.
    Are there any known issues with the thin client? The version is 10.2.0.3.
    jdbc:oracle:thin:@(description=(address_list=(load_balance=YES)(address=(protocol=tcp)(host=crpu306-vip.wm.com)(port=1521))(address=(protocol=tcp)(host=crpu307-vip.wm.com)(port=1521)))(connect_data=(service_name=ocsqat02)(failover_mode=(type=select)(method=basic)(DELAY=5)(RETRIES=180))))

    You also have to include (FAILOVER=on) in the JDBC URL.
    http://download.oracle.com/docs/cd/B19306_01/network.102/b14212/advcfg.htm#sthref1292
    Example: TAF with Connect-Time Failover and Client Load Balancing
    Implement TAF with connect-time failover and client load balancing for multiple addresses. In the following example, Oracle Net connects randomly to one of the protocol addresses on sales1-server or sales2-server. If the instance fails after the connection, the TAF application fails over to the other node's listener, reserving any SELECT statements in progress.
    sales.us.acme.com=
     (DESCRIPTION=
      (LOAD_BALANCE=on)
      (FAILOVER=on)
      (ADDRESS=
       (PROTOCOL=tcp)
       (HOST=sales1-server)
       (PORT=1521))
      (ADDRESS=
       (PROTOCOL=tcp)
       (HOST=sales2-server)
       (PORT=1521))
      (CONNECT_DATA=
       (SERVICE_NAME=sales.us.acme.com)
       (FAILOVER_MODE=
        (TYPE=select)
        (METHOD=basic))))
    Example: TAF Retrying a Connection
    TAF also provides the ability to automatically retry connecting if the first connection attempt fails with the RETRIES and DELAY parameters. In the following example, Oracle Net tries to reconnect to the listener on sales1-server. If the failover connection fails, Oracle Net waits 15 seconds before trying to reconnect again. Oracle Net attempts to reconnect up to 20 times.
    sales.us.acme.com=
     (DESCRIPTION=
      (ADDRESS=
       (PROTOCOL=tcp)
       (HOST=sales1-server)
       (PORT=1521))
      (CONNECT_DATA=
       (SERVICE_NAME=sales.us.acme.com)
       (FAILOVER_MODE=
        (TYPE=select)
        (METHOD=basic)
        (RETRIES=20)
        (DELAY=15))))
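
As a rough illustration of the advice in this reply, the connect descriptor from the question can be assembled in code with (FAILOVER=on) added. This is a minimal sketch, not a tested configuration: the class name `TafUrlBuilder` and its method are hypothetical helpers, and the host names, service name, RETRIES and DELAY values in the usage note are the ones from the question. Whether the thin driver honours FAILOVER_MODE at all should be checked against the driver documentation, since TAF is generally described as an OCI feature.

```java
import java.util.List;

class TafUrlBuilder {
    // Assembles a JDBC URL whose connect descriptor enables connect-time
    // failover, client load balancing, and a FAILOVER_MODE clause.
    // Hypothetical helper for illustration only.
    static String buildUrl(List<String> hosts, int port, String service,
                           int retries, int delaySecs) {
        StringBuilder sb = new StringBuilder(
                "jdbc:oracle:thin:@(description=(load_balance=on)(failover=on)");
        for (String host : hosts) {
            // One address entry per listener host.
            sb.append("(address=(protocol=tcp)(host=").append(host)
              .append(")(port=").append(port).append("))");
        }
        sb.append("(connect_data=(service_name=").append(service).append(")")
          .append("(failover_mode=(type=select)(method=basic)(retries=")
          .append(retries)
          .append(")(delay=").append(delaySecs).append("))))");
        return sb.toString();
    }
}
```

For the configuration in the question this would be called as `TafUrlBuilder.buildUrl(List.of("crpu306-vip.wm.com", "crpu307-vip.wm.com"), 1521, "ocsqat02", 180, 5)`. Building the string in one place makes it harder to unbalance the parentheses when addresses are added or removed.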

  • Stateful bean not failing over

              I have a cluster of two servers and an Admin server. Both servers are running NT
              4 SP6 and WLS 6 SP1.
              When I stop one of the servers, the client doesn't automatically fail over to
              the other server; instead it fails, unable to contact the server that has failed.
              My bean is a stateful bean configured to have its home clusterable. My
              client holds onto the remote interface and makes calls through it. If Server
              B fails, it should automatically fail over to Server A.
              I have tested my multicast address and all seems to be working fine between servers;
              my stateless beans work well, load balancing between servers nicely.
              Does anybody have any ideas about what could be causing the stateful bean's
              remote interface not to provide failover info?
              Also, is it true that you can have only one JMS destination queue/topic per cluster? The
              JMS cluster targeting doesn't work at the moment, so do you need to deploy to individual
              servers?
              Thanks
              

    Did you enable stateful session bean replication in the
              weblogic-ejb-jar.xml?
              -- Rob
              Wayne Highland wrote:
              >
              > I have a cluster of two servers and a Admin server. Both servers are running NT
              > 4 sp6 and WLS6 sp1.
              > When I stop one of the servers, the client does n't automatically failover to
              > the other server, instead it fails unable to contact server that has failed.
              >
              > My bean is configured to have its home clusterable and is a stateful bean. My
              > client holds onto the remote interface, and makes calls through this. If Server
              > B fails then it should automatically fail over to server A.
              >
              > I have tested my multicast address and all seems to be working fine between servers,
              > my stateless bean work well, load balancing between servers nicely.
              >
              > Does anybody have any ideas, regarding what could be causing the stateful bean
              > remote interface not to be providing failover info.
              >
              > Also is it true that you can have only one JMS destination queue/topic per cluster..The
              > JMS cluster targeting doesn't work at the moment, so you need to deploy to individual
              > servers?
              >
              > Thanks
              Coming Soon: Building J2EE Applications & BEA WebLogic Server
              by Michael Girdley, Rob Woollen, and Sandra Emerson
              http://learnweblogic.com
              

  • NIC not failing Over in Cluster

    Hi there... I have configured a 2-node cluster with the SoFS role for VM clustering and HA, using Windows Server 2012 Datacenter. In the current setup, each host server has 3 NICs (2 with a default gateway set (192.x.x.x); the 3rd NIC is for heartbeat (10.x.x.x)). I have configured CSV (I can also see the shortcut in C:\). I am planning to set up a few VMs pointing to the disks in the 2 separate storage servers (1 NIC in 192.x.x.x, plus 2 NICs in the 10.x.x.x network). I am able to install a VM and point its disk to the share in cluster volume 1.
    I have created 2 VM switches for the 2 separate host servers (using Hyper-V Manager). When I test the functionality by taking Node 2 down, I can see the disk owner node changing to Node 1, but VM NIC 2 is not failing over automatically to VM NIC 1 (although I can see VM NIC 1 showing up unselected in the VM settings). When I go to VM Settings > Network Adapter, I get this error:
    An error occurred for resource VM "VM Name". Select the "information details" action to view events for this resource. The network adapter is configured to a switch which no longer exists or a resource pool that has been deleted or renamed (with a configuration error in the "Virtual Switch" drop-down menu).
    Can you please let me know of any resolution for this issue? Hoping to hear from you.
    VT

    Hi,
    From your description, “My another thing I would like to test is...I also would like to bring a disk down (right now, I have 2 disk - CSV and one Quorum disk) for that 2 node cluster. I was testing by bringing a csv disk down, the VM didnt failover”, are you trying to test the failover cluster now? If so, please refer to the following related KB:
    Test the Failover of a Clustered Service or Application
    http://technet.microsoft.com/en-us/library/cc754577.aspx
    Hope this helps.

  • Load balancing not happening, but failover is, for read-only entity beans

              The following are the configuration.
              Two NT servers with WL5.1 sp9 having only EJBs(Read only entity beans)
              One Client with WL5.1 sp9 having servlet/java application as
              EJB client.
              I am trying to make a call like findByPrimaryKey on one of the
              entity beans. I can see that the request is always directed to only one of the
              servers. When I bring that server down, failover to the other server happens.
              Here are the settings I have in the ejb-jar.xml :
                        <entity>
                             <ejb-name>device.StartHome</ejb-name>
                             <home>com.wl.api.device.StartHome</home>
                             <remote>com.wl.api.device.StartRemote</remote>
                             <ejb-class>com.wl.server.device.StartImpl</ejb-class>
                             <persistence-type>Bean</persistence-type>
                             <prim-key-class>java.lang.Long</prim-key-class>
                             <reentrant>False</reentrant>
                             <resource-ref>
                                  <res-ref-name>jdbc/wlPool</res-ref-name>
                                  <res-type>javax.sql.DataSource</res-type>
                                  <res-auth>Container</res-auth>
                             </resource-ref>
                        </entity>
              Here are the settings I have in the weblogic-ejb-jar.xml.
              <weblogic-enterprise-bean>
                        <ejb-name>device.StartHome</ejb-name>
                        <caching-descriptor>
                             <max-beans-in-cache>50</max-beans-in-cache>
                             <cache-strategy>Read-Only</cache-strategy>
                             <read-timeout-seconds>900</read-timeout-seconds>
                        </caching-descriptor>
                        <reference-descriptor>
                             <resource-description>
                                  <res-ref-name>jdbc/wlPool</res-ref-name>
                                  <jndi-name>weblogic.jdbc.pool.wlPool</jndi-name>
                             </resource-description>
                        </reference-descriptor>
                        <enable-call-by-reference>False</enable-call-by-reference>
                        <jndi-name>device.StartHome</jndi-name>
                   </weblogic-enterprise-bean>
              Am I making a mistake in this?
              Anyone's help is appreciated.
              Thanks
              Suresh
              

    we are using 5.1
              "Gene Chuang" <[email protected]> wrote in message
              news:[email protected]...
              > Colocation optimization occurs if your client resides in the same container (and also in the same
              > EAR for 6.0) as your ejbs.
              >
              > Gene
              >
              > "Suresh" <[email protected]> wrote in message
              news:[email protected]...
              > > Ok....the ejb-call-by-reference set to true is making the call go to one server
              > > only. I am not sure why that is. I removed the property and it works.
              > > Also I have one question: in our production environment, when I cache the EJB
              > > home it is not doing the load balancing. Can anyone help me with that?
              > > thanks
              > >
              > > Mike,
              > > From the sample pgm I sent, even from single client calls get load
              > > balanced.
              > >
              > > Suresh
              > >
              > >
              > > "Gene Chuang" <[email protected]> wrote in message
              > > news:[email protected]...
              > > > In WL, LoadBalancing will ONLY WORK if you reuse your EJBHome! Take your StartEndPointHome lookup
              > > > out of your for loop and see if this fixes your problem.
              > > >
              > > > I've seen this discussion in ejb-interest, and some other vendor (Borland, I believe it is) brings
              > > > up an interesting point: clustering and load balancing are not in the J2EE specs, hence implementation
              > > > is totally up to the vendor. Weblogic loadbalances from the remote interfaces (EJBObject, EJBHome,
              > > > etc.), while Borland loadbalances from JNDI Context lookup.
              > > >
              > > > Let me suggest a third implementation: loadbalance from BOTH Context lookup as well as stub method
              > > > invocation! Or create a smart replica-aware list manager which persists on the client thread
              > > > (ThreadLocal) and is aware of lookup/invocation history. Hence if I do the following in a client
              > > > hitting a 3 node cluster, I'll still get perfect round-robining regardless of what I do on the
              > > > client side:
              > > >
              > > > InitialContext ctxt = new InitialContext();
              > > > EJBHome myHome = ctxt.lookup(MY_BEAN);
              > > > myHome.findByPrimaryKey(pk); <== hits Node #1
              > > > myHome = ctxt.lookup(MY_BEAN);
              > > > myHome.findByPrimaryKey(pk); <== hits Node #2
              > > > myHome.findByPrimaryKey(pk); <== hits Node #3
              > > > myHome = ctxt.lookup(MY_BEAN);
              > > > myHome.findByPrimaryKey(pk); <== hits Node #1
              > > > ...
              > > >
              > > >
              > > > Gene
              > > >
              > > > "Suresh" <[email protected]> wrote in message
              > > news:[email protected]...
              > > > > Mike ,
              > > > >
              > > > > Do you have any reasons for the total number of machines to be 10.
              > > > >
              > > > > I tried with 7 machines.
              > > > >
              > > > >
              > > > > Here is my sample client java application running individual in the seven
              > > > > machines.
              > > > >
              > > > > StartEndPointHome =
              > > > > (StartEndPointHome)ctx.lookup("dev.StartEndPointHome");
              > > > > for(;;)
              > > > > {
              > > > > // logMsg(" --in loop "+currentTime);
              > > > > if (currentTime > nextRefereshTime)
              > > > > {
              > > > > logMsg("****- going to call");
              > > > > currentTime=getSystemTime();
              > > > > nextRefereshTime=currentTime+timeInterval;
              > > > > StartEndPointHome =
              > > > > (StartEndPointHome)ctx.lookup("dev.StartEndPointHome");
              > > > > long rndno=(long)(Math.random()*10)+range;
              > > > > logMsg(" going to call remotestub"+rndno);
              > > > > retVal = ((StartEndPointHome)getStartHome()).findByNumber("pe"+rndno+"_mportal_dsk36.mportal.com");
              > > > >
              > > > > logMsg("**++- called stub");
              > > > > }
              > > > >
              > > > >
              > > > >
              > > > > The range value is different for each of the machines in the cluster.
              > > > >
              > > > > If the first request starts at srv1, all request starts hitting the same
              > > > > server.
              > > > > If the first request starts at srv2, all request starts hitting the same
              > > > > server.
              > > > >
              > > > > I have the following for the url , user and pwd values for the context.
              > > > >
              > > > > public static String url="t3://10.11.12.14,10.11.12.117:8000";
              > > > > public static String user="guest";
              > > > > public static String password="guest";
              > > > >
              > > > >
              > > > >
              > > > > It would be great if you could help me.
              > > > >
              > > > > Thanks
              > > > > suresh
              > > > >
              > > > >
              > > > > "Mike Reiche" <[email protected]> wrote in message
              > > > > news:[email protected]...
              > > > > >
              > > > > > If you have only one client don't be surprised if you only hit one server.
              > > > > > Try running ten different clients and see if they hit the same server.
              > > > > >
              > > > > > Mike
              > > > > >
              > > > > >
              > > > > > "suresh" <[email protected]> wrote:
              > > > > > >
              > > > > > >The following are the configuration.
              > > > > > >
              > > > > > > Two NT servers with WL5.1 sp9 having only EJBs(Read only entity beans)
              > > > > > >
              > > > > > > One Client with WL5.1 sp9 having servlet/java application as
              > > > > > > EJB client.
              > > > > > >
              > > > > > >
              > > > > > >I am trying to make a call like findbyprimarykey in one of the
              > > > > > >entity bean. I could see the request is being directed only to the one
              > > > > > >of the
              > > > > > >server always. When I bring that server, fail over is happening to the
              > > > > > >other server.
              > > > > > >
              > > > > > >
              > > > > > >Here are the settings I have in the ejb-jar.xml :
              > > > > > > <entity>
              > > > > > > <ejb-name>device.StartHome</ejb-name>
              > > > > > > <home>com.wl.api.device.StartHome</home>
              > > > > > > <remote>com.wl.api.device.StartRemote</remote>
              > > > > > > <ejb-class>com.wl.server.device.StartImpl</ejb-class>
              > > > > > > <persistence-type>Bean</persistence-type>
              > > > > > > <prim-key-class>java.lang.Long</prim-key-class>
              > > > > > > <reentrant>False</reentrant>
              > > > > > > <resource-ref>
              > > > > > > <res-ref-name>jdbc/wlPool</res-ref-name>
              > > > > > > <res-type>javax.sql.DataSource</res-type>
              > > > > > > <res-auth>Container</res-auth>
              > > > > > > </resource-ref>
              > > > > > > </entity>
              > > > > > >
              > > > > > >
              > > > > > >Here are the settings I have in the weblogic-ejb-jar.xml.
              > > > > > >
              > > > > > ><weblogic-enterprise-bean>
              > > > > > > <ejb-name>device.StartHome</ejb-name>
              > > > > > >
              > > > > > > <caching-descriptor>
              > > > > > > <max-beans-in-cache>50</max-beans-in-cache>
              > > > > > > <cache-strategy>Read-Only</cache-strategy>
              > > > > > > <read-timeout-seconds>900</read-timeout-seconds>
              > > > > > > </caching-descriptor>
              > > > > > >
              > > > > > > <reference-descriptor>
              > > > > > > <resource-description>
              > > > > > > <res-ref-name>jdbc/wlPool</res-ref-name>
              > > > > > > <jndi-name>weblogic.jdbc.pool.wlPool</jndi-name>
              > > > > > > </resource-description>
              > > > > > > </reference-descriptor>
              > > > > > > <enable-call-by-reference>False</enable-call-by-reference>
              > > > > > > <jndi-name>device.StartHome</jndi-name>
              > > > > > > </weblogic-enterprise-bean>
              > > > > > >
              > > > > > >
              > > > > > >Am I doin any mistake in this?
              > > > > > >
              > > > > > >Any one's help is appreciated.
              > > > > > >Thanks
              > > > > > >Suresh
              > > > > >
              > > > >
              > > > >
              > > >
              > > >
              > >
              > >
              >
              >
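
The advice quoted in this thread (look the home up once and reuse it, rather than looking it up inside the loop) can be illustrated with a toy model. Nothing here is WebLogic code: `HomeStub` and `lookup` are hypothetical stand-ins, and the model simply assumes that each fresh lookup returns a stub that starts at the same node while a reused stub advances round-robin on every call. That assumption matches what the poster observed, but it is not a description of the real stub's algorithm.

```java
import java.util.ArrayList;
import java.util.List;

class RoundRobinDemo {
    // Toy stand-in for a replica-aware home stub (not a WebLogic class).
    static class HomeStub {
        private final List<String> nodes;
        private int next;

        HomeStub(List<String> nodes, int start) {
            this.nodes = nodes;
            this.next = start;
        }

        // Returns the node that serves this call, then advances round-robin.
        String invoke() {
            String node = nodes.get(next);
            next = (next + 1) % nodes.size();
            return node;
        }
    }

    // Toy stand-in for ctx.lookup(): ASSUMES every lookup yields a fresh
    // stub that starts at the first node.
    static HomeStub lookup(List<String> nodes) {
        return new HomeStub(nodes, 0);
    }

    // Look up once, reuse the home for every call.
    static List<String> callsWithCachedHome(List<String> nodes, int calls) {
        HomeStub home = lookup(nodes);
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            hits.add(home.invoke());
        }
        return hits;
    }

    // Look up again before every call, as in the poster's loop.
    static List<String> callsWithFreshLookup(List<String> nodes, int calls) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            hits.add(lookup(nodes).invoke());
        }
        return hits;
    }
}
```

In this model, four calls against two nodes alternate between the nodes when the home is cached, and all land on the first node when the lookup is repeated each time.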
              

  • GSLB Zone-Based DNS Payment Gw - Config Active-Active: Not Failing Over

    Hello All:
    Currently having a bit of a problem, have exhausted all resources and brain power dwindling.
    Brief:
    Two geographically diverse sites. Different AS's, different front ends. Migrated from one site with two CSS 11506's to two sites with one 11506 each.
    Flow of connection is as follows:
    Client --> FW Public Destination NAT --> CSS Private content VIP/destination NAT --> server/service --> CSS Source VIP/NAT --> FW Public Source NAT --> client.
    Using Load Balancers as DNS servers, authoritative for zones due to the requirement for second level Domain DNS load balancing (i.e xxxx.com, AND FQDNs www.xxxx.com). Thus, CSS is configured to respond as authoritative for xxxx.com, www.xxxx.com, postxx.xxxx.com, tmx.xxxx.com, etc..., but of course cannot do MX records, so is also configured with dns-forwarders which consequently were the original DNS servers for the domains. Those DNS servers have had their zone files changed to reflect that the new DNS servers are in fact the CSS'. Domain records (i.e. NS records in the zone file), and the records at the registrar (i.e. tucows, which I believe resells .com, .net and .org for netsol) have been changed to reflect the same. That part of the equation has already been tested and is true to DNS Workings. The reason for the forwarders is of course for things such as non load balanced Domain Names, as well as MX records, etc...
    Due to design, which unfortunately cannot be changed, dns-record configuration uses kal-ap, example:
    dns-record a www.xxxx.com 0 111.222.333.444 multiple kal-ap 10.xx.1.xx 254 sticky-enabled weightedrr 10
    So, to explain so we're absolutely clear:
    - 111.222.333.444 is the public address returned to the client.
    - multiple is configured so we return both site addresses for redundancy (unless I'm misunderstanding that configuration option)
    - kal-ap and the 10.xx.1.xx address because due to the configuration we have no other way of knowing the content rule/service is down and to stop advertising the address for said server/rule
    - sticky-enabled because we don't want to lose a payment and have it go through twice or something crazy like that
    - weightedrr 10 (and on the other side weightedrr 1) because we want to keep most of the traffic on the site that is closer to where the bulk of the clients are
    So, now, the problem is that the clients (i.e. something like an Interac machine, RFID tags...) need to be able to fail over almost instantly to either of the sites should one lose connectivity and/or servers/services. However, this does not happen. The CSS changes its advertisement, and this has been confirmed by running nslookups/digs directly against the CSSs; however, the client does not recognize this and ends up returning a "DNS Error/Page not found".
    Thinking this may have something to do with the "sticky-enabled" and/or the fact that DNS doesn't necessarily react very well to a TTL of "0".
    Any thoughts... comments... suggestions... experiences???
    Much appreciated in advance for any responses!!!
    Oh... should probably add:
    nslookups to some DNS servers consistently - ALWAYS the same ones - take 3 lookups before getting a reply. Other DNS servers are instant....
    Cheers,
    Ben Shellrude
    Sr. Network Analyst
    MTS AllStream Inc

    Hi Ben,
    if I understand your posting correctly, the CSSes are doing their job and do advertise the correct IP for a DNS query, right?
    If some of your clients are having a problem, this might be related to DNS caching. Some clients cache the DNS response and do not refresh it until a request fails or the cache timeout expires.
    Even worse, if the request fails you sometimes have to restart the client's DNS daemon so that it requests IP addresses from scratch. I had this issue with some Unix boxes. If I remember correctly, you can configure the DNS behaviour on Unix boxes and forbid them to cache DNS responses.
    Kind Regards,
    joerg
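
If any of the affected clients happens to be a Java program, the client-side caching described in this reply can be limited from inside the JVM. The sketch below is an assumption about such a client, not something stated in the thread: `DnsCacheSettings` is a hypothetical helper, while `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` are the standard Java security properties that control how long successful and failed lookups are cached. They should be set before the first name lookup takes place.

```java
import java.security.Security;

class DnsCacheSettings {
    // Limits how long the JVM caches DNS answers, in seconds.
    // A value of 0 disables caching; call this early in startup,
    // before any host name has been resolved.
    static void limitDnsCaching(int positiveTtlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }
}
```

Calling `DnsCacheSettings.limitDnsCaching(0, 0)` makes the JVM ask the resolver on every lookup, so a changed advertisement is picked up as soon as the upstream resolvers return it, at the cost of more DNS traffic.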

  • BGP in Dual Homing setup not failing over correctly

    Hi all,
    we have dual-homed BGP connections to our sister company's network, but failover testing is failing.
    If I shut down the WAN interface on the primary router, after about 5 minutes everything converges and fails over fine.
    But if I shut down the LAN interface on the primary router, we never regain connectivity to the sister network.
    Our two ASRs have an iBGP relationship, and I can see that after a certain amount of time the BGP routes with a next hop of the primary router get flushed from BGP and the preferred exit path is through the secondary router. This bit works OK, but I believe that the return traffic is still attempting to return over the primary link...
    To add to this, we have two inline firewalls on each link which are only performing IPS, no packet filtering.
    Any pointers would be great.
    thanks
    Mario                

    Hi John,
    right... please look at the output below which is the partial BGP table during a link failure...
    10.128.0.0/9 is the problematic summary that still keeps getting advertised out when we do not want it to during a failure....
    now there are prefixes in the BGP table which fall within that large summary address space. But I am sure that they are all routes that are being advertised to us from the eBGP peer...
    *> 10.128.0.0/9     0.0.0.0                            32768 i
    s> 10.128.56.16/32  172.17.17.241                 150      0 2856 64619 i
    s> 10.128.56.140/32 172.17.17.241                 150      0 2856 64619 i
    s> 10.160.0.0/21    172.17.17.241                 150      0 2856 64611 i
    s> 10.160.14.0/24   172.17.17.241                 150      0 2856 64611 i
    s> 10.160.16.0/24   172.17.17.241                 150      0 2856 64611 i
    s> 10.200.16.8/30   172.17.17.241                 150      0 2856 65008 ?
    s> 10.200.16.12/30  172.17.17.241                 150      0 2856 65006 ?
    s> 10.255.245.0/24  172.17.17.241                 150      0 2856 64548 ?
    s> 10.255.253.4/32  172.17.17.241                 150      0 2856 64548 ?
    s> 10.255.253.10/32 172.17.17.241                 150      0 2856 64548 ?
    s> 10.255.255.8/30  172.17.17.241                 150      0 2856 6670 ?
    s> 10.255.255.10/32 172.17.17.241                 150      0 2856 ?
    s> 10.255.255.12/30 172.17.17.241                 150      0 2856 6670 ?
    s> 10.255.255.14/32 172.17.17.241                 150      0 2856 ?
    i would not expect summary addresses to still be advertised if the specific prefixes are coming from eBGP... am i wrong?
    thanks for everything so far...
    Mario De Rosa
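
    A quick way to check which of the suppressed prefixes fall inside the summary is Python's standard ipaddress module. The prefix list is copied from the table above. How the summary is generated is not shown in the thread, so this sketch only checks containment and says nothing about the router configuration.

```python
import ipaddress

summary = ipaddress.ip_network("10.128.0.0/9")

# Suppressed (s>) prefixes copied from the partial BGP table above.
specifics = [
    "10.128.56.16/32", "10.128.56.140/32", "10.160.0.0/21",
    "10.160.14.0/24", "10.160.16.0/24", "10.200.16.8/30",
    "10.200.16.12/30", "10.255.245.0/24", "10.255.253.4/32",
    "10.255.253.10/32", "10.255.255.8/30", "10.255.255.10/32",
    "10.255.255.12/30", "10.255.255.14/32",
]

# Keep every prefix that is covered by the summary range.
contributors = [
    p for p in specifics
    if ipaddress.ip_network(p).subnet_of(summary)
]

print(f"{len(contributors)} of {len(specifics)} prefixes fall inside {summary}")
```

    All 14 listed prefixes fall inside 10.128.0.0/9. If the summary is tied to the presence of more-specific routes in the BGP table (an assumption, since the configuration is not posted), any one of them could be enough to keep it advertised.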

  • Why DML not failed over in TAF??

    Hi,
    I have an OLTP application running on a 2-node 10gR2 RAC (10.2.0.3) on AIX 5.3L ML 8. I have configured TAF here for SESSION failover. I would like to know two things from you all:
    1) Since each instance is able to read the other instance's undo tablespace data and redo log, why is TAF not able to fail over DML transactions?
    2) As of now, is there any way to fail over DML other than catching the error thrown back to the application and re-executing the query? Is it possible in 11gR1?
    I would be grateful if you could spare your valuable time to answer this.
    Thanks and Regards,
    Vijay Shanker

    Re: Failover DML on RAC
    The reason is transaction processing and its implications.
    Imagine that you updated a row, then waited idly, then some other session wanted that same row and waited for you to either rollback or commit.
    You failed.
    Automatically, Oracle will rollback your transaction and release all your locks.
    What should the other session do: wait to see that maybe you have TAF or FCF and will reconnect and rerun your uncommitted DML, or should it proceed with its own work?
    Failed session rollback currently happens regardless of whether you or anybody else have TAF, FCF, or even whether you have RAC.
    But in order for you to be able to replay your DML safely after reconnect, that transaction rollback would have to be prevented, and your new failed-over session would have to magically re-attach to the failed session's transaction.
    Maybe some day Oracle will implement something like that, but it's not easy, and Oracle leaves it up to the application to decide what to do (TAF-specific error codes).
    On the other hand, replaying selects is fairly easy: re-executing the query (with scn as of the originally failed cursor to ensure read-consistency) and re-fetching up to the point of last fetch.
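
    The "leave it up to the application" approach can be sketched as a small retry wrapper in Python. This is a minimal illustration, not a driver API: FailoverError, FakeConn and run_transaction are hypothetical names, and the two error codes (ORA-25402 "transaction must roll back" and ORA-25408 "can not safely replay call") are only examples of codes an application might choose to treat as retryable. The point is that the whole transaction is rolled back and re-run from its first statement, never resumed halfway.

```python
class FailoverError(Exception):
    """Stand-in for a driver error raised after the session failed over."""

    def __init__(self, code):
        super().__init__(f"ORA-{code}")
        self.code = code


# Illustrative only: codes this application treats as "roll back and retry".
RETRYABLE = {25402, 25408}


def run_transaction(conn, work, max_attempts=3):
    """Run work(conn) as one transaction, replaying it from the start on failover."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = work(conn)
            conn.commit()
            return result
        except FailoverError as exc:
            conn.rollback()  # discard any partial work before replaying
            if exc.code not in RETRYABLE or attempt == max_attempts:
                raise


class FakeConn:
    """In-memory connection that fails on chosen execute() calls (for the demo)."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)
        self.calls = 0
        self.pending = []
        self.rows = []

    def execute(self, statement):
        self.calls += 1
        if self.calls in self.fail_on:
            raise FailoverError(25408)
        self.pending.append(statement)

    def commit(self):
        self.rows.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


def transfer(conn):
    conn.execute("debit A 10")
    conn.execute("credit B 10")
    return "done"


conn = FakeConn(fail_on={2})          # the second statement hits a failover
outcome = run_transaction(conn, transfer)
```

    In the demo the failure arrives after the debit but before the credit. Because the wrapper rolls back and replays the whole unit of work, the committed rows contain exactly one debit and one credit.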

  • BEA WebLogic 6.1 does not detect Oracle database fail over

    Hi, we had Concurrency Strategy: Exclusive. We have now changed it to Database for performance
    reasons. Since that change, when we do an Oracle database fail over, WebLogic
    6.1 does not detect the database fail over and needs a restart.
    How can we resolve this?

    mt wrote:
    > Hi, we had Concurrency Strategy: Exclusive. We have now changed it to Database for performance
    > reasons. Since that change, when we do an Oracle database fail over, WebLogic
    > 6.1 does not detect the database fail over and needs a restart.
    > How can we resolve this?
    Are your pools set to test connections at reserve time?
    Joe
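
    What "test connections at reserve time" buys you can be sketched in a few lines of Python. This is a toy pool, not WebLogic's implementation: CheckedPool, connect and is_alive are hypothetical names, and the "generation" counter merely simulates a database that has failed over. In the WebLogic 5.1 properties format quoted at the top of this page, the corresponding settings are testConnsOnReserve together with a testTable.

```python
from collections import deque


class CheckedPool:
    """Toy connection pool that validates each connection before handing it out."""

    def __init__(self, connect, is_alive, size=2):
        self._connect = connect      # callable that opens a new connection
        self._is_alive = is_alive    # callable that runs the cheap test query
        self._idle = deque(connect() for _ in range(size))

    def reserve(self):
        # Discard pooled connections that fail the test instead of returning them.
        while self._idle:
            conn = self._idle.popleft()
            if self._is_alive(conn):
                return conn
        # Nothing usable left in the pool: open a fresh connection.
        return self._connect()

    def release(self, conn):
        self._idle.append(conn)


# Simulated database: connections opened before a failover are dead afterwards.
db_generation = [1]


def connect():
    return {"gen": db_generation[0]}


def is_alive(conn):
    return conn["gen"] == db_generation[0]


pool = CheckedPool(connect, is_alive, size=2)
db_generation[0] = 2            # the database fails over
conn = pool.reserve()           # stale connections are dropped, a new one is opened
```

    Without the test in reserve(), the caller would receive one of the two stale connections and only find out when its first statement failed.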

  • VIP is not failed over to surviving nodes in oracle 11.2.0.2 grid infra

    Hi ,
    It is an 8-node 11.2.0.2 grid infrastructure.
    When pulling both cables from the public NIC, the VIP does not fail over to the surviving nodes on 2 of the nodes, but on the remaining nodes the VIP does fail over to a surviving node in the same cluster. Please help me with this.
    If we remove the power from these servers, the VIP fails over to the surviving nodes.
    The public NICs are bonded.
    grdoradr105:/apps/grid/grdhome/sh:+ASM5> ./crsstat.sh |grep -i vip |grep -i 101
    ora.grdoradr101.vip ONLINE OFFLINE
    grdoradr101:/apps/grid/grdhome:+ASM1> cat /proc/net/bonding/bond0
    Ethernet Channel Bonding Driver: v3.4.0-1 (October 7, 2008)
    Bonding Mode: fault-tolerance (active-backup)
    Primary Slave: None
    Currently Active Slave: eth0
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 0
    Down Delay (ms): 0
    Slave Interface: eth0
    MII Status: up
    Speed: 100 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 84:2b:2b:51:3f:1e
    Slave Interface: eth1
    MII Status: up
    Speed: 100 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 84:2b:2b:51:3f:20
    Thanks
    Bala

    Please check below MOS note for this issue.
    1276737.1
    HTH
    Edited by: krishan on Jul 28, 2011 2:49 AM

  • 3 node cluster with 1 vInstance. vInstance cannot fail over to one specific node.

    I have a 3 node cluster all running Windows Server 2008 R2. Roughly once a month I see my vInstance become degraded and attempt to fail over. Everything is good as long as it fails over to SQL01 or SQL02. However, if it attempts to fail over to SQL03, it does not come online.
    The quick resolution is to move it manually to SQL01 or SQL02. What could be causing it to fail every time on SQL03?
    A couple points:
    I did not build the environment.
    I am not a DBA.
    I only have general knowledge of SQL clustering.
    I always get two events. First, EVENT ID 1069:
    Cluster resource 'SQL Server (VSQL04)' in clustered service or application 'SQL Server (VSQL04)' failed.
    and then
    EVENT ID 1205
    The Cluster service failed to bring clustered service or application 'SQL Server (VSQL04)' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
    Where should I begin to look for issues?

    Here is the cluster event prior to offline state. I will have to go check the cluster log.
    The Cluster service failed to bring clustered service or application 'SQL Server (VSQL04)' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
    I do not think this helps. It just says that a resource is in an offline state. You need to dig further to see which resource it is and why it did not come back online. It should be mentioned in the cluster log and/or the Event Viewer.
    Hope it Helps!!

  • Problems with Oracle FailSafe - Primary node not failing over the DB to the

    I am using 11.1.0.7 on a Windows 64-bit OS, with two nodes clustered at the OS level. The cluster is working fine at the Windows level and the shared drive fails over. However, the database does not fail over when the primary node is shut down or restarted.
    The Oracle software is on a local drive on each box. The Oracle DB files and logs are on a shared drive.

    Is the database listed in your cluster group that you are failing over?

  • Fail over is not happening in Weblogic JSP Server

    Hi..
    We have 6 Weblogic instances running as application server (EJB) and 4 Weblogic
    instances running as web server (JSP). We have configured one cluster for EJB
    servers and one cluster for JSP servers. In front-end we are using four Apache
    servers to proxy the request to Weblogic JSP cluster. In my httpd.conf file I
    have configured with the Weblogic cluster. I can see the requests are going in
    all the servers and believe the cluster is working fine in terms of load balancing
    (round-robin). The clients are accessing the servers using CSS (Cisco Load Balancer).
    But when we test the fail-over in the cluster, we are facing problems. Let me
    explain the scenarios of the fail-over test:
    1. The load was generated by the load generator.
    2. While under load, we shut down one Apache server. There were some failed
    transactions, but the servers became stable immediately, so fail-over is
    happening at this stage.
    3. When I shut down one EJB instance, the transactions again became stable
    after some failed transactions.
    4. But when I shut down one JSP instance, transactions failed immediately,
    requests did not fail over to another JSP server, and the number of failed
    transactions increased.
    So I guess there is some problem in the proxy plug-in configuration, so that
    when I shut down one JSP server, requests are still being sent to that JSP server
    by the Apache proxy plug-in.
    I have read various queries posted in the newsgroups and found some information
    about configuring session and cookie information in the weblogic.xml file. However,
    I'm not sure which configuration needs to be done in the weblogic.xml
    and httpd.conf files. Kindly help me to resolve the problem. I would appreciate
    your response.
    ===============================================================
    My httpd.conf file plug-in configuration:
    ### WebLogic Proxy Directives. If proxying to a WebLogic Cluster see WebLogic Documentation.
    <IfModule mod_weblogic.c>
    WebLogicCluster X.X.X.X1:7001,X.X.X.X2:7001,X.X.X.X3:7001,X.X.X.X4:7001
    MatchExpression *.jsp
    </IfModule>
    <Location /apollo>
    SetHandler weblogic-handler
    DynamicServerList ON
    HungServerRecoverSecs 600
    ConnectTimeoutSecs 40
    ConnectRetrySecs 2
    </Location>
    ==============================================================
    Thanks in advance,
    Siva.

    Hi,
    I can see that bug 13703600 is already fixed in 12.1.2. If you still have the same problem, please raise a ticket with Oracle Support.
    Regards,
    Kal
