Fail over not reliable

          When my database fails over, my WebLogic 5.1 (SP10) cluster doesn't always reconnect
          when the DB comes back up. It works for a minute or so, then freezes, with
          no errors. I have tested the DB and it is fine; if I restart my app servers,
          they reconnect just fine. As it stands, I get about a 50% success rate on
          failover.
          I have the following settings for my connection pool:
          weblogic.jdbc.connectionPool.ejbPool=\
          url=jdbc20:weblogic:oracle,\
          driver=weblogic.jdbc20.oci.Driver,\
          loginDelaySecs=0,\
          initialCapacity=5,\
          maxCapacity=35,\
          capacityIncrement=2,\
          allowShrinking=true,\
          shrinkPeriodMins=5,\
          refreshTestMinutes=1,\
          testConnsOnReserve=true,\
          testConnsOnRelease=true,\
          testTable=dual,\
          props=user=ZZZ;password=YYY;server=XXX;
          ANY help would be appreciated.
          Thanks,
          Jacques

Have you tried resetting the connection pool manually when the database
          server fails over? You can do this via the weblogic.Admin Java program
          from the command line, so it is scriptable. Typically, I recommend that
          you add this to your database failover scripts so that, as soon as the
          database comes back up, the script invokes the command to reset the
          connection pool on each server.
          Raj Alagumalai wrote:
          > Jacques,
          >
          > The value that you have set for refreshTestMinutes is very low.
          >
          > > refreshTestMinutes=1,\
          >
          >
          >
          > This will cause the server to refresh every connection that is not in use
          > every minute. I would suggest that you increase this value, enable
          > JDBC logging, and test failover again.
          >
          > Thanks
          >
          >
          > Raj Alagumalai
          > Developer Relations Engineer
          > BEA Support
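Here is what `testConnsOnReserve=true` and `testTable=dual` buy after a failover, shown as a minimal sketch of reserve-time testing. Before a pooled connection is handed out, a trivial query runs against the test table. A connection that fails the test is closed and replaced with a new one. This is an illustration, not WebLogic's implementation. `TestOnReservePool`, `reserve`, `release`, and `jdbcTest` are hypothetical names. The pool is generic over the connection type so its logic can be exercised without a database.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch: a pool that tests each idle connection when it is reserved.
final class TestOnReservePool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory;   // opens a brand-new connection
    private final Predicate<C> isUsable; // e.g. a query against the test table
    private final Consumer<C> closer;    // disposes of a connection that failed its test

    TestOnReservePool(Supplier<C> factory, Predicate<C> isUsable, Consumer<C> closer) {
        this.factory = factory;
        this.isUsable = isUsable;
        this.closer = closer;
    }

    // Return an idle connection that passes its test; discard stale ones along the way.
    synchronized C reserve() {
        while (!idle.isEmpty()) {
            C candidate = idle.pop();
            if (isUsable.test(candidate)) {
                return candidate;
            }
            closer.accept(candidate); // e.g. broken by a database failover
        }
        return factory.get(); // no healthy idle connection left: open a new one
    }

    synchronized void release(C connection) {
        idle.push(connection);
    }

    synchronized int idleCount() {
        return idle.size();
    }

    // JDBC test in the spirit of testTable=dual: any SQLException marks the connection unusable.
    static Predicate<Connection> jdbcTest(String testTable) {
        return conn -> {
            try (Statement st = conn.createStatement()) {
                st.execute("SELECT 1 FROM " + testTable);
                return true;
            } catch (SQLException e) {
                return false;
            }
        };
    }
}
```

To use this with JDBC:

- `C` would be `java.sql.Connection`.
- The factory would wrap `DriverManager.getConnection(...)` and rethrow its `SQLException` unchecked.
- The test would be `TestOnReservePool.jdbcTest("dual")`.
- The closer would call `close()`.

The point is that a connection broken by the failover gets noticed at reserve time, not inside application code.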

Similar Messages

Failover cluster server - File Server role is clustered - Shadow copies do not seem to travel to the other node when failing over

Hi,
I'm new to Server 2012 and am implementing a clustered environment for our File Services role. I have got to the point where I have successfully configured the shadow copy settings.
I have a large (15 TB) disk, S:.
I have a VSS drive (volume shadow copy drive), V:.
I have successfully configured the shadow copy settings through Windows Explorer.
I created dependencies in the Failover Cluster Server console whereby S: depends on V:.
However, when I fail over the resource and browse the Client Access Point share, there are no entries under the "Previous Versions" tab.
When I visit the S: drive in Windows Explorer and open the Shadow Copies dialog box, there are entries showing the times and dates of the shadow copies that were run on the original node. So the disk knows about the shadow copies that were run on the original node, but the "Previous Versions" tab has no entries to display.
This is on Server 2012 (NOT the R2 version).
Can anyone explain what the reason might be? Do I have an "issue", or is this by design?
All help appreciated!
Kathy
Kathleen Hayhurst Senior IT Support Analyst

Hi,
Please first check the requirements in the following article:
Using Shadow Copies of Shared Folders in a server cluster
http://technet.microsoft.com/en-us/library/cc779378(v=ws.10).aspx
Cluster-managed shadow copies can only be created in a single quorum device cluster on a disk with a Physical Disk resource. In a single node cluster or majority node set cluster without a shared cluster disk, shadow copies can only be created and managed locally.
You cannot enable Shadow Copies of Shared Folders for the quorum resource, although you can enable Shadow Copies of Shared Folders for a File Share resource.
The recurring scheduled task that generates volume shadow copies must run on the same node that currently owns the storage volume.
The cluster resource that manages the scheduled task must be able to fail over with the Physical Disk resource that manages the storage volume.

Failed over to an Async Replica and now the previous primary replica (now secondary) is in NOT SYNC state

Hello All,
Here is my situation :
3 nodes in an AG configuration, and it's a multi-site cluster: synchronous commit between 2 nodes in one DC and asynchronous commit to a node in the DR DC.
The AG is failed over to the async replica in the DR site, all the databases come up fine, and the application can connect using the listener.
When I check the state of the secondary databases, they are in NOT SYNC mode, and data movement is suspended automatically. I can resume data movement to fix the problem, but I was curious why they end up in NOT SYNC mode.
Thanks in advance.
Thank you,
Anup
Anup | Database Consultant | Blog: www.sqlsailor.com | Twitter: https://twitter.com/#!/AnupWarrier

Hello Anup,
This happens because of the forced failover you have to use when moving to an async replica. A forced failover causes all other replicas to be suspended, because it is never known whether data loss has occurred.
It might not make sense right now, but think about a situation where the databases are not synchronized and failover is forced (it has to work in all situations). There may be a good bit of data on the primary replica that has not yet made it (or has only partially made it) to the async secondary. It wouldn't make sense to negotiate the primary back down (after all, it's the async one) and undo valid transactions. Suspending also allows a database snapshot or another method to be used on the old sync primary, which could be used for DR purposes to get those valid transactions and data out.
BOL Doc:
http://msdn.microsoft.com/en-us/library/hh213151.aspx#ForcedFailover
Sean Gallardy | Blog | Twitter

Thin Client connection not failing over

I'm using the following thin client connection, and the sessions do not fail over. When I test with SQL*Plus, the sessions do fail over. The two connections differ in their failover settings:
- The thin connection shows NONE for failover_method and failover_type.
- The SQL*Plus connection shows BASIC for failover_method and SELECT for failover_type.
Are there any known issues with the thin client? The version is 10.2.0.3.
jdbc:oracle:thin:@(description=(address_list=(load_balance=YES)(address=(protocol=tcp)(host=crpu306-vip.wm.com)(port=1521))(address=(protocol=tcp)(host=crpu307-vip.wm.com)(port=1521)))(connect_data=(service_name=ocsqat02)(failover_mode=(type=select)(method=basic)(DELAY=5)(RETRIES=180))))

You have to use (FAILOVER=on) in the JDBC URL as well.
http://download.oracle.com/docs/cd/B19306_01/network.102/b14212/advcfg.htm#sthref1292
Example: TAF with Connect-Time Failover and Client Load Balancing
Implement TAF with connect-time failover and client load balancing for multiple addresses. In the following example, Oracle Net connects randomly to one of the protocol addresses on sales1-server or sales2-server. If the instance fails after the connection, the TAF application fails over to the other node's listener, preserving any SELECT statements in progress.
sales.us.acme.com=
 (DESCRIPTION=
   (LOAD_BALANCE=on)
   (FAILOVER=on)
   (ADDRESS=
     (PROTOCOL=tcp)
     (HOST=sales1-server)
     (PORT=1521))
   (ADDRESS=
     (PROTOCOL=tcp)
     (HOST=sales2-server)
     (PORT=1521))
   (CONNECT_DATA=
     (SERVICE_NAME=sales.us.acme.com)
     (FAILOVER_MODE=
       (TYPE=select)
       (METHOD=basic))))
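Building the descriptor in code makes it harder to forget `(FAILOVER=on)` when the same string goes into a JDBC URL. Here is a minimal sketch. `TafUrl` and `build` are hypothetical names, and the parameter values are the ones from the question.

As far as I know, TAF is implemented by the OCI layer. With the JDBC thin driver, a `FAILOVER_MODE` clause may therefore not produce session failover at all. Treat this as a way to get the descriptor right, not a guarantee of TAF behaviour.

```java
import java.util.List;

// Hypothetical helper that assembles an Oracle Net connect descriptor for a thin JDBC URL.
final class TafUrl {
    private TafUrl() {}

    static String build(List<String> hosts, int port, String serviceName,
                        int retries, int delaySeconds) {
        StringBuilder addresses = new StringBuilder();
        for (String host : hosts) {
            addresses.append("(ADDRESS=(PROTOCOL=tcp)(HOST=").append(host)
                     .append(")(PORT=").append(port).append("))");
        }
        // LOAD_BALANCE picks an address at random; FAILOVER=on tries the next one if it fails.
        return "jdbc:oracle:thin:@(DESCRIPTION="
            + "(ADDRESS_LIST=(LOAD_BALANCE=on)(FAILOVER=on)" + addresses + ")"
            + "(CONNECT_DATA=(SERVICE_NAME=" + serviceName + ")"
            + "(FAILOVER_MODE=(TYPE=select)(METHOD=basic)"
            + "(RETRIES=" + retries + ")(DELAY=" + delaySeconds + "))))";
    }
}
```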
Example: TAF Retrying a Connection
TAF also provides the ability to automatically retry connecting if the first connection attempt fails with the RETRIES and DELAY parameters. In the following example, Oracle Net tries to reconnect to the listener on sales1-server. If the failover connection fails, Oracle Net waits 15 seconds before trying to reconnect again. Oracle Net attempts to reconnect up to 20 times.
sales.us.acme.com=
 (DESCRIPTION=
   (ADDRESS=
     (PROTOCOL=tcp)
     (HOST=sales1-server)
     (PORT=1521))
   (CONNECT_DATA=
     (SERVICE_NAME=sales.us.acme.com)
     (FAILOVER_MODE=
       (TYPE=select)
       (METHOD=basic)
       (RETRIES=20)
       (DELAY=15))))
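TAF's `RETRIES`/`DELAY` handling lives in Oracle Net. A thin-driver application that wants similar behaviour usually has to retry the connect itself. Here is a minimal sketch of that loop. It assumes the caller supplies the connect action, for example a lambda around `DriverManager.getConnection(url, user, password)`. `ConnectRetry` and `withRetries` are hypothetical names.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical client-side analogue of (RETRIES=n)(DELAY=s): one attempt plus up to n retries.
final class ConnectRetry {
    private ConnectRetry() {}

    static <T> T withRetries(Callable<T> connect, int retries, Duration delay) throws Exception {
        if (retries < 0) throw new IllegalArgumentException("retries must be >= 0");
        Exception last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return connect.call();
            } catch (Exception e) {
                last = e;
                if (attempt < retries) {
                    Thread.sleep(delay.toMillis()); // wait DELAY before the next attempt
                }
            }
        }
        throw last; // every attempt failed: surface the most recent error
    }
}
```

For example, `ConnectRetry.withRetries(() -> DriverManager.getConnection(url, user, pw), 20, Duration.ofSeconds(15))` mirrors the tnsnames entry above.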

Stateful bean not failing over

          I have a cluster of two servers and an Admin server. Both servers are running NT 4 SP6
          and WLS 6 SP1.
          When I stop one of the servers, the client doesn't automatically fail over to
          the other server; instead, it fails because it is unable to contact the server that has failed.
          My bean is a stateful session bean, configured to have its home clusterable. My
          client holds onto the remote interface and makes calls through it. If server
          B fails, it should automatically fail over to server A.
          I have tested my multicast address and all seems to be working fine between the servers;
          my stateless beans work well, load balancing between the servers nicely.
          Does anybody have any ideas about what could be causing the stateful bean's
          remote interface not to provide failover information?
          Also, is it true that you can have only one JMS destination queue/topic per cluster? The
          JMS cluster targeting doesn't work at the moment, so do you need to deploy to individual
          servers?
          Thanks


Did you enable stateful session bean replication in the
          weblogic-ejb-jar.xml?
          -- Rob
          Coming Soon: Building J2EE Applications & BEA WebLogic Server
          by Michael Girdley, Rob Woollen, and Sandra Emerson
          http://learnweblogic.com

NIC not failing Over in Cluster

Hi there... I have configured a 2-node cluster with the SoFS role for a VM cluster and HA, using Windows Server 2012 Datacenter.

The current setup:
- Each host server has 3 NICs. Two have a default gateway set up (192.x.x.x); the third is for heartbeat (10.X.X.X).
- I configured a CSV, and I can also see its shortcut in C:\.
- I am planning to set up a few VMs pointing to disks on the 2 separate storage servers (1 NIC in 192.x.x.x, and also 2 NICs in the 10.x.x.x network).
- I am able to install a VM and point its disk to the share on cluster volume 1.

I have created 2 VM switches for the 2 separate host servers (using Hyper-V Manager). I test the functionality by taking Node 2 offline. The disk owner node changes to Node 1, but VM NIC 2 does not fail over automatically to VM NIC 1. I can see VM NIC 1 showing up unselected in the VM settings. When I go to VM Settings > Network Adapter, I get this error:
An error occurred for resource VM "VM Name". Select the "information details" action to view events for this resource. The network adapter is configured to a switch which no longer exists or a resource pool that has been deleted or renamed (with a configuration error in the "Virtual Switch" drop-down menu).
Can you please let me know of any resolution to fix this issue? Hoping to hear from you.
VT

Hi,
From your description: "My another thing I would like to test is...I also would like to bring a disk down (right now, I have 2 disk - CSV and one Quorum disk) for that 2 node cluster. I was testing by bringing a csv disk down, the VM didnt failover". Are you trying to test the failover cluster now? If so, please refer to the following related KB:
Test the Failover of a Clustered Service or Application
http://technet.microsoft.com/en-us/library/cc754577.aspx
Hope this helps.

Load balancing not happening but failover is, for read-only entity beans

 The following is the configuration:
 Two NT servers with WL 5.1 SP9 hosting only EJBs (read-only entity beans).
 One client with WL 5.1 SP9 running a servlet/Java application as the
 EJB client.
 I am trying to make a call like findByPrimaryKey on one of the
 entity beans. I can see the request is always directed to only one of the
 servers. When I bring that server down, failover happens to the other server.
 Here are the settings I have in the ejb-jar.xml:
 <entity>
 <ejb-name>device.StartHome</ejb-name>
 <home>com.wl.api.device.StartHome</home>
 <remote>com.wl.api.device.StartRemote</remote>
 <ejb-class>com.wl.server.device.StartImpl</ejb-class>
 <persistence-type>Bean</persistence-type>
 <prim-key-class>java.lang.Long</prim-key-class>
 <reentrant>False</reentrant>
 <resource-ref>
 <res-ref-name>jdbc/wlPool</res-ref-name>
 <res-type>javax.sql.DataSource</res-type>
 <res-auth>Container</res-auth>
 </resource-ref>
 </entity>
 Here are the settings I have in the weblogic-ejb-jar.xml.
 <weblogic-enterprise-bean>
 <ejb-name>device.StartHome</ejb-name>
 <caching-descriptor>
 <max-beans-in-cache>50</max-beans-in-cache>
 <cache-strategy>Read-Only</cache-strategy>
 <read-timeout-seconds>900</read-timeout-seconds>
 </caching-descriptor>
 <reference-descriptor>
 <resource-description>
 <res-ref-name>jdbc/wlPool</res-ref-name>
 <jndi-name>weblogic.jdbc.pool.wlPool</jndi-name>
 </resource-description>
 </reference-descriptor>
 <enable-call-by-reference>False</enable-call-by-reference>
 <jndi-name>device.StartHome</jndi-name>
 </weblogic-enterprise-bean>
 Am I making any mistake here?
 Anyone's help is appreciated.
 Thanks
 Suresh


 We are using 5.1.
 "Gene Chuang" wrote in message:
 > Colocation optimization occurs if your client resides in the same container (and also in the same
 > EAR for 6.0) as your EJBs.
 >
 > Gene
 >
 > "Suresh" wrote in message:
 > > OK... setting ejb-call-by-reference to true was making the calls go to one server
 > > only. I am not sure why. I removed the property and it works.
 > > Also, I have one question: in our production environment, when I cache the EJB
 > > home it is not doing the load balancing. Can anyone help me with that?
 > > Thanks
 > >
 > > Mike,
 > > With the sample program I sent, even calls from a single client get load
 > > balanced.
 > >
 > > Suresh
 > >
 > >
 > > "Gene Chuang" wrote in message:
 > > > In WL, load balancing will ONLY WORK if you reuse your EJBHome! Take your StartEndPointHome lookup
 > > > out of your for loop and see if this fixes your problem.
 > > >
 > > > I've seen this discussion in ejb-interest, and some other vendor (Borland, I believe it is) brings
 > > > up an interesting point: clustering and load balancing are not in the J2EE specs, hence implementation
 > > > is totally up to the vendor. WebLogic load-balances from the remote interfaces (EJBObject, EJBHome,
 > > > etc.), while Borland load-balances from the JNDI Context lookup.
 > > >
 > > > Let me suggest a third implementation: load-balance from BOTH the Context lookup and stub method
 > > > invocation! Or create a smart replica-aware list manager which persists on the client thread
 > > > (ThreadLocal) and is aware of lookup/invocation history. Hence if I do the following in a client
 > > > hitting a 3-node cluster, I'll still get perfect round-robining regardless of what I do on the
 > > > client side:
 > > >
 > > > InitialContext ctxt = new InitialContext();
 > > > EJBHome myHome = ctxt.lookup(MY_BEAN);
 > > > myHome.findByPrimaryKey(pk); <== hits Node #1
 > > > myHome = ctxt.lookup(MY_BEAN);
 > > > myHome.findByPrimaryKey(pk); <== hits Node #2
 > > > myHome.findByPrimaryKey(pk); <== hits Node #3
 > > > myHome = ctxt.lookup(MY_BEAN);
 > > > myHome.findByPrimaryKey(pk); <== hits Node #1
 > > > ...
 > > >
 > > >
 > > > Gene
 > > >
 > > >
 > > > "Suresh" wrote in message:
 > > > > Mike,
 > > > >
 > > > > Do you have any reason for the total number of machines to be 10?
 > > > >
 > > > > I tried with 7 machines.
 > > > >
 > > > >
 > > > > Here is my sample client Java application, running individually on each of the
 > > > > seven machines.
 > > > >
 > > > > StartEndPointHome =
 > > > > (StartEndPointHome)ctx.lookup("dev.StartEndPointHome");
 > > > > for(;;)
 > > > > {
 > > > > // logMsg(" --in loop "+currentTime);
 > > > > if (currentTime > nextRefereshTime)
 > > > > {
 > > > > logMsg("****- going to call");
 > > > > currentTime=getSystemTime();
 > > > > nextRefereshTime=currentTime+timeInterval;
 > > > > StartEndPointHome =
 > > > > (StartEndPointHome)ctx.lookup("dev.StartEndPointHome");
 > > > > long rndno=(long)(Math.random()*10)+range;
 > > > > logMsg(" going to call remotestub"+rndno);
 > > > > retVal =
 > > > > ((StartEndPointHome)getStartHome()).findByNumber("pe"+rndno+"_mportal_dsk36.mportal.com");
 > > > >
 > > > > logMsg("**++- called stub");
 > > > > }
 > > > >
 > > > >
 > > > >
 > > > > The range value is different for each of the machines in the cluster.
 > > > >
 > > > > If the first request starts at srv1, all requests hit the same
 > > > > server.
 > > > > If the first request starts at srv2, all requests hit the same
 > > > > server.
 > > > >
 > > > > I have the following url, user, and password values for the context:
 > > > >
 > > > > public static String url="t3://10.11.12.14,10.11.12.117:8000";
 > > > > public static String user="guest";
 > > > > public static String password="guest";
 > > > >
 > > > >
 > > > >
 > > > > It would be great if you could help me.
 > > > >
 > > > > Thanks
 > > > > suresh
 > > > >
 > > > > "Mike Reiche" wrote in message:
 > > > > >
 > > > > > If you have only one client, don't be surprised if you only hit one server.
 > > > > > Try running ten different clients and see if they hit the same server.
 > > > > >
 > > > > > Mike
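Gene's point is that WebLogic balances load in the replica-aware stub, on each call through a home or remote reference, rather than at JNDI lookup time. Here is a minimal sketch of that idea. It assumes a fixed replica list and calls that are safe to retry on another server, as reads through a read-only entity bean should be. `RoundRobinStub` and `invoke` are hypothetical names, not WebLogic classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical replica-aware stub: each call starts at the next replica (round-robin);
// if that replica throws, the same call is retried on the following ones (failover).
final class RoundRobinStub<S> {
    private final List<S> replicas;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinStub(List<S> replicas) {
        if (replicas.isEmpty()) throw new IllegalArgumentException("no replicas");
        this.replicas = new ArrayList<>(replicas);
    }

    <R> R invoke(Function<S, R> call) {
        int start = Math.floorMod(next.getAndIncrement(), replicas.size());
        RuntimeException last = null;
        for (int i = 0; i < replicas.size(); i++) {
            S replica = replicas.get((start + i) % replicas.size());
            try {
                return call.apply(replica);
            } catch (RuntimeException e) {
                last = e; // treat as "server unreachable" and fail over to the next replica
            }
        }
        throw last; // every replica failed
    }
}
```

In this model, a client that keeps one stub and calls it repeatedly spreads its calls across the replicas. That matches Gene's advice to reuse the EJBHome. The sketch does not model the colocation optimization Gene mentions, where a client running in the same container as the EJBs stays on its local replica.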

GSLB Zone-Based DNS Payment Gw - Config Active-Active: Not Failing Over

Hello All:
I am currently having a bit of a problem; I have exhausted all resources, and my brain power is dwindling.
Brief:
Two geographically diverse sites, with different ASes and different front ends. We migrated from one site with two CSS 11506s to two sites with one 11506 each.
Flow of connection is as follows:
Client --> FW Public Destination NAT --> CSS Private content VIP/destination NAT --> server/service --> CSS Source VIP/NAT --> FW Public Source NAT --> client.
We use the load balancers as DNS servers, authoritative for the zones. This is needed because we require second-level domain DNS load balancing (i.e. xxxx.com AND FQDNs such as www.xxxx.com). The CSS is therefore configured to respond as authoritative for xxxx.com, www.xxxx.com, postxx.xxxx.com, tmx.xxxx.com, etc.

The CSS cannot serve MX records, so it is also configured with dns-forwarders. These forwarders are the original DNS servers for the domains, and they also handle non-load-balanced domain names.

The following have been changed to reflect that the CSSes are now the DNS servers:
- the zone files on the original DNS servers
- the domain records (i.e. the NS records in the zone file)
- the records at the registrar (Tucows, which I believe resells .com, .net and .org for NetSol)

That part of the setup has already been tested and behaves as DNS should.
Due to the design, which unfortunately cannot be changed, the dns-record configuration uses kal-ap, for example:
dns-record a www.xxxx.com 0 111.222.333.444 multiple kal-ap 10.xx.1.xx 254 sticky-enabled weightedrr 10
So, to explain so we're absolutely clear:
- 111.222.333.444 is the public address returned to the client.
- multiple is configured so we return both site addresses for redundancy (unless I'm misunderstanding that configuration option)
- kal-ap and the 10.xx.1.xx address because due to the configuration we have no other way of knowing the content rule/service is down and to stop advertising the address for said server/rule
- sticky-enabled because we don't want to lose a payment and have it go through twice or something crazy like that
- weightedrr 10 (and weightedrr 1 on the other side) because we want to keep most of the traffic on the site that is closer to where the bulk of the clients are
So now the problem is this. The clients (i.e. something like an Interac machine, RFID tags...) need to fail over almost instantly to either site should one lose connectivity and/or servers/services. However, this does not happen. The CSS changes its advertisement, and we have confirmed this by running nslookup/dig directly against the CSSes. The client, however, does not recognize this and ends up returning a "DNS Error/Page not found".
I suspect this may have something to do with the "sticky-enabled" setting and/or the fact that DNS doesn't necessarily react very well to a TTL of "0".
Any thoughts... comments... suggestions... experiences???
Much appreciated in advance for any responses!!!
Oh... should probably add:
nslookups to some DNS servers consistently - ALWAYS the same ones - take 3 lookups before getting a reply. Other DNS servers are instant....
Cheers,
Ben Shellrude
Sr. Network Analyst
MTS AllStream Inc

Hi Ben,
If I read your posting correctly, the CSSes are doing their job and advertise the correct IP for a DNS query, right?
If some of your clients are having a problem, this might be related to DNS caching. Some clients cache the DNS response and do not refresh it until a request fails or the cache timeout expires.
Even worse, if the request fails, you sometimes have to reset the client's DNS daemon so that it requests IP addresses from scratch. I had this issue with some Unix boxes. If I remember correctly, you can configure the DNS behaviour of Unix boxes and forbid them from caching DNS responses.
Kind Regards,
joerg
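Joerg's caching point also applies to Java clients, where the JVM keeps its own DNS cache. Here is a minimal sketch. It assumes a Java client and that the CSS returns several A records (the `multiple` option). It does two things:
- It shortens the JVM's positive and negative DNS cache TTLs through the `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` security properties.
- It tries each returned address in turn instead of giving up after the first.

`MultiAddressConnect`, `disableJvmDnsCaching`, and `connectAny` are hypothetical names. As far as I know, the JVM reads those properties once, so they must be set before the first name lookup. This does nothing about caching in the OS resolver or in intermediate DNS servers.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.security.Security;

// Hypothetical client helper: avoid long-lived JVM DNS caching and try every A record.
final class MultiAddressConnect {
    private MultiAddressConnect() {}

    // Call at startup, before any host name is resolved.
    static void disableJvmDnsCaching() {
        Security.setProperty("networkaddress.cache.ttl", "0");
        Security.setProperty("networkaddress.cache.negative.ttl", "0");
    }

    // Resolve the name and connect to the first address that accepts the connection.
    static Socket connectAny(String host, int port, int timeoutMillis) throws IOException {
        IOException last = null;
        for (InetAddress address : InetAddress.getAllByName(host)) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(address, port), timeoutMillis);
                return socket;
            } catch (IOException e) {
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // nothing useful to do; keep trying the remaining addresses
                }
                last = e; // this site is down or unreachable: try the next address
            }
        }
        throw last != null ? last : new IOException("no addresses for " + host);
    }
}
```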

BGP in Dual Homing setup not failing over correctly

Hi all,
We have dual-homed BGP connections to our sister company's network, but the failover testing is failing.
If I shut down the WAN interface on the primary router, everything converges and fails over fine after about 5 minutes.
But if I shut the LAN interface down on the primary router, we never regain connectivity to the sister network.
Our two ASRs have an iBGP relationship. After a certain amount of time, the BGP routes with a next hop of the primary router get flushed from BGP, and the preferred exit path becomes the secondary router. This bit works OK, but I believe the return traffic is still attempting to come back over the primary link...
To add to this, we have two inline firewalls on each link which are only performing IPS, no packet filtering.
Any pointers would be great.
thanks
Mario

Hi John,
Right... please look at the output below, which is the partial BGP table during a link failure.
10.128.0.0/9 is the problematic summary that keeps getting advertised during a failure, when we do not want it to be.
There are prefixes in the BGP table that fall within that large summary address space, but I am sure they are all routes advertised to us by the eBGP peer...
*> 10.128.0.0/9     0.0.0.0                            32768 i
s> 10.128.56.16/32 172.17.17.241                 150      0 2856 64619 i
s> 10.128.56.140/32 172.17.17.241                 150      0 2856 64619 i
s> 10.160.0.0/21    172.17.17.241                 150      0 2856 64611 i
s> 10.160.14.0/24   172.17.17.241                 150      0 2856 64611 i
s> 10.160.16.0/24   172.17.17.241                 150      0 2856 64611 i
s> 10.200.16.8/30   172.17.17.241                 150      0 2856 65008 ?
s> 10.200.16.12/30 172.17.17.241                 150      0 2856 65006 ?
s> 10.255.245.0/24 172.17.17.241                 150      0 2856 64548 ?
s> 10.255.253.4/32 172.17.17.241                 150      0 2856 64548 ?
s> 10.255.253.10/32 172.17.17.241                 150      0 2856 64548 ?
s> 10.255.255.8/30 172.17.17.241                 150      0 2856 6670 ?
s> 10.255.255.10/32 172.17.17.241                 150      0 2856 ?
s> 10.255.255.12/30 172.17.17.241                 150      0 2856 6670 ?
s> 10.255.255.14/32 172.17.17.241                 150      0 2856 ?
I would not expect summary addresses to still be advertised if the specific prefixes are coming from eBGP... am I wrong?
Thanks for everything so far...
Mario De Rosa
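To see whether more-specific routes are keeping the summary alive, check which prefixes in the table fall inside 10.128.0.0/9 (10.128.0.0 through 10.255.255.255). This assumes the summary is generated by `aggregate-address ... summary-only`, which the `s>` suppressed entries suggest. As far as I know, the router then keeps advertising the aggregate while at least one component route remains in the BGP table, whichever peer it was learned from. `PrefixCheck` and `contains` are hypothetical names.

```java
// Hypothetical IPv4 helper: does prefix `inner` lie entirely inside prefix `outer`?
final class PrefixCheck {
    private PrefixCheck() {}

    // Convert a dotted-quad address to an unsigned 32-bit value held in a long.
    static long toLong(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) throw new IllegalArgumentException(dotted);
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) throw new IllegalArgumentException(dotted);
            value = (value << 8) | octet;
        }
        return value;
    }

    static boolean contains(String outer, String inner) {
        String[] o = outer.split("/");
        String[] i = inner.split("/");
        int outerLen = Integer.parseInt(o[1]);
        int innerLen = Integer.parseInt(i[1]);
        if (innerLen < outerLen) return false; // a shorter prefix cannot fit inside
        long mask = outerLen == 0 ? 0 : (0xFFFFFFFFL << (32 - outerLen)) & 0xFFFFFFFFL;
        return (toLong(o[0]) & mask) == (toLong(i[0]) & mask);
    }
}
```

Every prefix in the table above, from 10.128.56.16/32 to 10.255.255.14/32, has a second octet of 128 or more. Each one passes this check and is therefore a candidate component of the summary.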

Why is DML not failed over by TAF?

Hi,
I have an OLTP application running on a 2-node 10gR2 RAC (10.2.0.3) on AIX 5.3L ML 8. I have configured TAF here for SESSION failover. I would like to know two things from you all:
1) Each instance is able to read the other instance's undo tablespace data and redo log, so why is TAF not able to fail over the DML transactions?
2) As of now, is there any way to fail over DML other than catching the error thrown back to the application and re-executing the query? Is it possible in 11gR1?
I would be grateful to you all for sparing your valuable time to answer this.
Thanks and Regards,
Vijay Shanker

Re: Failover DML on RAC
The reason is transaction processing and its implications.
Imagine that you updated a row, then waited idly, then some other session wanted that same row and waited for you to either rollback or commit.
You failed.
Oracle will automatically roll back your transaction and release all your locks.
What should the other session do: wait to see whether you have TAF or FCF and will reconnect and rerun your uncommitted DML, or proceed with its own work?
Failed session rollback currently happens regardless of whether you or anybody else have TAF, FCF, or even whether you have RAC.
But for you to be able to replay your DML safely after reconnecting, that transaction rollback would have to be prevented, and your new failed-over session would have to magically re-attach to the failed session's transaction.
Maybe some day Oracle will implement something like that, but it's not easy, and Oracle leaves it up to the application to decide what to do (TAF-specific error codes).
On the other hand, replaying SELECTs is fairly easy: re-execute the query (with the SCN as of the originally failed cursor, to ensure read consistency) and re-fetch up to the point of the last fetch.
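Oracle leaves DML replay to the application. The usual pattern is to catch the TAF-specific errors, roll back, and re-run the whole transaction from the start.

Here is a minimal sketch. It assumes ORA-25402 (transaction must roll back) and ORA-25408 (cannot safely replay call) are the codes your driver reports after failover; check them against your own release. `TxReplay`, `SqlWork`, and `runWithReplay` are hypothetical names. Replaying is only safe when you know the failed transaction did not commit, for example because the failure came before the commit was issued.

```java
import java.sql.SQLException;
import java.util.Set;

// Hypothetical application-level replay for DML after a TAF failover.
final class TxReplay {
    private TxReplay() {}

    // Oracle error codes assumed to mean "roll back and start the transaction again".
    static final Set<Integer> REPLAYABLE = Set.of(25402, 25408);

    @FunctionalInterface
    interface SqlWork {
        void run() throws SQLException;
    }

    // Runs the whole transaction (all DML plus the commit). On a replayable error it
    // rolls back on the failed-over session and runs the transaction again, up to maxAttempts times.
    static void runWithReplay(SqlWork transaction, SqlWork rollback, int maxAttempts)
            throws SQLException {
        for (int attempt = 1; ; attempt++) {
            try {
                transaction.run();
                return;
            } catch (SQLException e) {
                if (!REPLAYABLE.contains(e.getErrorCode()) || attempt >= maxAttempts) {
                    throw e; // not a failover error, or out of attempts
                }
                rollback.run(); // discard the old transaction before replaying it
            }
        }
    }
}
```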

BEA WebLogic 6.1 does not detect Oracle database failover

Hi, we had Concurrency Strategy: Exclusive. We have now changed it to Database for performance
reasons. Since that change, when we do an Oracle database failover, WebLogic
6.1 does not detect the failover and needs a restart.
How can we resolve this?

Are your pools set to test connections at reserve time?
Joe
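For reference, "testing connections at reserve time" means the pool validates an idle connection just before handing it to the application and replaces it if the test fails, so connections left stale by a database failover are not handed out. A minimal Java sketch of that idea follows. The class, the `opener` supplier and the `isHealthy` check are hypothetical and are not WebLogic APIs (WebLogic's own test runs a query against a configured test table).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch of "test connections on reserve". C stands for a connection type;
// the opener and health check are assumptions, not WebLogic APIs.
final class TestOnReservePool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> opener;      // opens a new physical connection
    private final Predicate<C> isHealthy;  // e.g. runs a trivial query against a test table

    TestOnReservePool(Supplier<C> opener, Predicate<C> isHealthy) {
        this.opener = opener;
        this.isHealthy = isHealthy;
    }

    // Hand out an idle connection only if it passes the test; stale ones (for example,
    // connections to the instance that just failed over) are dropped instead.
    C reserve() {
        while (!idle.isEmpty()) {
            C candidate = idle.pollFirst();
            if (isHealthy.test(candidate)) {
                return candidate;
            }
            // A real pool would also close the dead connection here.
        }
        return opener.get();  // no healthy idle connection left: open a fresh one
    }

    void release(C connection) {
        idle.addFirst(connection);
    }

    int idleCount() {
        return idle.size();
    }
}
```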

VIP does not fail over to surviving nodes in Oracle 11.2.0.2 Grid Infrastructure

Hi,
It is an 8-node 11.2.0.2 Grid Infrastructure cluster.
When we pull both cables of the public NIC, the VIPs of 2 nodes do not fail over to the surviving nodes, while the VIPs of the remaining nodes in the same cluster do fail over. Please help me with this.
If we remove power from these servers, their VIPs do fail over to the surviving nodes.
The public NICs are bonded.
grdoradr105:/apps/grid/grdhome/sh:+ASM5> ./crsstat.sh |grep -i vip |grep -i 101
ora.grdoradr101.vip ONLINE OFFLINE
grdoradr101:/apps/grid/grdhome:+ASM1> cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.4.0-1 (October 7, 2008)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth0
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 84:2b:2b:51:3f:1e
Slave Interface: eth1
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 84:2b:2b:51:3f:20
Thanks
Bala
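One thing worth confirming on the two problem nodes is whether the bond itself notices the link loss. The output above shows the bond and both slaves as `MII Status: up`; repeating the check while both cables are pulled shows whether the bond reports the failure at all. A small Java sketch that extracts the per-interface MII status from `/proc/net/bonding/bond0` text follows. `BondingStatus` is a hypothetical helper, and it only looks at the `Slave Interface:` and `MII Status:` lines that appear in the output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parse the text of /proc/net/bonding/<bond> and report the
// MII status of the bond itself (key "bond") and of each slave interface.
final class BondingStatus {

    static Map<String, String> miiStatus(String procText) {
        Map<String, String> result = new LinkedHashMap<>();
        // Lines before the first "Slave Interface:" describe the bond itself.
        String current = "bond";
        for (String raw : procText.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("Slave Interface:")) {
                current = line.substring("Slave Interface:".length()).trim();
            } else if (line.startsWith("MII Status:")) {
                result.put(current, line.substring("MII Status:".length()).trim());
            }
        }
        return result;
    }
}
```

The input could be read with `Files.readString(Path.of("/proc/net/bonding/bond0"))` (Java 11 or later).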

Please check MOS note 1276737.1 for this issue.
HTH
Edited by: krishan on Jul 28, 2011 2:49 AM

3-node cluster with 1 vInstance: the vInstance cannot fail over to one specific node.

I have a 3-node cluster, all running Windows Server 2008 R2. Roughly once a month I see my vInstance become degraded and attempt to fail over. Everything is fine as long as it fails over to SQL01 or SQL02. However, if it attempts to fail over to SQL03, it does not come online.
The quick fix is to move it manually to SQL01 or SQL02. What could be causing it to fail every time on SQL03?
A couple points:
I did not build the environment.
I am not a DBA.
I only have general knowledge of SQL clustering.
I always get two event IDs: 1069
Cluster resource 'SQL Server (VSQL04)' in clustered service or application 'SQL Server (VSQL04)' failed.
and then
Event ID 1205
The Cluster service failed to bring clustered service or application 'SQL Server (VSQL04)' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
Where should I begin to look for issues?

Here is the cluster event prior to the offline state. I will have to go and check the cluster log.
The Cluster service failed to bring clustered service or application 'SQL Server (VSQL04)' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
I do not think this helps; it just says a resource is in an offline state. You need to dig further to find out which resource it is and why it did not come back online. That should be recorded in the cluster log and/or Event Viewer.
Hope it Helps!!

Problems with Oracle FailSafe - Primary node not failing over the DB to the

I am using 11.1.0.7 on 64-bit Windows, with two nodes clustered at the OS level. The cluster works fine at the Windows level and the shared drive fails over. However, the database does not fail over when the primary node is shut down or restarted.
The Oracle software is on a local drive on each box. The Oracle database files and logs are on the shared drive.

Is the database listed in your cluster group that you are failing over?

Failover is not happening in WebLogic JSP server

Hi,
We have 6 WebLogic instances running as application servers (EJB) and 4 WebLogic
instances running as web servers (JSP). We have configured one cluster for the EJB
servers and one cluster for the JSP servers. On the front end we use four Apache
servers to proxy requests to the WebLogic JSP cluster. In my httpd.conf file I
have configured the WebLogic cluster. I can see requests going to
all the servers, and I believe the cluster is working fine in terms of load balancing
(round-robin). The clients access the servers through a CSS (Cisco load balancer).
But when we test failover in the cluster, we run into problems. Let me
explain the failover test scenarios:
1. Load was generated by the load generator.
2. While under load, we shut down one Apache server. Although there were
some failed transactions, the servers immediately became stable, so failover
works at this stage.
3. When I shut down one EJB instance, again after some failed transactions, the
transactions became stable.
4. But when I shut down one JSP instance, transactions immediately failed,
requests did not fail over to another JSP server, and the number of failed transactions
increased.
So I suspect there is a problem in the proxy plug-in configuration, such that
when I shut down one JSP server, the Apache proxy plug-in still sends
requests to it.
I have read various questions posted in the newsgroups and found some information
about configuring session and cookie settings in the weblogic.xml file. However,
I'm not sure which configuration changes need to be made in weblogic.xml
and httpd.conf. Kindly help me resolve the problem. I would appreciate
your response.
===============================================================
My httpd.conf file plug-in configuration:
### WebLogic Proxy Directives. If proxying to a WebLogic Cluster, see the WebLogic documentation.
<IfModule mod_weblogic.c>
WebLogicCluster X.X.X.X1:7001,X.X.X.X2:7001,X.X.X.X3:7001,X.X.X.X4:7001
MatchExpression *.jsp
</IfModule>
<Location /apollo>
SetHandler weblogic-handler
DynamicServerList ON
HungServerRecoverSecs 600
ConnectTimeoutSecs 40
ConnectRetrySecs 2
</Location>
==============================================================
Thanks in advance,
Siva.
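To reason about this scenario, it can help to spell out the behaviour expected from the proxy when one JSP member is down: skip the dead member and route to another live one. The following Java sketch is a deliberately simplified, hypothetical model of that behaviour (try the session's primary server first, then the remaining members round-robin). It is not the actual algorithm of the WebLogic plug-in, and it ignores the timeout and retry settings (`ConnectTimeoutSecs`, `ConnectRetrySecs`, `HungServerRecoverSecs`) from the configuration above.

```java
import java.util.List;
import java.util.function.Predicate;

// Simplified, hypothetical model of proxy failover across a cluster member list.
final class ProxyFailoverModel {
    private final List<String> members;  // e.g. the host:port entries of WebLogicCluster
    private int next = 0;                // round-robin cursor

    ProxyFailoverModel(List<String> members) {
        this.members = List.copyOf(members);
    }

    // Picks a target: the session's primary server if it accepts a connection,
    // otherwise the next reachable member in round-robin order.
    // Returns null if no server accepts a connection.
    String route(String primaryOrNull, Predicate<String> canConnect) {
        if (primaryOrNull != null && canConnect.test(primaryOrNull)) {
            return primaryOrNull;
        }
        for (int i = 0; i < members.size(); i++) {
            String candidate = members.get((next + i) % members.size());
            if (canConnect.test(candidate)) {
                next = (next + i + 1) % members.size();  // continue after the chosen member
                return candidate;
            }
        }
        return null;
    }
}
```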

Hi,
I can see that bug 13703600 has already been fixed in 12.1.2, but if you still see the same problem, please raise a ticket with Oracle Support.
Regards,
Kal
