Sun Identity Manager 8.0 and failover

We are setting up a failover/recovery site for our Sun Identity Manager solution. I had pictured a seamless failover, but that looks nearly impossible to do with an Oracle database. I had pictured load-balanced app servers with load-balanced databases, a sort of multi-master setup like LDAP allows.
I'm curious what others are using for a failover site or setup.
Thanks

We're using 7.0. For us, failover is basically multiple servers all using the same DB repository, with a "smart" load balancer in front of them (smart meaning able to detect which back-end servers are responsive).
IdM doesn't use any inter-server temp-data synchronization; all the servers running off the same repository communicate by committing changes to the database.
So if a specific IdM instance dies, on the next page load the user will be redirected to a new server. That server will redirect to the login page and ask the user to re-auth, with the desired page placed after login.jsp as a "nextPage" argument. After (re-)logging in, the user's returned to the page they were trying to get to. However, in-progress edits that had not been committed back to the database will be lost.
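The behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not IdM or load-balancer code: the host names, the health-check callback, the sample page path and the helper names are all hypothetical, and the exact URL encoding IdM uses is an assumption. Only login.jsp and the "nextPage" argument come from the description.

```python
from urllib.parse import urlencode


def pick_backend(backends, is_responsive):
    """Return the first backend that passes the health check, or None."""
    for host in backends:
        if is_responsive(host):
            return host
    return None


def login_redirect(requested_page):
    """Build a re-authentication URL that sends the user back to requested_page."""
    return "login.jsp?" + urlencode({"nextPage": requested_page})


# Hypothetical hosts: idm1 has died, idm2 is still responsive.
alive = {"idm1": False, "idm2": True}
backend = pick_backend(["idm1", "idm2"], lambda host: alive[host])

# The new server has no session for this user, so it asks for a re-login
# and remembers the page the user was trying to reach (hypothetical path).
redirect = login_redirect("user/editUser.jsp")
```

Nothing in this sketch carries unsaved form data across servers, which matches the point that in-progress edits are lost either way.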
We looked at high availability arrangements where valid sessions are shared across a new server, but fundamentally the limitation is that the app servers still don't sync in-progress edits, so the only difference between an HA environment and a more passive fail-over environment (like ours) is that in an HA environment the user doesn't have to re-login on a server failure; they still lose in-progress edits. So HA didn't seem like it added value to us.
If you are literally talking about an off-site, completely standby, seamless failover site, I agree I don't see how you would do that. I'd expect that you'd need the offsite setup to be a cold-standby site; configured to use the replicated database, but with the apps powered down until you actually need them. Otherwise, I think you'd have problems with the standby site servers not wanting to "standby". You could ensure no users end up on the standby servers, but background processes are likely to be run across both the primary and the standby services; I don't think you can enforce an "idle but running" status for the standby servers.
Edited by: etech on Feb 4, 2009 7:37 PM

Similar Messages

After adding a 2nd WiSM and failing over APs, some apps don't work

We have a dual core made up of two 6513s. In 6513#1 we have WiSM#1, which we have had for some time. We have added a second WiSM in 6513#2 for redundancy, and we are also going to reconfigure the WiSM in 6513#1 to match the new WiSM in 6513#2. We installed the new WiSM and failed over the APs from 6513#1 so we can reconfigure its WiSM. The failover went well, except that a web application or two didn't function from wireless clients and users had trouble reaching some mapped drives. The only difference between the two configurations is that on the old WiSM the APs were in the same VLAN as the controller management interfaces, whereas on the new WiSM the controllers' AP management interface IP addresses are in a different VLAN from the APs; we are doing this based on Cisco best practices. If we revert the APs to the original WiSM/controllers, where they are on the same VLAN/subnet, the applications and shares that were having issues work again. We have placed a call with Cisco TAC; they say our configs look good, and the packet captures we sent them look normal to them. The wireless clients can ping and resolve the server hosting the application database just fine.
Thanks

We did create the mobility groups, and we are using DHCP option 43. The APs find WiSM#2 just fine and associate to the controllers, and all the WLANs work. The only issue is that once the APs are on the new WiSM and controllers, an application or two has trouble locating its database server and some shares do not work. Again, the only difference in the new setup is that the APs are now on a different subnet/VLAN from the controller management addresses, whereas before they were in the same subnet/VLAN and the application and shares worked fine. It's almost like a routing issue?
Thanks

WLS6.1sp1 stateful EJB problem = load-balancing and fail over

          I have three problems.
          1. I have 2 clustered servers. My weblogic-ejb-jar.xml is here:
          <?xml version="1.0"?>
          <!DOCTYPE weblogic-ejb-jar PUBLIC '-//BEA Systems, Inc.//DTD WebLogic 6.0.0 EJB//EN'
          'http://www.bea.com/servers/wls600/dtd/weblogic-ejb-jar.dtd'>
          <weblogic-ejb-jar>
          <weblogic-enterprise-bean>
               <ejb-name>DBStatefulEJB</ejb-name>
               <stateful-session-descriptor>
               <stateful-session-cache>
                    <max-beans-in-cache>100</max-beans-in-cache>
                    <idle-timeout-seconds>120</idle-timeout-seconds>
               </stateful-session-cache>
               <stateful-session-clustering>
                    <home-is-clusterable>true</home-is-clusterable>
                    <home-load-algorithm>RoundRobin</home-load-algorithm>
                    <home-call-router-class-name>common.QARouter</home-call-router-class-name>
                    <replication-type>InMemory</replication-type>
               </stateful-session-clustering>
               </stateful-session-descriptor>
               <jndi-name>com.daou.EJBS.solutions.DBStatefulBean</jndi-name>
          </weblogic-enterprise-bean>
          </weblogic-ejb-jar>
          When I use "<home-call-router-class-name>common.QARouter</home-call-router-class-name>"
          and deploy this EJB, the following exception occurs:
          <Warning> <Dispatcher> <RuntimeException thrown by rmi server: 'weblogic.rmi.cluster.ReplicaAwareServerRef@9 - jvmid: '2903098842594628659S:203.231.15.167:[5001,5001,5002,5002,5001,5002,-1]:mydomain:cluster1', oid: '9', implementation: 'weblogic.jndi.internal.RootNamingNode@5f39bc''
          java.lang.IllegalArgumentException: Failed to instantiate weblogic.rmi.cluster.BasicReplicaHandler due to java.lang.reflect.InvocationTargetException
          at weblogic.rmi.cluster.ReplicaAwareInfo.instantiate(ReplicaAwareInfo.java:185)
          at weblogic.rmi.cluster.ReplicaAwareInfo.getReplicaHandler(ReplicaAwareInfo.java:105)
          at weblogic.rmi.cluster.ReplicaAwareRemoteRef.initialize(ReplicaAwareRemoteRef.java:79)
          at weblogic.rmi.cluster.ClusterableRemoteRef.initialize(ClusterableRemoteRef.java:28)
          at weblogic.rmi.cluster.ClusterableRemoteObject.initializeRef(ClusterableRemoteObject.java:255)
          at weblogic.rmi.cluster.ClusterableRemoteObject.onBind(ClusterableRemoteObject.java:149)
          at weblogic.jndi.internal.BasicNamingNode.rebindHere(BasicNamingNode.java:392)
          at weblogic.jndi.internal.ServerNamingNode.rebindHere(ServerNamingNode.java:142)
          at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:362)
          at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
          at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
          at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
          at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
          at weblogic.jndi.internal.RootNamingNode_WLSkel.invoke(Unknown Source)
          at weblogic.rmi.internal.BasicServerRef.invoke(BasicServerRef.java:296)
          So must I use it or not?
          2. When I don't use "<home-call-router-class-name>common.QARouter</home-call-router-class-name>",
          there's no exception,
          but load balancing does not happen. According to the documentation, load balancing should
          happen when I call the home.create() method.
          My client program is here:
               DBStateful the_ejb1 = (DBStateful) PortableRemoteObject.narrow(home.create(), DBStateful.class);
               DBStateful the_ejb2 = (DBStateful) PortableRemoteObject.narrow(home.create(3), DBStateful.class);
          The result looks like this:
               the_ejb1 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@4695a6)/397
               the_ejb2 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@acf6e)/398
               or
               the_ejb1 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@252fdf)/380
               the_ejb2 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@6a0252)/381
               I think the result should look like the following, shouldn't it?
               the_ejb1 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@4695a6)/397
               the_ejb2 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@6a0252)/381
          In this case I think the_ejb1 and the_ejb2 should have instances on different servers
          in the cluster,
          but they both go to one server.
          3. If I use neither "<home-call-router-class-name>common.QARouter</home-call-router-class-name>"
          nor "<replication-type>InMemory</replication-type>", then load balancing happens but
          there's no failover.
          So how can I get load balancing and failover together?


Load-balancing and fail-over between web containers and EJB containers

When web components and EJB components are run in different OC4J instances, can we achieve load-balancing and fail-over between web containers and EJB containers?

It looks like there is clustering, but not loadbalancing available for rmi
from the rmi.xml configuration. The application will treat any ejbs on the
cluster as one-to-one look-ups. Orion will go out and get the first ejb
available on the cluster. See the docs on configuring rmi.xml (and also the
note below).
That is a kind-of failover, because if machine A goes down, and the
myotherAejbs.jar are on machine B too, orion will go out and get the bean
from machine B when it can't find machine A. But it doesn't go machine A
then machine B for each remote instance of the bean. You could also specify
the maximum number of instances of a bean, and as one machine gets "loaded",
orion would go to the next available machine...but that's not really
loadbalancing.
That is, you can set up your web-apps with ejbs, but let all of the ejbs be
remote="true" in the orion-application.xml file:
<?xml version="1.0"?>
<!DOCTYPE orion-application PUBLIC "-//Evermind//DTD J2EE Application runtime 1.2//EN" "http://www.orionserver.com/dtds/orion-application.dtd">
<orion-application deployment-version="1.5.2">
<ejb-module remote="true" path="myotherAejbs.jar" />
<ejb-module remote="true" path="myotherBejbs.jar" />
<ejb-module remote="true" path="myotherCejbs.jar" />
<web-module id="mysite" path="mysite.war" />
... other stuff ...
</orion-application>
In the rmi.xml you would define your clustering:
<cluster host="230.0.0.1" id="123" password="123abc" port="9127"
username="cluster-user" />
Tag that is defined if the application is to be clustered. Used to set up
a local multicast cluster. A username and password used for the servers to
intercommunicate also needs to be specified.
host - The multicast host/ip to transmit and receive cluster data on. The
default is 230.0.0.1.
id - The id (number) of this cluster node to identify itself with in the
cluster. The default is based on local machine IP.
password - The password configured for cluster access. Needs to match that
of the other nodes in the cluster.
port - The port to transmit and receive cluster data on. The default is
9127.
username - The username configured for cluster access. Needs to match that
of the other nodes in the cluster.
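To make the distinction between "kind-of failover" and real load balancing concrete, here is a small sketch in Python. It is not Orion code and does not reflect how Orion resolves remote EJBs internally; the host labels and function names are hypothetical. It only contrasts a first-available lookup (what the reply describes) with a round-robin lookup (what it says is missing).

```python
from itertools import cycle


def first_available(hosts, is_up):
    """Failover-style lookup: always return the first host that is up."""
    for host in hosts:
        if is_up(host):
            return host
    raise LookupError("no host available")


def round_robin(hosts, is_up):
    """Load-balancing lookup: rotate through the hosts, skipping dead ones."""
    ring = cycle(hosts)
    while True:
        for _ in range(len(hosts)):
            host = next(ring)
            if is_up(host):
                yield host
                break
        else:
            raise LookupError("no host available")


up = {"A": True, "B": True}
is_up = lambda host: up[host]

# With both machines up, first-available keeps hitting machine A...
sticky = [first_available(["A", "B"], is_up) for _ in range(4)]

# ...whereas a round-robin policy would alternate between A and B.
rr = round_robin(["A", "B"], is_up)
rotating = [next(rr) for _ in range(4)]

# Only when A goes down does first-available move on to B.
up["A"] = False
after_failure = first_available(["A", "B"], is_up)
```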

Multiple types of database and fail over clustering

Hi,
I have a few questions here.
1) Can I have 2 types of databases (e.g. OLTP and OLAP) running at the same time on the same machine?
2) Can I implement a cross-failover cluster in this situation? That is, I have 2 machines with OLAP and OLTP database instances installed on both (replicas of each other), with the 1st machine running OLTP and the 2nd running OLAP. If one of the machines fails, the passive instance on the other machine takes over (back to the situation in question 1).
Thanks
Regards
Lai Ling

Dear All,
My problem is solved by disabling antivirus.
thanks for the support
Sunil
SUNIL PATEL SYSTEM ADMINISTRATOR

2960 and fail over

I am going to hook a 2960 to a wireless antenna and have a backup T-1 to use for fail-over. Can a 2960 do this function?
Shannon

It could if the 2960 had a WAN interface such as a T1, but it does not, since it is a switch.

Load Balance Network Cards and Fail over services

Hi,
I'm looking at setting up 2 Mac servers, each with basic services (i.e. AFP, Open Directory, software updates and DHCP).
Both servers are less than 12 months old and are both attached with GB ports to the RAID where all the user data is stored.
My question is: how do I set up the 2 network interface cards to act as one and load balance the traffic across both interfaces?
Also, I was wondering if it is possible to fail over AFP services, so that if one server went down the other would pick up file services where it left off?
I know how to fail over OD, and the other services don't matter too much.
Thanks in advance for your assistance

My question is: how do I set up the 2 network interface cards to act as one and load balance the traffic across both interfaces?
This is simple link aggregation in System Preferences -> Network
Click the + button at the bottom and choose new Link Aggregate. Choose the existing interfaces (presumably en0 and en1) and you're set.
Note that this requires support in the switch the server is connected to (it needs to support LACP), and that you will bounce your network connection when you set this up (so don't do it when the server is actively servicing clients)
Also, I was wondering if it is possible to fail over AFP services, so that if one server went down the other would pick up file services where it left off?
It's possible, but you need to be very careful with regards to data integrity. For example, typically each server is going to have a local directory (or directories) that are shared. If Server A fails and Server B takes over, how do you intend to ensure that Server B's data is up-to-date, especially with regard to files that might have been in use at the time?
It's a tricky problem to solve without putting the data on a shared storage device using something like XSAN to manage arbitration, and now you could be talking serious $$$s.
I'd recommend looking closely at your file serving needs and work out if it's necessary, or whether you could get by with dividing the load across servers (e.g. some sharepoints are on one server, other sharepoints on the other) so that only a subset of your users are impacted should one server fail.
File synchronization/replication is a major issue (read $$$$$) for a lot of companies.

Forms load balancing and fail-over

We are using Cisco context switches instead of metric server/metric client, and are having trouble making Forms fail over when a 9iAS server fails. Has anybody done this?
If so, how?

Re: Connected Environments and Fail/Over

Hi,
Yes, the SO in P1 is a session-duration, and I catch the DistributedAccessException and I retry the call in the exception handler, so that Forte directs me to the next available replicate. But I still get a DistributedAccessException on the retry from my P2 server partition, while the client partitions reconnect successfully.
After further investigation, the difference between the P2 server partition and the client ones was that the clients had a -fns ServerA:5000;ServerB:5000 in their shortcut.
After removing this option, the clients fail on the retry just like P2 does, which proves that the -fns option is not used only on partition startup, but has a greater meaning behind the scenes.
The next step was thus to add the -fns option to the P2 server partition, but then, when retrying the call from the exception handler, the partition either hangs or terminates with the following error:
WARNING: Task [6443A488-9C05-11D1-A703-A8262ADEAA77:0x1bc, 6] (cm.Recv)
terminated while still holding mutex(es).
Locks were cancelled - shared data may be corrupted.
Cancelled mutex: do.NsClient (0x166fd38)
FATAL ERROR: Internal mutex corrupted - terminating partition
Any thoughts ?
Vincent Figari

BPEL, clustering and fail over

How is failure of a BPEL PM node monitored so that in-flight processes can be automatically restarted on one of the surviving PM nodes?

BPEL PM has a built-in retry mechanism and end-point load balancing:
<partnerLinkBinding name="RatingService">
<property name="wsdlLocation">
http://localhost:8080/axis/services/RatingService1?wsdl
http://localhost:8080/axis/services/RatingService2?wsdl
</property>
</partnerLinkBinding>
<partnerLinkBinding name="FlakyService">
<property name="wsdlLocation">http://localhost:8080/axis/services/FlakyService?wsdl</property>
<property name="location">http://localhost:2222/axis/services/FlakyService</property>
<property name="retryCount">2</property>
<property name="retryInterval">60</property>
</partnerLinkBinding>
Please refer to the Resilient Flow Demo, packaged in the product and documented on OTN, for more information:
http://www.oracle.com/technology/products/ias/bpel/htdocs/orabpel_technotes.tn007.html
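As a rough illustration of what the retryCount and retryInterval properties above express, here is a minimal retry loop in Python. This is not BPEL PM code. It assumes that retryCount means "retries after the first attempt" and that retryInterval is in seconds; both are assumptions, and the function and parameter names are hypothetical.

```python
import time


def invoke_with_retry(call, retry_count=2, retry_interval=60, sleep=time.sleep):
    """Run call(); on failure retry up to retry_count more times,
    waiting retry_interval seconds between attempts (assumed semantics)."""
    retries = 0
    while True:
        try:
            return call()
        except Exception:
            if retries >= retry_count:
                raise  # out of retries: surface the last error
            retries += 1
            sleep(retry_interval)


# Hypothetical flaky endpoint: fails twice, then answers.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("endpoint down")
    return "ok"


waits = []  # record the waits instead of actually sleeping
result = invoke_with_retry(flaky, retry_count=2, retry_interval=60, sleep=waits.append)
```

Note that this kind of retry covers a flaky partner endpoint only; it does not by itself address restarting in-flight processes after a PM node fails, which is what the question asks about.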

RAC using SUN Geo Clusters with Fail over

Hi,
My customer is investigating and deploying Sun GeoClusters to fail over a RAC from one location to another. The distance between the primary and failover sites is 1200 km, and they are going to use TrueCopy to replicate the storage across the sites.
I am gathering information and would like to hear from anyone who has knowledge of this software. If anybody knows of clients who are using it (some URLs would help), please let me know.
Regards
Manoj

TrueCopy is a way of replicating storage offsite. RAC works using a single source for the database. That means that RAC can not be used simultaneously at both locations with the files being used locally.
If my memory serves me well, Hitachi TrueCopy was OSCP (Oracle Storage Compatibility Program) certified, but the OSCP program seems to have been discontinued as of January 2007 (see http://www.oracle.com/technology/deploy/availability/htdocs/oscp.html).
That means that you can use TrueCopy to replicate the storage layer to another location (according to the OSCP note), and use the replicated storage to startup the RAC database in case of failover.

Failing over Oracle connections in a pool

          Hi,
          This message is probably a bit out of context (I've already posted
          it to the JDBC group). I post here as well, since I guess it's
          the place where people have the most experience with clustering
          and HA. Original posting below...
          Could you please tell me whether, yes or no, connections to an
          Oracle database should fail over (when the database fails over
          to another machine)? I use Oracle's Transparent Application Failover
          (configured via Net8) with Weblogic 6 on Linux and Oracle 8.1.7
          on Solaris/SPARC.
          If this doesn't work in my configuration, is there any configuration
          where it should work? (Another version of Oracle, WLS, OS, ...)
          When I try TAF using the PetStore application, I get exceptions
          related to not being connected to the database.
          If TAF doesn't work with WebLogic, is there a way to work around
          the problem? Can I catch these exceptions and renew the connections
          in the pool? Or, what else is possible...?
          I'd appreciate any help. I'd like to demonstrate our HA product
          with WLS. If it doesn't work, I'll turn to iPlanet instead. Pity,
          I really like WLS!
          Thanks in advance for any help or advice!
          Regards, Frank Olsen
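The "catch the exceptions and renew the connections in the pool" idea raised above can be sketched as follows. This is a toy Python model, not WebLogic's connection pool API and not TAF: the pool class, the liveness check, and the fake connection type are all hypothetical, and a real pool would need locking, size limits and proper error handling.

```python
class ValidatingPool:
    """Toy pool that tests a connection on checkout and replaces dead ones."""

    def __init__(self, connect, is_alive, size=2):
        self._connect = connect      # factory that opens a new connection
        self._is_alive = is_alive    # cheap liveness test for a connection
        self._idle = [connect() for _ in range(size)]

    def checkout(self):
        conn = self._idle.pop() if self._idle else self._connect()
        if not self._is_alive(conn):
            # Connection was opened before the database failed over: renew it.
            conn = self._connect()
        return conn

    def checkin(self, conn):
        self._idle.append(conn)


class FakeConn:
    """Stand-in connection that remembers which database instance it targets."""

    def __init__(self, generation):
        self.generation = generation


db = {"generation": 1}
pool = ValidatingPool(
    connect=lambda: FakeConn(db["generation"]),
    is_alive=lambda conn: conn.generation == db["generation"],
)

before = pool.checkout()
pool.checkin(before)

db["generation"] = 2  # simulated failover to the backup node
after = pool.checkout()
```

A check like this only helps idle connections; a transaction that was in flight when the database failed still has to be retried by the application.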


          Hi (Frank ;-)
          I got carried away a bit too fast...
          Some more testing shows that it doesn't work in all cases:
          - when someone is trying to check out the shopping cart when the
          database fails (and fails over), I get exceptions once the
          database has restarted on the backup node
          - the exceptions are related to some transactions being rolled
          back and Oracle stating that it couldn't safely replay the transactions
          - browsing the categories still works fine
          - all access to the shopping cart and sign-in/sign-out causes time-outs
          and exceptions
          Any ideas what may cause this problem, please?
          Regards,
          Frank Olsen
          "Frank Olsen" <[email protected]> wrote:
          >
          >Hi,
          >
          >TAF worked with WLS 6 on NT with the Oracle 8.1.7 client!
          >
          >Has anyone tested it on Solaris/SPARC?
          >
          >Regards,
          >Frank Olsen
          >
          >
          >
          >"Frank Olsen" <[email protected]> wrote:
          >>
          >>Hi,
          >>
          >>Most of my question below is still valid (in particular
          >>concerning
          >>whether TAF should work with WLS on some or all platforms
          >>and
          >>versions).
          >>
          >>However, when I tested TAF with the Oracle client (sqlplus)
          >>there
          >>also was no failover of the (one) connection. I then
          >checked
          >>the
          >>`V$SESSION' view and the columns related to failover showed
          >>that
          >>TAF was not correctly configured. Strange because I copied
          >>the
          >>`tnsnames.ora' parameters from the Oracle documentation
          >>for TAF.
          >>
          >>Has anyone managed to configure and use TAF, with or
          >without
          >>WLS?!
          >>
          >>Thanks in advance for your help!
          >>
          >>Regards,
          >>Frank Olsen
          >>
          >>
          >>"Frank Olsen" <[email protected]> wrote:
          >>>
          >>>Hi,
          >>>
          >>>This message is probably a bit out of context (I've
          >already
          >>>posted
          >>>it to the JDBC group). I post here as well, since I
          >guess
          >>>it's
          >>>the place where people have the most experience with
          >>clustering
          >>>and HA. Original posting below...
          >>>
          >>>----
          >>>
          >>>Could you please tell me whether, yes or no, connections
          >>>to an
          >>>Oracle database should fail over (when the database
          >fails
          >>>over
          >>>to another machine)? I use Oracle's Transparent Application
          >>>Failover
          >>>(configured via Net8) with Weblogic 6 on Linux and Oracle
          >>>8.1.7
          >>>on Solaris/SPARC.
          >>>
          >>>If this doesn't work in my configuration, is there any
          >>>configuration
          >>>where it should work? (Another version of Oracle,
          >WLS,
          >>>OS, ...)
          >>>
          >>>
          >>>When I try TAF using the PetStore application, I get
          >>exceptions
          >>>related to no being connected to the database.
          >>>
          >>>If TAF doesn't work with WebLogic, is there a way to
          >>work
          >>>around
          >>>the problem? Can I catch these exceptions and renew
          >the
          >>>connections
          >>>in the pool? Or, what else is possible...?
          >>>
          >>>I'd appreciate any help. I'd like to demonstrate our
          >>HA
          >>>product
          >>>with WLS. If it doesn't work, I'll turn to iPlanet instead.
          >>>Pity,
          >>>I really like WLS!
          >>>
          >>>Thanks in advance for any help or advice!
          >>>
          >>>Regards, Frank Olsen
          >>>
          >>
          >

SQL 2005 cluster rejects SQL logins when in failed over state

When SQL 2005 SP4 on a Windows 2003 server cluster is failed over from Server_A to Server_B, it rejects all SQL Server logins. Domain logins are OK. The message is "user is not associated with a trusted server connection", followed by the IP of the client. This is error 18452. Does anyone know how to fix this? Logins should work fine from both servers. We think this started just after installing SP4.
DaveK

Hello,
The connection string is good; you're definitely using SQL authentication.
LoginMode on Server_B is REG_DWORD 0x00000001 (1). LoginMode on Server_A is REG_DWORD 0x00000002 (2). Looks like you are on to something. I will schedule another test failover. I assume a 2 is mixed mode? If so, why would SQL allow two different modes on each side of a cluster?
You definitely have a registry replication issue, or at the very least a registry that isn't in sync with the cluster. This could happen for various reasons, none of which we'll probably find out about now, but nevertheless...
A good test would be to set it to Windows only on Node A, wait a minute, and then set it back to SQL Server and Windows Authentication (mixed mode), and see whether the registry setting replicates across the nodes correctly. This replication happens at the Windows level and doesn't have anything to do with SQL Server.
SQL Server reads this value from the registry; it is not stored inside any database (nothing is stored in the master database), so it's a per-machine setting. Since it's not set correctly on Node B, when SQL Server starts up it reads that registry key and acts on it as it should. The culprit isn't SQL Server, it's Windows Clustering.
Hopefully this makes a little more sense now. You can actually just edit the registry setting on Node B to match Node A and fail over to B; everything should then work correctly. That doesn't help you with a root cause analysis, which definitely needs to be done, as who knows what else may not be correctly in sync.
Sean Gallardy
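To make the mismatch described above concrete, here is a minimal Python sketch that compares the LoginMode value reported for each node. The values 1 (Windows Authentication only) and 2 (mixed mode) are the ones SQL Server uses for this registry setting, so yes, a 2 is mixed mode. The function names and the `observed` dictionary are hypothetical; the numbers are simply the ones quoted in this thread, and how you collect them from each node (regedit, a script, and so on) is left out.

```python
# Meaning of the LoginMode registry value read by SQL Server at startup.
LOGIN_MODES = {
    1: "Windows Authentication only",
    2: "mixed mode (SQL Server and Windows Authentication)",
}


def describe_login_mode(value):
    """Return a readable description of a LoginMode registry value."""
    return LOGIN_MODES.get(value, "unrecognized value %r" % (value,))


def nodes_out_of_sync(login_modes):
    """True if the cluster nodes do not all report the same LoginMode."""
    return len(set(login_modes.values())) > 1


def nodes_rejecting_sql_logins(login_modes):
    """Nodes on which SQL logins would be refused (anything but mixed mode)."""
    return sorted(node for node, mode in login_modes.items() if mode != 2)


# Values as reported in this thread (hypothetical variable name).
observed = {"Server_A": 2, "Server_B": 1}

for node, mode in sorted(observed.items()):
    print(node, "->", describe_login_mode(mode))
print("out of sync:", nodes_out_of_sync(observed))
print("SQL logins rejected on:", nodes_rejecting_sql_logins(observed))
```

With the values from the thread, Server_B is the only node flagged, which matches the symptom: SQL logins fail with error 18452 only after failing over to Server_B.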

BGP in Dual Homing setup not failing over correctly

Hi all,
We have dual-homed BGP connections to our sister company's network, but the failover testing is failing.
If I shut down the WAN interface on the primary router, after about 5 minutes everything converges and fails over fine.
But if I shut down the LAN interface on the primary router, we never regain connectivity to the sister network.
Our two ASRs have an iBGP relationship, and I can see that after a certain amount of time the BGP routes with a next hop of the primary router get flushed from BGP and the preferred exit path is through the secondary router. This bit works OK, but I believe that the return traffic is still attempting to return over the primary link...
To add to this, we have two inline firewalls on each link, which are only performing IPS, no packet filtering.
Any pointers would be great.
Thanks,
Mario

Hi John,
Right... please look at the output below, which is the partial BGP table during a link failure.
10.128.0.0/9 is the problematic summary that still keeps getting advertised during a failure, when we do not want it to be.
There are prefixes in the BGP table which fall within that large summary address space, but I am sure that they are all routes that are being advertised to us from the eBGP peer.
*> 10.128.0.0/9       0.0.0.0                 32768 i
s> 10.128.56.16/32    172.17.17.241      150      0 2856 64619 i
s> 10.128.56.140/32   172.17.17.241      150      0 2856 64619 i
s> 10.160.0.0/21      172.17.17.241      150      0 2856 64611 i
s> 10.160.14.0/24     172.17.17.241      150      0 2856 64611 i
s> 10.160.16.0/24     172.17.17.241      150      0 2856 64611 i
s> 10.200.16.8/30     172.17.17.241      150      0 2856 65008 ?
s> 10.200.16.12/30    172.17.17.241      150      0 2856 65006 ?
s> 10.255.245.0/24    172.17.17.241      150      0 2856 64548 ?
s> 10.255.253.4/32    172.17.17.241      150      0 2856 64548 ?
s> 10.255.253.10/32   172.17.17.241      150      0 2856 64548 ?
s> 10.255.255.8/30    172.17.17.241      150      0 2856 6670 ?
s> 10.255.255.10/32   172.17.17.241      150      0 2856 ?
s> 10.255.255.12/30   172.17.17.241      150      0 2856 6670 ?
s> 10.255.255.14/32   172.17.17.241      150      0 2856 ?
I would not expect summary addresses to still be advertised if the specific prefixes are coming from eBGP... am I wrong?
Thanks for everything so far.
Mario De Rosa
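One way to sanity-check the table above is to list which of the suppressed prefixes actually fall inside 10.128.0.0/9. The sketch below does that with Python's standard `ipaddress` module. It rests on an assumption not confirmed in this thread: that the summary comes from an `aggregate-address ... summary-only` style configuration (the `s>` flags suggest so). In that case the router generally keeps originating the summary as long as at least one more-specific prefix inside it is in the BGP table, however that prefix was learned. The names `AGGREGATE`, `SPECIFICS` and `contributors` are hypothetical.

```python
import ipaddress

# The summary and the suppressed (s>) prefixes, copied from the table above.
AGGREGATE = ipaddress.ip_network("10.128.0.0/9")
SPECIFICS = [
    "10.128.56.16/32", "10.128.56.140/32", "10.160.0.0/21",
    "10.160.14.0/24", "10.160.16.0/24", "10.200.16.8/30",
    "10.200.16.12/30", "10.255.245.0/24", "10.255.253.4/32",
    "10.255.253.10/32", "10.255.255.8/30", "10.255.255.10/32",
    "10.255.255.12/30", "10.255.255.14/32",
]


def contributors(aggregate, prefixes):
    """Return the prefixes that are more-specific routes inside the aggregate."""
    result = []
    for text in prefixes:
        net = ipaddress.ip_network(text)
        if net != aggregate and net.subnet_of(aggregate):
            result.append(net)
    return result


inside = contributors(AGGREGATE, SPECIFICS)
print("%d of %d prefixes fall inside %s" % (len(inside), len(SPECIFICS), AGGREGATE))

# Under the stated assumption, the summary stays up while this list is non-empty.
print("summary would still be originated:", bool(inside))
```

Every prefix in the listing sits inside the /9, so under that assumption any one of them, learned from the eBGP peer via the surviving path, would be enough to keep the summary advertised during the failure.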

Fail over Between Gateways

Hello, thanks in advance.
Currently we are using two management servers, NW.contoso.com and WD.contoso.com, and four gateway servers (abc.com, xyz.com, 123.com, 987.com) reporting to those management servers, with NW.contoso.com as primary and WD.contoso.com as secondary:
Primary Management Server (NW.contoso.com)
abc.com, xyz.com, 123.com, 987.com
Failover Management Server (WD.contoso.com)
abc.com, xyz.com, 123.com, 987.com
My aim is to change the primary management server of 123.com and 987.com to WD.contoso.com and set NW.contoso.com as the failover. But when I tried to configure WD.contoso.com as the primary management server for 123.com and 987.com, I got an error saying it's not possible to set the same server as both primary and failover.
Could anyone help me out with this?

Thanks, Alexis, for your reply.
I removed the failover management server using a PowerShell command. Now the server is reporting to NW.contoso.com (primary MS) and there is no failover MS. If I change the primary management server to WD.contoso.com using a PowerShell command, will the gateway communicate with its new management server? I ask because in one article I read: "An issue that occurs if there is not a fail over server set up already for the Gateway Server and you change the Primary Server programmatically is that the Gateway Server becomes orphaned due to the Gateway Server still trying to connect to it’s previous Primary Server, since the Gateway Server does not receive it’s new configuration before the Management Servers and therefore the Management Server rejects the Gateway Server’s connection."
