Replication fail-over and reconfiguration

I would like to get a conversation going on the topic of replication. I have
set up replication on several sites using the Netscape / iPlanet 4.x server
and all has worked fine so far. I now need to produce documentation and
testing for replication fail-over of the master, and I would like to hear from
anyone with experience promoting a consumer to a supplier. I'm
looking for best practice on this issue. Here is what I am thinking;
please feel free to correct me or add input.
Disaster recovery plan:
1.) Select a consumer from the group of read-only replicas
2.) Change the database from Read-Only to Read-Write
3.) Delete the replication agreement (in my case I am using a SIR)
4.) Create a new agreement to reflect the supplier status of the chosen
replica (again a SIR for me)
5.) Reinitialize the consumers (Online or LDIF depending on your number of
entries)
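
Steps 2-4 amount to directory configuration changes. A hedged sketch of step 2: on later Directory Server releases the read-only flag is the nsslapd-readonly attribute on cn=config; on the 4.x servers discussed here the equivalent may be a console setting or a slapd.conf directive, so treat the names below as illustrative and verify against your version.

```ldif
# Illustrative only: flip the chosen replica from read-only to read-write.
# The DN and attribute name are from later Directory Server releases;
# confirm the 4.x equivalent (console setting or a slapd.conf "readonly"
# directive) before using this.
dn: cn=config
changetype: modify
replace: nsslapd-readonly
nsslapd-readonly: off
```

Applied with something like `ldapmodify -D "cn=Directory Manager" -W -f promote.ldif` against the chosen replica.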
That is the general plan so far. Other questions and topics might include:
1.) What to do when the original master comes back online
2.) DNS round-robin strategies (Hardware assistance, Dynamic DNS, etc)
3.) General backup and recovery procedures when:
    a.) The directory is corrupted
    b.) The link is down / the network is partitioned
    c.) Disk / server corruption or destruction
Well I hope that is a good basis for getting a discussion going. Feel free
to email me if you have questions or I can help you with one of your issues.
Best regards,
Ray Cormier

There is no automatic failover in Meta-Directory 5.1. You can implement manual failover on the metaview by using multi-master replication with Directory Server, but there are limitations and it is a manual process.
- Paul

Similar Messages

  • Time Machine Failing Over and Over

    So I am backing up to my external HD with Time Machine for the first time and it keeps failing. It stops between 3 GB and about 20 GB (I have 180 GB total) and keeps giving the error "The backup was not performed because an error occurred while copying files to the backup disk". The HD I was using is a brand-new Western Digital that I formatted into two partitions, one for my old Windows machine and one for this MBP. I have reformatted it several times and it keeps failing.
    My neighbor said I need an HD that is completely clean, so I went and bought a brand-new Iomega HD just for the Mac. It also failed over and over! I have read about every thread out there and can't seem to find a possible cause other than my internal HD being toast.
    Has anyone else had this experience before?
    -T

    Sorry, I'm not sure I understand your question now.
    Let me try: if you excluded your Documents folder and TM now works, your backup size was likely the problem.
    So if you simply remove Documents from the TM exclusion list, you will run back into the size problem... no good.
    If you first delete the large content from the Documents folder on your startup disk, you can then ask TM to back up the Documents folder again by removing it from the exclusion list, but obviously only what remains there (not the deleted material) will be saved by TM.
    In other words, if you want to keep your 80 GB of files on your startup disk, either you don't back them up or you use a larger backup disk. Otherwise you can remove the 80 GB from your startup disk; but then you probably want to keep two copies of them on two different external disks, for safety, without using TM.
    Did I answer your question?
    Piero

  • ISE admin , PSN and monitoring node fail-over and fall back scenario

    Hi Experts,
    I have a question about ISE failover.
    I have two ISE appliances in two different locations. I am trying to understand the fail-over and fall-back scenarios.
    I have gone through the documentation, but it is still not clear.
    My primary ISE server would have the primary admin role and primary monitoring node, and the secondary ISE would have the secondary admin and secondary monitoring roles.
    In case of a primary ISE appliance failure, I will have to log in to the secondary ISE node and make its admin role primary, but what happens if the primary ISE comes back? What would the scenario be?
    During the primary failure, will there be any impact on user authentication? As long as a PSN is available from the secondary, it should work... right?
    And what is the actual method to promote the secondary ISE admin node to primary? Do I even have to make monitoring-node role changes manually?
    Will I have to reboot the secondary ISE after promoting its admin role to primary?

    We have the same setup across an OTV link and have tested this scenario out multiple times. You don't have to do anything if communication is broken between the primary and secondary nodes: the secondary will automatically start authenticating the devices it is in contact with. If you promote the secondary to primary after the link is broken, it will keep the primary role when the link is restored and force the former primary node to secondary.

  • VPN device with dual ISP, fail-over, and load balancing

    We currently service a client that has a PIX firewall connecting to multiple, separate outside vendors via IPsec VPN. The VPN connections are mission-critical: if the VPN device or the internet connection (currently only a T1) goes down for any reason, the business goes down too. We're looking for a solution that allows dual ISPs, failover, and load balancing. I see that there are several ASA models, as well as IOS, that support this, but what I'm confused about is the requirements for the other end of the VPN, keeping in mind that the other end will always be an outside vendor out of our control. Current VPN endpoints for outside vendors are devices like the VPN 3000 Concentrator, SonicWall, etc. that likely do not support any type of fail-over, trunking, or load balancing. Is this just not possible?

    Unless I am mistaken, the ASA doesn't do VPN load balancing for point-to-point IPsec connections either. What you're really after is opportunistic connection failover, and/or something like DMVPN. Coordinating opportunistic failover shouldn't be too much of an issue with the partners, but be prepared for a lot of questions.

  • Users contacts missing after failing over and then failing back pool

    We have 2 Lync enterprise pools that are paired.
    3 days ago, I failed the central management store, and all users from pool01 to pool02.
    This morning, I failed the CMS and all users back from pool02 to pool01.
    All users signed back in to Lync and no issues were reported. A user then contacted me to say that his contact list was empty.
    I had him sign out and back in to Lync, and also had him sign into Lync from a different workstation, as well as his mobile device. All of which showed his contacts list as empty.
    We have unified contacts enabled (hybrid mode with Office 365 Exchange Online and Lync on-prem). When I check the user's Outlook contacts, I can see all of his contacts listed under "Lync Contacts", along with the current presence of each user.
    If I perform an export-csuserdata for that user's userdata, the XML file contained within the ZIP file shows the contacts that he is missing.
    I've also checked the client log on the workstation, and can see that Lync knows about the contacts, as it lists them in the log. They do not appear in the Lync client, though.
    Environment details:
    Lync 2013 - 2 enterprise pools running the latest December 2014 CU updates.
    Lync 2013 clients - running on Windows 8.1. User who is experiencing the issue is running client version 15.0.4675.1000 (32 bit)
    I have attempted to re-import the user data using both import-csuserdata (and restarting the front end services) and update-csuserdata. Both of these have had no effect.

    Hi Eason,
    Thanks for your reply. I've double-checked and can confirm that only one policy exists, which enables it globally.
    I believe this problem relates to issues that always seem to happen whenever our primary pool is failed over to the backup pool and then failed back.
    What I often see is that upon pool failback, things like response group announcements don't play on inbound calls (white noise is heard, followed by the call disconnecting), and agents somehow get signed out of queues (although they appear to be signed in to the queue when checking their Response Group settings in the Lync client). I've also noticed that every time we fail back, a different user comes to me and reports that either their entire contact list is missing or that half of their contacts are missing.
    I am able to restore these from backup, though.
    This appears to happen regardless of whether the failover to the backup pool is due to a disaster or simply to perform pool maintenance on our primary pool.

  • Fail Over and Redundancy with UCCE 7.5

    I have a customer that is installing UCCE and they want to run side A and side B standalone if the visible and private networks are both down. Based on the SRND, the system looks at the PG with the most active connections and takes over, and the other side goes dark. I am designing this in a distributed mode with agents at both sites. Any ideas other than Parent/Child?

    ... the system looks at the PG with the most active connections and takes over and the other side goes dark.
    Not quite. Behaviour of a duplex Router pair when the private network breaks is a complex affair.
    As you probably know, the MDS pairs form a "synchronized zone" - one MDS will be PAIRED-ENABLED and the other PAIRED-DISABLED.
    Consider all the PGs out there. On some PGs, the active link of the pgagent will be connected to the ccagent on the enabled side, while on the remainder of the PGs, the pgagent active link will be connected to the disabled side.
    When a pgagent has an active link to the disabled side, that MDS cannot set the message order - it has to send the message to its peer MDS (enabled), who sets the message order, and now both Routers get the message in the same order at the same time.
    Therefore, when the private network breaks, any PGs that have the active link connected to the disabled side will realign to the enabled side. The idle side remains connected - it's just a state change.
    Idle paths and active paths both count for device majority.
    The rules for the enabled side are simple: if it has device majority, it goes straight to ISOLATED-ENABLED. If it doesn't, it goes to ISOLATED-DISABLED.
    The disabled side is more complex. First it checks for device majority. If it has this, it initiates the TOS (test other side) process. If every PG it can communicate with reports that it has no communication to the other side, then it will promote itself to  ISOLATED-ENABLED.
    If the private network breaks and the public network is affected such that neither side has device majority, they both go disabled. Assuming the private link stays down, but the public network starts to come back in stages, eventually the majority of the PGs will be able to talk to one of the disabled sides, and then that will initiate the TOS process, and will go enabled.
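    The rules above can be condensed into a small decision function. This is a minimal Python sketch of the described behaviour, not Cisco code: the state names come from the post, while the majority test and the TOS outcome are simplified into bare parameters.

```python
# Sketch of the duplex-Router isolation rules described in the post.
# State names follow the post; thresholds and the TOS abstraction are
# this sketch's assumptions, not documented Cisco internals.

def has_device_majority(reachable_pgs: int, total_pgs: int) -> bool:
    # Idle paths and active paths both count toward device majority.
    return reachable_pgs > total_pgs / 2

def isolated_state(side: str, reachable_pgs: int, total_pgs: int,
                   tos_confirms_peer_down: bool = False) -> str:
    """State a side adopts when the private network breaks."""
    if not has_device_majority(reachable_pgs, total_pgs):
        return "ISOLATED-DISABLED"
    if side == "enabled":
        # Enabled side with device majority goes straight to enabled.
        return "ISOLATED-ENABLED"
    # Disabled side with majority must first run TOS (test other side):
    # it promotes itself only if every reachable PG reports no contact
    # with the other side.
    return "ISOLATED-ENABLED" if tos_confirms_peer_down else "ISOLATED-DISABLED"

# Colocated design, 2 PGs: each site sees only its local PG (1 of 2, no majority).
print(isolated_state("enabled", 1, 2))
# With a third PG at site 1, that site sees 2 of 3 PGs and has majority.
print(isolated_state("enabled", 2, 3))
```

    Under these assumptions the arithmetic matches the scenario later in the post: with only two PGs neither isolated side can reach a majority, while a third (even simplex, idle) PG lets one site come up ISOLATED-ENABLED.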
    Now let's consider what you have - you say "agents at both sites".
    Let's imagine for a moment you have a 3rd and 4th site with no agents - they are just for the central controller. You have a dedicated link between sites 3 and 4 for the private network, and a public network out to sites 1 and 2.
    At sites 1 and 2, you have a Call Manager cluster, pair of PGs etc.
    If the private network goes down, one of the sides will run simplex until the network is restored. Routing at sites 1 and 2 is unaffected.
    If the public network to site 1 is down, routing at site 1 is broken until the network is restored. Site 2 is unaffected.
    If the public network to site 2 is down, routing at site 2 is broken until the network is restored. Site 1 is unaffected.
    If both networks are down, the whole system is isolated and no routing occurs until the visible network has come back to the point where one of the sides comes up as ISOLATED-ENABLED.
    Now what happens when we colocate the central controllers at the agent sites, as in your model? Have we improved the situation? On the surface it looks like we have - and that's what your customer is saying with "they want to run side A and side B standalone if the visible and private network are both down".
    When the private link breaks and the public link breaks, each router is ISOLATED-DISABLED and cannot come up because it only sees 1 of 2 PGs (the ones on the LAN at the site). So now you are down on both sites.
    You might address this by installing at site 1 a third PG, configured in the normal way (it doesn't do anything) talking to both Call Routers, one local, one remote. It can be simplex.
    Now when the private link breaks and the public link breaks, site 1 can see the majority of the PGs so it comes up in ISOLATED-ENABLED. Routing resumes at site 1, but site 2 remains off the air. This is the best result you can achieve.
    The most important thing to think about is this: when the private network comes back up, the synchronizers try to do a state transfer. Assuming success, the synchronizers change to PAIRED mode. Now the routers and loggers will exchange state. If each site had been working in simplex mode ("split brain"), then when they come together you will have a totally messed up database. This corrupted state will most likely be unrecoverable.
    It has happened in the past. I'll spare you the gory details.

  • Fail over and Commonj

    We have a session bean inside WebLogic which creates a number of Work objects executed on a cluster of Tangosol Coherence caches. It doesn't wait until the created Works complete. If the cluster of WebLogic servers dies, we have the ability to fix integrity problems based on tlog files. What if the cluster of cache servers dies? Is there any way to find out which Work objects crashed using only Tangosol features?

    It sounds like you may be trying to ensure "at least once" (i.e. guaranteed) processing of the work items. Is that correct?
    Yes.
    OK, first the bad news: the Work Manager implementation in Coherence does not have those guarantees.
    Also, you said previously:
    What if the cluster of cache servers dies? Is there any way to find out which Work objects crashed using only Tangosol features?
    So you want the work items to survive cluster shut-down. That means that you want them to be persistent, and from your "tlog" comment I assume you mean transactional as well. None of those qualities is defined by the commonJ spec, but you'll find some commonJ implementations (e.g. Redwood) that provide them.
    To achieve similar with Coherence:
    1. Define a partitioned cache (e.g. "pending-work") with write-through (or write-behind) to a database
    2. Place Work items into the cache
    3. On backing map "insert" event(s), issue work item(s); you can do this with local affinity on the work manager, since the work items are already partitioned
    4. On work completion, delete work items from cache
    5. On application startup, preload the "pending-work" cache from the database
    This doesn't guarantee "only once", so the work items MUST be idempotent or otherwise non-destructive.
    You can achieve many guarantees (concurrency, etc.) by taking advantage of other features, but obviously there is some work related to this.
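    The five steps above can be sketched generically. This is a minimal Python stand-in for the pattern, not the Tangosol/Coherence API; every class and method name here is illustrative, and the worker must be idempotent since items may run more than once.

```python
# Generic sketch of the "pending-work" pattern: a write-through cache of
# work items, issued on insert, deleted on completion, re-issued on startup.
# All names are illustrative -- this is NOT the Tangosol/Coherence API.

class PendingWorkCache:
    def __init__(self, database: dict, worker):
        self.db = database      # stands in for the write-through backing store
        self.cache = {}         # stands in for the "pending-work" cache
        self.worker = worker    # must be idempotent ("at least once" semantics)

    def put(self, key, work_item):
        # Steps 1-2: place the item in the cache, write through to the store.
        self.cache[key] = work_item
        self.db[key] = work_item
        self._on_insert(key)

    def _on_insert(self, key):
        # Step 3: the backing-map insert event issues the work item.
        self.worker(self.cache[key])
        self.complete(key)

    def complete(self, key):
        # Step 4: on completion, delete the item from cache and store.
        self.cache.pop(key, None)
        self.db.pop(key, None)

    def preload_and_reissue(self):
        # Step 5: on startup, preload pending items and run them again.
        for key, item in list(self.db.items()):
            self.cache[key] = item
            self._on_insert(key)

db = {"w1": "resize-image"}   # item left over from a crash
done = []
cache = PendingWorkCache(db, done.append)
cache.preload_and_reissue()   # the surviving item is re-executed and cleared
```

    The design choice the sketch highlights: durability comes from the write-through store, not from the work manager, which is why "only once" cannot be promised and the work must be idempotent.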
    Peace.

  • CAS ARRAY fail-over and emails stuck

    Dear all,
    For various reasons we are in Exchange Server coexistence mode: one Exchange 2003 server and two Exchange 2010 servers.
    We have a CAS array (Node-1 and Node-2) and a DAG in place, but the problem is that whenever Node-1 is down, emails get stuck on the routing group connector on the legacy server, while Exchange 2010 to Exchange 2010 mail still works.
    Conversely, when Node-2 is down, everything works.
    How do I fix this?
    TheAtulA

    I assume the CAS servers also have the Hub Transport role installed?
    Check the routing group connector(s) (Get-RoutingGroupConnector) and ensure the source and destination transport servers include both CAS nodes, not just Node-1.
    If not, use Set-RoutingGroupConnector to set the correct source and target servers:
    https://technet.microsoft.com/en-us/library/aa998581(v=exchg.141).aspx

  • Audio Applications in Unity Fail-over

    Hi all,
    I am going to install Cisco Unity with fail-over, and from what I remember, I should rebuild applications like the Auto Attendant on the secondary server, because they are not part of the replication.
    Am I right? Or is there no need to rebuild the applications?

    Hi JFV,
    That is no longer the case
    How Standby Redundancy Works in Cisco Unity 8.x
    Cisco Unity standby redundancy uses failover functionality to provide duplicate Cisco Unity servers for disaster recovery. The primary server is located at the primary facility, and the secondary server is located at the disaster-recovery facility.
    Standby redundancy functions in the following manner:
    •Data is replicated to the secondary server, with the exceptions noted in the "Data That Is Not Replicated in Cisco Unity 8.x" section.
    •Automatic failover is disabled.
    •In the event of a loss of the primary server, the secondary server is manually activated.
    Data That Is Not Replicated in Cisco Unity 8.x
    Changes to the following Cisco Unity settings are not replicated between the primary and secondary servers. You must manually change values on both servers.
    •Registry settings
    •Recording settings
    •Phone language settings
    •GUI language settings
    •Port settings
    •Integration settings
    •Conversation scripts
    •Key mapping scripts (can be modified through the Custom Key Map tool)
    •Media Master server name settings
    •Exchange message store, when installed on the secondary server
    http://www.cisco.com/en/US/docs/voice_ip_comm/unity/8x/failover/guide/8xcufg040.html#wp1099338
    Cheers!
    Rob

  • Bea weblogic 6.1 does not oracle Database fail over

    Hi. We had Concurrency Strategy: Exclusive. We changed that to Database for performance reasons. Since we changed it, when we do an Oracle database fail-over, WebLogic 6.1 does not detect the fail-over and needs a restart.
    How can we resolve this?

    mt wrote:
    Hi. We had Concurrency Strategy: Exclusive. We changed that to Database for performance reasons. Since we changed it, when we do an Oracle database fail-over, WebLogic 6.1 does not detect the fail-over and needs a restart. How can we resolve this?
    Are your pools set to test connections at reserve time?
    Joe
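
    Joe's question maps to the connection pool's test-on-reserve settings. A minimal config.xml sketch, assuming WebLogic 6.1 attribute names; the pool name, driver, URL, and capacities are placeholders for your own pool. With TestConnectionsOnReserve enabled, each connection is validated against TestTableName before being handed out, so connections broken by a database fail-over are discarded and replaced instead of being reused.

```xml
<!-- Illustrative WebLogic 6.1 config.xml fragment; pool name, driver, URL,
     target, and capacities are placeholders. TestConnectionsOnReserve makes
     the pool validate each connection (via a query on TestTableName) before
     handing it out, so stale connections left over from a database fail-over
     are discarded rather than returned to the application. -->
<JDBCConnectionPool Name="oraclePool"
    Targets="mycluster"
    DriverName="oracle.jdbc.driver.OracleDriver"
    URL="jdbc:oracle:thin:@dbhost:1521:ORCL"
    TestConnectionsOnReserve="true"
    TestTableName="dual"
    InitialCapacity="5"
    MaxCapacity="20"/>
```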

  • Database fail-over problem after we changed the Concurrency Strategy

    Hi. We had Concurrency Strategy: Exclusive. We changed that to Database for performance reasons. Since we changed it, when we do an Oracle database fail-over, WebLogic 6.1 does not detect the fail-over and needs to be rebooted.
    How can we resolve this?

    Hi,
    It is just failing on one of the application servers. The developer wrote that when installing the CI, the local hostname is written into the database and SDM. We will have to do a homogeneous system copy to change the name.
    The problem is that I used the virtual SAP group name on the CI and DI application servers; on the SCS and ASCS we used virtual hostnames, which is OK according to the SAP developer.
    The start and instance profiles were checked and everything was fine; just the dispatcher from the CI is having problems when coming from Node B to Node A.
    Regards

  • UCCX Purposely Prevent Fail-over

    Hi. I was wondering if shutting down the engine on a secondary server would be enough to prevent fail-over in an HA environment.
    Basically, we had a case open with TAC about the servers failing over and then back for no apparent reason. What we found was that the two servers were losing heartbeat to each other, so the secondary server was trying to take control. This then caused all of our agents to fail over, but calls could get lost, as the primary server was actually fully functioning. This led us to another TAC case about an error on a router near the secondary server that was causing the loss of heartbeat. The problem is that the router cannot come down for some time and is due to be replaced at the end of the year.
    So now, maybe not entirely to my liking, we want to try running just the primary, and if worst comes to worst, we can start the secondary back up again; I am curious what the best procedure for that would be. The hope is that this would at least stop the random fail-overs, even if it doesn't actually address the real issue.

    I have to rely on another guy for the router, switches, and UCM side of things, and he hasn't said exactly what the error message is, but he called TAC and it is supposed to be only cosmetic; a reboot of the router would clear it. Unfortunately, given where that router is, it will not be brought down until a maintenance window at the end of the year.
    At any rate, the UCCX server has been ruled out: we have had multiple tickets with TAC for the UCCX and then the UCM, and both point to a network issue that is not avoided by shutting down the secondary server, mainly because we have a CM publisher and subscriber on the same network.

  • WLS 6.1 SP1 stateful EJB problem: load-balancing and fail-over

    I have three problems.
    1. I have 2 clustered servers. My weblogic-ejb-jar.xml is here:
    <?xml version="1.0"?>
    <!DOCTYPE weblogic-ejb-jar PUBLIC '-//BEA Systems, Inc.//DTD WebLogic 6.0.0 EJB//EN'
    'http://www.bea.com/servers/wls600/dtd/weblogic-ejb-jar.dtd'>
    <weblogic-ejb-jar>
      <weblogic-enterprise-bean>
        <ejb-name>DBStatefulEJB</ejb-name>
        <stateful-session-descriptor>
          <stateful-session-cache>
            <max-beans-in-cache>100</max-beans-in-cache>
            <idle-timeout-seconds>120</idle-timeout-seconds>
          </stateful-session-cache>
          <stateful-session-clustering>
            <home-is-clusterable>true</home-is-clusterable>
            <home-load-algorithm>RoundRobin</home-load-algorithm>
            <home-call-router-class-name>common.QARouter</home-call-router-class-name>
            <replication-type>InMemory</replication-type>
          </stateful-session-clustering>
        </stateful-session-descriptor>
        <jndi-name>com.daou.EJBS.solutions.DBStatefulBean</jndi-name>
      </weblogic-enterprise-bean>
    </weblogic-ejb-jar>
    When I use <home-call-router-class-name>common.QARouter</home-call-router-class-name> and deploy this EJB, an exception occurs:
    <Warning> <Dispatcher> <RuntimeException thrown by rmi server: 'weblogic.rmi.cluster.ReplicaAwareServerRef@9 - jvmid: '2903098842594628659S:203.231.15.167:[5001,5001,5002,5002,5001,5002,-1]:mydomain:cluster1', oid: '9', implementation: 'weblogic.jndi.internal.RootNamingNode@5f39bc''
    java.lang.IllegalArgumentException: Failed to instantiate weblogic.rmi.cluster.BasicReplicaHandler due to java.lang.reflect.InvocationTargetException
        at weblogic.rmi.cluster.ReplicaAwareInfo.instantiate(ReplicaAwareInfo.java:185)
        at weblogic.rmi.cluster.ReplicaAwareInfo.getReplicaHandler(ReplicaAwareInfo.java:105)
        at weblogic.rmi.cluster.ReplicaAwareRemoteRef.initialize(ReplicaAwareRemoteRef.java:79)
        at weblogic.rmi.cluster.ClusterableRemoteRef.initialize(ClusterableRemoteRef.java:28)
        at weblogic.rmi.cluster.ClusterableRemoteObject.initializeRef(ClusterableRemoteObject.java:255)
        at weblogic.rmi.cluster.ClusterableRemoteObject.onBind(ClusterableRemoteObject.java:149)
        at weblogic.jndi.internal.BasicNamingNode.rebindHere(BasicNamingNode.java:392)
        at weblogic.jndi.internal.ServerNamingNode.rebindHere(ServerNamingNode.java:142)
        at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:362)
        at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
        at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
        at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
        at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
        at weblogic.jndi.internal.RootNamingNode_WLSkel.invoke(Unknown Source)
        at weblogic.rmi.internal.BasicServerRef.invoke(BasicServerRef.java:296)
    So must I use it or not?
    2. When I don't use <home-call-router-class-name>common.QARouter</home-call-router-class-name>, there's no exception, but load balancing does not happen. According to the documentation, load balancing must happen when I call the home.create() method.
    My client program goes here:
    DBStateful the_ejb1 = (DBStateful) PortableRemoteObject.narrow(home.create(), DBStateful.class);
    DBStateful the_ejb2 = (DBStateful) PortableRemoteObject.narrow(home.create(3), DBStateful.class);
    The result looks like this:
    the_ejb1 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@4695a6)/397
    the_ejb2 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@acf6e)/398
    or
    the_ejb1 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@252fdf)/380
    the_ejb2 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@6a0252)/381
    I think the result should instead look like the following, shouldn't it?
    the_ejb1 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@4695a6)/397
    the_ejb2 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@6a0252)/381
    In this case I think the_ejb1 and the_ejb2 should have instances on different cluster servers, but they go to one server.
    3. If I don't use <home-call-router-class-name>common.QARouter</home-call-router-class-name> and <replication-type>InMemory</replication-type>, then load balancing happens but there's no fail-over.
    So how can I get load-balancing and fail-over together?

              I have three problem
              1. I have 2 clustered server. my weblogic-ejb-jar.xml is here
              <?xml version="1.0"?>
              <!DOCTYPE weblogic-ejb-jar PUBLIC '-//BEA Systems, Inc.//DTD WebLogic 6.0.0 EJB//EN'
              'http://www.bea.com/servers/wls600/dtd/weblogic-ejb-jar.dtd'>
              <weblogic-ejb-jar>
              <weblogic-enterprise-bean>
                   <ejb-name>DBStatefulEJB</ejb-name>
                   <stateful-session-descriptor>
                   <stateful-session-cache>
                        <max-beans-in-cache>100</max-beans-in-cache>
                        <idle-timeout-seconds>120</idle-timeout-seconds>
                   </stateful-session-cache>
                   <stateful-session-clustering>
                        <home-is-clusterable>true</home-is-clusterable>
                        <home-load-algorithm>RoundRobin</home-load-algorithm>
                        <home-call-router-class-name>common.QARouter</home-call-router-class-name>
                        <replication-type>InMemory</replication-type>
                   </stateful-session-clustering>
                   </stateful-session-descriptor>
                   <jndi-name>com.daou.EJBS.solutions.DBStatefulBean</jndi-name>
              </weblogic-enterprise-bean>
              </weblogic-ejb-jar>
              when i use "<home-call-router-class-name>common.QARouter</home-call-router-class-name>"
              and deploy this ejb, exception cause
              <Warning> <Dispatcher> <RuntimeException thrown b
              y rmi server: 'weblogic.rmi.cluster.ReplicaAwareServerRef@9 - jvmid: '2903098842
              594628659S:203.231.15.167:[5001,5001,5002,5002,5001,5002,-1]:mydomain:cluster1',
              oid: '9', implementation: 'weblogic.jndi.internal.RootNamingNode@5f39bc''
              java.lang.IllegalArgumentException: Failed to instantiate weblogic.rmi.cluster.B
              asicReplicaHandler due to java.lang.reflect.InvocationTargetException
              at weblogic.rmi.cluster.ReplicaAwareInfo.instantiate(ReplicaAwareInfo.ja
              va:185)
              at weblogic.rmi.cluster.ReplicaAwareInfo.getReplicaHandler(ReplicaAwareI
              nfo.java:105)
              at weblogic.rmi.cluster.ReplicaAwareRemoteRef.initialize(ReplicaAwareRem
              oteRef.java:79)
              at weblogic.rmi.cluster.ClusterableRemoteRef.initialize(ClusterableRemot
              eRef.java:28)
              at weblogic.rmi.cluster.ClusterableRemoteObject.initializeRef(Clusterabl
              eRemoteObject.java:255)
              at weblogic.rmi.cluster.ClusterableRemoteObject.onBind(ClusterableRemote
              Object.java:149)
              at weblogic.jndi.internal.BasicNamingNode.rebindHere(BasicNamingNode.jav
              a:392)
              at weblogic.jndi.internal.ServerNamingNode.rebindHere(ServerNamingNode.j
              ava:142)
              at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:36
              2)
              at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:36
              9)
              at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:36
              9)
              at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:36
              9)
              at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:36
              9)
              at weblogic.jndi.internal.RootNamingNode_WLSkel.invoke(Unknown Source)
              at weblogic.rmi.internal.BasicServerRef.invoke(BasicServerRef.java:296)
              So must I use it or not?
              2. When I don't use "<home-call-router-class-name>common.QARouter</home-call-router-class-name>",
              there's no exception, but load balancing does not happen. According to the documentation,
              load balancing should occur when I call the home.create() method.
              My client code looks like this:
                   DBStateful the_ejb1 = (DBStateful) PortableRemoteObject.narrow(home.create(), DBStateful.class);
                   DBStateful the_ejb2 = (DBStateful) PortableRemoteObject.narrow(home.create(3), DBStateful.class);
              The result is like this:
                   the_ejb1 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@4695a6)/397
                   the_ejb2 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@acf6e)/398
                   or
                   the_ejb1 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@252fdf)/380
                   the_ejb2 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@6a0252)/381
              I think the result should instead look like this:
                   the_ejb1 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@4695a6)/397
                   the_ejb2 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@6a0252)/381
              In this case the_ejb1 and the_ejb2 should get instances on different cluster servers,
              but they both go to one server.
              3. If I don't use "<home-call-router-class-name>common.QARouter</home-call-router-class-name>"
              and set "<replication-type>InMemory</replication-type>", then load balancing happens but
              there's no fail-over.
              So how can I get load balancing and fail-over together?
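The behaviour the poster expects, consecutive create() calls alternating between cluster members, is essentially round-robin selection. As a hedged, self-contained sketch (plain Java, no WebLogic classes; the class name and server addresses are invented for illustration), the idea behind a QARouter-style call router boils down to:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative round-robin selector, modelling what a call-router class
// conceptually does: each call is directed to the next server in the list.
public class RoundRobinRouter {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinRouter(List<String> servers) {
        this.servers = servers;
    }

    // Returns the server that should handle the next call.
    public String route() {
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    public static void main(String[] args) {
        RoundRobinRouter r = new RoundRobinRouter(
                List.of("203.231.15.167", "203.231.15.125"));
        // Two consecutive create() calls would land on different servers:
        System.out.println(r.route()); // 203.231.15.167
        System.out.println(r.route()); // 203.231.15.125
    }
}
```

This is only a model of the routing idea; in WebLogic the real decision is made by the class named in <home-call-router-class-name>, and the InvocationTargetException in the stack trace above may indicate that that class could not be instantiated on the server (e.g. it is missing from the server classpath).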
              

  • Load Balance Network Cards and Fail over services

    Hi,
    I'm looking at setting up 2 Mac servers, each with basic services (i.e. AFP, Open Directory, software updates and DHCP).
    Both servers are less than 12 months old and are both attached via GB ports to the RAID where all the user data is stored.
    My question is: how do I set up the 2 network interface cards to act as one and load balance the traffic across both interfaces?
    Also, I was wondering if it is possible to fail over AFP services, so if one server went down the other would pick up file services where it left off?
    I know how to fail over OD, and the other services don't matter too much.
    Thanks in advance for your assistance

    My question is: how do I set up the 2 network interface cards to act as one and load balance the traffic across both interfaces?
    This is simple link aggregation in System Preferences -> Network
    Click the + button at the bottom and choose new Link Aggregate. Choose the existing interfaces (presumably en0 and en1) and you're set.
    Note that this requires support in the switch the server is connected to (it needs to support LACP), and that you will bounce your network connection when you set this up (so don't do it when the server is actively servicing clients)
    Also, I was wondering if it is possible to fail over AFP services, so if one server went down the other would pick up file services where it left off?
    It's possible, but you need to be very careful with regards to data integrity. For example, typically each server is going to have a local directory (or directories) that are shared. If Server A fails and Server B takes over, how do you intend to ensure that Server B's data is up-to-date, especially with regard to files that might have been in use at the time?
    It's a tricky problem to solve without putting the data on a shared storage device using something like XSAN to manage arbitration, and now you could be talking serious $$$s.
    I'd recommend looking closely at your file serving needs and work out if it's necessary, or whether you could get by with dividing the load across servers (e.g. some sharepoints are on one server, other sharepoints on the other) so that only a subset of your users are impacted should one server fail.
    File synchronization/replication is a major issue (read $$$$$) for a lot of companies.
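The data-integrity point above can be made concrete with a toy model (plain Java; the server roles and file contents are invented for illustration): if Server B holds a periodic copy of Server A's data rather than sharing its storage, a failover can silently serve stale files.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of two file servers where B mirrors A by periodic sync.
// If A fails between syncs, B takes over and serves whatever it last
// copied -- i.e. potentially stale data. Shared storage (e.g. Xsan)
// avoids this by giving both servers the same disk.
public class FailoverStaleness {
    public static void main(String[] args) {
        Map<String, String> serverA = new HashMap<>();
        Map<String, String> serverB = new HashMap<>();

        serverA.put("report.doc", "v1");
        serverB.putAll(serverA);            // nightly sync: B now matches A

        serverA.put("report.doc", "v2");    // a client edits the file on A...
        // ...and A crashes before the next sync.

        String served = serverB.get("report.doc"); // B fails over and serves:
        System.out.println(served);                // "v1" -- stale
    }
}
```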

  • OCR and voting disks on ASM, problems in case of fail-over instances

    Hi everybody
    in case at your site you :
    - have an 11.2 fail-over cluster using Grid Infrastructure (CRS, OCR, voting disks),
    where you have yourself created additional CRS resources to handle single-node db instances,
    their listener, their disks and so on (which are started only on one node at a time,
    and can fail over from that node and restart on another);
    - have put OCR and voting disks into an ASM diskgroup (as strongly suggested by Oracle);
    then you might have problems (as we had) because you might:
    - reach max number of diskgroups handled by an ASM instance (63 only, above which you get ORA-15068);
    - experience delays (especially in case of multipath) and find fake CRS resources, etc.
    whenever you dismount disks from one node and mount to another;
    So (if both conditions are true) you might be interested in this story,
    then please keep reading on for the boring details.
    One step backward (I'll try to keep it simple).
    Oracle Grid Infrastructure is mainly used by RAC db instances,
    which means that any db you create usually has one instance started on each node,
    and all instances access read / write the same disks from each node.
    So, the ASM instance on each node will mount diskgroups in Shared Mode,
    because the same diskgroups are mounted also by other ASM instances on the other nodes.
    ASM instances have a spfile parameter CLUSTER_DATABASE=true (and this parameter implies
    that every diskgroup is mounted in Shared Mode, among other things).
    In this context, it is quite obvious that Oracle strongly recommends putting OCR and voting disks
    inside ASM: this diskgroup (usually called CRS_DATA) will become diskgroup number 1
    and ASM instances will mount it before CRS starts.
    Then, additional diskgroups will be added by users, for DATA, REDO, FRA etc. of each RAC db,
    and will be mounted later when a RAC db instance starts on the specific node.
    In case of fail-over cluster, where instances are not RAC type and there is
    only one instance running (on one of the nodes) at any time for each db, it is different.
    All diskgroups of db instances don't need to be mounted in Shared Mode,
    because they are used by one instance only at a time
    (on the contrary, they should be mounted in Exclusive Mode).
    Yet, if you follow Oracle advice and put OCR and voting inside ASM, then:
    - at installation OUI will start ASM instance on each node with CLUSTER_DATABASE=true;
    - the first diskgroup, which contains OCR and votings, will be mounted Shared Mode;
    - all other diskgroups, used by each db instance, will be mounted Shared Mode, too,
    even if you take care that they are mounted by one ASM instance at a time.
    At our site, for our three-node cluster, this fact has two consequences.
    One consequence is that we hit the ORA-15068 limit (max 63 diskgroups) earlier than expected:
    - none of the instances on this cluster are Production (only Test, Dev, etc.);
    - we planned to have usually 10 instances on each node, each of them with 3 diskgroups (DATA, REDO, FRA),
    so 30 diskgroups per node, for a total of 90 diskgroups (30 instances) on the cluster;
    - in case one node failed, the surviving two should take over the resources of the failing node,
    in the worst case: one node with 60 diskgroups (20 instances), the other with 30 diskgroups (10 instances);
    - in case two nodes failed, the only surviving node would not be able to mount all the additional diskgroups
    (because of the limit of max 63 diskgroups mounted by an ASM instance), so the rest would remain unmounted
    and their db instances stopped (they are not Production instances).
    But it didn't work: since ASM has parameter CLUSTER_DATABASE=true, you cannot mount 90 diskgroups;
    you can mount 62 globally (once a diskgroup is mounted on one node, it is given a number between 2 and 63,
    and diskgroups mounted on other nodes cannot reuse that number).
    So as a matter of fact we can mount only about 21 diskgroups (about 7 instances) on each node.
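The arithmetic above can be checked in a few lines (a plain Java sketch; the numbers simply mirror the scenario described in the post, not any Oracle API, and the per-node figures are the post's "about 21 / about 7" rounded down):

```java
// Diskgroup arithmetic for the scenario above: 3 nodes, 10 instances per
// node, 3 diskgroups (DATA, REDO, FRA) per instance, and ASM's pool of
// 63 diskgroup numbers shared cluster-wide when CLUSTER_DATABASE=true
// (number 1 is taken by the CRS/voting diskgroup).
public class DiskgroupMath {
    public static void main(String[] args) {
        int nodes = 3;
        int instancesPerNode = 10;
        int dgPerInstance = 3;

        int planned = nodes * instancesPerNode * dgPerInstance; // what we wanted
        int usable = 63 - 1;                // numbers 2..63 left for db diskgroups
        int perNode = usable / nodes;       // diskgroups per node, shared evenly
        int instancesFit = perNode / dgPerInstance; // db instances that fit per node

        System.out.println("planned=" + planned);           // planned=90
        System.out.println("usable=" + usable);             // usable=62
        System.out.println("perNode=" + perNode);           // perNode=20
        System.out.println("instancesFit=" + instancesFit); // instancesFit=6
    }
}
```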
    The second consequence is that, every time our handmade CRS scripts dismount diskgroups
    from one node and mount them on another, there are delays in the range of seconds (especially with multipath).
    Also we found in the CRS log that, whenever we mounted diskgroups (on one node only),
    additional fake resources of type ora*.dg were created on the fly behind the scenes,
    maybe to accommodate the fact that on the other nodes those diskgroups were left unmounted
    (once again, instances are single-node here, not RAC type).
    That's all.
    Did anyone run into similar problems?
    We opened an SR with Oracle asking what options we have here, and we are disappointed by their answer.
    Regards
    Oscar

    Hi Klaas-Jan
    - best practices require that online redo log files also be in a separate diskgroup, in case of ASM logical corruption (we are a little bit paranoid): if the DATA dg gets corrupted, you can restore the full backup plus archived redo logs plus online redo logs (otherwise you will stop at the latest archived log).
    So we have 3 diskgroups for each db instance: DATA, REDO, FRA.
    - in case of a fail-over cluster (active-passive), Oracle provides some templates of CRS scripts (in $CRS_HOME/crs/crs/public) that you edit and change at will; you can also create additional scripts for any additional resources you might need (Oracle Agents, backup agents, file systems, monitoring tools, etc.)
    About our problem, the only solution is to move OCR and voting disks out of ASM and change the pfile of all ASM instances (parameter CLUSTER_DATABASE from true to false).
    Oracle's answers were a little bit odd:
    - first they told us to use Grid Standalone (without CRS, OCR, voting at all), but we told them that we needed a fail-over solution
    - then they told us to use RAC One Node, which actually has some better features: in case of a planned fail-over it might be able to migrate
    client sessions without causing a reconnect (for SELECTs only, not in the case of a running transaction), but we already have a few fail-over clusters and we cannot change them all.
    So we plan to move OCR and voting disks onto block devices (we think that the other solution, which needs a shared file system, would take longer).
    Thanks Marko for pointing us to the OCFS2 pros / cons.
    We asked Oracle for confirmation that this is supported; they said yes, but it is discouraged (and also doesn't work with OUI or ASMCA).
    Anyway, that's the simplest approach; this is a non-Prod cluster, so we'll start here, and if everything is fine, after a while we'll do it on the Prod ones too.
    - Note 605828.1, paragraph 5, Configuring non-raw multipath devices for Oracle Clusterware 11g (11.1.0, 11.2.0) on RHEL5/OL5
    - Note 428681.1: OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE)
    -"Grid Infrastructure Install on Linux", paragraph 3.1.6, Table 3-2
    Oscar
