Fault Tolerance in Nested Lab

Hi everyone,
I am trying to set up FT in a nested lab. I have followed the instructions in "How to configure VMware Fault Tolerance on native and in “nested” ESXi environments | I wish I could…" (German).
But after I turn on FT and then try to start the VM I get the following error message:
I have already tried to set the fsr.maxSwitchoverSeconds value to 100 or even higher but I always get the same message with 8 seconds.
Are there any other known sources for this error?
Thanks.

Hi,
Have you also looked at the restrictions this FT feature has? As far as I know, it can only be used with 1 CPU, not to mention the overhead it produces, since all instructions have to be executed on 2 nodes.
Also, this feature only helps you if your whole guest system crashes. What if the software inside the VM crashes? It will then be unavailable in both environments.
If you are looking for real HA (in virtualized environments), I suggest using RAC in OVM. It will get you down to 30 seconds but will do far better regarding functionality.
Sebastian

Similar Messages

  • Load Balancing & Fault tolerance in Oracle Apps

    Hello
    I am pretty new to Oracle Apps and Oracle 9iAS.
    We have been asked to find ways of deploying load balancing and fault tolerance in Oracle Apps.
    I have gone through the following MetaLink articles and some more from Google:
    Configuring Web Cache as a Load Balancer for Application Servers
    Create new middle tier node in existing Apps 11i environment using cloning and then load balance
    Integrating and using Web Cache with Forms 9i for Load Balancing
    Running Multiple OC4J Instances From a Single Install of 9iAS 1.0.2.2
    LOAD BALANCING ORACLE APPLICATIONS ON UNIX
    Load Balancing in 11i
    OC4J Clustering Setup
    OC4J Load Balancing for Forms 9i
    Setting up 11i E-Business suite using a hardware load balancer
    Sharing an APPL_TOP in Oracle Applications 11i
    In the Oracle Apps Concepts PDF, I found that load balancing occurs when there are multiple installations of the web server, forms server, reports server, concurrent manager server, etc.
    Let's consider the forms server component of the middle tier.
    Can there be multiple INSTANCES of the forms server within a SINGLE INSTALLATION?
    If it has to be MULTIPLE INSTALLATIONS, then it will require multiple physical machines.
    What is software load balancing then?
    I also read that an Oracle 9iAS instance is a combination of Oracle HTTP Server (OHS) and one or more instances of Oracle9iAS Containers for J2EE (OC4J).
    So can software load balancing be implemented for the web server component of the middle tier using multiple instances of OC4J?
    Is the Forms Metrics Server configuration hardware load balancing?
    What about Oracle9iAS Web Cache clustering?
    Please clarify these doubts and suggest a way forward.
    Thanks a lot

    Dear all, can anyone help me with the following?
    We want to install Oracle Apps 11.5.9 on 4 IBM AIX 5L boxes. The first node
    will hold the database, concurrent and admin tiers. The other 3 nodes
    will hold the Forms/Web tiers, and we need to use Forms Metrics for load
    balancing, taking into consideration that we can't use a shared APPL_TOP. We tried
    to find out how to install and configure this solution, but all the
    documents only discuss the concept, and that this concept is applicable.
    We need the exact steps for doing the installation, either using rapidwiz or
    anything else.
    fadi

  • Fault Tolerance - MCS 7825 H3 & MCS 7835 H2 Servers

    Hi,
    I have enabled network fault tolerance on my CUC and CCM servers. I thought that after enabling network fault tolerance the server would change its MAC address to a virtual MAC address, which might affect the licenses.
    But even after enabling fault tolerance and restarting the server, I didn't see any change in the MAC ID. It's the same MAC ID.
    Please advise.
    Regards
    Jagadish G

    Hi Jagadish,
    Do you get the option of 1000 Mb/s when you configure set network nic eth0 speed? Can you check?
    Secondly, Bond 0 would be showing Auto disabled, which cannot be changed.
    Can you share a snapshot of show network failover?
    regds,
    aman

  • Deterministic Fault Tolerant Load Balancing

    The USA has an unfortunate penchant for granting patents that arguably do not merit patent protection. Some of these are things that are blindingly obvious. Others are just not sufficiently inventive.
    Anyway, since I have no funds for patent searches, nor patent applications, and there are some other complications, I've decided to post this to establish prior-art for an algorithm. I don't claim that the algorithm is clever, nor novel, nor even that it violates no existing patents. This posting is simply to ensure that to the extent that someone might be granted a patent on it, they can't, because it has already been published.
    The Java connection is that I've done a fair amount of the work required to turn this into a real system in Java.
    Suppose you have a set of processors, p0 through pn-1, and each piece of work to be performed by a processor has some number k associated with it. The problem is to allocate the work roughly equally across the subset of processors that are actually functioning. Further, over a period of time, a series of related pieces of work may arrive with the same k. To the maximum possible extent you want each of the related pieces of work to be handled by the same processor. If a processor fails, you want its work to be distributed across the remaining processors, but still maintaining the property that pieces of work with a given value for k are handled by the same processor. In general we assume that the k values are randomly spread through a large number space.
    The motivation for these requirements is that for a given k the processor may be caching information that improves performance. Or it may be enforcing some invariant, such as in a lock manager where each request for a given lock must go to the same processor, or it clearly won't function.
    To achieve this, construct a list of integers of size n. Element i contains i if processor i is functional, and -1 otherwise.
    Calculate k mod n, and use the result as an index into the list. If the value contained there is non-negative, then it is the number of the processor to use. If it is -1, remove the element from the list, decrement the value of n and repeat. Continue until a processor number is found.
    This scheme is fault tolerant to a degree, in that the resulting system has a high level of availability.
    It also has the property that the failure of a processor only impacts on the allocation of pieces of work that would have been allocated to the failed processor. It does not result in a complete rearrangement of the work allocations. This makes things a lot simpler when dealing with things like distributed lock managers.
    The fault tolerance can be improved by an extension of the algorithm that allows a distributed master/slave arrangement, where the master number for a given k is determined as above, and a slave number is obtained by treating the master as if it were not functioning. Each processor is a master for some subset of the k values, and is a slave for another subset. For any given master, each of the other processors is a slave for a roughly equal portion of the given master's subset of the k values.
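    A minimal sketch of the allocation scheme described above, written in Python rather than Java for brevity. The function names pick_processor and pick_master_and_slave are hypothetical and do not come from the post; the list-shrinking loop follows the description as literally as possible, and returning None when no second processor is available is an assumption the post does not cover.

```python
from typing import List, Optional, Tuple


def pick_processor(k: int, alive: List[bool]) -> int:
    """Return the processor number that should handle work key k."""
    # Element i holds i if processor i is functional, and -1 otherwise.
    slots = [i if up else -1 for i, up in enumerate(alive)]
    while slots:
        idx = k % len(slots)
        if slots[idx] >= 0:
            return slots[idx]
        # Dead slot: remove it (this shrinks n by one) and try again.
        del slots[idx]
    raise RuntimeError("no functioning processor")


def pick_master_and_slave(k: int, alive: List[bool]) -> Tuple[int, Optional[int]]:
    """Master is chosen as above; slave is chosen as if the master had failed."""
    master = pick_processor(k, alive)
    without_master = list(alive)
    without_master[master] = False
    # Assumption: report None when no other processor is left to act as slave.
    slave = pick_processor(k, without_master) if any(without_master) else None
    return master, slave


if __name__ == "__main__":
    all_up = [True, True, True, True]
    one_down = [True, True, False, True]
    print(pick_processor(6, all_up))         # key 6 with every processor up
    print(pick_processor(6, one_down))       # same key after processor 2 fails
    print(pick_master_and_slave(6, all_up))  # master and slave for key 6
```

    Keys whose first lookup lands on a working processor never enter the retry loop, which is why a failure only moves the work that belonged to the failed processor.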
    There are some boring details that I've not discussed, such as how an entity wanting work to be done determines which processors are functioning, and the stuff related to the exact sequence of steps that must be performed when a processor breaks, or is repaired. I don't believe anyone could patent them because once you start thinking about it, the steps are pretty obvious.

    I wouldn't be so sure that a simple post to the Java forums
    is all you need to establish this 'prior art', is it?
    Don't you need to actually use it? Or have you seen a lawyer
    and this was their advice? Even if you have no money for it, I'm sure
    there are free legal services, even at universities, that you could contact.
    "I don't believe anyone could patent them because once you
    start thinking about it, the steps are pretty obvious."
    The steps of anything are generally simple; it's the putting-them-together
    that you can patent :)

  • Fault tolerant EJBs

    Hello,
    I'm trying to set up a fault tolerant EJB service. I have two application servers with EJB's deployed on them. The two deployments are identical. Client is a portal application. The EJB client configuration file sun-web.xml looks like this:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE sun-web-app PUBLIC
    "-//Sun Microsystems Inc.//DTD Sun ONE Application Server 7.0 Servlet 2.3//EN"
    "file:/etc/opt/SUNWps/dtd/sun-web-app_2_3-0.dtd">
    <sun-web-app>
    <ejb-ref>
    <ejb-ref-name>ejb/userService</ejb-ref-name>
    <jndi-name>corbaname:iiop:server1:3700,iiop:server2:3700#ejb/userService</jndi-name>
    </ejb-ref>
    <ejb-ref>
    <ejb-ref-name>ejb/chosenService</ejb-ref-name>
    <jndi-name>corbaname:iiop:server1:3700,iiop:server2:3700#ejb/chosenService</jndi-name>
    </ejb-ref>
    </sun-web-app>
    Everything works fine as long as server1 is up. The system doesn't work after I shut it down. On the client I get this exception:
    [#|2006-02-01T13:13:07.939+0200|WARNING|sun-appserver-ee8.1_02|javax.enterprise.resource.corba.S1AS-ORB.rpc.transport|_ThreadID=12;|"IOP00410201: (COMM_FAILURE) Connection failure: socketType: IIOP_CLEAR_TEXT; hostname: 10.111.143.169; port: 3700"
    org.omg.CORBA.COMM_FAILURE: vmcid: SUN minor code: 201 completed: No
    at com.sun.corba.ee.impl.logging.ORBUtilSystemException.connectFailure(ORBUtilSystemException.java:2257)
    at com.sun.corba.ee.impl.logging.ORBUtilSystemException.connectFailure(ORBUtilSystemException.java:2278)
    at com.sun.corba.ee.impl.transport.SocketOrChannelConnectionImpl.<init>(SocketOrChannelConnectionImpl.java:208)
    at com.sun.corba.ee.impl.transport.SocketOrChannelConnectionImpl.<init>(SocketOrChannelConnectionImpl.java:221)
    at com.sun.corba.ee.impl.transport.SocketOrChannelContactInfoImpl.createConnection(SocketOrChannelContactInfoImpl.java:104)
    at com.sun.corba.ee.impl.protocol.CorbaClientRequestDispatcherImpl.beginRequest(CorbaClientRequestDispatcherImpl.java:153)
    at com.sun.corba.ee.impl.protocol.CorbaClientDelegateImpl.request(CorbaClientDelegateImpl.java:127)
    at com.sun.corba.ee.impl.protocol.CorbaClientDelegateImpl.is_a(CorbaClientDelegateImpl.java:244)
    Why doesn't it switch to server2? Did I miss something?

    Hi,
    Did you try specifying the JVM parameter -Dcom.sun.appserv.iiop.endpoints=<server1>:port1,<server2>:port2 for the JVM where the client is running?

  • WL 7.0 Fault-tolerant stateful beans. (Newbie)

    Hello,
    I was wondering if in WL 7.0 there is a way to make a stateful session
    bean automatically persist its state without having to be passivated
    as a whole.
    The bean that I need to build would persist its state to a DB each
    time a remote method call is executed in it. The purpose is to make
    this bean fault tolerant without having to serialize the whole object
    graph by using passivation.
    I heard that WL 7.0 has a new feature that can make this happen but I
    can't seem to find any info on this. Could anyone point me into the
    right search path?
    Thanks in advance.
    Mateo.

    I'm not aware of such a feature in the WLS 7.0 EJB container.
    Where did you hear about it? If there is such a feature, you should
    be able to locate it in our online docs.
    Here's the link
    http://e-docs.bea.com/wls/docs70/ejb/index.html
    Kumar

  • Load balancing and fault tolerance in BPEL PM

    The BPEL documentation talks about the ability to cluster processes in BPEL PM Server for fault tolerance and load balancing. Can anybody tell me how it is done?
    Thanks

    Have you seen these links?
    http://www.oracle.com/technology/products/ias/hi_av/BPEL_HA_Paper.pdf
    http://download-uk.oracle.com/docs/cd/B31017_01/integrate.1013/b28980/clusteringsoa.htm#CHDCGIEJ

  • Load Balance & Fault Tolerance

    I need to design a solution to load balance the DLSw traffic between 4 central routers and, if these 4 routers fail (or the WAN fails), all peers and circuits need to be re-established at another site with another 4 routers.
    To balance the traffic I will use the DLSw circuit count. To provide fault tolerance between sites I am thinking of using a backup peer.
    My question is: will "circuit count" work together with "backup peer"?
    Thanks in advance.

    Only one backup DLSw peer is allowed per primary peer. I cut and pasted the following output from when I tried to define more than one backup peer:
    c3-2500(config)#dlsw remote-peer 0 tcp 2.2.2.2
    c3-2500(config)#dlsw remote-peer 0 tcp 3.3.3.3 backup-peer 2.2.2.2
    c3-2500(config)#dlsw remote-peer 0 tcp 4.4.4.4 backup-peer 2.2.2.2
    %Primary peer already has backup defined
    There are a number of approaches:
    1. Remote routers have 8 peer connections. The costs for A, B, C, and D are lower than those for E, F, G, and H. Normally, the circuits are distributed among A, B, C, and D. Even if one or more of A, B, C, and D go down, the rest will take the load. If all of A, B, C, and D go down, E, F, G, and H will take all the circuits.
    2. Slightly different from 1. Instead of making E, F, G, and H permanent DLSw peer connections, make E a backup peer for A, F a backup peer for B, and so on.
    3. Just another idea. Have you considered SNASw using HPR/IP? It may take you a while to set up on the host. However, this is the way to go because IBM has stopped selling the 3746/3745. All SNI links will eventually go to HPR/IP.
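    To make approach 1 concrete, here is a toy Python model of the selection logic it relies on: prefer the reachable peers with the lowest cost, and among those place a new circuit on the peer that currently carries the fewest circuits. This is only an illustration of the idea in this thread, not router code and not a description of the exact IOS algorithm. The function name choose_peer, the peer letters, and the cost values are assumptions made for the example.

```python
from typing import Dict, Optional


def choose_peer(cost: Dict[str, int],
                circuits: Dict[str, int],
                up: Dict[str, bool]) -> Optional[str]:
    """Pick the peer for a new circuit: lowest cost first, then fewest circuits."""
    reachable = [p for p in cost if up.get(p, False)]
    if not reachable:
        return None
    best_cost = min(cost[p] for p in reachable)
    candidates = [p for p in reachable if cost[p] == best_cost]
    # Tie-break on the peer name so the result is deterministic.
    return min(candidates, key=lambda p: (circuits.get(p, 0), p))


if __name__ == "__main__":
    # Hypothetical costs: primary site A-D is cheaper than backup site E-H.
    cost = {p: 2 for p in "ABCD"}
    cost.update({p: 4 for p in "EFGH"})
    circuits = {"A": 3, "B": 1, "C": 2, "D": 1}
    up = {p: True for p in "ABCDEFGH"}
    print(choose_peer(cost, circuits, up))   # a primary-site peer

    for p in "ABCD":                         # whole primary site fails
        up[p] = False
    print(choose_peer(cost, circuits, up))   # falls over to the backup site
```

    In this model the backup site receives circuits only once every lower-cost peer is unreachable, which is the behavior approach 1 describes.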

  • JMS adapters in a fault tolerant environment

    Hi,
    I am working on some IDOC <> XI <> JMS scenarios in a SonicMQ fault-tolerant environment, meaning that the hostname at communication channel level should be dynamic.
    XI sends to Sonic; when JMS system A is down, it should send to system B.
    Any suggestions? Samples?
    Thanks.

    Hi,
    Check the note below, in particular Q 2.9:
    Note 856346 - J2EE JMS Adapter: Frequently Asked Questions (FAQ)
    2.9) How do I configure the receiver JMS communication channel to dynamically target a message to a queue that is different from what is statically configured?
    Answer: The mechanics follow the same semantics as described in the JMSReplyTo question answered previously.
    Regards,
    Srini

  • UOO sequencing along with WLS high availability cluster and fault tolerance

    Hi WebLogic gurus.
    My customer is currently using the following Oracle products to integrate Siebel Order Mgmt to Oracle BRM:
    * WebLogic Server 10.3.1
    * Oracle OSB 11g
    They use path service feature of a WebLogic clustered environment.
    They have configured EAI to use the UOO(Unit Of Order) Weblogic 10.3.1 feature to preserve the natural order of subsequent modifications on the same entity.
    They are going to apply UOO to a distributed queue for high availability.
    They have the following questions:
    1) When during the processing of messages having the same UOO, the end point becomes unavailable, and another node is available in order to migrate, there is a chance the UOO messages exist in the failed endpoint.
    2) During the migration of the initial endpoint, are these messages persisted?
    By persisted we mean that when other messages arrive with the same UOO in the migrated endpoint this migrated resource contains also the messages that existed before the migration?
    3) During the migration of endpoints is the client receiving error messages or not?
    I've found an entry on the WLS cluster documentation regarding fault tolerance of such solution.
    Special Considerations For Targeting a Path Service
    When the path service for a cluster is targeted to a migratable target, as a best practice, the path
    service and its custom store should be the only users of that migratable target.
    When a path service is targeted to a migratable target, it provides enhanced storage of message
    unit-of-order (UOO) information for JMS distributed destinations, since the UOO information
    will be based on the entire migratable target instead of being based only on the server instance
    hosting the distributed destinations member.
    Do you have any feedback to that?
    My customer is worried about losing UOO sequencing during migration of endpoints!
    best regards & thanks,
    Marco

    First, if using a distributed queue the Forward Delay attribute controls the number of seconds WebLogic JMS will wait before trying to forward the messages. By default, the value is set to −1, which means that forwarding is disabled. Setting a Forward Delay is incompatible with strictly ordered message processing, including the Unit-of-Order feature.
    When using unit-of-order with distributed destinations, you should always send the messages to the distributed destination rather than to one of its members. If you are not careful, sending messages directly to a member destination may result in messages for the same unit-of-order going to more than one member destination and cause you to lose your message ordering.
    When unit-of-order messages are processed, they will be processed in strict order. While the current unit-of-order message is being processed by a message consumer, the next message in the unit-of-order will not be delivered unless it is to the same transaction or session. If no message associated with a particular unit-of-order is processing, then a message associated with that unit-of-order may go to any session that’s consuming from the message’s destination. This guarantees that all messages will be processed one at a time and in order, and any rollback or recover will not prevent ordered processing of the messages.
    The path service uses a persistent store to save the state of which member destination a particular unit-of-order is currently using. When a Path Service receives the first message for a particular unit-of-order bound for a distributed destination, it uses the normal JMS load balancing heuristics to select which member destination will handle the unit and writes that information into its persistent store. The Path Service ensures that a new UOO, or an old UOO that has no messages currently on any destination, can be enqueued anywhere in the cluster. Adding and removing member destinations will not disrupt any existing unit-of-order because the routing decision is made dynamically and those decisions are persistent.
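    The routing behavior described in the paragraph above can be pictured with a small Python model. It is not WebLogic code and uses no WebLogic API: the class PathServiceModel, its methods, and the queue names are all hypothetical, an in-memory dict stands in for the persistent store, and the pick callback stands in for the JMS load-balancing heuristics.

```python
from typing import Callable, Dict, List


class PathServiceModel:
    """Toy model: remembers which member destination each unit-of-order uses."""

    def __init__(self, members: List[str], pick: Callable[[List[str]], str]) -> None:
        self.members = list(members)
        self.pick = pick                   # stand-in for load-balancing heuristics
        self.routes: Dict[str, str] = {}   # stand-in for the persistent store

    def route(self, uoo: str) -> str:
        # First message of a unit-of-order: choose a member and record it.
        if uoo not in self.routes:
            self.routes[uoo] = self.pick(self.members)
        # Later messages follow the recorded decision.
        return self.routes[uoo]

    def add_member(self, name: str) -> None:
        # Existing routes are untouched, so existing units are not disrupted.
        self.members.append(name)

    def drained(self, uoo: str) -> None:
        # No messages left for this unit: it may be routed anywhere next time.
        self.routes.pop(uoo, None)


if __name__ == "__main__":
    # Hypothetical heuristic: always pick the most recently added member.
    ps = PathServiceModel(["q1", "q2"], pick=lambda members: members[-1])
    print(ps.route("order-42"))
    ps.add_member("q3")
    print(ps.route("order-42"))   # unchanged by the new member
    print(ps.route("order-43"))   # a new unit may land on the new member
```

    The point of the sketch is that the routing decision is made once per unit-of-order and then looked up, so changing the member list does not reshuffle units that already have messages in flight.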
    If the Path Service is unavailable, any requests to create new units-of-order will throw the JMSOrderException until the Path Service is available. Information about existing units-of-order is cached in the connection factory and destination servers, so Path Service unavailability typically will not prevent existing unit-of-order messages from being sent or processed.
    Hope this helps.

  • Fault tolerance

    Hi support,
    Is there an OSS note certifying the use of vSphere 4.1 Fault Tolerance in a production environment?
    I have read the best practices for its use with MSCS, but I find no note
    Thanks a lot

    Hello,
    Unfortunately, we have not mentioned the use of VMware Fault Tolerance (FT) in an SAP Note so far. But it is supported.
    The use case for FT is quite limited, as FT only supports virtual machines with 1 vCPU. In an SAP environment, we recommend using FT for the SAP System Central Services (SCS / ASCS). See the [SAP HA Best Practices Guide|http://www.vmware.com/resources/techresources/10031]. Note: this is not about using FT in connection with MSCS (which is not supported afaik), but rather describes alternatives.
    Kind regards,
    Matthias Schlarb

  • RAID not fault tolerant on K7N2 Delta 2 Platinum?

    Hi all,
    I'd really appreciate your help with this puzzle.....
    Got a new mobo & 3k Barton chip.... Installed 'em into rig described in my sig below.
    Minimum hardware was installed for this operation:
    new mobo & chip
    1 gig mem card
    CDRW
    Floppy drive
    SATA drives x2 each on separate SATA port
    video card
    Entered Bios on 1st boot:
    Set SATA to enabled in IDE RAID window
    Boot priority to CD Rom for install
    Hard disk boot priority to SATA Mirror
    F10'ed into NVRAID BIOS:
    Set up the RAID as a mirror (RAID 1) with optimal stripe size
    cleared & reformatted both discs
    Mirror is recognised as healthy on reboot.
    Booted into WIN XP SP2 slipstreamed install (this disc has been used successfully before on my previous mobo so I know it works)
    Pressed F6 & loaded NV RAID class driver & NV nForce Storage Controller successfully
    Did the Windows install routine & this is where it starts to deviate from what is written in the manual....
    the request to install the RAID driver during the GUI part of the install never appeared (but this is described as a "might be prompted" in the manual so I wasn't too bothered by its non-appearance) and the driver floppy did spin up at one point so I assumed it had installed each drive automatically
    Went into WinXP and finished the install...
    Surprisingly, both RAID drives are now visible individually in the systray as removable drives under the SAFELY REMOVE HARDWARE icon.
    Went into Disk Management expecting the Initialize & Convert Disk Wizard to appear but no sign of it: RAID drives are visible as ONE drive (as you would expect) but are NOT fault tolerant.
    Things I've done differently each time I installed:
    Tried to convert the mirror to a dynamic disc but this also failed every time.
    Installed nForce2 system drivers before & after going into Disk Management on different installs
    Installed NVRAIDMAN.exe and this reports the mirror as functional & healthy.
    Left the floppy in the drive during the GUI part of WIN XP installing itself and removed it on other install
    I've read the nvRAID FAQ & this isn't covered by it.
    Can't find any reference to the Initialize & Convert Disk Wizard in Microsoft Help & Support, except in a Windows Server 2003 KB article, which I can't find again....
    I've repeated the whole process several times to ensure I haven't missed anything, but the end result is always the same.
    So I now have a weird mirror array that is not fault tolerant in Windoze Disk Management - is it doing what it's supposed to?
    Any ideas??

    As it seems you are doing extensive tests, I would suggest a simple one:
    Install as you have (where the RAID manager tells you it is set up correctly and redundant, and Windows does not see it as redundant), power off, and disconnect one drive. Reboot and see the result. (It should still work, and the controller should tell you that redundancy is no longer valid.)
    After that, power off, replug the drive and see if it will reconstruct the array.
    Try for the 2nd drive.
    Then, you will be sure that redundancy is present...
    It seems that with SATA, you can even hotswap the drives (hence the appearance of the drives in the SAFELY remove box).

  • Fault tolerant, highly available BOSE XI R2 questions

    Post Author: waynemr
    CA Forum: Deployment
    I am designing a set of BOSE XI R2 deployment proposals for a customer, and I had a couple of questions about clustering. I understand that I can use traditional Windows clustering to setup an active/passive cluster for the input/output file repositories - so that if one server goes down, the other can seamlessly pick up where the other left off. On this Windows-based active/passive cluster, can I install other BOSE services and will they be redundant, or will they also be active/passive.
    For example: server A is active and has the input/output file repository services and the Page Server. Server B is passive and also has the input/output file repository services and the Page Server. Can the page Server on B be actively used as a redundant Page Server for the entire BOSE deployment? (probably not, but I am trying to check just to make sure)
    If I wanted to make the most fault-tolerant deployment possible, I think I would need to:
    1. Setup two hardware load-balanced web front-end servers
    2. Setup two servers for a clustered CMS
    3. Setup two web application servers (hardware load-balanced, or can BOSE do that load-balancing?)
    4. Setup two Windows-clustered servers for the input/output file repositories
    5. Setup two servers to provide pairs of all of the remaining BOSE services (job servers, page servers, webi, etc.)
    6. Setup the CMS, auditing, and report databases on a cluster of some form (MS SQL or Oracle)
    So 10 servers - 2 Windows 2003 enterprise and 8 Windows 2003 standard boxes, not including the database environment.
    Thanks!

    Post Author: jsanzone
    CA Forum: Deployment
    Wayne,
    I hate to beat the old drum, and no I don't work for BusinessObjects education services, but all of your questions and notions of a concept of operations in regards to redundancy/load balancing are easily answered by digesting the special BO course "SA310R2" (BusinessObjects Enterprise XI R1/R2 Administering Servers - Windows).  This course fully covers the topics of master/slave operations, BO's own load balancing operations within its application, and pitfalls to avoid.  Without attending this course, I for one would not have properly understood the BusinessObjects approach and would've been headed on a collision course with disaster in setting up a multi-server environment.
    Best wishes-- John.

  • How to make DNS really fault tolerant?

    Hi,
    Is there any way to make DNS servers fault tolerant for clients (member servers and applications)?
    Scenario:
    Primary and secondary DNS (not AD-integrated).
    If we move or rebuild a crashed DNS server from an old host to a new one, there is a service break for that DNS server: the new server is up and running for some time before DNS has been configured (you can ping it and get an ICMP response).
    The DNS client is satisfied with that, even though there is no DNS service or zones on the server, and it does not use the secondary DNS server because the server answers ICMP. Critical software then stops working. Is there any way to get real DNS fault tolerance?
    As I understand it, this behavior is the same in the Windows and non-Windows worlds.

    Hi,
    According to your description, my understanding is that there will be a service break for one DNS server when you move or rebuild a crashed DNS server from the old host to a new one.
    There are 2 DNS servers, a primary and a secondary, and neither is AD-integrated.
    DNS design specifications recommend that at least 2 DNS servers be used to host each zone. For standard primary-type zones, a secondary server is required to add and configure the zone so that it appears to other DNS servers in the network. This design may provide a basic level of fault tolerance for resolving names. Once the primary DNS server crashes, the secondary DNS server is read-only and can't process update requests, so we need to manually change the secondary server to primary, and then try to repair the crashed one or add another DNS server.
    By comparison, for AD-integrated primary zones, secondary servers are supported but not required for this purpose. For example, two DNS servers running on domain controllers can be redundant primary servers for a zone, providing the same benefits as adding a secondary server while including additional advantages.
    Choose whichever suits your needs better.
    Best Regards,
    Eve Wang

  • CSM fault tolerance takeover time

    Hi,
    I am currently testing the fault tolerance feature on the CSM. During the testing I discovered that the takeover from active to standby is immediate when you reboot the active CSM. However when the active CSM comes back online it does not take over immediately. What I am seeing is that it takes between 3-4 minutes before the active module does a preempt.
    Is there a hidden timer similar to preempt delay in HSRP at work here and is it possible to tweak it somehow?
    Many thanks,
    Murtaza

    There are no hidden parameters.
    The delay is to make sure the flows can be copied from active to standby before it takes over.
    Gilles.
