Fault tolerance of Distributed Locking Mechanism?

Is there a lock server for managing locks
     in a Coherence distributed cluster?
     How fault tolerant is the locking mechanism?
     What happens if a node that has acquired a lock dies?
     If there is a lock server, what happens if the lock
     server dies?
     Is there an upper limit on the number of nodes that
     a Coherence distributed cluster supporting distributed
     locking can have? And how long does it take for
     the cluster to reorganize itself after a node has gone
     down?
     Thanks!

A central lock server does not address any of the issues that you raise, and it also adds a single point of failure (SPOF) to the environment.
     The "link breaks" problem is known as the "split brain" scenario. There is no single correct answer for split brain, and in principle there cannot be one.
     Our plans for managing split brains involve the ability to define one site as a primary and one as a backup. If the primary becomes disconnected from the backup, it doesn't know whether the backup is "down" or just disconnected, but it doesn't matter: the primary is the primary, so it needs to keep going.
     On the other hand, if the backup becomes disconnected from the primary, it may not know whether the primary went down or not, but it may be able to use a third party to find out. Consider points P, B, O1, and O2 where P is primary, B is backup, O1 is some other location and O2 is some other location. If B can reach O1 and O2, it knows that it still has connectivity. If B asks O1 and O2 if they can reach P, and neither can reach P, then it indicates that the problem is on P's side. Now the question is whether B should take over processing, which could be automatic, or it could be configured to be a manual switchover process.
     In either case, the answer could theoretically be incorrect, and like all unsolvable (literally unsolvable, not just unsolved) problems, you simply must choose which risks to mitigate, and how much expense you are willing to apply in order to reduce those risks.
     For example, Coherence 3.0 will allow you to run redundant networking across those sites, so losing a connection between two sites might actually require a minimum of three networks to fail completely. What's the cost of running three independent WANs over physically separated fiber? Especially if it is your own private cabling and networking and it's a 50km distance? It's pretty expensive, but that's how some financial services firms set up their DR sites.
     Like I said, there are no perfect answers, and there cannot be, but we provide the means for you to select from a variety of risk-mitigating solutions that in total can reduce the likelihood of any of these problems actually occurring to an infinitesimally small probability.
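     The third-party check described above for points P, B, O1, and O2 can be sketched in plain Java. This is only an illustration under stated assumptions: the class, enum, and method names are hypothetical and are not Coherence APIs, and the two predicates stand in for whatever reachability probes a real deployment would use.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the decision made at the backup site B. Not a Coherence API. */
final class BackupSiteCheck {
    enum Verdict { STAY_PASSIVE, INCONCLUSIVE, PRIMARY_SUSPECTED_DOWN }

    /**
     * @param backupSeesPrimary   whether B can still reach P directly
     * @param witnesses           third-party locations such as O1 and O2
     * @param reachableFromBackup answers "can B reach this witness?"
     * @param witnessSeesPrimary  answers "does this witness report that it can reach P?"
     */
    static Verdict assess(boolean backupSeesPrimary,
                          List<String> witnesses,
                          Predicate<String> reachableFromBackup,
                          Predicate<String> witnessSeesPrimary) {
        if (backupSeesPrimary) {
            return Verdict.STAY_PASSIVE;          // nothing is wrong from B's point of view
        }
        boolean allReachable = true;
        for (String w : witnesses) {
            if (!reachableFromBackup.test(w)) {
                allReachable = false;             // B itself may be the isolated side
                continue;
            }
            if (witnessSeesPrimary.test(w)) {
                return Verdict.STAY_PASSIVE;      // P is alive; only the P-B link is broken
            }
        }
        // B reached every witness and none of them can reach P:
        // the problem appears to be on P's side.
        return (allReachable && !witnesses.isEmpty())
                ? Verdict.PRIMARY_SUSPECTED_DOWN
                : Verdict.INCONCLUSIVE;
    }
}
```

     A PRIMARY_SUSPECTED_DOWN verdict is only evidence, not proof. Whether it triggers an automatic takeover or just alerts an operator for a manual switchover remains a configuration choice.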
     Peace,
     Cameron Purdy
     Tangosol, Inc.

Similar Messages

  • Deterministic Fault Tolerant Load Balancing

    The USA has an unfortunate penchant for granting patents that arguably do not merit patent protection. Some of these are things that are blindingly obvious. Others are just not sufficiently inventive.
    Anyway, since I have no funds for patent searches, nor patent applications, and there are some other complications, I've decided to post this to establish prior-art for an algorithm. I don't claim that the algorithm is clever, nor novel, nor even that it violates no existing patents. This posting is simply to ensure that to the extent that someone might be granted a patent on it, they can't, because it has already been published.
    The Java connection is that I've done a fair amount of the work required to turn this into a real system in Java.
    Suppose you have a set of processors, p0 through p(n-1), and each piece of work to be performed by a processor has some number k associated with it. The problem is to allocate the work roughly equally across the subset of processors that are actually functioning. Further, over a period of time, a series of related pieces of work may arrive with the same k. To the maximum possible extent you want each of the related pieces of work to be handled by the same processor. If a processor fails, you want its work to be distributed across the remaining processors, but still maintaining the property that pieces of work with a given value for k are handled by the same processor. In general we assume that the k values are randomly spread through a large number space.
    The motivation for these requirements is that for a given k the processor may be caching information that improves performance. Or it may be enforcing some invariant, such as in a lock manager where each request for a given lock must go to the same processor, or it clearly won't function.
    To achieve this, construct a list of integers of size n. Element i contains i if processor i is functional, and -1 otherwise.
    Calculate k mod n, and use the result as an index into the list. If the value contained there is non-negative, then it is the number of the processor to use. If it is -1, remove the element from the list, decrement the value of n and repeat. Continue until a processor number is found.
    This scheme is fault tolerant to a degree, in that the resulting system has a high level of availability.
    It also has the property that the failure of a processor only impacts on the allocation of pieces of work that would have been allocated to the failed processor. It does not result in a complete rearrangement of the work allocations. This makes things a lot simpler when dealing with things like distributed lock managers.
    The fault tolerance can be improved by an extension of the algorithm that allows a distributed master/slave arrangement, where the master number for a given k is determined as above, and a slave number is obtained by treating the master as if it were not functioning. Each processor is a master for some subset of the k values, and is a slave for another subset. For any given master, each of the other processors is a slave for a roughly equal portion of the given master's subset of the k values.
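    The allocation steps and the master/slave extension described above can be sketched in Java as follows. This is a minimal sketch, not the poster's actual code: the class and method names are hypothetical, the set of functioning processors is assumed to be known and passed in as a boolean array, and Math.floorMod is used so that negative k values also map to a valid index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the allocation scheme described in the post. */
final class WorkAllocator {

    /** Returns the processor that handles key k, or -1 if no processor is functioning. */
    static int master(long k, boolean[] functional) {
        // Element i holds i if processor i is functioning, and -1 otherwise.
        List<Integer> slots = new ArrayList<>();
        for (int i = 0; i < functional.length; i++) {
            slots.add(functional[i] ? i : -1);
        }
        while (!slots.isEmpty()) {
            int index = (int) Math.floorMod(k, (long) slots.size());
            int processor = slots.get(index);
            if (processor >= 0) {
                return processor;
            }
            slots.remove(index); // drop the dead slot, which also decrements n, then retry
        }
        return -1;
    }

    /** Returns the slave for key k: the master computed as if the real master had failed. */
    static int slave(long k, boolean[] functional) {
        int m = master(k, functional);
        if (m < 0) {
            return -1;
        }
        boolean[] withoutMaster = functional.clone();
        withoutMaster[m] = false;
        return master(k, withoutMaster);
    }
}
```

    With four functioning processors, k = 10 maps to processor 2 (10 mod 4). If processor 2 fails, the same k moves to processor 1 (10 mod 3 on the shortened list), while a key such as k = 9 stays on processor 1 throughout, which is the limited-rearrangement property the post describes.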
    There are some boring details that I've not discussed, such as how an entity wanting work to be done determines which processors are functioning, and the stuff related to the exact sequence of steps that must be performed when a processor breaks, or is repaired. I don't believe anyone could patent them because once you start thinking about it, the steps are pretty obvious.

    I wouldn't be so sure that a simple post to the Java forums
    is all you need to establish this 'prior art', is it?
    Don't you need to actually use it? Or have you seen a lawyer,
    and this was their advice? Even if you have no money for it, I'm sure
    there are free legal services, even at universities, that you could contact.
    > I don't believe anyone could patent them because once you
    > start thinking about it, the steps are pretty obvious.
    The steps of anything are generally simple; it's the putting-them-together
    that you can patent :)

  • Do distributed lock manager implementations already exist

    Hi there,
    Does anybody know of a distributed locking implementation which:
    - is open source
    - is fault tolerant: if one node goes down, it has some timeout or similar mechanism to handle this.
    It would be cool if peers could find each other via multicast, but this is not critical.
    It would just be the sugar coating ;)
    Any help is very, very welcome.
    Thank you in advance, lg Clemens

    You may want to check out JGroups and Hazelcast.
    Best,
    -talip
    http://jroller.com/talipozturk

  • Load Balance & Fault Tolerance

    I need to design a solution to load balance the DLSw traffic between 4 central routers and, if these 4 routers fail (or the WAN fails), all peers and circuits need to be re-established at another site with another 4 routers.
    To balance the traffic I will use the DLSw circuit count. To provide fault tolerance between sites I am thinking of using a backup peer.
    My question is: will "circuit count" work together with "backup peer"?
    Thanks in advance.

    Only one backup DLSw peer is allowed. I cut and pasted the following from when I tried to define more than one backup peer:
    c3-2500(config)#dlsw remote-peer 0 tcp 2.2.2.2
    c3-2500(config)#dlsw remote-peer 0 tcp 3.3.3.3 backup-peer 2.2.2.2
    c3-2500(config)#dlsw remote-peer 0 tcp 4.4.4.4 backup-peer 2.2.2.2
    %Primary peer already has backup defined
    There are a number of approaches:
    1. Remote routers have 8 peer connections. The costs for A, B, C, and D are lower than those of E, F, G, and H. Normally, the circuits are distributed among A, B, C, and D. Even if one or more of A, B, C, and D go down, the rest will take the load. If all of A, B, C, and D go down, E, F, G, and H will take all the circuits.
    2. Slightly different from 1: instead of making E, F, G, and H permanent DLSw peer connections, make E a backup peer for A, F a backup peer for B, and so on.
    3. Just another idea: have you considered SNASw using HPR/IP? It may take you a while to set up on the host. However, this is the way to go because IBM has stopped selling the 3746/3745. All SNI links will eventually go to HPR/IP.

  • UOO sequencing along with WLS high availability cluster and fault tolerance

    Hi WebLogic gurus.
    My customer is currently using the following Oracle products to integrate Siebel Order Mgmt to Oracle BRM:
    * WebLogic Server 10.3.1
    * Oracle OSB 11g
    They use the path service feature of a WebLogic clustered environment.
    They have configured EAI to use the UOO (Unit of Order) feature of WebLogic 10.3.1 to preserve the natural order of subsequent modifications to the same entity.
    They are going to apply UOO to a distributed queue for high availability.
    They have the following questions:
    1) If, during the processing of messages having the same UOO, the endpoint becomes unavailable and another node is available to migrate to, is there a chance that UOO messages remain on the failed endpoint?
    2) During the migration of the initial endpoint, are these messages persisted?
    By persisted we mean: when other messages with the same UOO arrive at the migrated endpoint, does this migrated resource also contain the messages that existed before the migration?
    3) During the migration of endpoints, does the client receive error messages or not?
    I've found an entry in the WLS cluster documentation regarding the fault tolerance of such a solution.
    Special Considerations For Targeting a Path Service
    When the path service for a cluster is targeted to a migratable target, as a best practice, the path
    service and its custom store should be the only users of that migratable target.
    When a path service is targeted to a migratable target it provides enhanced storage of message
    unit-of-order (UOO) information for JMS distributed destinations, since the UOO information
    will be based on the entire migratable target instead of being based only on the server instance
    hosting the distributed destinations member.
    Do you have any feedback on that?
    My customer is worried about losing UOO sequencing during migration of endpoints!
    best regards & thanks,
    Marco

    First, if using a distributed queue the Forward Delay attribute controls the number of seconds WebLogic JMS will wait before trying to forward the messages. By default, the value is set to −1, which means that forwarding is disabled. Setting a Forward Delay is incompatible with strictly ordered message processing, including the Unit-of-Order feature.
    When using unit-of-order with distributed destinations, you should always send the messages to the distributed destination rather than to one of its members. If you are not careful, sending messages directly to a member destination may result in messages for the same unit-of-order going to more than one member destination and cause you to lose your message ordering.
    When unit-of-order messages are processed, they will be processed in strict order. While the current unit-of-order message is being processed by a message consumer, the next message in the unit-of-order will not be delivered unless it is to the same transaction or session. If no message associated with a particular unit-of-order is processing, then a message associated with that unit-of-order may go to any session that’s consuming from the message’s destination. This guarantees that all messages will be processed one at a time and in order, and any rollback or recover will not prevent ordered processing of the messages.
    The path service uses a persistent store to save the state of which member destination a particular unit-of-order is currently using. When a Path Service receives the first message for a particular unit-of-order bound for a distributed destination, it uses the normal JMS load balancing heuristics to select which member destination will handle the unit and writes that information into its persistent store. The Path Service ensures that a new UOO, or an old UOO that has no messages currently on any destination, can be enqueued anywhere in the cluster. Adding and removing member destinations will not disrupt any existing unit-of-order because the routing decision is made dynamically and those decisions are persistent.
    If the Path Service is unavailable, any requests to create new units-of-order will throw a JMSOrderException until the Path Service is available. Information about existing units-of-order is cached in the connection factory and destination servers, so Path Service availability typically will not prevent existing unit-of-order messages from being sent or processed.
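    The routing behaviour described above can be illustrated with a small plain-Java model. This is a sketch of the idea only and does not use or reproduce any WebLogic API: the class name, the two in-memory maps (standing in for the persistent store and for the caches in the connection factory and destination servers), the round-robin choice (standing in for the JMS load balancing heuristics), and the IllegalStateException (standing in for JMSOrderException) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of unit-of-order routing; not a WebLogic class. */
final class PathServiceSketch {
    private final Map<String, String> store = new HashMap<>(); // stands in for the persistent store
    private final Map<String, String> cache = new HashMap<>(); // stands in for client/server-side caches
    private boolean available = true;
    private int next = 0; // trivial round-robin in place of real load balancing

    void setAvailable(boolean available) {
        this.available = available;
    }

    /** Returns the member destination that must receive every message of this unit-of-order. */
    String route(String unitOfOrder, List<String> members) {
        String cached = cache.get(unitOfOrder);
        if (cached != null) {
            return cached; // existing units keep flowing even if the path service is down
        }
        if (!available) {
            throw new IllegalStateException("path service unavailable: cannot route a new unit-of-order");
        }
        String chosen = store.get(unitOfOrder);
        if (chosen == null) {
            chosen = members.get(next++ % members.size()); // first message picks a member
            store.put(unitOfOrder, chosen);                // and the decision is recorded
        }
        cache.put(unitOfOrder, chosen);
        return chosen;
    }
}
```

    Because the first routing decision is recorded and reused, later messages with the same unit-of-order name always land on the same member, which is what preserves ordering across a distributed destination.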
    Hope this helps.

  • Fault tolerant server on SLES?

    Hi,
    How would you go about setting up a fault-tolerant SUSE file server? I'd like to mirror eDirectory and the primary NSS shares to another server. Is there a SUSE equivalent of Windows Server Active Directory replication and Distributed File System? Currently running SLES 10 SP3.
    Thanks

    On 20/03/2014 23:16, ataubman wrote:
    > You say SLES but you've posted in OES ... which is it?
    I suspect OES since eDirectory and NSS also mentioned. Perhaps at0mic
    can post the output from "cat /etc/*release" so we know.
    > But as a general answer clustering is probably what you're looking for.
    If OES then look at Novell Cluster Services but if SLES then High
    Availability Extension.
    HTH.
    Simon
    Novell Knowledge Partner

  • Load Balancing & Fault tolerance in Oracle Apps

    Hello
    I am pretty new to Oracle Apps and Oracle 9iAS.
    We have been asked to find out the ways of deploying load balancing and fault tolerance in Oracle Apps.
    I have gone through the following articles from MetaLink and some more from Google:
    Configuring Web Cache as a Load Balancer for Application Servers
    Create new middle tier node in existing Apps 11i environment using cloning and then load balance
    Integrating and using Web Cache with Forms 9i for Load Balancing
    Running Multiple OC4J Instances From a Single Install of 9iAS 1.0.2.2
    LOAD BALANCING ORACLE APPLICATIONS ON UNIX
    Load Balancing in 11i
    OC4J Clustering Setup
    OC4J Load Balancing for Forms 9i
    Setting up 11i E-Business suite using a hardware load balancer
    Sharing an APPL_TOP in Oracle Applications 11i
    In the Oracle Apps Concepts PDF, I found that
    load balancing occurs when there are multiple installations of web server, forms server, reports server, concurrent manager server, etc.
    Let's consider the forms server component of the middle tier.
    Can there be multiple INSTANCES of the forms server within a SINGLE INSTALLATION?
    If it has to be MULTIPLE INSTALLATIONS, then it will require multiple physical machines.
    What is software load balancing then?
    I also read that
    an Oracle 9iAS instance is a combination of Oracle HTTP Server (OHS) and one or more instances of Oracle9iAS Container for J2EE (OC4J).
    So can software load balancing be implemented for the web server component of the middle tier using multiple instances of OC4J?
    Is the Forms Metrics Server configuration hardware load balancing?
    What about Oracle9iAS Web Cache clustering?
    Please clarify these doubts and suggest a way forward.
    Thanks a lot

    Dear all, can anyone help me with the following?
    We want to install Oracle Apps 11.5.9 on 4 IBM AIX 5L boxes. The first node
    will hold the database, concurrent, and admin tiers. The other 3 nodes
    will hold the Forms/Web tiers, and we need to use Forms Metrics for load
    balancing, taking into consideration that we can't use a shared APPL_TOP. We tried
    to find out how to install and configure this solution, but all the
    documents talk about the concept only, and the concept is applicable.
    We need the exact steps for doing the installation, either using rapidwiz or
    anything else.
    fadi

  • Locking mechanism provided by the Web AS Java

    Hi All,
    I have a requirement to lock an object for a specific action
    using locks provided and managed by the Web AS Java (I have to use the TableLocking API).
    I have read about the locking mechanism in the "SAP NetWeaver Developer Studio Documentation" help.
    I need code samples showing how it can be done.
    How should I check data availability (is it locked?)?
    Thanks.

    hi
    New features added in WAS 7.0 for the ABAP stack:
    1. Web Dynpro for ABAP
    2. New Enhancement Framework
    3. Switch Framework
    4. Adobe Forms integration
    5. New features added to the ABAP Editor.
    Also note that mySAP Business Suite is also part of the WAS 7.0 release.
    For the Java stack, new features have been added to the NetWeaver Developer Studio, EP, and KM areas. BI-Java is also part of the WAS 7.0 release.
    Cheers,
    Abdul Hakim
    Mark all useful answers..

  • Fault Tolerance - MCS 7825 H3 & MCS 7835 H2 Servers

    Hi,
    I have enabled network fault tolerance on my CUC and CCM servers. I thought that after enabling network fault tolerance the server would change the MAC address to a virtual MAC address, which in turn might affect the licenses.
    But after enabling fault tolerance and restarting the server, I didn't see any change in the MAC ID. It's the same MAC ID.
    Please advise.
    Regards
    Jagadish G

    Hi Jagadish,
    Do you get the option of 1000 Mb/s when you configure "set network nic eth0 speed"? Can you check?
    Secondly, Bond 0 would be showing Auto disabled, which cannot be changed.
    Can you share a snapshot of "show network failover"?
    regds,
    aman

  • Fault tolerant EJBs

    Hello,
    I'm trying to set up a fault tolerant EJB service. I have two application servers with EJBs deployed on them. The two deployments are identical. The client is a portal application. The EJB client configuration file sun-web.xml looks like this:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE sun-web-app PUBLIC
        "-//Sun Microsystems Inc.//DTD Sun ONE Application Server 7.0 Servlet 2.3//EN"
        "file:/etc/opt/SUNWps/dtd/sun-web-app_2_3-0.dtd">
    <sun-web-app>
      <ejb-ref>
        <ejb-ref-name>ejb/userService</ejb-ref-name>
        <jndi-name>corbaname:iiop:server1:3700,iiop:server2:3700#ejb/userService</jndi-name>
      </ejb-ref>
      <ejb-ref>
        <ejb-ref-name>ejb/chosenService</ejb-ref-name>
        <jndi-name>corbaname:iiop:server1:3700,iiop:server2:3700#ejb/chosenService</jndi-name>
      </ejb-ref>
    </sun-web-app>
    Everything works fine as long as server1 is up. The system doesn't work after I shut it down. On the client I get this exception:
    [#|2006-02-01T13:13:07.939+0200|WARNING|sun-appserver-ee8.1_02|javax.enterprise.resource.corba.S1AS-ORB.rpc.transport|_ThreadID=12;|"IOP00410201: (COMM_FAILURE) Connection failure: socketType: IIOP_CLEAR_TEXT; hostname: 10.111.143.169; port: 3700"
    org.omg.CORBA.COMM_FAILURE: vmcid: SUN minor code: 201 completed: No
    at com.sun.corba.ee.impl.logging.ORBUtilSystemException.connectFailure(ORBUtilSystemException.java:2257)
    at com.sun.corba.ee.impl.logging.ORBUtilSystemException.connectFailure(ORBUtilSystemException.java:2278)
    at com.sun.corba.ee.impl.transport.SocketOrChannelConnectionImpl.<init>(SocketOrChannelConnectionImpl.java:208)
    at com.sun.corba.ee.impl.transport.SocketOrChannelConnectionImpl.<init>(SocketOrChannelConnectionImpl.java:221)
    at com.sun.corba.ee.impl.transport.SocketOrChannelContactInfoImpl.createConnection(SocketOrChannelContactInfoImpl.java:104)
    at com.sun.corba.ee.impl.protocol.CorbaClientRequestDispatcherImpl.beginRequest(CorbaClientRequestDispatcherImpl.java:153)
    at com.sun.corba.ee.impl.protocol.CorbaClientDelegateImpl.request(CorbaClientDelegateImpl.java:127)
    at com.sun.corba.ee.impl.protocol.CorbaClientDelegateImpl.is_a(CorbaClientDelegateImpl.java:244)
    Why doesn't it switch to server2? Did I miss something?

    Hi,
    did you try to specify the JVM parameter -Dcom.sun.appserv.iiop.endpoints=<server1>:port1,<server2>:port2 for the jvm where the client is running?
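    If changing the client's JVM command line is inconvenient, the same property can in principle be set programmatically. The sketch below is only an illustration: the helper class and its method names are hypothetical, the host names and port are taken from the question, and it assumes the property is read when the client's naming context or ORB is first initialised, so it would have to run before any JNDI lookup.

```java
import java.util.StringJoiner;

/** Hypothetical helper that builds and sets the IIOP endpoint list for an EJB client. */
final class IiopEndpoints {
    // Property name as given in the reply above.
    static final String PROPERTY = "com.sun.appserv.iiop.endpoints";

    /** Formats hosts as "host1:port,host2:port". */
    static String format(String[] hosts, int port) {
        StringJoiner joiner = new StringJoiner(",");
        for (String host : hosts) {
            joiner.add(host + ":" + port);
        }
        return joiner.toString();
    }

    /** Sets the system property; assumed to be needed before the first JNDI lookup. */
    static void apply(String[] hosts, int port) {
        System.setProperty(PROPERTY, format(hosts, port));
    }
}
```

    Calling IiopEndpoints.apply(new String[] {"server1", "server2"}, 3700) is then equivalent to passing -Dcom.sun.appserv.iiop.endpoints=server1:3700,server2:3700 on the command line.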

  • WL 7.0 Fault-tolerant stateful beans. (Newbie)

    Hello,
    I was wondering if in WL 7.0 there is a way to make a stateful session
    bean automatically persist its state without having to be passivated
    as a whole.
    The bean that I need to build would persist its state to a DB each
    time a remote method call is executed in it. The purpose is to make
    this bean fault tolerant without having to serialize the whole object
    graph by using passivation.
    I heard that WL 7.0 has a new feature that can make this happen but I
    can't seem to find any info on this. Could anyone point me into the
    right search path?
    Thanks in advance.
    Mateo.

    I'm not aware of such a feature in the WLS 7.0 EJB container.
    Where did you hear about it? If there is such a feature, you should
    be able to locate it in our online docs.
    Here's the link:
    http://e-docs.bea.com/wls/docs70/ejb/index.html
    Kumar

  • How to implement a locking mechanism in ABAP?

    Hi
         My program is run by different users. I want
         to ensure that at any particular point in time only
         one instance of my program is running, and all others
         wait.
         I have a possible solution for this: I can make use of a
         global flag that I set/get through the import/export
         mechanism. For example:
         do.
         import v_flag = v_flag from MEMORY id 'ZFLAG'.
         if v_flag is initial.
             v_flag = '1'.
             export v_flag to memory id 'ZFLAG'.
             exit.
         endif.  
         enddo.
         ***Rest of the program main code****
         clear v_flag.
         export v_flag to memory id 'ZFLAG'.
         Is this OK? Or is there any other locking mechanism supported
         by ABAP?
    Regards,
    Abhimanyu.L

    Hi
    Check the following,
    http://help.sap.com/saphelp_nw04/helpdata/en/7b/f9813712f7434be10000009b38f8cf/content.htm
    http://help.sap.com/saphelp_nw04s/helpdata/en/aa/fd823730fa874ae10000009b38f8cf/content.htm
    http://www.sapdb.org/7.4/htmhelp/7d/75d34a6a210b4b95f232e5f9acd232/content.htm
    http://www.sapdb.org/7.4/htmhelp/6e/ab5d79286b3d4a9f72ef140191d208/content.htm
    http://sapdb.net/7.4/htmhelp/43/151d12671a2240947990c5152a4bbd/content.htm
    Please reward if it helps.

  • Load balancing and fault tolerance in BPEL PM

    The BPEL documentation talks about the ability to cluster processes in BPEL PM Server for fault tolerance and load balancing. Can anybody tell me how it is done?
    Thanks

    Have you seen these links?
    http://www.oracle.com/technology/products/ias/hi_av/BPEL_HA_Paper.pdf
    http://download-uk.oracle.com/docs/cd/B31017_01/integrate.1013/b28980/clusteringsoa.htm#CHDCGIEJ

  • JMS adapters in a fault tolerant environment

    Hi,
    I am working on some IDoc <> XI <> JMS scenarios in a SonicMQ fault-tolerant environment, meaning that the hostname should be dynamic at the communication channel level.
    XI sends to SonicMQ; when JMS system A is down, it should send to system B.
    Any suggestions? Samples?
    Thanks.

    Hi,
    Check the note below, in particular Q 2.9:
    Note 856346 - J2EE JMS Adapter: Frequently Asked Questions (FAQ)
    2.9) How do I configure the receiver JMS communication channel to dynamically target a message to a queue that is different from what is statically configured?
    Answer: The mechanics follow the same semantics as described in the JMSReplyTo question answered previously.
    Regards,
    Srini

  • Fault tolerance

    Hi support,
    Is there an OSS note certifying the use of vSphere 4.1 Fault Tolerance in a production environment?
    I have read the best practices for its use with MSCS, but I can't find any note.
    Thanks a lot

    Hello,
    Unfortunately, we have not mentioned the use of VMware Fault Tolerance (FT) in an SAP Note so far. But it is supported.
    The use case for FT is quite limited, as FT only supports virtual machines with 1 vCPU. In an SAP environment, we recommend using FT for the SAP System Central Services (SCS / ASCS). See the SAP HA Best Practices Guide (http://www.vmware.com/resources/techresources/10031). Note: this is not about using FT in connection with MSCS (which is not supported afaik), but rather describing alternatives.
    Kind regards,
    Matthias Schlarb
