Write behind batch behaviour confirmation

Scenario as follows. A cache is configured with a write-behind delay of 10 seconds. An entry is inserted into the cache. For some reason the DB is slow, and when the cache store attempts to flush this entry to disk it takes 1 minute. During this period, that same entry is updated in the cache twice, but those updates are more than 10 seconds apart.
Do those last 2 updates get aggregated into one store operation to the cache store? Or will the cache store be asked to store both versions of the entry because the updates were greater than the batch delay apart?
Thanks

Hi anonymous user628574,
The consecutive puts will indeed be coalesced into a single "store" operation.
Regards,
Gene
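
The coalescing described above can be pictured with a toy model. This is only a sketch under stated assumptions, not Coherence's implementation: the class `CoalescingQueueSketch`, its `put`/`flush` methods, and the `storeCalls` list are all hypothetical names, and the queue is reduced to a map keyed by cache key, so a later update simply replaces the value still waiting to be written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a coalescing write-behind queue (hypothetical, not Coherence code). */
class CoalescingQueueSketch {
    // Pending writes keyed by cache key: a later put replaces the queued value.
    private final Map<String, String> pending = new LinkedHashMap<>();
    // Every batch handed to the (imaginary) cache store, kept for inspection.
    final List<Map<String, String>> storeCalls = new ArrayList<>();

    void put(String key, String value) {
        pending.put(key, value);
    }

    /** Hands everything queued so far to the store as one batch. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        storeCalls.add(new LinkedHashMap<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        CoalescingQueueSketch queue = new CoalescingQueueSketch();
        queue.put("k", "v1");
        queue.flush();         // first store call; imagine it takes a minute
        queue.put("k", "v2");  // update made while the store is busy
        queue.put("k", "v3");  // second update, more than 10 s later
        queue.flush();         // runs once the slow store call has returned
        System.out.println(queue.storeCalls); // [{k=v1}, {k=v3}]
    }
}
```

In this model the store never sees `v2`: both updates land on the same key before the next flush, so only the latest value is written.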

Similar Messages

  • How to limit Write-Behind batch

    We have a scenario:
    we use a read-write-backing-map-scheme with a write-delay of 60s.
    The system inserts a lot of data, and when the time comes to write it, Coherence finds 40-50k unsaved records and passes them all to the cache store.
    Due to the data volume or database load, the cache store may take some time, several seconds for instance.
    <read-write-backing-map-scheme>
      <scheme-name>TicketDatabaseScheme</scheme-name>
      <scheme-ref>DefaultDatabaseScheme</scheme-ref>
      <!--<write-delay>1m</write-delay>-->
      <cachestore-scheme>
        <class-scheme>
          <class-name>com.griddynamics.ticketon.app.dao.coherence.TicketCacheStore</class-name>
        </class-scheme>
      </cachestore-scheme>
    </read-write-backing-map-scheme>
    <read-write-backing-map-scheme>
      <scheme-name>DefaultDatabaseScheme</scheme-name>
      <internal-cache-scheme>
        <local-scheme>
          <scheme-ref>LocalScheme</scheme-ref>
        </local-scheme>
      </internal-cache-scheme>
      <write-delay>60s</write-delay>
    </read-write-backing-map-scheme>
    In this case we experience "Terminating guarded execution" followed by service termination.
    2010-02-11 09:26:52.223/511.457 Oracle Coherence GE 3.5.2/463 <Error> (thread=DistributedCache:TicketonCache, member=2): Terminating guarded execution (due to hard timeout 1924ms ago) of Daemon{Thread="Thread[WriteBehindThread:CacheStoreWrapper(com.griddynamics.ticketon.app.dao.coherence.TicketCacheStore),5,WriteBehindThread:CacheStoreWrapper(com.griddynamics.ticketon.app.dao.coherence.TicketCacheStore)]", State=Running}
    2010-02-11 09:26:52.225/511.459 Oracle Coherence GE 3.5.2/463 <Error> (thread=Termination Thread, member=2): Write-behind thread timed out; stopping the cache service
    2010-02-11 09:26:52.226/511.460 Oracle Coherence GE 3.5.2/463 <D5> (thread=DistributedCache:TicketonCache, member=2): Service TicketonCache left the cluster
    INFO 09:26:52,227 [http--80-22$27432016 DaoCoherenceImpl] - PROFILE_doCreatetickets putAll 200 tickets time 3444 time per 10 objects 172
    INFO 09:26:52,227 [http--80-22$27432016 DaoCoherenceImpl] - PROFILE event and 1000 tickets ctreated time 9668
    Broadcast Message from root (msglog) on ip-10-226-137-172 Thu Feb 11 09:57:18...ets putAll 200 tickets time 3365 time per 10 objects 168
    2010-02-11 09:26:52.228/511.462 Oracle Coherence GE 3.5.2/463 <Info> (thread=httTHE SYSTEM ip-10-226-137-172 IS BEING SHUT DOWN NOW ! ! !et
    Log off now or risk your files being damagedence GE 3.5.2/463 <Info> (thread=http--80-27$25787595, member=2): Restarting Service: TicketonCache
    INFO 09:26:52,229 [http--80-20$15974570 DaoCoherenceImpl] - PROFILE_doCreatetickets putAll 200 tickets time 3447 time per 10 objects 172
    INFO 09:26:52,289 [http--80-22$26935588 BackingBeanSuper] - request HttpRequest[22]
    [09:26:53.446] {http--80-35$24027494} java.lang.RuntimeException: Failed to start Service "TicketonCache" (ServiceState=SERVICE_STOPPED)
    [09:26:53.446] {http--80-35$24027494} at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.waitAcceptingClients(Service.CDB:12)
    Summary: Coherence stops the service in a situation that is far from fatal.
    Questions
    1. Can I limit the size of a batch passed to the cache store?
    2. Is it possible to configure the timeout "(due to hard timeout 1924ms ago)"?
    3. Is it possible to handle a situation like this in some way that prevents the Coherence cluster from killing itself?

    Thank you Mark, you are very helpful.
    Is the following what you meant by "bundle strategy"?
    <cachestore-scheme>
      <class-scheme>
        <class-name>com.griddynamics.ticketon.app.dao.coherence.TicketCacheStore</class-name>
      </class-scheme>
      <operation-bundling>
        <bundle-config>
          <operation-name>store</operation-name>
          <preferred-size>5000</preferred-size>
          <auto-adjust>true</auto-adjust>
        </bundle-config>
      </operation-bundling>
    </cachestore-scheme>
    And if yes, does it make sense?
    By this I mean "send records to TicketCacheStore 5000 per call". Am I right?
    I dropped the delay to 10s and set the factor to 0.5.
    Now Coherence sends me 5-20k records and the cache store handles this successfully.
    But for various reasons it may sometimes take longer, for instance because of a lock in the database.
    I want to find a durable solution for this case, not only lower the chance that I run into it.
    Issuing a heartbeat from the cache store looks like the best option to me now.
    I found that the default guardian timeout is 65s, and raising it does not look like a good idea.

  • Can you limit write-behind batch sizes to a set number in Coherence 3.5?

    I'm currently running Coherence 3.5.3p9 on Windows.
    The cache store is set up to use the write-behind scheme via the read-write-backing-map-scheme tag.
    Batching is enabled with a write-delay of 5s.
    My understanding is that essentially anything that was newly inserted into the cache more than 5 seconds ago becomes eligible for storage.
    Our application goes through a bit of a peak-trough cycle. Sometimes very little data is inserted all at once and sometimes a lot is. This results in quite varying batch sizes and big batch sizes do cause issues on our db from time to time.
    I can decrease the write-delay to 1s in the hope that this will in turn decrease the batch sizes, but is there a way to set a specific number e.g. I only ever want to write 20 in a batch?

    Hi,
    you can just break down that big batch into smaller batches (DB transactions) yourself, and you can also decide that you don't want to write more at the moment.
    If you throw an exception Coherence will retry whatever is in the parameter map passed to storeAll and parameter collection to eraseAll. But it does not have to be the full list, it is expected that you remove those entries/elements which you have successfully persisted.
    This way you can control the rate of writing yourself. Also, since the write-behind thread does not block event processing, it is reasonably safe to wait longer in those methods if you want to space out resource transactions without returning from storeAll.
    To answer your question:
    You can either
    - do a physical transaction of 20 elements, remove those 20 elements from the map, then optionally wait and then continue with more elements from the map as long as there are any (this gives you the chance of controlling the rate of transactions).
    - send a physical transaction of at most 20 elements to the database, remove those 20 elements from the map and then throw a dummy exception (in which case Coherence requeues the rest... take care, after this they are considered freshly changed entries).
    Best regards,
    Robert
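    The first option above (write 20, remove those 20 from the map, continue) could be sketched as follows. This is a minimal illustration, not Coherence code: `ChunkedStoreSketch`, `storeInChunks` and the `writer` callback are hypothetical names, and the callback stands in for one physical database transaction. In a real CacheStore the loop would sit inside storeAll and operate on the map that Coherence passes in.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Sketch of chunked persistence for a storeAll-style map (hypothetical helper). */
class ChunkedStoreSketch {

    /**
     * Persists the entries in chunks of at most chunkSize and removes each
     * chunk from the map once the writer has returned normally. If the writer
     * throws, the exception propagates and the unpersisted entries stay in the map.
     * Returns the number of chunks written.
     */
    static <K, V> int storeInChunks(Map<K, V> entries, int chunkSize,
                                    Consumer<Map<K, V>> writer) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        while (!entries.isEmpty()) {
            // Copy up to chunkSize entries into their own map.
            Map<K, V> chunk = new LinkedHashMap<>();
            Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
            while (it.hasNext() && chunk.size() < chunkSize) {
                Map.Entry<K, V> e = it.next();
                chunk.put(e.getKey(), e.getValue());
            }
            writer.accept(chunk);                        // one physical transaction
            entries.keySet().removeAll(chunk.keySet());  // mark the chunk as persisted
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        Map<Integer, String> batch = new LinkedHashMap<>();
        for (int i = 0; i < 45; i++) {
            batch.put(i, "row-" + i);
        }
        List<Integer> sizes = new ArrayList<>();
        int chunks = storeInChunks(batch, 20, chunk -> sizes.add(chunk.size()));
        System.out.println(chunks + " chunks, sizes " + sizes
                + ", left in map: " + batch.size()); // 3 chunks, sizes [20, 20, 5], left in map: 0
    }
}
```

    A pause between chunks, or the second option (throw after the first chunk so the rest is requeued), can be added inside the loop without changing the removal logic.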

  • Write-Behind batch behavior in EP partition level transactions

    Hi,
    We use EntryProcessors to perform updates on multiple entities stored in the same cache partition. According to the documentation, Coherence handles all the updates in a "sandbox" and then commits them atomically to the cache backing map.
    The question is, when using write-behind, does Coherence guarantee that all entries updated in the same "partition level transaction" will be present in the same "storeAll" operation?
    Again, according to the documentation, the write-behind thread behavior is the following:
    1. The thread waits for a queued entry to become ripe.
    2. When an entry becomes ripe, the thread dequeues all ripe and soft-ripe entries in the queue.
    3. The thread then writes all ripe and soft-ripe entries either via store() (if there is only the single ripe entry) or storeAll() (if there are multiple ripe/soft-ripe entries).
    4. The thread then repeats (1).
    If all entries updated in the same partition level transaction become ripe or soft-ripe at the same instant they will all be present in the storeAll operation. If they do not become ripe/soft-ripe in the same instant, they may not be all present.
    So, it all depends on the behavior of the commit of the partition level transaction, if all entries get the same update timestamp, they will all become ripe at the same time.
    Does anyone know what is the behavior we can expect regarding this issue?
    Thanks.

    Hi,
    That comment is still correct for 12.1 and 3.7.1.10.
    I've checked the Coherence APIs and the ReadWriteBackingMap behavior, and although partition level transactions are atomic, the updated entries are added one by one to the write-behind queue. For each added entry Coherence uses the current time to calculate when that entry will become ripe, so there is no guarantee that all entries in the same partition level transaction will become ripe at the same time.
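    To make the timing argument concrete, here is a small sketch of the ripe and soft-ripe checks. It is not taken from the Coherence sources: `RipeTimeSketch` and its methods are hypothetical, and the soft-ripe rule used here (an entry is soft-ripe once it has been queued for at least (1 - write-batch-factor) * write-delay) is my reading of the write-batch-factor documentation, so treat it as an assumption.

```java
/** Sketch of ripe / soft-ripe checks for write-behind entries (hypothetical, not Coherence code). */
class RipeTimeSketch {

    /** Time at which an entry queued at enqueuedAtMillis becomes ripe. */
    static long ripeAt(long enqueuedAtMillis, long writeDelayMillis) {
        return enqueuedAtMillis + writeDelayMillis;
    }

    static boolean isRipe(long nowMillis, long enqueuedAtMillis, long writeDelayMillis) {
        return nowMillis >= ripeAt(enqueuedAtMillis, writeDelayMillis);
    }

    /**
     * Assumed rule: soft-ripe once the entry has been queued for at least
     * (1 - batchFactor) * writeDelay. A factor of 0 reduces this to "ripe".
     */
    static boolean isSoftRipe(long nowMillis, long enqueuedAtMillis,
                              long writeDelayMillis, double batchFactor) {
        long softDelay = Math.round(writeDelayMillis * (1.0 - batchFactor));
        return nowMillis - enqueuedAtMillis >= softDelay;
    }

    public static void main(String[] args) {
        long writeDelay = 10_000;  // 10 s write-delay
        long queuedA = 0;          // first entry of the partition level transaction
        long queuedB = 3;          // second entry, queued 3 ms later
        long now = ripeAt(queuedA, writeDelay);  // the instant A becomes ripe

        System.out.println("A ripe: " + isRipe(now, queuedA, writeDelay));
        System.out.println("B ripe: " + isRipe(now, queuedB, writeDelay));
        System.out.println("B soft-ripe, factor 0.0: " + isSoftRipe(now, queuedB, writeDelay, 0.0));
        System.out.println("B soft-ripe, factor 0.5: " + isSoftRipe(now, queuedB, writeDelay, 0.5));
    }
}
```

    Under these assumptions, with a batch factor of 0 the second entry misses the first entry's store call by a few milliseconds, while a non-zero factor makes it soft-ripe and lets both travel in the same storeAll. That raises the likelihood of a shared batch but is still not a guarantee.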
    This leads me to another question.
    We have a use case where we want to split a large entity we are storing in coherence into several smaller fragments. We use EntryProcessors and partition level transactions to guarantee atomicity in operations that need to update more than one fragment of the same entity. This guarantees that all fragments of the same entity are fully consistent. The cached fragments are then persisted into database using write-behind.
    The problem now is how to guarantee that all fragments are fully consistent in the database. If we just rely on the Coherence write-behind mechanism we will have eventual consistency in the DB, but in case of a multi-server failure the entity may become inconsistent in the database, which is a risk we would not like to take.
    Is there any other option/pattern that would allow us to either store all updates done on the entity or no update at all?
    If, in the EntryProcessor, we identify which entities were updated and place them as a whole in another persistence queue, we will probably be able to achieve this, but it is a rather tricky workaround that we would prefer not to use.
    Thanks.

  • Switching from write through to write behind automatically

    Hi,
    We are considering a Coherence solution to protect a customer facing application from outages due to database failures. This is for a financial company and the monetary value of each transaction is large and we want to provide 100% guarantee against data loss while not incurring any outages. We want to provide a write-through persistence to the database through Coherence which can switch to a write-behind automatically at runtime if the database persistence fails. Is this doable automatically and would it solve the problem I am trying to solve without losing any inflight transactions? Are there any real customer cases that were successful in achieving this using Coherence?
    Thanks
    Sairam
    Edited by: SKR on Feb 16, 2012 3:14 PM
    Edited by: SKR on Feb 16, 2012 3:15 PM

    SKR wrote:
    Jonathan.Knight wrote:
    Hi Sairam
    I know you can change the write-delay in JMX for a cache using write-behind, but I'm pretty certain you cannot make a write-through cache suddenly become a write-behind cache.
    I'm not sure why you think changing from write-through to write-behind will allow you to guarantee 100% no data loss - do you mean no loss of updates to the DB or no loss of data in the cache cluster? There are certainly scenarios where you can lose data from either the cluster or the DB that neither write-through nor write-behind will save you from. Presumably you want to use write-behind to allow for the DB to go down, although you will still need to configure Coherence to properly retry failed write-behind calls (the CacheStore behaviour on failure). What happens to your data if you are using write-behind and you lose a partition from your cluster (i.e. you lose a physical machine, or two or more JVMs in a short space of time)? You have data loss. You cannot guarantee against this; you can only mitigate it and have a recovery policy/procedure.
    JK
    JK,
    Thanks for your reply. I should have explained the scenario better. What we are trying to do is to have our transactions commit to the database synchronously using write-through, so that during normal operation, the data will be committed, persisted and durable in the database. But our RW database becomes a single point of failure and if some problem occurs to the database during the peak load time, we run the risk of an outage till we fix the database problem or failover to the standby (We don't have RAC architecture or automatic failover and the manual switchover takes about 10 - 15 mins minimum). We want to avoid this by providing a cache-only operation mode during such a failure, where the customers can continue to transact and the writes will get queued in the cache. I do understand that losing both the database and the cache or losing the primary and the backup in the cache would result in a data loss. But I am assuming such a dual failure is rare.
    We do not want to run write-behind all the time but only during the database failure window. From what you mentioned, it seems the runtime switching from write-through to write-behind is not available as an option.
    Sairam
    Hi Sairam,
    I would suggest that you configure write-behind to have a fairly short write-delay, and you only return a confirmation to the client
    - either after the write-behind succeeded (you can use a backing map listener to listen for the removal of the decoration which meant that the entry was dirty)
    - or if the database went down (noticeable from the failure), then it is up to you whether you send a confirmation which also mentions that it is not persisted to disk yet, or not at all
    Best regards,
    Robert

  • How to force write-behind store on cache node shutdown?

    Hi,
    I built a small pilot project based on Coherence and now I test it for failover. I found replication issues with Distributed cache in the following scenario:
    - start cache node 1 (JVM instance 1);
    - connect Extend client to it and get 1 object from cache (only 1 object in the cache - loaded by CacheStore from DB);
    - change the object and put it back (I use EntryProcessor for this);
    - start cache node 2 (JVM instance 2);
    - stop cache instance 1 (write-behind store wasn't invoked yet: write-delay = 2m);
    - load/change the same object on node 2; all changes done on node 1 are lost.
    My expectation was that cache will replicate its data between nodes when new member joins cache cluster. The backup count = 1 by default, right?
    What should I do in order to prevent such behavior? Is it possible to force write-behind store on cache node shutdown event?
    Thanks, Denis.
    My cache-config, just in case:
    <cache-config>
      <caching-scheme-mapping>
        <cache-mapping>
          <cache-name>AccountCache</cache-name>
          <scheme-name>account-distributed</scheme-name>
        </cache-mapping>
      </caching-scheme-mapping>
      <caching-schemes>
        <distributed-scheme>
          <scheme-name>account-distributed</scheme-name>
          <service-name>DistributedCache</service-name>
          <serializer>
            <class-name>com.tangosol.io.pof.ConfigurablePofContext</class-name>
            <init-params>
              <init-param>
                <param-type>String</param-type>
                <param-value>account-pof-config.xml</param-value>
              </init-param>
            </init-params>
          </serializer>
          <backing-map-scheme>
            <read-write-backing-map-scheme>
              <scheme-name>AccountDatabaseScheme</scheme-name>
              <internal-cache-scheme>
                <local-scheme>
                  <!--scheme-ref>default-eviction</scheme-ref-->
                  <eviction-policy>LRU</eviction-policy>
                  <high-units>0</high-units>
                  <expiry-delay>30m</expiry-delay>
                </local-scheme>
              </internal-cache-scheme>
              <cachestore-scheme>
                <class-scheme>
                  <class-name>com.roox.bss.cache.store.AccountCacheStore</class-name>
                  <init-params>
                    <init-param>
                      <param-type>java.lang.String</param-type>
                      <param-value>dburl_</param-value>
                    </init-param>
                    <init-param>
                      <param-type>java.lang.String</param-type>
                      <param-value>user</param-value>
                    </init-param>
                    <init-param>
                      <param-type>java.lang.String</param-type>
                      <param-value>password</param-value>
                    </init-param>
                  </init-params>
                </class-scheme>
              </cachestore-scheme>
              <write-delay>2m</write-delay>
              <write-batch-factor>.5</write-batch-factor>
            </read-write-backing-map-scheme>
          </backing-map-scheme>
        </distributed-scheme>
        <proxy-scheme>
          <service-name>ExtendTcpProxyService</service-name>
          <thread-count>10</thread-count>
          <acceptor-config>
            <tcp-acceptor>
              <local-address>
                <address>localhost</address>
                <port>9098</port>
                <reuse-address>true</reuse-address>
                <reusable>true</reusable>
              </local-address>
            </tcp-acceptor>
            <serializer>
              <class-name>com.tangosol.io.pof.ConfigurablePofContext</class-name>
              <init-params>
                <init-param>
                  <param-type>String</param-type>
                  <param-value>account-pof-config.xml</param-value>
                </init-param>
              </init-params>
            </serializer>
          </acceptor-config>
          <autostart>true</autostart>
        </proxy-scheme>
      </caching-schemes>
    </cache-config>

    solved with autostart=true

  • Write-Behind, Expiration, and SQL Exceptions.

    Hi Chaps,
    If a cache with write-behind enabled has problems writing to the DB I understand that Coherence will re-queue the objects and write them when the DB is available.
    The problem I have is that (after a DB failure) I don't see them being written - I can see these items in the cache but not in the DB, even several hours after the outage. (Items that were added to the cache after the outage are being written).
    Is there anything the cachestore methods (specifically store()) need to do with regard to exceptions to ensure that these items are re-queued?
    Next question is: I was also wondering how is this managed with regards to expiry?
    We have our own expiry routine which removes items from the cache that are older than 24 hours (this was from before we could expire objects by specifying the timeout in the put() method call, which I am intending to switch to).
    If an item has not been written to the DB due to an outage and is then expired (by our own routine or by Coherence), is it then lost forever, or will it remain in the queue? (Seeing as the queue holds references I am guessing not, but thought I'd check.)
    Thanks,
    Randal.

    Jon,
    I have a question related to this... If you remember, a few weeks back I stumbled upon the problem that the "version-persistent" map for the versioned-backing-map-scheme does not accept putAll operations. The workaround, until you guys implement it, was to override the putAll method of the cacheStore and throw an unsupported operation exception (to force individual puts).
    Well, although this workaround works, I am getting tons and tons of:
    2006-04-06 17:18:27.347 Tangosol Coherence 3.1/339 <Warning> (thread=WriteBehindThread:MyCacheStore, member=1): The CacheStore "MyCacheStore@46b9979b" does not support storeAll().
    2006-04-06 17:18:27.348 Tangosol Coherence 3.1/339 <Error> (thread=WriteBehindThread:MyCacheStore, member=1): Failed to store keys="[16, 18, 21, 26, 5, 13, 14, 25, 17, 15, 23, 19, 2, 6, 9, 7]":
    java.lang.UnsupportedOperationException
    at ...MyCacheStore.storeAll(MyCacheStore.java:126)
    at com.tangosol.net.cache.ReadWriteBackingMap$CacheStoreWrapper.storeAll(ReadWriteBackingMap.java:3820)
    at com.tangosol.net.cache.ReadWriteBackingMap$WriteThread.run(ReadWriteBackingMap.java:3538)
    at com.tangosol.util.Daemon$1.run(Daemon.java:63)
    2006-04-06 17:18:27.349 Tangosol Coherence 3.1/339 <Warning> (thread=WriteBehindThread:MyCacheStore, member=1): Requeued store for key="16"
    2006-04-06 17:18:27.349 Tangosol Coherence 3.1/339 <Warning> (thread=WriteBehindThread:MyCacheStore, member=1): Requeued store for key="18"
    2006-04-06 17:18:27.350 Tangosol Coherence 3.1/339 <Warning> (thread=WriteBehindThread:MyCacheStore, member=1): Requeued store for key="21"
    2006-04-06 17:18:27.351 Tangosol Coherence 3.1/339 <Warning> (thread=WriteBehindThread:MyCacheStore, member=1): Requeued store for key="26"
    the first OperationNotSupported is expected, but I'm not sure what the requeued warnings are all about. These are not failures to the DB...it is something else. (mind you that this happens when trying to load a lot of data into the map.)
    1- Is this requeuing related or the same as in failed DB stores?
    2- Is it possible to "lose" stores if I don't configure the write-requeue-threshold with very, very high values? I must ensure I don't lose anything.
    In a related note, in some circumstances, I need to ensure that the "write queue" is flushed or cleared. For example, I may want to force a flush of all pending stores (and wait/block until that's done).
    I have looked into it and I can't see how to do it. I can read the write-queue length, but I believe that this is not very accurate, since my tests seem to indicate that the write-behind thread may take the entries to store off the write-queue and then deal with them in parallel (which means that there are still entries pending although the write-queue size is 0). Also, there are some calls from the cache store that, at first, seem to give some access to the write thread (potentially allowing me to contact the thread to tell it to flush or discard any pending stores), but I believe that all of the functions are protected. There may be other ways, though.
    I guess my second batch of questions are:
    1- How can I effectively force a flush (or clear) of the pending stores. Such that there is no single store pending in any queue (visible or invisible to the programmer).
    2- What is the role of re-queuing in these situations? where is the queue sitting, the thread? the cache store? who's responsible of retrying that, and when?...I would like to flush those entries too.
    A quick explanation of the operation of the write thread would also be very appreciated.
    Thanks!
    Josep M.

  • Write-Behind Caching and Limited Internal Cache Size

    Let's say I have a write-behind cache and configure its internal cache to be of a fixed limited size, e.g. 10000 units. What would happen if more than 10000 units are added to the write-behind cache within the write-delay period? Would my CacheStore's storeAll() get all of the added values or would some of the values be missed because of the internal cache size limitation?

    > Hi Denis,
    >
    > If an entry is removed while it is still in the
    > write-behind queue, it will be removed from the queue
    > and CacheStore.store(oKey, oValue) will be invoked
    > immediately.
    >
    > Regards,
    > Dimitri
    Dimitri,
    Just to confirm that I understand it correctly: if there is a queued update to a key which is then remove()-ed from the cache, then the following happens:
    First CacheStore.store(key, queuedUpdateValue) is invoked.
    Afterwards CacheStore.erase(key) is invoked.
    Both synchronously to the remove() call.
    I expected that only erase would be invoked.
    BR,
    Robert

  • Write-behind max speed?

    Hi,
    We are trying to test the speed of the write behind mechanism and we would be interested to know how other coherence users handle, for example, writing 1 million rows into the database.
    At the moment, using JDBC batch inserts we can write approximately 30000 rows per minute, which means it would take a little over 30 minutes to save 1 million rows. Are there any other methods that other Coherence users use that can improve on this?
    Many thanks,

    user738616 wrote:
    Hi,
    This has nothing to do with Coherence as the implementation of CacheStore is outside of Coherence. Apart from JDBC Batch, you should try using PLSQL Bulk binds for such numbers.
    Hope this helps!
    Cheers,
    NJ
    Hi NJ,
    we actually measured PLSQL bulk binds against plain SQL (both with JDBC)... for anything which can be translated to plain inserts/updates, plain SQL is way faster (more than 10x).
    You can only win with bulk binds when that statement which you send down actually does more complex logic and multiple statements so you actually win with optimizing away the roundtrips, too.
    Best regards,
    Robert

  • Preventing write-coalescing with write-behind

    Dear all,
    I am very interested in the write-behind feature, but I would like to disable the write-coalescing optimization so that I see each individual change.
    Is there any way to do this ?
    I feel that, because CacheStore.storeAll(Map entries) works on cache-key/cache-value pairs, it is impossible to have several cache values for the same key :-( Not coalescing the consecutive changes on a given entry would require a method like CacheStore.storeAll(List<Change>), with Change holding the modification (insert/update/delete), the key and the value.
    Thanks,
    Cyrille
    Cyrille Le Clerc
    http://blog.xebia.fr

    Thanks for the feedback Robert,
    I implemented this "batch processing without coalescing" thanks to a "command queue" colocated with my data.
    Sample: perform async processing on each modification, without coalescing, of MyEntity identified by MyEntityKey and stored in "my-entity-cache".
    In my agent/entry-processor, I simultaneously modify my data "MyEntity" and put an entry MyEntityCommand (stored in "my-entity-command-cache"). MyEntityCommand holds enough information to do my async processing. This processing is done asynchronously by MyEntityCommandCacheStore.
    MyEntityCommand is associated with a key MyEntityCommandKey, which is composed of MyEntityKey + sequence-number. MyEntityCommandKey has a KeyAssociation with MyEntityKey to ensure colocation.
    Benefits:
    * There is no coalescing because MyEntityCommandKey contains a unique sequence number.
    * The overhead of this "command queue" is limited because the command object only contains the limited piece of data I need for my async processing (foreign key on the entity + a few things), and the queue is self-purging thanks to an <expiry-delay> on "my-entity-command-cache".
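    A minimal sketch of such a command key, to show why the entries cannot coalesce. The field names, the String stand-in for MyEntityKey and the AtomicLong sequence source are assumptions made for this illustration. In a real deployment the class would implement Coherence's KeyAssociation interface, whose getAssociatedKey() method is mirrored here as a plain method so that the example compiles without the Coherence jar.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;

/** Command key = entity key + sequence number (sketch; a String stands in for MyEntityKey). */
final class MyEntityCommandKey {
    private final String entityKey;
    private final long sequence;

    MyEntityCommandKey(String entityKey, long sequence) {
        this.entityKey = Objects.requireNonNull(entityKey);
        this.sequence = sequence;
    }

    /** Mirrors KeyAssociation.getAssociatedKey(): colocate the command with its entity. */
    Object getAssociatedKey() {
        return entityKey;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MyEntityCommandKey)) return false;
        MyEntityCommandKey other = (MyEntityCommandKey) o;
        return sequence == other.sequence && entityKey.equals(other.entityKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(entityKey, sequence);
    }

    @Override
    public String toString() {
        return entityKey + "#" + sequence;
    }
}

class CommandQueueSketch {
    public static void main(String[] args) {
        // Hypothetical per-JVM sequence source; a cluster needs its own scheme.
        AtomicLong sequence = new AtomicLong();
        Map<MyEntityCommandKey, String> commandCache = new LinkedHashMap<>();

        // Two modifications of the same entity produce two distinct keys,
        // so neither command replaces the other while waiting to be stored.
        commandCache.put(new MyEntityCommandKey("entity-42", sequence.incrementAndGet()), "update A");
        commandCache.put(new MyEntityCommandKey("entity-42", sequence.incrementAndGet()), "update B");

        System.out.println(commandCache.size() + " pending commands: " + commandCache.keySet());
        // 2 pending commands: [entity-42#1, entity-42#2]
    }
}
```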
    Here is a configuration extract
    <distributed-scheme>
      <scheme-name>entity-partitionned</scheme-name>
      <service-name>EntityDistributedCache</service-name>
      <serializer>
      </serializer>
      <backing-map-scheme>
        <local-scheme>
          <scheme-ref>unlimited-backing-map</scheme-ref>
        </local-scheme>
      </backing-map-scheme>
      <thread-count>10</thread-count>
      <autostart>true</autostart>
    </distributed-scheme>
    <distributed-scheme>
      <scheme-name>entity-command-partitionned</scheme-name>
      <service-name>EntityDistributedCache</service-name>
      <serializer>
      </serializer>
      <backing-map-scheme>
        <read-write-backing-map-scheme>
          <cachestore-scheme>
            <class-scheme>
              <class-name>... EntityCommandStore</class-name>
            </class-scheme>
          </cachestore-scheme>
          <internal-cache-scheme>
            <local-scheme>
              <expiry-delay>30s</expiry-delay>
            </local-scheme>
          </internal-cache-scheme>
          <write-delay>10s</write-delay>
          <write-batch-factor>0.5</write-batch-factor>
        </read-write-backing-map-scheme>
      </backing-map-scheme>
      <thread-count>10</thread-count>
      <autostart>true</autostart>
    </distributed-scheme>
    Please note that I had to tune <write-batch-factor> to increase the batch size (i.e. the number of entries in each CacheStore.store()/storeAll() invocation). Under very high write load, the default <write-batch-factor> of 0 gave me an average of 1.2 entries per CacheStore.store()/storeAll() invocation. My first try with a <write-batch-factor> of 0.5 increased this average considerably, probably to hundreds (my stats are collected in an underlying layer, so I don't have the exact average).
    Cyrille
    Cyrille Le Clerc
    http://blog.xebia.fr

  • Write behind cache, DB down, when should the system stop taking new data in

    Hello:
    We are trying to use Coherence for our custom ESB, which is brokering payloads of various size between consumer and provider applications.
    Before Coherence, stopping our DB meant organization-wide outage for critically important business services.
    Since we have at least 40G of RAM in production environment, we believe that our app
    can use Coherence write-behind option for tolerating at least several hours worth of DB outage.
    We are currently using a near cache backed by distributed cache in write-behind mode.
    9 business service JVMs (storage enabled=false) use 30 storage enabled JVMs.
    IMPORTANT: We need to create an automated alerting facility that determines when
    the amount of unsaved data reaches a critical level after the DB goes down. This alert should help us decide when our application should stop accepting inbound traffic.
    It is hard to use QueueSize parameter for that because our payload memory footprint can vary from 1KB to 3MB.
    We do not expire any entries in order to enable support queries against the cache during DB outage.
    Our experiments with trying various flavors of overflow-scheme resulted in OutOfMemoryError, therefore
    we decided to implement RAM-only cache as a first step.
    <near-scheme>
         <scheme-name>message_payload_scheme</scheme-name>
         <front-scheme>
              <local-scheme>
                   <scheme-ref>limited_entities_front_scheme</scheme-ref>
                   <high-units>100</high-units>
              </local-scheme>
         </front-scheme>
         <back-scheme>
              <distributed-scheme>
                   <backing-map-scheme>
                        <read-write-backing-map-scheme>
                             <internal-cache-scheme>
                                  <local-scheme>
                                       <scheme-ref>limited_bytes_scheme</scheme-ref>
                                       <high-units>199229440</high-units>
                                  </local-scheme>
                             </internal-cache-scheme>
                             <cachestore-scheme>
                                  <class-scheme>
                                       <class-name>com.comp.MessagePayloadStore</class-name>
                                  </class-scheme>
                             </cachestore-scheme>
                             <read-only>false</read-only>
                             <write-delay-seconds>3</write-delay-seconds>
                             <write-requeue-threshold>2147483646</write-requeue-threshold>
                        </read-write-backing-map-scheme>
                   </backing-map-scheme>
                   <autostart>true</autostart>
              </distributed-scheme>
         </back-scheme>
    </near-scheme>
    <local-scheme>
         <scheme-name>limited_entities_front_scheme</scheme-name>
         <eviction-policy>LRU</eviction-policy>
         <unit-calculator>FIXED</unit-calculator>
    </local-scheme>
    <local-scheme>
         <scheme-name>limited_bytes_scheme</scheme-name>
         <eviction-policy>HYBRID</eviction-policy>
         <unit-calculator>BINARY</unit-calculator>
    </local-scheme>

    Good info ... I feel like I need to restate my original question along with a couple of new questions caused by the discussion above.
    Q1. Does Coherence evict 'dirty', 'queued', or 'unsaved' objects for the cache configuration provided above?
    The answer should be 'NO', otherwise Coherence is unsafe to use as a system of record;
    it should not just drop unsaved information on the floor.
    Q2. What happens to the front tier of the near+partitioned write-behind cache described above when the amount of unsaved data exceeds the maximum cache capacity defined via high-units?
    I would expect map.put to start throwing exceptions: the cache storage is full, so it should not accept more data.
    Q3. How can I determine the moment when the amount of dirty data in bytes(!), not in objects, hits 85% of
    the maximum allowed cache capacity configured in bytes (using the high-units parameter and the BINARY calculator)?
    A 'DirtyUnits' counter can probably be built with some lower-level Coherence API. Can we use
    this API?
    Please understand that we purchased Coherence for reliability, for making our
    system independent of short DB outages, and for keeping our business services up
    and running when DBAs need some time for admin operations like rebuilding an index.
    Performance benefits are secondary and are not as obvious for our system, which
    uses primary keys only and has a well-tuned co-located Oracle back-end.
    We simply cannot put Coherence into production unless we prove that Coherence
    can reliably hold the data and give us information about an approaching crisis
    (the cache full of unsaved data).
    If possible, forward this message to Cameron Purdy,
    who presented Coherence to our team several months ago.
    Thanks,
    Vasili Smaliak
    Applications Architect, Enterprise App Integration
    GMAC ResCap
    [email protected]
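
    Regarding Q3, one possible approach is to keep the byte count yourself rather than rely on QueueSize. The sketch below is plain Java and does not use any Coherence API: DirtyBytesTracker, markDirty and markStored are hypothetical names. It assumes you can call markDirty wherever an entry is written to the cache and markStored from your CacheStore after a successful store, that the caller supplies the serialized size, and that there is one tracker per storage node.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not a Coherence class) that tracks unsaved bytes. */
public class DirtyBytesTracker {
    // Size in bytes of the latest unsaved version of each key.
    private final ConcurrentHashMap<Object, Long> pending = new ConcurrentHashMap<>();
    private final AtomicLong dirtyBytes = new AtomicLong();
    private final long alertBytes;

    /**
     * @param capacityBytes assumed to mirror the high-units value (BINARY calculator)
     * @param alertRatio    fraction of capacity that should raise the alert, e.g. 0.85
     */
    public DirtyBytesTracker(long capacityBytes, double alertRatio) {
        this.alertBytes = (long) (capacityBytes * alertRatio);
    }

    /** Record that a key now has an unsaved value of the given size. */
    public void markDirty(Object key, long sizeBytes) {
        Long previous = pending.put(key, sizeBytes);
        // Repeated updates of one key count only once, with the latest size.
        dirtyBytes.addAndGet(sizeBytes - (previous == null ? 0L : previous));
    }

    /** Record that the key was written to the database successfully. */
    public void markStored(Object key) {
        Long previous = pending.remove(key);
        if (previous != null) {
            dirtyBytes.addAndGet(-previous);
        }
    }

    public long getDirtyBytes() {
        return dirtyBytes.get();
    }

    /** True once the unsaved bytes reach the configured share of capacity. */
    public boolean isAboveThreshold() {
        return dirtyBytes.get() >= alertBytes;
    }
}
```

    With the configuration above, new DirtyBytesTracker(199229440L, 0.85) would report the threshold once about 169 million bytes are pending on a node. The count is only an estimate: if an entry is updated while an older version is still being stored, markStored clears the newer size as well.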

  • TTL specified in put operation doesn't always work when using write-behind

    I'm using a distributed cache with a write-behind cache store (see the config below). I found that when I do something like myCache.put(key, value, ttl), the entry survives the specified ttl. I tried doing the same with a distributed cache with a write-through cachestore and there everything does happen correctly.
    Is this sort of operation not permitted in caches containing a write-behind cachestore? If not, wouldn't it be better to throw an UnsupportedOperationException?
    I created a small test to simulate this. I added values to the cache with a TTL of 1 to 10 seconds and found that the 10 second entries stayed in the cache.
    Configuration used:
    <?xml version="1.0"?>
    <!DOCTYPE cache-config SYSTEM "cache-config.dtd">
    <cache-config>
         <caching-scheme-mapping>
              <cache-mapping>
                   <cache-name>TTL_TEST</cache-name>
                   <scheme-name>testScheme</scheme-name>
              </cache-mapping>
         </caching-scheme-mapping>
         <caching-schemes>
              <distributed-scheme>
                   <scheme-name>testScheme</scheme-name>
                   <service-name>testService</service-name>
                   <backing-map-scheme>
                        <read-write-backing-map-scheme>
                             <internal-cache-scheme>
                                  <local-scheme>
                                       <service-name>testBackLocalService</service-name>
                                  </local-scheme>
                             </internal-cache-scheme>
                             <cachestore-scheme>
                                  <class-scheme>
                                       <scheme-name>testBackStore</scheme-name>
                                       <class-name>TTLTestServer$TestCacheStore</class-name>
                                  </class-scheme>
                             </cachestore-scheme>
                             <write-delay>3s</write-delay>
                        </read-write-backing-map-scheme>
                   </backing-map-scheme>
                   <local-storage>true</local-storage>
                   <autostart>true</autostart>
              </distributed-scheme>
         </caching-schemes>
    </cache-config>
    Code of test:
    import java.util.Collection;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import org.joda.time.DateTime;
    import org.joda.time.Duration;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.util.StopWatch;
    import org.testng.annotations.BeforeClass;
    import org.testng.annotations.Test;
    import com.google.common.collect.Lists;
    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.net.cache.CacheStore;
    @Test
    public class TTLTestServer
    {
         private static final int RETRIES = 5;
         private static final Logger logger = LoggerFactory.getLogger( TTLTestServer.class );
         private NamedCache m_cache;
         /**
          * List of Time-To-Lives in seconds to check
          */
         private final List<Integer> m_listOfTTLs = Lists.newArrayList(1, 3, 5, 10);
         /**
          * Test is done in separate threads to speed up the test
          */
         private final  ExecutorService m_executorService = Executors.newCachedThreadPool();
         @BeforeClass
         public void setup()
         {
              logger.info("Getting the cache");
              m_cache =  CacheFactory.getCache("TTL_TEST");
         }
         public static class TestCacheStore implements CacheStore
         {
              public void erase(Object arg0)
              {}
              public void eraseAll(Collection arg0)
              {}
              public void store(Object arg0, Object arg1)
              {}
              public void storeAll(Map arg0)
              {}
              public Object load(Object arg0)
              {return null;}
              public Map loadAll(Collection arg0)
              {return null;}
         }
         public void testTTL() throws InterruptedException, ExecutionException
         {
              logger.info("Starting TTL test");
              List<Future<StopWatch>> futures = Lists.newArrayList();
              for (final Integer ttl : m_listOfTTLs)
              {
                   futures.add(m_executorService.submit(new Callable()
                   {
                        public Object call() throws Exception
                        {
                             StopWatch stopWatch= new StopWatch("TTL=" + ttl);
                             for (int retry = 0; retry < RETRIES; retry++)
                             {
                                  logger.info("Adding a value in cache for TTL={} in try={}", ttl, retry+1);
                                  stopWatch.start("Retry="+retry);
                                  m_cache.put(ttl, null, ttl*1000);
                                  waitUntilNotInCacheAnymore(ttl, retry);
                                  stopWatch.stop();
                             }
                             return stopWatch;
                        }
                        private void waitUntilNotInCacheAnymore(final Integer ttl, final int currentTry) throws InterruptedException
                        {
                             DateTime startTime = new DateTime();
                             long maxMillisToWait = ttl*2*1000;     //wait max 2 times the time of the ttl
                             while(m_cache.containsKey(ttl) )
                             {
                                  Duration timeTaken = new Duration(startTime, new DateTime());
                                  if(timeTaken.getMillis() > maxMillisToWait)
                                  {
                                       throw new RuntimeException("Already waiting " + timeTaken + " for ttl=" + ttl + " and retry=" +  currentTry);
                                  }
                                  Thread.sleep(1000);
                             }
                        }
                   }));
              }
              logger.info("Waiting until all futures are finished");
              m_executorService.shutdown();
              logger.info("Getting results from futures");
              for (Future<StopWatch> future : futures)
              {
                   StopWatch sw = future.get();
                   logger.info(sw.prettyPrint());
              }
         }
    }
    Failure message:
    FAILED: testTTL
    java.util.concurrent.ExecutionException: java.lang.RuntimeException: Already waiting PT20.031S for ttl=10 and retry=0
         at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
         at java.util.concurrent.FutureTask.get(Unknown Source)
         at TTLTestServer.testTTL(TTLTestServer.java:159)
    Caused by: java.lang.RuntimeException: Already waiting PT20.031S for ttl=10 and retry=0
         at TTLTestServer$1.waitUntilNotInCacheAnymore(TTLTestServer.java:139)
         at TTLTestServer$1.call(TTLTestServer.java:122)
         at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
         at java.util.concurrent.FutureTask.run(Unknown Source)
         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
         at java.lang.Thread.run(Unknown Source)
    I'm using Coherence 3.4.2.
    Best regards
    Jan

    Hi, still no luck. However, I noticed that setting the write-delay value of the write-behind store to 0s or 1s solved the problem. It only starts to give me "the node has already been removed" exceptions once the write-delay value is 2s or higher.
    You can find the coherence-cache-config.xml below:
    <?xml version="1.0"?>
    <!DOCTYPE cache-config SYSTEM "cache-config.dtd">
    <cache-config>
         <caching-scheme-mapping>
              <cache-mapping>
                   <cache-name>TTL_TEST</cache-name>
                   <scheme-name>testScheme</scheme-name>
              </cache-mapping>
         </caching-scheme-mapping>
         <caching-schemes>
              <distributed-scheme>
                   <scheme-name>testScheme</scheme-name>
                   <service-name>testService</service-name>
                   <backing-map-scheme>
                        <read-write-backing-map-scheme>
                             <internal-cache-scheme>
                                  <local-scheme>
                                       <service-name>testBackLocalService</service-name>
                                  </local-scheme>
                             </internal-cache-scheme>
                             <cachestore-scheme>
                                  <class-scheme>
                                       <scheme-name>testBackStore</scheme-name>
                                       <class-name>TTLTestServer$TestCacheStore</class-name>
                                  </class-scheme>
                             </cachestore-scheme>
                             <write-delay>2s</write-delay>
                        </read-write-backing-map-scheme>
                   </backing-map-scheme>
                   <local-storage>true</local-storage>
                   <autostart>true</autostart>
              </distributed-scheme>
         </caching-schemes>
    </cache-config>
    You can find the test program below:
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;
    import java.util.Map;
    import org.joda.time.DateTime;
    import org.joda.time.Duration;
    import org.springframework.util.StopWatch;
    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.net.cache.CacheStore;
    public class TTLTestServer
    {
         private static final int RETRIES = 5;
         private NamedCache m_cache;
         /**
          * List of Time-To-Lives in seconds to check
          */
         private final List<Integer> m_listOfTTLs = new ArrayList<Integer>();
         /**
          * @param args
          * @throws Exception
          */
         public static void main( String[] args ) throws Exception
         {
              new TTLTestServer().test();
         }
         /**
          * Empty CacheStore
          * @author jbe
          */
         public static class TestCacheStore implements CacheStore
         {
              public void erase(Object arg0)
              {}
              @SuppressWarnings ( "unchecked" )
              public void eraseAll(Collection arg0)
              {}
              public void store(Object arg0, Object arg1)
              {}
              @SuppressWarnings ( "unchecked" )
              public void storeAll(Map arg0)
              {}
              public Object load(Object arg0)
              {return null;}
              @SuppressWarnings ( "unchecked" )
              public Map loadAll(Collection arg0)
              {return null;}
         }
         /**
          * Sets up and executes the test setting values in a cache with a given time-to-live value and waiting for the value to disappear.
          * @throws Exception
          */
         private void test() throws Exception
         {
              System.out.println(new DateTime() + " - Setting up TTL test");
              m_cache =  CacheFactory.getCache("TTL_TEST");
              m_listOfTTLs.add( 1 );
              m_listOfTTLs.add( 3 );
              m_listOfTTLs.add( 5 );
              m_listOfTTLs.add( 10);
              System.out.println(new DateTime() + " - Starting TTL test");
              for (final Integer ttl : m_listOfTTLs)
              {
                   StopWatch sw = doTest(ttl);
                   System.out.println(sw.prettyPrint());
              }
         }
         /**
          * Adds a value to the cache with the time-to-live as given by the ttl parameter and waits until it's removed from the cache.
          * Repeats this {@link #RETRIES} times
          * @param ttl
          * @return
          * @throws Exception
          */
         private StopWatch doTest(Integer ttl) throws Exception
         {
              StopWatch stopWatch= new StopWatch("TTL=" + ttl);
              for (int retry = 0; retry < RETRIES; retry++)
              {
                   System.out.println(new DateTime() + " - Adding a value in cache for TTL=" + ttl + " in try= " + (retry+1));
                   stopWatch.start("Retry="+retry);
                   m_cache.put(ttl, null, ttl*1000);
                   waitUntilNotInCacheAnymore(ttl, retry);
                   stopWatch.stop();
              }
              return stopWatch;
         }
         /**
          * Wait until the value for the given ttl is not in the cache anymore
          * @param ttl
          * @param currentTry
          * @throws InterruptedException
          */
         private void waitUntilNotInCacheAnymore(final Integer ttl, final int currentTry) throws InterruptedException
         {
              DateTime startTime = new DateTime();
              long maxMillisToWait = ttl*2*1000;     //wait max 2 times the time of the ttl
              while(m_cache.containsKey(ttl) )
              {
                   Duration timeTaken = new Duration(startTime, new DateTime());
                   if(timeTaken.getMillis() > maxMillisToWait)
                   {
                        throw new RuntimeException("Already waiting " + timeTaken + " for ttl=" + ttl + " and retry=" +  currentTry);
                   }
                   Thread.sleep(1000);
              }
         }
    }
    You can find the output below:
    2009-12-03T11:50:04.584+01:00 - Setting up TTL test
    2009-12-03 11:50:04.803/0.250 Oracle Coherence 3.5.2/463p2 <Info> (thread=main, member=n/a): Loaded operational configuration from resource "jar:file:/C:/Temp/coherence3.5.2/coherence-java-v3.5.2b463-p1_2/coherence/lib/coherence.jar!/tangosol-coherence.xml"
    2009-12-03 11:50:04.803/0.250 Oracle Coherence 3.5.2/463p2 <Info> (thread=main, member=n/a): Loaded operational overrides from resource "jar:file:/C:/Temp/coherence3.5.2/coherence-java-v3.5.2b463-p1_2/coherence/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
    2009-12-03 11:50:04.803/0.250 Oracle Coherence 3.5.2/463p2 <D5> (thread=main, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
    2009-12-03 11:50:04.803/0.250 Oracle Coherence 3.5.2/463p2 <D5> (thread=main, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified
    Oracle Coherence Version 3.5.2/463p2
    Grid Edition: Development mode
    Copyright (c) 2000, 2009, Oracle and/or its affiliates. All rights reserved.
    2009-12-03 11:50:04.943/0.390 Oracle Coherence GE 3.5.2/463p2 <Info> (thread=main, member=n/a): Loaded cache configuration from "file:/C:/jb/workspace3.5/TTLTest/target/classes/coherence-cache-config.xml"
    2009-12-03 11:50:05.318/0.765 Oracle Coherence GE 3.5.2/463p2 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
    2009-12-03 11:50:08.568/4.015 Oracle Coherence GE 3.5.2/463p2 <Info> (thread=Cluster, member=n/a): Created a new cluster "cluster:0xD3FB" with Member(Id=1, Timestamp=2009-12-03 11:50:05.193, Address=172.16.44.32:8088, MachineId=36896, Location=process:11848, Role=TTLTestServerTTLTestServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2) UID=0xAC102C20000001255429380990201F98
    2009-12-03 11:50:08.584/4.031 Oracle Coherence GE 3.5.2/463p2 <D5> (thread=Invocation:Management, member=1): Service Management joined the cluster with senior service member 1
    2009-12-03 11:50:08.756/4.203 Oracle Coherence GE 3.5.2/463p2 <D5> (thread=DistributedCache:testService, member=1): Service testService joined the cluster with senior service member 1
    2009-12-03T11:50:08.803+01:00 - Starting TTL test
    2009-12-03T11:50:08.818+01:00 - Adding a value in cache for TTL=1 in try= 1
    2009-12-03T11:50:09.818+01:00 - Adding a value in cache for TTL=1 in try= 2
    Exception in thread "main" (Wrapped: Failed request execution for testService service on Member(Id=1, Timestamp=2009-12-03 11:50:05.193, Address=172.16.44.32:8088, MachineId=36896, Location=process:11848, Role=TTLTestServerTTLTestServer)) java.lang.IllegalStateException: the node has already been removed
         at com.tangosol.util.Base.ensureRuntimeException(Base.java:293)
         at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.tagException(Grid.CDB:36)
         at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onContainsKeyRequest(DistributedCache.CDB:41)
         at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ContainsKeyRequest.run(DistributedCache.CDB:1)
         at com.tangosol.coherence.component.net.message.requestMessage.DistributedCacheKeyRequest.onReceived(DistributedCacheKeyRequest.CDB:12)
         at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onMessage(Grid.CDB:9)
         at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onNotify(Grid.CDB:136)
         at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onNotify(DistributedCache.CDB:3)
         at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
         at java.lang.Thread.run(Thread.java:619)
    Caused by: java.lang.IllegalStateException: the node has already been removed
         at com.tangosol.util.AbstractSparseArray$Crawler.remove(AbstractSparseArray.java:1274)
         at com.tangosol.net.cache.OldCache.evict(OldCache.java:580)
         at com.tangosol.net.cache.OldCache.containsKey(OldCache.java:171)
         at com.tangosol.net.cache.ReadWriteBackingMap.containsKey(ReadWriteBackingMap.java:597)
         at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onContainsKeyRequest(DistributedCache.CDB:25)
         ... 7 more
    2009-12-03 11:50:10.834/6.281 Oracle Coherence GE 3.5.2/463p2 <D4> (thread=ShutdownHook, member=1): ShutdownHook: stopping cluster node
    2009-12-03 11:50:10.834/6.281 Oracle Coherence GE 3.5.2/463p2 <D5> (thread=Cluster, member=1): Service Cluster left the cluster
    Best regards
    Jan

  • Write-Behind Caching and Re-entrant Calls

    Support Team -
         The Coherence User Guide states that:
         "The CacheStore implementation must not call back into the hosting cache service. This includes OR/M solutions that may internally reference Coherence cache services. Note that calling into another cache service instance is allowed, though care should be taken to avoid deeply nested calls (as each call will "consume" a cache service thread and could result in deadlock if a cache service threadpool is exhausted)."
         I have load-tested a use case in which I have two caches: ABCache and BACache. ABCache is accessed by the application for write operations; BACache is accessed by the application for read operations. ABCache is a write-behind cache whose CacheStore populates BACache by reversing the key and value of each cache entry stored in ABCache.
         The solution worked under load with no issues.
         But can I use it? Or is it too dangerous?
         My write-behind thread-count setting is left at default (0). The documentation states that
         "If zero, all relevant tasks are performed on the service thread."
         What does this mean? Can I re-enter the caching service if my thread-count is zero?
         Thank you,
         Denis.

    Dimitri -
         I am not sure I fully understand your answer:
         1. "Your test worked because the write-behind backing map invokes CacheStore methods asynchronously, on a write-behind thread." In my configuration, I have the default value for thread-count, which is zero. According to the documentation, that means that CacheStore methods would be executed by the service thread and not by the write-behind thread. Do I understand this correctly?
         2. "It will fail if a CacheStore method needs to be invoked synchronously on a service thread." I am not sure what the purpose of the "service thread" is. In which scenarios would a CacheStore method need to be invoked synchronously on a service thread?
         Thank you,
         Denis.

  • Thread pool configuration for write-behind cache store operation?

    Hi,
    Does Coherence have a thread pool configuration for the Coherence CacheStore operation?
    Or the CacheStore implementation needs to do that?
    We're using write-behind and want to use multiple threads to speed up the store operation (storeAll()...)
    Thanks in advance for your help.

    user621063 wrote:
    Hi,
    Does Coherence have a thread pool configuration for the Coherence CacheStore operation?
    Or the CacheStore implementation needs to do that?
    We're using write-behind and want to use multiple threads to speed up the store operation (storeAll()...)
    Thanks in advance for your help.
    Hi,
    Read-through and write-through operations are carried out on the worker thread (so if you configured a thread pool for the service, the same thread pool is used for the cache store operation).
    For write-behind and read-ahead operations, there is a single dedicated thread per cache, in addition to whatever thread pool is configured, except for remove operations, which are synchronous and still carried out on the worker thread (see above).
    All of the above is, of course, per storage node.
    Best regards,
    Robert
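
    Since write-behind uses a single dedicated thread per cache, any further parallelism would have to live inside the CacheStore itself. The following is a minimal sketch of that idea in plain Java, not a Coherence API: ParallelBatchWriter and the writeOne callback are hypothetical names, and the class is assumed to be called from your own storeAll(Map) implementation. It assumes that entries in one batch can be written independently and in any order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

/** Hypothetical helper that fans a storeAll batch out over a thread pool. */
public class ParallelBatchWriter {
    private final ExecutorService pool;
    // Assumed callback that writes a single key/value pair to the database.
    private final BiConsumer<Object, Object> writeOne;

    public ParallelBatchWriter(int threads, BiConsumer<Object, Object> writeOne) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.writeOne = writeOne;
    }

    /** Writes every entry of the batch and returns only when all writes are done. */
    public void storeAll(Map<?, ?> entries) {
        List<Future<?>> futures = new ArrayList<>();
        for (Map.Entry<?, ?> e : entries.entrySet()) {
            futures.add(pool.submit(() -> writeOne.accept(e.getKey(), e.getValue())));
        }
        for (Future<?> f : futures) {
            try {
                f.get(); // block so the caller sees failures synchronously
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(ex);
            } catch (ExecutionException ex) {
                // Surface the first failed write; other writes may still be running.
                throw new RuntimeException(ex.getCause());
            }
        }
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

    Blocking until every write has finished keeps the call synchronous from the caller's point of view, so a failure is still reported to whoever invoked storeAll. The pool size should stay within what the database connection pool can serve.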

  • Write behind exception and recovery

    Hi all,
    I am working on the write-behind part of an equity trading system. I know that a cache store operation will eventually be thrown away if the number of retries exceeds write-requeue-threshold. However, this is not acceptable, as the DB must be in sync with the caches at least at day end. For some of the more complicated caches we use a cache store implementation, and Hibernate for the simple caches. I am thinking of capturing the SQL statements that failed during the day and, at day end, manually fixing the issues (e.g. DB issues or others) and then executing them.
    Questions:
    1. Is this a good approach for handling the scenario? If yes, is there any way I can capture the statements and write them to a file for running in SQL*Plus, for example in the case of Hibernate?
    2. Is there any out-of-the-box mechanism in Coherence for recovering write-behind queues in case of a WHOLE cluster failure (not a node failure)?
    Henry

    922963 wrote:
    Hi all,
    I am working on the write-behind part of an equity trading system. I know that a cache store operation will eventually be thrown away if the number of retries exceeds write-requeue-threshold. However, this is not acceptable, as the DB must be in sync with the caches at least at day end. For some of the more complicated caches we use a cache store implementation, and Hibernate for the simple caches. I am thinking of capturing the SQL statements that failed during the day and, at day end, manually fixing the issues (e.g. DB issues or others) and then executing them.
    Questions:
    1. Is this a good approach for handling the scenario? If yes, is there any way I can capture the statements and write them to a file for running in SQL*Plus, for example in the case of Hibernate?
    Hi Henry,
    There are a few caveats you need to care about but in general it is possible.
    Not necessarily SQLs but serialized entries would probably be simpler to work with when you try to restore them.
    Also, you have to be aware that Coherence may fail to write an entry to the DB, but on retry it may write a newer version of that entry. If that succeeds, you have to be able to figure out that the earlier failed write must not be re-executed.
    In effect, you should have per-entry versioning in the database, and you should check the version of the entity in the database when writing, both from the cache store and from your end-of-day retry logic.
    2. Is there any out-of-the-box mechanism in Coherence for recovering write-behind queues in case of a WHOLE cluster failure (not a node failure)?
    No, nothing like that comes out of the box: if you lose a partition, you lose your write-behind-enqueued entries, too. You could log your failed writes to disk, though, as you indicated above.
    Best regards,
    Robert
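
    The per-entry versioning suggested above could look roughly like the sketch below. It is plain Java with an in-memory map standing in for the database table; VersionedReplay, Versioned and writeIfNewer are hypothetical names, and the version is assumed to be a number that the application increases on every update of an entry. In a real database the same check would typically be a conditional update on a version column.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of version-checked writes, shared by the store and the replay job. */
public class VersionedReplay {

    /** A payload plus a per-entry version that grows with every update. */
    public static final class Versioned {
        public final long version;
        public final String payload;

        public Versioned(long version, String payload) {
            this.version = version;
            this.payload = payload;
        }
    }

    // Stand-in for the database table (an assumption made for this sketch).
    private final Map<String, Versioned> database = new HashMap<>();

    /**
     * Writes the candidate only if the stored row is missing or older.
     * Returns true if the write was applied, false if it was stale and skipped.
     */
    public synchronized boolean writeIfNewer(String key, Versioned candidate) {
        Versioned current = database.get(key);
        if (current != null && current.version >= candidate.version) {
            return false; // a newer (or the same) version already reached the DB
        }
        database.put(key, candidate);
        return true;
    }

    public synchronized Versioned read(String key) {
        return database.get(key);
    }
}
```

    If both the cache store and the end-of-day replay go through the same check, a failed write that was logged earlier in the day is skipped whenever a later version of that entry has already been stored.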
