Write behind exception and recovery

Hi all,
I am working on the write-behind part of an equity trading system. I know that a cache store operation will eventually be thrown away if the number of retries exceeds write-requeue-threshold. However, this is not acceptable, as the DB must be in sync with the caches at least at day end. For some of the more complicated caches we use a cache store implementation, and Hibernate for the simple caches. I am thinking of capturing the SQL statements that failed during the day and then, at day end, manually fixing the issues (e.g. a DB issue or others) and having them executed.
Questions:
1. Is this a good approach for handling the scenario? If yes, is there any way I can capture the statements and write them to a file for running in SQL*Plus, for example in the case of Hibernate?
2. Is there any out of box mechanism in Coherence for recovering write-behind queues in case of WHOLE cluster fail (not node fail).
Henry

922963 wrote:
1. Is this a good approach for handling the scenario? If yes, is there any way I can capture the statements and write them to a file for running in SQL*Plus, for example in the case of Hibernate?
Hi Henry,
There are a few caveats you need to care about but in general it is possible.
Not necessarily SQLs but serialized entries would probably be simpler to work with when you try to restore them.
Also, you have to be aware that Coherence may fail to write an entry to the DB, and by the time it retries it may be writing a newer value for that entry. If that write succeeds, you have to be able to figure out that the earlier failed write must not be re-executed.
In effect, you should have per-entry versioning in the database and you should check versions of the entity in the database upon writing both from the cache store and also from your end-of-day retry logic.
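The version check described above could look roughly like the following. This is only a sketch: `VersionedTable` is a hypothetical in-memory stand-in for a database table with a version column, and the names `writeIfNewer` and `Row` are invented for illustration. In a real database the same rule would typically be expressed as a conditional update on the version column.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a table with a per-row version column. */
class VersionedTable {
    /** A stored row: the payload plus the version that produced it. */
    static final class Row {
        final long version;
        final String payload;

        Row(long version, String payload) {
            this.version = version;
            this.payload = payload;
        }
    }

    private final Map<String, Row> rows = new HashMap<>();

    /**
     * Applies the write only if it is newer than what is already stored.
     * Returns true if the row was written, false if the write was stale.
     */
    synchronized boolean writeIfNewer(String key, long version, String payload) {
        Row current = rows.get(key);
        if (current != null && current.version >= version) {
            return false; // a later (or identical) write already reached the table
        }
        rows.put(key, new Row(version, payload));
        return true;
    }

    synchronized Row read(String key) {
        return rows.get(key);
    }
}
```

Both the cache store and the end-of-day replay would go through the same check, so replaying a failed write after a newer one has succeeded becomes a no-op.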
2. Is there any out of box mechanism in Coherence for recovering write-behind queues in case of WHOLE cluster fail (not node fail).
No, nothing like that comes out of the box. If you lose a partition, you lose your write-behind-enqueued entries, too. You could log your failed writes to disk, though, as you indicated above.
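A minimal sketch of logging failed writes to disk as serialized entries rather than SQL, assuming keys and values are `java.io.Serializable`. The class `FailedWriteLog` and its record layout (a length prefix followed by key, value and version) are invented for illustration and are not part of Coherence.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/** Hypothetical append-only log of entries whose database write failed. */
class FailedWriteLog {
    private final Path file;

    FailedWriteLog(Path file) {
        this.file = file;
    }

    /** Appends one failed entry as a length-prefixed serialized record. */
    synchronized void append(Serializable key, Serializable value, long version) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
                oos.writeObject(key);
                oos.writeObject(value);
                oos.writeLong(version);
            }
            byte[] record = buf.toByteArray();
            try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
                    Files.newOutputStream(file, StandardOpenOption.CREATE, StandardOpenOption.APPEND)))) {
                out.writeInt(record.length);
                out.write(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back all records as {key, value, version} in the order they were logged. */
    synchronized List<Object[]> readAll() {
        List<Object[]> result = new ArrayList<>();
        if (!Files.exists(file)) {
            return result;
        }
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                int len;
                try {
                    len = in.readInt();
                } catch (EOFException endOfLog) {
                    break;
                }
                byte[] record = new byte[len];
                in.readFully(record);
                try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(record))) {
                    Object key = ois.readObject();
                    Object value = ois.readObject();
                    long version = ois.readLong();
                    result.add(new Object[] {key, value, version});
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }
}
```

Each record carries its own serialization header, so appending across restarts does not corrupt earlier records. The stored version is what the end-of-day replay would compare against the database before re-executing anything.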
Best regards,
Robert

Similar Messages

Write-Behind Caching and Limited Internal Cache Size

Let's say I have a write-behind cache and configure its internal cache to be of a fixed limited size, e.g. 10000 units. What would happen if more than 10000 units are added to the write-behind cache within the write-delay period? Would my CacheStore's storeAll() get all of the added values or would some of the values be missed because of the internal cache size limitation?

     > Hi Denis,
     >
     > If an entry is removed while it is still in the
     > write-behind queue, it will be removed from the queue
     > and CacheStore.store(oKey, oValue) will be invoked
     > immediately.
     >
     > Regards,
     > Dimitri
     Dimitri,
     Just to confirm that I understand it right: if there is a queued update to a key which is then remove()-ed from the cache, then the following happens:
     First CacheStore.store(key, queuedUpdateValue) is invoked.
     Afterwards CacheStore.erase(key) is invoked.
     Both synchronously to the remove() call.
     I expected only erase will be invoked.
     BR,
     Robert

Write-Behind, Expiration, and SQL Exceptions.

Hi Chaps,
If a cache with write-behind enabled has problems writing to the DB I understand that Coherence will re-queue the objects and write them when the DB is available.
The problem I have is that (after a DB failure) I don't see them being written - I can see these items in the cache but not in the DB, even several hours after the outage. (Items that were added to the cache after the outage are being written).
Is there anything the cachestore methods (specifically store()) need to do with regard to exceptions to ensure that these items are re-queued?
Next question is: I was also wondering how is this managed with regards to expiry?
We have our own expiry routine which removes items from the cache that are older than 24 hours (this was from before we could expire objects by specifying the timeout in the put() method call, which I am intending to switch to).
If an item has not been written to the DB due to an outage and is then expired (by our own routine or by Coherence), is it then lost forever, or will it remain in the queue? (Seeing as the queue holds references I am guessing not, but thought I'd check.)
Thanks,
Randal.

Jon,
I have a question related to this... If you remember, a few weeks back I stumbled upon the problem that the "version-persistent" map for the versioned-backing-map-scheme does not accept putAll operations. The workaround, until you guys implement it, was to override the storeAll method of the cacheStore and throw an unsupported operation exception (to force individual puts).
Well, although this workaround works, I am getting tons and tons of:
2006-04-06 17:18:27.347 Tangosol Coherence 3.1/339 <Warning> (thread=WriteBehindThread:MyCacheStore, member=1): The CacheStore "MyCacheStore@46b9979b" does not support storeAll().
2006-04-06 17:18:27.348 Tangosol Coherence 3.1/339 <Error> (thread=WriteBehindThread:MyCacheStore, member=1): Failed to store keys="[16, 18, 21, 26, 5, 13, 14, 25, 17, 15, 23, 19, 2, 6, 9, 7]":
java.lang.UnsupportedOperationException
at ...MyCacheStore.storeAll(MyCacheStore.java:126)
at com.tangosol.net.cache.ReadWriteBackingMap$CacheStoreWrapper.storeAll(ReadWriteBackingMap.java:3820)
at com.tangosol.net.cache.ReadWriteBackingMap$WriteThread.run(ReadWriteBackingMap.java:3538)
at com.tangosol.util.Daemon$1.run(Daemon.java:63)
2006-04-06 17:18:27.349 Tangosol Coherence 3.1/339 <Warning> (thread=WriteBehindThread:MyCacheStore, member=1): Requeued store for key="16"
2006-04-06 17:18:27.349 Tangosol Coherence 3.1/339 <Warning> (thread=WriteBehindThread:MyCacheStore, member=1): Requeued store for key="18"
2006-04-06 17:18:27.350 Tangosol Coherence 3.1/339 <Warning> (thread=WriteBehindThread:MyCacheStore, member=1): Requeued store for key="21"
2006-04-06 17:18:27.351 Tangosol Coherence 3.1/339 <Warning> (thread=WriteBehindThread:MyCacheStore, member=1): Requeued store for key="26"
The first UnsupportedOperationException is expected, but I'm not sure what the requeued warnings are all about. These are not failures to the DB... it is something else. (Mind you that this happens when trying to load a lot of data into the map.)
1- Is this requeuing related or the same as in failed DB stores?
2- Is it possible to "lose" stores if I don't configure the write-requeue-threshold with very, very high values? I must ensure I don't lose anything.
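One alternative to throwing from storeAll() is to have it loop over the entries and call store() for each one, which still forces individual writes but gives the write-behind thread nothing to fail on. The sketch below assumes that this would be acceptable for the version-persistent map workaround, which the thread does not confirm. `SimpleStore` is a hypothetical minimal interface used here instead of the real Coherence CacheStore interface so that the example is self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical minimal interface standing in for the store half of a cache store. */
interface SimpleStore<K, V> {
    void store(K key, V value);

    void storeAll(Map<K, V> entries);
}

/** Store whose bulk operation delegates to individual writes instead of throwing. */
class PerEntryStore implements SimpleStore<Integer, String> {
    /** Stand-in for the backing persistence; records every individual write. */
    final Map<Integer, String> written = new LinkedHashMap<>();

    @Override
    public void store(Integer key, String value) {
        written.put(key, value); // one individual write per entry
    }

    @Override
    public void storeAll(Map<Integer, String> entries) {
        // No bulk write: each entry goes through store() on its own.
        for (Map.Entry<Integer, String> e : entries.entrySet()) {
            store(e.getKey(), e.getValue());
        }
    }
}
```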
In a related note, in some circumstances, I need to ensure that the "write queue" is flushed or cleared. For example, I may want to force a flush of all pending stores (and wait/block until that's done).
I have looked into it and I don't seem to know how to do it. I can read the write-queue length, but I believe that this is not very accurate, since my tests seem to indicate that the write-behind thread may take the entries to store off the write-queue and then deal with them in parallel (which means that there are still entries pending although the write-queue size is 0). Also, there are some calls from the cache store that, at first, seem to give some access to the write thread (potentially allowing me to contact the thread to tell it to flush or discard any pending stores), but I believe that all of the functions are protected. There may be other ways, though.
I guess my second batch of questions are:
1- How can I effectively force a flush (or clear) of the pending stores. Such that there is no single store pending in any queue (visible or invisible to the programmer).
2- What is the role of re-queuing in these situations? Where is the queue sitting: the thread? The cache store? Who is responsible for retrying, and when? I would like to flush those entries too.
A quick explanation of the operation of the write thread would also be very appreciated.
Thanks!
Josep M.

Write-Behind Caching and Re-entrant Calls

Support Team -
     The Coherence User Guide states that:
     "The CacheStore implementation must not call back into the hosting cache service. This includes OR/M solutions that may internally reference Coherence cache services. Note that calling into another cache service instance is allowed, though care should be taken to avoid deeply nested calls (as each call will "consume" a cache service thread and could result in deadlock if a cache service threadpool is exhausted)."
     I have Load-tested a use case wherein I have two caches: ABCache and BACache. ABCache is accessed by the application for write operation, BACache is accessed by the application for read operation. ABCache is a write-behind cache whose CacheStore populates BACache by reversing key and value of each cache entry stored in the ABCache.
     The solution worked under load with no issues.
     But can I use it? Or is it too dangerous?
     My write-behind thread-count setting is left at default (0). The documentation states that
     "If zero, all relevant tasks are performed on the service thread."
     What does this mean? Can I re-enter the caching service if my thread-count is zero?
     Thank you,
     Denis.

Dimitri -
     I am not sure I fully understand your answer:
     1. "Your test worked because write-behind backing map invokes CacheStore methods asynchronously, on a write-behind thread." In my configuration, I have the default value for thread-count, which is zero. According to the documentation, that means that CacheStore methods would be executed by the service thread and not by the write-behind thread. Do I understand this correctly?
     2. "It will fail if CacheStore method will need to be invoked synchronously on a service thread." I am not sure what the purpose of the "service thread" is. In which scenarios will the "CacheStore method need to be invoked synchronously on a service thread"?
     Thank you,
     Denis.

Write-Behind Caching and Old Values

Is there a way to access the old value cached in the write-behind cache for the same key from the CacheStore's store() or storeAll() method?

     > I have a business POJO with three parts: partA,
     > partB, partC inside. Each of these three parts is
     > persisted by a separate SQL. So, every time I persist
     > my POJO, up to 3 SQLs may be executed.
     I understand.
     > When a change happens in my POJO, it goes onto the
     > write-behind queue. In my CacheStore.store() or
     > CacheStore.storeAll() I would like to be able to make
     > an intelligent decision about which of the three
     > parts: partA, partB or partC has actually changed and
     > only run the SQL updates for the changed parts. This
     > would allow me to avoid massive amounts of
     > unnecessary SQL updates for the parts that did not
     > change.
     Right. Keep in mind that there are two conditions that you must be aware of:
     1) Multiple updates could have occurred to the object, meaning that the database update would have to "roll up" the results of multiple changes to the object.
     2) Some or all of the updates could have already occurred to the database. This may be a little trickier to understand, but it reflects the possible machine failure conditions that occurred while a write-behind was in progress.
     Although the latter are unlikely, they should be accounted for, and of course they are harder to test for with certainty. As a result, the updates to the information (the CacheStore implementation) must be built in an "idempotent" manner, i.e. allowing it to be executed more than once with no additional side-effects.
     > If I had access to the POJO stored under the same key
     > before the new value was put in cache, I could use
     > equals() on each of the three parts to find out
     > exactly which one of them changed.
     While this is true, you would need to compare the "known previous database state" version, not just the "old" version.
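As a rough illustration of comparing against the known previous database state, the sketch below keeps the last value known to have been committed for each key and reports which parts differ from it. `PartDiff`, `Pojo` and the method names are hypothetical. Because the snapshot lives in memory, it is lost on failure, in which case every part is reported as changed, which matches the "could be dirty" caveat.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical helper that diffs an object against its last persisted state. */
class PartDiff {
    /** Hypothetical three-part business object. */
    static final class Pojo {
        final String partA;
        final String partB;
        final String partC;

        Pojo(String partA, String partB, String partC) {
            this.partA = partA;
            this.partB = partB;
            this.partC = partC;
        }
    }

    /** Last state known to have been committed to the database, per key. */
    private final Map<String, Pojo> lastPersisted = new HashMap<>();

    /** Returns the names of the parts that differ from the last persisted state. */
    synchronized Set<String> changedParts(String key, Pojo current) {
        Pojo known = lastPersisted.get(key);
        Set<String> changed = new LinkedHashSet<>();
        // With no known database state, every part must be treated as dirty.
        if (known == null || !Objects.equals(known.partA, current.partA)) changed.add("partA");
        if (known == null || !Objects.equals(known.partB, current.partB)) changed.add("partB");
        if (known == null || !Objects.equals(known.partC, current.partC)) changed.add("partC");
        return changed;
    }

    /** Call only after the database write for this key has committed. */
    synchronized void markPersisted(String key, Pojo current) {
        lastPersisted.put(key, current);
    }
}
```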
     > Of course, if this functionality is not available, I
     > would have to create dirty flags for each of the
     > three POJO parts. But I can't really clear my POJO's
     > flags and recache the POJO from within the store() or
     > storeAll(), right?
     Yes, but remember that those flags are "could be dirty" flags, because of the above failure modes that I described.
     Peace,
     Cameron Purdy
     Tangosol Coherence: The Java Data Grid

Write-Behind Caching and Multiple Puts

What happens when two consecutive puts are performed on the write-behind cache for the same key? Will CacheStore's store() or storeAll() be invoked once for every put() or only once for the last put() (the one which overrode the previous cached values)?

Hi Denis,
     If you use write-behind, there will be no unnecessary database updates - only the last put() will result in a database update.
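The coalescing effect can be pictured with a toy model. This is a simplified illustration only, not how Coherence implements its write-behind queue, and `CoalescingQueue` is a hypothetical name: pending writes are keyed, so a second put() for the same key replaces the pending value instead of adding a second write.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a write-behind queue that keeps one pending value per key. */
class CoalescingQueue<K, V> {
    private final Map<K, V> pending = new LinkedHashMap<>();

    /** Queues a write; a later put for the same key overwrites the pending value. */
    synchronized void put(K key, V value) {
        pending.put(key, value);
    }

    /** Removes and returns everything that would be passed to the store in one batch. */
    synchronized Map<K, V> drain() {
        Map<K, V> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }
}
```

Two puts for one key followed by a drain yield a single entry holding the second value.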
     Regards,
     Dimitri

Can a db slowdown with write-behind cause a slowdown in cache operations?

If we have a coherence cluster, and one cache configured with write-behind is having trouble writing to the db (ie, it's slow), and we keep adding objects to the cache that exceed the ability of the db to consume them; will flow-control kick in and cause the writes to the cache to block/slow-down? Ie, the classic producer-consumer problem, where we are adding objects to the cache, faster than the cachestore can consume them.
What happens in this case? Will flow-control kick in and block writes to the cache? Will an internal buffer just keep growing? Are there any knobs to tweak this behavior (eg, in the case of spikes, where temporarily the producer is producing faster than the consumer can consume for a brief period of time, but then things go back to normal)?

user9222505 wrote:
I believe we discovered that the same thread pool is used for all requests to the cache, including gets, puts and calls into the cachestore. So if the writes are slow within the cachestore, then it uses up all of the threads and slows everything down.
Hi,
This is not really correct.
If a cache in a service is configured to use write-behind then a separate thread for that service is started, which deals with write-behind store and storeAll operations.
The remove operations need to be handled synchronously to avoid corruption of the data-set in the scenario of reading an entry from the cache immediately after removing it (if it were not synchronously deleted from the backing storage, then reading it back could give an incorrect non-null value). Therefore remove operations are handled synchronously on the service / worker thread, and not delayed on the write-behind thread.
Gets are also synchronously handled, so they again are served on the service / worker thread.
So if the puts are slow and wait too much, that may delay other puts but should not contend with other threads. If the puts are computation intensive, then obviously they hinder other threads because of consumption of the same CPU resource, and not simply because they execute.
Best regards,
Robert

Write-behind max speed?

Hi,
We are trying to test the speed of the write behind mechanism and we would be interested to know how other coherence users handle, for example, writing 1 million rows into the database.
At the moment, using JDBC batch inserts we can write approximately 30000 rows per minute, which means it would take about 30 minutes to save 1 million rows. Are there any other methods that other Coherence users use that can improve on this?
Many thanks,

user738616 wrote:
Hi,
This has nothing to do with Coherence as the implementation of CacheStore is outside of Coherence. Apart from JDBC Batch, you should try using PLSQL Bulk binds for such numbers.
Hope this helps!
Cheers,
NJ
Hi NJ,
we actually measured PLSQL bulk binds against plain SQL (both with JDBC)... for anything which can be translated to plain inserts/updates, plain SQL is way faster (more than 10x).
You can only win with bulk binds when that statement which you send down actually does more complex logic and multiple statements so you actually win with optimizing away the roundtrips, too.
Best regards,
Robert

Read-through/write-behind and queued deletes (and updates)

Hi,
If I am changing the state of objects in a cache and using the write-behind and read-through mechanism, what happens when I have deleted or updated an object in the cache but the change has not yet been committed to the database?
If I delete an object in the cache and the delete DB operation is being queued, and during this time I try to perform a get against the key for the object, is the value read through from the database or is it ignored since the database delete is pending?
For updates I presume that the value in the cache will be used - as the value exists in the cache and a read-through from the database will not be triggered.
Can you clarify the behavior of Coherence under these circumstances, particularly that of a pending delete.
Thanks,
Dave

Hi Dave,
If I am changing the state of objects in a cache and
using the write-behind and read-through mechanism
what happens when I have deleted or updated an object
in the cache but the change has not yet been
committed to the database?.
If I delete and object in the cache and the delete DB
operation is being queued and during this time try
and perform a get against the key for the object is
the value read through from the database or is it
ignored since the database delete is pending?
I seem to remember a forum post mentioning that removes from a write-behind cache are performed synchronously (they are done as part of the backingMap.remove(key) operation), even if there were pending updates in the write-behind queue. If I remember correctly, then the above-mentioned problem cannot happen.
For updates I presume that the value in the cache
will be used - as the value exists in the cache and a
read-through from the database will not be
triggered.
Exactly.
Best regards,
Robert

Rescue and Recovery not recognizing USB DVD writer on X200

I have a SAMSUNG SE-S084B USB CD/DVDRW that reads/burns DVD-R ok in ImgBurn. I already have backup made on the hard drive by RR.
I tried to use the Rescue and Recovery Advanced mode -> menu-Advanced -> "Copy backup from hard drives" to make a set of backups on DVD-Rs. However, RR doesn't recognize my USB DVD writer, and therefore I cannot proceed with the backup.
Please advise how to make backups on a DVD for Thinkpad X200. Thanks a lot!

I got it!
You just need to be patient. It can take up to two hours.
I wasted about 30 minutes in the fine quality customer support of inner-city Atlanta just to be told it takes that long. 30 minutes.
I am extremely unimpressed with Atlanta customer service, and the rep's advice after 25 minutes was that he was going to send me a couple more recovery disks! He had no idea about anything I was talking about. I complained to the manager who finally told me to just wait. However when I complained to him he thought it was a joke.
Customer service in Lenovo, especially their Atlanta location is a farce. You're welcome for me having to sit through that.

Write behind cache, DB down, when should the system stop taking new data in

Hello:
We are trying to use Coherence for our custom ESB, which is brokering payloads of various size between consumer and provider applications.
Before Coherence, stopping our DB meant organization-wide outage for critically important business services.
Since we have at least 40G of RAM in production environment, we believe that our app
can use Coherence write-behind option for tolerating at least several hours worth of DB outage.
We are currently using a near cache backed by distributed cache in write-behind mode.
9 business service JVMs (storage enabled=false) use 30 storage enabled JVMs.
IMPORTANT: We need to create an automated alerting facility determining when
amount of unsaved data reaches critical level since DB goes down. This alert should help us decide when our application stops accepting inbound traffic.
It is hard to use QueueSize parameter for that because our payload memory footprint can vary from 1KB to 3MB.
We do not expire any entries in order to enable support queries against the cache during DB outage.
Our experiments with trying various flavors of overflow-scheme resulted in OutOfMemoryError, therefore
we decided to implement RAM-only cache as a first step.
<near-scheme>
  <scheme-name>message_payload_scheme</scheme-name>
  <front-scheme>
    <local-scheme>
      <scheme-ref>limited_entities_front_scheme</scheme-ref>
      <high-units>100</high-units>
    </local-scheme>
  </front-scheme>
  <back-scheme>
    <distributed-scheme>
      <backing-map-scheme>
        <read-write-backing-map-scheme>
          <internal-cache-scheme>
            <local-scheme>
              <scheme-ref>limited_bytes_scheme</scheme-ref>
              <high-units>199229440</high-units>
            </local-scheme>
          </internal-cache-scheme>
          <cachestore-scheme>
            <class-scheme>
              <class-name>com.comp.MessagePayloadStore</class-name>
            </class-scheme>
          </cachestore-scheme>
          <read-only>false</read-only>
          <write-delay-seconds>3</write-delay-seconds>
          <write-requeue-threshold>2147483646</write-requeue-threshold>
        </read-write-backing-map-scheme>
      </backing-map-scheme>
      <autostart>true</autostart>
    </distributed-scheme>
  </back-scheme>
</near-scheme>
<local-scheme>
  <scheme-name>limited_entities_front_scheme</scheme-name>
  <eviction-policy>LRU</eviction-policy>
  <unit-calculator>FIXED</unit-calculator>
</local-scheme>
<local-scheme>
  <scheme-name>limited_bytes_scheme</scheme-name>
  <eviction-policy>HYBRID</eviction-policy>
  <unit-calculator>BINARY</unit-calculator>
</local-scheme>

Good info ... I feel like I need to restate my original question along with a couple of new questions caused by the discussion above.
Q1. Does Coherence evict 'dirty', or 'queued', or 'unsaved' objects for cache configuration provided above?
The answer should be 'NO', otherwise Coherence is unsafe to use as a system of record,
it should not just drop unsaved information on the floor.
Q2. What happens to the front tier of the near+partitioned write behind cache described above when amount of unsaved data exceeds max cache capacity defined via high-units?
I would expect that map.put starts throwing exceptions: cache storage is full, so it should not accept more data
Q3. How can I determine the moment when the amount of dirty data in bytes(!), not in objects, hits 85% of
the max allowed cache capacity configured in bytes (using the high-units param and BINARY calculator)?
'DirtyUnits' counter can probably be built with some lower-level Coherence API. Can we use
this API?
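One way to picture such a 'DirtyUnits' counter is the hand-rolled sketch below. It is not a Coherence API: `DirtyBytesTracker` and its hooks are hypothetical, and it assumes the application can call `onPut` with the serialized size when it writes an entry and `onStored` from the cache store once the database write has succeeded. Where those hooks would actually live, and whether the limit is tracked per storage node or cluster-wide, is left open.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tracker of bytes that have been put but not yet stored to the DB. */
class DirtyBytesTracker {
    private final long capacityBytes;
    private final double alertRatio;
    private final Map<String, Integer> dirtySizes = new HashMap<>();
    private long dirtyBytes;

    DirtyBytesTracker(long capacityBytes, double alertRatio) {
        this.capacityBytes = capacityBytes;
        this.alertRatio = alertRatio;
    }

    /** Records that a value of the given serialized size is waiting to be stored. */
    synchronized void onPut(String key, int sizeBytes) {
        Integer previous = dirtySizes.put(key, sizeBytes);
        // A repeated put replaces the pending value, so only the difference counts.
        dirtyBytes += sizeBytes - (previous == null ? 0 : previous);
    }

    /** Records that the pending value for the key reached the database. */
    synchronized void onStored(String key) {
        Integer previous = dirtySizes.remove(key);
        if (previous != null) {
            dirtyBytes -= previous;
        }
    }

    synchronized long dirtyBytes() {
        return dirtyBytes;
    }

    /** True once unsaved data reaches the configured share of capacity. */
    synchronized boolean shouldAlert() {
        return dirtyBytes >= capacityBytes * alertRatio;
    }
}
```

With the configuration above, the tracker would be created with a capacity of 199229440 and a ratio of 0.85. A put that arrives between the start of a store and the `onStored` call would be undercounted by this simple version.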
Please, understand, that we purchased Coherence for reliability, for making our
system independent from short DB outages, for keeping our business services up
and running when DBA need some time for admin operations like rebuilding an index.
Performance benefits are secondary and are not as obvious for our system which
uses primary keys only and has a well-tuned co-located Oracle back-end.
We simply cannot put Coherence to production unless we prove that Coherence
can reliably hold the data and give us information about approaching crisis
(the cache full of unsaved data).
If possible, forward this message to Cameron Purdy,
who was presenting Coherence to our team several months ago.
Thanks,
Vasili Smaliak
Applications Architect, Enterprise App Integration
GMAC ResCap
[email protected]

Rescue and Recovery 4 DOES NOT RESCUE/COPY LARGE FILES?!

Hello all,
Thank you in advance for helping me out/any helpful advice you may have.
System:
Windows XP Professional
T61
4 GB RAM
2.50 Ghz Dual Core Intel Processor
120 GB Hard Drive
140m nVidia Quadro video card (256 MB available memory I believe)
Problem Summary:
I ran into the unlucky computer problem of having a Microsoft.net framework addon for Firefox 3.5.3 result in a fatal interaction leaving me unable to boot up in both safe and normal mode. If you do have this addon, please uninstall it NOW, so that it does not happen to you. There is no current fix or patch for it as far as I know. Luckily, I have the (formerly) fantastic Rescue and Recovery 4 to backup my files and fix things.
So in sequence of what I did:
Tried several different attempts to fix problem to no avail (via google searches on the Rescue and Recovery Browser)
Decided that it would be best to just reformat the drive and start a clean install.
I had not done a real back up recently (several months due to my own carelessness and being busy).
I chose to do a full factory restore using the 'Rescue My Files' option, since I had never done a backup using the included Lenovo Software, I usually just ran my own manual ones in Windows.
To be on the safe side, I just decided to transfer my entire hard drive directory 'C:' to my 500 GB External Western Digital Hard Drive. I know I didn't really NEED all the files (like program files I would just need to reinstall anyway). I just wanted to return my computer to how it was as fast as possible.
Rescue and Recovery registered as copying all my data (roughly 50-60 GB) to my external USB Hard Drive.
I then unplugged my External USB Drive From the computer once it said it was complete. Rescue and Recovery proceeds with the factory restore.
Things are going well, factory restore is successful, no problems yet.
Once Windows was functional again and I had remade my usernames and settings just as it was before the crash, I plugged in my External HD again to restore my files.
Every file that I copied was there EXCEPT FOR MY MOST IMPORTANT ONES. It appears the directory, 'C:\Documents and Settings\Justin\My Documents'
AND
'C:\Documents and Settings\Justin\Desktop'
DID NOT COPY. I DID A DOUBLE CHECK, and the program said that all files had transferred successfully and I saw the directories on the USB Hard Drive in the Rescue and Recovery View. (I did not do a double check on another windows computer with my USB Hard Drive, however as I did not have access to one at the time and I was trusting in the Lenovo software, which did actually manage to rescue all the files, except for my most important ones.)
Is there any way that:
That because these files were so large (probably account for 50% of all data copied to my external USB Drive or 10-15 GB a piece), that the Rescue and Recovery file simply "passed over them"? The other files appeared to copy perfectly fine.
There was an error with the program, like a file size limit cap that Rescue and Recovery cannot surpass so they were just skipped over?
My own carelessness? Are all the root files not selected when you choose the entire 'C:' Drive as a directory? I did just choose to copy/rescue the entire 'C:' Drive and everything else (besides my two most important files) seemed to copy over so I do not believe that this is the case.
I did not individually, select each file in the 'C:\Documents and Settings' because I assumed that it would select each of the root files within it. (Is that not the case?)
That the files are 'hiding' somewhere due to their large size?
Most importantly of all:
IS THERE ANYWAY THAT I CAN RECOVER MY DATA? There is about 2 years worth of photos, music, documents, etc. that I would like to have back that I do not have a recent back up of.
If anything, does anyone know WHY this happened? It was my understanding that Lenovo Rescue and Recovery was some of the best software for emergency crashes out there. And now it has failed me miserably. I'd at least like to know why it happened so that I may not repeat my mistake (or put too much faith in Lenovo) for the future.
I've run some data recovery software but I know it will not recover all the files that I once had in their state before my hard drive was reformatted back to factory settings.
Thank you,
-nofeet3
Justin

Hello , and ARGH !
I AM HAVING THE EXACT SAME PROBLEM RIGHT NOW !
If anybody could help this would really be appreciated .
Here is the story
Lenovo SL500
Windows XP SP3 I think
Specs maybe not too important
My computer has some corrupted file and will not boot in normal or safe mode . When I try to do either , it starts booting , but before it finishes the load-up-windows screen , it displays an error message which says
" lsass.exe - System Error
the endpoint format is invalid "
and then goes no further , and the machine simply restarts itself again and would do this indefinitely if I didn't stop it .
SO when it starts booting and shows the Lenovo Logo , I press the Lenovo Help/Tool Button to enter Rescue and Recovery [not sure what version it is , I cannot find information within it] and it gives me the option to Rescue files before recovering the system . I have the same problem as Nofeet3.
It rescues all the small files
BUT WILL NOT RESCUE ANY OF MY IMPORTANT FILES
CERTAINLY IT DOES NOT RESCUE ANYTHING BIG [or bigger than roughly 2mb]
It rescues picture files BUT WILL NOT RESCUE MUSIC FILES !!
It may not even rescue bigger picture files , but I haven't noticed any pictures it has not copied since I can't go through ALL my picture files to see exactly what it has / hasn't copied.
However , not a single music file is copied.
It will NOT show up on the hard drive . What is strange , is when I try to plug the hard drive back in to try and copy the music files again , as I tried it a few times , it asks me if I would like to Write these files over the files already there , as though it has already copied. I check the hard disk , and these files have definitely NOT copied onto the Disk . They are nowhere to be found .
I don't think the files are hiding , but if they are , maybe somebody could tell me where.
I suspect though that Rescue and Recovery has "recorded" the files as having been copied , some kind of record , EVEN THOUGH THEY HAVEN'T !! Or if they have , they are absolutely invisible . WEIRD !
nofeet3
It is NOT because you selected the whole C drive , and is not due to any carelessness on your part. I have selected all files individually , and have tried time and time and time again , when I plug the ext hard drive into my other laptop to see if the files copy , they don't . I have tried this over and over and over again desperately all today , it is not working , even going so far as to select individual files rather than bigger directories to try and transfer in smaller and smaller batches , just in case it couldn't handle large transfers all at once . That is not the case either . It just simply will not transfer files roughly bigger than 2mb . NO MUSIC !! THIS IS VERY FRUSTRATING !!
I am not sure at present whether this program can rightly be called "Rescue and Recovery". It is one of the main reasons I selected Lenovo , and this is not a good demonstration of its abilities at all .
PLEASE HELP ME !!
ANY HELP WOULD BE HIGHLY APPRECIATED !!
Thanks so much in advance for any help anybody can give , and best regards ,
Carl .

TTL specified in put operation doesn't always work when using write-behind

I'm using a distributed cache with a write-behind cache store (see the config below). I found that when I do something like myCache.put(key, value, ttl), the entry survives the specified ttl. I tried doing the same with a distributed cache with a write-through cachestore and there everything does happen correctly.
Is this sort of operation not permitted in caches containing a write-behind cachestore? If not, wouldn't it be better to throw an UnsupportedOperationException?
I created a small test to simulate this. I added values to the cache with a TTL of 1 to 10 seconds and found that the 10 second entries stayed in the cache.
Configuration used:
<?xml version="1.0"?>
<!DOCTYPE cache-config SYSTEM "cache-config.dtd">
<cache-config>
     <caching-scheme-mapping>
          <cache-mapping>
               <cache-name>TTL_TEST</cache-name>
               <scheme-name>testScheme</scheme-name>
          </cache-mapping>
     </caching-scheme-mapping>
     <caching-schemes>
          <distributed-scheme>
               <scheme-name>testScheme</scheme-name>
               <service-name>testService</service-name>
               <backing-map-scheme>
                    <read-write-backing-map-scheme>
                         <internal-cache-scheme>
                              <local-scheme>
                                   <service-name>testBackLocalService</service-name>
                              </local-scheme>
                         </internal-cache-scheme>
                         <cachestore-scheme>
                              <class-scheme>
                                   <scheme-name>testBackStore</scheme-name>
                                   <class-name>TTLTestServer$TestCacheStore</class-name>
                              </class-scheme>
                         </cachestore-scheme>
                         <write-delay>3s</write-delay>
                    </read-write-backing-map-scheme>
               </backing-map-scheme>
               <local-storage>true</local-storage>
               <autostart>true</autostart>
          </distributed-scheme>
     </caching-schemes>
</cache-config>
Code of test:
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.joda.time.DateTime;
import org.joda.time.Duration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.util.StopWatch;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;
import com.google.common.collect.Lists;
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.net.cache.CacheStore;
@Test
public class TTLTestServer
{
     private static final int RETRIES = 5;
     private static final Logger logger = LoggerFactory.getLogger( TTLTestServer.class );
     private NamedCache m_cache;
     /**
      * List of Time-To-Lives in seconds to check
      */
     private final List<Integer> m_listOfTTLs = Lists.newArrayList(1, 3, 5, 10);
     /**
      * Test is done in separate threads to speed up the test
      */
     private final ExecutorService m_executorService = Executors.newCachedThreadPool();
     @BeforeClass
     public void setup()
     {
          logger.info("Getting the cache");
          m_cache = CacheFactory.getCache("TTL_TEST");
     }
     public static class TestCacheStore implements CacheStore
     {
          public void erase(Object arg0)
          {}
          public void eraseAll(Collection arg0)
          {}
          public void store(Object arg0, Object arg1)
          {}
          public void storeAll(Map arg0)
          {}
          public Object load(Object arg0)
          {return null;}
          public Map loadAll(Collection arg0)
          {return null;}
     }
     public void testTTL() throws InterruptedException, ExecutionException
     {
          logger.info("Starting TTL test");
          List<Future<StopWatch>> futures = Lists.newArrayList();
          for (final Integer ttl : m_listOfTTLs)
          {
               // Callable is typed so that submit() returns a Future<StopWatch>
               futures.add(m_executorService.submit(new Callable<StopWatch>()
               {
                    public StopWatch call() throws Exception
                    {
                         StopWatch stopWatch= new StopWatch("TTL=" + ttl);
                         for (int retry = 0; retry < RETRIES; retry++)
                         {
                              logger.info("Adding a value in cache for TTL={} in try={}", ttl, retry+1);
                              stopWatch.start("Retry="+retry);
                              m_cache.put(ttl, null, ttl*1000);
                              waitUntilNotInCacheAnymore(ttl, retry);
                              stopWatch.stop();
                         }
                         return stopWatch;
                    }
                    private void waitUntilNotInCacheAnymore(final Integer ttl, final int currentTry) throws InterruptedException
                    {
                         DateTime startTime = new DateTime();
                         long maxMillisToWait = ttl*2*1000;     //wait max 2 times the time of the ttl
                         while(m_cache.containsKey(ttl) )
                         {
                              Duration timeTaken = new Duration(startTime, new DateTime());
                              if(timeTaken.getMillis() > maxMillisToWait)
                              {
                                   throw new RuntimeException("Already waiting " + timeTaken + " for ttl=" + ttl + " and retry=" + currentTry);
                              }
                              Thread.sleep(1000);
                         }
                    }
               }));
          }
          logger.info("Waiting until all futures are finished");
          m_executorService.shutdown();
          logger.info("Getting results from futures");
          for (Future<StopWatch> future : futures)
          {
               StopWatch sw = future.get();
               logger.info(sw.prettyPrint());
          }
     }
}
Failure message:
FAILED: testTTL
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Already waiting PT20.031S for ttl=10 and retry=0
     at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
     at java.util.concurrent.FutureTask.get(Unknown Source)
     at TTLTestServer.testTTL(TTLTestServer.java:159)
Caused by: java.lang.RuntimeException: Already waiting PT20.031S for ttl=10 and retry=0
     at TTLTestServer$1.waitUntilNotInCacheAnymore(TTLTestServer.java:139)
     at TTLTestServer$1.call(TTLTestServer.java:122)
     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
     at java.util.concurrent.FutureTask.run(Unknown Source)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
     at java.lang.Thread.run(Unknown Source)
I'm using Coherence 3.4.2.
Best regards
Jan

Hi, still no luck. However, I noticed that setting the write-delay value of the write-behind store to 0s or 1s solves the problem. It only starts to give me "the node has already been removed" exceptions once the write-delay value is 2s or higher.
You can find the coherence-cache-config.xml below:
<?xml version="1.0"?>
<!DOCTYPE cache-config SYSTEM "cache-config.dtd">
<cache-config>
     <caching-scheme-mapping>
          <cache-mapping>
               <cache-name>TTL_TEST</cache-name>
               <scheme-name>testScheme</scheme-name>
          </cache-mapping>
     </caching-scheme-mapping>
     <caching-schemes>
          <distributed-scheme>
               <scheme-name>testScheme</scheme-name>
               <service-name>testService</service-name>
               <backing-map-scheme>
                    <read-write-backing-map-scheme>
                         <internal-cache-scheme>
                              <local-scheme>
                                   <service-name>testBackLocalService</service-name>
                              </local-scheme>
                         </internal-cache-scheme>
                         <cachestore-scheme>
                              <class-scheme>
                                   <scheme-name>testBackStore</scheme-name>
                                   <class-name>TTLTestServer$TestCacheStore</class-name>
                              </class-scheme>
                         </cachestore-scheme>
                         <write-delay>2s</write-delay>
                    </read-write-backing-map-scheme>
               </backing-map-scheme>
               <local-storage>true</local-storage>
               <autostart>true</autostart>
          </distributed-scheme>
     </caching-schemes>
</cache-config>
You can find the test program below:
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.joda.time.DateTime;
import org.joda.time.Duration;
import org.springframework.util.StopWatch;
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.net.cache.CacheStore;
public class TTLTestServer
{
     private static final int RETRIES = 5;
     private NamedCache m_cache;
     /**
      * List of Time-To-Lives in seconds to check
      */
     private final List<Integer> m_listOfTTLs = new ArrayList<Integer>();
     /**
      * @param args
      * @throws Exception
      */
     public static void main( String[] args ) throws Exception
     {
          new TTLTestServer().test();
     }
     /**
      * Empty CacheStore
      * @author jbe
      */
     public static class TestCacheStore implements CacheStore
     {
          public void erase(Object arg0)
          {}
          @SuppressWarnings ( "unchecked" )
          public void eraseAll(Collection arg0)
          {}
          public void store(Object arg0, Object arg1)
          {}
          @SuppressWarnings ( "unchecked" )
          public void storeAll(Map arg0)
          {}
          public Object load(Object arg0)
          {return null;}
          @SuppressWarnings ( "unchecked" )
          public Map loadAll(Collection arg0)
          {return null;}
     }
     /**
      * Sets up and executes the test setting values in a cache with a given time-to-live value and waiting for the value to disappear.
      * @throws Exception
      */
     private void test() throws Exception
     {
          System.out.println(new DateTime() + " - Setting up TTL test");
          m_cache = CacheFactory.getCache("TTL_TEST");
          m_listOfTTLs.add( 1 );
          m_listOfTTLs.add( 3 );
          m_listOfTTLs.add( 5 );
          m_listOfTTLs.add( 10);
          System.out.println(new DateTime() + " - Starting TTL test");
          for (final Integer ttl : m_listOfTTLs)
          {
               StopWatch sw = doTest(ttl);
               System.out.println(sw.prettyPrint());
          }
     }
     /**
      * Adds a value to the cache with the time-to-live as given by the ttl parameter and waits until it's removed from the cache.
      * Repeats this {@link #RETRIES} times
      * @param ttl
      * @return
      * @throws Exception
      */
     private StopWatch doTest(Integer ttl) throws Exception
     {
          StopWatch stopWatch= new StopWatch("TTL=" + ttl);
          for (int retry = 0; retry < RETRIES; retry++)
          {
               System.out.println(new DateTime() + " - Adding a value in cache for TTL=" + ttl + " in try= " + (retry+1));
               stopWatch.start("Retry="+retry);
               m_cache.put(ttl, null, ttl*1000);
               waitUntilNotInCacheAnymore(ttl, retry);
               stopWatch.stop();
          }
          return stopWatch;
     }
     /**
      * Wait until the value for the given ttl is not in the cache anymore
      * @param ttl
      * @param currentTry
      * @throws InterruptedException
      */
     private void waitUntilNotInCacheAnymore(final Integer ttl, final int currentTry) throws InterruptedException
     {
          DateTime startTime = new DateTime();
          long maxMillisToWait = ttl*2*1000;     //wait max 2 times the time of the ttl
          while(m_cache.containsKey(ttl) )
          {
               Duration timeTaken = new Duration(startTime, new DateTime());
               if(timeTaken.getMillis() > maxMillisToWait)
               {
                    throw new RuntimeException("Already waiting " + timeTaken + " for ttl=" + ttl + " and retry=" + currentTry);
               }
               Thread.sleep(1000);
          }
     }
}
You can find the output below:
2009-12-03T11:50:04.584+01:00 - Setting up TTL test
2009-12-03 11:50:04.803/0.250 Oracle Coherence 3.5.2/463p2 <Info> (thread=main, member=n/a): Loaded operational configuration from resource "jar:file:/C:/Temp/coherence3.5.2/coherence-java-v3.5.2b463-p1_2/coherence/lib/coherence.jar!/tangosol-coherence.xml"
2009-12-03 11:50:04.803/0.250 Oracle Coherence 3.5.2/463p2 <Info> (thread=main, member=n/a): Loaded operational overrides from resource "jar:file:/C:/Temp/coherence3.5.2/coherence-java-v3.5.2b463-p1_2/coherence/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
2009-12-03 11:50:04.803/0.250 Oracle Coherence 3.5.2/463p2 <D5> (thread=main, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
2009-12-03 11:50:04.803/0.250 Oracle Coherence 3.5.2/463p2 <D5> (thread=main, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified
Oracle Coherence Version 3.5.2/463p2
Grid Edition: Development mode
Copyright (c) 2000, 2009, Oracle and/or its affiliates. All rights reserved.
2009-12-03 11:50:04.943/0.390 Oracle Coherence GE 3.5.2/463p2 <Info> (thread=main, member=n/a): Loaded cache configuration from "file:/C:/jb/workspace3.5/TTLTest/target/classes/coherence-cache-config.xml"
2009-12-03 11:50:05.318/0.765 Oracle Coherence GE 3.5.2/463p2 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2009-12-03 11:50:08.568/4.015 Oracle Coherence GE 3.5.2/463p2 <Info> (thread=Cluster, member=n/a): Created a new cluster "cluster:0xD3FB" with Member(Id=1, Timestamp=2009-12-03 11:50:05.193, Address=172.16.44.32:8088, MachineId=36896, Location=process:11848, Role=TTLTestServerTTLTestServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2) UID=0xAC102C20000001255429380990201F98
2009-12-03 11:50:08.584/4.031 Oracle Coherence GE 3.5.2/463p2 <D5> (thread=Invocation:Management, member=1): Service Management joined the cluster with senior service member 1
2009-12-03 11:50:08.756/4.203 Oracle Coherence GE 3.5.2/463p2 <D5> (thread=DistributedCache:testService, member=1): Service testService joined the cluster with senior service member 1
2009-12-03T11:50:08.803+01:00 - Starting TTL test
2009-12-03T11:50:08.818+01:00 - Adding a value in cache for TTL=1 in try= 1
2009-12-03T11:50:09.818+01:00 - Adding a value in cache for TTL=1 in try= 2
Exception in thread "main" (Wrapped: Failed request execution for testService service on Member(Id=1, Timestamp=2009-12-03 11:50:05.193, Address=172.16.44.32:8088, MachineId=36896, Location=process:11848, Role=TTLTestServerTTLTestServer)) java.lang.IllegalStateException: the node has already been removed
     at com.tangosol.util.Base.ensureRuntimeException(Base.java:293)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.tagException(Grid.CDB:36)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onContainsKeyRequest(DistributedCache.CDB:41)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ContainsKeyRequest.run(DistributedCache.CDB:1)
     at com.tangosol.coherence.component.net.message.requestMessage.DistributedCacheKeyRequest.onReceived(DistributedCacheKeyRequest.CDB:12)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onMessage(Grid.CDB:9)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onNotify(Grid.CDB:136)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onNotify(DistributedCache.CDB:3)
     at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
     at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IllegalStateException: the node has already been removed
     at com.tangosol.util.AbstractSparseArray$Crawler.remove(AbstractSparseArray.java:1274)
     at com.tangosol.net.cache.OldCache.evict(OldCache.java:580)
     at com.tangosol.net.cache.OldCache.containsKey(OldCache.java:171)
     at com.tangosol.net.cache.ReadWriteBackingMap.containsKey(ReadWriteBackingMap.java:597)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onContainsKeyRequest(DistributedCache.CDB:25)
     ... 7 more
2009-12-03 11:50:10.834/6.281 Oracle Coherence GE 3.5.2/463p2 <D4> (thread=ShutdownHook, member=1): ShutdownHook: stopping cluster node
2009-12-03 11:50:10.834/6.281 Oracle Coherence GE 3.5.2/463p2 <D5> (thread=Cluster, member=1): Service Cluster left the cluster
Best regards
Jan

Write-through limitation and putAll

Please find the quote below from the developer guide, particularly this part: "In other words, if two cache entries are updated, triggering calls to CacheStore modules sitting on separate cache servers, it is possible for one database update to succeed and for the other to fail."
If putAll is called on a cache, will it result in one CacheStore.storeAll call, or in many storeAll calls triggered from different Coherence nodes/servers? (Assume a distributed topology, Coherence 3.7.1.)
Will a failure of the store transaction lead to a failure of the putAll transaction?
Are there any patterns that show how Coherence works with typical databases?
14.7.2 Write-Through Limitations
Coherence does not support two-phase CacheStore operations across multiple CacheStore instances. In other words, if two cache entries are updated, triggering calls to CacheStore modules sitting on separate cache servers, it is possible for one database update to succeed and for the other to fail. In this case, it may be preferable to use a cache-aside architecture (updating the cache and database as two separate components of a single transaction) with the application server transaction manager. In many cases it is possible to design the database schema to prevent logical commit failures (but obviously not server failures). Write-behind caching avoids this issue as "puts" are not affected by database behavior (as the underlying issues have been addressed earlier in the design process).

gs100 wrote:
Thanks for the input, I have further questions based on these suggestions.
1. Let us say one of the putAll calls fails; we would know that it failed because of one or more underlying store/storeAll calls. Even if we roll back the Coherence transaction, the store/storeAll calls that succeeded would not be rolled back automatically, is that correct? If so, this would leave the underlying DB/store in a state inconsistent with the in-memory cache?
I guess that is one of the reasons why the transaction framework does not support cache stores. Also, write-behind coalesces updates, which would have funny consequences with regard to the transactional context.
2. How do we get the custom implementation of putAll that you suggested for handling specific errors? Any pointers on this would be helpful.
I guess it is not going to be posted; the Coherence team may or may not add something that is a bit more deterministic with regard to errors.
A few aspects of Coherence behaviour (a.k.a. pitfalls) which you need to be aware of to be able to implement your own solution:
Exceptions propagating back to the client can happen in:
- entry-processor (not for putAll specifically)
- result serialization code (not for putAll specifically, but for processAll/aggregate for example)
- deserialization code (indexes/filter-based backing map listeners/cache stores lead to deserialization even for putAll)
- triggers (intentionally, too)
- cache stores
There is no place where you could catch any exceptions from inside the NamedCache call, so they will come out.
Coherence may execute the operation on one thread per partition or one thread per multiple partitions, but never on multiple threads per partition. This means there may be multiple exceptions even from a single storage node, but only at most one exception would be generated per partition (starting with 3.5).
If you send multiple partitions in the same NamedCache call, you can lose exceptions: when one partition fails, you cannot tell whether another partition sent to the same node in that call would also have failed had it been sent alone.
Since you need to return all exceptions from your method call, you have to produce, catch and collect every one of them; otherwise you lose all but one. To do that, the exceptions must be produced independently, i.e. different partitions must be operated on independently.
To send an operation to a single partition only, split key-based operations by separating the key sets per partition, or apply a PartitionedFilter to filter-based operations.
It is up to you where and how you iterate through the partitions. You can do it on the caller, or on the storage node from an Invocable sent via an InvocationService (in which case you can either be optimistic about ownership or chase a partition).
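The caller-side variant of the above could be sketched roughly as follows. This is only an illustration under assumptions, not a Coherence API: partitionOf is a hypothetical stand-in for whatever maps a key to its partition id (in Coherence that would presumably come from the partitioned service's key partitioning strategy), and putAll is a hypothetical stand-in for a call such as cache::putAll. The sketch groups the entries per partition, sends one call per partition, and collects every exception instead of stopping at the first one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.function.ToIntFunction;

final class PerPartitionPutAll {
    private PerPartitionPutAll() {}

    /**
     * Groups the entries by partition, issues one putAll call per partition
     * and returns the exceptions that were thrown, keyed by partition id.
     * An empty result means every partition was written successfully.
     */
    static <K, V> Map<Integer, RuntimeException> putAllByPartition(
            Map<K, V> entries,
            ToIntFunction<K> partitionOf,   // hypothetical: key -> partition id
            Consumer<Map<K, V>> putAll) {   // hypothetical: e.g. cache::putAll

        // Split the key set so that each batch touches exactly one partition.
        Map<Integer, Map<K, V>> byPartition = new TreeMap<>();
        for (Map.Entry<K, V> e : entries.entrySet()) {
            byPartition
                .computeIfAbsent(partitionOf.applyAsInt(e.getKey()), p -> new HashMap<>())
                .put(e.getKey(), e.getValue());
        }

        // One call per partition, so one failure cannot hide another.
        Map<Integer, RuntimeException> failures = new TreeMap<>();
        for (Map.Entry<Integer, Map<K, V>> batch : byPartition.entrySet()) {
            try {
                putAll.accept(batch.getValue());
            } catch (RuntimeException ex) {
                failures.put(batch.getKey(), ex);
            }
        }
        return failures;
    }
}
```

The price is one round trip per partition from a single thread, which is exactly the latency cost discussed in the answer to question 3.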
3. We think the putAll that Coherence implements is the most optimized (parallelism). I am not sure how a custom implementation can be as optimal (hope we don't end up calling one by one).
You cannot implement it as optimally as Coherence itself does, because Coherence interleaves operations (Messages) to independent partitions/nodes from a single thread, without waiting for the responses from the individual nodes/partitions.
You can either parallelize operations to multiple threads, or do the iteration on the single thread at the cost of higher latency.
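The multi-threaded option might look something like the sketch below. It assumes the entries have already been grouped per partition, and putAll is again a hypothetical stand-in for a call such as cache::putAll; nothing here is Coherence-specific. Each per-partition batch is submitted to a small thread pool, and the failures are collected per partition once all batches have finished.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelBatchSender {
    private ParallelBatchSender() {}

    /**
     * Sends every per-partition batch on a worker thread and returns the
     * failures keyed by partition id (empty if all batches succeeded).
     */
    static <K, V> Map<Integer, Throwable> sendInParallel(
            Map<Integer, Map<K, V>> batchesByPartition,
            Consumer<Map<K, V>> putAll,     // hypothetical: e.g. cache::putAll
            int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit everything first so the batches overlap in time.
            Map<Integer, Future<?>> pending = new LinkedHashMap<>();
            for (Map.Entry<Integer, Map<K, V>> batch : batchesByPartition.entrySet()) {
                Map<K, V> entries = batch.getValue();
                pending.put(batch.getKey(), pool.submit(() -> putAll.accept(entries)));
            }
            // Then wait for each batch and record its exception, if any.
            Map<Integer, Throwable> failures = new TreeMap<>();
            for (Map.Entry<Integer, Future<?>> p : pending.entrySet()) {
                try {
                    p.getValue().get();
                } catch (ExecutionException ex) {
                    failures.put(p.getKey(), ex.getCause());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for partition " + p.getKey(), ex);
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }
}
```

A real implementation would probably reuse one long-lived pool rather than create one per call; the pool is created inline here only to keep the sketch self-contained.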
Best regards,
Robert

Thread pool configuration for write-behind cache store operation?

Hi,
Does Coherence have a thread pool configuration for the Coherence CacheStore operation?
Or the CacheStore implementation needs to do that?
We're using write-behind and want to use multiple threads to speed up the store operation (storeAll()...)
Thanks in advance for your help.

user621063 wrote:
Hi,
Does Coherence have a thread pool configuration for the Coherence CacheStore operation?
Or the CacheStore implementation needs to do that?
We're using write-behind and want to use multiple threads to speed up the store operation (storeAll()...)
Thanks in advance for your help.
Hi,
Read/write-through operations are carried out on the worker thread (so if you configured a thread pool for the service, the same thread pool is used for the cache store operation).
For write-behind/read-ahead operations, there is a single dedicated thread per cache, in addition to whatever thread pool is configured. The exception is remove operations, which are synchronous and still carried out on the worker thread (see above).
All of the above is, of course, per storage node.
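Since write-behind gives you only that one thread per cache, any further parallelism would have to live inside the CacheStore implementation itself. A rough sketch of what storeAll could delegate to is below. It is plain Java under assumptions: writeChunk is a hypothetical callback standing in for something like one JDBC batch per chunk, and the class name, thread count and chunk size are made up for illustration. It splits the incoming map into chunks, writes them on its own pool, and blocks until all chunks are done so that the calling write-behind thread still sees a synchronous storeAll. Removing the successfully written entries from the map before rethrowing follows the partial-failure convention as I understand it from the CacheStore.storeAll documentation; please verify that against your Coherence version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelStoreAll<K, V> {
    private final ExecutorService pool;
    private final int chunkSize;
    private final Consumer<Map<K, V>> writeChunk; // hypothetical: e.g. one JDBC batch per chunk

    ParallelStoreAll(int threads, int chunkSize, Consumer<Map<K, V>> writeChunk) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.chunkSize = chunkSize;
        this.writeChunk = writeChunk;
    }

    /** Meant to be called from a CacheStore's storeAll(Map); blocks until every chunk is done. */
    void storeAll(Map<K, V> entries) {
        // Cut the incoming map into chunks of at most chunkSize entries.
        List<Map<K, V>> chunks = new ArrayList<>();
        Map<K, V> current = new HashMap<>();
        for (Map.Entry<K, V> e : entries.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == chunkSize) {
                chunks.add(current);
                current = new HashMap<>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }

        // Write the chunks concurrently on the store's own pool.
        List<Future<?>> futures = new ArrayList<>();
        for (Map<K, V> chunk : chunks) {
            futures.add(pool.submit(() -> writeChunk.accept(chunk)));
        }

        // Wait for all chunks; drop the stored entries from the map, keep the failed ones.
        RuntimeException firstFailure = null;
        for (int i = 0; i < futures.size(); i++) {
            try {
                futures.get(i).get();
                entries.keySet().removeAll(chunks.get(i).keySet());
            } catch (ExecutionException ex) {
                if (firstFailure == null) {
                    firstFailure = new RuntimeException("chunk write failed", ex.getCause());
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new RuntimeException("interrupted while waiting for store", ex);
            }
        }
        if (firstFailure != null) {
            throw firstFailure;
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Whether this actually speeds things up depends on the database: if the bottleneck is a single connection or table lock, more threads will not help much.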
Best regards,
Robert
