Posted to issues@ignite.apache.org by "Alexander Belyak (JIRA)" <ji...@apache.org> on 2017/04/11 07:30:41 UTC
[jira] [Updated] (IGNITE-4940) GridCacheWriteBehindStore loses more data than necessary
[ https://issues.apache.org/jira/browse/IGNITE-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Belyak updated IGNITE-4940:
-------------------------------------
Remaining Estimate: 8h (was: 24h)
Original Estimate: 8h (was: 24h)
> GridCacheWriteBehindStore loses more data than necessary
> --------------------------------------------------------
>
> Key: IGNITE-4940
> URL: https://issues.apache.org/jira/browse/IGNITE-4940
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 1.9
> Reporter: Alexander Belyak
> Priority: Minor
> Original Estimate: 8h
> Remaining Estimate: 8h
>
> Unnecessary data loss happens when the underlying store slows down or fails while new data is being written to the cache:
> 1) A writer adds a new cache entry and checks the cache size.
> 2) If the cache size exceeds criticalSize (by default criticalSize = 1.5 * cacheSize), the writer tries to flush a single value synchronously.
> At this point we have:
> N flusher threads which try to flush data in batch mode, and
> 1+ writer threads which try to flush a single value.
> Both writers and flushers use the updateStore procedure. If updateStore gets an exception from the underlying store, it checks the cache size; if the size is greater than criticalCacheSize, it logs a cache overflow event and returns true (as if the data had been successfully stored). The data is then removed from the write-behind cache.
> Moreover, we can lose not just a single value but one or more whole batches if the flusher threads get a store exception while the cache is overflowed.
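> The faulty path can be sketched as follows. This is a minimal, hypothetical simplification based on the description above, not the actual Ignite source; names like updateStore, criticalSize and the 1.5 factor follow the ticket, everything else is illustrative:
> {panel}
> // Hypothetical sketch of the data-loss path described above.
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.atomic.AtomicInteger;
>
> public class WriteBehindSketch {
>     static final int CACHE_SIZE = 2;
>     static final int CRITICAL_SIZE = (int)(1.5 * CACHE_SIZE);
>
>     static final ConcurrentHashMap<Integer, String> writeBehindCache = new ConcurrentHashMap<>();
>     static final AtomicInteger lost = new AtomicInteger();
>
>     /** Mimics updateStore: a store failure on an overflowed cache is reported as success. */
>     static boolean updateStore(Integer key, String val, boolean storeFails) {
>         if (storeFails) {
>             // BUG (as described in the ticket): the overflow check masks the
>             // failure, so the caller removes the entry and the value is lost.
>             if (writeBehindCache.size() > CRITICAL_SIZE) {
>                 lost.incrementAndGet();
>                 return true; // "success" -> data silently dropped
>             }
>             return false; // entry stays in the cache for retry
>         }
>         return true;
>     }
>
>     public static void main(String[] args) {
>         for (int i = 0; i < 5; i++)
>             writeBehindCache.put(i, "val" + i);
>
>         // Underlying store is failing while the cache is over the critical size:
>         for (Integer key : writeBehindCache.keySet()) {
>             if (updateStore(key, writeBehindCache.get(key), true))
>                 writeBehindCache.remove(key); // entry dropped despite the failure
>         }
>
>         System.out.println("lost=" + lost.get());
>     }
> }
> {panel}
> Running this drops exactly the entries that were above the critical size, even though the store never accepted them.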
> Reproduce:
> {panel}
> /**
>  * Tests that cache would keep values if underlying store fails.
>  *
>  * @throws Exception If failed.
>  */
> private void testStoreFailure(boolean writeCoalescing) throws Exception {
>     delegate.setShouldFail(true);
>
>     initStore(2, writeCoalescing);
>
>     Set<Integer> exp;
>
>     try {
>         Thread timer = new Thread(new Runnable() {
>             @Override public void run() {
>                 try {
>                     U.sleep(FLUSH_FREQUENCY * 2);
>                 }
>                 catch (IgniteInterruptedCheckedException e) {
>                     assertTrue("Timer was interrupted", false);
>                 }
>
>                 delegate.setShouldFail(false);
>             }
>         });
>
>         timer.start();
>
>         exp = runPutGetRemoveMultithreaded(10, 100000);
>
>         timer.join();
>
>         info(">>> There are " + store.getWriteBehindErrorRetryCount() + " entries in RETRY state");
>
>         // Despite that we set shouldFail flag to false, flush thread may just have caught an exception.
>         // If we move store to the stopping state right away, this value will be lost. That's why this sleep
>         // is inserted here to let all exception handlers in write-behind store exit.
>         U.sleep(1000);
>     }
>     finally {
>         shutdownStore();
>     }
>
>     Map<Integer, String> map = delegate.getMap();
>
>     Collection<Integer> extra = new HashSet<>(map.keySet());
>     extra.removeAll(exp);
>
>     assertTrue("The underlying store contains extra keys: " + extra, extra.isEmpty());
>
>     Collection<Integer> missing = new HashSet<>(exp);
>     missing.removeAll(map.keySet());
>
>     assertTrue("Missing keys in the underlying store: " + missing, missing.isEmpty());
>
>     for (Integer key : exp)
>         assertEquals("Invalid value for key " + key, "val" + key, map.get(key));
> }
> {panel}
> Solution: test the cache size before inserting the new value, either:
> a) with some kind of synchronization to prevent cacheSize from growing beyond criticalCacheSize (a strong restriction), or
> b) by removing the cache size check from updateStore; the cache can then grow somewhat beyond cacheCriticalSize at a single point if writers race on updateCache.
> I prefer (b) because of the lower synchronization pressure (the cache may hold one or two extra elements).
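> Option (b) can be sketched like this. Again a hypothetical simplification under the ticket's assumptions (the real flush and back-pressure logic in Ignite is more involved); the key points are that the size check moves before the put, and updateStore never reports a failed write as success:
> {panel}
> // Hedged sketch of option (b): back-pressure before the insert,
> // and no overflow check inside updateStore.
> import java.util.concurrent.ConcurrentHashMap;
>
> public class FixSketchB {
>     static final int CRITICAL_SIZE = 3;
>
>     static final ConcurrentHashMap<Integer, String> cache = new ConcurrentHashMap<>();
>
>     /** Writer path: flush synchronously *before* the put when at the limit. */
>     static void write(Integer key, String val) {
>         if (cache.size() >= CRITICAL_SIZE)
>             flushOne(); // back-pressure before insert, not after
>
>         cache.put(key, val); // may overshoot by a few entries under races
>     }
>
>     /** updateStore without the overflow check: a failure always means retry. */
>     static boolean updateStore(Integer key, String val, boolean storeFails) {
>         return !storeFails; // a failed write is never reported as success
>     }
>
>     static void flushOne() {
>         for (Integer key : cache.keySet()) {
>             if (updateStore(key, cache.get(key), false))
>                 cache.remove(key);
>
>             return; // flush a single entry, as the writer path does
>         }
>     }
>
>     public static void main(String[] args) {
>         for (int i = 0; i < 10; i++)
>             write(i, "val" + i);
>
>         System.out.println("cacheSize=" + cache.size());
>     }
> }
> {panel}
> Under this scheme the cache stays near criticalCacheSize without strong synchronization, and a failing store can no longer cause silent drops, only retries.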
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)