Posted to issues@ignite.apache.org by "Alexander Belyak (JIRA)" <ji...@apache.org> on 2017/04/11 07:30:41 UTC

[jira] [Updated] (IGNITE-4940) GridCacheWriteBehindStore loses more data than necessary

     [ https://issues.apache.org/jira/browse/IGNITE-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Belyak updated IGNITE-4940:
-------------------------------------
    Remaining Estimate: 8h  (was: 24h)
     Original Estimate: 8h  (was: 24h)

> GridCacheWriteBehindStore loses more data than necessary
> --------------------------------------------------------
>
>                 Key: IGNITE-4940
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4940
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 1.9
>            Reporter: Alexander Belyak
>            Priority: Minor
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Unnecessary data loss happens when the underlying store slows down or throws errors while new data is being put into the cache:
> 1) A writer adds a new cache entry and checks the cache size.
> 2) If the cache size exceeds criticalSize (by default criticalSize = 1.5 * cacheSize), the writer tries to flush a single value synchronously.
> At this point we have:
> N flusher threads trying to flush data in batch mode
> 1+ writer threads trying to flush a single value
> Both the writer and the flusher go through the updateStore procedure. If updateStore gets an exception from the underlying store, it checks the cache size; if the size is greater than criticalCacheSize, it logs a cache overflow event and returns true (as if the data had been stored successfully). The data is then removed from the write-behind cache.
> Moreover, we can lose not just a single value but one or more whole batches if the flusher threads hit a store exception while the cache is overflowed.
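> A minimal sketch of the problematic updateStore flow (names such as writeBatchToUnderlyingStore, writeCache and flushSingleValue are illustrative stand-ins here, not the exact GridCacheWriteBehindStore internals):
> {panel}
> // Illustrative fragment only, not the actual Ignite code.
> private boolean updateStore(Map<K, V> batch) {
>     try {
>         // Hypothetical helper standing in for the write to the underlying store.
>         writeBatchToUnderlyingStore(batch);
>
>         return true;
>     }
>     catch (Exception e) {
>         // Problematic part: on an overflowed cache the failure is swallowed and
>         // the batch is reported as written, so the caller removes it from the
>         // write-behind cache and the data is lost.
>         if (writeCache.size() > criticalCacheSize) {
>             log.warning("Write-behind cache overflow, dropping " + batch.size() + " entries.");
>
>             return true;
>         }
>
>         // Normal path: report failure so the entries stay in RETRY state.
>         return false;
>     }
> }
> {panel}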
> Reproduce:
> {panel}
>     /**
>      * Tests that cache would keep values if underlying store fails.
>      *
>      * @throws Exception If failed.
>      */
>     private void testStoreFailure(boolean writeCoalescing) throws Exception {
>         delegate.setShouldFail(true);
>         initStore(2, writeCoalescing);
>         Set<Integer> exp;
>         try {
>             Thread timer = new Thread(new Runnable() {
>                 @Override
>                 public void run() {
>                     try {
>                         U.sleep(FLUSH_FREQUENCY*2);
>                     } catch (IgniteInterruptedCheckedException e) {
>                         assertTrue("Timer was interrupted", false);
>                     }
>                     delegate.setShouldFail(false);
>                 }
>             });
>             timer.start();
>             exp = runPutGetRemoveMultithreaded(10, 100000);
>             timer.join();
>             info(">>> There are " + store.getWriteBehindErrorRetryCount() + " entries in RETRY state");
>             // Despite that we set shouldFail flag to false, flush thread may just have caught an exception.
>             // If we move store to the stopping state right away, this value will be lost. That's why this sleep
>             // is inserted here to let all exception handlers in write-behind store exit.
>             U.sleep(1000);
>         }
>         finally {
>             shutdownStore();
>         }
>         Map<Integer, String> map = delegate.getMap();
>         Collection<Integer> extra = new HashSet<>(map.keySet());
>         extra.removeAll(exp);
>         assertTrue("The underlying store contains extra keys: " + extra, extra.isEmpty());
>         Collection<Integer> missing = new HashSet<>(exp);
>         missing.removeAll(map.keySet());
>         assertTrue("Missing keys in the underlying store: " + missing, missing.isEmpty());
>         for (Integer key : exp)
>             assertEquals("Invalid value for key " + key, "val" + key, map.get(key));
>     }
> {panel}
> Solution: check the cache size before inserting a new value, plus either:
> a) some kind of synchronization to prevent cacheSize from growing beyond criticalCacheSize (strong restriction), or
> b) remove the cache size check from updateStore; the cache can then exceed criticalCacheSize only at a single point, if we hit a race on updateCache...
> I prefer b because of the lower synchronization pressure (the cache may hold 1 or 2 extra elements).
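> A sketch of option b (again with illustrative names): updateStore only reports success or failure, and the overflow back-pressure stays on the writer side:
> {panel}
> // Illustrative fragment only, not the actual Ignite code.
> private boolean updateStore(Map<K, V> batch) {
>     try {
>         writeBatchToUnderlyingStore(batch);   // hypothetical helper
>
>         return true;
>     }
>     catch (Exception e) {
>         return false;   // keep the entries for retry, regardless of cache size
>     }
> }
>
> // Overflow protection is applied before a new value is inserted. Under a race
> // the cache may briefly exceed criticalCacheSize by an element or two, which
> // is the accepted trade-off of option b.
> private void write(K key, V val) {
>     if (writeCache.size() >= criticalCacheSize)
>         flushSingleValue();   // synchronous flush, as writers already do today
>
>     writeCache.put(key, val);
> }
> {panel}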



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)