Posted to issues@ignite.apache.org by "Alexander Belyak (JIRA)" <ji...@apache.org> on 2017/04/11 07:29:41 UTC

[jira] [Created] (IGNITE-4940) GridCacheWriteBehindStore loses more data than necessary

Alexander Belyak created IGNITE-4940:
----------------------------------------

             Summary: GridCacheWriteBehindStore loses more data than necessary
                 Key: IGNITE-4940
                 URL: https://issues.apache.org/jira/browse/IGNITE-4940
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 1.9
            Reporter: Alexander Belyak
            Priority: Minor


Unnecessary data loss happens when the underlying store slows down or fails while new data is being put into the cache:
1) A writer adds a new cache entry and checks the cache size.
2) If the cache size > criticalSize (by default criticalSize = 1.5 * cacheSize), the writer tries to flush a single value synchronously.
At this point we have:
N flusher threads trying to flush data in batch mode, and
1+ writer threads trying to flush single values.
Both writers and flushers use the updateStore procedure, but if updateStore gets an exception from the underlying store, it checks the cache size; if the size is greater than criticalCacheSize, it logs a cache overflow event and returns true (as if the data had been successfully stored). The data is then removed from the write-behind cache.
Moreover, we can lose not just a single value but one or more whole batches if the flusher threads hit a store exception while the cache is overflowed (see the sketch below).
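For illustration, here is a minimal self-contained sketch of the control flow described above. All names (WriteBehindSketch, writeAll, criticalCacheSize as a plain field) are hypothetical simplifications, not the actual GridCacheWriteBehindStore source:
{panel}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical simplification of the write-behind flush path described above. */
class WriteBehindSketch<K, V> {
    final Map<K, V> writeCache = new ConcurrentHashMap<>();

    final int criticalCacheSize = 1500;

    /** Stands in for the underlying store; throws to simulate a failure. */
    void writeAll(Map<K, V> batch) throws Exception {
        throw new Exception("Underlying store failed.");
    }

    /** Both writer and flusher threads funnel through here. */
    boolean updateStore(Map<K, V> batch) {
        try {
            writeAll(batch); // may throw while the store is slow or down

            return true;
        }
        catch (Exception e) {
            // BUG: the overflow check sits in the error path. Once the cache is
            // over criticalCacheSize, a FAILED batch is reported as stored, and
            // the caller removes every entry of it from the write-behind cache.
            if (writeCache.size() > criticalCacheSize)
                return true; // whole batch silently lost

            return false; // entries stay in RETRY state and will be flushed again
        }
    }
}
{panel}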
Reproduce:
{panel}
    /**
     * Tests that cache would keep values if underlying store fails.
     *
     * @throws Exception If failed.
     */
    private void testStoreFailure(boolean writeCoalescing) throws Exception {
        delegate.setShouldFail(true);

        initStore(2, writeCoalescing);

        Set<Integer> exp;

        try {
            Thread timer = new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        U.sleep(FLUSH_FREQUENCY * 2);
                    } catch (IgniteInterruptedCheckedException e) {
                        assertTrue("Timer was interrupted", false);
                    }
                    delegate.setShouldFail(false);
                }
            });
            timer.start();
            exp = runPutGetRemoveMultithreaded(10, 100000);

            timer.join();

            info(">>> There are " + store.getWriteBehindErrorRetryCount() + " entries in RETRY state");

            // Despite that we set shouldFail flag to false, flush thread may just have caught an exception.
            // If we move store to the stopping state right away, this value will be lost. That's why this sleep
            // is inserted here to let all exception handlers in write-behind store exit.
            U.sleep(1000);
        }
        finally {
            shutdownStore();
        }

        Map<Integer, String> map = delegate.getMap();

        Collection<Integer> extra = new HashSet<>(map.keySet());

        extra.removeAll(exp);

        assertTrue("The underlying store contains extra keys: " + extra, extra.isEmpty());

        Collection<Integer> missing = new HashSet<>(exp);

        missing.removeAll(map.keySet());

        assertTrue("Missing keys in the underlying store: " + missing, missing.isEmpty());

        for (Integer key : exp)
            assertEquals("Invalid value for key " + key, "val" + key, map.get(key));
    }
{panel}
Solution: check the cache size before inserting the new value, either:
a) with some kind of synchronization that prevents cacheSize from ever growing beyond criticalCacheSize (strong restriction), or
b) by removing the cache size check from updateStore; the cache can then grow slightly past cacheCriticalSize if we get a race on updateCache (weak restriction).
I prefer b) because it puts less synchronization pressure on writers (the cache may hold 1 or 2 extra elements); a sketch of this option follows.
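A minimal sketch of option b), again with hypothetical names; the point is that back-pressure moves to the write path, while updateStore simply reports failure so failed entries are retried instead of dropped:
{panel}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical simplification of the proposed fix (option b). */
class WriteBehindFixSketch<K, V> {
    final Map<K, V> writeCache = new ConcurrentHashMap<>();

    final int criticalCacheSize = 1500;

    /** Writer path: apply back-pressure before inserting, not after. */
    void write(K key, V val) {
        // Racing writers may overshoot criticalCacheSize by an entry or two,
        // which option b) deliberately tolerates (weak restriction).
        while (writeCache.size() >= criticalCacheSize)
            flushSingleValue();

        writeCache.put(key, val);
    }

    /** No overflow check in the error path: a failed batch is kept and retried. */
    boolean updateStore(Map<K, V> batch) {
        try {
            writeAll(batch);

            return true;
        }
        catch (Exception e) {
            return false;
        }
    }

    void flushSingleValue() { /* pick one queued entry and updateStore() it */ }

    void writeAll(Map<K, V> batch) throws Exception { /* delegate to real store */ }
}
{panel}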



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)