Posted to commits@samza.apache.org by "Nicolas Maquet (JIRA)" <ji...@apache.org> on 2016/02/15 22:38:18 UTC

[jira] [Created] (SAMZA-873) Avoid unnecessary flushes in CachedStore

Nicolas Maquet created SAMZA-873:
------------------------------------

             Summary: Avoid unnecessary flushes in CachedStore
                 Key: SAMZA-873
                 URL: https://issues.apache.org/jira/browse/SAMZA-873
             Project: Samza
          Issue Type: Improvement
          Components: kv
    Affects Versions: 0.10.1
            Reporter: Nicolas Maquet


The class {{org.apache.samza.storage.kv.CachedStore}} currently calls {{store.flush()}} whenever it evicts a dirty entry. This in turn causes RocksDB to flush its memtables far more often than necessary, which slows the store down.

In a mixed put/get workload, e.g. 2 gets for every put with an object cache size of 1000, RocksDB flushes its memtable roughly every 333 calls to {{put()}}, that is, every time the eldest entry evicted from the cache is dirty. In our benchmarks, this leads to a more than 20x drop in throughput.
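
The "roughly every 333 puts" figure can be reproduced with a small model. The sketch below is not Samza code: it assumes an LRU cache in which every operation touches a new key (the worst case for evictions), a fixed get, get, put pattern, and a flush that marks every cached entry clean. The function name count_flushes and its parameters are made up for illustration, and the write batch size is ignored.

```python
from collections import OrderedDict


def count_flushes(num_ops, cache_size=1000, gets_per_put=2):
    """Model the pre-patch behaviour: evicting a dirty eldest entry flushes."""
    cache = OrderedDict()  # keys in LRU order, eldest first
    dirty = set()          # keys written since the last flush
    puts = 0
    flushes = 0
    for key in range(num_ops):
        # Assumed pattern: gets_per_put gets followed by one put, all new keys.
        is_put = key % (gets_per_put + 1) == gets_per_put
        cache[key] = None
        if is_put:
            puts += 1
            dirty.add(key)
        if len(cache) > cache_size:
            eldest, _ = cache.popitem(last=False)
            if eldest in dirty:
                # Pre-patch: a dirty eviction triggers a full flush,
                # after which every cached entry is clean again.
                flushes += 1
                dirty.clear()
    return puts, flushes


if __name__ == "__main__":
    puts, flushes = count_flushes(30_000)
    print(puts, flushes, puts / flushes)
```

Under these assumptions, a flush leaves the whole cache clean, so the next dirty entry only reaches the eldest position after about 1000 further operations, about a third of which are puts.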

The attached patch fixes the issue as follows:

- {{CachedStore.put()}} no longer flushes when evicting dirty entries. It calls {{store.putAll()}} with all dirty entries and resets the dirty list and count but does not call {{store.flush()}}.
- Likewise, {{CachedStore.cache.removeEldestEntry()}} no longer flushes when evicting dirty entries but calls {{store.putAll()}} on all dirty entries and resets the dirty list and count.
- {{CachedStore.flush()}}'s behaviour is unaffected.
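
The patched behaviour can be sketched as follows. This is a simplified illustration, not the attached patch: the class names CountingStore and CachedStoreSketch and the method put_all are hypothetical stand-ins, the write batch size is left out, and the real {{CachedStore}} is more involved.

```python
from collections import OrderedDict


class CountingStore:
    """Hypothetical stand-in for the underlying store; it only counts calls."""

    def __init__(self):
        self.data = {}
        self.put_all_calls = 0
        self.flush_calls = 0

    def put_all(self, entries):
        self.put_all_calls += 1
        self.data.update(entries)

    def get(self, key):
        return self.data.get(key)

    def flush(self):
        self.flush_calls += 1


class CachedStoreSketch:
    """LRU cache in front of a store, following the patched eviction rule."""

    def __init__(self, store, cache_size):
        self.store = store
        self.cache_size = cache_size
        self.cache = OrderedDict()  # LRU order, eldest first
        self.dirty = set()          # keys not yet written to the store

    def _write_dirty(self):
        # Push all dirty entries to the store and reset the dirty set.
        if self.dirty:
            self.store.put_all({k: self.cache[k] for k in self.dirty})
            self.dirty.clear()

    def _evict_if_full(self):
        if len(self.cache) > self.cache_size:
            eldest = next(iter(self.cache))
            if eldest in self.dirty:
                # Patched: write the dirty entries, but do NOT flush the store.
                self._write_dirty()
            del self.cache[eldest]

    def put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        self.dirty.add(key)
        self._evict_if_full()

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.store.get(key)
        self.cache[key] = value
        self._evict_if_full()
        return value

    def flush(self):
        # Unchanged: an explicit flush still writes and flushes the store.
        self._write_dirty()
        self.store.flush()
```

In this sketch, evicting a dirty entry costs one put_all call on the underlying store, and the store is flushed only when flush() is called explicitly.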




