Posted to commits@cassandra.apache.org by "Ben Manes (JIRA)" <ji...@apache.org> on 2016/01/06 21:38:40 UTC
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches

    [ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086242#comment-15086242 ] 

Ben Manes commented on CASSANDRA-10855:
---------------------------------------

Happy new year. Anything I can do to help keep this moving?

Ariel's comment explains the poor hit rate, as a uniform distribution will result in a fixed and low hit rate regardless of policy. An effective cache often has a hit rate of around 85%, ideally in the high 90s so that reads are the dominant case, but even 65% is useful. Even when the hit rate is maxed out, the effect of a better policy can be noticeable. In that case it reduces the TCO, because the same performance can be achieved with smaller, cheaper machines.
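
To illustrate why a uniform distribution pins the hit rate, here is a minimal sketch using only the JDK. It is not the perf test from this ticket: the LruCache class, the key space of 10,000 and the capacity of 1,000 are assumptions chosen for illustration. With every key equally likely, the measured hit rate should settle near capacity / keySpace, and no eviction policy can do better because no key is more valuable than another.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

final class UniformHitRate {
    /** Hypothetical minimal LRU built on LinkedHashMap's access order (not CLHM or Caffeine). */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = iterate in access order, which gives LRU eviction
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    /** Replays uniformly random keys against the LRU and returns hits / requests. */
    static double uniformHitRate(int keySpace, int capacity, int requests, long seed) {
        Random random = new Random(seed);
        LruCache<Integer, Boolean> cache = new LruCache<>(capacity);
        long hits = 0;
        for (int i = 0; i < requests; i++) {
            int key = random.nextInt(keySpace);
            if (cache.get(key) != null) {
                hits++;
            } else {
                cache.put(key, Boolean.TRUE);
            }
        }
        return (double) hits / requests;
    }

    public static void main(String[] args) {
        int keySpace = 10_000; // assumed values, for illustration only
        int capacity = 1_000;
        double rate = uniformHitRate(keySpace, capacity, 1_000_000, 42L);
        System.out.printf("measured hit rate: %.3f, capacity/keySpace: %.3f%n",
                rate, (double) capacity / keySpace);
    }
}
```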

Glancing at the uniform results, the degradation is small enough that it is probably within the margin of error, where the run and other system effects dominate. In an update-heavy workload the new cache should be faster, because synchronization carries less penalty than CAS storms. But on the perf test's insertion-heavy workload it is probably a little slower, because its additional features add complexity. Another set of eyes might uncover some improvements, so that's always welcome.

[Zipf-like|http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.2253&rank=1] distributions are considered the most common workload patterns. Ideally we could capture a production trace and simulate it, as the [database trace|https://github.com/ben-manes/caffeine/wiki/Efficiency#database] I use shows very promising results.
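
As a rough illustration of how skewed a Zipf-like workload is, the sketch below samples ranks with probability proportional to 1/rank^exponent and reports the share of requests that land on the hottest keys. It is not a production trace or the database trace linked above. The class name, the exponent of 1.0 and the key space size are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Random;

final class ZipfSkew {
    /** Cumulative distribution over ranks 1..n, where rank r has weight 1 / r^exponent. */
    static double[] zipfCdf(int n, double exponent) {
        double[] cdf = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 1.0 / Math.pow(i + 1, exponent);
            cdf[i] = sum;
        }
        for (int i = 0; i < n; i++) {
            cdf[i] /= sum;
        }
        cdf[n - 1] = 1.0; // guard against floating-point rounding in the last bucket
        return cdf;
    }

    /** Draws a zero-based rank (0 is the hottest key) by inverting the CDF with a binary search. */
    static int sample(double[] cdf, Random random) {
        int pos = Arrays.binarySearch(cdf, random.nextDouble());
        int index = pos >= 0 ? pos : -pos - 1; // a negative result encodes the insertion point
        return Math.min(index, cdf.length - 1);
    }

    /** Fraction of sampled requests that target the 'hottest' most popular keys. */
    static double shareOfHottest(int keySpace, int hottest, double exponent, int requests, long seed) {
        double[] cdf = zipfCdf(keySpace, exponent);
        Random random = new Random(seed);
        long count = 0;
        for (int i = 0; i < requests; i++) {
            if (sample(cdf, random) < hottest) {
                count++;
            }
        }
        return (double) count / requests;
    }

    public static void main(String[] args) {
        // Assumed parameters: 10,000 keys, exponent 1.0, hottest 10% of keys.
        double share = shareOfHottest(10_000, 1_000, 1.0, 1_000_000, 42L);
        System.out.printf("share of requests hitting the hottest 10%% of keys: %.3f%n", share);
    }
}
```

Under these assumed parameters a small set of hot keys absorbs most of the traffic, which is why the choice of eviction policy matters far more here than under a uniform distribution.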

> Use Caffeine (W-TinyLFU) for on-heap caches
> -------------------------------------------
>
>                 Key: CASSANDRA-10855
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10855
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ben Manes
>              Labels: performance
>
> Cassandra currently uses [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] for performance critical caches (key, counter) and Guava's cache for non-critical (auth, metrics, security). All of these usages have been replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the author of the previously mentioned libraries.
> The primary incentive is to switch from LRU policy to W-TinyLFU, which provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] hit rates. It performs particularly well in database and search traces, is scan resistant, and adds only a very small time/space overhead to LRU.
> Secondarily, Guava's caches never obtained similar [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM due to some optimizations not being ported over. This change results in faster reads and avoids creating garbage as a side effect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)