Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2014/05/11 00:00:09 UTC

[jira] [Created] (CASSANDRA-7203) Flush (and Compact) High Traffic Partitions Separately

Benedict created CASSANDRA-7203:
-----------------------------------

             Summary: Flush (and Compact) High Traffic Partitions Separately
                 Key: CASSANDRA-7203
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7203
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Benedict


An idea possibly worth exploring is the use of streaming count-min sketches, collecting data over the uptime of a server, to estimate the velocity of different partitions. High-volume partitions could then be flushed separately, on the assumption that they will be much fewer in number, thus reducing write amplification by permitting them to be compacted independently of any low-velocity data.
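
To make the idea concrete, here is a minimal sketch of a count-min sketch that tracks bytes written per partition key. It is an illustration only, not a proposed implementation: the class name PartitionVelocitySketch, its methods, the hash function, and the halve() ageing step (one assumed way of turning cumulative counts into something closer to a velocity) are all hypothetical and do not exist in Cassandra.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in Cassandra.
final class PartitionVelocitySketch {
    private final int width;
    private final long[][] counts; // counts[row][bucket]
    private final int[] seeds;     // one hash seed per row

    PartitionVelocitySketch(int depth, int width) {
        if (depth <= 0 || width <= 0) {
            throw new IllegalArgumentException("depth and width must be positive");
        }
        this.width = width;
        this.counts = new long[depth][width];
        this.seeds = new int[depth];
        for (int row = 0; row < depth; row++) {
            seeds[row] = 0x9E3779B9 * (row + 1);
        }
    }

    // Seeded FNV-1a-style hash, reduced to a bucket index in [0, width).
    private int bucket(byte[] key, int row) {
        int h = seeds[row] ^ 0x811C9DC5;
        for (byte b : key) {
            h = (h ^ (b & 0xFF)) * 16777619;
        }
        h ^= h >>> 16;
        return Math.floorMod(h, width);
    }

    // Add the size of one write to every row's bucket for this key.
    void record(String partitionKey, long bytesWritten) {
        if (bytesWritten < 0) {
            throw new IllegalArgumentException("bytesWritten must be non-negative");
        }
        byte[] key = partitionKey.getBytes(StandardCharsets.UTF_8);
        for (int row = 0; row < counts.length; row++) {
            counts[row][bucket(key, row)] += bytesWritten;
        }
    }

    // The minimum across rows never under-estimates the true total;
    // hash collisions can only inflate it.
    long estimate(String partitionKey) {
        byte[] key = partitionKey.getBytes(StandardCharsets.UTF_8);
        long min = Long.MAX_VALUE;
        for (int row = 0; row < counts.length; row++) {
            min = Math.min(min, counts[row][bucket(key, row)]);
        }
        return min;
    }

    // Assumption: periodically halving every counter ages out old traffic.
    void halve() {
        for (long[] row : counts) {
            for (int i = 0; i < row.length; i++) {
                row[i] /= 2;
            }
        }
    }

    // Assumed flush-time decision: route the partition to the "hot" flush
    // when its estimate reaches the given threshold.
    boolean isHighTraffic(String partitionKey, long thresholdBytes) {
        return estimate(partitionKey) >= thresholdBytes;
    }
}
```

At flush time, something like isHighTraffic(key, threshold) could decide which of two sstables a partition is written to. Because the sketch can only over-estimate, a collision may misclassify a cold partition as hot, but never the reverse.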

Whilst the idea is reasonably straightforward, it seems that the biggest problem here will be defining a success metric. Any workload following an exponential/Zipf/extreme distribution is likely to benefit from such an approach, but whether or not that would translate into real-world gains is another matter.
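
As a rough way to reason about how skewed a workload has to be, the following sketch computes the fraction of writes that land on the hottest partitions under an idealised Zipf model. The class and method names are hypothetical, and the model (write probability proportional to 1/rank^exponent) is an assumption rather than a measurement of any real workload.

```java
// Hypothetical helper; an idealised model, not a benchmark.
final class ZipfWriteShare {
    // Fraction of writes hitting the `top` hottest of `partitions` partitions,
    // assuming the write probability of rank r is proportional to 1 / r^exponent.
    static double topShare(int partitions, int top, double exponent) {
        if (partitions <= 0 || top < 0 || top > partitions) {
            throw new IllegalArgumentException("require 0 <= top <= partitions, partitions > 0");
        }
        double head = 0.0;
        double total = 0.0;
        for (int rank = 1; rank <= partitions; rank++) {
            double weight = 1.0 / Math.pow(rank, exponent);
            total += weight;
            if (rank <= top) {
                head += weight;
            }
        }
        return head / total;
    }
}
```

For example, ZipfWriteShare.topShare(1_000_000, 10_000, 1.0) comes out at roughly two thirds, i.e. under this model the hottest 1% of partitions would take about two thirds of the writes. This says nothing about the resulting reduction in write amplification, which is the part that still needs a metric.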



--
This message was sent by Atlassian JIRA
(v6.2#6252)