Posted to commits@cassandra.apache.org by "Jeff Jirsa (JIRA)" <ji...@apache.org> on 2015/06/15 03:20:00 UTC

[jira] [Created] (CASSANDRA-9597) DTCS should consider file SIZE in addition to time windowing

Jeff Jirsa created CASSANDRA-9597:
-------------------------------------

             Summary: DTCS should consider file SIZE in addition to time windowing
                 Key: CASSANDRA-9597
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9597
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Jeff Jirsa
            Priority: Minor


DTCS seems to work well for the typical use case - writing data in perfect time order, compacting recent files, and ignoring older files.

However, there are "normal" operational actions where DTCS will fall behind and is unlikely to recover.

An example of this is streaming operations (for example, bootstrap or loading data into a cluster using sstableloader), where lots (tens of thousands) of very small sstables can be created spanning multiple time buckets. In these cases, even if max_sstable_age_days is extended to allow the older incoming files to be compacted, the selection logic is likely to re-compact the same large files together with only a few small files, over and over. It would be better to prioritize selection of the max_threshold smallest files, so that the number of candidate sstables decreases as quickly as possible.
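
To illustrate the proposed behavior, here is a minimal sketch in Python. It is not Cassandra's actual selection code: the SSTable tuple, pick_smallest, bytes_rewritten, and the sample bucket are all hypothetical names and data made up for this example, and the value 32 for max_threshold is assumed. The sketch only shows that choosing the max_threshold smallest files in a bucket removes the same number of candidates while rewriting far less data than a selection that includes a large file.

```python
from typing import List, NamedTuple


class SSTable(NamedTuple):
    """Hypothetical stand-in for an sstable: just a name and an on-disk size."""
    name: str
    size_bytes: int


def pick_smallest(candidates: List[SSTable], max_threshold: int = 32) -> List[SSTable]:
    """Return up to max_threshold of the smallest candidates, smallest first."""
    if max_threshold < 2:
        raise ValueError("max_threshold must be at least 2")
    return sorted(candidates, key=lambda s: s.size_bytes)[:max_threshold]


def bytes_rewritten(selection: List[SSTable]) -> int:
    """Total bytes a compaction of this selection would have to read and rewrite."""
    return sum(s.size_bytes for s in selection)


# Hypothetical time bucket after streaming: one large sstable plus many tiny ones.
bucket = [SSTable("big", 50_000_000_000)]
bucket += [SSTable(f"small-{i}", 1_000_000) for i in range(100)]

# Size-aware selection: the 32 smallest files, which leaves "big" untouched.
chosen = pick_smallest(bucket, max_threshold=32)

# For comparison, a selection that drags the large file along with 31 small ones.
with_big = [bucket[0]] + bucket[1:32]

print(len(chosen), bytes_rewritten(chosen))
print(len(with_big), bytes_rewritten(with_big))
```

Both selections merge 32 files into one, but in this made-up bucket the size-aware selection rewrites only the small files, while the other rewrites the large file again on every pass.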





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)