Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2013/10/02 20:25:44 UTC
[jira] [Comment Edited] (CASSANDRA-6109) Consider coldness in STCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784245#comment-13784245 ]
Jonathan Ellis edited comment on CASSANDRA-6109 at 10/2/13 6:25 PM:
--------------------------------------------------------------------
I guess whether hotness or overlap is a more important criterion depends on your goal:
# prioritizing by hotness helps speed reads up more, especially when you have a lot of cold data sitting around
# prioritizing by overlap ratio reduces disk space and helps throw away obsolete cells faster
I was hoping to tackle #1 here, but maybe that needs a separate strategy a la CASSANDRA-5561.
For #2, CASSANDRA-5906 adds a HyperLogLog component that does a fantastic job of letting us estimate overlap ratios.
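To make the two goals concrete, here is a rough sketch (not Cassandra code) of how a candidate bucket of sstables might be scored under each one. Everything in it is assumed for illustration: the `SSTable` record, the `hotness` and `overlap_ratio` functions, and the `goal` values are hypothetical, and exact key sets stand in for the HyperLogLog cardinality estimates mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    # Hypothetical record; the real sstable metadata looks different.
    name: str
    keys: frozenset
    reads_per_sec: float


def hotness(tables):
    """Sum of reads per second per key over the bucket (goal #1)."""
    return sum(t.reads_per_sec / max(len(t.keys), 1) for t in tables)


def overlap_ratio(tables):
    """Fraction of keys that would be merged away (goal #2).

    Exact sets are used here; an estimator such as HyperLogLog would
    replace the union cardinality in practice.
    """
    total = sum(len(t.keys) for t in tables)
    if total == 0:
        return 0.0
    merged = len(frozenset().union(*(t.keys for t in tables)))
    return 1.0 - merged / total


def pick_bucket(buckets, goal):
    """Pick the bucket with the highest score for the chosen goal."""
    score = hotness if goal == "reads" else overlap_ratio
    return max(buckets, key=score)
```

With one bucket of heavily read, non-overlapping sstables and another of rarely read, fully overlapping ones, `goal="reads"` picks the former and any other goal picks the latter, which is the trade-off described in the two points above.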
was (Author: jbellis):
I guess whether hotness or overlap is a more important criterion depends on your goal:
# prioritizing by hotness helps speed reads up more, especially when you have a lot of cold data sitting around
# prioritizing by overlap ratio reduces disk space and helps throw away obsolete cells faster
I was hoping to tackle #1 here, but maybe that needs a separate strategy a la CASSANDRA-5560.
For #2, CASSANDRA-5906 adds a HyperLogLog component that does a fantastic job of letting us estimate overlap ratios.
> Consider coldness in STCS compaction
> ------------------------------------
>
> Key: CASSANDRA-6109
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6109
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Tyler Hobbs
> Fix For: 2.0.2
>
>
> I see two options:
> # Don't compact cold sstables at all
> # Compact cold sstables only if there is nothing more important to compact
> The latter is better if you have cold data that may become hot again... but it's confusing if you have a workload such that you can't keep up with *all* compaction, but you can keep up with hot sstables. (Compaction backlog stat becomes useless since we fall increasingly behind.)
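The two options in the description can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation proposed in the ticket: the function name, the `policy` values, the dict layout, and the use of a `min_threshold` of 4 sstables as the test for "something more important to compact" are all hypothetical.

```python
def select_candidates(sstables, cold_threshold, policy, min_threshold=4):
    """Choose sstables to compact, treating cold ones per `policy`.

    sstables: list of dicts with a "reads_per_sec" key (assumed layout).
    policy: "never" (option 1) or "fallback" (option 2); both names are
    made up for this sketch.
    """
    hot = [s for s in sstables if s["reads_per_sec"] >= cold_threshold]
    cold = [s for s in sstables if s["reads_per_sec"] < cold_threshold]

    # Hot sstables always win when there are enough of them to compact.
    if len(hot) >= min_threshold:
        return hot
    # Option 2: fall back to cold sstables only when no hot work exists.
    if policy == "fallback" and len(cold) >= min_threshold:
        return cold
    # Option 1, or nothing worth compacting: leave cold sstables alone.
    return []
```

Under the "fallback" policy, a node that never runs out of hot work never reaches the cold branch, which is the situation where a backlog count that includes cold sstables keeps growing.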
--
This message was sent by Atlassian JIRA
(v6.1#6144)