Posted to commits@cassandra.apache.org by "Björn Hegerfors (JIRA)" <ji...@apache.org> on 2015/02/05 21:54:40 UTC

[jira] [Commented] (CASSANDRA-7272) Add "Major" Compaction to LCS

    [ https://issues.apache.org/jira/browse/CASSANDRA-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307969#comment-14307969 ] 

Björn Hegerfors commented on CASSANDRA-7272:
--------------------------------------------

I don't understand why major compaction for STCS isn't already optimal. I can see why one might want to compact some but not all SSTables in a multi-tombstone compaction (CASSANDRA-7019), though DTCS should be a better fit for anyone wanting this. But if every single SSTable is being rewritten to disk anyway, why not write them all into one file? As far as I understand, the ultimate goal of STCS is to end up as a single SSTable. STCS only gets there the natural way once in a blue moon, but that is the optimal state for it to be in. Am I wrong?

The only explanation I can see for splitting the result of compacting all SSTables into fragments is if both of the following hold:
1. The fragments are partitioned smartly, for example into separate token ranges (à la LCS), timestamp ranges (à la DTCS) or clustering column ranges (which would be interesting), or a combination of these.
2. The structure upheld by the resulting fragments is not subsequently demolished by the running compaction strategy going about its usual business.
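
To make point 1 concrete, here is a minimal sketch of the token-range variant. It is a toy model, not Cassandra's compaction code: each SSTable is assumed to be a list of (token, timestamp, value) tuples sorted by token, and the name major_compact and the parameter max_keys_per_fragment are hypothetical. It merges all inputs, keeps the newest cell per token, and writes the result out as fragments whose token ranges do not overlap.

```python
import heapq
from itertools import groupby


def major_compact(sstables, max_keys_per_fragment):
    """Merge sorted runs and split the result into non-overlapping token ranges.

    Toy model (an assumption for illustration): every run is a list of
    (token, timestamp, value) tuples sorted by token.
    """
    # Merge all runs into one stream that is sorted by token.
    merged = heapq.merge(*sstables, key=lambda cell: cell[0])

    fragments, current = [], []
    # Cells with the same token are adjacent in the merged stream.
    for _token, cells in groupby(merged, key=lambda cell: cell[0]):
        # Keep only the most recently written cell for this token.
        current.append(max(cells, key=lambda cell: cell[1]))
        if len(current) == max_keys_per_fragment:
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)
    return fragments
```

Because the merged stream is sorted and each token is emitted once, every fragment covers a token range disjoint from the others. Whether that structure survives is exactly point 2: it only helps if the strategy that runs afterwards does not immediately recombine the fragments.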

> Add "Major" Compaction to LCS 
> ------------------------------
>
>                 Key: CASSANDRA-7272
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7272
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: T Jake Luciani
>            Assignee: Marcus Eriksson
>            Priority: Minor
>              Labels: compaction
>             Fix For: 3.0
>
>
> LCS has a number of minor issues (maybe major depending on your perspective).
> LCS is primarily used for wide rows so for instance when you repair data in LCS you end up with a copy of an entire repaired row in L0.  Over time if you repair you end up with multiple copies of a row in L0 - L5.  This can make predicting disk usage confusing.  
> Another issue is cleaning up tombstoned data.  If a tombstone lives in level 1 and data for the cell lives in level 5 the data will not be reclaimed from disk until the tombstone reaches level 5.
> I propose we add a "major" compaction for LCS that forces consolidation of data to level 5 to address these.


