Posted to commits@cassandra.apache.org by "Björn Hegerfors (JIRA)" <ji...@apache.org> on 2015/02/05 21:58:40 UTC
[jira] [Comment Edited] (CASSANDRA-7019) Improve tombstone compactions

    [ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307963#comment-14307963 ] 

Björn Hegerfors edited comment on CASSANDRA-7019 at 2/5/15 8:58 PM:
--------------------------------------------------------------------

I posted a related ticket some time ago, CASSANDRA-8359. In particular, the side note at the end of it is essentially this ticket, applied to DTCS. A solution to this ticket may or may not solve the main issue in that ticket, but that's a matter for that ticket.

Since DTCS SSTables are (supposed to be) separated into time windows, we have the concept of an _oldest_ SSTable in a way that we don't with STCS. To me it seems pretty clear that a multi-SSTable tombstone compaction on _n_ SSTables should always target the _n_ oldest ones. The oldest one alone is practically guaranteed to overlap with any other SSTable, in terms of tokens. So picking the right SSTables for multi-tombstone compaction should be as easy as sorting by age (min timestamp), taking the oldest one, and including the newer ones in succession, checking at which point the tombstone ratio is the highest. Or something close to that, anyway. Then we might as well write them back as a single SSTable; I don't see why not.
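
To make the selection idea above concrete, here is a rough sketch in Python (not Cassandra code). Everything in it is an assumption for illustration: the SSTableStats class, its fields, and the pick_oldest_prefix function are hypothetical names, and the combined tombstone ratio is approximated by simply summing per-SSTable droppable-tombstone estimates, which ignores what actually becomes droppable once overlapping SSTables are compacted together.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SSTableStats:
    """Hypothetical per-SSTable summary; not Cassandra's metadata class."""
    name: str
    min_timestamp: int         # oldest write timestamp in the SSTable
    total_cells: int           # assumed size measure
    droppable_tombstones: int  # assumed estimate of droppable tombstones


def pick_oldest_prefix(sstables: Sequence[SSTableStats]) -> List[SSTableStats]:
    """Sort by age and return the oldest-first prefix whose combined
    tombstone ratio is highest (ties keep the shorter prefix)."""
    by_age = sorted(sstables, key=lambda s: s.min_timestamp)
    best_len, best_ratio = 0, -1.0
    cells = tombstones = 0
    for count, sstable in enumerate(by_age, start=1):
        # Grow the candidate set by the next-oldest SSTable.
        cells += sstable.total_cells
        tombstones += sstable.droppable_tombstones
        ratio = tombstones / cells if cells else 0.0
        if ratio > best_ratio:
            best_len, best_ratio = count, ratio
    return by_age[:best_len]


candidates = [
    SSTableStats("c", min_timestamp=300, total_cells=1000, droppable_tombstones=50),
    SSTableStats("a", min_timestamp=100, total_cells=1000, droppable_tombstones=100),
    SSTableStats("b", min_timestamp=200, total_cells=1000, droppable_tombstones=500),
]
chosen = pick_oldest_prefix(candidates)
print([s.name for s in chosen])
```

With these made-up numbers the cumulative ratios are 0.10, 0.30 and about 0.22, so the sketch picks the two oldest SSTables. A real implementation would presumably need a better estimate of the combined ratio than a plain sum.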

EDIT: moved all of the below to CASSANDRA-7272, where it belongs.

-As for the STCS case, I don't understand why major compaction for STCS isn't already optimal. I do see why one might want to compact some but not all SSTables in a multi-tombstone compaction (though DTCS should be a better fit for anyone wanting this). But if every single SSTable is being rewritten to disk, why not write them into one file? As far as I understand, the ultimate goal of STCS is to be one SSTable. STCS only gets there, the natural way, once in a blue moon. But that's the most optimal state that it can be in. Am I wrong?-

-The only explanation I can see for splitting the result of compacting all SSTables into fragments, is if those fragments are:-
-1. Partitioned smartly. For example into separate token ranges (à la LCS), timestamp ranges (à la DTCS) or clustering column ranges (which would be interesting). Or a combination of these.-
-2. The structure upheld by the resulting fragments is not subsequently demolished by the running compaction strategy going on with its usual business.-


was (Author: bj0rn):
I posted a related ticket some time ago, CASSANDRA-8359. In particular, the side note at the end of it is essentially this ticket, applied to DTCS. A solution to this ticket may or may not solve the main issue in that ticket, but that's a matter for that ticket.

Since DTCS SSTables are (supposed to be) separated into time windows, we have the concept of an _oldest_ SSTable in a way that we don't with STCS. To me it seems pretty clear that a multi-SSTable tombstone compaction on _n_ SSTables should always target the _n_ oldest ones. The oldest one alone is practically guaranteed to overlap with any other SSTable, in terms of tokens. So picking the right SSTables for multi-tombstone compaction should be as easy as sorting by age (min timestamp), taking the oldest one, and including the newer ones in succession, checking at which point the tombstone ratio is the highest. Or something close to that, anyway. Then we might as well write them back as a single SSTable; I don't see why not.

As for the STCS case, I don't understand why major compaction for STCS isn't already optimal. I do see why one might want to compact some but not all SSTables in a multi-tombstone compaction (though DTCS should be a better fit for anyone wanting this). But if every single SSTable is being rewritten to disk, why not write them into one file? As far as I understand, the ultimate goal of STCS is to be one SSTable. STCS only gets there, the natural way, once in a blue moon. But that's the most optimal state that it can be in. Am I wrong?

The only explanation I can see for splitting the result of compacting all SSTables into fragments, is if those fragments are:
1. Partitioned smartly. For example into separate token ranges (à la LCS), timestamp ranges (à la DTCS) or clustering column ranges (which would be interesting). Or a combination of these.
2. The structure upheld by the resulting fragments is not subsequently demolished by the running compaction strategy going on with its usual business.

> Improve tombstone compactions
> -----------------------------
>
>                 Key: CASSANDRA-7019
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7019
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Assignee: Branimir Lambov
>              Labels: compaction
>             Fix For: 3.0
>
>
> When there are no other compactions to do, we trigger a single-sstable compaction if there is more than X% droppable tombstones in the sstable.
> In this ticket we should try to include overlapping sstables in those compactions to be able to actually drop the tombstones. Might only be doable with LCS (with STCS we would probably end up including all sstables)


