Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2013/01/24 09:41:14 UTC

[jira] [Updated] (CASSANDRA-5183) Improve cases where we purge tombstone on (minor) compaction

     [ https://issues.apache.org/jira/browse/CASSANDRA-5183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-5183:
----------------------------------------

    Fix Version/s: 1.2.2
    
> Improve cases where we purge tombstone on (minor) compaction
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-5183
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5183
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.2.2
>
>
> Currently, to be able to purge a tombstone, we check that the row it is part of is not present in a non-compacted sstable, as we should not remove a tombstone that may delete other columns in the non-compacted sstables.
> The (known) problem is that if you regularly update a row on which you've made deletes, tombstones may theoretically be kept forever unless you run a major compaction (which is bad and not even a possibility with leveled compaction).
> In practice, with wide rows and more precisely with time-series types of load, it is not unlikely that tombstones will be kept, if not forever, at least much longer than gcgrace.
> One way to improve on that would be to start storing the minTimestamp of sstables (like we keep the maxTimestamp). During compaction, on top of checking bloom filters, we would also check whether the max timestamp of what we're about to purge is smaller than the min timestamp of the non-compacted sstables. If it is, then whatever tombstone we are looking at cannot shadow something in the non-compacted sstables and we can purge it (that is, even if the row in question may have columns in those non-compacted sstables).
> Note that while this isn't perfect in theory, it has two things going for it:
> # this is cheap to check. We may even compute the min timestamp of the non-compacted sstables once at the beginning of the compaction and check that before looking at the BF, which may save a few intervalTree searches (if we do end up doing the intervalTree search, however, we might still want to recompute the min timestamp of the returned sstables, as this may be bigger than the min timestamp of all the non-compacted sstables).
> # both size-tiered and leveled compaction naturally tend to compact sstables holding data of roughly the same age, so this should work reasonably well.
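
The check proposed above could be sketched roughly as follows. This is only an illustration of the idea, not Cassandra's actual code: SSTableInfo, mayContainRow, minTimestampOf and canPurge are hypothetical names, a plain set of row keys stands in for the bloom filter and intervalTree lookup, and the gcgrace check is assumed to have already passed.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for per-sstable metadata; not Cassandra's real sstable class. */
final class SSTableInfo {
    final long minTimestamp;   // the newly stored value proposed in the ticket
    final Set<String> rowKeys; // stands in for the bloom filter / intervalTree lookup

    SSTableInfo(long minTimestamp, Set<String> rowKeys) {
        this.minTimestamp = minTimestamp;
        this.rowKeys = rowKeys;
    }

    boolean mayContainRow(String rowKey) {
        return rowKeys.contains(rowKey);
    }
}

final class TombstonePurgeSketch {
    /** Computed once at the start of a compaction over all non-compacted sstables. */
    static long minTimestampOf(List<SSTableInfo> sstables) {
        long min = Long.MAX_VALUE;
        for (SSTableInfo s : sstables) {
            min = Math.min(min, s.minTimestamp);
        }
        return min;
    }

    /**
     * Returns true if tombstones whose timestamps are all <= maxPurgedTimestamp
     * can be dropped for this row (gcgrace is assumed to be checked elsewhere).
     */
    static boolean canPurge(String rowKey, long maxPurgedTimestamp,
                            List<SSTableInfo> nonCompacted, long globalMinTimestamp) {
        // Cheap check first: nothing outside the compaction is old enough to be shadowed.
        if (maxPurgedTimestamp < globalMinTimestamp) {
            return true;
        }
        // Otherwise look only at the sstables that may contain the row, and
        // recompute the min timestamp over that (possibly smaller) subset.
        boolean overlaps = false;
        long overlappingMin = Long.MAX_VALUE;
        for (SSTableInfo s : nonCompacted) {
            if (s.mayContainRow(rowKey)) {
                overlaps = true;
                overlappingMin = Math.min(overlappingMin, s.minTimestamp);
            }
        }
        // No overlapping sstable: this is the purge rule that already exists today.
        return !overlaps || maxPurgedTimestamp < overlappingMin;
    }
}
```

The global minimum is passed in rather than recomputed per row to mirror the "compute once at the beginning of the compaction" suggestion; the per-row recomputation only happens when the cheap check fails.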

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira