Posted to commits@cassandra.apache.org by "Aleksey Yeschenko (JIRA)" <ji...@apache.org> on 2016/03/18 22:00:34 UTC
[jira] [Resolved] (CASSANDRA-6909) A way to expire columns without converting to tombstones

     [ https://issues.apache.org/jira/browse/CASSANDRA-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko resolved CASSANDRA-6909.
------------------------------------------
    Resolution: Duplicate

CASSANDRA-5546 is going to mostly address the problem from a different angle - closing the ticket as a dup of that.

> A way to expire columns without converting to tombstones
> --------------------------------------------------------
>
>                 Key: CASSANDRA-6909
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6909
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Bartłomiej Romański
>
> Imagine the following scenario. 
> - You need to store some data knowing that you will need it only for a limited time (say 7 days).
> - After that you just don't care. You don't need it to be returned by queries, but if it is returned that's not a problem at all - you won't look at it anyway.
> - Your records are small. Row keys and column names are even longer than the actual values (e.g. int values vs. string names).
> - You reuse rows. You add some new columns to most of the rows every day or two. This means that columns expire often, but rows usually don't.
> - You generate a lot of data and want to make sure that expired records do not consume disk space for too long.
> The current TTL feature does not handle that situation well. When compaction finally decides that it's worth compacting a given sstable, it won't simply get rid of expired columns. Instead, it transforms them into tombstones. With small values, that's no saving at all.
> Even if you set the grace period to 0, tombstones cannot be removed early, because some other sstable can still hold values that should be "covered" by this tombstone. 
> You can get rid of tombstone only in two cases:
> - it's a major compaction (never happens with LCS, requires a lot of space in STCS)
> - bloom filters tell you that there are no other sstables with this row key
> The second option is uncommon if you usually have multiple columns in a single row that were not all written at once. There's a good chance your row will be spread across multiple sstables, and from time to time new ones are generated. There's very little chance they'll all meet in one compaction at some point. 
> What's funny is that bloom filters return true if there's a tombstone for the given row in the given sstable. So you won't remove tombstones during compaction, because there's some other tombstone in another sstable for that row :/
> After a while, you end up with a lot of tombstones (majority of your data) and can do nothing about that.
> Now imagine that Cassandra knows we just don't care about data older than 7 days. 
> Firstly, it can simply drop such columns during compactions (without converting them to tombstones or anything like that).
> Secondly, if it detects an sstable older than 7 days, it can safely remove it entirely (it cannot contain any active data).
> These two *guarantee* that your data will be removed after 14 days (2xTTL). If a compaction runs after 7 days, the expired data is removed. If not, the whole sstable will be removed after another 7 days.
> That's what I expected from CASSANDRA-3974, but it turned out to be just a trivial, frontend feature. 
> I suggest rethinking this mechanism. I don't believe it's a common scenario that someone who sets a TTL for a whole CF needs all these strong guarantees that data will not reappear in the future in case of consistency issues (that's why we need this whole mess with tombstones). 
> I believe the common case with a per-CF TTL is that you just want an efficient way to recover your disk space (and improve read performance by having fewer sstables and less data in general).
> To work around this, we currently stop Cassandra periodically, simply remove sstables that are too old, and start it again. This works OK, but does not solve the problem fully (if a tombstone is rewritten by compactions often, we will never remove it).
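
The two tombstone-purge conditions listed in the description (a major compaction, or bloom filters reporting that no other sstable has the row) can be modelled in a few lines. This is a minimal Python sketch, not Cassandra's code: `SSTable`, `might_contain` and `can_purge_tombstone` are hypothetical names, the bloom filter is modelled as an exact set of row keys, and gc_grace_seconds is assumed to be 0 as in the report. Cassandra's real purge logic has further checks that this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    """Hypothetical stand-in for an sstable: a name plus the row keys it holds.

    `row_keys` plays the role of the bloom filter. A real bloom filter can also
    give false positives, which only makes purging rarer. A row key counts as
    present even if the sstable holds nothing but a tombstone for that row.
    """
    name: str
    row_keys: set = field(default_factory=set)

    def might_contain(self, key: str) -> bool:
        return key in self.row_keys


def can_purge_tombstone(key, compacting, all_sstables, is_major):
    """Return True if a tombstone for `key` may be dropped (grace period assumed 0)."""
    if is_major:
        # A major compaction includes every sstable, so nothing is left to shadow.
        return True
    compacting_names = {s.name for s in compacting}
    # Any sstable outside this compaction that may hold the row blocks the purge.
    return not any(
        s.might_contain(key)
        for s in all_sstables
        if s.name not in compacting_names
    )
```

For example, if sstables `a` and `b` both contain `row1` and only `a` is being compacted, `can_purge_tombstone("row1", [a], [a, b], False)` is False even when `b` holds nothing but an older tombstone for that row, which is the trap the description points out.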

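The reporter's proposal (drop expired columns outright during compaction, and delete any sstable older than the TTL) can be sketched the same way. Again this is a hypothetical Python model, not Cassandra's implementation: `Cell`, `SSTable`, `compact` and `droppable` are illustrative names, a single table-wide `ttl` is assumed, and the consistency concerns that tombstones exist to handle are deliberately ignored, as the report proposes.

```python
from dataclasses import dataclass

DAY = 86_400  # seconds per day


@dataclass
class Cell:
    value: int
    written_at: int  # write timestamp, in seconds


@dataclass
class SSTable:
    created_at: int  # flush/compaction time; every cell was written at or before it
    cells: list


def compact(sstables, now, ttl):
    """Merge sstables, silently dropping expired cells (no tombstones are written)."""
    live = [c for s in sstables for c in s.cells if now < c.written_at + ttl]
    return SSTable(created_at=now, cells=live)


def droppable(sstable, now, ttl):
    """An sstable older than the TTL cannot contain any live cell."""
    return now >= sstable.created_at + ttl
```

A cell written at time t expires at t + ttl. Any compaction that runs after that drops it; if none runs, the sstable holding it was created no later than t + ttl, so `droppable` returns True by t + 2 * ttl at the latest, which is the 14-day (2xTTL) bound from the description.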


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)