
Posted to commits@cassandra.apache.org by "sankalp kohli (JIRA)" <ji...@apache.org> on 2014/05/31 00:02:01 UTC

[jira] [Created] (CASSANDRA-7331) Improve Droppable Tombstone compaction

sankalp kohli created CASSANDRA-7331:
----------------------------------------

Summary: Improve Droppable Tombstone compaction
Key: CASSANDRA-7331
URL: https://issues.apache.org/jira/browse/CASSANDRA-7331
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: sankalp kohli
Priority: Minor

I have been thinking about this idea, so I am creating a JIRA to discuss it.
Currently we compact sstables whose ratio of droppable tombstones exceeds a configurable threshold.
There is also CASSANDRA-7019, which proposes compactions involving multiple sstables from different levels, triggered by the same threshold.
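For reference, the current threshold-based selection can be modeled roughly as below. This is a simplified sketch, not Cassandra's implementation: the `SSTable` class and `tombstone_compaction_candidates` function are hypothetical, and the default of 0.2 is assumed to mirror the `tombstone_threshold` compaction option.

```python
from dataclasses import dataclass


# Hypothetical, simplified sstable model: only the counts the check needs.
@dataclass
class SSTable:
    name: str
    total_cells: int
    droppable_tombstones: int  # tombstones past gc_grace_seconds (assumed)

    def droppable_ratio(self) -> float:
        """Fraction of this sstable's cells that are droppable tombstones."""
        if self.total_cells == 0:
            return 0.0
        return self.droppable_tombstones / self.total_cells


def tombstone_compaction_candidates(sstables, threshold=0.2):
    """Return sstables whose droppable-tombstone ratio exceeds the threshold,
    highest ratio first. The 0.2 default is an assumption."""
    over = [s for s in sstables if s.droppable_ratio() > threshold]
    return sorted(over, key=lambda s: s.droppable_ratio(), reverse=True)
```

Note that this check only counts tombstones; it says nothing about whether they shadow any data elsewhere, which is the gap the proposal below targets.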

One way to pick better candidates would be to find out whether a tombstone can actually remove data in other sstables.
We could add a byte to each tombstone to track whether it has removed the data it was written to delete.
Every tombstone would start with a value of 0; each time it is compacted with other sstables and causes data to be deleted, the value would be incremented.
When there are multiple updates followed by a delete, the value can exceed 1, depending on how many updates came in before the delete.
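A minimal sketch of the proposed counter, assuming a toy model in which cells and tombstones are matched by key and compared by timestamp. `Tombstone`, `Cell`, `compact`, and the `shadowed` field are hypothetical names; the one-byte limit is modeled by saturating at 255, and treating a cell written at the same timestamp as the deletion as shadowed is an assumption.

```python
from dataclasses import dataclass

MAX_BYTE = 255  # proposal stores the counter in one byte; saturate, don't wrap


@dataclass
class Tombstone:
    key: str
    deleted_at: int   # deletion timestamp
    shadowed: int = 0  # hypothetical counter: data cells this tombstone removed


@dataclass
class Cell:
    key: str
    written_at: int


def compact(tombstones, cells):
    """Merge tombstones with cells coming from other sstables.

    A cell with the same key written at or before the deletion (assumed
    tie rule) is dropped, and that tombstone's counter is incremented.
    Returns the cells that survive.
    """
    by_key = {t.key: t for t in tombstones}
    survivors = []
    for cell in cells:
        t = by_key.get(cell.key)
        if t is not None and cell.written_at <= t.deleted_at:
            t.shadowed = min(t.shadowed + 1, MAX_BYTE)
        else:
            survivors.append(cell)
    return survivors
```

Saturating rather than wrapping keeps a tombstone that has removed many updates from ever appearing to have removed nothing.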

With these counters, we could look at the values stored in tombstones to find the sstable whose compaction would remove the most data. We could also add a per-sstable total that sums these values.
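The per-sstable total and candidate selection could be sketched as below. Ranking sstables by the largest summed counter is one possible reading of the proposal, not a settled design; `TombstoneSummary`, `SSTableStats`, and `best_candidate` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TombstoneSummary:
    shadowed: int  # hypothetical per-tombstone counter from the proposal


@dataclass
class SSTableStats:
    name: str
    tombstones: list = field(default_factory=list)

    @property
    def shadowed_total(self) -> int:
        """Hypothetical per-sstable number: sum of its tombstones' counters."""
        return sum(t.shadowed for t in self.tombstones)


def best_candidate(sstables):
    """Pick the sstable with the largest summed counter, or None if empty."""
    return max(sstables, key=lambda s: s.shadowed_total, default=None)
```

A real implementation would presumably keep the total in sstable metadata, updated when the sstable is written, rather than recomputing it by scanning every tombstone.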

I am not sure how this would work with range tombstones, or whether it would be useful.

--
This message was sent by Atlassian JIRA
(v6.2#6252)