Posted to user@cassandra.apache.org by DuyHai Doan <do...@gmail.com> on 2018/11/06 08:40:19 UTC

Tombstone removal optimization and question

Hello all

I have tried to sum up all rules related to tombstone removal:

----------------------------------------------------------------------------------

Consider a tombstone written at timestamp (t) for a partition key (P) in
SSTable (S1). This tombstone will be removed only when all of the following
conditions hold:

1) the gc_grace_seconds period has passed since the deletion
2) SSTable S1 is selected in a compaction round (not at all guaranteed
because compaction is not deterministic)
3) the partition key (P) is not present in any other SSTable that is NOT
picked by the current round of compaction
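
The three rules can be read as a single predicate. Here is a minimal sketch of that reading, not Cassandra's actual code: the SSTable and Tombstone classes, the function name and its parameters are all hypothetical, and times are plain integers in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    # Hypothetical model: an SSTable is just a name and the keys it holds.
    name: str
    partition_keys: frozenset


@dataclass(frozen=True)
class Tombstone:
    partition_key: str
    local_deletion_time: int  # when the delete was issued, in seconds


def can_drop_tombstone(tombstone, holder, compacting, all_sstables,
                       gc_grace_seconds, now):
    """Return True only if rules 1), 2) and 3) all hold."""
    # Rule 1: the gc_grace_seconds period has passed.
    if now - tombstone.local_deletion_time < gc_grace_seconds:
        return False
    # Rule 2: the SSTable holding the tombstone is part of this round.
    if holder not in compacting:
        return False
    # Rule 3: no SSTable outside this round contains the partition key.
    for sstable in all_sstables:
        if sstable not in compacting and \
                tombstone.partition_key in sstable.partition_keys:
            return False
    return True


s1 = SSTable("S1", frozenset({"P"}))
s2 = SSTable("S2", frozenset({"P"}))
tombstone = Tombstone("P", local_deletion_time=1_000)
later = 1_000 + 864_000 + 1  # just past a 10-day gc_grace_seconds

print(can_drop_tombstone(tombstone, s1, {s1, s2}, [s1, s2], 864_000, later))
print(can_drop_tombstone(tombstone, s1, {s1}, [s1, s2], 864_000, later))
```

The first call compacts S1 and S2 together, so rule 3) holds; the second leaves S2 out of the round, so the tombstone has to stay.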

Rule 3) is quite hard to understand, so here is a detailed explanation:

If Partition Key (P) also exists in another SSTable (S2) that is NOT
compacted together with SSTable (S1) and we remove the tombstone, data in S2
that the tombstone was shadowing may be resurrected.

More precisely, at compaction time Cassandra has no detail about the content
of Partition (P) in S2, so it cannot remove the tombstone right away.
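
To make the resurrection risk concrete, here is a toy last-write-wins read over two SSTables. It is only an illustration under simplifying assumptions: each SSTable is a dict mapping a partition key to a single (timestamp, value) pair, and TOMBSTONE and read() are hypothetical names.

```python
TOMBSTONE = object()  # marker standing in for a deletion


def read(partition_key, sstables):
    """Return the value with the highest timestamp, or None if deleted/absent."""
    newest = None
    for sstable in sstables:
        cell = sstable.get(partition_key)
        if cell is not None and (newest is None or cell[0] > newest[0]):
            newest = cell
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]


s1 = {"P": (20, TOMBSTONE)}    # tombstone written at t = 20
s2 = {"P": (10, "old value")}  # older data, NOT part of the compaction

# While the tombstone is kept, the delete wins and nothing is returned.
print(read("P", [s1, s2]))

# If compacting S1 alone dropped the tombstone, the old value comes back.
s1_without_tombstone = {}
print(read("P", [s1_without_tombstone, s2]))
```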

Now, for each SSTable, we have some metadata, namely minTimestamp and
maxTimestamp.

I wonder whether the current compaction code leverages this metadata for
tombstone removal. Indeed, if we know that the tombstone timestamp (t) is
lower than the minTimestamp of S2, everything in S2 is newer than the
tombstone, so it shadows nothing there and can be safely removed.
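
The timestamp check could look roughly like the sketch below. It is an illustration of the idea in this message, not Cassandra's implementation: SSTableMeta and both function names are hypothetical, and only the SSTables left out of the compaction round are considered. Rules 1) and 2) would still have to hold; the check only relaxes rule 3).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTableMeta:
    # Hypothetical per-SSTable metadata used by the check.
    name: str
    partition_keys: frozenset
    min_timestamp: int
    max_timestamp: int


def max_purgeable_timestamp(partition_key, non_compacting):
    """Smallest minTimestamp among outside SSTables that contain the key.

    Returns infinity when no outside SSTable contains the key.
    """
    candidates = [s.min_timestamp for s in non_compacting
                  if partition_key in s.partition_keys]
    return min(candidates, default=float("inf"))


def shadows_nothing_outside(tombstone_timestamp, partition_key, non_compacting):
    """True if every outside SSTable holding the key is newer than the tombstone."""
    return tombstone_timestamp < max_purgeable_timestamp(
        partition_key, non_compacting)


s2 = SSTableMeta("S2", frozenset({"P"}), min_timestamp=50, max_timestamp=90)

# Tombstone at t = 20 is older than everything in S2: safe to drop.
print(shadows_nothing_outside(20, "P", [s2]))
# Tombstone at t = 60 may shadow data written between 50 and 60: keep it.
print(shadows_nothing_outside(60, "P", [s2]))
```

With a check like this, a partition that keeps receiving new writes does not pin its old tombstones forever, because the newer SSTables have a minTimestamp above (t).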

Does anyone have this information?

Regards

Re: Tombstone removal optimization and question

Posted by DuyHai Doan <do...@gmail.com>.

Thanks for the confirmation, Kurt.

On 6 Nov 2018 at 11:59, "kurt greaves" <ku...@instaclustr.com> wrote:

> Yes, it does. If it didn't and you kept writing to the same partition,
> you'd never be able to remove any tombstones for that partition.

Re: Tombstone removal optimization and question

Posted by kurt greaves <ku...@instaclustr.com>.

Yes, it does. If it didn't and you kept writing to the same partition,
you'd never be able to remove any tombstones for that partition.
