
Posted to user@cassandra.apache.org by Gabriel Giussi <ga...@gmail.com> on 2018/09/25 15:37:42 UTC

Are TTL tombstones in Cassandra using LCS created in the same level as the TTLed data?

I'm using LCS and a relatively large TTL of 2 years for all inserted rows,
and I'm concerned about the moment at which C* would drop the corresponding
tombstones (neither explicit deletes nor updates are being performed).

From [Missing Manual for Leveled Compaction Strategy](
https://www.youtube.com/watch?v=-5sNVvL8RwI), [Tombstone Compactions in
Cassandra](https://www.youtube.com/watch?v=pher-9jqqC4) and [Deletes
Without Tombstones or TTLs](https://www.youtube.com/watch?v=BhGkSnBZgJA) I
understand that

 - All levels except L0 contain non-overlapping SSTables, but a partition
key may be present in one SSTable in each level (i.e. spread across all
levels).
 - For a compaction to be able to drop a tombstone, it must be sure that it
is compacting all SSTables that contain the data, to prevent zombie data
(this is done by checking bloom filters). It also considers gc_grace_seconds.
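
The second point can be illustrated with a minimal sketch. It is a toy model, not Cassandra's code: `SSTable`, `might_contain` and `can_drop_tombstone` are hypothetical names, and a plain set of keys stands in for the bloom filter (so it never gives false positives, unlike a real one).

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class SSTable:
    name: str
    level: int
    keys: FrozenSet[str]  # stand-in for a bloom filter (assumption)

    def might_contain(self, key: str) -> bool:
        return key in self.keys


def can_drop_tombstone(
    key: str,
    deleted_at: int,
    now: int,
    gc_grace_seconds: int,
    compacting: Sequence[SSTable],
    all_sstables: Sequence[SSTable],
) -> bool:
    """Toy rule: drop only after gc_grace_seconds and only if no SSTable
    outside this compaction might still hold data for the key."""
    if now < deleted_at + gc_grace_seconds:
        return False
    compacting_names = {s.name for s in compacting}
    return not any(
        s.might_contain(key)
        for s in all_sstables
        if s.name not in compacting_names
    )


# The same partition key lives in one SSTable of L1 and one of L3.
l1 = SSTable("l1-a", 1, frozenset({"k1"}))
l3 = SSTable("l3-a", 3, frozenset({"k1", "k2"}))

print(can_drop_tombstone("k1", 0, 1000, 100, [l1], [l1, l3]))      # L3 not included
print(can_drop_tombstone("k1", 0, 1000, 100, [l1, l3], [l1, l3]))  # both included
```

In this model the tombstone survives a compaction of the L1 SSTable alone, because the L3 SSTable might still contain older data for the same key.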

So, for my particular use case (2-year TTL and a write-heavy load) I can
conclude that TTLed data will be in the highest levels, so I'm wondering when
those SSTables with TTLed data will be compacted with the SSTables that
contain the corresponding tombstones.
The main question is: **Where are tombstones (from TTLs) created? Are they
created at Level 0, so that it will take a long time until they end up in
the highest levels (hence disk space will take a long time to be freed)?**

In a comment from [About deletes and tombstones](
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html)
Alain says that
> Yet using TTLs helps, it reduces the chances of having data being
fragmented between SSTables that will not be compacted together any time
soon. Using any compaction strategy, if the delete comes relatively late in
the row history, as it use to happen, the 'upsert'/'insert' of the
tombstone will go to a new SSTable. It might take time for this tombstone
to get to the right compaction "bucket" (with the rest of the row) and for
Cassandra to be able to finally free space.
**My understanding is that with TTLs the tombstones is created in-place**,
thus it is often and for many reasons easier and safer to get rid of a TTLs
than from a delete.
Another clue to explore would be to use the TTL as a default value if
that's a good fit. TTLs set at the table level with 'default_time_to_live'
should not generate any tombstone at all in C*3.0+. Not tested on my hand,
but I read about this.

I'm not sure what "*in-place*" means here, since SSTables are immutable.
(I also have some doubts about what it says about using `default_time_to_live`,
which I've asked about in [How default_time_to_live would delete rows without
tombstones in Cassandra?](https://stackoverflow.com/q/52282517/3517383)).

My guess is that it refers to tombstones being created in the same level
(but in different SSTables) as the TTLed data, during a compaction triggered
by one of the following reasons:

 1. "Going from highest level, any level having score higher than 1.001 can
be picked by a compaction thread" [The Missing Manual for Leveled
Compaction Strategy](
https://image.slidesharecdn.com/csummit16lcstalk-161004232416/95/the-missing-manual-for-leveled-compaction-strategy-wei-deng-ryan-svihla-datastax-cassandra-summit-2016-12-638.jpg?cb=1475693117
)
 2. "If we go 25 rounds without compacting in the highest level, we start
bringing in sstables from that level into lower level compactions" [The
Missing Manual for Leveled Compaction Strategy](
https://image.slidesharecdn.com/csummit16lcstalk-161004232416/95/the-missing-manual-for-leveled-compaction-strategy-wei-deng-ryan-svihla-datastax-cassandra-summit-2016-12-638.jpg?cb=1475693117
)
 3. "When there are no other compactions to do, we trigger a single-sstable
compaction if there is more than X% droppable tombstones in the sstable."
[CASSANDRA-7019](https://issues.apache.org/jira/browse/CASSANDRA-7019)
Since tombstones are created during compaction, I think it may be using
SSTable metadata to estimate droppable tombstones.
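
Reason (1) can be sketched as follows. This is a rough model under stated assumptions, not the actual LCS implementation: the score is taken to be the bytes in a level divided by that level's target size, the target size is assumed to be `fanout ** level` times the SSTable size, and L0 (which is handled differently) is skipped. The function names, the fanout of 10 and the 160 MiB SSTable size in the example are illustrative choices.

```python
from typing import Optional, Sequence

MIB = 1024 * 1024


def max_bytes_for_level(level: int, sstable_size_bytes: int, fanout: int = 10) -> int:
    """Assumed target size of a level >= 1: fanout^level SSTables."""
    return (fanout ** level) * sstable_size_bytes


def level_score(level: int, level_bytes: int, sstable_size_bytes: int) -> float:
    return level_bytes / max_bytes_for_level(level, sstable_size_bytes)


def pick_level(
    bytes_per_level: Sequence[int],
    sstable_size_bytes: int,
    threshold: float = 1.001,
) -> Optional[int]:
    """Walk from the highest level down to L1 and return the first level
    whose score exceeds the threshold, or None if no level qualifies."""
    for level in range(len(bytes_per_level) - 1, 0, -1):
        if level_score(level, bytes_per_level[level], sstable_size_bytes) > threshold:
            return level
    return None


# L1 holds 2000 MiB against an assumed target of 1600 MiB; L2 is under target.
print(pick_level([0, 2000 * MIB, 10000 * MIB], 160 * MIB))
```

Because the walk starts at the highest level, an over-full high level is picked before an over-full low one.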

**So, compactions (2) and (3) should be creating/dropping tombstones in the
highest levels, hence using LCS with a large TTL should not be an issue per
se.**
By creating/dropping I mean that the same kind of compaction will be
creating tombstones for expired data and/or dropping tombstones if the
gc_grace_seconds period has already passed.
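
The single-SSTable trigger in (3) could work roughly like the sketch below. It assumes that SSTable metadata records an expiration or deletion time per cell, so that expired TTL data counts as droppable even before it has been rewritten as a tombstone. The function names are hypothetical; the defaults mirror the `tombstone_threshold` (0.2) and `tombstone_compaction_interval` (one day) compaction options.

```python
from typing import Sequence


def estimated_droppable_ratio(
    deletion_times: Sequence[int],
    total_cells: int,
    now: int,
    gc_grace_seconds: int,
) -> float:
    """Fraction of cells whose expiration/deletion time is older than
    now - gc_grace_seconds (a simplification of a metadata histogram)."""
    if total_cells == 0:
        return 0.0
    gc_before = now - gc_grace_seconds
    droppable = sum(1 for t in deletion_times if t < gc_before)
    return droppable / total_cells


def wants_tombstone_compaction(
    ratio: float,
    sstable_age_seconds: int,
    tombstone_threshold: float = 0.2,
    tombstone_compaction_interval: int = 86400,
) -> bool:
    """Eligible only if the SSTable is old enough and the ratio is high enough."""
    return (
        sstable_age_seconds >= tombstone_compaction_interval
        and ratio > tombstone_threshold
    )


# Three of four cells expired more than gc_grace_seconds ago.
ratio = estimated_droppable_ratio([100, 200, 300, 10_000], 4, now=1000, gc_grace_seconds=500)
print(ratio, wants_tombstone_compaction(ratio, sstable_age_seconds=90_000))
```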

A link to source code that clarifies this situation would be great, thanks.

Re: Are TTL tombstones in Cassandra using LCS created in the same level as the TTLed data?

Posted by Gabriel Giussi <ga...@gmail.com>.

Hello Alain,

thanks again for answering.

> Yes, I believe during the next compaction following the expiration date,
> the entry is 'transformed' into a tombstone, and lives in the SSTable that
> is the result of the compaction, on the level/bucket this SSTable is put
> into.
>

Great, however I'm still trying to figure out a way to test this or see
it in the code. If you have any ideas, I could give them a try.
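
One possible way to observe this on a test node is to compare the level and the estimated droppable tombstones that the `sstablemetadata` tool reports for each SSTable before and after a compaction. The sketch below only parses such a report. The field labels are assumed to match the tool's output in the version in use, and the sample text is made up for illustration, not taken from a real node.

```python
import re
from typing import Dict, Union

Number = Union[int, float]


def parse_sstable_metadata(text: str) -> Dict[str, Number]:
    """Extract the SSTable level and droppable-tombstone estimate from a
    metadata report, if the (assumed) labels are present."""
    fields: Dict[str, Number] = {}
    level = re.search(r"^SSTable Level:\s*(\d+)\s*$", text, re.MULTILINE)
    if level:
        fields["level"] = int(level.group(1))
    droppable = re.search(
        r"^Estimated droppable tombstones:\s*([0-9.eE+-]+)\s*$", text, re.MULTILINE
    )
    if droppable:
        fields["droppable_tombstones"] = float(droppable.group(1))
    return fields


# Hypothetical report excerpt, for illustration only.
sample = """\
Estimated droppable tombstones: 0.25
SSTable Level: 3
"""

print(parse_sstable_metadata(sample))
```

If an SSTable in a high level shows a non-zero estimate that drops after the next compaction touching it, that would be consistent with the behaviour described in the quote.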

I didn't understand what you mean by

> generally, it's good if you can rotate the partitions over time, not to
> reuse old partitions for example
>

About garbagecollect, it is a good idea, but it is not available in version
3.0.13.

Again, I've asked this on Stack Overflow too (
https://stackoverflow.com/q/52370661/3517383), so, if you want,
you can answer there as well and I will mark it as correct.

Cheers.


Re: Are TTL tombstones in Cassandra using LCS created in the same level as the TTLed data?

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

Hello Gabriel,

> Another clue to explore would be to use the TTL as a default value if
> that's a good fit. TTLs set at the table level with 'default_time_to_live'
> should not generate any tombstone at all in C*3.0+. Not tested on my hand,
> but I read about this.
>

As explained in a parallel thread, the statement quoted above is wrong, mea
culpa. I believe the rest of my comment still stands (hopefully :)).

> I'm not sure what it means with "*in-place*" since SSTables are immutable.
> [...]

> My guess is that is referring to tombstones being created in the same
> level (but different SStables) that the TTLed data during a compaction
> triggered


Yes, I believe during the next compaction following the expiration date,
the entry is 'transformed' into a tombstone, and lives in the SSTable that
is the result of the compaction, on the level/bucket this SSTable is put
into. That's why I said 'in-place' which is indeed a bit weird for
immutable data.
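
A minimal sketch of that 'transformation', as a toy model rather than Cassandra's code: `Cell` and `compact` are hypothetical names, the tombstone's deletion time is simply set to the expiration time (the real bookkeeping is more subtle and is not modeled), and the compaction is assumed to include every SSTable that could hold older data for the key.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    key: str
    value: Optional[str]               # None means the cell is a tombstone
    expires_at: Optional[int] = None   # None means no TTL
    deleted_at: Optional[int] = None   # set only on tombstones


def compact(cells: List[Cell], now: int, gc_grace_seconds: int) -> List[Cell]:
    """Rewrite cells into a new (immutable) output: expired cells become
    tombstones, and tombstones older than gc_grace_seconds are dropped."""
    out: List[Cell] = []
    for cell in cells:
        if cell.value is not None and cell.expires_at is not None and cell.expires_at <= now:
            # Expired TTL data: written to the output as a tombstone.
            cell = Cell(cell.key, None, None, deleted_at=cell.expires_at)
        if cell.value is None:
            if cell.deleted_at + gc_grace_seconds <= now:
                continue  # past the grace period: nothing is written out
        out.append(cell)
    return out


cells = [Cell("a", "v", expires_at=100), Cell("b", "v", expires_at=1000), Cell("c", "v")]
first = compact(cells, now=150, gc_grace_seconds=100)    # "a" becomes a tombstone
second = compact(first, now=250, gc_grace_seconds=100)   # the tombstone is dropped
print([c.key for c in first], [c.key for c in second])
```

The tombstone never exists as a separate write: it only appears in the output of the compaction that read the expired cell, which is why it ends up in whatever level that output is placed in.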

As a side idea for your problem, on 'modern' versions of Cassandra (I don't
remember the version, that's what 'modern' means ;-)), you can run
'nodetool garbagecollect' regularly (not necessarily frequently) during the
off-peak period. That would use the cluster resources when you don't need
them to reclaim some disk space. Also, making sure by design that a
2-year-old record is not being updated regularly would definitely help. In
the extreme case of writing data once (never updated) with a TTL, for
example, I see no reason for 2-year-old data not to be evicted
correctly. As long as the disk can grow, it should be fine.

I would not be too scared about it, as there is 'always' a way to
remove tombstones. Yet it is indeed good to think about the design
beforehand. Generally, it's good if you can rotate the partitions over time,
so as not to reuse old partitions, for example.
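
One common reading of 'rotating partitions' is to put a time bucket in the partition key, so that new writes always go to new partitions and old partitions expire as a whole. The sketch below assumes that reading. `sensor_id` and the monthly bucket are hypothetical choices for illustration, not something taken from the table discussed in this thread.

```python
from datetime import datetime, timezone
from typing import Tuple


def bucketed_partition_key(sensor_id: str, event_time: datetime) -> Tuple[str, str]:
    """Build a composite partition key (sensor_id, 'YYYY-MM') so that each
    month's writes land in a fresh partition."""
    bucket = event_time.astimezone(timezone.utc).strftime("%Y-%m")
    return (sensor_id, bucket)


september = bucketed_partition_key("s1", datetime(2018, 9, 25, 15, 37, tzinfo=timezone.utc))
october = bucketed_partition_key("s1", datetime(2018, 10, 1, 0, 0, tzinfo=timezone.utc))
print(september, october)
```

With such a key, a partition written in September 2018 receives no further writes once the month is over, so its expired data is less likely to be spread across SSTables in many levels.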

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
