Posted to user@cassandra.apache.org by Gabriel Giussi <ga...@gmail.com> on 2018/10/04 19:35:38 UTC

Re: TTL tombstones in Cassandra using LCS: are they created in the same level as the TTLed data?

Hello Alain,

thanks again for answering.

Yes, I believe during the next compaction following the expiration date,
> the entry is 'transformed' into a tombstone, and lives in the SSTable that
> is the result of the compaction, on the level/bucket this SSTable is put
> into.
>

Great. However, I'm still trying to figure out a way to test this or see
it in the code. If you have any idea, I could give it a try.
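For what it's worth, here is my current mental model of that "transformation",
as a toy sketch. This is not Cassandra's code; all names are made up, and it is
just a simplified model of how I understand a TTLed cell is treated at
compaction time: once the TTL expires the cell behaves like a tombstone, and
only once gc_grace_seconds has also passed can a compaction actually drop it.

```python
import time
from dataclasses import dataclass

GC_GRACE_SECONDS = 864000  # Cassandra's default gc_grace_seconds: 10 days


@dataclass
class Cell:
    """Toy model of a TTLed cell (field names made up, not Cassandra's)."""
    write_time: int  # seconds since epoch
    ttl: int         # seconds

    def expiration(self) -> int:
        # Roughly the cell's local deletion time: when it stops being live.
        return self.write_time + self.ttl

    def is_live(self, now: int) -> bool:
        return now < self.expiration()

    def is_droppable(self, now: int, gc_grace: int = GC_GRACE_SECONDS) -> bool:
        # After expiration the cell acts as a tombstone; it can only be
        # purged by a compaction once gc_grace has also elapsed.
        return now >= self.expiration() + gc_grace


def compact(cells, now, gc_grace=GC_GRACE_SECONDS):
    """Model of one compaction pass: keep live cells and not-yet-droppable
    tombstones, drop everything whose grace period has fully passed."""
    return [c for c in cells if not c.is_droppable(now, gc_grace)]


# A cell written 2 years ago with a 2-year TTL: expired, but still within
# gc_grace, so compaction keeps it (as a tombstone) instead of dropping it.
two_years = 2 * 365 * 24 * 3600
now = int(time.time())
cell = Cell(write_time=now - two_years, ttl=two_years)
assert not cell.is_live(now)
assert not cell.is_droppable(now)
assert compact([cell], now) == [cell]                  # survives as tombstone
assert compact([cell], now + GC_GRACE_SECONDS) == []   # purged after grace
```

The real purge check is more involved of course (the compaction must also make
sure, via bloom filters and overlap checks, that no other SSTable still holds
the shadowed data), but I believe this captures the expiration + gc_grace part.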

I didn't understand what you mean with

> generally, it's good if you can rotate the partitions over time, not to
> reuse old partitions for example
>

About garbagecollect: it is a good idea, but it is not available in version
3.0.13.
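So for reclaiming space I'd have to rely on alternative (3) from my original
question, the single-SSTable tombstone compaction. As a way to reason about it,
here is a toy model (my own simplification, not Cassandra's code) of the
droppable-tombstone ratio check that triggers it; LCS exposes the cutoff as the
'tombstone_threshold' compaction sub-property, which defaults to 0.2:

```python
TOMBSTONE_THRESHOLD = 0.2  # default 'tombstone_threshold' compaction option
GC_GRACE_SECONDS = 864000  # default gc_grace_seconds: 10 days


def droppable_ratio(deletion_times, now, gc_grace=GC_GRACE_SECONDS):
    """Estimate the fraction of an SSTable's cells whose tombstones are
    droppable, given the per-cell deletion times. Cassandra estimates this
    from SSTable metadata (a histogram of deletion times) without reading
    the data itself; a plain list stands in for that here."""
    if not deletion_times:
        return 0.0
    droppable = sum(1 for t in deletion_times if t + gc_grace <= now)
    return droppable / len(deletion_times)


def worth_tombstone_compaction(deletion_times, now,
                               threshold=TOMBSTONE_THRESHOLD):
    # When no other compaction is pending, a single-SSTable compaction is
    # triggered for SSTables whose estimated droppable ratio exceeds this.
    return droppable_ratio(deletion_times, now) > threshold


# Toy SSTable where 3 of 10 cells expired more than gc_grace ago:
now = 1_000_000_000
times = [now - GC_GRACE_SECONDS - 1] * 3 + [now] * 7
assert abs(droppable_ratio(times, now) - 0.3) < 1e-9
assert worth_tombstone_compaction(times, now)
```

If this model is right, an SSTable in a high level that is mostly TTLed data
would eventually cross the threshold on its own, without waiting for a normal
LCS compaction to reach it.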

Again, I've asked this on Stack Overflow (
https://stackoverflow.com/q/52370661/3517383) too, so, if you want, you can
answer there as well and I will mark it as correct.

Cheers.

On Thu, Sep 27, 2018 at 14:11, Alain RODRIGUEZ (<ar...@gmail.com>)
wrote:

> Hello Gabriel,
>
> Another clue to explore would be to use the TTL as a default value if
>> that's a good fit. TTLs set at the table level with 'default_time_to_live'
>> should not generate any tombstone at all in C* 3.0+. Not tested on my end,
>> but I read about this.
>>
>
> As explained on a parallel thread, this is wrong ^, mea culpa. I believe
> the rest of my comment still stands (hopefully :)).
>
> I'm not sure what is meant by "*in-place*", since SSTables are immutable.
>> [...]
>
>  My guess is that it refers to tombstones being created in the same
>> level (but in different SSTables) as the TTLed data during a compaction
>> triggered
>
>
> Yes, I believe during the next compaction following the expiration date,
> the entry is 'transformed' into a tombstone, and lives in the SSTable that
> is the result of the compaction, on the level/bucket this SSTable is put
> into. That's why I said 'in-place' which is indeed a bit weird for
> immutable data.
>
> As a side idea for your problem, on 'modern' versions of Cassandra (I
> don't remember the version, that's what 'modern' means ;-)), you can run
> 'nodetool garbagecollect' regularly (not necessarily frequently) during
> off-peak periods. That way you use cluster resources to reclaim some disk
> space when you don't otherwise need them. Also, making sure by design that
> a 2-year-old record is not being updated regularly would definitely help.
> In the extreme case of writing data once (never updated) and with a TTL,
> for example, I see no reason for 2-year-old data not to be evicted
> correctly. As long as the disk can grow, it should be fine.
>
> I would not be too scared about it, as there is 'always' a way to remove
> tombstones. Yet it's good to think about the design beforehand indeed;
> generally, it's good if you can rotate the partitions over time, and not
> reuse old partitions, for example.
>
> C*heers,
> -----------------------
> Alain Rodriguez - @arodream - alain@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Tue, Sep 25, 2018 at 17:38, Gabriel Giussi <ga...@gmail.com>
> wrote:
>
>> I'm using LCS and a relatively large TTL of 2 years for all inserted rows,
>> and I'm concerned about the moment at which C* would drop the corresponding
>> tombstones (neither explicit deletes nor updates are being performed).
>>
>> From [Missing Manual for Leveled Compaction Strategy](
>> https://www.youtube.com/watch?v=-5sNVvL8RwI), [Tombstone Compactions in
>> Cassandra](https://www.youtube.com/watch?v=pher-9jqqC4) and [Deletes
>> Without Tombstones or TTLs](https://www.youtube.com/watch?v=BhGkSnBZgJA)
>> I understand that
>>
>>  - All levels except L0 contain non-overlapping SSTables, but a partition
>> key may be present in one SSTable in each level (i.e. distributed across
>> all levels).
>>  - For a compaction to be able to drop a tombstone, it must be sure that
>> it is compacting all SSTables that contain the data, to prevent zombie
>> data (this is done by checking bloom filters). It also considers
>> gc_grace_seconds.
>>
>> So, for my particular use case (2-year TTL and a write-heavy load) I can
>> conclude that TTLed data will be in the highest levels, so I'm wondering
>> when those SSTables with TTLed data will be compacted with the SSTables
>> that contain the corresponding tombstones.
>> The main question is: **Where are tombstones (from TTLs) being created?
>> Are they created at Level 0, so that it will take a long time until they
>> end up in the highest levels (hence disk space will take a long time to
>> be freed)?**
>>
>> In a comment from [About deletes and tombstones](
>> http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html)
>> Alain says that
>> > Yet using TTLs helps: it reduces the chances of having data fragmented
>> between SSTables that will not be compacted together any time soon. Using
>> any compaction strategy, if the delete comes relatively late in the row
>> history, as tends to happen, the 'upsert'/'insert' of the tombstone will
>> go to a new SSTable. It might take time for this tombstone to get to the
>> right compaction "bucket" (with the rest of the row) and for Cassandra to
>> be able to finally free space.
>> **My understanding is that with TTLs the tombstone is created
>> in-place**, thus it is often and for many reasons easier and safer to get
>> rid of a TTL than of a delete.
>> Another clue to explore would be to use the TTL as a default value if
>> that's a good fit. TTLs set at the table level with 'default_time_to_live'
>> should not generate any tombstone at all in C* 3.0+. Not tested on my end,
>> but I read about this.
>>
>> I'm not sure what is meant by "*in-place*", since SSTables are
>> immutable.
>> (I also have some doubts about what it says of using
>> `default_time_to_live` that I've asked in [How default_time_to_live would
>> delete rows without tombstones in Cassandra?](
>> https://stackoverflow.com/q/52282517/3517383)).
>>
>> My guess is that it refers to tombstones being created in the same
>> level (but in different SSTables) as the TTLed data during a compaction
>> triggered by one of the following reasons:
>>
>>  1. "Going from highest level, any level having score higher than 1.001
>> can be picked by a compaction thread" [The Missing Manual for Leveled
>> Compaction Strategy](
>> https://image.slidesharecdn.com/csummit16lcstalk-161004232416/95/the-missing-manual-for-leveled-compaction-strategy-wei-deng-ryan-svihla-datastax-cassandra-summit-2016-12-638.jpg?cb=1475693117
>> )
>>  2. "If we go 25 rounds without compacting in the highest level, we start
>> bringing in sstables from that level into lower level compactions" [The
>> Missing Manual for Leveled Compaction Strategy](
>> https://image.slidesharecdn.com/csummit16lcstalk-161004232416/95/the-missing-manual-for-leveled-compaction-strategy-wei-deng-ryan-svihla-datastax-cassandra-summit-2016-12-638.jpg?cb=1475693117
>> )
>>  3. "When there are no other compactions to do, we trigger a
>> single-sstable compaction if there is more than X% droppable tombstones in
>> the sstable." [CASSANDRA-7019](
>> https://issues.apache.org/jira/browse/CASSANDRA-7019)
>> Since tombstones are created during compaction, I think it may be using
>> SSTable metadata to estimate droppable tombstones.
>>
>> **So, compactions (2) and (3) should be creating/dropping tombstones in
>> the highest levels, hence using LCS with a large TTL should not be an
>> issue per se.**
>> By creating/dropping I mean that these same kinds of compactions will be
>> creating tombstones for expired data and/or dropping tombstones if the
>> gc grace period has already passed.
>>
>> A link to source code that clarifies this situation will be great, thanks.
>>
>