Posted to user@cassandra.apache.org by Adarsh Kumar <ad...@gmail.com> on 2019/10/18 05:27:08 UTC

TWCS and gc_grace_seconds

Hi,

We have a use case of time series data with TTL where we want to use
TimeWindowCompactionStrategy because of its better management of TTL and
tombstones. In this case, our data is frequently deleted, so we want to
reduce gc_grace_seconds to shorten the tombstones' life and reduce pressure
on storage. I have the following questions:

   1. Do we always need to run repair for a table with reduced
   gc_grace_seconds, or is there any other way to manage repairs in this case?
   2. Do we have any other strategy (or combination of strategies) to
   manage frequently deleted time-series data?

Thanks in advance.

Adarsh Kumar

Re: TWCS and gc_grace_seconds

Posted by Jon Haddad <jo...@jonhaddad.com>.
My coworker Radovan wrote up a post on the relationship between gc grace
and hinted handoff:
https://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html

Jon

On Sat, Oct 26, 2019 at 6:45 AM Hossein Ghiyasi Mehr <gh...@gmail.com>
wrote:

> gc_grace_seconds needs to be changed carefully because it has a side effect
> on hinted handoff.

Re: TWCS and gc_grace_seconds

Posted by Hossein Ghiyasi Mehr <gh...@gmail.com>.
gc_grace_seconds needs to be changed carefully because it has a side effect on
hinted handoff.
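
As a rough illustration of that side effect, here is a minimal Python sketch.
It assumes, in line with the hinted handoff post linked elsewhere in this
thread, that a stored hint is no longer replayed once it is older than the
table's gc_grace_seconds. The function names and the 3-hour hint window are
illustrative assumptions, not Cassandra APIs; check the hint window configured
in your own cassandra.yaml.

```python
def hint_is_live(hint_created_at: int, gc_grace_seconds: int, now: int) -> bool:
    """Simplified model: a hint is only replayed while it is younger than gc_grace_seconds."""
    return now < hint_created_at + gc_grace_seconds


def hints_cover_outage(gc_grace_seconds: int, max_hint_window_seconds: int) -> bool:
    """True if hints stored during the whole hint window can still be replayed.

    If gc_grace_seconds is shorter than the hint window, hints written early
    in an outage may expire before the node comes back.
    """
    return gc_grace_seconds >= max_hint_window_seconds


THREE_HOURS = 3 * 60 * 60  # assumed hint window, in seconds

print(hints_cover_outage(10 * 24 * 60 * 60, THREE_HOURS))  # True: 10 days of grace
print(hints_cover_outage(10 * 60, THREE_HOURS))            # False: 10 minutes of grace
```

In this model, lowering gc_grace_seconds below the hint window means hinted
handoff alone can no longer be relied on to deliver missed writes, so repair
matters more.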

On Fri, Oct 18, 2019 at 5:04 PM Paul Chandler <pa...@redshots.com> wrote:

> Hi Adarsh,
>
> You will have problems if you manually delete data when using TWCS.
>
> To fully understand why, I recommend reading this The Last Pickle post:
> https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
> And this post I wrote that dives deeper into the problems with deletes:
> http://www.redshots.com/cassandra-twcs-must-have-ttls/
>
> Thanks
>
> Paul

Re: TWCS and gc_grace_seconds

Posted by Paul Chandler <pa...@redshots.com>.
Hi Adarsh,

You will have problems if you manually delete data when using TWCS.

To fully understand why, I recommend reading this The Last Pickle post: https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
And this post I wrote that dives deeper into the problems with deletes: http://www.redshots.com/cassandra-twcs-must-have-ttls/

Thanks 

Paul

> On 18 Oct 2019, at 14:22, Adarsh Kumar <ad...@gmail.com> wrote:
> 
> Thanks Jeff,
> 
> 
> I just checked with business and we have differences in having TTL. So it will always be manual purging. We do not want to use LCS due to high IO.
> So:
> 1. As the use case is a time series data model, will TWCS give some benefit (without TTL) with frequently deleted data?
> 2. Are there any best practices/recommendations to handle a high number of tombstones?
> 3. Can we handle this use case with STCS as well (with some configuration)?
> 
> Thanks in advance
> 
> Adarsh Kumar


Re: TWCS and gc_grace_seconds

Posted by Adarsh Kumar <ad...@gmail.com>.
Thanks Jeff,


I just checked with business and we have differences in having TTL. So it
will always be manual purging. We do not want to use LCS due to high IO.
So:

   1. As the use case is a time series data model, will TWCS give some
   benefit (without TTL) with frequently deleted data?
   2. Are there any best practices/recommendations to handle a high number of
   tombstones?
   3. Can we handle this use case with STCS as well (with some configuration)?


Thanks in advance

Adarsh Kumar

On Fri, Oct 18, 2019 at 11:46 AM Jeff Jirsa <jj...@gmail.com> wrote:

> Is everything in the table TTL’d?
>
> Do you do explicit deletes before the data is expected to expire?
>
> Generally speaking, gcgs (gc_grace_seconds) exists to prevent data
> resurrection. But TTL’d data can’t be resurrected once it expires, so gcgs
> has no purpose unless you’re deleting it before the TTL expires. If you’re
> doing that, TWCS won’t be able to drop whole sstables anyway, so maybe LCS
> will use less disk (but with much higher IO).

Re: TWCS and gc_grace_seconds

Posted by Jeff Jirsa <jj...@gmail.com>.
Is everything in the table TTL’d?

Do you do explicit deletes before the data is expected to expire?

Generally speaking, gcgs (gc_grace_seconds) exists to prevent data resurrection. But TTL’d data can’t be resurrected once it expires, so gcgs has no purpose unless you’re deleting it before the TTL expires. If you’re doing that, TWCS won’t be able to drop whole sstables anyway, so maybe LCS will use less disk (but with much higher IO).
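
To make the reasoning above concrete, here is a deliberately simplified Python
model. It is a sketch under stated assumptions, not Cassandra's actual
implementation: the Cell class and both function names are hypothetical, and
the real checks also consider overlapping sstables and other conditions that
are ignored here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    write_time: int            # seconds since epoch
    ttl: Optional[int] = None  # seconds; None means the cell never expires on its own


def tombstone_is_purgeable(deleted_at: int, gc_grace_seconds: int, now: int) -> bool:
    """Simplified: a tombstone may be dropped by compaction once gc_grace_seconds has passed."""
    return now >= deleted_at + gc_grace_seconds


def sstable_is_fully_expired(cells: List[Cell], now: int) -> bool:
    """Simplified: a whole sstable can be dropped only if every cell has a TTL that has passed."""
    return bool(cells) and all(
        c.ttl is not None and now >= c.write_time + c.ttl for c in cells
    )


now = 1_000_000
ttl_only = [Cell(write_time=now - 7200, ttl=3600), Cell(write_time=now - 5000, ttl=3600)]
mixed = ttl_only + [Cell(write_time=now - 7200)]  # one cell written without a TTL

print(sstable_is_fully_expired(ttl_only, now))  # True
print(sstable_is_fully_expired(mixed, now))     # False
```

In this model, a single cell without a TTL keeps the whole sstable from being
dropped, which is why explicit deletes (or data without TTLs) take away the
main benefit of TWCS.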

> On Oct 17, 2019, at 10:36 PM, Adarsh Kumar <ad...@gmail.com> wrote:
> 
> 
> Hi,
> 
> We have a use case of time series data with TTL where we want to use TimeWindowCompactionStrategy because of its better management of TTL and tombstones. In this case, our data is frequently deleted, so we want to reduce gc_grace_seconds to shorten the tombstones' life and reduce pressure on storage. I have the following questions:
> 1. Do we always need to run repair for a table with reduced gc_grace_seconds, or is there any other way to manage repairs in this case?
> 2. Do we have any other strategy (or combination of strategies) to manage frequently deleted time-series data?
> Thanks in advance.
> 
> Adarsh Kumar