Posted to user@cassandra.apache.org by DuyHai Doan <do...@gmail.com> on 2019/02/11 20:12:12 UTC

Max number of windows when using TWCS

Hello users

In the official documentation for TWCS (
http://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy)
it is advised to select the window unit and size so that the total number
of windows is around 20-30.

Is there any explanation for this range of 20-30? What if we exceed this
range, say by using 1-day windows and keeping data for 1 year, which gives
365 windows? What can go wrong with this?

Regards

Duy Hai DOAN
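
As a rough illustration of the arithmetic in the question: the number of
windows on disk at steady state is approximately the retention period
divided by the window size. The sketch below is not part of Cassandra; the
function names are made up for this example, and the default target of 30
is simply the upper end of the range quoted from the documentation.

```python
import math


def window_count(retention_days: float, window_days: float) -> int:
    """Approximate number of time windows kept on disk at steady state."""
    return math.ceil(retention_days / window_days)


def suggested_window_days(retention_days: float, target_windows: int = 30) -> float:
    """Window size (in days) that yields roughly target_windows windows."""
    return retention_days / target_windows


if __name__ == "__main__":
    # One-day windows with one year of retention, as in the question.
    print(window_count(365, 1))                  # 365
    # Window size that keeps the same retention near 30 windows.
    print(round(suggested_window_days(365), 1))  # 12.2
```

Under these assumptions, one year of retention fits the advised range with
windows of roughly 12 to 18 days rather than 1 day.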

Re: Max number of windows when using TWCS

Posted by DuyHai Doan <do...@gmail.com>.
Thanks for the pointer, Jeff.

On Mon, Feb 11, 2019 at 9:40 PM Jeff Jirsa <jj...@gmail.com> wrote:

> There's a bit of headache around overlapping sstables being strictly safe
> to delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418 was
> added to allow the "I know it's not technically safe, but just delete it
> anyway" use case. For a lot of people who started using TWCS before 13418,
> "stop cassandra, remove stuff we know is expired, start cassandra" is a
> not-uncommon pattern in very high-write, high-disk-space use cases.

Re: Max number of windows when using TWCS

Posted by Osman YOZGATLIOĞLU <os...@krontech.com>.
Hello,

By the way, about https://issues.apache.org/jira/browse/CASSANDRA-13418, I'm not sure how to apply this solution.

Do you have a guide about it?


Regards,

Osman


On 12.02.2019 01:42, Nitan Kainth wrote:
> That’s right Jeff. That’s why I am wondering: why doesn’t compaction get
> rid of old expired sstables?
>
>
> Regards,
> Nitan
> Cell: 510 449 9629


Re: Max number of windows when using TWCS

Posted by Nitan Kainth <ni...@gmail.com>.
That’s right Jeff. That’s why I am wondering: why doesn’t compaction get rid of old expired sstables?


Regards,
Nitan
Cell: 510 449 9629

> On Feb 11, 2019, at 3:53 PM, Jeff Jirsa <jj...@gmail.com> wrote:
> 
> It's probably not safe. You shouldn't touch the underlying sstables unless you're very sure you know what you're doing.

Re: Max number of windows when using TWCS

Posted by Jeff Jirsa <jj...@gmail.com>.
It's probably not safe. You shouldn't touch the underlying sstables unless
you're very sure you know what you're doing.


On Mon, Feb 11, 2019 at 1:05 PM Akash Gangil <ak...@gmail.com> wrote:

> I have in the past tried to delete SSTables manually, but have noticed
> bits and pieces of that data still remain, even though the sstables of that
> window are deleted. So I have always wondered: is playing directly with the
> underlying filesystem a safe bet?

Re: Max number of windows when using TWCS

Posted by Akash Gangil <ak...@gmail.com>.
I have in the past tried to delete SSTables manually, but have noticed bits
and pieces of that data still remain, even though the sstables of that
window are deleted. So I have always wondered: is playing directly with the
underlying filesystem a safe bet?


On Mon, Feb 11, 2019 at 1:01 PM Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Deleting SSTables manually can be useful if you don't know your TTL up
> front.  For example, you have an ETL process that moves your raw Cassandra
> data into S3 as parquet files, and you want to be sure that process is
> completed before you delete the data.  You could also start out without
> setting a TTL and later realize you need one.  This is a remarkably common
> problem.


-- 
Akash

Re: Max number of windows when using TWCS

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Deleting SSTables manually can be useful if you don't know your TTL up
front.  For example, you have an ETL process that moves your raw Cassandra
data into S3 as parquet files, and you want to be sure that process is
completed before you delete the data.  You could also start out without
setting a TTL and later realize you need one.  This is a remarkably common
problem.

On Mon, Feb 11, 2019 at 12:51 PM Nitan Kainth <ni...@gmail.com> wrote:

> Jeff,
>
> Does that mean we have to delete sstables manually?
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: Max number of windows when using TWCS

Posted by Nitan Kainth <ni...@gmail.com>.
Jeff,

Does that mean we have to delete sstables manually?


Regards,
Nitan
Cell: 510 449 9629

> On Feb 11, 2019, at 2:40 PM, Jeff Jirsa <jj...@gmail.com> wrote:
> 
> There's a bit of headache around overlapping sstables being strictly safe to delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418 was added to allow the "I know it's not technically safe, but just delete it anyway" use case. For a lot of people who started using TWCS before 13418, "stop cassandra, remove stuff we know is expired, start cassandra" is a not-uncommon pattern in very high-write, high-disk-space use cases. 

Re: Max number of windows when using TWCS

Posted by Jeff Jirsa <jj...@gmail.com>.
There's a bit of headache around overlapping sstables being strictly safe
to delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418 was added
to allow the "I know it's not technically safe, but just delete it anyway"
use case. For a lot of people who started using TWCS before 13418, "stop
cassandra, remove stuff we know is expired, start cassandra" is a
not-uncommon pattern in very high-write, high-disk-space use cases.
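
The overlap problem can be pictured with a small model. This is a
simplified sketch and not Cassandra's actual code: it assumes every SSTable
covers the same partitions, uses abstract integer timestamps, and the names
SSTable, droppable, and ignore_overlaps are invented for this example. The
ignore_overlaps flag only stands in for the "just delete it anyway"
behaviour described for CASSANDRA-13418; the real option name is not given
in this thread and should be taken from the ticket or the documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    name: str
    min_timestamp: int      # oldest write timestamp stored in the file
    max_timestamp: int      # newest write timestamp stored in the file
    max_deletion_time: int  # moment by which everything in the file has expired


def droppable(sstables: List[SSTable], gc_before: int,
              ignore_overlaps: bool = False) -> List[str]:
    """Names of fully expired SSTables that this simplified model would drop."""
    expired = [s for s in sstables if s.max_deletion_time < gc_before]
    if ignore_overlaps:
        # Models the "just delete it anyway" behaviour: skip the safety check.
        return [s.name for s in expired]
    live = [s for s in sstables if s.max_deletion_time >= gc_before]
    if not live:
        return [s.name for s in expired]
    # An expired file is kept while a live file still holds older data,
    # because the expired file might be shadowing that data.
    oldest_live = min(s.min_timestamp for s in live)
    return [s.name for s in expired if s.max_timestamp < oldest_live]


if __name__ == "__main__":
    tables = [
        SSTable("window-1", 0, 100, 200),
        SSTable("window-2", 101, 200, 300),
        SSTable("late-write", 50, 500, 900),  # holds an out-of-order old write
    ]
    print(droppable(tables, gc_before=400))                        # []
    print(droppable(tables, gc_before=400, ignore_overlaps=True))  # ['window-1', 'window-2']
```

In this model a single late write with an old timestamp is enough to keep
both expired windows on disk, which is the situation where people resort to
removing files by hand.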



On Mon, Feb 11, 2019 at 12:34 PM Nitan Kainth <ni...@gmail.com> wrote:

> Hi,
> Regarding the comment “Purging data is also straightforward: just drop
> SSTables (by a script) whose create date is older than a threshold; we
> don't even need to rely on TTL”
>
> Don’t the old sstables get dropped by themselves? Once TTL and
> gc_grace_seconds have passed, the whole sstable will have only tombstones.
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629

Re: Max number of windows when using TWCS

Posted by Nitan Kainth <ni...@gmail.com>.
Hi,
Regarding the comment “Purging data is also straightforward: just drop SSTables (by a script) whose create date is older than a threshold; we don't even need to rely on TTL”

Don’t the old sstables get dropped by themselves? Once TTL and gc_grace_seconds have passed, the whole sstable will have only tombstones.


Regards,
Nitan
Cell: 510 449 9629
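
The timing in that question can be written out as simple arithmetic. This
is a hedged sketch, not Cassandra code: it assumes every row in the SSTable
was written with the same TTL and that nothing else (such as the overlap
check discussed elsewhere in this thread) blocks the drop. The function
name is invented for this example.

```python
from datetime import datetime, timedelta, timezone


def earliest_drop_time(newest_write: datetime, ttl_seconds: int,
                       gc_grace_seconds: int) -> datetime:
    """Earliest moment at which a whole SSTable could be considered expired.

    The newest row expires ttl_seconds after it was written, and its
    tombstone becomes purgeable gc_grace_seconds after that.
    """
    return newest_write + timedelta(seconds=ttl_seconds + gc_grace_seconds)


if __name__ == "__main__":
    newest = datetime(2019, 2, 11, tzinfo=timezone.utc)
    # 30-day TTL plus the default gc_grace_seconds of 864000 (10 days).
    print(earliest_drop_time(newest, 30 * 86400, 864000))
```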

> On Feb 11, 2019, at 2:23 PM, DuyHai Doan <do...@gmail.com> wrote:
> 
> Purging data is also straightforward: just drop SSTables (by a script) whose create date is older than a threshold; we don't even need to rely on TTL

Re: Max number of windows when using TWCS

Posted by DuyHai Doan <do...@gmail.com>.
No worries about overlapping: the use case is events/time series and there
is almost no delay, so it should be fine.

On a side note, since we are guaranteed to have 1 SSTable per day of
ingestion, it is very easy to "emulate" incremental backup. You just need
to find the generated SSTable with the latest create date and back it up
every day at midnight with a script.

Purging data is also straightforward: just drop SSTables (by a script)
whose create date is older than a threshold; we don't even need to rely on
TTL.
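
A script of the kind described might start from something like the sketch
below. It is deliberately a dry run: it only lists files and deletes
nothing, given the warnings elsewhere in this thread about touching
SSTables by hand. Assumptions that are not from the message: the table
directory is passed in by the caller, only the "-Data.db" component of each
SSTable is listed (an SSTable consists of several component files), and the
file modification time is used as a stand-in for the create date. The
function names are invented for this example.

```python
import time
from pathlib import Path
from typing import List, Optional


def data_files(table_dir: Path) -> List[Path]:
    """SSTable data components in table_dir, oldest first by modification time."""
    return sorted(table_dir.glob("*-Data.db"), key=lambda p: p.stat().st_mtime)


def newest_data_file(table_dir: Path) -> Optional[Path]:
    """Most recently written data file: the candidate for the daily backup."""
    files = data_files(table_dir)
    return files[-1] if files else None


def older_than(table_dir: Path, max_age_seconds: float,
               now: Optional[float] = None) -> List[Path]:
    """Data files older than max_age_seconds: purge candidates, listed only."""
    now = time.time() if now is None else now
    return [p for p in data_files(table_dir)
            if now - p.stat().st_mtime > max_age_seconds]
```

Calling older_than(Path(table_dir), 365 * 86400) would list the data files
more than a year old for review; whether removing them is safe depends on
the overlap concerns raised in the replies.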



On Mon, Feb 11, 2019 at 9:19 PM Jeff Jirsa <jj...@gmail.com> wrote:

> Wild ass guess based on a large use case I knew about at the time
>
> If you go above that, I expect it’d largely be fine as long as you were
> sure they weren’t overlapping so reads only ever touched a small subset of
> the windows (ideally 1).
>
> If you have one day windows and every read touches all of the windows,
> you’re going to have a bad time.
>
> --
> Jeff Jirsa

Re: Max number of windows when using TWCS

Posted by Jeff Jirsa <jj...@gmail.com>.
Wild ass guess based on a large use case I knew about at the time

If you go above that, I expect it’d largely be fine as long as you were sure they weren’t overlapping so reads only ever touched a small subset of the windows (ideally 1).

If you have one day windows and every read touches all of the windows, you’re going to have a bad time. 

-- 
Jeff Jirsa
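
To make the point about reads concrete, here is a small sketch that counts
how many fixed-size windows a read's time range overlaps. It assumes the
best case described above: data for a given time range lives only in the
windows covering that range (no out-of-order writes), and the query
restricts on time. The function name and the use of epoch seconds are
choices made for this example.

```python
DAY = 86400  # seconds


def windows_touched(read_start: int, read_end: int, window_seconds: int) -> int:
    """Number of fixed-size time windows overlapped by [read_start, read_end]."""
    if read_end < read_start:
        raise ValueError("read_end must not be earlier than read_start")
    return read_end // window_seconds - read_start // window_seconds + 1


if __name__ == "__main__":
    # A read over one hour inside a single day touches one window.
    print(windows_touched(10 * DAY + 3600, 10 * DAY + 7200, DAY))  # 1
    # A read over a full year of one-day windows touches all of them.
    print(windows_touched(0, 365 * DAY - 1, DAY))                  # 365
```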


> On Feb 11, 2019, at 12:12 PM, DuyHai Doan <do...@gmail.com> wrote:
> 
> Hello users
> 
> In the official documentation for TWCS (http://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy) it is advised to select the window unit and size so that the total number of windows is around 20-30.
>
> Is there any explanation for this range of 20-30? What if we exceed this range, say by using 1-day windows and keeping data for 1 year, which gives 365 windows? What can go wrong with this?
> 
> Regards
> 
> Duy Hai DOAN