Posted to user@cassandra.apache.org by Grzegorz Pietrusza <gp...@gmail.com> on 2022/09/28 13:58:20 UTC

TWCS recommendation on number of windows

Hi All!

According to the TWCS documentation (
https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/twcs.html)
the operator should choose a compaction_window_unit and
compaction_window_size pair that produces approximately 20-30 windows.

Where does this recommendation come from? Also, should the number of
windows be changed when more than one data directory is used? In my example
there are 7 data directories (disk partitions) and it seems that each of them
stores 20-30 windows. Effectively this gives 140-210 sstables in total. Is
that an optimal configuration?
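
The arithmetic behind the 140-210 figure can be sketched as follows. This is only an illustration: the helper names are hypothetical, and it assumes each data directory ends up with exactly one fully compacted sstable per window, which is a lower bound rather than what a live node necessarily shows.

```python
import math


def window_count(retention_days: float, window_days: float) -> int:
    """Number of time windows needed to cover the retention period."""
    return math.ceil(retention_days / window_days)


def min_sstables(windows: int, data_directories: int) -> int:
    """Lower bound, assuming one sstable per window in every data directory."""
    return windows * data_directories


# The 20-30 window guideline applied to 7 data directories.
for windows in (20, 30):
    total = min_sstables(windows, 7)
    print(f"{windows} windows x 7 directories -> at least {total} sstables")
```

The current (not yet closed) window usually holds more than one sstable, so real counts can be somewhat higher.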

Running on Cassandra 3.11

Regards
Grzegorz

Re: TWCS recommendation on number of windows

Posted by Grzegorz Pietrusza <gp...@gmail.com>.
Hi Jeff,
Thanks a lot for all these details; they are really helpful. My
understanding is that the number of windows is a tradeoff between the
amount of data waiting for expiration and the number of sstables required
to satisfy a read request.

In my case the data model does have a timestamp component. What is your
recommendation for these cases?
* TTL = 21 days, typical read span <= 2 days
* TTL = 1300 days, typical read span 30 to 60 days
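
For reference, here is what the documentation's 20-30 window guideline would imply for these two cases. This is plain arithmetic under that guideline, not a recommendation given anywhere in this thread; the function name and defaults are hypothetical.

```python
def window_size_range_days(ttl_days: float,
                           min_windows: int = 20,
                           max_windows: int = 30) -> tuple[float, float]:
    """Smallest and largest window size (in days) giving 20-30 windows."""
    return ttl_days / max_windows, ttl_days / min_windows


# (TTL in days, longest typical read span in days) for the two cases above.
for ttl_days, read_span_days in ((21, 2), (1300, 60)):
    smallest, largest = window_size_range_days(ttl_days)
    print(f"TTL {ttl_days}d: window between {smallest:.1f}d and {largest:.1f}d; "
          f"a {read_span_days}d read covers about "
          f"{read_span_days / largest:.1f} to {read_span_days / smallest:.1f} window lengths")
```

A read that is not aligned to window boundaries can overlap one more window than these ratios suggest.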



On Wed, 28 Sep 2022 at 16:22, Jeff Jirsa <jj...@gmail.com> wrote:

> So when I wrote TWCS, I wrote it for a use case that had 24h TTLs and 30
> days of retention. In that application, we had tested 12h windows, 24h
> windows, and 7 day windows, and eventually settled on 24h windows because
> that balanced factors like sstable size, sstables-per-read, and expired
> data waiting to be dropped (about 3%, 1/30th, on any given day). That's
> where that recommendation came from - it was mostly around how much expired
> data will sit around waiting to be dropped. That doesn't change with
> multiple data directories.
>
> If you go with fewer windows, you'll expire larger chunks at a time, which
> means you'll retain larger chunks waiting on expiration.
> If you go with more windows, you'll potentially touch more sstables on
> read.
>
> Realistically, if you can model your data to align with chunks (so each
> read only touches one window), the actual number of sstables shouldn't
> really matter much - the timestamps and bloom filter will avoid touching
> most of them on the read path anyway. If your data model doesn't have a
> timestamp component to it and you're touching lots of sstables on read,
> even 30 sstables is probably going to hurt you, and 210 would be really,
> really bad.

Re: TWCS recommendation on number of windows

Posted by Jeff Jirsa <jj...@gmail.com>.
So when I wrote TWCS, I wrote it for a use case that had 24h TTLs and 30
days of retention. In that application, we had tested 12h windows, 24h
windows, and 7 day windows, and eventually settled on 24h windows because
that balanced factors like sstable size, sstables-per-read, and expired
data waiting to be dropped (about 3%, 1/30th, on any given day). That's
where that recommendation came from - it was mostly around how much expired
data will sit around waiting to be dropped. That doesn't change with
multiple data directories.

If you go with fewer windows, you'll expire larger chunks at a time, which
means you'll retain larger chunks waiting on expiration.
If you go with more windows, you'll potentially touch more sstables on read.
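
A rough way to see the first half of this tradeoff is to compute the "about 1/30th" figure for the three window sizes mentioned above. This sketch assumes a uniform write rate and approximates the data waiting to be dropped as one window's worth out of the whole retention period; the names are illustrative only.

```python
def waiting_fraction(retention_hours: float, window_hours: float) -> float:
    """Approximate share of stored data that sits in a single window,
    i.e. the chunk that must wait until the whole window has expired."""
    windows = retention_hours / window_hours
    return 1.0 / windows


RETENTION_HOURS = 30 * 24  # 30 days of retention, as in the use case above

for label, window_hours in (("12h", 12), ("24h", 24), ("7 days", 7 * 24)):
    windows = RETENTION_HOURS / window_hours
    share = waiting_fraction(RETENTION_HOURS, window_hours)
    print(f"{label} windows: {windows:.1f} windows, ~{share:.1%} waiting to be dropped")
```

Smaller windows shrink the waiting share but raise the window count, which is the sstables-per-read side of the tradeoff.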

Realistically, if you can model your data to align with chunks (so each
read only touches one window), the actual number of sstables shouldn't
really matter much - the timestamps and bloom filter will avoid touching
most of them on the read path anyway. If your data model doesn't have a
timestamp component to it and you're touching lots of sstables on read,
even 30 sstables is probably going to hurt you, and 210 would be really,
really bad.
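
To illustrate what "align with chunks" means for reads, the sketch below counts how many windows a time-range read overlaps. It assumes windows start at multiples of the window size counted from the Unix epoch; that alignment and the helper names are assumptions made for the example, not something stated in this thread.

```python
DAY = 86_400  # seconds


def window_index(timestamp_s: int, window_s: int) -> int:
    """Index of the window containing the timestamp (assumed epoch-aligned)."""
    return timestamp_s // window_s


def windows_touched(start_s: int, end_s: int, window_s: int) -> int:
    """Number of windows overlapped by the half-open range [start_s, end_s)."""
    if end_s <= start_s:
        return 0
    return window_index(end_s - 1, window_s) - window_index(start_s, window_s) + 1


# A one-day read that starts on a window boundary stays inside one window.
aligned = windows_touched(10 * DAY, 11 * DAY, DAY)

# The same one-day read shifted by 12 hours straddles two windows.
straddling = windows_touched(10 * DAY + DAY // 2, 11 * DAY + DAY // 2, DAY)

print(aligned, straddling)
```

Partitioning data by a time bucket that matches the window size is one way to keep reads in the aligned case.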




