Posted to user@cassandra.apache.org by onmstester onmstester via user <us...@cassandra.apache.org> on 2023/01/07 06:00:51 UTC
RE: Best compaction strategy for rarely used data
Another solution: distribute the data across more tables. For example, you could create multiple tables keyed on the value or hash bucket of one of the columns; the current data volume and compaction overhead would then be divided among the underlying tables. Note, though, that Cassandra has a practical limit on the number of tables (a few hundred).
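To illustrate the bucketing idea: a minimal sketch of deterministic routing, where each partition key always maps to the same bucketed table. The table name, bucket count, and helper are hypothetical, not from any Cassandra API.

```python
import hashlib

NUM_BUCKETS = 16  # assumed; keep the total table count well under Cassandra's limit

def bucket_table(base_table: str, partition_key: str,
                 num_buckets: int = NUM_BUCKETS) -> str:
    """Pick a deterministic bucketed table name from the partition key."""
    # Hash the key so buckets are evenly populated regardless of key shape.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{base_table}_{bucket}"

# Reads and writes for the same key always target the same table,
# so queries stay single-table while compaction load is split N ways.
print(bucket_table("events", "user-42"))
```

Each bucketed table then compacts independently, so no single table's sstables grow to the sizes that stall STCS.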
I wish STCS simply had a configurable maximum sstable size, so that sstables bigger than the limit would never be compacted at all; that would solve most problems like this one.
---- On Fri, 30 Dec 2022 21:43:27 +0330 Durity, Sean R via user <us...@cassandra.apache.org> wrote ---
Yes, clean-up will reduce the disk space on the existing nodes by re-writing only the data that the node now owns into new sstables.
Sean R. Durity
DB Solutions
Staff Systems Engineer – Cassandra
From: Lapo Luchini <ma...@lapo.it>
Sent: Friday, December 30, 2022 4:12 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Best compaction strategy for rarely used data
On 2022-12-29 21:54, Durity, Sean R via user wrote:
> At some point you will end up with large sstables (like 1 TB) that won’t
> compact because there are not 4 similar-sized ones able to be compacted
Yes, that's exactly what's happening.
I'll see maybe just one more compaction, since the biggest sstable is
already more than 20% of residual free space.
> For me, the backup strategy shouldn’t drive the rest.
Mhh, yes, that makes sense.
> And if your data is ever-growing
> and never deleted, you will be adding nodes to handle the extra data as
> time goes by (and running clean-up on the existing nodes).
What will happen when adding new nodes, as you say, though?
If I have a 1TB sstable containing 250GB of data that will no longer be
useful (because a new node becomes its owner), will "cleanup" rewrite
that sstable down to 750GB, or will it retain the old data?
Thanks,
--
Lapo Luchini
lapo@lapo.it