Posted to user@cassandra.apache.org by onmstester onmstester via user <us...@cassandra.apache.org> on 2023/01/07 06:00:51 UTC

RE: Best compaction strategy for rarely used data

Another solution: distribute the data across more tables. For example, you could create multiple tables based on the value or hash bucket of one of the columns; the current data volume and compaction overhead would then be divided by the number of underlying tables. Note, though, that Cassandra has a practical limit on the number of tables (a few hundred).
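A minimal sketch of the bucketing idea (table names, bucket count, and the hash choice are all hypothetical, not from the original message): the client hashes one column's value to pick one of N identically-schemed tables, so each table carries roughly 1/N of the data and of the compaction work.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical: splits data (and compaction work) 8 ways

def bucket_for(key: str) -> int:
    # Stable hash so the same key always maps to the same table,
    # across processes and restarts (unlike Python's built-in hash()).
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

def table_for(key: str) -> str:
    # e.g. events_0 .. events_7, each created with the same schema
    return f"events_{bucket_for(key)}"
```

All reads and writes would route through `table_for(key)`; queries that don't know the key must fan out to all N tables, which is the usual cost of this pattern.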

I wish STCS simply had a maximum sstable size limit, so that sstables bigger than that limit would not be compacted at all; that would have solved most problems like this one.
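A simplified model of STCS's size-tiered grouping (an illustrative sketch, not Cassandra's actual implementation; parameter names are assumptions): sstables of similar size are grouped into buckets, and a bucket only becomes a compaction candidate once it holds min_threshold (default 4) members, so a lone huge sstable just sits there until enough similarly-sized peers accumulate. The wished-for size cap would simply drop oversized sstables from consideration.

```python
def stcs_candidates(sizes, min_threshold=4, bucket_ratio=1.5, max_sstable_size=None):
    """Group sstable sizes into size-tiered buckets and return the buckets
    eligible for compaction. Simplified model, not Cassandra's real code."""
    if max_sstable_size is not None:
        # The wished-for cap: oversized sstables are never compacted again.
        sizes = [s for s in sizes if s <= max_sstable_size]
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            # Same bucket if within the size ratio of the bucket's smallest member
            if size <= bucket[0] * bucket_ratio:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return [b for b in buckets if len(b) >= min_threshold]

# Four ~100 GB sstables compact together; the lone 1000 GB one never does:
print(stcs_candidates([100, 110, 120, 130, 1000]))  # → [[100, 110, 120, 130]]
```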



Sent using https://www.zoho.com/mail/

---- On Fri, 30 Dec 2022 21:43:27 +0330 Durity, Sean R via user <us...@cassandra.apache.org> wrote ---




Yes, clean-up will reduce the disk space on the existing nodes by re-writing only the data that the node now owns into new sstables.

 

 

Sean R. Durity

DB Solutions

Staff Systems Engineer – Cassandra

 

From: Lapo Luchini <ma...@lapo.it>
Sent: Friday, December 30, 2022 4:12 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Best compaction strategy for rarely used data

 


On 2022-12-29 21:54, Durity, Sean R via user wrote:

> At some point you will end up with large sstables (like 1 TB) that won’t
> compact because there are not 4 similar-sized ones able to be compacted

Yes, that's exactly what's happening.

 

I'll see maybe just one more compaction, since the biggest sstable is already more than 20% of residual free space.

 

> For me, the backup strategy shouldn’t drive the rest.

 

Mhh, yes, that makes sense.

 

> And if your data is ever-growing and never deleted, you will be adding
> nodes to handle the extra data as time goes by (and running clean-up on
> the existing nodes).

 

What will happen, though, when adding new nodes as you suggest?
If I have a 1 TB sstable in which 250 GB of data will no longer be useful (as a new node will be its new owner), will that sstable be reduced to 750 GB by "cleanup", or will it retain the old data?
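The arithmetic behind this question (illustrative numbers, assuming perfectly balanced token ownership, e.g. with vnodes): a node's share of the ring shrinks from 1/N to 1/(N+added), and cleanup rewrites each sstable keeping only the still-owned rows, so the expected on-disk size shrinks by the same ratio.

```python
def expected_size_after_cleanup(data_gb: float, nodes_before: int, nodes_added: int) -> float:
    """Assuming evenly balanced token ownership, a node keeps
    nodes_before / (nodes_before + nodes_added) of its former range;
    cleanup rewrites sstables retaining only the still-owned data."""
    owned_fraction = nodes_before / (nodes_before + nodes_added)
    return data_gb * owned_fraction

# A node holding 1000 GB in a 3-node cluster, one node added:
# it now owns 3/4 of its former range, so cleanup leaves ~750 GB.
print(expected_size_after_cleanup(1000, 3, 1))  # → 750.0
```

In practice the reclaimed amount depends on actual token assignment and replication factor; this is only the balanced-case estimate.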

 

Thanks,

 

-- 

Lapo Luchini

lapo@lapo.it