Posted to user@cassandra.apache.org by Oleksandr Shulgin <ol...@zalando.de> on 2020/03/02 10:01:27 UTC

Re: Deleting Compaction Strategy for Cassandra 3.0?

On Sat, Feb 29, 2020 at 8:49 AM Jeff Jirsa <jj...@gmail.com> wrote:

> If you’re really, really advanced you MIGHT be able to use spark +
> cqlsstablewriter to create a ton of sstables with just tombstones in them
> representing deletes, then either nodetool refresh or sstableloader them
> into the cluster.
>
> If you create sstables on the right timestamp boundaries to match your
> twcs windows, each one will compact with the data file of the same window
> and delete the data.
>
> Will be a ton of compaction though. Not as efficient as the deleting
> strategy. Also not sure if the offline cqlsstablewriter actually supports
> deletes because I’m on my phone and too lazy to check. If it doesn’t it
> probably wouldn’t be that hard to add.
>

Yeah, even if that would work with the CQLSSTableWriter, the ton of
user-defined compaction is what we would like to avoid.  We are OK with
rewriting all files once, though.

Assuming we get it running on our server version: do I get it right that
running `nodetool upgradesstables -a` is going to rewrite all the SSTable
files subject to the defined compaction strategy?

--
Alex

Re: Deleting Compaction Strategy for Cassandra 3.0?

Posted by Jeff Jirsa <jj...@gmail.com>.


> On Mar 2, 2020, at 2:02 AM, Oleksandr Shulgin <ol...@zalando.de> wrote:
> 
> 
>> On Sat, Feb 29, 2020 at 8:49 AM Jeff Jirsa <jj...@gmail.com> wrote:
> 
>> If you’re really, really advanced you MIGHT be able to use spark + cqlsstablewriter to create a ton of sstables with just tombstones in them representing deletes, then either nodetool refresh or sstableloader them into the cluster.
>> 
>> If you create sstables on the right timestamp boundaries to match your twcs windows, each one will compact with the data file of the same window and delete the data.
>> 
>> Will be a ton of compaction though. Not as efficient as the deleting strategy. Also not sure if the offline cqlsstablewriter actually supports deletes because I’m on my phone and too lazy to check. If it doesn’t it probably wouldn’t be that hard to add.
> 
> Yeah, even if that would work with the CQLSSTableWriter, the ton of user-defined compaction is what we would like to avoid.  We are OK with rewriting all files once, though.
> 
> Assuming we get it running on our server version: do I get it right that running `nodetool upgradesstables -a` is going to rewrite all the SSTable files subject to the defined compaction strategy?


You don’t need to do user-defined compaction here.

As soon as the data files are on the server, the next time TWCS looks for compaction candidates (e.g. next flush, so “nodetool flush”), it’ll find all of the extra sstables and start putting them into the right windows.

Note that you have to have the sstables lined up properly - when you build them, they must stop on the right timestamp boundaries or this doesn’t work. You can try a day at a time though - process all of the deletes for one time window and load them in.
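
To illustrate what "stop on the right timestamp boundaries" could mean in practice, here is a minimal Python sketch that groups delete timestamps by window before the sstables are built. It assumes windows are aligned to multiples of the window size counted from the Unix epoch, which should be checked against the TWCS code in your server version. The function names are hypothetical, and the inputs are taken to be milliseconds, so write timestamps in another resolution would need converting first.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400


def window_bounds_ms(ts_ms, window_size=1, unit_seconds=SECONDS_PER_DAY):
    """Return (lower, upper) inclusive bounds in ms of the window holding ts_ms.

    Assumption: windows start at epoch multiples of window_size * unit_seconds.
    """
    span_s = window_size * unit_seconds
    ts_s = ts_ms // 1000
    lower_s = ts_s - (ts_s % span_s)
    upper_s = lower_s + span_s - 1
    return lower_s * 1000, upper_s * 1000 + 999


def group_by_window(timestamps_ms, window_size=1, unit_seconds=SECONDS_PER_DAY):
    """Bucket timestamps by the lower bound of their window.

    Each bucket would become its own batch of tombstones, so that no
    generated sstable spans more than one window.
    """
    groups = defaultdict(list)
    for ts in timestamps_ms:
        lower, _ = window_bounds_ms(ts, window_size, unit_seconds)
        groups[lower].append(ts)
    return dict(groups)
```

Processing one bucket at a time corresponds to the "a day at a time" approach: build and load the tombstone sstables for one window, let compaction pick them up, then move on to the next window.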

(Again, this presumes it works with the cqlsstablewriter, which I haven’t looked at in years.)