Posted to user@cassandra.apache.org by Jeff Ferland <jb...@tubularlabs.com> on 2015/05/07 21:07:21 UTC

Offline Compaction and Token Splitting

My ideal for backups with Cassandra is to dump each column family to a directory and use an offline process to compact them all into a single sstable (or into sstables capped at a configured maximum size). My ideal for restoration is a streaming read of an sstable set that emits only the data falling within a given token range. The result is that I can store a single copy of the data that is effectively already repaired, and can read from it the specific range covering a node that I wish to restore. My first look at this was somewhat frustrated because the sstable code in current versions relies heavily on the system keyspace.
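
A minimal sketch of the two steps described above may help make the idea concrete. It is not Cassandra code and does not read real sstables: it assumes a hypothetical record shape of (token, key, write_timestamp, value), models compaction as a last-write-wins merge of token-sorted streams, and assumes token ranges are start-exclusive, end-inclusive and wrap around the ring. The names merge_sorted_dumps, in_range, and restore_stream are invented for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical record shape: (token, key, write_timestamp, value).
# Real sstables hold partitions, cells, and tombstones; this is only a model.
TOKEN_AND_KEY = itemgetter(0, 1)
WRITE_TIMESTAMP = itemgetter(2)


def merge_sorted_dumps(*dumps):
    """Merge token-sorted record streams, keeping the newest write per key.

    Each input must already be sorted by (token, key), as a stand-in for
    the ordering inside an sstable.
    """
    merged = heapq.merge(*dumps, key=TOKEN_AND_KEY)
    for _, versions in groupby(merged, key=TOKEN_AND_KEY):
        # Last write wins: keep the version with the highest timestamp.
        yield max(versions, key=WRITE_TIMESTAMP)


def in_range(token, left, right):
    """Return True if token lies in (left, right].

    Assumption: when left >= right the range wraps around the ring, and
    left == right is treated as the full ring.
    """
    if left < right:
        return left < token <= right
    return token > left or token <= right


def restore_stream(records, left, right):
    """Stream only the records whose token falls in (left, right]."""
    for record in records:
        if in_range(record[0], left, right):
            yield record
```

With two dumps that each contain a version of the same key, merge_sorted_dumps yields one record per key, and restore_stream(merged, left, right) then yields only the records a node owning (left, right] would need. Both functions are generators, so neither step has to hold the full data set in memory.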

Does anybody have any thoughts in regards to other things that might exist and fulfill this (particularly offline collective compaction), have a desire for such tools, or have any useful information for me before I attempt to build such beasts?

-Jeff

Re: Offline Compaction and Token Splitting

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, May 7, 2015 at 12:07 PM, Jeff Ferland <jb...@tubularlabs.com> wrote:

> Does anybody have any thoughts in regards to other things that might exist
> and fulfill this (particularly offline collective compaction), have a
> desire for such tools, or have any useful information for me before I
> attempt to build such beasts?
>

Were I doing this, I'd:

1) probably just run an embedded cassandra cluster-of-one node and use that
to compact
2) look at the code of offline scrub and/or sstablesplit tools

=Rob