Posted to user@cassandra.apache.org by Gene <gh...@gmail.com> on 2015/09/06 09:32:59 UTC

What is your backup strategy for Cassandra?

Hello everyone,

I'm new to this mailing list, and still fairly new to Cassandra.  I'm a
systems administrator and have had a 3-node Cassandra cluster with a
replication factor of 3 running in Production for about a year now.  We
have about 200 GB of data per node currently.

Up until recently I have just been performing snapshots and clearing them
out as needed.  I recently implemented an automated process to perform
snapshots of our data and copy them off of our cluster via rsync+ssh.
Pretty soon I'll also be utilising the incremental backup feature for
sstables (cassandra.yaml:incremental_backups), and will be taking a look at
archiving for commitlog as well (commitlog_archiving.properties).
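
A rough sketch of what such a snapshot-and-copy cycle can look like, for illustration only. It assumes `nodetool` and `rsync` are on the PATH, that snapshots land under `<data_dir>/<keyspace>/<table>/snapshots/<tag>`, and that the data directory, tag, and remote target passed in are placeholders you supply (the function names are hypothetical, not part of any tool):

```python
import subprocess
from pathlib import Path


def find_snapshot_dirs(data_dir, tag):
    """Return every <keyspace>/<table>/snapshots/<tag> directory under data_dir."""
    pattern = f"*/*/snapshots/{tag}"
    return sorted(p for p in Path(data_dir).glob(pattern) if p.is_dir())


def rsync_command(snapshot_dir, remote):
    """Build an rsync-over-ssh command for one snapshot directory."""
    # --relative recreates the source path on the remote side, so the
    # keyspace and table names are not lost when many tables are copied.
    return ["rsync", "-a", "--relative", "-e", "ssh", str(snapshot_dir), remote]


def backup(data_dir, tag, remote, run=subprocess.run):
    """Take a tagged snapshot, copy it off the node, then clear it."""
    run(["nodetool", "snapshot", "-t", tag], check=True)
    for snapshot_dir in find_snapshot_dirs(data_dir, tag):
        run(rsync_command(snapshot_dir, remote), check=True)
    # Snapshots are hard links and pin disk space until they are cleared.
    run(["nodetool", "clearsnapshot", "-t", tag], check=True)
```

For example, `backup("/var/lib/cassandra/data", "nightly", "backup@host:/backups/")` would run one cycle; the path, tag, and remote shown here are assumptions, not values from this thread. Because every command uses `check=True`, a failed copy stops the run before the snapshot is cleared.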

I've seen quite a few blog posts here and there about various backup
strategies.  I'm wondering if anyone on this list would be willing to share
theirs.

Things I'm curious about:

1. Data size
2. Frequency for full snapshots
3. Frequency for copying snapshots off of the Cassandra nodes
4. Do you use the incremental backups feature?
5. Do you use commitlog archiving?
6. What method do you use to copy data off of the cluster (e.g. NFS, rsync,
rsync+ssh, etc.)?
7. Do you compress your backups? If so, how soon (e.g. compress backups
older than N days)?
8. Do you use any off-the-shelf scripts for your backups (e.g. tablesnap,
cassandra_snapshotter, etc.)?
9. Do you utilise AWS for your backups, or do you keep it local (or offsite
on your own hardware)?
10. Anything else you'd like to add, especially if I missed something
important

I'm not asking for the best, perfect method for Cassandra backups. I'd just
like to see what others are doing and hopefully use some ideas to improve
our processes.

Thanks in advance for any responses, and sorry for the wall of text.

-Gene

Re: What is your backup strategy for Cassandra?

Posted by Robert Coli <rc...@eventbrite.com>.

On Sun, Sep 6, 2015 at 12:32 AM, Gene <gh...@gmail.com> wrote:

> I've seen quite a few blog posts here and there about various back up
> strategies.  I'm wondering if anyone on this list would be willing to share
> theirs.
>

https://github.com/JeremyGrosser/tablesnap

> Things I'm curious about:
>
> 1. Data size
>

Up to hundreds of gigs per node.

> 2. Frequency for full snapshots
>

Never/always (depends on your perspective).

> 3. Frequency for copying snapshots off of the Cassandra nodes
>

As SSTables are flushed.

> 4. Do you use the incremental backups feature
>

No.

> 5. Do you use commitlog archiving
>

No.

> 6. What method you use to copy data off of the cluster (e.g. NFS, rsync,
> rsync+ssh, etc)
>

S3 upload.

> 7. Do you compress your backups, if so how soon (e.g. compress backups
> older than N days)
>

My SSTables are already Snappy-compressed, so I am skeptical that
re-compressing them would bring any benefit.

> 8. Do you use any Off the Shelf scripts for your backups (e.g. tablesnap,
> cassandra_snapshotter, etc)
>

tablesnap

> 9. Do you utilise AWS for your backups, or do you keep it local (or
> offsite on your own hardware)
>

AWS.

tl;dr - tablesnap works. There are awkward aspects to its use, but if you
are operating Cassandra in AWS it's probably the best off-the-shelf
off-node backup option.
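
To illustrate the "as SSTables are flushed" approach in general terms: SSTable files are immutable once written, so each file only ever needs to be uploaded once. The sketch below is not tablesnap's code. It polls the data directory instead of watching it, the `upload` callable is a hypothetical stand-in for an S3 client, and treating "-tmp" in a file name as the marker for in-progress SSTables is an assumption that may not hold for every Cassandra version:

```python
from pathlib import Path


def new_sstable_files(data_dir, seen):
    """Return finished SSTable files under data_dir that are not in `seen`."""
    found = []
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file() or path in seen:
            continue
        rel_parts = path.relative_to(data_dir).parts
        # Skip hard-linked copies kept for snapshots and incremental backups.
        if "snapshots" in rel_parts or "backups" in rel_parts:
            continue
        # Assumption: in-progress SSTables carry "-tmp" in their file name.
        if "-tmp" in path.name:
            continue
        found.append(path)
    return found


def sync_once(data_dir, seen, upload):
    """Upload each new file once, keyed by its path relative to data_dir."""
    for path in new_sstable_files(data_dir, seen):
        key = path.relative_to(data_dir).as_posix()
        upload(path, key)  # hypothetical uploader, e.g. a wrapper around S3
        seen.add(path)
```

Calling `sync_once` repeatedly with the same `seen` set uploads only files that appeared since the previous call, which is why a separate "full snapshot" step never happens in this style of backup.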