Posted to user@cassandra.apache.org by Thoku Hansen <th...@gmail.com> on 2011/06/23 00:48:44 UTC

Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

I have a couple of questions regarding the coordination of Cassandra nodetool snapshots with Amazon EBS snapshots as part of a Cassandra backup/restore strategy.

Background: I have a cluster running in EC2. Its nodes are configured like so:

* Instance type: m1.xlarge
* Cassandra commit log writing to RAID-0 ephemeral storage
* Cassandra data writing to an EBS volume.

Note: there is a lot of conflicting information/advice about using Cassandra in EC2 w.r.t ephemeral vs. EBS. The above configuration seems to work well for my application. I only described this to provide context for my EBS snapshotting question. With respect, I hope not to debate Cassandra performance for ephemeral vs. EBS in this thread!

I am setting up a process that performs regular EBS (->S3) snapshots for the purpose of backing up Cassandra plus other data.
I presume this will need to be coordinated with regular Cassandra (nodetool) snapshots also.
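
For concreteness, the per-node sequence I have in mind is roughly the
following sketch (the volume ID, region and description are placeholders,
and I'm assuming the boto library for the EBS call):

    import subprocess
    import boto.ec2

    DATA_VOLUME_ID = 'vol-xxxxxxxx'   # placeholder: the EBS volume behind the data directory
    REGION = 'us-east-1'              # placeholder

    # 1. Flush memtables and take a Cassandra-level snapshot (hard links under the data dir).
    subprocess.check_call(['nodetool', '-h', 'localhost', 'snapshot'])

    # 2. Then snapshot the EBS volume that the data directory lives on
    #    (credentials come from the usual boto config/environment).
    conn = boto.ec2.connect_to_region(REGION)
    conn.create_snapshot(DATA_VOLUME_ID, 'cassandra data volume backup')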

My questions:
1. Is it feasible to run directly against a Cassandra data directory restored from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS snapshot).
2. Noting the wiki's consistent Cassandra backups advice; if I schedule nodetool snapshots across the cluster, should the relative age of the 'sibling' snapshots be a concern? How far apart can they be before it's a problem? (seconds? minutes? hours?)

My motivation for these two questions: I'm trying to figure out how much effort needs to be put into:
* Time-coordinated scheduling of nodetool snapshots across the cluster
* Automation of the process of determining the most appropriate set of nodetool snapshots to use when restoring a cluster.

Thanks!

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by aaron morton <aa...@thelastpickle.com>.
> 1. Is it feasible to run directly against a Cassandra data directory restored from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS snapshot).

I don't have experience with EBS snapshots, but I've never been a fan of OS-level snapshots that are not coordinated with the DB layer. 

> 2. Noting the wiki's consistent Cassandra backups advice; if I schedule nodetool snapshots across the cluster, should the relative age of the 'sibling' snapshots be a concern? How far apart can they be before it's a problem? (seconds? minutes? hours?)

Consider the snapshot to be from the time of the first one. 

Previous discussion on AWS backup 
http://www.mail-archive.com/user@cassandra.apache.org/msg12831.html

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jun 2011, at 10:48, Thoku Hansen wrote:

> I have a couple of questions regarding the coordination of Cassandra nodetool snapshots with Amazon EBS snapshots as part of a Cassandra backup/restore strategy.
> 
> Background: I have a cluster running in EC2. Its nodes are configured like so:
> 
> * Instance type: m1.xlarge
> * Cassandra commit log writing to RAID-0 ephemeral storage
> * Cassandra data writing to an EBS volume.
> 
> Note: there is a lot of conflicting information/advice about using Cassandra in EC2 w.r.t ephemeral vs. EBS. The above configuration seems to work well for my application. I only described this to provide context for my EBS snapshotting question. With respect, I hope not to debate Cassandra performance for ephemeral vs. EBS in this thread!
> 
> I am setting up a process that performs regular EBS (->S3) snapshots for the purpose of backing up Cassandra plus other data.
> I presume this will need to be coordinated with regular Cassandra (nodetool) snapshots also.
> 
> My questions:
> 1. Is it feasible to run directly against a Cassandra data directory restored from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS snapshot).
> 2. Noting the wiki's consistent Cassandra backups advice; if I schedule nodetool snapshots across the cluster, should the relative age of the 'sibling' snapshots be a concern? How far apart can they be before it's a problem? (seconds? minutes? hours?)
> 
> My motivation for these two questions: I'm trying to figure out how much effort needs to be put into:
> * Time-coordinated scheduling of nodetool snapshots across the cluster
> * Automation of the process of determining the most appropriate set of nodetool snapshots to use when restoring a cluster.
> 
> Thanks!


Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Josep Blanquer <bl...@rightscale.com>.
On Thu, Jun 23, 2011 at 8:02 AM, William Oberman <ob...@civicscience.com> wrote:

> I've been doing EBS snapshots for mysql for some time now, and was using a
> similar pattern as Josep (XFS with freeze, snap, unfreeze), with the extra
> complication that I was actually using 8 EBS's in RAID-0 (and the extra
> extra complication that I had to lock the MyISAM tables... glad to be moving
> away from that).  For cassandra I switched to ephemeral disks, as per
> recommendations from this forum.
>
yes, if you want to consistently snap MySQL you need to get it into a
consistent state, so you need to do the whole FLUSH TABLES WITH READ LOCK
yadda yadda, on top of the rest. Otherwise you might snapshot something that
is not correct/consistent...and it's a bit more tricky with snapshotting
slaves, since you need to know where they are in the replication
stream...etc
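
Roughly, the MySQL side of that dance is something like the following
(just a sketch; connection details are placeholders, and I'm assuming the
MySQLdb module here):

    import MySQLdb

    conn = MySQLdb.connect(host='localhost', user='backup', passwd='secret')  # placeholders
    cur = conn.cursor()

    cur.execute('FLUSH TABLES WITH READ LOCK')   # quiesce writes and flush tables to disk
    try:
        # ... take the filesystem/EBS snapshot here; on a slave, also record
        # SHOW SLAVE STATUS so you know where it sits in the replication stream ...
        pass
    finally:
        cur.execute('UNLOCK TABLES')             # let writes resume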



> One note on EBS snapshots though: the last time I checked (which was some
> time ago) I noticed degraded IO performance on the box during the
> snapshotting process even though the take snapshot command returns almost
> immediately.  My theory back then was that amazon does the
> delta/compress/store "outside" of the VM, but it obviously has an effect on
> resources on the box the VM runs on.  I was doing this on a mysql slave that
> no one talked to, so I didn't care/bother looking into it further.
>
>
Yes, that is correct. The underlying copy-on-write-and-ship-to-EBS/S3 does
have some performance impact on the running box. For the most part it's
never presented a problem for us or many of our customers, although you're
right, it's something you want to know about and keep in mind when designing
your system (for example, snapshot slaves much more often than masters,
do masters when the traffic is low, stagger Cassandra snaps...yadda
yadda).

If you think about it, this effect is not that different from using LVM
snaps on the ephemeral, and then moving the data from the snap to another
disk or to remote storage...moving those blocks would have an impact on
the original LVM volume since it's reading the same physical (ephemeral)
disk(s) underneath (list of clean and dirty blocks).

One case where I could see the slightly reduced IO performance being
problematic is if your DB/storage is already at the edge of its I/O
capacity...but in that case, the small overhead of a snapshot is probably
the least of your problems :) EBS slowness or malfunction can also impact
the instance, obviously, although that is not only related to snapshots,
since it can impact the actual volume regardless.

 Josep M.

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by William Oberman <ob...@civicscience.com>.
I've been doing EBS snapshots for mysql for some time now, and was using a
similar pattern as Josep (XFS with freeze, snap, unfreeze), with the extra
complication that I was actually using 8 EBS's in RAID-0 (and the extra
extra complication that I had to lock the MyISAM tables... glad to be moving
away from that).  For cassandra I switched to ephemeral disks, as per
recommendations from this forum.

One note on EBS snapshots though: the last time I checked (which was some
time ago) I noticed degraded IO performance on the box during the
snapshotting process even though the take snapshot command returns almost
immediately.  My theory back then was that amazon does the
delta/compress/store "outside" of the VM, but it obviously has an effect on
resources on the box the VM runs on.  I was doing this on a mysql slave that
no one talked to, so I didn't care/bother looking into it further.
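
The per-backup sequence was roughly the following (a from-memory sketch;
mount point, region and volume IDs are placeholders, and I'm assuming the
boto library for the EBS calls). The finally block is there so the
filesystem gets unfrozen even if a snapshot call fails:

    import subprocess
    import boto.ec2

    MOUNT_POINT = '/data'                          # placeholder: XFS mount on top of the RAID-0
    VOLUME_IDS = ['vol-aaaaaaaa', 'vol-bbbbbbbb']  # placeholders: one per RAID-0 member

    conn = boto.ec2.connect_to_region('us-east-1')  # placeholder region

    subprocess.check_call(['xfs_freeze', '-f', MOUNT_POINT])      # freeze: stop writes to the fs
    try:
        for vol in VOLUME_IDS:
            conn.create_snapshot(vol, 'nightly backup')           # returns quickly; copy runs async
    finally:
        subprocess.check_call(['xfs_freeze', '-u', MOUNT_POINT])  # unfreeze no matter what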

will

On Thu, Jun 23, 2011 at 10:30 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> >> EBS volume atomicity is good. We've had tons of experience since EBS came
> >> out almost 4 years ago, to back all kinds of things, including large DBs.
>
> And thanks a lot for coming forward with production experience. That
> is always useful with these things.
>
> --
> / Peter Schuller
>



-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) oberman@civicscience.com

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
>> EBS volume atomicity is good. We've had tons of experience since EBS came
>> out almost 4 years ago,  to back all kinds of things, including large DBs.

And thanks a lot for coming forward with production experience. That
is always useful with these things.

-- 
/ Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
> If taking an atomic snapshot of the device on which a file system is
> located, assuming the file system is designed to be crash
> consistent, it *has* to result in a consistent snapshot. Anything else
> would directly violate the claim that the file system is crash
> consistent, making the premise false.

Let me clarify. Crash-consistent file systems work like that by
relying on write barriers. This is what is exposed by fsync() to
userland (fsync() actually provides full durability guarantees, not
just write barriers, but for the purpose of consistency, it is the
write barrier you are interested in).

A write barrier is such that given a sequence of events like:

(1) write X
(2) insert write barrier
(3) write Y

It is guaranteed that if Y is written (i.e., readable in the future)
then X is also written.
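
In userland terms, the ordering half of that guarantee is what you are
relying on when you do something like this (a minimal sketch; the file
name is a placeholder):

    import os

    fd = os.open('/var/tmp/example.dat', os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    os.write(fd, b'X')   # (1) write X
    os.fsync(fd)         # (2) barrier (plus durability): X reaches stable storage before we go on
    os.write(fd, b'Y')   # (3) write Y -- if Y is ever readable after a crash, X must be too
    os.close(fd)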

It is upon this underlying guarantee that file systems like xfs,
ext4fs, zfs do their job. Their consistency semantics rely on this
behavior, and it is what allows the file system to be
crash-consistent. In other words, in a timeline of writes, at any
given moment you can pause/crash/restart causing a sudden interruption
of the I/O. This has to lead to a directly consistent, or a
deterministically recoverable state, in order for the file system to
be called crash-consistent.

The "event" of suddenly interrupting I/O can be caused by several
things, such as a kernel panic (some assertion) pausing all kernel
activity, a power outage causing a restart, an LVM atomic snapshot
being taken (in which case the I/O stops in the timeline of the
snapshot), or an EBS snapshot.

Only if the EBS snapshots are not consistent, or write barriers are
somehow violated on the EBS volume, would an EBS snapshot not be
consistent. Freezing is not required. But again, see my previous post
about freeze maybe being probabilistically useful anyway.

-- 
/ Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
> that with storage, there are *lots* of urban legends and people making
> strange claims. In this case it is wrong for fundamental reasons
> independent of kernel implementation details.

Also, note that it is not specific to log-based file systems. Even
"old" file systems that predate journaling or softupdates were
specifically and carefully designed such that I/O was performed in a
way that would yield a recoverable state after a crash (this is why
'fsck' runs on boot on old systems). fsck was never primarily intended
to fix arbitrary corruption; file systems were written to perform I/O
in very careful ways (careful with respect to ordering of I/O
operations) such that a crash/reboot/power outage results in an
on-disk state which is recoverable to be consistent, *WITHOUT*
arbitrary data loss or corruption in the file system.

A journaling file system is just another method of achieving certain
goals, including crash consistency. It still relies on the same
fundamental properties of the underlying storage device.

-- 
/ Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
> A snippet from the wikipedia page on XFS for example:
> http://en.wikipedia.org/wiki/XFS
> ...
>
> Snapshots
>
> XFS does not provide direct support for snapshots, as it expects the
> snapshot process to be implemented by the volume manager. Taking a snapshot
> of an XFS filesystem involves freezing I/O to the filesystem using
> the xfs_freeze utility, having the volume manager perform the actual
> snapshot, and then unfreezing I/O to resume normal operations. The snapshot
> can then be mounted read-only for backup purposes. XFS releases on IRIX
> incorporated an integrated volume manager called XLV. This volume manager
> has not been ported to Linux and XFS works with standard LVM instead. In
> recent Linux kernels, the xfs_freeze functionality is implemented in the VFS
> layer, and happens automatically when the Volume Manager's snapshot
> functionality is invoked. This was once a valuable advantage as Ext3 system
> could not be suspended[4] and volume manager was unable to create a
> consistent 'hot' snapshot to backup a heavily busy database.[5] Fortunately
> this is no longer the case. Since Linux 2.6.29 ext3, ext4, gfs2 and jfs have
> the freeze feature as well.[6]
>
> ...

The above is misleading, at least when read out of context (I didn't
check the article). The only hint that the freezing is only necessary
with non-atomic snapshots is in the "... and volume manager was
unable to create a consistent hot snapshot" part.

> I haven't touched the linux kernel for many years now, so honestly I'm
> talking about what I've read in the last few years (rather than relying on
> the actual kernel/drivers code). But if I have to trust this and many other
> articles like it, I'm interpreting that freezing the FS (directly or
> indirectly by LVM) is, indeed, necessary. Not just for XFS but for other
> log-based filesystems. Honestly speaking, I'm not sure of the exact
> technical reason why...maybe it is to stop reads to the actual device, or to
> ensure some sort of log flushing depending on your settings, ... etc.
> Dunno...perhaps somebody else knows and wants to share it.

It's wrong, no matter how many places people claim it. The problem is
that with storage, there are *lots* of urban legends and people making
strange claims. In this case it is wrong for fundamental reasons
independent of kernel implementation details.

While the freezing may very well have been empirically needed in some
particular case, reasons include write barriers not propagating, using
the fs on a multi-device array where global snapshots are not
supported, etc - but fundamentally, an atomic snapshot will yield a
consistent (or recoverable) file system if the file system is
crash-consistent AND is used/configured/accessed in such a way to
actually be crash-consistent for real (e.g., disabling synchronous
writes, not propagating write barriers due to lvm, are examples where
it's not).

Of course I realize I am just another screaming voice. I guess this is
another blog entry to write and explain this in detail. I should
really start working off the backlog of those blog entries...

-- 
/ Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Josep Blanquer <bl...@rightscale.com>.
On Thu, Jun 23, 2011 at 8:54 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> > Actually, I'm afraid that's not true (unless I'm missing something). Even if
> > you have only 1 drive, you still need to stop writes to the disk for the
> > short time it takes the low level "drivers" to snapshot it (i.e., marking
> > all blocks as clean so you can do CopyOnWrite later). I.e., you need to give
> > a chance to LVM, or the EBS low level 'modules' in the hypervisor (whatever
> > you use underneath...), to have exclusive control of the drive for a moment.
> > Now, that being said, some systems (like LVM) will do a freeze themselves,
> > so technically speaking you don't need to explicitly do a freeze
> > yourself...but that's not to say that a freeze is not required for
> > snapshotting.
>
> This doesn't make sense unless you can provide some specific reason
> why this would be required. If a file system is crash-consistent,
> relying on write barriers to work, and given that the setup (kernel,
> mount opts, device driver etc) is such that write barriers are not
> broken, it is directly implied that a consistent snapshot of the
> underlying device is equivalent to a sudden halt (power off, sudden
> reboot, etc).
>
> If taking an atomic snapshot of the device on which a file system is
> located, assuming the file system is designed to be crash
> consistent, it *has* to result in a consistent snapshot. Anything else
> would directly violate the claim that the file system is crash
> consistent, making the premise false.
>
Peter,

A snippet from the wikipedia page on XFS for example:
http://en.wikipedia.org/wiki/XFS
...
Snapshots

XFS does not provide direct support for snapshots, as it expects the
snapshot process to be implemented by the volume manager. Taking a snapshot
of an XFS filesystem involves freezing I/O to the filesystem using the
xfs_freeze utility, having the volume manager perform the actual snapshot,
and then unfreezing I/O to resume normal operations. The snapshot can then
be mounted read-only for backup purposes. XFS releases on IRIX incorporated
an integrated volume manager called XLV. This volume manager has not been
ported to Linux and XFS works with standard LVM instead. In recent Linux
kernels, the xfs_freeze functionality is implemented in the VFS layer, and
happens automatically when the Volume Manager's snapshot functionality is
invoked. This was once a valuable advantage as Ext3 systems could not be
suspended[4] and the volume manager was unable to create a consistent 'hot'
snapshot to backup a heavily busy database.[5] Fortunately this is no
longer the case. Since Linux 2.6.29 ext3, ext4, gfs2 and jfs have the
freeze feature as well.[6]
...


I haven't touched the linux kernel for many years now, so honestly I'm
talking about what I've read in the last few years (rather than relying on
the actual kernel/drivers code). But if I have to trust this and many other
articles like it, I'm interpreting that freezing the FS (directly or
indirectly by LVM) is, indeed, necessary. Not just for XFS but for other
log-based filesystems. Honestly speaking, I'm not sure of the exact
technical reason why...maybe it is to stop reads to the actual device, or to
ensure some sort of log flushing depending on your settings, ... etc.
Dunno...perhaps somebody else knows and wants to share it.

:)

Josep M.

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
> Actually, I'm afraid that's not true (unless I'm missing something). Even if
> you have only 1 drive, you still need to stop writes to the disk for the
> short time it takes the low level "drivers" to snapshot it (i.e., marking
> all blocks as clean so you can do CopyOnWrite later). I.e., you need to give
> a chance to LVM, or the EBS low level 'modules' in the hypervisor ( whatever
> you use underneath...), to have exclusive control of the drive for a moment.
> Now, that being said, some systems (like LVM)  will do a freeze themselves,
> so technically speaking you don't need to explicitly do a freeze
> yourself...but that's not to say that a freeze is not required for
> snapshotting.

This doesn't make sense unless you can provide some specific reason
why this would be required. If a file system is crash-consistent,
relying on write barriers to work, and given that the setup (kernel,
mount opts, device driver etc) is such that write barriers are not
broken, it is directly implied that a consistent snapshot of the
underlying device is equivalent to a sudden halt (power off, sudden
reboot, etc).

If taking an atomic snapshot of the device on which a file system is
located, assuming the file system is designed to be crash
consistent, it *has* to result in a consistent snapshot. Anything else
would directly violate the claim that the file system is crash
consistent, making the premise false.

-- 
/ Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Josep Blanquer <bl...@rightscale.com>.
On Thu, Jun 23, 2011 at 7:30 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> > EBS volume atomicity is good. We've had tons of experience since EBS came
> > out almost 4 years ago, to back all kinds of things, including large DBs.
> > One important thing to have in mind though, is that EBS snapshots are done
> > at the block level, not at the filesystem level. So depending on the
> > filesystem you have on top of the drives you might need to tell the
> > filesystem to "make sure this is consistent or recoverable now". For
> > example, if you use the log-based XFS, you might need to do xfs_freeze,
> > snapshot disc/s, xfs_unfreeze. To make sure that the restored filesystem
> > data (and not only the low level disk blocks) is recoverable when you
> > restore them.
>
> No. That is only required if you're doing multi-volume EBS snapshots
> (e.g. XFS on LVM). The entire point of an atomic snapshot is that
> atomicity gives a consistent snapshot; a modern file system which is
> already crash-consistent will be consistent in an atomic snapshot
> without additional action taken.
>
>

> That said, of course exercising those code paths regularly, rather
> than just on crashes, may mean that you have an elevated chance of
> triggering a bug that you would normally see very rarely. In that way,
> xfs_freeze might actually help probabilistically; however strictly
> speaking, discounting bugs, a crash-consistent fs will be "consistent
> snapshot consistent" as well (it is logically implied).
>
>
Actually, I'm afraid that's not true (unless I'm missing something). Even if
you have only 1 drive, you still need to stop writes to the disk for the
short time it takes the low level "drivers" to snapshot it (i.e., marking
all blocks as clean so you can do CopyOnWrite later). I.e., you need to give
a chance to LVM, or the EBS low level 'modules' in the hypervisor ( whatever
you use underneath...), to have exclusive control of the drive for a moment.
Now, that being said, some systems (like LVM)  will do a freeze themselves,
so technically speaking you don't need to explicitly do a freeze
yourself...but that's not to say that a freeze is not required for
snapshotting.

> But this all assumes the entire stack is correct, and that e.g. an
> fsync() propagates correctly (i.e., not eaten by some LVM or mount
> option to the fs) in order to bring that consistency up to the
> application level.
>
> --
> / Peter Schuller
>

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
> EBS volume atomicity is good. We've had tons of experience since EBS came
> out almost 4 years ago,  to back all kinds of things, including large DBs.
> One important thing to have in mind though, is that EBS snapshots are done
> at the block level, not at the filesystem level. So depending on the
> filesystem you have on top of the drives you might need to tell the
> filesystem to "make sure this is consistent or recoverable now". For
> example, if you use the log-based XFS, you might need to do xfs_freeze,
> snapshot disc/s, xfs_unfreeze. To make sure that the restored filesystem
> data (and not only the low level disk blocks) is recoverable when you
> restore them.

No. That is only required if you're doing multi-volume EBS snapshots
(e.g. XFS on LVM). The entire point of an atomic snapshot is that
atomicity gives a consistent snapshot; a modern file system which is
already crash-consistent will be consistent in an atomic snapshot
without additional action taken.

That said, of course exercising those code paths regularly, rather
than just on crashes, may mean that you have an elevated chance of
triggering a bug that you would normally see very rarely. In that way,
xfs_freeze might actually help probabilistically; however strictly
speaking, discounting bugs, a crash-consistent fs will be "consistent
snapshot consistent" as well (it is logically implied).

But this all assumes the entire stack is correct, and that e.g. an
fsync() propagates correctly (i.e., not eaten by some LVM or mount
option to the fs) in order to bring that consistency up to the
application level.

-- 
/ Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Josep Blanquer <bl...@rightscale.com>.
On Thu, Jun 23, 2011 at 5:04 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> > 1. Is it feasible to run directly against a Cassandra data directory
> > restored from an EBS snapshot? (as opposed to nodetool snapshots restored
> > from an EBS snapshot).
>
> Assuming EBS is not buggy, including honoring write barriers, including
> the Linux guest kernel etc, then yes. EBS snapshots of a single
> volume are promised to be atomic. As such, a restore from an EBS
> snapshot should be semantically identical to recovery after a power
> outage or sudden reboot of the node.
>
> I make no claims as to how well EBS snapshot atomicity is actually
> tested in practice.
>
>
EBS volume atomicity is good. We've had tons of experience since EBS came
out almost 4 years ago, to back all kinds of things, including large DBs.
One important thing to have in mind though, is that EBS snapshots are done
at the block level, not at the filesystem level. So depending on the
filesystem you have on top of the drives you might need to tell the
filesystem to "make sure this is consistent or recoverable now". For
example, if you use the log-based XFS, you might need to do xfs_freeze,
snapshot the disc(s), xfs_unfreeze, to make sure that the restored
filesystem data (and not only the low level disk blocks) is recoverable
when you restore it.

Snapshotting volume stripes works in exactly the same way; you just have to
keep track of which position each snapshot has in the stripe, so you can
recreate the stripe correctly.
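
A minimal sketch of that bookkeeping (volume IDs and region are
placeholders, and again assuming boto): record each member's position in
the stripe when you take its snapshot, so the set can be put back in order
at restore time.

    import boto.ec2

    STRIPE = ['vol-aaaaaaaa', 'vol-bbbbbbbb', 'vol-cccccccc']  # placeholder members, in stripe order

    conn = boto.ec2.connect_to_region('us-east-1')             # placeholder region
    for i, vol in enumerate(STRIPE):
        # the description carries the stripe position, e.g. "data-raid0 member 0/3"
        conn.create_snapshot(vol, 'data-raid0 member %d/%d' % (i, len(STRIPE)))

    # At restore time: create one volume per snapshot, sorted by the recorded
    # member number, and reassemble the stripe from them before mounting.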

The "freezing" of the filesystem might cause a quick/mini hickup, which is
usually not noticeable unless you have very stringent requirements in the
box (or if you have a very large stripe, and/or some sort of network issue
where the calls to amazon endpoint are very slow...and therefore you're
locking the FS a tad longer than you'd want to).

 Cheers,

Josep M.

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
> 1. Is it feasible to run directly against a Cassandra data directory
> restored from an EBS snapshot? (as opposed to nodetool snapshots restored
> from an EBS snapshot).

Assuming EBS is not buggy, including honoring write barriers, including
the Linux guest kernel etc, then yes. EBS snapshots of a single
volume are promised to be atomic. As such, a restore from an EBS
snapshot should be semantically identical to recovery after a power
outage or sudden reboot of the node.

I make no claims as to how well EBS snapshot atomicity is actually
tested in practice.

> 2. Noting the wiki's consistent Cassandra backups advice; if I schedule
> nodetool snapshots across the cluster, should the relative age of the
> 'sibling' snapshots be a concern? How far apart can they be before it's a
> problem? (seconds? minutes? hours?)

The only strict requirement from Cassandra's point of view, that I can
think of, is the tombstone problem. It is the same as for a node going
offline for an extended period; if GC grace times are exceeded, then
bringing a node back up can cause data that was deleted to re-appear
in the cluster. The same is true when restoring a node from an EBS
snapshot (essentially equivalent to the node being down for a while).
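
A trivial sanity check along those lines (a sketch; the timestamp is a
placeholder, and 864000 seconds / 10 days is the default GC grace):

    import time

    GC_GRACE_SECONDS = 864000        # from the column family definition (default: 10 days)
    snapshot_taken_at = 1308780000   # placeholder: epoch seconds when the snapshot was taken

    if time.time() - snapshot_taken_at >= GC_GRACE_SECONDS:
        raise SystemExit('snapshot is older than gc_grace_seconds; '
                         'restoring it can resurrect deleted data')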

Once you have satisfied that requirement, the remaining concern is
mostly that of your application. I.e., to what extent it is acceptable
for your application that the cluster contains data representing
different points in time. Remember that any data not on the same row
key will essentially have its own "timeline" with respect to
backup/restore, since different rows will never be guaranteed to be
contained on overlapping nodes in the cluster.

Also be aware that while per-node restores from EBS snapshots are
probably a pretty good catastrophic failure recovery technique, do
realize that a "total loss and restore" event will have an impact on
consistency other than going back in time - unless you can strictly
co-ordinate a fully synchronized snapshot across all nodes in the cluster
(not really feasible on EC2 without extensive mucking about in
userland and temporarily bringing down the cluster). For example, if
you do one QUORUM write to row key A followed by a QUORUM write to row
key B, and you rely on referential integrity (for example) of data in
B referring to the data in A, that integrity can be broken after a
non-globally-consistent restore happens.

Whether that is a problem will be entirely up to your application.

In any case, after a restore from snapshots, you'll want to run a
rolling 'nodetool repair' across the cluster to make sure all data is
replicated again as soon as possible, to the greatest extent possible.
At least, again, if your application benefits from this. The only hard
requirement is the repair schedule relative to GC grace time, and that
requirement does not change - just be mindful of the timing of the EBS
snapshots and what that means for your repair schedule.
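
The rolling part can be as simple as the following sketch (hostnames are
placeholders): one node at a time, so you are never repairing the whole
cluster at once.

    import subprocess

    NODES = ['10.0.0.1', '10.0.0.2', '10.0.0.3']   # placeholder ring members

    for host in NODES:
        # blocks until the repair on this node finishes before moving to the next
        subprocess.check_call(['nodetool', '-h', host, 'repair'])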

-- 
/ Peter Schuller