Posted to user@cassandra.apache.org by Thoku Hansen <th...@gmail.com> on 2011/06/23 00:48:44 UTC

Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

I have a couple of questions regarding the coordination of Cassandra nodetool snapshots with Amazon EBS snapshots as part of a Cassandra backup/restore strategy.

Background: I have a cluster running in EC2. Its nodes are configured like so:

* Instance type: m1.xlarge
* Cassandra commit log writing to RAID-0 ephemeral storage
* Cassandra data writing to an EBS volume.

Note: there is a lot of conflicting information/advice about using Cassandra in EC2 w.r.t ephemeral vs. EBS. The above configuration seems to work well for my application. I only described this to provide context for my EBS snapshotting question. With respect, I hope not to debate Cassandra performance for ephemeral vs. EBS in this thread!

I am setting up a process that performs regular EBS (->S3) snapshots for the purpose of backing up Cassandra plus other data.
I presume this will need to be coordinated with regular Cassandra (nodetool) snapshots also.
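
For concreteness, the per-node sequence I have in mind is roughly the
following sketch (the volume ID, region and description are placeholders,
and I'm assuming the boto library for the EBS call):

    import subprocess
    import boto.ec2

    DATA_VOLUME_ID = 'vol-xxxxxxxx'   # placeholder: the EBS volume behind the data directory
    REGION = 'us-east-1'              # placeholder

    # 1. Flush memtables and take a Cassandra-level snapshot (hard links under the data dir).
    subprocess.check_call(['nodetool', '-h', 'localhost', 'snapshot'])

    # 2. Then snapshot the EBS volume that the data directory lives on
    #    (credentials come from the usual boto config/environment).
    conn = boto.ec2.connect_to_region(REGION)
    conn.create_snapshot(DATA_VOLUME_ID, 'cassandra data volume backup')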

My questions:
1. Is it feasible to run directly against a Cassandra data directory restored from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS snapshot).
2. Noting the wiki's consistent Cassandra backups advice; if I schedule nodetool snapshots across the cluster, should the relative age of the 'sibling' snapshots be a concern? How far apart can they be before it's a problem? (seconds? minutes? hours?)

My motivation for these two questions: I'm trying to figure out how much effort needs to be put into:
* Time-coordinated scheduling of nodetool snapshots across the cluster
* Automation of the process of determining the most appropriate set of nodetool snapshots to use when restoring a cluster.

Thanks!

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by aaron morton <aa...@thelastpickle.com>.
> 1. Is it feasible to run directly against a Cassandra data directory restored from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS snapshot).

I don't have experience with EBS snapshots, but I've never been a fan of OS-level snapshots that are not coordinated with the DB layer. 

> 2. Noting the wiki's consistent Cassandra backups advice; if I schedule nodetool snapshots across the cluster, should the relative age of the 'sibling' snapshots be a concern? How far apart can they be before it's a problem? (seconds? minutes? hours?)

Consider the snapshot to be from the time of the first one. 

Previous discussion on AWS backup 
http://www.mail-archive.com/user@cassandra.apache.org/msg12831.html

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jun 2011, at 10:48, Thoku Hansen wrote:

> I have a couple of questions regarding the coordination of Cassandra nodetool snapshots with Amazon EBS snapshots as part of a Cassandra backup/restore strategy.
> 
> Background: I have a cluster running in EC2. Its nodes are configured like so:
> 
> * Instance type: m1.xlarge
> * Cassandra commit log writing to RAID-0 ephemeral storage
> * Cassandra data writing to an EBS volume.
> 
> Note: there is a lot of conflicting information/advice about using Cassandra in EC2 w.r.t ephemeral vs. EBS. The above configuration seems to work well for my application. I only described this to provide context for my EBS snapshotting question. With respect, I hope not to debate Cassandra performance for ephemeral vs. EBS in this thread!
> 
> I am setting up a process that performs regular EBS (->S3) snapshots for the purpose of backing up Cassandra plus other data.
> I presume this will need to be coordinated with regular Cassandra (nodetool) snapshots also.
> 
> My questions:
> 1. Is it feasible to run directly against a Cassandra data directory restored from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS snapshot).
> 2. Noting the wiki's consistent Cassandra backups advice; if I schedule nodetool snapshots across the cluster, should the relative age of the 'sibling' snapshots be a concern? How far apart can they be before it's a problem? (seconds? minutes? hours?)
> 
> My motivation for these two questions: I'm trying to figure out how much effort needs to be put into:
> * Time-coordinated scheduling of nodetool snapshots across the cluster
> * Automation of the process of determining the most appropriate set of nodetool snapshots to use when restoring a cluster.
> 
> Thanks!


Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Josep Blanquer <bl...@rightscale.com>.
On Thu, Jun 23, 2011 at 8:02 AM, William Oberman <ob...@civicscience.com> wrote:

> I've been doing EBS snapshots for mysql for some time now, and was using a
> similar pattern as Josep (XFS with freeze, snap, unfreeze), with the extra
> complication that I was actually using 8 EBS's in RAID-0 (and the extra
> extra complication that I had to lock the MyISAM tables... glad to be moving
> away from that).  For cassandra I switched to ephemeral disks, as per
> recommendations from this forum.
>
yes, if you want to consistently snap MySQL you need to get it into a
consistent state, so you need to do the whole FLUSH TABLES WITH READ LOCK
yadda yadda, on top of the rest. Otherwise you might snapshot something that
is not correct/consistent...and it's a bit more tricky with snapshotting
slaves, since you need to know where they are in the replication
stream...etc
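
Roughly, the MySQL side of that dance is something like the following
(just a sketch; connection details are placeholders, and I'm assuming the
MySQLdb module here):

    import MySQLdb

    conn = MySQLdb.connect(host='localhost', user='backup', passwd='secret')  # placeholders
    cur = conn.cursor()

    cur.execute('FLUSH TABLES WITH READ LOCK')   # quiesce writes and flush tables to disk
    try:
        # ... take the filesystem/EBS snapshot here; on a slave, also record
        # SHOW SLAVE STATUS so you know where it sits in the replication stream ...
        pass
    finally:
        cur.execute('UNLOCK TABLES')             # let writes resume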



> One note on EBS snapshots though: the last time I checked (which was some
> time ago) I noticed degraded IO performance on the box during the
> snapshotting process even though the take snapshot command returns almost
> immediately.  My theory back then was that amazon does the
> delta/compress/store "outside" of the VM, but it obviously has an effect on
> resources on the box the VM runs on.  I was doing this on a mysql slave that
> no one talked to, so I didn't care/bother looking into it further.
>
>
Yes, that is correct. The underlying copy-on-write-and-ship-to-EBS/S3 does
have some performance impact on the running box. For the most part it's
never presented a problem for us or many of our customers, although you're
right, it's something you want to know about and keep in mind when designing
your system (for example, snapshot slaves much more often than masters,
do masters when the traffic is low, stagger Cassandra snaps...yadda
yadda).

If you think about it, this effect is not that different from using LVM
snaps on the ephemeral, and then moving the data from the snap to another
disk or to remote storage...moving those blocks would have an impact on
the original LVM volume since it's reading the same physical (ephemeral)
disk(s) underneath (list of clean and dirty blocks).

One case where I could see the slightly reduced IO performance being
problematic is if your DB/storage is already at the edge of its I/O
capacity...but in that case, the small overhead of a snapshot is probably
the least of your problems :) EBS slowness or malfunction can also impact
the instance, obviously, although that is not only related to snapshots,
since it can impact the actual volume regardless.

 Josep M.

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by William Oberman <ob...@civicscience.com>.
I've been doing EBS snapshots for mysql for some time now, and was using a
similar pattern as Josep (XFS with freeze, snap, unfreeze), with the extra
complication that I was actually using 8 EBS's in RAID-0 (and the extra
extra complication that I had to lock the MyISAM tables... glad to be moving
away from that).  For cassandra I switched to ephemeral disks, as per
recommendations from this forum.

One note on EBS snapshots though: the last time I checked (which was some
time ago) I noticed degraded IO performance on the box during the
snapshotting process even though the take snapshot command returns almost
immediately.  My theory back then was that amazon does the
delta/compress/store "outside" of the VM, but it obviously has an effect on
resources on the box the VM runs on.  I was doing this on a mysql slave that
no one talked to, so I didn't care/bother looking into it further.
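
The per-backup sequence was roughly the following (a from-memory sketch;
mount point, region and volume IDs are placeholders, and I'm assuming the
boto library for the EBS calls). The finally block is there so the
filesystem gets unfrozen even if a snapshot call fails:

    import subprocess
    import boto.ec2

    MOUNT_POINT = '/data'                          # placeholder: XFS mount on top of the RAID-0
    VOLUME_IDS = ['vol-aaaaaaaa', 'vol-bbbbbbbb']  # placeholders: one per RAID-0 member

    conn = boto.ec2.connect_to_region('us-east-1')  # placeholder region

    subprocess.check_call(['xfs_freeze', '-f', MOUNT_POINT])      # freeze: stop writes to the fs
    try:
        for vol in VOLUME_IDS:
            conn.create_snapshot(vol, 'nightly backup')           # returns quickly; copy runs async
    finally:
        subprocess.check_call(['xfs_freeze', '-u', MOUNT_POINT])  # unfreeze no matter what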

will

On Thu, Jun 23, 2011 at 10:30 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> >> EBS volume atomicity is good. We've had tons of experience since EBS came
> >> out almost 4 years ago, to back all kinds of things, including large DBs.
>
> And thanks a lot for coming forward with production experience. That
> is always useful with these things.
>
> --
> / Peter Schuller
>



-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) oberman@civicscience.com

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
>> EBS volume atomicity is good. We've had tons of experience since EBS came
>> out almost 4 years ago,  to back all kinds of things, including large DBs.

And thanks a lot for coming forward with production experience. That
is always useful with these things.

-- 
/ Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
> If taking an atomic snapshot of the device on which a file system is
> located, assuming the file system is designed to be crash
> consistent, it *has* to result in a consistent snapshot. Anything else
> would directly violate the claim that the file system is crash
> consistent, making the premise false.

Let me clarify. Crash-consistent file systems work like that by
relying on write barriers. This is what is exposed by fsync() to
userland (fsync() actually provides full durability guarantees, not
just write barriers, but for the purpose of consistency, it is the
write barrier you are interested in).

A write barrier is such that given a sequence of events like:

(1) write X
(2) insert write barrier
(3) write Y

It is guaranteed that if Y is written (i.e., readable in the future)
then X is also written.
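
In userland terms, the ordering half of that guarantee is what you are
relying on when you do something like this (a minimal sketch; the file
name is a placeholder):

    import os

    fd = os.open('/var/tmp/example.dat', os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    os.write(fd, b'X')   # (1) write X
    os.fsync(fd)         # (2) barrier (plus durability): X reaches stable storage before we go on
    os.write(fd, b'Y')   # (3) write Y -- if Y is ever readable after a crash, X must be too
    os.close(fd)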

It is upon this underlying guarantee that file systems like xfs,
ext4fs, zfs do their job. Their consistency semantics rely on this
behavior, and it is what allows the file system to be
crash-consistent. In other words, in a timeline of writes, at any
given moment you can pause/crash/restart causing a sudden interruption
of the I/O. This has to lead to a directly consistent, or a
deterministically recoverable state, in order for the file system to
be called crash-consistent.

The "event" of suddenly interrupting I/O can be caused by several
things, such as a kernel panic (some assertion) pausing all kernel
activity, a power outage causing a restart, an LVM atomic snapshot
being taken (in which case the I/O stops in the timeline of the
snapshot), or an EBS snapshot.

Only if the EBS snapshots are not consistent, or write barriers are
somehow violated on the EBS volume, would an EBS snapshot not be
consistent. Freezing is not required. But again, see my previous post
about freeze maybe being probabilistically useful anyway.

-- 
/ Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
> that with storage, there are *lots* of urban legends and people making
> strange claims. In this case it is wrong for fundamental reasons
> independent of kernel implementation details.

Also, note that it is not specific to log-based file systems. Even
"old" file systems that predate journaling or softupdates were
specifically and carefully designed such that I/O was performed in a
way that would yield a recoverable state after a crash (this is why
'fsck' runs on boot on old systems). fsck was never primarily intended
to fix arbitrary corruption; file systems were written to perform I/O
in very careful ways (careful with respect to ordering of I/O
operations) such that a crash/reboot/power outage results in an
on-disk state which is recoverable to be consistent, *WITHOUT*
arbitrary data loss or corruption in the file system.

A journaling file system is just another method of achieving certain
goals, including crash consistency. It still relies on the same
fundamental properties of the underlying storage device.

-- 
/ Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
> A snippet from the wikipedia page on XFS for example:
> http://en.wikipedia.org/wiki/XFS
> ...
>
> Snapshots
>
> XFS does not provide direct support for snapshots, as it expects the
> snapshot process to be implemented by the volume manager. Taking a snapshot
> of an XFS filesystem involves freezing I/O to the filesystem using
> the xfs_freeze utility, having the volume manager perform the actual
> snapshot, and then unfreezing I/O to resume normal operations. The snapshot
> can then be mounted read-only for backup purposes. XFS releases on IRIX
> incorporated an integrated volume manager called XLV. This volume manager
> has not been ported to Linux and XFS works with standard LVM instead. In
> recent Linux kernels, the xfs_freeze functionality is implemented in the VFS
> layer, and happens automatically when the Volume Manager's snapshot
> functionality is invoked. This was once a valuable advantage as Ext3 system
> could not be suspended[4] and volume manager was unable to create a
> consistent 'hot' snapshot to backup a heavily busy database.[5] Fortunately
> this is no longer the case. Since Linux 2.6.29 ext3, ext4, gfs2 and jfs have
> the freeze feature as well.[6]
>
> ...

The above is misleading, at least when read out of context (I didn't
check the article). The only hint that the freezing is only necessary
with non-atomic snapshots is in the "... and volume manager was
unable to create a consistent hot snapshot" part.

> I haven't touched the linux kernel for many years now, so honestly I'm
> talking about what I've read in the last few years (rather than relying on
> the actual kernel/drivers code). But if I have to trust this and many other
> articles like it, I'm interpreting that freezing the FS (directly or
> indirectly by LVM) is, indeed, necessary. Not just for XFS but for other
> log-based filesystems. Honestly speaking, I'm not sure of the exact
> technical reason why...maybe it is to stop reads to the actual device, or to
> ensure some sort of log flushing depending on your settings, ... etc.
> Dunno...perhaps somebody else knows and wants to share it.

It's wrong, no matter how many places people claim it. The problem is
that with storage, there are *lots* of urban legends and people making
strange claims. In this case it is wrong for fundamental reasons
independent of kernel implementation details.

While the freezing may very well have been empirically needed in some
particular case, reasons include write barriers not propagating, using
the fs on a multi-device array where global snapshots are not
supported, etc - but fundamentally, an atomic snapshot will yield a
consistent (or recoverable) file system if the file system is
crash-consistent AND is used/configured/accessed in such a way to
actually be crash-consistent for real (e.g., disabling synchronous
writes, not propagating write barriers due to lvm, are examples where
it's not).

Of course I realize I am just another screaming voice. I guess this is
another blog entry to write and explain this in detail. I should
really start working off the backlog of those blog entries...

-- 
/ Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Josep Blanquer <bl...@rightscale.com>.
On Thu, Jun 23, 2011 at 8:54 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> > Actually, I'm afraid that's not true (unless I'm missing something). Even if
> > you have only 1 drive, you still need to stop writes to the disk for the
> > short time it takes the low level "drivers" to snapshot it (i.e., marking
> > all blocks as clean so you can do CopyOnWrite later). I.e., you need to give
> > a chance to LVM, or the EBS low level 'modules' in the hypervisor (whatever
> > you use underneath...), to have exclusive control of the drive for a moment.
> > Now, that being said, some systems (like LVM) will do a freeze themselves,
> > so technically speaking you don't need to explicitly do a freeze
> > yourself...but that's not to say that a freeze is not required for
> > snapshotting.
>
> This doesn't make sense unless you can provide some specific reason
> why this would be required. If a file system is crash-consistent,
> relying on write barriers to work, and given that the setup (kernel,
> mount opts, device driver etc) is such that write barriers are not
> broken, it is directly implied that a consistent snapshot of the
> underlying device is equivalent to a sudden halt (power off, sudden
> reboot, etc).
>
> If taking an atomic snapshot of the device on which a file system is
> located, assuming the file system is designed to be crash
> consistent, it *has* to result in a consistent snapshot. Anything else
> would directly violate the claim that the file system is crash
> consistent, making the premise false.
>
Peter,

A snippet from the wikipedia page on XFS for example:
http://en.wikipedia.org/wiki/XFS
...
Snapshots

XFS does not provide direct support for snapshots, as it expects the
snapshot process to be implemented by the volume manager. Taking a snapshot
of an XFS filesystem involves freezing I/O to the filesystem using the
xfs_freeze utility, having the volume manager perform the actual snapshot,
and then unfreezing I/O to resume normal operations. The snapshot can then
be mounted read-only for backup purposes. XFS releases on IRIX incorporated
an integrated volume manager called XLV. This volume manager has not been
ported to Linux and XFS works with standard LVM instead. In recent Linux
kernels, the xfs_freeze functionality is implemented in the VFS layer, and
happens automatically when the Volume Manager's snapshot functionality is
invoked. This was once a valuable advantage as Ext3 systems could not be
suspended[4] and the volume manager was unable to create a consistent 'hot'
snapshot to backup a heavily busy database.[5] Fortunately this is no
longer the case. Since Linux 2.6.29 ext3, ext4, gfs2 and jfs have the
freeze feature as well.[6]
...


I haven't touched the linux kernel for many years now, so honestly I'm
talking about what I've read in the last few years (rather than relying on
the actual kernel/drivers code). But if I have to trust this and many other
articles like it, I'm interpreting that freezing the FS (directly or
indirectly by LVM) is, indeed, necessary. Not just for XFS but for other
log-based filesystems. Honestly speaking, I'm not sure of the exact
technical reason why...maybe it is to stop reads to the actual device, or to
ensure some sort of log flushing depending on your settings, ... etc.
Dunno...perhaps somebody else knows and wants to share it.

:)

Josep M.

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
> Actually, I'm afraid that's not true (unless I'm missing something). Even if
> you have only 1 drive, you still need to stop writes to the disk for the
> short time it takes the low level "drivers" to snapshot it (i.e., marking
> all blocks as clean so you can do CopyOnWrite later). I.e., you need to give
> a chance to LVM, or the EBS low level 'modules' in the hypervisor ( whatever
> you use underneath...), to have exclusive control of the drive for a moment.
> Now, that being said, some systems (like LVM)  will do a freeze themselves,
> so technically speaking you don't need to explicitly do a freeze
> yourself...but that's not to say that a freeze is not required for
> snapshotting.

This doesn't make sense unless you can provide some specific reason
why this would be required. If a file system is crash-consistent,
relying on write barriers to work, and given that the setup (kernel,
mount opts, device driver etc) is such that write barriers are not
broken, it is directly implied that a consistent snapshot of the
underlying device is equivalent to a sudden halt (power off, sudden
reboot, etc).

If taking an atomic snapshot of the device on which a file system is
located, assuming the file system is designed to be crash
consistent, it *has* to result in a consistent snapshot. Anything else
would directly violate the claim that the file system is crash
consistent, making the premise false.

-- 
/ Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Josep Blanquer <bl...@rightscale.com>.
On Thu, Jun 23, 2011 at 7:30 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> > EBS volume atomicity is good. We've had tons of experience since EBS came
> > out almost 4 years ago, to back all kinds of things, including large DBs.
> > One important thing to have in mind though, is that EBS snapshots are done
> > at the block level, not at the filesystem level. So depending on the
> > filesystem you have on top of the drives you might need to tell the
> > filesystem to "make sure this is consistent or recoverable now". For
> > example, if you use the log-based XFS, you might need to do xfs_freeze,
> > snapshot disc/s, xfs_unfreeze. To make sure that the restored filesystem
> > data (and not only the low level disk blocks) is recoverable when you
> > restore them.
>
> No. That is only required if you're doing multi-volume EBS snapshots
> (e.g. XFS on LVM). The entire point of an atomic snapshot is that
> atomicity gives a consistent snapshot; a modern file system which is
> already crash-consistent will be consistent in an atomic snapshot
> without additional action taken.
>
>

> That said, of course exercising those code paths regularly, rather
> than just on crashes, may mean that you have an elevated chance of
> triggering a bug that you would normally see very rarely. In that way,
> xfs_freeze might actually help probabilistically; however strictly
> speaking, discounting bugs, a crash-consistent fs will be "consistent
> snapshot consistent" as well (it is logically implied).
>
>
Actually, I'm afraid that's not true (unless I'm missing something). Even if
you have only 1 drive, you still need to stop writes to the disk for the
short time it takes the low level "drivers" to snapshot it (i.e., marking
all blocks as clean so you can do CopyOnWrite later). I.e., you need to give
a chance to LVM, or the EBS low level 'modules' in the hypervisor ( whatever
you use underneath...), to have exclusive control of the drive for a moment.
Now, that being said, some systems (like LVM)  will do a freeze themselves,
so technically speaking you don't need to explicitly do a freeze
yourself...but that's not to say that a freeze is not required for
snapshotting.

> But this all assumes the entire stack is correct, and that e.g. an
> fsync() propagates correctly (i.e., not eaten by some LVM or mount
> option to the fs) in order to bring that consistency up to the
> application level.
>
> --
> / Peter Schuller
>

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
> EBS volume atomicity is good. We've had tons of experience since EBS came
> out almost 4 years ago,  to back all kinds of things, including large DBs.
> One important thing to have in mind though, is that EBS snapshots are done
> at the block level, not at the filesystem level. So depending on the
> filesystem you have on top of the drives you might need to tell the
> filesystem to "make sure this is consistent or recoverable now". For
> example, if you use the log-based XFS, you might need to do xfs_freeze,
> snapshot disc/s, xfs_unfreeze. To make sure that the restored filesystem
> data (and not only the low level disk blocks) is recoverable when you
> restore them.

No. That is only required if you're doing multi-volume EBS snapshots
(e.g. XFS on LVM). The entire point of an atomic snapshot is that
atomicity gives a consistent snapshot; a modern file system which is
already crash-consistent will be consistent in an atomic snapshot
without additional action taken.

That said, of course exercising those code paths regularly, rather
than just on crashes, may mean that you have an elevated chance of
triggering a bug that you would normally see very rarely. In that way,
xfs_freeze might actually help probabilistically; however strictly
speaking, discounting bugs, a crash-consistent fs will be "consistent
snapshot consistent" as well (it is logically implied).

But this all assumes the entire stack is correct, and that e.g. an
fsync() propagates correctly (i.e., not eaten by some LVM or mount
option to the fs) in order to bring that consistency up to the
application level.

-- 
/ Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Josep Blanquer <bl...@rightscale.com>.
On Thu, Jun 23, 2011 at 5:04 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> > 1. Is it feasible to run directly against a Cassandra data directory
> > restored from an EBS snapshot? (as opposed to nodetool snapshots restored
> > from an EBS snapshot).
>
> Assuming EBS is not buggy, including honoring write barriers, including
> the Linux guest kernel etc, then yes. EBS snapshots of a single
> volume are promised to be atomic. As such, a restore from an EBS
> snapshot should be semantically identical to recovery after a power
> outage or sudden reboot of the node.
>
> I make no claims as to how well EBS snapshot atomicity is actually
> tested in practice.
>
>
EBS volume atomicity is good. We've had tons of experience since EBS came
out almost 4 years ago, to back all kinds of things, including large DBs.
One important thing to have in mind though, is that EBS snapshots are done
at the block level, not at the filesystem level. So depending on the
filesystem you have on top of the drives you might need to tell the
filesystem to "make sure this is consistent or recoverable now". For
example, if you use the log-based XFS, you might need to do xfs_freeze,
snapshot the disc(s), xfs_unfreeze, to make sure that the restored
filesystem data (and not only the low level disk blocks) is recoverable
when you restore it.

Snapshotting volume stripes works in exactly the same way; you just have to
keep track of which position each snapshot has in the stripe, so you can
recreate the stripe correctly.
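
A minimal sketch of that bookkeeping (volume IDs and region are
placeholders, and again assuming boto): record each member's position in
the stripe when you take its snapshot, so the set can be put back in order
at restore time.

    import boto.ec2

    STRIPE = ['vol-aaaaaaaa', 'vol-bbbbbbbb', 'vol-cccccccc']  # placeholder members, in stripe order

    conn = boto.ec2.connect_to_region('us-east-1')             # placeholder region
    for i, vol in enumerate(STRIPE):
        # the description carries the stripe position, e.g. "data-raid0 member 0/3"
        conn.create_snapshot(vol, 'data-raid0 member %d/%d' % (i, len(STRIPE)))

    # At restore time: create one volume per snapshot, sorted by the recorded
    # member number, and reassemble the stripe from them before mounting.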

The "freezing" of the filesystem might cause a quick/mini hickup, which is
usually not noticeable unless you have very stringent requirements in the
box (or if you have a very large stripe, and/or some sort of network issue
where the calls to amazon endpoint are very slow...and therefore you're
locking the FS a tad longer than you'd want to).

 Cheers,

Josep M.

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

Posted by Peter Schuller <pe...@infidyne.com>.
> 1. Is it feasible to run directly against a Cassandra data directory
> restored from an EBS snapshot? (as opposed to nodetool snapshots restored
> from an EBS snapshot).

Assuming EBS is not buggy, including honoring write barriers, including
the Linux guest kernel etc, then yes. EBS snapshots of a single
volume are promised to be atomic. As such, a restore from an EBS
snapshot should be semantically identical to recovery after a power
outage or sudden reboot of the node.

I make no claims as to how well EBS snapshot atomicity is actually
tested in practice.

> 2. Noting the wiki's consistent Cassandra backups advice; if I schedule
> nodetool snapshots across the cluster, should the relative age of the
> 'sibling' snapshots be a concern? How far apart can they be before it's a
> problem? (seconds? minutes? hours?)

The only strict requirement from Cassandra's point of view, that I can
think of, is the tombstone problem. It is the same as for a node going
offline for an extended period; if GC grace times are exceeded, then
bringing a node back up can cause data that was deleted to re-appear
in the cluster. The same is true when restoring a node from an EBS
snapshot (essentially equivalent to the node being down for a while).
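
A trivial sanity check along those lines (a sketch; the timestamp is a
placeholder, and 864000 seconds / 10 days is the default GC grace):

    import time

    GC_GRACE_SECONDS = 864000        # from the column family definition (default: 10 days)
    snapshot_taken_at = 1308780000   # placeholder: epoch seconds when the snapshot was taken

    if time.time() - snapshot_taken_at >= GC_GRACE_SECONDS:
        raise SystemExit('snapshot is older than gc_grace_seconds; '
                         'restoring it can resurrect deleted data')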

Once you have satisfied that requirement, the remaining concern is
mostly that of your application. I.e., to what extent it is acceptable
for your application that the cluster contains data representing
different points in time. Remember that any data not on the same row
key will essentially have its own "timeline" with respect to
backup/restore, since different rows will never be guaranteed to be
contained on overlapping nodes in the cluster.

Also be aware that while per-node restores from EBS snapshots are
probably a pretty good catastrophic failure recovery technique, do
realize that a "total loss and restore" event will have an impact on
consistency other than going back in time - unless you can strictly
co-ordinate a fully synchronized snapshot across all nodes in the cluster
(not really feasible on EC2 without extensive mucking about in
userland and temporarily bringing down the cluster). For example, if
you do one QUORUM write to row key A followed by a QUORUM write to row
key B, and you rely on referential integrity (for example) of data in
B referring to the data in A, that integrity can be broken after a
non-globally-consistent restore happens.

Whether that is a problem will be entirely up to your application.

In any case, after a restore from snapshots, you'll want to run a
rolling 'nodetool repair' across the cluster to make sure all data is
replicated again as soon as possible, to the greatest extent possible.
At least, again, if your application benefits from this. The only hard
requirement is the repair schedule relative to GC grace time, and that
requirement does not change - just be mindful of the timing of the EBS
snapshots and what that means for your repair schedule.
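
The rolling part can be as simple as the following sketch (hostnames are
placeholders): one node at a time, so you are never repairing the whole
cluster at once.

    import subprocess

    NODES = ['10.0.0.1', '10.0.0.2', '10.0.0.3']   # placeholder ring members

    for host in NODES:
        # blocks until the repair on this node finishes before moving to the next
        subprocess.check_call(['nodetool', '-h', host, 'repair'])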

-- 
/ Peter Schuller