Posted to user@cassandra.apache.org by Dave Viner <da...@vinertech.com> on 2010/07/08 07:50:19 UTC

Backing up the data stored in cassandra

Hi all,

What is the recommended strategy for backing up the data stored inside
cassandra?

I realize that Cassandra is a distributed database and that, with a decent
replication factor, backups are "already done" in some sense.  But, as a
relatively new user, I'm concerned that the data lives only within the
system and is not stored *anywhere* else.

In an earlier email in the list, the recommendation was:

Until tickets 193 and 520 are done, the easiest thing is to copy all
the sstables from the other nodes that have replicas for the ranges it
is responsible for (e.g. for replication factor of 3 on rack unaware
partitioner, the nodes before it and the node after it on the right
would suffice), and then run nodeprobe cleanup to clear out the
excess.

Is this still the recommended approach?  If I back up the files in
DataDirectories/*, is it possible to restore a node using those files?
(That is, bring up a new node, copy the backed-up files from the
crashed node onto the new node, then have the new node join the
cluster?)
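
To make the question concrete, here is a rough sketch of the file-level
copy I have in mind. It is only an illustration, not a recommended or
official procedure: the function names, the backup layout and the "label"
parameter are all made up for this example, and it assumes the files are
not being rewritten while the copy runs. Whether a node restored this way
can rejoin the cluster cleanly is exactly what I am asking.

```python
import shutil
from pathlib import Path


def backup_data_dir(data_dir: Path, backup_root: Path, label: str) -> Path:
    """Copy everything under data_dir into backup_root/label, keeping the layout.

    Hypothetical helper: data_dir stands for one entry of DataDirectories,
    and label is any name chosen to identify this backup (e.g. node + date).
    """
    dest = backup_root / label
    # copytree creates dest and raises FileExistsError if it already exists,
    # so an earlier backup with the same label is never silently overwritten.
    shutil.copytree(data_dir, dest)
    return dest


def restore_data_dir(backup_dir: Path, data_dir: Path) -> None:
    """Copy a backup into an empty (or not yet created) data directory."""
    if data_dir.exists() and any(data_dir.iterdir()):
        raise RuntimeError(f"{data_dir} is not empty; refusing to overwrite")
    # dirs_exist_ok (Python 3.8+) lets the copy target an existing empty directory.
    shutil.copytree(backup_dir, data_dir, dirs_exist_ok=True)
```

The idea would be to call backup_data_dir() on the node being backed up,
and restore_data_dir() on the replacement node before it is started.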


Thanks

Dave Viner

Re: Backing up the data stored in cassandra

Posted by Jonathan Ellis <jb...@gmail.com>.

See http://wiki.apache.org/cassandra/Operations

On Thu, Jul 8, 2010 at 12:50 AM, Dave Viner <da...@vinertech.com> wrote:
> Hi all,
> What is the recommended strategy for backing up the data stored inside
> cassandra?
> I realized that Cass. is a distributed database, and with a decent
> replication factor, backups are "already done" in some sense.  But, as a
> relatively new user, I'm always concerned that the data is only within the
> system and not stored *anywhere* else.
> In an earlier email in the list, the recommendation was:
>
> Until tickets 193 and 520 are done, the easiest thing is to copy all
> the sstables from the other nodes that have replicas for the ranges it
> is responsible for (e.g. for replication factor of 3 on rack unaware
> partitioner, the nodes before it and the node after it on the right
> would suffice), and then run nodeprobe cleanup to clear out the
> excess.
>
> Is this still the recommended approach?  If I backed up the files in
> DataDirectories/*, is it possible to restore a node using those files?
> (That is, bring up a new node, copy the backed up files from the crashed
> node onto the new node, then have the new node join the cluster?)
>
> Thanks
>
> Dave Viner
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com