Posted to user@cassandra.apache.org by Shubham Srivastava <Sh...@makemytrip.com> on 2012/04/26 08:34:16 UTC

Taking a Cluster Wide Snapshot

What's the best way (or the only way) to take a cluster-wide backup of Cassandra? I can't find much documentation on this.

I am using a multi-DC setup with Cassandra 0.8.6.


Regards,
Shubham

RE: Taking a Cluster Wide Snapshot

Posted by Shubham Srivastava <Sh...@makemytrip.com>.
I finally tried the global snapshot approach; below is what I did.

I took the data from all the individual nodes in a single DC after changing the numerical part of the sstable file names (this numerical part can have a maximum value of 2147483647, i.e. a signed 32-bit integer).
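
As a rough sketch of that renaming step (assuming 0.8-style sstable names such as Users-g-42-Data.db, with a hypothetical per-node offset chosen so the shifted generation numbers never collide and stay below 2147483647):

#!/bin/bash
# Sketch: shift each sstable's generation number by a per-node offset so
# files from different nodes no longer collide when merged. The offset
# scheme is hypothetical; keep gen + OFFSET below 2147483647.
OFFSET=$1   # e.g. 1000000 for node 1, 2000000 for node 2, ...
for f in *-g-*-*.db; do
  [ -e "$f" ] || continue
  gen=$(echo "$f" | sed -E 's/.*-g-([0-9]+)-.*/\1/')
  mv -v "$f" "$(echo "$f" | sed -E "s/-g-${gen}-/-g-$((gen + OFFSET))-/")"
done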

1. My individual nodes each had around 12 GB of data, so after combining it came to around 77 GB.

2. My new ring had exactly the same configuration in terms of number of nodes, RF and token distribution.

3. I copied all the data onto all the individual nodes and ran cleanup.

4. After running cleanup, all the nodes had data in the range of 34 GB-38 GB.

5. The data on all the nodes was correct, validated against the original ring.

6. One of the nodes in a DC had data close to 68 GB, and running cleanup didn't trigger anything.

What I can't work out yet is:

1. Is the data size correct? I have RF=3 and 6 nodes in one DC, the same as in the other. Is it because ((77 * 3) / 6 ≈ 38.5 GB)?
2. Why is one node holding 68 GB of data?

Below is the ring status

Address         DC          Rack        Status State   Load            Owns    Token
                                                                               155962751505430122790891384580033478656
11.60     DC1         RC1         Up     Normal  38.83 GB        8.33%   0
6.136   DC2         RC1         Up     Normal  34.49 GB        8.33%   14178431955039101857246194831382806528
11.61     DC1         RC2         Up     Normal  37.62 GB        8.33%   28356863910078203714492389662765613056
6.137   DC2         RC2         Up     Normal  33.96 GB        8.33%   42535295865117307932921825928971026432
11.62     DC1         RC1         Up     Normal  36.91 GB        8.33%   56713727820156407428984779325531226112
6.138   DC2         RC1         Up     Normal  34.12 GB        8.33%   70892159775195516369780698461381853184
11.63     DC1         RC2         Up     Normal  37.63 GB        8.33%   85070591730234615865843651857942052864
6.139   DC2         RC2         Up     Normal  33.84 GB        8.33%   99249023685273724806639570993792679936
11.64     DC1         RC1         Up     Normal  38.83 GB        8.33%   113427455640312814857969558651062452224
6.140   DC2         RC1         Up     Normal  34.37 GB        8.33%   127605887595351923798765477786913079296
11.65     DC1         RC2         Up     Normal  39.54 GB        8.33%   141784319550391032739561396922763706368
6.141   DC2         RC2         Up     Normal  68.29 GB        8.33%   155962751505430122790891384580033478656

Regards,
Shubham




Re: Taking a Cluster Wide Snapshot

Posted by Tamar Fraenkel <ta...@tok-media.com>.
I think it makes sense, and I would be happy if you could share the incremental snapshot scripts.
Thanks!
Tamar Fraenkel
Senior Software Engineer, TOK Media


tamar@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956






RE: Taking a Cluster Wide Snapshot

Posted by Shubham Srivastava <Sh...@makemytrip.com>.
On another thought, I am writing a script to take a backup of all the nodes in a single DC, rename the data files with some unique id, and then merge them. The storage, however, would be on some storage medium (a NAS, for example) in the same DC, which would keep the data copying from becoming a hefty job.

Hopefully the data from one single DC (from all the nodes in that DC) should give me the complete data set, as long as the replication factor for that DC is >= 1.

The next improvement would be to do the same with incremental snapshots, so that once you have a baseline, everything after that is just collecting chunks of increments and merging them with the original global snapshot; a sketch of that follows below.

I would have to do the same in each individual DC.
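
As a first cut at that incremental collector, a minimal per-node sketch, assuming incremental_backups is enabled in cassandra.yaml (flushed sstables then get hard-linked into <data_dir>/<keyspace>/backups/); the keyspace name, data path and NAS mount below are illustrative, not from this thread:

#!/bin/bash
# Minimal incremental collector sketch (illustrative paths and names).
# Requires incremental_backups: true in cassandra.yaml.
DATA_DIR=/raid0/cassandra/data
KEYSPACE=mykeyspace                       # hypothetical keyspace
DEST=/mnt/nas/cassandra-incrementals/$(hostname)

mkdir -p "$DEST"
# Ship any new incremental sstables to the NAS...
rsync -av "$DATA_DIR/$KEYSPACE/backups/" "$DEST/"
# ...then clear them locally so the next run ships only the delta.
find "$DATA_DIR/$KEYSPACE/backups/" -type f -delete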

Do you guys agree?

Regards,
Shubham



Re: Taking a Cluster Wide Snapshot

Posted by Tamar Fraenkel <ta...@tok-media.com>.
Thanks for posting the script.
I see that the snapshot is always a full one, and if I understand
correctly, it replaces the old snapshot on S3. Am I right?

Tamar Fraenkel
Senior Software Engineer, TOK Media


tamar@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956






Re: Taking a Cluster Wide Snapshot

Posted by Rob Coli <rc...@palominodb.com>.
On Thu, Apr 26, 2012 at 10:38 PM, Shubham Srivastava
<Sh...@makemytrip.com> wrote:
> On another thought, I could also try copying the data of my keyspace alone,
> node to corresponding node in the new cluster (both the old and new clusters
> have the same layout, DC1:6 and DC2:6, with the same tokens).
>
> Would there be any risk of the new cluster joining the old cluster, for
> example if the data inside the keyspace is aware of the original IPs?

As a result of this very concern while @ Digg...

https://issues.apache.org/jira/browse/CASSANDRA-769

tl;dr : as long as your cluster names are unique in your cluster
config (**and you do not copy the System keyspace, letting the new
cluster initialize with the new cluster name**), nodes are at no risk
of joining the wrong cluster.
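
A quick sanity check along those lines on a node of the new ring might look like the following (the config and data paths are illustrative for a stock install, not something Rob prescribed):

#!/bin/bash
# Illustrative pre-copy check on a new-cluster node.
# 1. The new ring must carry its own cluster_name.
grep '^cluster_name' /etc/cassandra/cassandra.yaml
# 2. The copied data must not include the system keyspace, so the node
#    initializes fresh under the new cluster name.
ls /raid0/cassandra/data/system/*.db 2>/dev/null \
  && echo "WARNING: system keyspace sstables present - do not copy these"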

=Rob

-- 
=Robert Coli
AIM&GTALK - rcoli@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb

RE: Taking a Cluster Wide Snapshot

Posted by Shubham Srivastava <Sh...@makemytrip.com>.
Thanks a lot, Rob.

On another thought, I could also try copying the data of my keyspace alone, node to corresponding node in the new cluster (both the old and new clusters have the same layout, DC1:6 and DC2:6, with the same tokens).

Would there be any risk of the new cluster joining the old cluster, for example if the data inside the keyspace is aware of the original IPs?

Is this recommended?

Regards,
Shubham

Re: Taking a Cluster Wide Snapshot

Posted by Rob Coli <rc...@palominodb.com>.
> I copied all the snapshots from each individual node, where the snapshot
> data size was around 12 GB per node, into a common folder (one folder alone).
>
> Strangely, I found duplicate file names across the snapshots, and more
> strangely the sizes of the duplicate files differed, which brought the
> total data size to close to 13 GB (the rest must have been overwritten),
> whereas the expectation was 12 * 6 = 72 GB.

You have detected via experimentation that the namespacing of sstable
filenames per CF per node is not unique. In order to do the operation
you are doing, you have to rename them to be globally unique. Just
inflating the integer part is the easiest way.

https://issues.apache.org/jira/browse/CASSANDRA-1983

=Rob

-- 
=Robert Coli
AIM&GTALK - rcoli@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb

RE: Taking a Cluster Wide Snapshot

Posted by Shubham Srivastava <Sh...@makemytrip.com>.
I was trying to get hold of all the data, as a kind of global snapshot.

I did the following:

I copied all the snapshots from each individual node, where the snapshot data size was around 12 GB per node, into a common folder (one folder alone).

Strangely, I found duplicate file names across the snapshots, and more strangely the sizes of the duplicate files differed, which brought the total data size to close to 13 GB (the rest must have been overwritten), whereas the expectation was 12 * 6 = 72 GB.
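
A quick way to spot such collisions before merging, assuming the per-node copies were first placed in hypothetical node-<name>/ directories rather than one flat folder, might be:

#!/bin/bash
# Illustrative pre-merge check: print any file name that occurs on more
# than one node (GNU find; the node-*/ layout is hypothetical).
find node-*/ -type f -printf '%f\n' | sort | uniq -d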

Does that mean that if I need to create a new ring with the same data as the existing one, I can't just do that? Or should I start with the 13 GB copy and check whether all the data is present, which sounds pretty illogical.

Please suggest.


Re: Taking a Cluster Wide Snapshot

Posted by Shubham Srivastava <Sh...@makemytrip.com>.
Your second part was what I was also referring to: putting all the files from the nodes onto a single node to create a similar backup, which requires unique file names across the cluster.





Re: Taking a Cluster Wide Snapshot

Posted by Deno Vichas <de...@syncopated.net>.
there's no prerequisite for unique names.  each node's snapshot gets
tar'ed up and then copied over to a directory named after the hostname
of the node.  then those dirs are tar'ed and copied to S3.

what i haven't tried yet is to untar everything for all nodes into a
single node cluster.  i'm assuming i can get tar to replace or skip
existing files so i end up with a set of unique files.  can somebody
confirm this?
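
For what it's worth, a hedged sketch of the tar behaviour in question: by default tar overwrites existing files on extract, while GNU tar's -k (--keep-old-files) keeps the first copy instead. Per Rob's replies elsewhere in this thread, neither is actually safe here, since same-named sstables can hold different data and should be renamed first.

#!/bin/bash
# Sketch only: unpack every node's snapshot tarball into one tree.
# Plain -x would silently overwrite colliding names; -k keeps the first
# copy. Renaming colliding sstables beforehand is the safe route.
for archive in *-cassandra-snapshot.tar.gz; do
  tar -zxkvf "$archive"
done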






RE: Taking a Cluster Wide Snapshot

Posted by Shubham Srivastava <Sh...@makemytrip.com>.
Thanks a lot, Deno.  I am a bit surprised, though; an equivalent command ought to be there in nodetool. Not sure if it is in the latest release.

BTW, this makes it a prerequisite that all of Cassandra's data files, be it indexes or filters etc., have unique names across the cluster. Is this a reasonable assumption to make?

Regards,
Shubham


Re: Taking a Cluster Wide Snapshot

Posted by Deno Vichas <de...@syncopated.net>.
On 4/25/2012 11:34 PM, Shubham Srivastava wrote:
> What's the best way (or the only way) to take a cluster-wide backup of
> Cassandra? I can't find much documentation on this.
>
> I am using a multi-DC setup with Cassandra 0.8.6.
>
>
> Regards,
> Shubham
  here's how i'm doing it in AWS land using the DataStax AMI via a nightly
cron job.  you'll need pssh and s3cmd -


#!/bin/bash
cd /home/ec2-user/ops

echo "making snapshots"
# clear yesterday's snapshot on every node, then take a fresh one
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 clearsnapshot stocktouch'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 snapshot stocktouch'

echo "making tar balls"
# tar up each node's snapshot directory under a hostname-based name
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'rm `hostname`-cassandra-snapshot.tar.gz'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'tar -zcvf `hostname`-cassandra-snapshot.tar.gz /raid0/cassandra/data/stocktouch/snapshots'

echo "copying tar balls"
# pslurp pulls each node's tarball into a local directory named after the host
pslurp -h prod-cassandra-nodes.txt -l ubuntu /home/ubuntu/*cassandra-snapshot.tar.gz .

echo "tar'ing tar balls"
# the pslurp output dirs are named after the hosts (presumably 10.* addresses)
tar -cvf cassandra-snapshots-all-nodes.tar 10*

echo "pushing to S3"
../s3cmd-1.1.0-beta3/s3cmd put cassandra-snapshots-all-nodes.tar s3://stocktouch-backups

echo "DONE!"