Posted to user@cassandra.apache.org by "Senthil, Athinanthny X. -ND" <At...@disney.com> on 2014/01/30 06:45:07 UTC

Restoring keyspace using snapshots

Plan is to back up and restore a keyspace from the PROD cluster to a PRE-PROD cluster that has the same number of nodes. The keyspace will have a few hundred million rows, and we need to do this every other week. Which of the options below is most time-efficient and puts the least stress on the target cluster? We want to finish the backup and restore in a low-usage time window.
Nodetool refresh

1.      Take a snapshot on each prod node

2.      Copy the sstable data and index files to the pre-prod cluster (copy the snapshots to the respective nodes based on token assignment)

3.      Clean up the old data

4.      Run nodetool refresh on every node (see the sketch below)
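
For illustration, a rough sketch of those four steps for one node pair. The host names, keyspace/CF names, and snapshot tag are made up, and the paths assume the default pre-2.2 data layout (/var/lib/cassandra/data/<keyspace>/<cf>), so adjust for your install:

    # 1. On each prod node, snapshot the keyspace (hardlinks, cheap)
    nodetool -h prod-node-1 snapshot -t weekly_copy my_keyspace

    # 2. Ship the snapshot to the pre-prod node that owns the same tokens
    rsync -av /var/lib/cassandra/data/my_keyspace/my_cf/snapshots/weekly_copy/ \
          preprod-node-1:/var/lib/cassandra/data/my_keyspace/my_cf/

    # 3. Clear the old sstables on the target (e.g. TRUNCATE the CF, or
    #    delete the old *-Data.db/*-Index.db files) before copying in new ones

    # 4. Ask the node to open the newly placed sstables
    nodetool -h preprod-node-1 refresh my_keyspace my_cf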

Sstableloader

1.      Take a snapshot on each prod node

2.      Copy the sstable data and index files from all nodes to one node in the pre-prod cluster

3.      Clean up the old data

4.      Run sstableloader to load the data into the respective keyspace/CF (see the sketch below). Does sstableloader work in a cluster without vnodes where authentication is enabled?
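
And a hedged sketch of the sstableloader route (names below are made up). sstableloader expects the files under a <keyspace>/<cf> directory and streams to the cluster named with -d; the -u/-pw flags for password authentication exist in newer releases, so check that your version has them:

    # Gather the snapshotted sstables from every prod node into one
    # <keyspace>/<cf> directory; watch for generation-number collisions
    # between files from different nodes (rename if needed)
    mkdir -p /tmp/load/my_keyspace/my_cf
    # ... copy *-Data.db, *-Index.db, etc. from each node's snapshot here ...

    # Stream into pre-prod; -u/-pw pass credentials when the cluster uses
    # PasswordAuthenticator (flag availability depends on version)
    sstableloader -d preprod-node-1 -u cassandra -pw cassandra \
          /tmp/load/my_keyspace/my_cf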

CQL3 COPY

I tried this for CFs with <1 million rows and it works fine, but for large CFs it throws an rpc_timeout error.
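
For reference, the COPY round trip looks roughly like this (table and file names are made up). cqlsh reads statements from stdin, so it can be scripted; for large CFs you may also need to raise rpc_timeout_in_ms in cassandra.yaml, since COPY pages through the whole table and hits the default timeout:

    # Export from prod, then import into pre-prod (only practical for
    # small CFs, as noted above)
    echo "COPY my_keyspace.my_cf TO '/tmp/my_cf.csv';" | cqlsh prod-node-1
    echo "COPY my_keyspace.my_cf FROM '/tmp/my_cf.csv';" | cqlsh preprod-node-1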
Any other suggestions?

Re: Restoring keyspace using snapshots

Posted by John Anderstedt <jo...@svenskaspel.se>.
In this case I would go with nodetool refresh, simply because it uses the machines more effectively: you copy data from one node to another, and each node cleans/refreshes its own data. If the cluster setup is the same (nodes/tokens), there is no need to copy all the data to one point and then stream it into the other cluster (copy many-to-one and then stream one-to-many).

mvh/regards
john




Re: Restoring keyspace using snapshots

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Jan 29, 2014 at 9:45 PM, Senthil, Athinanthny X. -ND <Athinanthny.X.Senthil.-ND@disney.com> wrote:

> Plan is to back up and restore a keyspace from the PROD cluster to a
> PRE-PROD cluster that has the same number of nodes. The keyspace will have
> a few hundred million rows, and we need to do this every other week. Which
> of the options below is most time-efficient and puts the least stress on
> the target cluster? We want to finish the backup and restore in a
> low-usage time window.
>

http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

That post has some details on when each approach may be better or worse. In
your case, you should probably just use the "copy-the-sstables" method. If
the target cluster has the same number of nodes, assign it the same tokens
and then copy SSTables from SOURCE_NODE_A to TARGET_NODE_A, and so on. If
you do that, you don't even have to run cleanup, because no node has
changed its range ownership.

Don't use refresh if you don't need to: just (coalesce the target cluster,
load the schema, and then) copy the SSTables into the data directory with
the node down, and then start it (see the sketch below).
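
A minimal sketch of that flow for one node pair, assuming the schema is
already loaded on the target and the default data layout; the service name,
paths, keyspace/CF, and snapshot tag are placeholders:

    # On TARGET_NODE_A (assigned the same tokens as SOURCE_NODE_A),
    # with the node stopped so no refresh is needed:
    sudo service cassandra stop

    # Place the source node's snapshot straight into the data directory
    rsync -av source-node-a:/var/lib/cassandra/data/my_keyspace/my_cf/snapshots/snap1/ \
          /var/lib/cassandra/data/my_keyspace/my_cf/

    # On startup the node opens whatever sstables it finds on disk
    sudo service cassandra start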

Refresh's current design is unsafe:

https://issues.apache.org/jira/browse/CASSANDRA-6245
https://issues.apache.org/jira/browse/CASSANDRA-6514

=Rob