Posted to user@cassandra.apache.org by Jake Maizel <ja...@soundcloud.com> on 2010/12/02 11:08:39 UTC

Best Practice for Data Center Migration

Hello,

We have a ring of 12 nodes with 6 in one data center and 6 in another.
We want to shut down all 6 nodes in data center 1 in order to close
it down.  We are using a replication factor of 3 and are using
RackAwareStrategy with version 0.6.6.

Are there any best practices for doing this type of operation?

We have been thinking that using decommission on each of the nodes in
the old data center, one at a time, would do the trick.  Does this sound
reasonable?

We have also been considering increasing the replication factor to 4
and then just shutting down all the old nodes.  Would that work as far
as data availability would go?

Any other suggestions?

Thanks.

-- 
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: jake@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE

Re: Best Practice for Data Center Migration

Posted by Daniel Doubleday <da...@gmx.net>.

Hm - 

Assuming that you have configured your initial tokens so that every next start token lives in the other datacenter, wouldn't it suffice to decrease RF to 2, switch to the simple replication strategy, switch off the old DC, and start repairs/cleanup?

Every row should live on either the primary node or the node after the primary (when the primary was located in the switched-off DC).
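
The claim above can be checked on a toy ring. What follows is only a sketch under stated assumptions, not the 0.6.6 implementation: tokens are the integers 0, 10, ..., 110, alternating between dc1 and dc2; each DC is treated as a single rack; `rack_aware` is a simplified stand-in for RackAwareStrategy (primary, then the first node clockwise in the other DC, then the next nodes clockwise); and `simple` stands in for plain ring-order placement. All node and function names are hypothetical.

```python
from bisect import bisect_left

def build_ring(n_per_dc=6):
    # Assumed layout: tokens alternate dc1, dc2, dc1, dc2, ...
    return [(i * 10, f"dc{i % 2 + 1}-n{i // 2}", f"dc{i % 2 + 1}")
            for i in range(2 * n_per_dc)]

def primary_index(ring, key_token):
    # The node with the first token >= key owns it; wrap past the last token.
    tokens = [t for t, _, _ in ring]
    return bisect_left(tokens, key_token) % len(ring)

def rack_aware(ring, key_token, rf):
    # Simplified model (assumption): primary, first node in the other DC,
    # then fill clockwise with nodes not yet chosen.
    n, start = len(ring), primary_index(ring, key_token)
    replicas = [ring[start]]
    for step in range(1, n):
        cand = ring[(start + step) % n]
        if cand[2] != ring[start][2]:
            replicas.append(cand)
            break
    for step in range(1, n):
        cand = ring[(start + step) % n]
        if len(replicas) < rf and cand not in replicas:
            replicas.append(cand)
    return replicas[:rf]

def simple(ring, key_token, rf):
    # Ring-order placement: primary plus the next rf - 1 nodes clockwise.
    n, start = len(ring), primary_index(ring, key_token)
    return [ring[(start + s) % n] for s in range(min(rf, n))]

old_ring = build_ring()
new_ring = [node for node in old_ring if node[2] == "dc2"]  # dc1 switched off

def covered(key_token):
    # dc2 nodes that held the key before, versus its new RF=2 replicas.
    before = {name for _, name, dc in rack_aware(old_ring, key_token, 3)
              if dc == "dc2"}
    after = {name for _, name, _ in simple(new_ring, key_token, 2)}
    return bool(before & after)

all_covered = all(covered(k) for k in range(120))
print("every key already on one of its new replicas:", all_covered)
```

In this model every key is already present on at least one of its two new replicas, which is why a repair afterwards would be enough to restore the second copy. Whether the real strategy places replicas the same way once racks are taken into account is not something this sketch establishes.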

Daniel Doubleday
smeet.com, Berlin

On Dec 2, 2010, at 6:11 PM, Jonathan Ellis wrote:

> On Thu, Dec 2, 2010 at 4:08 AM, Jake Maizel <ja...@soundcloud.com> wrote:
>> Hello,
>> 
>> We have a ring of 12 nodes with 6 in one data center and 6 in another.
>> We want to shut down all 6 nodes in data center 1 in order to close
>> it down.  We are using a replication factor of 3 and are using
>> RackAwareStrategy with version 0.6.6.
>> 
>> We have been thinking that using decommission on each of the nodes in
>> the old data center, one at a time, would do the trick.  Does this sound
>> reasonable?
> 
> That is the simplest approach.  The major downside is that
> RackAwareStrategy guarantees you will have at least one copy of _each_
> row in both DCs, so when you are down to 1 node in dc1 it will have a
> copy of all the data.  If you have a small enough data volume to make
> this feasible then that is the option I would go with.
> 
>> We have also been considering increasing the replication factor to 4
>> and then just shutting down all the old nodes.  Would that work as far
>> as data availability would go?
> 
> Not sure what you are thinking of there, but probably not. :)
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

Re: Best Practice for Data Center Migration

Posted by Jake Maizel <ja...@soundcloud.com>.

Thanks for the followup.

I have a few follow-up questions:

In the case of using decommission, any idea of what happens when we
get to the last node in the old data center?  Do you think it will
decommission properly?

I agree that this sounds like the easiest method.  We have to see if
we can support the storage requirement as we go down the cluster and
decommission.

In the case of changing the RF and dropping the entire old data center,
here's what I was thinking:

We change the RF to 4, which I take as meaning that there will be two
copies of the data in each data center.  So, if we just turn off all
the nodes in the old data center, we still have two copies of all data
in the new data center, and we can then rebuild and clean up with
nodetool to get to a normal state.  We would then turn the RF back
down to 3 and rebuild in order to get back to our original config.
The reason I thought this would work is that, since RackAware
alternates replica placement and we have inserted the new data center
nodes evenly in between the old key ranges, a pair of nodes in the new
DC would each get a replica of the data.  That would give us some
redundancy until we can rebuild.
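
To make the replica-counting argument concrete, here is a minimal sketch that counts intended replicas per data center at RF 3 and RF 4. It rests on assumptions that are not confirmed anywhere in this thread: integer tokens 0, 10, ..., 110 alternating between dc1 and dc2, one rack per DC, and a simplified placement rule (primary, then the first node clockwise in the other DC, then the next nodes clockwise) standing in for RackAwareStrategy. The names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

def build_ring(n_per_dc=6):
    # Assumed layout: new-DC nodes inserted evenly between old-DC nodes.
    return [(i * 10, f"dc{i % 2 + 1}-n{i // 2}", f"dc{i % 2 + 1}")
            for i in range(2 * n_per_dc)]

def rack_aware(ring, key_token, rf):
    # Simplified stand-in for RackAwareStrategy (assumption, racks ignored).
    n = len(ring)
    start = bisect_left([t for t, _, _ in ring], key_token) % n
    replicas = [ring[start]]
    for step in range(1, n):
        cand = ring[(start + step) % n]
        if cand[2] != ring[start][2]:
            replicas.append(cand)  # first node in the other DC
            break
    for step in range(1, n):
        cand = ring[(start + step) % n]
        if len(replicas) < rf and cand not in replicas:
            replicas.append(cand)  # fill clockwise
    return replicas[:rf]

def dc_counts(ring, key_token, rf):
    # How many of the key's intended replicas fall in each DC.
    return Counter(dc for _, _, dc in rack_aware(ring, key_token, rf))

ring = build_ring()
keys = range(120)
min_dc2_rf3 = min(dc_counts(ring, k, 3)["dc2"] for k in keys)
min_dc2_rf4 = min(dc_counts(ring, k, 4)["dc2"] for k in keys)
print("fewest dc2 replicas per key at RF=3:", min_dc2_rf3)
print("fewest dc2 replicas per key at RF=4:", min_dc2_rf4)
```

The model only says where replicas are supposed to live. It says nothing about whether the data has actually arrived on those nodes after the RF change, nor about how the real strategy behaves when racks differ within a DC.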

I am probably making a bad assumption about the RackAwareStrategy that
blocks this.  If so, it'd be nice if you could explain it to me.

If you have another idea that might be worth discussing I'd appreciate it.

Thanks,

Jake


Re: Best Practice for Data Center Migration

Posted by Jonathan Ellis <jb...@gmail.com>.

On Thu, Dec 2, 2010 at 4:08 AM, Jake Maizel <ja...@soundcloud.com> wrote:
> Hello,
>
> We have a ring of 12 nodes with 6 in one data center and 6 in another.
> We want to shut down all 6 nodes in data center 1 in order to close
> it down.  We are using a replication factor of 3 and are using
> RackAwareStrategy with version 0.6.6.
>
> We have been thinking that using decommission on each of the nodes in
> the old data center, one at a time, would do the trick.  Does this sound
> reasonable?

That is the simplest approach.  The major downside is that
RackAwareStrategy guarantees you will have at least one copy of _each_
row in both DCs, so when you are down to 1 node in dc1 it will have a
copy of all the data.  If you have a small enough data volume to make
this feasible then that is the option I would go with.
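
As a rough illustration of that downside, the sketch below measures how much of the key space one dc1 node is responsible for, before any decommission and once it is the only dc1 node left. It is a toy model under assumptions not taken from the actual code: integer tokens 0, 10, ..., 110 alternating between DCs, one rack per DC, and a simplified placement rule (primary, first node clockwise in the other DC, then fill clockwise). The names are hypothetical.

```python
from bisect import bisect_left

def build_ring(n_per_dc=6):
    # Assumed layout: tokens alternate dc1, dc2, dc1, dc2, ...
    return [(i * 10, f"dc{i % 2 + 1}-n{i // 2}", f"dc{i % 2 + 1}")
            for i in range(2 * n_per_dc)]

def rack_aware(ring, key_token, rf):
    # Simplified stand-in for RackAwareStrategy (assumption, racks ignored).
    n = len(ring)
    start = bisect_left([t for t, _, _ in ring], key_token) % n
    replicas = [ring[start]]
    for step in range(1, n):
        cand = ring[(start + step) % n]
        if cand[2] != ring[start][2]:
            replicas.append(cand)  # guarantees one copy in the other DC
            break
    for step in range(1, n):
        cand = ring[(start + step) % n]
        if len(replicas) < rf and cand not in replicas:
            replicas.append(cand)
    return replicas[:rf]

def share_held(ring, node, rf=3, keys=range(120)):
    # Fraction of sample keys for which `node` is one of the replicas.
    return sum(node in rack_aware(ring, k, rf) for k in keys) / len(keys)

full_ring = build_ring()
last_dc1 = [node for node in full_ring if node[2] == "dc1"][-1]

# Ring after the other five dc1 nodes have been decommissioned.
shrunk_ring = [node for node in full_ring
               if node[2] == "dc2" or node == last_dc1]

share_before = share_held(full_ring, last_dc1)
share_after = share_held(shrunk_ring, last_dc1)
print("share of keys on the node, full ring:", share_before)
print("share of keys on the node, last dc1 node:", share_after)
```

In this model the node goes from holding a quarter of the keys to holding all of them, so its disk has to be sized for the full data set before the second-to-last decommission completes.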

> We have also been considering increasing the replication factor to 4
> and then just shutting down all the old nodes.  Would that work as far
> as data availability would go?

Not sure what you are thinking of there, but probably not. :)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com