Posted to user@cassandra.apache.org by "Saleil Bhat (BLOOMBERG/ 731 LEX)" <sb...@bloomberg.net> on 2019/04/02 22:27:47 UTC

Procedures for moving part of a C* cluster to a different datacenter

Hello all, 

I have a question about moving part of a multi-datacenter cluster to a new physical datacenter. 
For example, suppose I have a two-datacenter cluster with one DC in San Jose, California and one DC in Orlando, Florida, and I want to move all the nodes in Orlando to a new datacenter in Tampa.  


The standard procedure for doing this seems to be to add a third datacenter to the cluster, stream data to it via nodetool rebuild, and then decommission the old datacenter. A more detailed review of this procedure can be found here: 
http://thelastpickle.com/blog/2019/02/26/data-center-switch.html
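
Roughly, as I understand it, the core commands are the following (DC names as in my example, and each keyspace's replication has to be altered to include the new DC first):

    # On every node in the new Tampa DC, stream the existing data over:
    nodetool rebuild -- Orlando

    # Later, once applications have been switched and Orlando has been removed
    # from the keyspace replication, retire each Orlando node in turn:
    nodetool decommission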



However, I see two problems with the above protocol.  First, it requires changes at the application layer because of the datacenter name change; e.g. all applications referring to the datacenter ‘Orlando’ will now have to be changed to refer to ‘Tampa’.  Second, it requires that a full repair be run on every node in the old datacenter before decommissioning it, to ensure that all writes which went to it have been replicated to the new datacenter. For a large dataset, this repair can be prohibitively expensive. 



As such, I was wondering what people’s thoughts were on the following alternative procedure: 

1) Kill one node in the old datacenter

2) Add a new node in the new datacenter, but indicate that it is to REPLACE the one just shut down; this node will bootstrap, and all the data it is supposed to be responsible for will be streamed to it (a rough sketch of this step follows the list)

3) Repeat steps one and two until all nodes have been replaced
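
For step 2, I am thinking of the replace-address mechanism, along these lines (assuming a package install where JVM options go in cassandra-env.sh, and with 10.0.0.5 as a made-up address for the node killed in step 1):

    # On the replacement node in Tampa, before its very first start, point it at
    # the dead node it replaces (the address below is only an example):
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.5"' | sudo tee -a /etc/cassandra/cassandra-env.sh

    # Keep the node in the SAME logical datacenter (and ideally rack) as the node
    # it replaces, then start it and watch the streaming:
    sudo service cassandra start
    nodetool netstats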



In particular, I’m curious whether anybody has any insight into what problems can arise if a “logical” datacenter in Cassandra actually spans two different physical datacenters, and whether these problems might be mitigated if the two physical datacenters in question are geographically close together (e.g. Tampa and Orlando). 

Thanks, 
-Saleil 

Re: Procedures for moving part of a C* cluster to a different datacenter

Posted by Paul Chandler <pa...@redshots.com>.
Saleil,

Are you performing any regular repairs on the existing cluster?

If you are, you could set the same repair schedule up for the Tampa datacenter. Then, after all the applications have been switched to Tampa, wait for a complete repair cycle; at that point it will be safe to decommission Orlando. However, there could be missing data in Tampa until the repairs have completed. 
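
For illustration, one complete repair cycle on the Tampa side could be driven like this, one node at a time (the keyspace name is a placeholder):

    # Run on every Tampa node after the applications have switched over;
    # -full forces a non-incremental repair, and -pr limits each run to the
    # node's primary ranges, so doing this on all nodes covers the ring once.
    nodetool repair -full -pr my_ks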

If you are not performing any regular repairs, then you could already have data inconsistencies between the two existing datacenters, so it won’t be any worse.

Having said that, we moved more than 50 clusters from the UK to Belgium using a similar process, but we didn’t do any additional repairs apart from the ones performed by OpsCenter, and we didn’t have any reports of missing data.

One thing I definitely would not do is have a “logical” datacenter in Cassandra that actually spans two different physical datacenters. If there is any connection issue between the physical datacenters, including long latencies, then LOCAL_QUORUM requests may not be serviced, because two of the replicas may be in the inaccessible datacenter. For example, with RF=3 in that logical DC, LOCAL_QUORUM needs two replicas to respond; if two of the three happen to be on the unreachable side, the query fails even though the side local to the client is healthy.

Finally, we quite often had problems at the rebuild stage and needed different settings depending on the type of cluster, so be prepared to fail at that point and experiment with different settings. The good thing about this process is that you can roll back at any stage without affecting the original cluster.
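
As one example of the kind of knob involved (the values are illustrative only, not recommendations), the streaming throughput caps can be changed at runtime on the source nodes:

    # Inspect the current caps (MB/s), including the inter-datacenter cap:
    nodetool getstreamthroughput
    nodetool getinterdcstreamthroughput

    # Raise them for the duration of the rebuild, then set them back afterwards:
    nodetool setstreamthroughput 200
    nodetool setinterdcstreamthroughput 200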

Paul Chandler



Re: Procedures for moving part of a C* cluster to a different datacenter

Posted by Stefan Miklosovic <st...@instaclustr.com>.
On Wed, 3 Apr 2019 at 18:38, Oleksandr Shulgin
<ol...@zalando.de> wrote:
>
> On Wed, Apr 3, 2019 at 12:28 AM Saleil Bhat (BLOOMBERG/ 731 LEX) <sb...@bloomberg.net> wrote:
>>
>>
>> The standard procedure for doing this seems to be add a 3rd datacenter to the cluster, stream data to the new datacenter via nodetool rebuild, then decommission the old datacenter. A more detailed review of this procedure can be found here:
>> http://thelastpickle.com/blog/2019/02/26/data-center-switch.html
>>
>>

>> However, I see two problems with the above protocol. First, it requires changes on the application layer because of the datacenter name change; e.g. all applications referring to the datacenter ‘Orlando’ will now have to be changed to refer to ‘Tampa’.
>
>
> Alternatively, you may omit DC specification in the client and provide internal network addresses as the contact points.

I am afraid you are mixing two things together. I believe the OP means
that he has to change the local DC in DCAwareRoundRobinPolicy. I am not
sure what contact points have to do with that; as long as at least one
contact point is in a DC that nobody removes, all should be fine.

The process in the article is right. Before transitioning to the new DC,
one has to be sure that all writes and reads still target the old DC even
after you alter each keyspace and add the new DC to it, so you don't miss
any writes if something goes south and you have to switch back. That's
achieved by LOCAL_ONE / LOCAL_QUORUM and DCAwareRoundRobinPolicy with
localDc pointing to the old DC.
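
Concretely, the keyspace change looks something like this (the keyspace
name and replication factors are placeholders), while the clients keep
localDc pointed at Orlando:

    # Add the new DC to the keyspace replication before rebuilding it; clients
    # still read and write at LOCAL_ONE / LOCAL_QUORUM against Orlando.
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
              {'class': 'NetworkTopologyStrategy', 'Orlando': 3, 'Tampa': 3};"

    # Repeat for every application keyspace, and consider system_auth and
    # system_distributed as well, so auth data and repair history follow.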

Then you do the rebuild and restart your app in such a way that the new
DC is the local one in that policy, so new writes and reads go primarily
to the new DC, and once all is fine you drop the old one (you can maybe
do an additional repair to be sure). I think a rolling restart of the app
is inevitable, but if the services are in some kind of HA setup I don't
see a problem with that; from the outside it would look like there is no
downtime.
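
The final "drop the old one" step is then just the reverse of the earlier
alter, before decommissioning the Orlando nodes (again with placeholder
keyspace and RF):

    # Remove the old DC from replication once nothing reads or writes there:
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
              {'class': 'NetworkTopologyStrategy', 'Tampa': 3};"

    # Then run 'nodetool decommission' on each Orlando node, one at a time.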

The OP has a problem with repair on the nodes, and it is true that it
can be time-consuming, even not doable, but there are workarounds for
that which I do not want to go into here. You can speed this process up
significantly when you are smart about it and repair in smaller chunks,
so you don't clog your cluster completely; it's called subrange repair.
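
Roughly, subrange repair means something like the following (the token
values and keyspace are only placeholders):

    # Repair one narrow slice of the token ring at a time instead of a node's
    # whole dataset; iterate over consecutive slices until the ring is covered.
    nodetool repair -full -st 0 -et 100000000000000000 my_ks

    # Tools such as Cassandra Reaper automate exactly this kind of scheduling.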

>> As such, I was wondering what peoples’ thoughts were on the following alternative procedure:
>> 1) Kill one node in the old datacenter
>> 2) Add a new node in the new datacenter but indicate that it is to REPLACE the one just shutdown; this node will bootstrap, and all the data which it is supposed to be responsible for will be streamed to it
>
>
> I don't think this is going to work.  First, I believe streaming for bootstrap or for replacing a node is DC-local, so the first node won't have any peers to stream from.  Even if it would stream from the remote DC, this single node will own 100% of the ring and will most likely die of the load well before it finishes streaming.
>
> Regards,
> --
> Alex
>



Re: Procedures for moving part of a C* cluster to a different datacenter

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Wed, Apr 3, 2019 at 12:28 AM Saleil Bhat (BLOOMBERG/ 731 LEX) <sbhat39@bloomberg.net> wrote:

>
> The standard procedure for doing this seems to be add a 3rd datacenter to
> the cluster, stream data to the new datacenter via nodetool rebuild, then
> decommission the old datacenter. A more detailed review of this procedure
> can be found here:
> http://thelastpickle.com/blog/2019/02/26/data-center-switch.html
>
> However, I see two problems with the above protocol. First, it requires
> changes on the application layer because of the datacenter name change;
> e.g. all applications referring to the datacenter ‘Orlando’ will now have
> to be changed to refer to ‘Tampa’.
>

Alternatively, you may omit DC specification in the client and provide
internal network addresses as the contact points.

As such, I was wondering what peoples’ thoughts were on the following
> alternative procedure:
> 1) Kill one node in the old datacenter
> 2) Add a new node in the new datacenter but indicate that it is to REPLACE
> the one just shutdown; this node will bootstrap, and all the data which it
> is supposed to be responsible for will be streamed to it
>

I don't think this is going to work.  First, I believe streaming for
bootstrap or for replacing a node is DC-local, so the first node won't have
any peers to stream from.  Even if it did stream from the remote DC, this
single node will own 100% of the ring and will most likely die of the load
well before it finishes streaming.

Regards,
-- 
Alex