Posted to user@cassandra.apache.org by Germain MAURICE <ge...@oscaro.com> on 2022/04/11 09:52:23 UTC

Cassandra migration process

Hello,
In my company we are working on migrating our Cassandra cluster from one provider to another. We plan to migrate the data by adding a new node and decommissioning an old one.
We would like to throttle the bandwidth used between the two providers to preserve the capacity of the link.
We would like to confirm that the migration and throughput-throttling process we have planned is the right one.
The plan is the following:

  *   installing a new node on GCP
  *   setting the stream throughput on each on-premise node (3 nodes), which ensures we don’t use more than 3 * stream throughput of bandwidth on the link between the two providers (see the example commands after this list)
  *   launching `nodetool decommission` on an on-premise node
  *   waiting for the end of the decommission
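
For reference, the commands we have in mind look roughly like this (the value of 16 Mb/s is only a placeholder):

    nodetool setstreamthroughput 16   # cap outbound streaming on this node, in megabits per second
    nodetool getstreamthroughput      # check the current limit
    nodetool decommission             # run on the on-premise node being retired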

Is that right?
Thank you for your answer.


Re: Cassandra migration process

Posted by Bowen Song <bo...@bso.ng>.
Paul is right. It's generally better to set up a new DC and then 
decommission the existing DC.

However, if the network latency is not a concern, and the cost of 
running two DCs in parallel is prohibitively high, you could do 
node-by-node replacement, assuming the settings in cassandra.yaml are 
compatible. Pay attention to endpoint_snitch.
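
For instance, with GossipingPropertyFileSnitch (one common choice; your 
snitch and the DC/rack names may differ), the new GCP node has to report 
the same DC name as the existing on-premise nodes:

    # cassandra.yaml, on every node
    endpoint_snitch: GossipingPropertyFileSnitch

    # cassandra-rackdc.properties on the new GCP node (names are placeholders)
    dc=dc1
    rack=rack1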

If you are going to do this, it's better to use 
"-Dcassandra.replace_address_first_boot=..." instead of repeatedly 
decommissioning and adding nodes. Note: this requires the same num_tokens 
on the old node and the replacement node.
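
As a minimal sketch (the IP address is only a placeholder for the old 
node's address), on the replacement node before its very first start:

    # cassandra-env.sh on the replacement node, only for the first boot
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.11"

and cassandra.yaml on the replacement node must carry the same num_tokens 
value as the node it replaces.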

Decommissioning and adding nodes will shuffle the token ring, which 
could lead to unnecessary streaming activities and excessive disk space 
usage. Although the disk space can be reclaimed by "nodetool cleanup", 
you may find yourself needing to run it frequently while moving nodes in 
order to avoid running out of disk space on other nodes.
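
Cleanup runs per node and can be limited to a keyspace, for example 
(the keyspace name is only illustrative):

    nodetool cleanup                # drop data this node no longer owns
    nodetool cleanup my_keyspace    # or limit it to a single keyspace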

See 
https://cassandra.apache.org/doc/3.11/cassandra/operating/topo_changes.html#replacing-a-dead-node 
for the process of replacing a dead node. Tip: you can shut down a live 
node to turn it into a dead node.

See also 
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsReplaceNode.html 
for a detailed description of the process, but keep in mind that this is 
written for an older version and uses "-Dcassandra.replace_address" 
instead of the new "-Dcassandra.replace_address_first_boot" option. Both 
will work, but the new option is generally preferred, because it 
minimizes the risk of messing up the cluster if you forget to remove it 
after the node has fully joined the cluster.


Re: Cassandra migration process

Posted by Paul Chandler <pa...@redshots.com>.
I would recommend creating a second Cassandra datacenter for the cluster rather than adding single nodes to the existing DC; mixing nodes in one DC is likely to cause latency issues, because quorum queries would then span the two providers.

We did this several times when moving from Rackspace to GCP; it is all documented in three blog posts starting here: https://www.redshots.com/moving-cassandra-clusters-without-downtime-part-1/
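
As a rough outline of that approach (keyspace and DC names below are only illustrative), once the GCP nodes are running in their own datacenter:

    # add the new DC to each application keyspace's replication settings
    cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'onprem': 3, 'gcp': 3};"

    # on each GCP node, stream the existing data from the old DC
    nodetool rebuild -- onprem

    # after clients have been switched to the new DC, drop the old DC and decommission its nodes
    cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'gcp': 3};"
    nodetool decommission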

If you have any further questions let me know. 

Thanks 

Paul 
