Posted to user@cassandra.apache.org by Donald Smith <Do...@audiencescience.com> on 2014/10/15 01:52:43 UTC

Question about adding nodes incrementally to a new datacenter: wait til all hosts come up so they can learn the token ranges?

Suppose I create a new DC with 25 nodes. I have their IPs in cassandra-topology.properties. Twenty-three of the nodes start up, but two of the nodes fail to start. If I start replicating (via "nodetool rebuild") without those two nodes, then when those 2 nodes enter the DC the distribution of tokens to vnodes will change and I'd need to rebuild or bootstrap, right?

In other words, it's better to wait until all nodes come up before we start replicating. Does this sound right?

I presume that all the nodes need to come up so the cluster can learn the token ranges.

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
donalds@AudienceScience.com



Re: Question about adding nodes incrementally to a new datacenter: wait til all hosts come up so they can learn the token ranges?

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Oct 15, 2014 at 3:25 PM, Donald Smith <
Donald.Smith@audiencescience.com> wrote:

>  So, my point is that to avoid the need to bootstrap and to cleanup, it's
> better to bring all nodes up at about the same time.  If this is wrong,
> please explain why.
>
Oh, sure. As you say, you avoid having to run cleanup.

FWIW, you have just explored why "rebuild" exists. Before "rebuild"
existed, there was no [1] way to add all the nodes in the new DC (with
auto_bootstrap:false) and then have them "bootstrap" their data. "rebuild"
allows you to bring up the new DC with all its nodes and RF=0, and then do
an operation which is like bootstrap from the perspective of data
transferred, but does not have the limitations of bootstrap.

=Rob
http://twitter.com/rcolidba
[1] As I typed this, I realized (and discussed w/ driftx) that there is
logically probably "a" way to do this operation (by using replace_XXX)...
it's just fiddly, involves multiple node restarts per node, and might
actually not work in practice due to implementation details.

Re: Question about adding nodes incrementally to a new datacenter: wait til all hosts come up so they can learn the token ranges?

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
"So, my point is that to avoid the need to bootstrap and to cleanup, it's
better to bring all nodes up at about the same time.  If this is wrong,
please explain why."

LGTM. That's how I do it. First balance your ring by adding all the nodes
you want with "auto_bootstrap: false", then open your DC by altering the
keyspace to set the new DC's RF (make sure to read this first:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/operations/ops_add_dc_to_cluster_t.html).
Then rebuild from another DC.

Good luck
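Alain's sequence (which is also the rebuild workflow Rob describes: bring up every node first, then give the new DC an RF, then rebuild) can be written down as an ordered checklist. This is a minimal Python sketch that only generates the steps and runs nothing. The keyspace name, DC names, replication factors and node addresses are hypothetical placeholders, `new_dc_steps` is a hypothetical helper, and NetworkTopologyStrategy is assumed, as in the DataStax procedure Alain links. `auto_bootstrap`, `ALTER KEYSPACE` and `nodetool rebuild <source-dc>` are the real setting, CQL statement and command; check the linked documentation for your version before running them.

```python
def replication_cql(keyspace: str, dc_rfs: dict) -> str:
    """Build the ALTER KEYSPACE statement for a NetworkTopologyStrategy map."""
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in dc_rfs.items())
    return (f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {dcs}}};")


def new_dc_steps(keyspace, existing_rfs, new_dc, new_rf, new_nodes, source_dc):
    """Ordered checklist for opening a new DC (hypothetical helper)."""
    steps = []
    # 1. Start every new node with auto_bootstrap: false. The nodes join the
    #    ring and pick their tokens but stream nothing, because the keyspace
    #    has no replicas in the new DC yet (effectively RF=0 there).
    for node in new_nodes:
        steps.append(f"{node}: set 'auto_bootstrap: false' in cassandra.yaml, start Cassandra")
    # 2. Only after ALL new nodes are up (token layout final), give the
    #    new DC a replication factor.
    steps.append(replication_cql(keyspace, {**existing_rfs, new_dc: new_rf}))
    # 3. Each new node streams the ranges it now owns from an existing DC.
    for node in new_nodes:
        steps.append(f"{node}: nodetool rebuild {source_dc}")
    return steps


if __name__ == "__main__":
    for step in new_dc_steps("my_keyspace", {"DC1": 3}, "DC2", 3,
                             ["10.0.2.1", "10.0.2.2"], "DC1"):
        print(step)
```

Generating the statements instead of executing them keeps the ordering explicit: the ALTER KEYSPACE line is emitted only after the start-up step for every node.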


Re: Question about adding nodes incrementally to a new datacenter: wait til all hosts come up so they can learn the token ranges?

Posted by Donald Smith <Do...@audiencescience.com>.
Even with vnodes, when you add a node to a cluster, it takes over some portions of the token range.  If the other nodes have been running for a long time you should bootstrap the new node, so it gets old data.  Then you should run "nodetool cleanup" on the other nodes to eliminate no-longer-needed rows which now belong to the new node.

So, my point is that to avoid the need to bootstrap and to cleanup, it's better to bring all nodes up at about the same time.  If this is wrong, please explain why.
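Don's bootstrap-then-cleanup point can be made concrete on a toy ring. This is a minimal Python sketch, assuming RF=1 and one token per node, and reusing the 0-20 ring from Rob's illustration in this thread as discrete positions 0-19; real clusters use the much larger Murmur3 token range and replicate each range to RF nodes. The function names are hypothetical. The positions a new node gains are what it would have to bootstrap; the positions an existing node loses are what `nodetool cleanup` would delete from that node.

```python
def owned_positions(tokens, ring_size):
    """Map each node to the set of ring positions it owns (RF=1).

    tokens: node name -> its single token. A node owns the positions in
    (previous token, its token], wrapping around the end of the ring.
    """
    ordered = sorted(tokens.items(), key=lambda kv: kv[1])
    owned = {}
    for i, (node, tok) in enumerate(ordered):
        prev = ordered[i - 1][1]  # i == 0 wraps to the highest token
        span = (tok - prev) % ring_size or ring_size  # a lone node owns everything
        owned[node] = {(prev + step) % ring_size for step in range(1, span + 1)}
    return owned


def cleanup_candidates(before, after):
    """Positions each pre-existing node stored but no longer owns."""
    return {node: before[node] - after.get(node, set()) for node in before}


# Rob's 0-20 ring: D holds token 0 (== 20), A 5, B 10, C 15.
ring = {"D": 0, "A": 5, "B": 10, "C": 15}
before = owned_positions(ring, 20)
after = owned_positions(dict(ring, E=2), 20)  # new node E joins at token 2

if __name__ == "__main__":
    print(after["E"])                          # data E must bootstrap
    print(cleanup_candidates(before, after))   # data cleanup would drop per node
```

Only A's range is split here, so only A has anything to clean up; with vnodes the same calculation would show small losses on every existing node.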


Thanks, Don

________________________________
From: Robert Coli <rc...@eventbrite.com>
Sent: Wednesday, October 15, 2014 1:54 PM
To: user@cassandra.apache.org
Subject: Re: Question about adding nodes incrementally to a new datacenter: wait til all hosts come up so they can learn the token ranges?



Re: Question about adding nodes incrementally to a new datacenter: wait til all hosts come up so they can learn the token ranges?

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Oct 14, 2014 at 4:52 PM, Donald Smith <
Donald.Smith@audiencescience.com> wrote:

>  Suppose I create a new DC with 25 nodes. I have their IPs in
> cassandra-topology.properties.  Twenty-three of the nodes start up, but two
> of the nodes fail to start.   If I start replicating (via "nodetool
> rebuild") without those two nodes, then when those 2 nodes enter the DC the
> distribution of tokens to vnodes will change and I'd need to rebuild or
> bootstrap, right?
>
>
>
> In other words, it's better to wait til all nodes come up before we start
> replicating.  Does this sound right?
>
>
>
> I presume that all the nodes need to come up so it can learn the token
> ranges.
>

I don't understand your question. Vnodes exist to randomly distribute data
on each physical node into [n] virtual node chunks, 256 by default.

They do this in order to allow you to add the last 2 nodes to your 25-node
cluster without rebalancing the prior 23.

The simplest way to illustrate this is to imagine a token range of 0-20 in
a 4 node cluster with RF=1.

A 0-5
B 5-10
C 10-15
D 15-20 (0)

Each node has 25% of the data. If you add a new node "E" and want all five
nodes to end up with an equal share (20% each), there is literally nowhere
you can have it join to accomplish this goal. You have to join it in between
two of the existing nodes, and then move the existing nodes so that the
distribution is even again. This is why, prior to vnodes, the best practice
was to double your cluster size.

=Rob
http://twitter.com/rcolidba
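Rob's 0-20 illustration, and why vnodes change the picture, can be checked numerically. This is a minimal Python sketch, assuming RF=1, a ring of 2**64 token values (the size of the Murmur3Partitioner range, shifted here to start at 0), and vnode tokens drawn uniformly at random with num_tokens=256 as in Rob's description; `ownership` is a hypothetical helper name. With one token per node, a node E joining between D and A splits only A's range (E and A get 12.5% each, while B, C and D keep 25%). With vnodes, E's 256 tokens land all over the ring, so it takes a slice from every existing node and ends up near a fair 20% share without any existing node's tokens being moved.

```python
import random

RING = 2**64  # assumption: Murmur3-sized token space, shifted to [0, 2**64)


def ownership(tokens):
    """Fraction of the ring owned by each node, RF=1.

    tokens maps node name -> list of its tokens; a node owns the range
    (previous token on the ring, its token].
    """
    ring = sorted((t, node) for node, ts in tokens.items() for t in ts)
    owned = {node: 0 for node in tokens}
    for i, (tok, node) in enumerate(ring):
        prev_tok = ring[i - 1][0]  # i == 0 wraps to the highest token
        owned[node] += (tok - prev_tok) % RING
    return {node: size / RING for node, size in owned.items()}


# Rob's example scaled up: four nodes, one token each, evenly spaced.
single = {"D": [0], "A": [RING // 4], "B": [RING // 2], "C": [3 * RING // 4]}
single_after = dict(single, E=[RING // 8])  # E joins between D and A

# Vnodes: every node takes num_tokens=256 random tokens (seeded for repeatability).
rng = random.Random(42)
vnodes = {n: [rng.randrange(RING) for _ in range(256)] for n in "ABCD"}
vnodes_after = dict(vnodes, E=[rng.randrange(RING) for _ in range(256)])

if __name__ == "__main__":
    for label, before, after in [("single token", single, single_after),
                                 ("vnodes", vnodes, vnodes_after)]:
        b, a = ownership(before), ownership(after)
        print(label)
        for node in sorted(a):
            print(f"  {node}: {b.get(node, 0):.3f} -> {a[node]:.3f}")
```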