Posted to user@cassandra.apache.org by "Durity, Sean R" <SE...@homedepot.com> on 2018/09/04 13:05:54 UTC

RE: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

I would only run the cleanup (on all nodes) after all new nodes are added. I would also look at increasing RF to 3 (and running repair) once there are plenty of nodes. (This is assuming that availability matters and that your queries use QUORUM or LOCAL_QUORUM for the consistency level.)
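
A minimal sketch of that RF change with the DataStax Python driver (the contact point, keyspace, and data center names below are just placeholders), to be followed by nodetool repair on every node:

    # Rough sketch: bump RF to 3 on an existing keyspace (hypothetical names),
    # then run "nodetool repair" on each node so the new replicas get the data.
    from cassandra.cluster import Cluster

    session = Cluster(["10.0.0.1"]).connect()
    session.execute(
        "ALTER KEYSPACE my_keyspace WITH replication = "
        "{'class': 'NetworkTopologyStrategy', 'dc1': 3}"
    )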

Longer term, I agree with Oleksandr, the recommendation for number of vnodes is now much smaller than 256. I am using 8 or 16.


Sean Durity

From: Oleksandr Shulgin <ol...@zalando.de>
Sent: Monday, September 03, 2018 10:02 AM
To: User <us...@cassandra.apache.org>
Subject: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

On Mon, Sep 3, 2018 at 12:19 PM onmstester onmstester <on...@zoho.com> wrote:
What I have understood from this part of the document is that, when I already have nodes A, B, and C in the cluster, there will be some old data left on A, B, and C after a new node D has completely joined the cluster (namely the data that was streamed to D). Then, if I immediately add node E to the cluster, would that old data on A, B, and C be moved between nodes again each time?

Potentially: when you add node E, it takes ownership of some of the data that D holds.  So in the end you have to run cleanup on all nodes (except the very last one you add).  It still makes sense to do this once, not after every single node you add.
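
For example, one way to script that final pass (host names are placeholders; this assumes SSH access and nodetool on each node's PATH, and runs sequentially so only one node pays the compaction cost at a time):

    # Run cleanup once, after the last node has joined, on every node that
    # was already in the cluster. Host names below are placeholders.
    import subprocess

    hosts = ["node-a", "node-b", "node-c", "node-d"]  # everything except the last node added
    for host in hosts:
        subprocess.run(["ssh", host, "nodetool", "cleanup"], check=True)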

--
Alex



Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Sat, 8 Sep 2018, 19:00 Jeff Jirsa, <jj...@gmail.com> wrote:

> Virtual nodes accomplish two primary goals
>
> 1) it makes it easier to gradually add/remove capacity to your cluster by
> distributing the new host capacity around the ring in smaller increments
>
> 2) it increases the number of sources for streaming, which speeds up
> bootstrap and decommission
>
> Whether or not either of these actually is true depends on a number of
> factors, like your cluster size (for #1) and your replication factor (for
> #2). If you have 4 hosts and 4 tokens per host and add a 5th host, you’ll
> probably add a neighbor near each existing host (#1) and stream from every
> other host (#2), so that’s great. If you have 20 hosts and add a new host
> with 4 tokens, most of your existing ranges won’t change at all - you’re
> nominally adding 5% of your cluster capacity but you won’t see a 5%
> improvement because you don’t have enough tokens to move 5% of your ranges.
> If you had 32 tokens, you’d probably actually see that 5% improvement,
> because you’d likely add a new range near each of the existing ranges.
>

Jeff,

I'm a bit lost here: are you referring to streaming speed improvement or
cluster capacity increase?

Going down to 1 token would mean you’d probably need to manually move
> tokens after each bootstrap to rebalance, which is fine, it just takes more
> operator awareness.
>

Right. This is then the old, pre-vnodes story, where you can only scale out
and keep a balanced cluster if you double the number of nodes. Or you can
move the tokens.

What's not clear to me is why 4 tokens (as opposed to only 1) should be
enough for adding a small number of nodes and keeping the balance.

Assuming we have 3 racks, we would add 3 nodes at a time for scaling out.
With 4 tokens we split only 12 ranges across the ring this way. I would
think it depends on the current cluster size, but empirically the load skew
at first gets worse (for middle-sized clusters) and then probably evens out
for bigger sizes. Has anyone tried to do the actual math for this?
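
One way to get a rough feel (ignoring racks and replication entirely, so only a ballpark) is a quick Monte Carlo of purely random token placement, measuring how far the most-loaded node ends up from an even share:

    # Ballpark estimate of ownership skew with random token placement:
    # ratio of the most-loaded node's ownership to the ideal 1/N share.
    import random

    def avg_worst_skew(nodes, tokens_per_node, trials=200):
        worst = []
        for _ in range(trials):
            ring = sorted((random.random(), n)
                          for n in range(nodes) for _ in range(tokens_per_node))
            owned = [0.0] * nodes
            prev = ring[-1][0] - 1.0          # wrap around the unit ring
            for pos, node in ring:
                owned[node] += pos - prev
                prev = pos
            worst.append(max(owned) * nodes)  # 1.0 would be perfectly even
        return sum(worst) / trials

    for tokens in (1, 4, 16, 256):
        print(tokens, round(avg_worst_skew(12, tokens), 2))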

I don’t know how DSE calculates which replication factor to use for their
> token allocation logic, maybe they guess or take the highest or something.
> Cassandra doesn’t - we require you to be explicit, but we could probably do
> better here.
>

I believe that DSE also doesn't calculate it--you specify the RF to
optimize for in the config. At least their config parameter is called
allocate_tokens_for_local_replication_factor:
https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/config/configVnodes.html

That being said, I have never used DSE, hence my question.

Cheers,
--
Alex

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Your example only really applies if someone is using a 20-node cluster at
RF=1, something I've never seen, but I'm sure it exists somewhere.
Realistically, with RF=3 using racks (or AWS regions) and 21 nodes, you'll
have 3 racks with 7 nodes per rack.  Adding a single node is an unlikely
operation; you'd probably add 3, one in each region / rack.  In that case
you'd spread each rack's load across 8 nodes instead of 7, a 14% improvement,
and at that size it would help every node even with only 4 tokens.
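
Just to spell out the arithmetic behind that 14% figure:

    # Growing each rack from 7 to 8 nodes: ~14% extra headroom per node,
    # and each existing node ends up holding ~12.5% less data.
    before, after = 1 / 7, 1 / 8
    print(f"capacity added:     {(before - after) / after:.1%}")   # ~14.3%
    print(f"per-node data drop: {(before - after) / before:.1%}")  # ~12.5%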

Generally speaking I would expand cluster capacity by a percentage (say
15-20%), so you'd be looking at adding a handful of nodes to each rack,
which will still be beneficial when using only 4 tokens.

On Sat, Sep 8, 2018 at 1:00 PM Jeff Jirsa <jj...@gmail.com> wrote:

> Virtual nodes accomplish two primary goals
>
> 1) it makes it easier to gradually add/remove capacity to your cluster by
> distributing the new host capacity around the ring in smaller increments
>
> 2) it increases the number of sources for streaming, which speeds up
> bootstrap and decommission
>
> Whether or not either of these actually is true depends on a number of
> factors, like your cluster size (for #1) and your replication factor (for
> #2). If you have 4 hosts and 4 tokens per host and add a 5th host, you’ll
> probably add a neighbor near each existing host (#1) and stream from every
> other host (#2), so that’s great. If you have 20 hosts and add a new host
> with 4 tokens, most of your existing ranges won’t change at all - you’re
> nominally adding 5% of your cluster capacity but you won’t see a 5%
> improvement because you don’t have enough tokens to move 5% of your ranges.
> If you had 32 tokens, you’d probably actually see that 5% improvement,
> because you’d likely add a new range near each of the existing ranges.
>
> Going down to 1 token would mean you’d probably need to manually move
> tokens after each bootstrap to rebalance, which is fine, it just takes more
> operator awareness.
>
> I don’t know how DSE calculates which replication factor to use for their
> token allocation logic, maybe they guess or take the highest or something.
> Cassandra doesn’t - we require you to be explicit, but we could probably do
> better here.
>
>
>
> On Sep 8, 2018, at 8:17 AM, Oleksandr Shulgin <
> oleksandr.shulgin@zalando.de> wrote:
>
> On Sat, 8 Sep 2018, 14:47 Jonathan Haddad, <jo...@jonhaddad.com> wrote:
>
>> 256 tokens is a pretty terrible default setting especially post 3.0.  I
>> recommend folks use 4 tokens for new clusters,
>>
>
> I wonder, why not set it all the way down to 1 then? What's the key
> difference once you have so few vnodes?
>
> with some caveats.
>>
>
> And those are?
>
> When you fire up a cluster, there's no way to make the initial tokens be
>> distributed evenly, you'll get random ones.  You'll want to set them
>> explicitly using:
>>
>> python -c 'print( [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'
>>
>>
>> After you fire up the first seed, create a keyspace using RF=3 (or
>> whatever you're planning on using) and set allocate_tokens_for_keyspace to
>> that keyspace in your config, and join the rest of the nodes.  That gives
>> even distribution.
>>
>
> Do you possibly know if the DSE-style option which doesn't require a
> keyspace to be there also works to allocate evenly distributed tokens for
> the very first seed node?
>
> Thanks,
> --
> Alex
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

Posted by onmstester onmstester <on...@zoho.com>.
Thanks Jeff. You mean that with RF=2, num_tokens = 256, and fewer than 256 nodes, I should not worry about data distribution?

Sent using Zoho Mail

---- On Sat, 08 Sep 2018 21:30:28 +0430 Jeff Jirsa <jj...@gmail.com> wrote ----

Virtual nodes accomplish two primary goals

1) it makes it easier to gradually add/remove capacity to your cluster by distributing the new host capacity around the ring in smaller increments

2) it increases the number of sources for streaming, which speeds up bootstrap and decommission

Whether or not either of these actually is true depends on a number of factors, like your cluster size (for #1) and your replication factor (for #2). If you have 4 hosts and 4 tokens per host and add a 5th host, you’ll probably add a neighbor near each existing host (#1) and stream from every other host (#2), so that’s great. If you have 20 hosts and add a new host with 4 tokens, most of your existing ranges won’t change at all - you’re nominally adding 5% of your cluster capacity but you won’t see a 5% improvement because you don’t have enough tokens to move 5% of your ranges. If you had 32 tokens, you’d probably actually see that 5% improvement, because you’d likely add a new range near each of the existing ranges.

Going down to 1 token would mean you’d probably need to manually move tokens after each bootstrap to rebalance, which is fine, it just takes more operator awareness.

I don’t know how DSE calculates which replication factor to use for their token allocation logic, maybe they guess or take the highest or something. Cassandra doesn’t - we require you to be explicit, but we could probably do better here.

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

Posted by Jeff Jirsa <jj...@gmail.com>.
Virtual nodes accomplish two primary goals

1) it makes it easier to gradually add/remove capacity to your cluster by distributing the new host capacity around the ring in smaller increments

2) it increases the number of sources for streaming, which speeds up bootstrap and decommission

Whether or not either of these actually is true depends on a number of factors, like your cluster size (for #1) and your replication factor (for #2). If you have 4 hosts and 4 tokens per host and add a 5th host, you’ll probably add a neighbor near each existing host (#1) and stream from every other host (#2), so that’s great. If you have 20 hosts and add a new host with 4 tokens, most of your existing ranges won’t change at all - you’re nominally adding 5% of your cluster capacity but you won’t see a 5% improvement because you don’t have enough tokens to move 5% of your ranges. If you had 32 tokens, you’d probably actually see that 5% improvement, because you’d likely add a new range near each of the existing ranges.

Going down to 1 token would mean you’d probably need to manually move tokens after each bootstrap to rebalance, which is fine, it just takes more operator awareness.

I don’t know how DSE calculates which replication factor to use for their token allocation logic, maybe they guess or take the highest or something. Cassandra doesn’t - we require you to be explicit, but we could probably do better here.



> On Sep 8, 2018, at 8:17 AM, Oleksandr Shulgin <ol...@zalando.de> wrote:
> 
>> On Sat, 8 Sep 2018, 14:47 Jonathan Haddad, <jo...@jonhaddad.com> wrote:
>> 256 tokens is a pretty terrible default setting especially post 3.0.  I recommend folks use 4 tokens for new clusters,
> 
> 
> I wonder, why not set it all the way down to 1 then? What's the key difference once you have so few vnodes?
> 
>> with some caveats.
> 
> 
> And those are?
> 
>> When you fire up a cluster, there's no way to make the initial tokens be distributed evenly, you'll get random ones.  You'll want to set them explicitly using:
>> 
>> python -c 'print( [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'
>> 
>> After you fire up the first seed, create a keyspace using RF=3 (or whatever you're planning on using) and set allocate_tokens_for_keyspace to that keyspace in your config, and join the rest of the nodes.  That gives even distribution.
> 
> 
> Do you possibly know if the DSE-style option which doesn't require a keyspace to be there also works to allocate evenly distributed tokens for the very first seed node?
> 
> Thanks,
> --
> Alex
> 

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
> I wonder, why not set it all the way down to 1 then? What's the key
> difference once you have so few vnodes?

4 tokens lets you have balanced clusters when they're small and imposes
very little overhead when they get big.  Using multiple tokens lets
multiple nodes stream data to the new node, which helps keep bootstrap
times down.

> And those are?

That using 4 tokens by itself isn't enough; you need to start the cluster
out in a manner that distributes the initial tokens, using the technique I
listed.

> Do you possibly know if the DSE-style option which doesn't require a
keyspace to be there also works to allocate evenly distributed tokens for
the very first seed node?

I have no idea what DSE does, sorry.  I don't have access to the source.
Sounds like a question for their slack channel.

On Sat, Sep 8, 2018 at 11:17 AM Oleksandr Shulgin <
oleksandr.shulgin@zalando.de> wrote:

> On Sat, 8 Sep 2018, 14:47 Jonathan Haddad, <jo...@jonhaddad.com> wrote:
>
>> 256 tokens is a pretty terrible default setting especially post 3.0.  I
>> recommend folks use 4 tokens for new clusters,
>>
>
> I wonder, why not set it all the way down to 1 then? What's the key
> difference once you have so few vnodes?
>
> with some caveats.
>>
>
> And those are?
>
> When you fire up a cluster, there's no way to make the initial tokens be
>> distributed evenly, you'll get random ones.  You'll want to set them
>> explicitly using:
>>
>> python -c 'print( [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'
>>
>>
>> After you fire up the first seed, create a keyspace using RF=3 (or
>> whatever you're planning on using) and set allocate_tokens_for_keyspace to
>> that keyspace in your config, and join the rest of the nodes.  That gives
>> even distribution.
>>
>
> Do you possibly know if the DSE-style option which doesn't require a
> keyspace to be there also works to allocate evenly distributed tokens for
> the very first seed node?
>
> Thanks,
> --
> Alex
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Sat, 8 Sep 2018, 14:47 Jonathan Haddad, <jo...@jonhaddad.com> wrote:

> 256 tokens is a pretty terrible default setting especially post 3.0.  I
> recommend folks use 4 tokens for new clusters,
>

I wonder, why not set it all the way down to 1 then? What's the key
difference once you have so few vnodes?

with some caveats.
>

And those are?

When you fire up a cluster, there's no way to make the initial tokens be
> distributed evenly, you'll get random ones.  You'll want to set them
> explicitly using:
>
> python -c 'print( [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'
>
>
> After you fire up the first seed, create a keyspace using RF=3 (or
> whatever you're planning on using) and set allocate_tokens_for_keyspace to
> that keyspace in your config, and join the rest of the nodes.  That gives
> even distribution.
>

Do you possibly know if the DSE-style option which doesn't require a
keyspace to be there also works to allocate evenly distributed tokens for
the very first seed node?

Thanks,
--
Alex

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Keep using whatever settings you've been using.  I'd still use
allocate_tokens_for_keyspace, but it probably won't make much of a
difference with 256 tokens.

On Sat, Sep 8, 2018 at 10:40 AM onmstester onmstester <on...@zoho.com>
wrote:

> Thanks Jon,
> But I was never concerned about the num_tokens config before, because no
> official cluster setup document (on DataStax:
> https://docs.datastax.com/en/cassandra/3.0/cassandra/initialize/initSingleDS.html
> or other blogs) warned us beginners to be concerned about it.
> I always set up my clusters with nodes having the same hardware spec
> (homogeneous) and num_tokens = 256, and the data seemed to be evenly
> distributed; at least nodetool status reported it that way, and after
> killing any node I still had all of my data and the application kept
> working, so I assumed the data was perfectly and evenly distributed among
> the nodes.
> So could you please explain more why I should run that python command and
> configure allocate_tokens_for_keyspace? I only have one keyspace per
> cluster. I'm using NetworkTopologyStrategy and a rack-aware topology config.
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
> ---- On Sat, 08 Sep 2018 17:17:10 +0430 Jonathan Haddad <jo...@jonhaddad.com> wrote ----
>
> 256 tokens is a pretty terrible default setting especially post 3.0.  I
> recommend folks use 4 tokens for new clusters, with some caveats.
>
> When you fire up a cluster, there's no way to make the initial tokens be
> distributed evenly, you'll get random ones.  You'll want to set them
> explicitly using:
>
> python -c 'print( [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'
>
>
> After you fire up the first seed, create a keyspace using RF=3 (or
> whatever you're planning on using) and set allocate_tokens_for_keyspace to
> that keyspace in your config, and join the rest of the nodes.  That gives
> even distribution.
>
> On Sat, Sep 8, 2018 at 1:40 AM onmstester onmstester <on...@zoho.com>
> wrote:
>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
>
>
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

Posted by onmstester onmstester <on...@zoho.com>.
Thanks Jon. But I was never concerned about the num_tokens config before, because no official cluster setup document (on DataStax: https://docs.datastax.com/en/cassandra/3.0/cassandra/initialize/initSingleDS.html or other blogs) warned us beginners to be concerned about it. I always set up my clusters with nodes having the same hardware spec (homogeneous) and num_tokens = 256, and the data seemed to be evenly distributed; at least nodetool status reported it that way, and after killing any node I still had all of my data and the application kept working, so I assumed the data was perfectly and evenly distributed among the nodes. So could you please explain more why I should run that python command and configure allocate_tokens_for_keyspace? I only have one keyspace per cluster. I'm using NetworkTopologyStrategy and a rack-aware topology config.

Sent using Zoho Mail

---- On Sat, 08 Sep 2018 17:17:10 +0430 Jonathan Haddad <jo...@jonhaddad.com> wrote ----

256 tokens is a pretty terrible default setting especially post 3.0.  I recommend folks use 4 tokens for new clusters, with some caveats.

When you fire up a cluster, there's no way to make the initial tokens be distributed evenly, you'll get random ones.  You'll want to set them explicitly using:

python -c 'print( [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'

After you fire up the first seed, create a keyspace using RF=3 (or whatever you're planning on using) and set allocate_tokens_for_keyspace to that keyspace in your config, and join the rest of the nodes.  That gives even distribution.

On Sat, Sep 8, 2018 at 1:40 AM onmstester onmstester <on...@zoho.com> wrote:

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
256 tokens is a pretty terrible default setting especially post 3.0.  I
recommend folks use 4 tokens for new clusters, with some caveats.

When you fire up a cluster, there's no way to make the initial tokens be
distributed evenly, you'll get random ones.  You'll want to set them
explicitly using:

python -c 'print( [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'


After you fire up the first seed, create a keyspace using RF=3 (or whatever
you're planning on using) and set allocate_tokens_for_keyspace to that
keyspace in your config, and join the rest of the nodes.  That gives even
distribution.
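
Note the one-liner above relies on Python 2 integer division; a Python 3 equivalent, with the output ready to paste into the first seed's initial_token setting in cassandra.yaml, would be:

    # Evenly spaced initial tokens for the first seed node (Python 3: use //
    # so the tokens stay exact 64-bit integers rather than floats).
    num_tokens = 4
    tokens = [str((2**64 // num_tokens) * i - 2**63) for i in range(num_tokens)]
    print(",".join(tokens))  # goes into initial_token in cassandra.yaml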

On Sat, Sep 8, 2018 at 1:40 AM onmstester onmstester <on...@zoho.com>
wrote:

> Why not set the default vnodes count to that recommendation in the Cassandra
> installation files?
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
> ---- On Tue, 04 Sep 2018 17:35:54 +0430 Durity, Sean R <SE...@homedepot.com> wrote ----
>
>
>
> Longer term, I agree with Oleksandr, the recommendation for number of
> vnodes is now much smaller than 256. I am using 8 or 16.
>
>
>
>
>
> Sean Durity
>
>
>
>
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

RE: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

Posted by onmstester onmstester <on...@zoho.com>.
Why not set the default vnodes count to that recommendation in the Cassandra installation files?

Sent using Zoho Mail

---- On Tue, 04 Sep 2018 17:35:54 +0430 Durity, Sean R <SE...@homedepot.com> wrote ----

Longer term, I agree with Oleksandr, the recommendation for number of vnodes is now much smaller than 256. I am using 8 or 16.

Sean Durity