Posted to dev@cassandra.apache.org by Jon Haddad <jo...@jonhaddad.com> on 2020/02/03 21:45:21 UTC

Re: [Discuss] num_tokens default in Cassandra 4.0

I think it's a good idea to take a step back and get a high level view of
the problem we're trying to solve.

First, high token counts result in decreased availability as each node has
data overlap with more nodes in the cluster.  Specifically, a node can
share data with up to 2 * (RF-1) * num_tokens other nodes.  So a 256 token
cluster at RF=3 is going to almost always share data with every other node
in the cluster that isn't in the same rack, unless you're doing something
wild like using more than a thousand nodes in a cluster.
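
To make that arithmetic concrete, here's a back-of-envelope sketch (a toy
of mine, not project code) of the overlap bound at RF=3:

    # potential replica overlap per node, with randomly placed tokens
    def overlap(rf, num_tokens):
        return 2 * (rf - 1) * num_tokens

    for tokens in (256, 16, 4):
        print(tokens, overlap(3, tokens))  # -> 1024, 64, 16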

With 16 tokens, that is vastly improved, but you still have up to 64 nodes
each node needs to query against, so again you're hitting every node
unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs).  I
wouldn't use 16 here, and I doubt any of you would either.  I've advocated
for 4 tokens because you'd have overlap with only 16 nodes, which works
well for small clusters as well as large.  Assuming I was creating a new
cluster for myself (in a hypothetical brand new application I'm building) I
would put this in production.  I have worked with several teams where I
helped them put 4 token clusters in prod and it has worked very well.  We
didn't see any wild imbalance issues.

As Mick's pointed out, our current method of using random token assignment
as the default is problematic for 4 tokens.  I fully agree with
this, and I think if we were to try to use 4 tokens, we'd want to address
this in tandem.  We can discuss how to better allocate tokens by default
(something more predictable than random), but I'd like to avoid the
specifics of that for the sake of this email.

To Alex's point, repairs are problematic with lower token counts due to
over streaming.  I think this is a pretty serious issue and we'd have to
address it before going all the way down to 4.  This, in my opinion, is a
more complex problem to solve and I think trying to fix it here could make
shipping 4.0 take even longer, something none of us want.

For the sake of shipping 4.0 without adding extra overhead and time, I'm ok
with moving to 16 tokens, and in the process adding extensive documentation
outlining what we recommend for production use.  I think we should also try
to figure out something better than random as the default to fix the data
imbalance issues.  I've got a few ideas here I've been noodling on.

As long as folks are fine with potentially changing the default again in C*
5.0 (after another discussion / debate), 16 is enough of an improvement
that I'm OK with the change, and willing to author the docs to help people
set up their first cluster.  For folks that go into production with the
defaults, we're at least not setting them up for total failure once their
clusters get large like we are now.

In future versions, we'll probably want to address the issue of data
imbalance by building something in that shifts individual tokens around.  I
don't think we should try to do this in 4.0 either.

Jon



On Fri, Jan 31, 2020 at 2:04 PM Jeremy Hanna <je...@gmail.com>
wrote:

> I think Mick and Anthony make some valid operational and skew points for
> smaller/starting clusters with 4 num_tokens. There’s an arbitrary line
> between small and large clusters but I think most would agree that most
> clusters are on the small to medium side. (A small nuance is afaict the
> probabilities have to do with quorum on a full token range, ie it has to do
> with the size of a datacenter, not the full cluster.)
>
> As I read this discussion I’m personally more inclined to go with 16 for
> now. It’s true that if we could fix the skew and topology gotchas for those
> starting things up, 4 would be ideal from an availability perspective.
> However we’re still in the brainstorming stage for how to address those
> challenges. I think we should create tickets for those issues and go with
> 16 for 4.0.
>
> This is about an out of the box experience. It balances availability,
> operations (such as skew and general bootstrap friendliness and
> streaming/repair), and cluster sizing. Balancing all of those, I think for
> now I’m more comfortable with 16 as the default with docs on considerations
> and tickets to unblock 4 as the default for all users.
>
> >>> On Feb 1, 2020, at 6:30 AM, Jeff Jirsa <jj...@gmail.com> wrote:
> >> On Fri, Jan 31, 2020 at 11:25 AM Joseph Lynch <jo...@gmail.com>
> wrote:
> >> I think that we might be bikeshedding this number a bit because it is
> easy
> >> to debate and there is not yet one right answer.
> >
> >
> > https://www.youtube.com/watch?v=v465T5u9UKo

Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Rahul Singh <ra...@gmail.com>.
+1 on 8

rahul.xavier.singh@gmail.com

http://cassandra.link
The Apache Cassandra Knowledge Base.

Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Erick Ramirez <er...@datastax.com>.
+1 on 8 tokens. I'd personally like us to be able to move this along pretty
quickly as it's confusing for users looking for direction. Cheers!

On Tue, 18 Feb 2020, 9:14 am Jeremy Hanna, <je...@gmail.com>
wrote:

> I just wanted to close the loop on this if possible.  After some discussion
> in slack about various topics, I would like to see if people are okay with
> num_tokens=8 by default (as it's not much different operationally than
> 16).  Joey brought up a few small changes that I can put on the ticket.  It
> also requires some documentation for things like decommission order and
> skew.
>
> Are people okay with this change moving forward like this?  If so, I'll
> comment on the ticket and we can move forward.
>
> Thanks,
>
> Jeremy
>

Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Jeremy Hanna <je...@gmail.com>.
Just to close the loop on this,
https://issues.apache.org/jira/browse/CASSANDRA-13701 is getting tested
now.  The project testing will get updated to utilize the new defaults
(both num_tokens and using the new allocation algorithm by uncommenting
allocate_tokens_for_local_replication_factor: 3).  Jon did some
documentation on num_tokens on
https://cassandra.apache.org/doc/latest/getting_started/production.html#tokens
on a separate ticket he mentioned -
https://issues.apache.org/jira/browse/CASSANDRA-15600.  The new default in
Cassandra 4.0+ will be to use the new allocation algorithm with num_tokens:
16.  There is a note in the NEWS.txt about upgrading and bootstrapping.  It
is a lot of effort to change this once it is set, so hopefully new users
will be in a much better place out of the box.  Thanks everyone for your
efforts in this.
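
For anyone finding this later, the settings described above end up looking
roughly like this in cassandra.yaml (a sketch, not a verbatim copy of the
shipped file):

    # cassandra.yaml (4.0 defaults as described above)
    num_tokens: 16
    # uncommented so new nodes use the token allocation algorithm
    allocate_tokens_for_local_replication_factor: 3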


Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Jeremy Hanna <je...@gmail.com>.
As discussed, let's go with 16.  Speaking with Anthony privately as well, I
had forgotten that some of the analysis that Branimir had initially done on
the skew and allocation may have been internal to DataStax so I should have
mentioned that previously.  Thanks to Mick, Alex, and Anthony for doing
this analysis and helping back the decision with data.  This will benefit
many that start with Cassandra that don't know that 256 is a bad number and
end up with a hard to change decision later.  I assigned myself to
https://issues.apache.org/jira/browse/CASSANDRA-13701.  Thanks all.


Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Mick Semb Wever <mc...@apache.org>.
 
> I propose we drop it to 16 immediately.  I'll add the production docs
> in CASSANDRA-15618 with notes on token count, the reasons why you'd want 1,
> 4, or 16.  As a follow up, if we can get a token simulation written we can
> try all sorts of topologies with whatever token algorithms we want.  Once
> that simulation is written and we've got some reports we can revisit.


This works for me, for our first step forward.
Good docs will always empower users more than any default setting can!

cheers,
Mick



Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Jon Haddad <jo...@jonhaddad.com>.
There's a lot going on here... hopefully I can respond to everything in a
coherent manner.

> Perhaps a simple way to avoid this is to update the random allocation
> algorithm to re-generate tokens when the ranges created do not have a good
> size distribution?

Instead of using random tokens for the first node, I think we'd be better
off picking a random initial token then using an even distribution around
the ring, using the first token as an offset.  The main benefit of random
is that we don't get collisions, not the distribution.  I haven't read
through the change in CASSANDRA-15600, maybe it addresses this problem
already, if so we can ignore my suggestion here.
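
Something like this, for illustration (my sketch of the idea, not the
CASSANDRA-15600 implementation):

    import random

    RING_MIN, RING_SIZE = -2**63, 2**64  # Murmur3Partitioner token space

    def initial_tokens(num_tokens):
        # one random offset, then evenly spaced tokens around the ring
        offset = random.randrange(RING_SIZE)
        step = RING_SIZE // num_tokens
        return sorted(RING_MIN + (offset + i * step) % RING_SIZE
                      for i in range(num_tokens))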

> Clusters where we have used num_tokens 4 we have regretted.
> While we accept the validity and importance of the increased availability
> provided by num_tokens 4, we have never seen or used it in practice.

While we worked together, I personally moved quite a few clusters to 4
tokens, and didn't run into any balance issues.  I'm not sure why you're
saying you've never seen it in practice; I did it with a whole bunch of our
clients.

Mick said:

> We know of a number of production clusters that have been set up this
> way. I am unaware of any Cassandra docs or community recommendations that
> say you should avoid doing this. So, this is a problem regardless of the
> value for num_tokens.

Paulo:

> Having the number of racks not a multiple of the replication factor is
> not a good practice since it can lead to imbalance and other problems like
> this, so we should not only document this but perhaps add a warning or even
> hard fail when this is encountered during node startup?

Agreed on both the above - I intend to document this in CASSANDRA-15618.

Mick, from your test:

>  Each cluster was configured with one rack.

This is an important nuance of the results you're seeing.  It sounds like
the test covers the edge case of using a single rack / AZ for an entire
cluster.  I can't remember too many times where I actually saw this, of the
several hundred clusters I looked at over the almost 4 years I was at TLP.
This isn't to say it's not out there in the wild, but I don't think it
should drive us to pick a token count.  We can probably do better than
using a completely random algorithm for the corner case of using a single
rack or fewer racks than RF, and we should also encourage people to run
Cassandra in a way that doesn't set them up for a gunshot to the foot.

In a world of tradeoffs, I'm still not convinced that 16 tokens makes any
sense as a default.  Assuming we can fix the worst case random imbalance in
small clusters, 4 is a significantly better option as it will make it
easier for teams to scale Cassandra out the way we claim they can.  Using
16 tokens brings an unnecessary (and probably unknown) ceiling to people's
ability to scale, and for the *majority* of clusters, where people pick
Cassandra for scalability and availability, it's still too high.  I'd rather
we put a default that works best for the majority of people and document
the cases where people might want to deviate from it, rather than picking a
somewhat crappy (but better than 256) default.

That said, we don't have the better token distribution yet, so if we're
going to assume people just put C* in production with minimal configuration
changes, 16 will help us deal with the imbalance issues *today*.  We know
it works better than 256, so I'm willing to take this as a win *today*, on
the assumption that folks are OK changing this value again before we
release 4.0 if we find we can make it work without the super sharp edges
that we can currently stab ourselves with.  I'd much rather ship C* with 16
tokens than 256, and I don't want to keep debating this so much we don't
end up making any change at all.

I propose we drop it to 16 immediately.  I'll add the production docs
in CASSANDRA-15618 with notes on token count, the reasons why you'd want 1,
4, or 16.  As a follow up, if we can get a token simulation written we can
try all sorts of topologies with whatever token algorithms we want.  Once
that simulation is written and we've got some reports we can revisit.
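
A rough cut of what such a simulation could look like (a toy of my own:
random placement and ownership skew only, ignoring racks and replication):

    import random

    RING = 2**64

    def worst_imbalance(nodes, num_tokens, trials=100):
        worst = 0.0
        for _ in range(trials):
            ring = sorted((random.randrange(RING), n)
                          for n in range(nodes) for _ in range(num_tokens))
            share = [0] * nodes
            prev = 0
            for tok, node in ring:
                share[node] += tok - prev
                prev = tok
            share[ring[0][1]] += RING - prev  # wrap-around range
            worst = max(worst, max(share) / (RING / nodes))
        return worst

    print(worst_imbalance(6, 4))  # worst node vs fair share, e.g. ~2x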

Eventually we'll probably need to add the ability for folks to fix cluster
imbalances without adding / removing hardware, but I suspect we've got a
fair amount of plumbing to rework to make something like that doable.

Jon



Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Paulo Motta <pa...@gmail.com>.
Great investigation, good job guys!

> Personally I would have liked to have seen even more iterations. While 14
run iterations gives an indication, the average of randomness is not what
is important here. What concerns me is the consequence to imbalances as the
cluster grows when you're very unlucky with initial random tokens, for
example when random tokens land very close together. The token allocation
can deal with breaking up large token ranges but is unable to do anything
about such tiny token ranges. Even a bad 1-in-a-100 experience should be a
consideration when picking a default num_tokens.

Perhaps a simple way to avoid this is to update the random allocation
algorithm to re-generate tokens when the ranges created do not have a good
size distribution?

> But it can be worse, for example if you have RF=3 and only two racks then
you will only get random tokens. We know of a number of production clusters
that have been set up this way. I am unaware of any Cassandra docs or
community recommendations that say you should avoid doing this. So, this is
a problem regardless of the value for num_tokens.

Having the number of racks not a multiple of the replication factor is not
a good practice since it can lead to imbalance and other problems like
this, so we should not only document this but perhaps add a warning or even
hard fail when this is encountered during node startup?
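
Sketching what that startup check might look like (illustrative pseudo-logic
on my part, not Cassandra source):

    # warn or fail at startup when the number of racks in the DC is more
    # than one but not a multiple of the replication factor
    def check_rack_topology(num_racks, rf, hard_fail=False):
        if num_racks > 1 and num_racks % rf != 0:
            msg = ("%d racks is not a multiple of RF %d; "
                   "expect token imbalance" % (num_racks, rf))
            if hard_fail:
                raise RuntimeError(msg)
            print("WARN: " + msg)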

Cheers,

Paulo


Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Mick Semb Wever <mc...@apache.org>.
> Can we ask for some analysis and data against the risks different
> num_tokens choices present. We shouldn't rush into a new default, and such
> background information and data is operator value added. 


Thanks for everyone's patience on this topic.
The following is further input on a number of fronts.


** Analysis of Token Distributions

The following is work done by Alex Dejanovski and Anthony Grasso. It builds upon their previous work at The Last Pickle and explains why we recommend 16 as the best value to clients. (Please buy these two beers for the effort they have put in here.)

The following three graphs show the ranges of imbalance that occur on clusters growing from 4 nodes to 12 nodes, for the different values of num_tokens: 4, 8 and 16. The range is based on 14 run iterations (except 16 which only got ten).


[three graphs, not reproduced in this plain-text version, showing the imbalance ranges for num_tokens 4, 8 and 16]

These graphs were generated using clusters created in AWS by tlp-cluster (https://github.com/thelastpickle/tlp-cluster). A script was written to automate the testing and generate the data for each value of num_tokens. Each cluster was configured with one rack. Of course these interpretations are debatable. The data to the graphs is in https://docs.google.com/spreadsheets/d/1gPZpSOUm3_pSCo9y-ZJ8WIctpvXNr5hDdupJ7K_9PHY/edit?usp=sharing


What I see from these graphs is…
 a) token allocation is pretty good at fixing initial bad random token imbalances. By the time you are at 12 nodes, presuming you have set up the cluster correctly so that token allocation actually works, your nodes will be balanced with num_tokens 4 or greater.
 b) you need to get to ~12 nodes with num_tokens 4 to have a good balance.
 c) you need to get to ~9 nodes with num_tokens 8 to have a good balance.
 d) you need to get to ~6 nodes with num_tokens 16 to have a good balance.

Personally I would have liked to have seen even more iterations. While 14 run iterations gives an indication, the average of randomness is not what is important here. What concerns me is the consequence to imbalances as the cluster grows when you're very unlucky with initial random tokens, for example when random tokens land very close together. The token allocation can deal with breaking up large token ranges but is unable to do anything about such tiny token ranges. Even a bad 1-in-a-100 experience should be a consideration when picking a default num_tokens.


** When does the Token Allocation work…

This has been touched on already in this thread. There are cases where token allocation fails to kick in. The first node in up to RF racks generates random tokens; this typically means the first three nodes.

But it can be worse, for example if you have RF=3 and only two racks then you will only get random tokens. We know of a number of production clusters that have been set up this way. I am unaware of any Cassandra docs or community recommendations that say you should avoid doing this. So, this is a problem regardless of the value for num_tokens.
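
A toy model of those two cases (illustrative only, not the Cassandra source):

    # - the first node in each of the first RF racks gets random tokens
    # - with more than one rack but fewer than RF racks, every node does
    def gets_random_tokens(racks_seen, this_rack, total_racks, rf=3):
        if 1 < total_racks < rf:
            return True
        return this_rack not in racks_seen and len(racks_seen) < rf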


** Algorithmic token allocation does not handle the racks = RF case well (CASSANDRA-15600 <https://issues.apache.org/jira/browse/CASSANDRA-15600>)

This recently landed in trunk. My understanding is that this improves the situation the graphs cover, but not the situation just described where a DC has 1 < racks < RF. Ekaterina, maybe you could elaborate?


** Decommissioning Nodes

Elasticity is a feature of Cassandra, and the operational costs of running it are a real consideration. A reduction from a 9 node cluster back to a 6 node cluster does happen often enough. Decommissioning nodes on smaller clusters offers the greatest operational cost savings, yet such clusters will suffer most from too low a num_tokens setup.


** Recommendations from Cassandra Consulting Companies

My understanding is that DataStax recommends num_tokens 8, while Instaclustr and The Last Pickle have both recommended 16. Interestingly enough, those pushing for num_tokens 4 are today using num_tokens 1 (and are already sitting on a lot of in-house C* experience).


** Keeping it Real

We have regretted the clusters where we used num_tokens 4. This and past analysis work, similar to the above, has led us to use num_tokens 16. Cost optimisation of clusters is one of the key user concerns out there, and we have witnessed problems on this front with num_tokens 4.

While we accept the validity and importance of the increased availability provided by num_tokens 4, we have never seen or used it in practice. The default value of num_tokens is important. The value of 256 has been good business for consultants, but it was a bad choice for clusters and one that is difficult to change. A new default should be chosen wisely.


regards,
Mick, Anthony, Alex

Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Joshua McKenzie <jm...@apache.org>.
>
> Discussions here and on slack have brought up a number of important
> concerns.

Sounds like we're letting the perfect be the enemy of the good. Is anyone
arguing that 256 is a better default than 16? Or is the fear that going to
16 now would make a default change in, say, 5.0 more painful?



Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Ben Slater <be...@instaclustr.com>.
In case it helps move the decision along, we moved to 16 vnodes as default
in Nov 2018 and haven't looked back (many clusters from 3-100s of nodes
later). The testing we did in making that decision is summarised here:
https://www.instaclustr.com/cassandra-vnodes-how-many-should-i-use/

Cheers
Ben

---

Ben Slater
Chief Product Officer, Instaclustr


On Tue, 18 Feb 2020 at 18:44, Mick Semb Wever <mi...@thelastpickle.com>
wrote:

> -1
>
> Discussions here and on slack have brought up a number of important
> concerns. I think those concerns need to be summarised here before any
> informal vote.
>
> It was my understanding that some of those concerns may even be blockers to
> a move to 16. That is, we have to presume the worst case scenario where all
> tokens get randomly generated.
>
> Can we ask for some analysis and data against the risks different
> num_tokens choices present. We shouldn't rush into a new default, and such
> background information and data is operator value added. Maybe I missed any
> info/experiments that have happened?

Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Mick Semb Wever <mi...@thelastpickle.com>.
The appeal to 'perfect is the enemy...' is appreciated. But I (we) have
seen from experience that this is about what is good rather than what is
perfect.

I'm not suggesting we create a fool proof system, just one that is safe
against what we know happens all too often in production systems.

I believe there is some further analysis and testing happening, so I'm only
asking that we have a bit of patience, so that our definition of what is
"good" (vs perfect) is grounded.


>

Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Jon Haddad <jo...@jonhaddad.com>.
Joey Lynch had a good idea - that if the allocate tokens for RF isn't set
we use 1 as the RF.  I suggested we take it a step further and use the rack
count as the RF if it's not set.
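
In sketch form (hypothetical names, just to pin down the fallback being
proposed, not committed code):

    # RF used for token allocation when the operator hasn't set one
    def effective_allocation_rf(configured_rf, rack_count):
        if configured_rf is not None:
            return configured_rf
        return max(rack_count, 1)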

This should take care of most clusters even if they don't set the RF, and
will handle the uneven distribution when provisioning a new cluster.

The only case where you'd want more tokens is to scale down, which I saw in
very few clusters of the hundreds I've worked on.




Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Jeremiah Jordan <je...@datastax.com>.
If you don’t know what you are doing you will have one rack which will also be safe. If you are setting up racks then you most likely read something about doing that, and should also be fine.
This discussion has gone off the rails 100 times with what ifs that are “letting perfect be the enemy of good”. The setting doesn’t need to be perfect. It just needs to be “good enough“.



Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Mick Semb Wever <mi...@thelastpickle.com>.
> Why do we have to assume random assignment?

Because token allocation only works once you have a node in RF racks. If
you don't bootstrap nodes in alternating racks, or just never have RF racks
set up (but more than one rack), it's going to be random.

Whatever default we choose should be a safe choice, not the best for
experts. Making it safe (4 as the default would be great) shouldn't be
difficult, and I thought Joey was building a  list of related issues?

Seeing these issues put together summarised would really help build the
consensus IMHO.


Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Jeremiah D Jordan <je...@gmail.com>.
+1 for 8 + algorithmic assignment being the default.

Why do we have to assume random assignment? If someone turns off algorithmic assignment they are changing away from the defaults, so they should also adjust num_tokens.
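(For reference, the knobs involved look roughly like this in cassandra.yaml.
allocate_tokens_for_keyspace has existed since 3.0; the local-replication-factor
variant is, as far as I recall, the new 4.0 way to trigger the algorithm, so
treat the exact name as an assumption and double-check the shipped yaml:)

    num_tokens: 8
    # 3.x-era trigger for the token allocation algorithm:
    # allocate_tokens_for_keyspace: my_keyspace
    # 4.0-era variant that doesn't need the keyspace to exist first:
    # allocate_tokens_for_local_replication_factor: 3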

-Jeremiah


Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Mick Semb Wever <mi...@thelastpickle.com>.
-1

Discussions here and on slack have brought up a number of important
concerns. I think those concerns need to be summarised here before any
informal vote.

It was my understanding that some of those concerns may even be blockers to
a move to 16. That is, we have to presume the worst-case scenario, where all
tokens get randomly generated.

Can we ask for some analysis and data on the risks the different num_tokens
choices present? We shouldn't rush into a new default, and such background
information and data is real added value for operators. Maybe I missed
info/experiments that have already happened?



Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Jeremy Hanna <je...@gmail.com>.
I just wanted to close the loop on this if possible.  After some discussion
in slack about various topics, I would like to see if people are okay with
num_tokens=8 by default (as it's not much different operationally than
16).  Joey brought up a few small changes that I can put on the ticket.  It
also requires some documentation for things like decommission order and
skew.

Are people okay with this change moving forward like this?  If so, I'll
comment on the ticket and we can move forward.

Thanks,

Jeremy


Re: Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Jeff Jirsa <jj...@gmail.com>.
The more vnodes you have on each host, the more likely it becomes that any
2 hosts are adjacent/neighbors/replicas.
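One quick way to see this is to simulate it. Below is a rough sketch
(assuming purely random token assignment, simple clockwise replication, and
no racks) that estimates the probability that two particular hosts in a
100-node cluster end up replicating at least one common range:

    import random

    def p_share(n_nodes, tokens_per_node, rf=3, trials=100):
        # Estimate P(hosts 0 and 1 are replicas of a common token range).
        hits = 0
        for _ in range(trials):
            ring = sorted((random.random(), n)
                          for n in range(n_nodes)
                          for _ in range(tokens_per_node))
            owners = [n for _, n in ring]
            size = len(owners)
            for i in range(size):
                group, j = set(), i
                while len(group) < rf:          # next rf distinct nodes clockwise
                    group.add(owners[j % size])
                    j += 1
                if 0 in group and 1 in group:   # both replicate this range
                    hits += 1
                    break
        return hits / trials

    for t in (1, 4, 16, 256):
        print(f"num_tokens={t:>3}: P(two hosts overlap) ~ {p_share(100, t):.2f}")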


Fwd: Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by onmstester onmstester <on...@zoho.com.INVALID>.
Thank you so much



Sent using https://www.zoho.com/mail/

Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Jeremiah D Jordan <je...@gmail.com>.
Just FYI, if you want to be able to operationally do things to many nodes at a time, you should look at setting up racks. With num racks = RF you can take down all the nodes in a given rack at once without affecting LOCAL_QUORUM. Your single-token example has the same functionality in this respect as a vnodes cluster using racks (and in fact, if you set up a single-token cluster using racks, you would have put nodes N1 and N4 in the same rack).
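To illustrate, here is a toy model in the spirit of NetworkTopologyStrategy's
rack-aware placement (a sketch, not the actual implementation): six
single-token nodes striped across three racks at RF=3, verifying that an
entire rack can go down with every range still keeping quorum.

    def replicas_for(owners, racks, i, rf):
        # Walk clockwise, skipping racks that already hold a replica.
        group, seen_racks, j = [], set(), i
        while len(group) < rf:
            node = owners[j % len(owners)]
            if node not in group and racks[node] not in seen_racks:
                group.append(node)
                seen_racks.add(racks[node])
            j += 1
        return group

    owners = [0, 1, 2, 3, 4, 5]          # ring order of the six nodes
    racks = {n: n % 3 for n in owners}   # striped across 3 racks
    rf, quorum = 3, 2

    for i in range(len(owners)):
        group = replicas_for(owners, racks, i, rf)
        live = [n for n in group if racks[n] != 0]   # rack 0 fully down
        assert len(live) >= quorum
    print("rack 0 down: every range still satisfies LOCAL_QUORUM")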


Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
Your case seems to argue for completely eliminating vnodes, which the Priam
people have been preaching for a long time.

There is not, certainly for a Cassandra user-level person, good
documentation on the pros and cons of vnodes vs single tokens, and as we
see here the impact of various vnode counts isn't an obvious/trivial
concept in a database system already loaded with nontrivial concepts.

I've never seen a good, honest discussion of why vnodes made their way into
Cassandra as the default, with 256 as the token count. They immediately broke
secondary indexes, and as Max says they muddle the data distribution for
resiliency and scaling more than one node at once.

The only advantages seem to be "no manual token management", which to me was
just laziness at the tooling/nodetool level, and better streaming on node
standup, although you are completely limited to single-node-at-a-time
scaling, which is a huge restriction.

Also, there is no ability to change the vnode count, which seems really
strange given that vnodes are supposed to be able to address heterogeneous
node hardware and subsegment the data at a finer grain. I get that changing
the vnode count would be a "hard problem", but the current solution is
basically "spin up a new datacenter with a new vnode count", which is a
sucky solution.

RE: the racks point by Jeremiah: we do have rack alignment, and I understand
that theoretically I should be able to do things with rack-aligned quorum
safety (and did, in some extreme instances of an LCS --> STCS --> LCS local
recompaction to force tombstone purges that were in inaccessible sections of
the LCS tree). But given the current warnings about scaling simultaneously,
the lack of discussion on how we can use racks with vnodes to do so, and all
the tickets about problems with multi-node scaling, we're kind of stuck.

I get that the ideal case for Cassandra is gradually growing data with
balanced load growth.

But for more chaotic loads, with things like IoT fleets coming online at
once and misbehaving networks of IoT devices, it would be really nice to
increase our load scaling abilities. We are kind of stuck with vertical node
scaling, which has rapidly diminishing returns, and with spinning up nodes
one at a time.

Vnode count seems to impact all of this, and in opaque ways.

Anyway, I'm fine with 16 and agree that token selection should be improved,
but I think the ability to change vnode counts online should be explored as
a priority, even if it involves slowly picking off one vnode at a time from
a machine. Vnode count changes would be very rare, rarer than version
upgrades.


Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by "Max C." <mc...@core43.com>.
Let’s say you have a 6 node cluster, with RF=3, and no vnodes.  In that case each piece of data is stored as follows:

<primary>: <replicas>
N1: N2 N3
N2: N3 N4
N3: N4 N5
N4: N5 N6
N5: N6 N1
N6: N1 N2

With this setup, there are some circumstances where you could lose 2 nodes (ex: N1 & N4) and still be able to maintain CL=quorum.  If your cluster is very large, then you could lose even more — and that’s a good thing, because if you have hundreds/thousands of nodes then you don’t want the world to come tumbling down if  > 1 node is down.  Or maybe you want to upgrade the OS on your nodes, and want to (with very careful planning!) do it by taking down more than 1 node at a time.

… but if you have a large number of vnodes, then a given node will share a small segment of data with LOTS of other nodes, which destroys this property.  The more vnodes, the less likely you’re able to handle > 1 node down.

For example, see this diagram in the Datastax docs —

https://docs.datastax.com/en/dse/5.1/dse-arch/datastax_enterprise/dbArch/archDataDistributeVnodesUsing.html#Distributingdatausingvnodes

In that bottom picture, you can’t knock out 2 nodes and still maintain CL=quorum.  Ex:  If you knock out node 1 & 4, then ranges B & L would no longer meet CL=quorum;  but you can do that in the top diagram, since there are no ranges shared between node 1 & 4.
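A tiny script makes the pairing explicit (assuming single-token nodes and
simple clockwise replication, as in the example above); it lists which node
pairs can be down together without any range losing quorum:

    n, rf, quorum = 6, 3, 2

    def replicas(primary):
        # Primary plus the next rf-1 nodes clockwise (no vnodes, no racks).
        return {(primary + k) % n for k in range(rf)}

    groups = [replicas(p) for p in range(n)]

    # Node pairs that can be down together with every range keeping quorum:
    safe = [(a, b) for a in range(n) for b in range(a + 1, n)
            if all(len(g - {a, b}) >= quorum for g in groups)]
    print(safe)   # [(0, 3), (1, 4), (2, 5)] -- N1/N4, N2/N5, N3/N6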

Hope that helps.

- Max



Fwd: Re: [Discuss] num_tokens default in Cassandra 4.0

Posted by onmstester onmstester <on...@zoho.com.INVALID>.
Sorry if this is trivial, but I do not understand how num_tokens affects availability. With RF=3 and CL=QUORUM for both reads and writes, the cluster can tolerate losing at most one node, and every token range on that node is also replicated on two other nodes no matter what num_tokens is, right?





