You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by John Sanda <jo...@gmail.com> on 2012/10/31 17:38:56 UTC

distribution of token ranges with virtual nodes

I am not entirely clear on what
http://wiki.apache.org/cassandra/VirtualNodes/Balance#imbalance is saying
with respect to random vs. manual token selection. Can/should i assume that
i will get even range distribution or close to it with random token
selection? For the sake of discussion, what is a reasonable default to
start with for num_tokens assuming nodes are homogenous? That wiki page
mentions a default of 256 which I see commented out in cassandra.yaml;
however, Config.num_tokens is set to 1. Maybe I missed where the default of
256 is used. From some initial testing though, it looks like 1 token per
node is being used. Using defaults in cassandra.yaml, I see this in my logs,


WARN [main] 2012-10-31 12:06:48,591 StorageService.java (line 639)
Generated random token [-8703249769453332665]. Random tokens will result in
an unbalanced ring; see http://wiki.apache.org/cassandra/Operations

-- 

- John

Re: distribution of token ranges with virtual nodes

Posted by Manu Zhang <ow...@gmail.com>.

sorry, I missed it since it's not executable by default.


On Thu, Jan 10, 2013 at 10:05 AM, Jason Wee <pe...@gmail.com> wrote:

> It should be in the trunk, check it
> https://github.com/apache/cassandra/blob/trunk/bin/cassandra-shuffle
>
>
> On Thu, Jan 10, 2013 at 1:18 AM, Manu Zhang <ow...@gmail.com>wrote:
>
>> Is cassandra-shuffle command in the trunk? Or it is only included in the
>> Debian package? I don't find it in the trunk.
>>
>>
>> On Sat, Nov 3, 2012 at 2:18 AM, Eric Evans <ee...@acunu.com> wrote:
>>
>>> On Fri, Nov 2, 2012 at 12:38 AM, Manu Zhang <ow...@gmail.com>
>>> wrote:
>>> >> It splits into a contiguous range, because truly upgrading to vnode
>>> >> functionality is another step.
>>> >
>>> > That confuses me. As I understand it, there is no point in having 256
>>> tokens
>>> > on same node if I don't commit the shuffle
>>>
>>> This isn't exactly true.  By-partition operations (think repair,
>>> streaming, etc) will be more reliable in the sense that if they fail
>>> and need to be restarted, there is less that is lost/needs redoing.
>>> Also, if all you did was migrate from 1-token-per-node to 256
>>> contiguous tokens per node, normal topology changes (bootstrapping new
>>> nodes, decommissioning old ones), would gradually work to redistribute
>>> the partitions.  And, from a topology perspective, splitting the one
>>> partition into many contiguous partition is a no-op; it's safe to do
>>> and there is no cost to speak of from a computational or IO
>>> perspective.
>>>
>>> On the other hand, shuffling requires moving tokens around the
>>> cluster.  If you completely randomize placement, it follows that you
>>> will need to relocate all of the clusters data, so it's quite costly.
>>> It's also precedent setting, and not thoroughly tested yet.
>>>
>>> --
>>> Eric Evans
>>> Acunu | http://www.acunu.com | @acunu
>>>
>>
>>
>

Re: distribution of token ranges with virtual nodes

Posted by Jason Wee <pe...@gmail.com>.

It should be in the trunk, check it
https://github.com/apache/cassandra/blob/trunk/bin/cassandra-shuffle


On Thu, Jan 10, 2013 at 1:18 AM, Manu Zhang <ow...@gmail.com> wrote:

> Is cassandra-shuffle command in the trunk? Or it is only included in the
> Debian package? I don't find it in the trunk.
>
>
> On Sat, Nov 3, 2012 at 2:18 AM, Eric Evans <ee...@acunu.com> wrote:
>
>> On Fri, Nov 2, 2012 at 12:38 AM, Manu Zhang <ow...@gmail.com>
>> wrote:
>> >> It splits into a contiguous range, because truly upgrading to vnode
>> >> functionality is another step.
>> >
>> > That confuses me. As I understand it, there is no point in having 256
>> tokens
>> > on same node if I don't commit the shuffle
>>
>> This isn't exactly true.  By-partition operations (think repair,
>> streaming, etc) will be more reliable in the sense that if they fail
>> and need to be restarted, there is less that is lost/needs redoing.
>> Also, if all you did was migrate from 1-token-per-node to 256
>> contiguous tokens per node, normal topology changes (bootstrapping new
>> nodes, decommissioning old ones), would gradually work to redistribute
>> the partitions.  And, from a topology perspective, splitting the one
>> partition into many contiguous partition is a no-op; it's safe to do
>> and there is no cost to speak of from a computational or IO
>> perspective.
>>
>> On the other hand, shuffling requires moving tokens around the
>> cluster.  If you completely randomize placement, it follows that you
>> will need to relocate all of the clusters data, so it's quite costly.
>> It's also precedent setting, and not thoroughly tested yet.
>>
>> --
>> Eric Evans
>> Acunu | http://www.acunu.com | @acunu
>>
>
>

Re: distribution of token ranges with virtual nodes

Posted by Manu Zhang <ow...@gmail.com>.

Is cassandra-shuffle command in the trunk? Or it is only included in the
Debian package? I don't find it in the trunk.


On Sat, Nov 3, 2012 at 2:18 AM, Eric Evans <ee...@acunu.com> wrote:

> On Fri, Nov 2, 2012 at 12:38 AM, Manu Zhang <ow...@gmail.com>
> wrote:
> >> It splits into a contiguous range, because truly upgrading to vnode
> >> functionality is another step.
> >
> > That confuses me. As I understand it, there is no point in having 256
> tokens
> > on same node if I don't commit the shuffle
>
> This isn't exactly true.  By-partition operations (think repair,
> streaming, etc) will be more reliable in the sense that if they fail
> and need to be restarted, there is less that is lost/needs redoing.
> Also, if all you did was migrate from 1-token-per-node to 256
> contiguous tokens per node, normal topology changes (bootstrapping new
> nodes, decommissioning old ones), would gradually work to redistribute
> the partitions.  And, from a topology perspective, splitting the one
> partition into many contiguous partition is a no-op; it's safe to do
> and there is no cost to speak of from a computational or IO
> perspective.
>
> On the other hand, shuffling requires moving tokens around the
> cluster.  If you completely randomize placement, it follows that you
> will need to relocate all of the clusters data, so it's quite costly.
> It's also precedent setting, and not thoroughly tested yet.
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu
>

Re: distribution of token ranges with virtual nodes

Posted by Eric Evans <ee...@acunu.com>.

On Fri, Nov 2, 2012 at 12:38 AM, Manu Zhang <ow...@gmail.com> wrote:
>> It splits into a contiguous range, because truly upgrading to vnode
>> functionality is another step.
>
> That confuses me. As I understand it, there is no point in having 256 tokens
> on same node if I don't commit the shuffle

This isn't exactly true.  By-partition operations (think repair,
streaming, etc) will be more reliable in the sense that if they fail
and need to be restarted, there is less that is lost/needs redoing.
Also, if all you did was migrate from 1-token-per-node to 256
contiguous tokens per node, normal topology changes (bootstrapping new
nodes, decommissioning old ones), would gradually work to redistribute
the partitions.  And, from a topology perspective, splitting the one
partition into many contiguous partition is a no-op; it's safe to do
and there is no cost to speak of from a computational or IO
perspective.

On the other hand, shuffling requires moving tokens around the
cluster.  If you completely randomize placement, it follows that you
will need to relocate all of the clusters data, so it's quite costly.
It's also precedent setting, and not thoroughly tested yet.

--
Eric Evans
Acunu | http://www.acunu.com | @acunu

Re: distribution of token ranges with virtual nodes

Posted by Manu Zhang <ow...@gmail.com>.

>
> It splits into a contiguous range, because truly upgrading to vnode functionality
> is another step.

That confuses me. As I understand it, there is no point in having 256
tokens on same node if I don't commit the shuffle


On Fri, Nov 2, 2012 at 11:10 AM, Brandon Williams <dr...@gmail.com> wrote:

> On Thu, Nov 1, 2012 at 10:05 PM, Manu Zhang <ow...@gmail.com>
> wrote:
> >
> >> it will migrate you to virtual nodes by splitting the existing partition
> >> 256 ways.
> >
> >
> > Out of curiosity, is it for the purpose of avoiding streaming?
>
> It splits into a contiguous range, because truly upgrading to vnode
> functionality is another step.
>
> >
> >>  the former would require you to perform a shuffle to achieve that.
> >
> >
> > Is there a nodetool option or are there other ways "shuffle" could be
> done
> > automatically?
>
> There a shuffle command in bin/ that was recently committed, we'll
> document this in process in NEWS.txt shortly.
>
> -Brandon
>

Re: distribution of token ranges with virtual nodes

Posted by Brandon Williams <dr...@gmail.com>.

On Thu, Nov 1, 2012 at 10:05 PM, Manu Zhang <ow...@gmail.com> wrote:
>
>> it will migrate you to virtual nodes by splitting the existing partition
>> 256 ways.
>
>
> Out of curiosity, is it for the purpose of avoiding streaming?

It splits into a contiguous range, because truly upgrading to vnode
functionality is another step.

>
>>  the former would require you to perform a shuffle to achieve that.
>
>
> Is there a nodetool option or are there other ways "shuffle" could be done
> automatically?

There a shuffle command in bin/ that was recently committed, we'll
document this in process in NEWS.txt shortly.

-Brandon

Re: distribution of token ranges with virtual nodes

Posted by Manu Zhang <ow...@gmail.com>.

> it will migrate you to virtual nodes by splitting the existing partition
> 256 ways.


Out of curiosity, is it for the purpose of avoiding streaming?

 the former would require you to perform a shuffle to achieve that.


Is there a nodetool option or are there other ways "shuffle" could be done
automatically?


On Thu, Nov 1, 2012 at 2:17 AM, Eric Evans <ee...@acunu.com> wrote:

> On Wed, Oct 31, 2012 at 11:38 AM, John Sanda <jo...@gmail.com> wrote:
> > Can/should i assume that i will get even range distribution or close to
> it with random
> > token selection?
>
> The short answer is: If you're using virtual nodes, random token
> selection will give you even range distribution.
>
> The somewhat longer answer is that this is really a function of the
> total number of tokens.  The more randomly generated tokens a cluster
> has, the more distribution will even out.  The reason this can work
> for virtual nodes where it has not for the older 1-token-per-node
> model is because (assuming a reasonable num_tokens value), virtual
> nodes gives you a much higher token count for a given number of nodes.
>
> That wiki page you cite wasn't really intended to be documentation
> (expect some of that soon though), but what that section was trying to
> convey was that while random distribution is quite good, it may not be
> 100% perfect, especially when the number of nodes is low (remember,
> the number of tokens scales with the number of nodes).  I think this
> is (or may be) a problem for some.  If you're forced to manually
> calculate tokens then you are quite naturally going to calculate a
> perfect distribution, and if you've grown accustomed to this, seeing
> the ownership values off by a few percent could really bring out your
> inner OCD. :)
>
> > For the sake of discussion, what is a reasonable default to start
> > with for num_tokens assuming nodes are homogenous? That wiki page
> mentions a
> > default of 256 which I see commented out in cassandra.yaml; however,
> > Config.num_tokens is set to 1.
>
> The (unconfigured )default is 1.  That is to say that virtual nodes is
> not enabled.  The current recommendation when setting this,
> (documented in the config) is 256.
>
> > Maybe I missed where the default of 256 is
> > used. From some initial testing though, it looks like 1 token per node is
> > being used. Using defaults in cassandra.yaml, I see this in my logs,
>
> Right.  And it's worth noting that if you uncomment num_tokens *after*
> starting a node with it commented (i.e. num_tokens: 1), then it will
> migrate you to virtual nodes by splitting the existing partition 256
> ways.  This is *not* the equivalent of starting a node with num_tokens
> = 256 for the first time.  The latter would leave you with randomized
> placement, the former would require you to perform a shuffle to
> achieve that.
>
>
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu
>

Re: distribution of token ranges with virtual nodes

Posted by Eric Evans <ee...@acunu.com>.

On Wed, Oct 31, 2012 at 11:38 AM, John Sanda <jo...@gmail.com> wrote:
> Can/should i assume that i will get even range distribution or close to it with random
> token selection?

The short answer is: If you're using virtual nodes, random token
selection will give you even range distribution.

The somewhat longer answer is that this is really a function of the
total number of tokens.  The more randomly generated tokens a cluster
has, the more distribution will even out.  The reason this can work
for virtual nodes where it has not for the older 1-token-per-node
model is because (assuming a reasonable num_tokens value), virtual
nodes gives you a much higher token count for a given number of nodes.

That wiki page you cite wasn't really intended to be documentation
(expect some of that soon though), but what that section was trying to
convey was that while random distribution is quite good, it may not be
100% perfect, especially when the number of nodes is low (remember,
the number of tokens scales with the number of nodes).  I think this
is (or may be) a problem for some.  If you're forced to manually
calculate tokens then you are quite naturally going to calculate a
perfect distribution, and if you've grown accustomed to this, seeing
the ownership values off by a few percent could really bring out your
inner OCD. :)

> For the sake of discussion, what is a reasonable default to start
> with for num_tokens assuming nodes are homogenous? That wiki page mentions a
> default of 256 which I see commented out in cassandra.yaml; however,
> Config.num_tokens is set to 1.

The (unconfigured )default is 1.  That is to say that virtual nodes is
not enabled.  The current recommendation when setting this,
(documented in the config) is 256.

> Maybe I missed where the default of 256 is
> used. From some initial testing though, it looks like 1 token per node is
> being used. Using defaults in cassandra.yaml, I see this in my logs,

Right.  And it's worth noting that if you uncomment num_tokens *after*
starting a node with it commented (i.e. num_tokens: 1), then it will
migrate you to virtual nodes by splitting the existing partition 256
ways.  This is *not* the equivalent of starting a node with num_tokens
= 256 for the first time.  The latter would leave you with randomized
placement, the former would require you to perform a shuffle to
achieve that.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu