Posted to user@cassandra.apache.org by Theo Hultberg <th...@iconara.net> on 2013/06/10 16:47:21 UTC

Why so many vnodes?

Hi,

The default number of vnodes is 256, is there any significance in this
number? Since Cassandra's vnodes don't work like for example Riak's, where
there is a fixed number of vnodes distributed evenly over the nodes, why so
many? Even with a moderately sized cluster you get thousands of slices.
Does this matter? If your cluster grows to over thirty machines and you
start looking at ten thousand slices, would that be a problem? I guess that
traversing a list of a thousand or ten thousand slices to find where a
token lives isn't a huge problem, but are there any other up or downsides
to having a small or large number of vnodes per node?

I understand the benefits for splitting up the ring into pieces, for
example to be able to stream data from more nodes when bootstrapping a new
one, but that works even if each node only has say 32 vnodes (unless your
cluster is truly huge).

yours,
Theo

Re: Why so many vnodes?

Posted by Richard Low <ri...@wentnet.com>.
On 11 June 2013 09:54, Theo Hultberg <th...@iconara.net> wrote:

But in the paragraph just before Richard said that finding the node that
> owns a token becomes slower on large clusters with lots of token ranges, so
> increasing it further seems contradictory.
>

I do mean increase for larger clusters, but I guess it depends on what you
are optimizing for.  If you care about maintaining an even load, where
differences are measured relative to the amount of data each node has, then
you need T >> N.

However, you're right, this can slow down some operations.  Repair has a
fixed cost per token, so it gets a bit slower with higher T.  Finding
which node owns a range also gets more expensive as T grows, but that code
has been optimized, so I don't think it will become a practical issue.

Is this a correct interpretation: finding the node that owns a particular
> token becomes slower as the number of nodes (and therefore total token
> ranges) increases, but for large clusters you also need to take the time
> for bootstraps into account, which will become slower if each node has
> fewer token ranges. The speed referred to in the two cases are the speeds
> of different operations, and there will be a trade off, and 256 initial
> tokens is a trade off that works for most cases.
>

Yes, this is right.  Bootstraps may become slower because the joining node
streams from fewer original nodes (although this may only show on very busy
clusters, since otherwise bootstrap is limited by the joining node itself).
But more important, I think, is that new nodes won't take an even share of
the data if T is too small.

Richard.

Re: Why so many vnodes?

Posted by Theo Hultberg <th...@iconara.net>.
But in the paragraph just before Richard said that finding the node that
owns a token becomes slower on large clusters with lots of token ranges, so
increasing it further seems contradictory.

Is this a correct interpretation: finding the node that owns a particular
token becomes slower as the number of nodes (and therefore total token
ranges) increases, but for large clusters you also need to take the time
for bootstraps into account, which will become slower if each node has
fewer token ranges. The speed referred to in the two cases are the speeds
of different operations, and there will be a trade off, and 256 initial
tokens is a trade off that works for most cases.

T#


On Tue, Jun 11, 2013 at 8:37 AM, Alain RODRIGUEZ <ar...@gmail.com> wrote:

> I think he actually meant *increase*, for this reason "For small T, a
> random choice of initial tokens will in most cases give a poor distribution
> of data.  The larger T is, the closer to uniform the distribution will be,
> with increasing probability."
>
> Alain

Re: Why so many vnodes?

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
I think he actually meant *increase*, for this reason "For small T, a
random choice of initial tokens will in most cases give a poor distribution
of data.  The larger T is, the closer to uniform the distribution will be,
with increasing probability."

Alain


2013/6/11 Theo Hultberg <th...@iconara.net>

> thanks, that makes sense, but I assume in your last sentence you mean
> decrease it for large clusters, not increase it?
>
> T#
>

Re: Why so many vnodes?

Posted by Theo Hultberg <th...@iconara.net>.
thanks, that makes sense, but I assume in your last sentence you mean
decrease it for large clusters, not increase it?

T#


On Mon, Jun 10, 2013 at 11:02 PM, Richard Low <ri...@wentnet.com> wrote:

> Hi Theo,
>
> The number (let's call it T and the number of nodes N) 256 was chosen to
> give good load balancing for random token assignments for most cluster
> sizes.  For small T, a random choice of initial tokens will in most cases
> give a poor distribution of data.  The larger T is, the closer to uniform
> the distribution will be, with increasing probability.
>
> Also, for small T, when a new node is added, it won't have many ranges to
> split so won't be able to take an even slice of the data.
>
> For this reason T should be large.  But if it is too large, there are too
> many slices to keep track of as you say.  The function to find which keys
> live where becomes more expensive and operations that deal with individual
> vnodes e.g. repair become slow.  (An extreme example is SELECT * LIMIT 1,
> which when there is no data has to scan each vnode in turn in search of a
> single row.  This is O(NT) and for even quite small T takes seconds to
> complete.)
>
> So 256 was chosen to be a reasonable balance.  I don't think most users
> will find it too slow; users with extremely large clusters may need to
> increase it.
>
> Richard.

Re: Why so many vnodes?

Posted by Richard Low <ri...@wentnet.com>.
Hi Theo,

The number (call it T, with N the number of nodes) was chosen as 256 to
give good load balancing for random token assignments for most cluster
sizes.  For small T, a random choice of initial tokens will in most cases
give a poor distribution of data.  The larger T is, the closer to uniform
the distribution will be, with increasing probability.
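
One can see this effect with a quick simulation: assign each node T random
tokens on a unit ring and measure how far the most-loaded node's share is
from the ideal 1/N. A rough sketch (an illustration of the argument, not
Cassandra code):

```python
import random

def max_share_ratio(n_nodes, t, trials=20, seed=0):
    # Average, over several random rings, of (largest node share) / (1/N).
    # 1.0 would be perfectly uniform ownership.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        ring = sorted((rng.random(), node)
                      for node in range(n_nodes) for _ in range(t))
        share = [0.0] * n_nodes
        prev = 0.0
        for pos, node in ring:
            share[node] += pos - prev  # range (prev, pos] belongs to node
            prev = pos
        share[ring[0][1]] += 1.0 - prev  # wrap-around range
        total += max(share) * n_nodes
    return total / trials

# The most-loaded node's share approaches the ideal as T grows:
for t in (1, 8, 32, 256):
    print(t, round(max_share_ratio(12, t), 2))
```

For T = 1 the worst node typically owns several times its fair share of the
ring; by T = 256 the spread is small, which is the "T >> N" requirement in
practice.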

Also, for small T, when a new node is added, it won't have many ranges to
split so won't be able to take an even slice of the data.

For this reason T should be large.  But if it is too large, there are too
many slices to keep track of as you say.  The function to find which keys
live where becomes more expensive, and operations that deal with individual
vnodes, e.g. repair, become slow.  (An extreme example is SELECT * LIMIT 1,
which when there is no data has to scan each vnode in turn in search of a
single row.  This is O(NT) and for even quite small T takes seconds to
complete.)
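
The scaling of that worst case is easy to put numbers on: with nothing to
find, the scan has to probe all N*T vnode ranges. A toy model (the 1 ms
per-range overhead is an assumed figure for illustration, not a measurement):

```python
def empty_scan_ranges(n_nodes, tokens_per_node):
    # An empty LIMIT 1 scan finds no row, so it probes every vnode range.
    return n_nodes * tokens_per_node

# Assumed per-range overhead of 1 ms, for illustration only.
PER_RANGE_S = 0.001
for t in (32, 256, 1024):
    ranges = empty_scan_ranges(30, t)
    print(f"T={t}: {ranges} ranges, ~{ranges * PER_RANGE_S:.1f}s")
```

At the default T = 256 a 30-node cluster already has 7680 ranges, so even a
small constant cost per range adds up to seconds.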

So 256 was chosen to be a reasonable balance.  I don't think most users
will find it too slow; users with extremely large clusters may need to
increase it.

Richard.


On 10 June 2013 18:55, Theo Hultberg <th...@iconara.net> wrote:

> I'm not sure I follow what you mean, or if I've misunderstood what
> Cassandra is telling me. Each node has 256 vnodes (or tokens, as the
> preferred name seems to be). When I run `nodetool status` each node is
> reported as having 256 vnodes, regardless of how many nodes are in the
> cluster. A single node cluster has 256 vnodes on the single node, a six
> node cluster has 256 vnodes on each machine, making 1536 vnodes in total.
> When I run `SELECT tokens FROM system.peers` or `nodetool ring` each node
> lists 256 tokens.
>
> This is different from how it works in Riak and Voldemort, if I'm not
> mistaken, and that is the source of my confusion.
>
> T#
>

Re: Why so many vnodes?

Posted by Theo Hultberg <th...@iconara.net>.
I'm not sure I follow what you mean, or if I've misunderstood what
Cassandra is telling me. Each node has 256 vnodes (or tokens, as the
preferred name seems to be). When I run `nodetool status` each node is
reported as having 256 vnodes, regardless of how many nodes are in the
cluster. A single node cluster has 256 vnodes on the single node, a six
node cluster has 256 vnodes on each machine, making 1536 vnodes in total.
When I run `SELECT tokens FROM system.peers` or `nodetool ring` each node
lists 256 tokens.

This is different from how it works in Riak and Voldemort, if I'm not
mistaken, and that is the source of my confusion.

T#


On Mon, Jun 10, 2013 at 4:54 PM, Milind Parikh <mi...@gmail.com> wrote:

> There are n vnodes regardless of the size of the physical cluster.
> Regards
> Milind

Re: Why so many vnodes?

Posted by Milind Parikh <mi...@gmail.com>.
There are n vnodes regardless of the size of the physical cluster.
Regards
Milind
On Jun 10, 2013 7:48 AM, "Theo Hultberg" <th...@iconara.net> wrote:

> Hi,
>
> The default number of vnodes is 256, is there any significance in this
> number? Since Cassandra's vnodes don't work like for example Riak's, where
> there is a fixed number of vnodes distributed evenly over the nodes, why so
> many? Even with a moderately sized cluster you get thousands of slices.
> Does this matter? If your cluster grows to over thirty machines and you
> start looking at ten thousand slices, would that be a problem? I guess that
> traversing a list of a thousand or ten thousand slices to find where a
> token lives isn't a huge problem, but are there any other up or downsides
> to having a small or large number of vnodes per node?
>
> I understand the benefits for splitting up the ring into pieces, for
> example to be able to stream data from more nodes when bootstrapping a new
> one, but that works even if each node only has say 32 vnodes (unless your
> cluster is truly huge).
>
> yours,
> Theo
>