Posted to user@cassandra.apache.org by Manu Zhang <ow...@gmail.com> on 2013/01/30 09:22:40 UTC

how RandomPartitioner calculates tokens

Hi,

As per the DataStax Cassandra 1.2 documentation,

"for single data center deployments, tokens are calculated by dividing 
the hash range by the number of nodes in the cluster", *does it mean we 
have to recalculate the tokens of keys when nodes come and go?**
*
"for multiple data center deployments, tokens are calculated per data 
center so that the hash range is evenly divide for the nodes in each 
data center." *This is understandable, but when I go to the getToken 
method of RandomPartitioner, I can't find any datacenter-aware token 
calculation* *codes.

By the way, the documentation doesn't mention how Murmur3Partitioner
calculates tokens for multiple data centers. Assuming it doesn't calculate
tokens per data center, what difference between Murmur3Partitioner and
RandomPartitioner makes that unnecessary?

Thanks.

Manu Zhang

Re: how RandomPartitioner calculates tokens

Posted by Manu Zhang <ow...@gmail.com>.
Thanks Sylvain, it's all clear now.

Re: how RandomPartitioner calculates tokens

Posted by Sylvain Lebresne <sy...@datastax.com>.
I'll admit that this part of the DataStax documentation is a bit confusing
(and I'll reach out to the doc writers to make sure it is improved).

The partitioner (be it RandomPartitioner, Murmur3Partitioner or
OrderPreservingPartitioner) is pretty much only a hash function that defines
how to compute the token (its hash) of a key. In particular, the partitioner
has no notion whatsoever of data centers and, more generally, does not depend
in any way on how many nodes you have.
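
To make that concrete, here's a rough Python sketch of what getToken boils
down to for the two hash-based partitioners. This is not Cassandra's actual
code (Cassandra ships its own MurmurHash3 implementation, so exact Murmur3
values may differ), and the third-party mmh3 package is an assumption:

    import hashlib
    import mmh3  # assumption: the pip-installable MurmurHash3 bindings

    def random_partitioner_token(key: bytes) -> int:
        # RandomPartitioner: MD5 of the key, read as a signed 128-bit
        # integer (big-endian), absolute value -> token in [0, 2**127]
        digest = hashlib.md5(key).digest()
        return abs(int.from_bytes(digest, byteorder='big', signed=True))

    def murmur3_partitioner_token(key: bytes) -> int:
        # Murmur3Partitioner: first 64 bits of MurmurHash3 (x64, 128-bit)
        # -> token in [-2**63, 2**63 - 1]
        return mmh3.hash64(key, signed=True)[0]

Neither function knows anything about nodes or data centers; the key alone
determines the token.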

However, to actually distribute data, each node is assigned a token (or
multiple ones with "vnodes"). Getting an even distribution of data depends
on the exact tokens picked for your nodes.

Now, the sentences of the doc you cite actually refer to how to calculate
the tokens you assign to nodes. In particular, what they describe is pretty
much what the small token-generator tool that comes with Cassandra
(http://goo.gl/rwea9) does, but it is not something Cassandra itself
actually does.
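
For illustration, a minimal sketch of that calculation for
RandomPartitioner (whose token range is 0 to 2**127), assuming one token
per node and no vnodes:

    def random_partitioner_tokens(node_count):
        # evenly space node_count tokens across the range 0 .. 2**127
        return [i * (2**127 // node_count) for i in range(node_count)]

    print(random_partitioner_tokens(4))
    # [0,
    #  42535295865117307932921825928971026432,
    #  85070591730234615865843651857942052864,
    #  127605887595351923798765477786913079296]

These are the initial_token values you would set in cassandra.yaml (or pass
to nodetool move). Cassandra never re-hashes keys when nodes come and go;
only the ownership ranges change, and data is streamed to match the new
layout.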

Also, that procedure to compute tokens is pretty much the same for
RandomPartitioner and Murmur3Partitioner, except that the two partitioners
don't have exactly the same token range. And as a side note, if you use
vnodes, you don't really have to bother with manually assigning tokens to
nodes.
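
The Murmur3Partitioner version is the same division, just over
-2**63 .. 2**63 - 1 instead of 0 .. 2**127. The per-data-center variant the
doc mentions simply repeats the calculation once per DC; shifting each DC's
ring by a small offset so that no two nodes share a token is a common
convention, not something the partitioner does. A sketch (the
offset-by-DC-index below is only an illustration):

    def murmur3_tokens(node_count):
        # evenly space node_count tokens across -2**63 .. 2**63 - 1
        return [-(2**63) + i * (2**64 // node_count)
                for i in range(node_count)]

    def murmur3_tokens_per_dc(nodes_per_dc):
        # one independent, evenly spaced ring per data center,
        # shifted by the DC index so no two nodes share a token
        return [[t + dc for t in murmur3_tokens(n)]
                for dc, n in enumerate(nodes_per_dc)]

    print(murmur3_tokens(3))
    # [-9223372036854775808, -3074457345618258603, 3074457345618258602]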

--
Sylvain

