Posted to user@cassandra.apache.org by Anubhav Kale <An...@microsoft.com> on 2016/06/01 19:40:56 UTC

Token Ring Question

Hello,

I recently learnt that regardless of the number of data centers, there is really only one token ring across all nodes. (I was under the impression that there is one per DC, which is how DataStax OpsCenter displays it.)

Suppose we have 4 vnodes and 2 DCs (2 nodes in each DC), and a keyspace is set to replicate to only one DC, say DC1.

Now, if the token for a particular write falls in the "primary range" of a node living in DC2, does the code check for such conditions and instead put it on some node in DC1? What is the true meaning of a "primary" token range in such scenarios?

Is this, roughly speaking, how things work, or am I missing something?

Thanks!

Re: Token Ring Question

Posted by Tyler Hobbs <ty...@datastax.com>.
On Fri, Jun 24, 2016 at 2:31 PM, Anubhav Kale <An...@microsoft.com>
wrote:

> So, can someone educate me on how token-aware policies in drivers really
> work? It appears that it’s quite possible that the data may live on nodes
> that don’t own the tokens for it. By “own” I mean the ownership as defined
> in system.local / peers, which is fed back to drivers.
>

The tokens in system.local/peers are accurate.  Combined with the
replication settings for a keyspace, drivers can accurately determine which
nodes are replicas for a given partition.
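
You can check those token assignments yourself. Here is a minimal sketch with the DataStax Python driver, assuming a locally reachable node; the system tables and columns queried below are standard:

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])  # assumed contact point
    session = cluster.connect()

    # Tokens owned by the node we are connected to
    local = session.execute("SELECT tokens FROM system.local").one()
    print(local.tokens)

    # Tokens owned by every other node in the cluster
    for row in session.execute("SELECT peer, tokens FROM system.peers"):
        print(row.peer, row.tokens)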

Even if the driver's calculation is incorrect for some reason, token-aware
routing is just an optimization.  Nothing will break if a query is sent to
a node that's not a replica.
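
For completeness, enabling token-aware routing in the Python driver looks roughly like this; the contact point and the DC name 'DC1' are assumptions for the example:

    from cassandra.cluster import Cluster
    from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy

    # TokenAwarePolicy wraps a child policy: replicas are tried first,
    # and the child policy orders the remaining nodes. If the replica
    # calculation were ever off, the chosen node would simply act as
    # coordinator and forward the request, so nothing breaks.
    policy = TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='DC1'))
    cluster = Cluster(['127.0.0.1'], load_balancing_policy=policy)
    session = cluster.connect()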


-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: Token Ring Question

Posted by Bulat Shakirzyanov <bu...@datastax.com>.
Drivers actually reconstruct the token ring using information about each keyspace's replication settings as well as the token ranges assigned to each node.
Whenever you execute a prepared statement that is parameterized by a partition key, the driver finds the token by serializing the key to a byte array and running it through the hash function (partitioner) you've configured when setting up the cluster. This token lets the driver find the replicas, the nodes owning the data; note that the actual replicas will differ based on the replication settings of the given keyspace. Once the replicas are found, they are either randomized or not, depending on your token-aware policy configuration, and the request is delivered to those nodes first.

To summarize, token awareness works by hashing the partition key of a prepared statement at execution time, combined with a client-side reconstruction of the token ring, built at initial connection and schema discovery from the assigned token ranges and each keyspace's replication settings.

The Ruby driver's generation of a replica map for the network topology replication strategy: https://github.com/datastax/ruby-driver/blob/master/lib/cassandra/cluster/schema/replication_strategies/network_topology.rb
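
For illustration, here is a deliberately simplified Python sketch of that replica lookup. It assumes the partition's token has already been computed (the real driver hashes the serialized partition key with Murmur3 first), ignores racks entirely, and uses hypothetical names like find_replicas; the real strategy linked above handles more cases:

    from bisect import bisect_left

    def find_replicas(partition_token, ring, rf_per_dc):
        """Walk the ring clockwise from the token's owner, collecting
        nodes until each DC's replication factor is satisfied.
        ring: sorted list of (token, (node, dc)); rf_per_dc: e.g. {'DC1': 1}."""
        tokens = [t for t, _ in ring]
        # The owner is the first node whose token is >= the partition's
        # token, wrapping around the end of the ring.
        start = bisect_left(tokens, partition_token) % len(ring)
        replicas, remaining = [], dict(rf_per_dc)
        for i in range(len(ring)):
            node, dc = ring[(start + i) % len(ring)][1]
            if remaining.get(dc, 0) > 0 and node not in [n for n, _ in replicas]:
                replicas.append((node, dc))
                remaining[dc] -= 1
            if not any(remaining.values()):
                break
        return replicas

    # Two DCs, two nodes each, one small token per node for readability.
    ring = sorted([
        (-100, ('node1', 'DC1')), (0, ('node2', 'DC2')),
        (100, ('node3', 'DC1')), (200, ('node4', 'DC2')),
    ])
    # Token 150 falls in node4's (DC2) "primary range", but the keyspace
    # replicates only to DC1, so the walk skips node4 and wraps to node1.
    print(find_replicas(150, ring, {'DC1': 1}))  # -> [('node1', 'DC1')]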


Bulat

> On Jun 24, 2016, at 12:31, Anubhav Kale <An...@microsoft.com> wrote:
> 
> So, can someone educate me on how token-aware policies in drivers really work? It appears that it’s quite possible that the data may live on nodes that don’t own the tokens for it. By “own” I mean the ownership as defined in system.local / peers, which is fed back to drivers.
>  
> If this statement is correct,
>  
> In my view, unless the drivers execute the *Topology.GetReplicas from Cassandra core somehow (something that isn’t available to them), they will never be able to tell the correct node that holds data for a given token.
>  
> Is my understanding wrong?
>  
> From: Anubhav Kale [mailto:Anubhav.Kale@microsoft.com] 
> Sent: Friday, June 3, 2016 3:17 PM
> To: user@cassandra.apache.org
> Subject: RE: Token Ring Question
>  
> Thank you, I was just curious about how this works.
>  
> From: Tyler Hobbs [mailto:tyler@datastax.com] 
> Sent: Friday, June 3, 2016 3:02 PM
> To: user@cassandra.apache.org
> Subject: Re: Token Ring Question
>  
> There really is only one token ring, but conceptually it's easiest to think of it like multiple rings, as OpsCenter shows it.  The only difference is that every token has to be unique across the whole cluster.
> 
> Now, if the token for a particular write falls in the “primary range” of a node living in DC2, does the code check for such conditions and instead put it on some node in DC1?
>  
> Yes.  It will continue searching around the token ring until it hits a token that belongs to a node in the correct datacenter.
> 
> What is the true meaning of a “primary” token range in such scenarios?
>  
> There's not really any such thing as a "primary token range"; it's just a convenient idea for some tools. In reality, it's just the replica that owns the first (clockwise) token.  I'm not sure what you're really asking, though -- what are you concerned about?
>  
>  
> On Wed, Jun 1, 2016 at 2:40 PM, Anubhav Kale <An...@microsoft.com> wrote:
> Hello,
>  
> I recently learnt that regardless of the number of data centers, there is really only one token ring across all nodes. (I was under the impression that there is one per DC, which is how DataStax OpsCenter displays it.)
>  
> Suppose we have 4 vnodes and 2 DCs (2 nodes in each DC), and a keyspace is set to replicate to only one DC, say DC1.
>  
> Now, if the token for a particular write falls in the “primary range” of a node living in DC2, does the code check for such conditions and instead put it on some node in DC1? What is the true meaning of a “primary” token range in such scenarios?
>  
> Is this, roughly speaking, how things work, or am I missing something?
>  
> Thanks!
> 
> 
> 
> --
> Tyler Hobbs
> DataStax

RE: Token Ring Question

Posted by Anubhav Kale <An...@microsoft.com>.
So, can someone educate me on how token-aware policies in drivers really work? It appears that it’s quite possible that the data may live on nodes that don’t own the tokens for it. By “own” I mean the ownership as defined in system.local / peers, which is fed back to drivers.

If this statement is correct,

In my view, unless the drivers execute the *Topology.GetReplicas from Cassandra core somehow (something that isn’t available to them), they will never be able to tell the correct node that holds data for a given token.

Is my understanding wrong?

From: Anubhav Kale [mailto:Anubhav.Kale@microsoft.com]
Sent: Friday, June 3, 2016 3:17 PM
To: user@cassandra.apache.org
Subject: RE: Token Ring Question

Thank you, I was just curious about how this works.

From: Tyler Hobbs [mailto:tyler@datastax.com]
Sent: Friday, June 3, 2016 3:02 PM
To: user@cassandra.apache.org
Subject: Re: Token Ring Question

There really is only one token ring, but conceptually it's easiest to think of it like multiple rings, as OpsCenter shows it.  The only difference is that every token has to be unique across the whole cluster.
Now, if the token for a particular write falls in the “primary range” of a node living in DC2, does the code check for such conditions and instead put it on some node in DC1?

Yes.  It will continue searching around the token ring until it hits a token that belongs to a node in the correct datacenter.
What is the true meaning of a “primary” token range in such scenarios?

There's not really any such thing as a "primary token range"; it's just a convenient idea for some tools. In reality, it's just the replica that owns the first (clockwise) token.  I'm not sure what you're really asking, though -- what are you concerned about?


On Wed, Jun 1, 2016 at 2:40 PM, Anubhav Kale <An...@microsoft.com> wrote:
Hello,

I recently learnt that regardless of the number of data centers, there is really only one token ring across all nodes. (I was under the impression that there is one per DC, which is how DataStax OpsCenter displays it.)

Suppose we have 4 vnodes and 2 DCs (2 nodes in each DC), and a keyspace is set to replicate to only one DC, say DC1.

Now, if the token for a particular write falls in the “primary range” of a node living in DC2, does the code check for such conditions and instead put it on some node in DC1? What is the true meaning of a “primary” token range in such scenarios?

Is this, roughly speaking, how things work, or am I missing something?

Thanks!



--
Tyler Hobbs
DataStax <http://datastax.com/>

RE: Token Ring Question

Posted by Anubhav Kale <An...@microsoft.com>.
Thank you, I was just curious about how this works.

From: Tyler Hobbs [mailto:tyler@datastax.com]
Sent: Friday, June 3, 2016 3:02 PM
To: user@cassandra.apache.org
Subject: Re: Token Ring Question

There really is only one token ring, but conceptually it's easiest to think of it like multiple rings, as OpsCenter shows it.  The only difference is that every token has to be unique across the whole cluster.
Now, if the token for a particular write falls in the “primary range” of a node living in DC2, does the code check for such conditions and instead put it on some node in DC1?

Yes.  It will continue searching around the token ring until it hits a token that belongs to a node in the correct datacenter.
What is the true meaning of a “primary” token range in such scenarios?

There's not really any such thing as a "primary token range"; it's just a convenient idea for some tools. In reality, it's just the replica that owns the first (clockwise) token.  I'm not sure what you're really asking, though -- what are you concerned about?


On Wed, Jun 1, 2016 at 2:40 PM, Anubhav Kale <An...@microsoft.com> wrote:
Hello,

I recently learnt that regardless of the number of data centers, there is really only one token ring across all nodes. (I was under the impression that there is one per DC, which is how DataStax OpsCenter displays it.)

Suppose we have 4 vnodes and 2 DCs (2 nodes in each DC), and a keyspace is set to replicate to only one DC, say DC1.

Now, if the token for a particular write falls in the “primary range” of a node living in DC2, does the code check for such conditions and instead put it on some node in DC1? What is the true meaning of a “primary” token range in such scenarios?

Is this, roughly speaking, how things work, or am I missing something?

Thanks!



--
Tyler Hobbs
DataStax <http://datastax.com/>

Re: Token Ring Question

Posted by Tyler Hobbs <ty...@datastax.com>.
There really is only one token ring, but conceptually it's easiest to think
of it like multiple rings, as OpsCenter shows it.  The only difference is
that every token has to be unique across the whole cluster.

Now, if the token for a particular write falls in the “primary range” of a
> node living in DC2, does the code check for such conditions and instead put
> it on some node in DC1?
>

Yes.  It will continue searching around the token ring until it hits a
token that belongs to a node in the correct datacenter.

What is the true meaning of a “primary” token range in such scenarios?
>

There's not really any such thing as a "primary token range"; it's just a
convenient idea for some tools.  In reality, it's just the replica that
owns the first (clockwise) token.  I'm not sure what you're really asking,
though -- what are you concerned about?
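
A concrete, hypothetical example may help: suppose one token per node, with node1 (DC1) at token -100, node2 (DC2) at 0, node3 (DC1) at 100, and node4 (DC2) at 200, and a keyspace replicating only to DC1. A write whose key hashes to token 150 falls in node4's "primary range" (the interval (100, 200]), but node4 is in DC2, so the search continues clockwise, wraps around the ring, and picks node1 as the replica.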


On Wed, Jun 1, 2016 at 2:40 PM, Anubhav Kale <An...@microsoft.com>
wrote:

> Hello,
>
>
>
> I recently learnt that regardless of the number of data centers, there is
> really only one token ring across all nodes. (I was under the impression
> that there is one per DC, which is how DataStax OpsCenter displays it.)
>
>
>
> Suppose we have 4 vnodes and 2 DCs (2 nodes in each DC), and a keyspace
> is set to replicate to only one DC, say DC1.
>
>
>
> Now, if the token for a particular write falls in the “primary range” of a
> node living in DC2, does the code check for such conditions and instead put
> it on some node in DC1? What is the true meaning of a “primary” token range
> in such scenarios?
>
>
>
> Is this, roughly speaking, how things work, or am I missing something?
>
>
>
> Thanks!
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>