Posted to user@cassandra.apache.org by Inquistive allen <in...@gmail.com> on 2019/08/17 17:53:32 UTC

Consistent hashing vnodes and ring in cassandra

I am a newbie in Cassandra. I have asked this question on various platforms but never got a satisfying answer.
Hence I thought of bringing up the topic here. Sorry if this is a simple question.

1. I studied the paper on consistent hashing (which is implemented in Cassandra).
2. Cassandra has the concept of vnodes. The vnodes (as I understand it, a vnode is a collection of hashes) are the basic blocks of replication in Cassandra. It is the vnodes that are replicated across the cluster. Please correct me if I'm wrong.
3. Suppose I have Keyspace A with replication factor 3 and Keyspace B with replication factor 2.
4. Is a vnode a collection of hashes of data from various keyspaces?
5. In that case, with keyspaces having different replication factors, replicating the vnodes to other nodes would be a problem.
6. Now, from the consistent hashing paper, I get the feeling that each keyspace has a different ring. Also, the name "KEYSPACE" points to a ring of keys.
    So is it that each keyspace has a different ring? If so, everything else, like replicating vnodes among nodes in the cluster, would fall into place.
    Each keyspace has a different ring ---> each vnode has data of various tables from a given keyspace ---> hence only RF copies are made in the cluster.

I know I am missing something. This way of understanding things might be wrong.
Kindly help me understand this, as it would help me visualise repair, bootstrap, adding cluster, and streaming operations in a much better way.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org


Re: Consistent hashing vnodes and ring in cassandra

Posted by Jeff Jirsa <jj...@gmail.com>.
The hash results in a token.

The replicas are typically the RF instances with tokens numerically larger than the hash.

So if a row hashes to token 1, and the instances are at -100, 0, 100, 200, 300, 400, 500, the replica instances (with RF 3) are 100, 200, 300, and Cassandra considers them to be identical for nearly all purposes.

Some snitches will skip some of the replicas to satisfy other requirements - if 200 and 300 are in the same rack (or the same machine because of vnodes), a rack aware snitch will choose 100,200,400 instead.
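
A minimal Python sketch of the lookup described above, using the same token values. The rack names, the helper names, and the skipping rule are assumptions made for illustration; this is a simplification, not Cassandra's actual replication strategy code.

```python
from bisect import bisect_left

# Hypothetical ring from the example above: (token, rack) pairs sorted by token.
# The rack names are made up; what matters is that 200 and 300 share a rack.
RING = [(-100, "r3"), (0, "r1"), (100, "r1"), (200, "r2"),
        (300, "r2"), (400, "r3"), (500, "r1")]

def first_index(ring, token):
    """Index of the first instance whose token is >= the hash, wrapping around."""
    tokens = [t for t, _ in ring]
    return bisect_left(tokens, token) % len(ring)

def simple_replicas(ring, token, rf):
    """Take the next rf instances walking clockwise from the hash."""
    start = first_index(ring, token)
    return [ring[(start + i) % len(ring)][0] for i in range(min(rf, len(ring)))]

def rack_aware_replicas(ring, token, rf):
    """Same walk, but skip an instance whose rack is already represented."""
    start = first_index(ring, token)
    chosen, skipped, racks = [], [], set()
    for i in range(len(ring)):
        if len(chosen) == rf:
            break
        t, rack = ring[(start + i) % len(ring)]
        if rack in racks:
            skipped.append(t)
        else:
            chosen.append(t)
            racks.add(rack)
    # Assumed fallback: if there are fewer racks than rf, reuse skipped instances.
    return chosen + skipped[:rf - len(chosen)]

print(simple_replicas(RING, 1, 3))      # [100, 200, 300]
print(rack_aware_replicas(RING, 1, 3))  # [100, 200, 400]
```

The coordinator needs nothing beyond the partition key's token and the list of tokens in the cluster to compute the same replica list on every read and write.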


> On Aug 17, 2019, at 12:21 PM, Inquistive allen <in...@gmail.com> wrote:
> 
> Hello Jeff,
> 
> Thanks for the responses.
> I just got it right. One last thing: when a read request comes in to the coordinator node, the partition key is hashed and the node where the corresponding data was previously stored is located. How does the coordinator node locate the replica nodes for this row?
> The first copy was written based on the hash, but the replica copies were written based on the replication strategy.
> Will a hash of any partition key list out all nodes where the data is present?
> 
> Thanks

Re: Consistent hashing vnodes and ring in cassandra

Posted by Inquistive allen <in...@gmail.com>.
Hello Jeff,

Thanks for the response.
I just got it right. One last thing: when a read request comes in to the
coordinator node, the partition key is hashed and the node where the
corresponding data was previously stored is located. How does the
coordinator node locate the replica nodes for this row?
The first copy was written based on the hash, but the replica copies were
written based on the replication strategy.
Will a hash of any partition key list out all nodes where the data is present?

Thanks



Re: Consistent hashing vnodes and ring in cassandra

Posted by Jeff Jirsa <jj...@gmail.com>.

> On Aug 17, 2019, at 10:53 AM, Inquistive allen <in...@gmail.com> wrote:
> 
> I am a newbie in Cassandra. I have asked this question on various platforms but never got a satisfying answer.
> Hence I thought of bringing up the topic here. Sorry if this is a simple question.
> 
> 1. I studied the paper on consistent hashing (which is implemented in Cassandra).
> 2. Cassandra has the concept of vnodes. The vnodes (as I understand it, a vnode is a collection of hashes) are the basic blocks of replication in Cassandra. It is the vnodes that are replicated across the cluster. Please correct me if I'm wrong.

Vnodes JUST mean each host has more than one token

> 3. Suppose I have Keyspace A with replication factor 3 and Keyspace B with replication factor 2.
> 4. Is a vnode a collection of hashes of data from various keyspaces?
> 5. In that case, with keyspaces having different replication factors, replicating the vnodes to other nodes would be a problem.
> 6. Now, from the consistent hashing paper, I get the feeling that each keyspace has a different ring. Also, the name "KEYSPACE" points to a ring of keys.
>    So is it that each keyspace has a different ring? If so, everything else, like replicating vnodes among nodes in the cluster, would fall into place.
>    Each keyspace has a different ring ---> each vnode has data of various tables from a given keyspace ---> hence only RF copies are made in the cluster.
> 
> I know I am missing something. This way of understanding things might be wrong.
> Kindly help me understand this, as it would help me visualise repair, bootstrap, adding cluster, and streaming operations in a much better way.
> 

The easiest way to visualize most Cassandra operations is to draw the tokens in a circle. Vnodes mean extra tokens.

Replica sets are adjacent tokens. You stream from any node in the replica set in the common replacement case, or the losing replica in the expansion case.
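
To make the "tokens in a circle" picture concrete, here is a small Python sketch, assuming a made-up three-host cluster where each host owns three tokens (vnodes). The host names, token values, and function names are hypothetical. It illustrates that both keyspaces from the question share the same ring and differ only in how many distinct hosts are collected while walking it.

```python
from bisect import bisect_left

# Hypothetical cluster: each host owns several tokens, which is all vnodes mean.
HOST_TOKENS = {
    "host-a": [-400, 0, 300],
    "host-b": [-250, 100, 150],
    "host-c": [-100, 200, 500],
}

# One ring for the whole cluster: (token, host) pairs sorted by token.
RING = sorted((t, h) for h, ts in HOST_TOKENS.items() for t in ts)

# Replication factors from the question; the ring itself is shared.
KEYSPACE_RF = {"A": 3, "B": 2}

def replicas(token, rf):
    """Walk clockwise from the token, collecting rf distinct hosts."""
    tokens = [t for t, _ in RING]
    start = bisect_left(tokens, token) % len(RING)
    hosts = []
    for i in range(len(RING)):
        host = RING[(start + i) % len(RING)][1]
        if host not in hosts:  # adjacent tokens may belong to the same machine
            hosts.append(host)
        if len(hosts) == rf:
            break
    return hosts

def replicas_for(keyspace, token):
    """Same ring for every keyspace; only the replication factor differs."""
    return replicas(token, KEYSPACE_RF[keyspace])

print(replicas_for("A", 1))  # ['host-b', 'host-c', 'host-a']
print(replicas_for("B", 1))  # ['host-b', 'host-c']
```

Token 1 lands on host-b's token 100; the next token, 150, also belongs to host-b and is skipped, so the walk continues to host-c and then host-a.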


