Posted to user@cassandra.apache.org by "motta.lrd" <mo...@hotmail.it> on 2014/03/10 12:26:35 UTC

Replication with virtual nodes

Hello, 

I have just learnt about virtual nodes in Cassandra. 
Let's assume a scenario with 16 tokens and 4 physical nodes, where
each physical node is responsible for 4 tokens. 

<http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/file/n7593310/Screen_Shot_2014-03-10_at_07.55.27.png> 

If we have a replication factor of 2, how does replication work? 
Where will token *1* be replicated? 

Thank you 



--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replication-with-virtual-nodes-tp7593310.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Replication with virtual nodes

Posted by "motta.lrd" <mo...@hotmail.it>.
I will reply to myself and raise a flag in case someone is interested.

Assuming replicas are placed as follows:
"""
Request to update row X
Compute the token from the row key
Identify the server that owns that token
Place one replica there
Increment the token until you reach a server that does not yet hold a replica
Place the next replica there
Repeat until RF replicas have been placed
"""

Then the number of virtual nodes may affect availability.
Consider the previous example.

Let's call the tokens Tx and the servers Sx,
where x is the token or server number, respectively.

With RF = 3, the tokens owned by S1 are replicated as follows:
T1 replicated to S2 (owner of T2) and S4 (owner of T3)
T5 replicated to S2 (owner of T6) and S4 (owner of T7)
T13 replicated to S2 (owner of T14) and S3 (owner of T15)
T9 replicated to S2 (owner of T10) and S3 (owner of T11)

These are the possible data loss scenarios involving S1's tokens:

I lose S1, S2 and S3  =>  T9 and T13 are lost
I lose S1, S2 and S4  =>  T1 and T5 are lost
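The enumeration can be checked mechanically. A minimal Python sketch that takes the replica sets for S1's tokens exactly as listed above (RF = 3) and tries every combination of three failed servers; `REPLICAS`, `SERVERS` and `lost_tokens` are illustrative names, not part of any Cassandra API.

```python
from __future__ import annotations

from itertools import combinations

# Replica sets for the tokens owned by S1, copied from the list above (RF = 3).
REPLICAS = {
    "T1": {"S1", "S2", "S4"},
    "T5": {"S1", "S2", "S4"},
    "T9": {"S1", "S2", "S3"},
    "T13": {"S1", "S2", "S3"},
}
SERVERS = ["S1", "S2", "S3", "S4"]


def lost_tokens(failed: set[str]) -> list[str]:
    """Tokens whose every replica sits on a failed server."""
    return [t for t, owners in REPLICAS.items() if owners <= failed]


if __name__ == "__main__":
    # Try every combination of three failed servers and report data loss.
    for combo in combinations(SERVERS, 3):
        lost = lost_tokens(set(combo))
        if lost:
            print(combo, "=>", lost)
```

Only {S1, S2, S3} and {S1, S2, S4} lose any of S1's tokens, matching the two scenarios above.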


==> This means lower availability than in a scenario in which each
server owns only one token.
Consider this simple case:
<http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/file/n7593326/355a30c68fa0538957eca97ebb226cc3.png> 

Again with RF = 3:
T1 is replicated on S2 and S3.

The only data loss scenario involving S1's tokens is the following:
I lose S1, S2 and S3  =>  T1 is lost

which is one scenario fewer than in the previous case.
The gap between the two grows with the number of virtual nodes.
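One way to probe how the gap scales beyond 4 servers is to count the distinct sets of three servers that together hold every replica of some token range; each such set is a three-server failure that loses data. A rough Python sketch, assuming random token assignment, the ring walk described above, and an arbitrary cluster of 12 servers. `distinct_replica_sets` is a hypothetical helper, and the random layout is a simplification rather than how Cassandra actually allocates tokens.

```python
from __future__ import annotations

import random
from math import comb


def distinct_replica_sets(num_servers: int, vnodes: int, rf: int, seed: int = 0) -> int:
    """Count distinct rf-server sets that hold every replica of some token range.

    Losing exactly the servers in one of these sets loses data, so the count is
    the number of rf-server failure combinations that cause data loss.
    """
    if rf > num_servers:
        raise ValueError("rf cannot exceed the number of servers")
    rng = random.Random(seed)
    # Hypothetical layout: each server draws `vnodes` random tokens in [0, 1).
    ring = sorted(
        (rng.random(), server)
        for server in range(num_servers)
        for _ in range(vnodes)
    )
    owners = [server for _, server in ring]
    replica_sets: set[frozenset[int]] = set()
    for start in range(len(owners)):
        # Walk the ring from this token, skipping servers that already hold a replica.
        replicas: list[int] = []
        step = 0
        while len(replicas) < rf:
            server = owners[(start + step) % len(owners)]
            if server not in replicas:
                replicas.append(server)
            step += 1
        replica_sets.add(frozenset(replicas))
    return len(replica_sets)


if __name__ == "__main__":
    n, rf = 12, 3
    for v in (1, 4, 16, 256):
        count = distinct_replica_sets(n, v, rf)
        print(f"{v:>3} vnodes/server: {count} of {comb(n, rf)} possible "
              f"{rf}-server failures lose data")
```

With one token per server, the count equals the number of servers, since each token's replicas are just the next three servers on the ring. As the number of vnodes per server grows, the count tends toward all C(12, 3) = 220 combinations, which is the trend the conjecture describes.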

Can anyone confirm this conjecture?
Thanks


