Posted to dev@cassandra.apache.org by Renjie Liu <li...@gmail.com> on 2016/05/19 02:33:46 UTC

How cassandra ensures consistency when adding or removing a node?

Hi, Cassandra devs:
I'm learning Cassandra and I can understand most of the techniques it uses.
But I can't understand how Cassandra ensures consistency when
adding/removing a node. It seems that when a node joins the DHT ring, some
node needs to transfer data to the new node using streaming. But the
data may still get updated during the transfer, so the new node can never
catch up with it. How does Cassandra handle this? Will Cassandra lose data
during this process?
-- 
Liu, Renjie
Software Engineer, MVAD

Re: How cassandra ensures consistency when adding or removing a node?

Posted by Renjie Liu <li...@gmail.com>.
Thanks, Alex.

On Thu, May 19, 2016 at 3:44 PM Oleksandr Petrov <ol...@gmail.com>
wrote:

> I think that this article [1] covers most of the concepts (see key
> concepts) quite well.
> I am not aware of any article that explains the whole process, though.
>
> Briefly, there are several processes/concepts that are related to this
> subject: token ownership, replicas, the coordinator, and gossip.
> Ensuring consistency in a small cluster (number of nodes <= replication
> factor, so every node is a replica) is more or less straightforward. In
> this case, when a node bootstraps, it notifies all the replicas,
> information about that node gets added to `pending nodes`, and all nodes
> know about the bootstrapping node, as otherwise streaming would not even
> start.
> Having a coordinator that is not a replica for the partition/token you're
> querying is a bit more complex, as it relies on knowledge about the
> joining node that is distributed over gossip.
>
> There are two properties that can improve the situation with range
> movements: cassandra.consistent.rangemovement and
> cassandra.consistent.simultaneousmoves.allow (both are JVM system
> properties, set with -D). The first one disallows ring changes if any
> node in the replica set is offline. In addition, it makes sure there are
> no other moves within the ring, unless the second property allows
> simultaneous moves. In that case, if you're connected to a coordinator
> that is part of the replica set, data will be placed correctly. The data
> will be moved, and any inconsistencies will eventually be fixed with a
> repair (so, to answer your question, no data will be lost during this
> process).
>
> (I've tried to provide this information to the best of my knowledge; if
> anyone sees something wrong, please point it out.)
>
> [1] https://dzone.com/articles/introduction-apache-cassandra
> --
> Alex Petrov
>
-- 
Liu, Renjie
Software Engineer, MVAD

Re: How cassandra ensures consistency when adding or removing a node?

Posted by Oleksandr Petrov <ol...@gmail.com>.
I think that this article [1] covers most of the concepts (see key
concepts) quite well.
I am not aware of any article that explains the whole process, though.

Briefly, there are several processes/concepts that are related to this
subject: token ownership, replicas, the coordinator, and gossip.
Ensuring consistency in a small cluster (number of nodes <= replication
factor, so every node is a replica) is more or less straightforward. In
this case, when a node bootstraps, it notifies all the replicas,
information about that node gets added to `pending nodes`, and all nodes
know about the bootstrapping node, as otherwise streaming would not even
start.
Having a coordinator that is not a replica for the partition/token you're
querying is a bit more complex, as it relies on knowledge about the
joining node that is distributed over gossip.
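The gossip point above can be illustrated with a minimal Java sketch of the timing problem from the original question. This is a toy model, not Cassandra code. Every name in it (`BootstrapTimeline`, `Write`, `missedWrites`, `pendingSeenAt`, `streamSnapshotAt`) is hypothetical.

The model makes three assumptions:
- All writes are for a range that is moving to the joining node.
- A write reaches the joining node by one of two paths. Either it was already part of the data captured for streaming, or its coordinator already knew (via gossip) that the joining node was pending and sent it an extra copy.
- Streaming is simplified to a single point in time when the source data is captured.

A write accepted after that point, by a coordinator that had not yet heard about the joining node, is missed by both paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BootstrapTimeline {
    public static void main(String[] args) {
        // Hypothetical gossip timings: coordinator A learns that the joining
        // node is pending at t=10, coordinator B only at t=25.
        Map<String, Long> pendingSeenAt = Map.of("A", 10L, "B", 25L);
        List<Write> writes = List.of(
                new Write("k1", 5, "A"),
                new Write("k2", 12, "A"),
                new Write("k3", 20, "B"),
                new Write("k4", 30, "B"));
        // Capturing the stream data at t=15 misses k3; waiting until every
        // coordinator has seen the pending state (t=25) misses nothing.
        System.out.println(missedWrites(writes, pendingSeenAt, 15));
        System.out.println(missedWrites(writes, pendingSeenAt, 25));
    }

    /** Keys of writes that the joining node receives through neither path. */
    static List<String> missedWrites(List<Write> writes, Map<String, Long> pendingSeenAt,
                                     long streamSnapshotAt) {
        List<String> missed = new ArrayList<>();
        for (Write w : writes) {
            // Path 1: the write existed before the streamed data was captured.
            boolean streamed = w.time < streamSnapshotAt;
            // Path 2: the coordinator already knew about the pending node.
            Long seen = pendingSeenAt.get(w.coordinator);
            boolean extraCopy = seen != null && w.time >= seen;
            if (!streamed && !extraCopy) {
                missed.add(w.key);
            }
        }
        return missed;
    }
}

/** A write accepted by some coordinator at a logical time (toy model). */
final class Write {
    final String key;
    final long time;
    final String coordinator;

    Write(String key, long time, String coordinator) {
        this.key = key;
        this.time = time;
        this.coordinator = coordinator;
    }
}
```

As far as I know, Cassandra narrows this window in a specific way. A bootstrapping node first announces itself, then waits for the ring delay (`cassandra.ring_delay_ms`, 30 seconds by default) before it starts streaming. That gives gossip time to spread the pending state. In this model, the wait corresponds to capturing the stream data only after every coordinator has seen that state.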

There are two properties that can improve the situation with range
movements: cassandra.consistent.rangemovement and
cassandra.consistent.simultaneousmoves.allow (both are JVM system
properties, set with -D). The first one disallows ring changes if any
node in the replica set is offline. In addition, it makes sure there are
no other moves within the ring, unless the second property allows
simultaneous moves. In that case, if you're connected to a coordinator
that is part of the replica set, data will be placed correctly. The data
will be moved, and any inconsistencies will eventually be fixed with a
repair (so, to answer your question, no data will be lost during this
process).
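The pre-join check described above can be sketched in Java as follows.

- **Property names:** the two property names are the real JVM flags from the thread.
- **Defaults (my understanding):** `true` for `cassandra.consistent.rangemovement` and `false` for `cassandra.consistent.simultaneousmoves.allow`. These reflect my understanding of Cassandra's behaviour and are not confirmed by the thread.
- **Hypothetical names:** the classes `JoinPrecheck` and `JoinPrecheckDemo`, and the method `refuseReason`, are not Cassandra's API.
- **Strict-mode assumption:** the sketch also assumes that, in strict mode, each range must be streamed from the specific replica that is giving it up. The join is therefore refused if that replica is down. This is a narrower reading of "any node in the replica set is offline".

```java
import java.util.Optional;

class JoinPrecheckDemo {
    public static void main(String[] args) {
        boolean strict = JoinPrecheck.consistentRangeMovement();
        boolean simultaneous = JoinPrecheck.allowSimultaneousMoves();
        // Hypothetical ring state: one other node is still bootstrapping,
        // and the replica giving up our range is alive.
        Optional<String> reason = JoinPrecheck.refuseReason(strict, simultaneous, 1, true);
        System.out.println(reason.orElse("join may proceed"));
    }
}

final class JoinPrecheck {
    /** Reads -Dcassandra.consistent.rangemovement; the default of true is an assumption. */
    static boolean consistentRangeMovement() {
        return Boolean.parseBoolean(System.getProperty("cassandra.consistent.rangemovement", "true"));
    }

    /** Reads -Dcassandra.consistent.simultaneousmoves.allow; the default of false is an assumption. */
    static boolean allowSimultaneousMoves() {
        return Boolean.parseBoolean(System.getProperty("cassandra.consistent.simultaneousmoves.allow", "false"));
    }

    /**
     * Hypothetical helper: returns why a join should be refused, or empty if it may proceed.
     * nodesInTransit: other nodes currently joining, leaving or moving.
     * rangeSourceUp: whether the replica giving up the range is alive.
     */
    static Optional<String> refuseReason(boolean strict, boolean allowSimultaneous,
                                         int nodesInTransit, boolean rangeSourceUp) {
        if (!strict) {
            // Relaxed mode: stream from any live replica and rely on repair afterwards.
            return Optional.empty();
        }
        if (!allowSimultaneous && nodesInTransit > 0) {
            return Optional.of("other nodes are joining, leaving or moving");
        }
        if (!rangeSourceUp) {
            return Optional.of("the replica giving up the range is down");
        }
        return Optional.empty();
    }
}
```

Keeping `refuseReason` free of `System.getProperty` calls makes the decision easy to test. Only the two small readers touch the JVM flags.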

(I've tried to provide this information to the best of my knowledge; if
anyone sees something wrong, please point it out.)

[1] https://dzone.com/articles/introduction-apache-cassandra

On Thu, May 19, 2016 at 5:58 AM Renjie Liu <li...@gmail.com> wrote:

> BTW, is there any article explaining the process? I think this will help us
> understand it better.
> --
> Liu, Renjie
> Software Engineer, MVAD
>
-- 
Alex Petrov

Re: How cassandra ensures consistency when adding or removing a node?

Posted by Renjie Liu <li...@gmail.com>.
BTW, is there any article explaining the process? I think this will help us
understand it better.

On Thu, May 19, 2016 at 11:28 AM Renjie Liu <li...@gmail.com> wrote:

> Thanks, I'll read the code.
> --
> Liu, Renjie
> Software Engineer, MVAD
>
-- 
Liu, Renjie
Software Engineer, MVAD

Re: How cassandra ensures consistency when adding or removing a node?

Posted by Renjie Liu <li...@gmail.com>.
Thanks, I'll read the code.

On Thu, May 19, 2016 at 11:02 AM Jeff Jirsa <je...@crowdstrike.com>
wrote:

> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/TokenMetadata.java#L731-L754
>
> And
>
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/TokenMetadata.java#L60-L88
>
> Cassandra keeps a map of joining and leaving nodes, and does extra writes
> to the appropriate nodes for mutations created after the streaming plan is
> calculated.

-- 
Liu, Renjie
Software Engineer, MVAD

Re: How cassandra ensures consistency when adding or removing a node?

Posted by Jeff Jirsa <je...@crowdstrike.com>.
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/TokenMetadata.java#L731-L754

And

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/TokenMetadata.java#L60-L88

Cassandra keeps a map of joining and leaving nodes, and does extra writes to the appropriate nodes for mutations created after the streaming plan is calculated.
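The idea in the message above can be shown with a minimal Java sketch, under deliberately simplified assumptions:
- Each node has one token, and tokens are plain longs.
- Placement follows SimpleStrategy: a key's replicas are the first `rf` distinct nodes found walking clockwise from its token.
- `ToyRing`, `WritePlanner`, `pendingFor`, `writeTargets` and `requiredAcks` are hypothetical names, not the methods in TokenMetadata.java.

For a given key, the pending replicas are the nodes that will own it once the joining node's token is added, minus its current (natural) owners. The coordinator sends each write to both sets. To my understanding, Cassandra's write path also waits for one extra acknowledgement per pending replica, on top of what the consistency level requires. `requiredAcks` models that assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

class PendingWritesDemo {
    public static void main(String[] args) {
        ToyRing ring = new ToyRing(2);
        ring.tokens.put(0L, "A");
        ring.tokens.put(100L, "B");
        ring.tokens.put(200L, "C");
        // D is joining with token 150: key token 120 moves to D, key token 170 does not.
        System.out.println(WritePlanner.writeTargets(ring, "D", 150L, 120L)); // [C, A, D]
        System.out.println(WritePlanner.writeTargets(ring, "D", 150L, 170L)); // [C, A]
    }
}

/** Toy ring: each node owns one token; replicas are the first rf distinct nodes clockwise. */
final class ToyRing {
    final TreeMap<Long, String> tokens = new TreeMap<>();
    final int rf;

    ToyRing(int rf) {
        this.rf = rf;
    }

    /** A copy of this ring with one extra node, i.e. the ring after the join completes. */
    ToyRing withNode(long token, String node) {
        ToyRing future = new ToyRing(rf);
        future.tokens.putAll(tokens);
        future.tokens.put(token, node);
        return future;
    }

    List<String> replicasFor(long keyToken) {
        // Walk clockwise: node tokens >= keyToken first, then wrap around to the start.
        List<String> walk = new ArrayList<>(tokens.tailMap(keyToken, true).values());
        walk.addAll(tokens.headMap(keyToken, false).values());
        List<String> replicas = new ArrayList<>();
        for (String node : walk) {
            if (replicas.size() == rf) break;
            if (!replicas.contains(node)) replicas.add(node);
        }
        return replicas;
    }
}

final class WritePlanner {
    /** Nodes that will own keyToken after the join but do not own it now. */
    static Set<String> pendingFor(ToyRing current, String joiningNode, long joiningToken, long keyToken) {
        Set<String> pending = new LinkedHashSet<>(
                current.withNode(joiningToken, joiningNode).replicasFor(keyToken));
        pending.removeAll(current.replicasFor(keyToken));
        return pending;
    }

    /** Natural replicas plus pending replicas: a write during the join goes to all of them. */
    static List<String> writeTargets(ToyRing current, String joiningNode, long joiningToken, long keyToken) {
        List<String> targets = new ArrayList<>(current.replicasFor(keyToken));
        targets.addAll(pendingFor(current, joiningNode, joiningToken, keyToken));
        return targets;
    }

    /** Assumption: wait for the consistency level's acks plus one per pending replica. */
    static int requiredAcks(int blockForConsistencyLevel, int pendingCount) {
        return blockForConsistencyLevel + pendingCount;
    }
}
```

Because writes made after the pending range is known also land on D, the streamed data only has to cover what existed before that point. This is why the new node does not have to "catch up" with a moving target.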



On 5/18/16, 7:33 PM, "Renjie Liu" <li...@gmail.com> wrote:

>Hi, Cassandra devs:
>I'm learning Cassandra and I can understand most of the techniques it uses.
>But I can't understand how Cassandra ensures consistency when
>adding/removing a node. It seems that when a node joins the DHT ring, some
>node needs to transfer data to the new node using streaming. But the
>data may still get updated during the transfer, so the new node can never
>catch up with it. How does Cassandra handle this? Will Cassandra lose data
>during this process?
>-- 
>Liu, Renjie
>Software Engineer, MVAD