Posted to user@cassandra.apache.org by Qi Li <ke...@gmail.com> on 2016/09/14 09:46:38 UTC

race condition for quorum consistency

Hi all,

We are using quorum consistency, and we *suspect* there may be a race
condition during writes. Let's say RF is 3, so a write waits for at least
2 nodes to ack. Suppose only 1 node (node A) has acked so far, and the other
2 nodes (B and C) have not applied the update yet. Now two read requests
arrive:
one read gets its responses from nodes B and C, so version 1 is returned;
the other read gets its responses from nodes A and B, so the latest
version 2 is returned.

So clients are getting different data at the same time. Is this a valid
analysis? If so, are there any options we can set to deal with this issue?

thanks
Ken

Re: race condition for quorum consistency

Posted by Alexander Dejanovski <al...@thelastpickle.com>.

My first answer was not very accurate, and it was misleading.
Apache Cassandra guarantees that if all queries are run at least at quorum,
a client that writes successfully (as in, the cluster acknowledged the write)
and then reads back its previous write will see the correct value, unless
another client updated it between the write and the read (which would be a
race condition). The same goes for two different clients, if the first issues
a successful write and only after that the second reads the value.
Quorum provides a consistency guarantee if queries are fired in sequence.

Without diving into complex scenarios where it may still work because of
read repair and the fact that everything is async, Ken's use case was: C1
writes, the write is not successful yet, and C2 and C3 read at
approximately the same time.
Once again, in this case C2 and C3 could read different values, as
C1's mutation could still be pending on some nodes. Considering we have
nodes A, B and C:

   - Node A has received the write from C1, nodes B and C have not
   - C2 reads from A and B: there's a digest mismatch, which triggers a
   foreground read repair (background read repairs are triggered at CL ONE),
   so it gets the up-to-date value that was written by C1
   - C3 reads from B and C: there's no digest mismatch and neither node is
   up to date with A, so it does not get the value written by C1
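
To make the scenario concrete, here is a minimal Python sketch of it. This
is a toy in-memory model, not Cassandra code: the Cell class, the
quorum_read function and the integer timestamps are all made up for
illustration. C3's read is evaluated first to model it landing before C2's
read repair reaches B.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp: int


def quorum_read(replicas, contacted):
    """Return the newest value among the contacted replicas.

    If the contacted replicas disagree, the newest cell is written back to
    the stale ones before returning (a toy foreground read repair).
    """
    newest = max((replicas[name] for name in contacted), key=lambda c: c.timestamp)
    for name in contacted:
        if replicas[name].timestamp < newest.timestamp:
            replicas[name] = newest
    return newest.value


# C1's write (v2, timestamp 2) has reached only node A so far.
replicas = {
    "A": Cell("v2", 2),
    "B": Cell("v1", 1),
    "C": Cell("v1", 1),
}

c3_result = quorum_read(replicas, ["B", "C"])     # no mismatch: stale value
c2_result = quorum_read(replicas, ["A", "B"])     # mismatch: B gets repaired
later_result = quorum_read(replicas, ["B", "C"])  # B now holds v2

print(c3_result, c2_result, later_result)  # v1 v2 v2
```

In this model, once C2's read has repaired B, any later quorum read sees v2,
whichever two nodes answer.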

Cheers,

-- 
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: race condition for quorum consistency

Posted by Tyler Hobbs <ty...@datastax.com>.

On Wed, Sep 14, 2016 at 3:49 PM, Nicolas Douillet <
nicolas.douillet@gmail.com> wrote:

> - during read requests, Cassandra asks one node for the data and the
> other nodes involved in the CL for a digest; if the digests do not all
> match, it asks them for the full data, handles the merge, and finally
> asks those nodes for a background repair. Your write may have succeeded
> during this time.

This is very good info, but as a minor correction, the repair here will
happen in the foreground before the response is returned to the client.
So, at least from a single client's perspective, you get monotonic reads.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: race condition for quorum consistency

Posted by Nicolas Douillet <ni...@gmail.com>.

Hi,

In my opinion the guarantee provided by Cassandra is:
      if your write request at Quorum *succeeds*, then the next read
requests at Quorum (issued after the write response, and that succeed too)
will be consistent
      (more precisely, whenever CL.Write + CL.Read > RF)
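
The CL.Write + CL.Read > RF condition can be checked by brute force. The
sketch below is only an illustration: the function names are made up, and
nodes are modeled as plain integers. It enumerates every possible set of
replicas that could ack a write and every set that could answer a read,
and checks that they always share at least one node.

```python
from itertools import combinations


def always_overlap(rf, write_cl, read_cl):
    """True if every set of write_cl replicas shares a node with every set
    of read_cl replicas, out of rf replicas in total."""
    nodes = range(rf)
    return all(
        set(w) & set(r)
        for w in combinations(nodes, write_cl)
        for r in combinations(nodes, read_cl)
    )


def quorum(rf):
    """Quorum size: a strict majority of the replicas."""
    return rf // 2 + 1


rf = 3
print(always_overlap(rf, quorum(rf), quorum(rf)))  # QUORUM/QUORUM, 2 + 2 > 3: True
print(always_overlap(rf, 1, quorum(rf)))           # ONE/QUORUM, 1 + 2 = 3: False
```

The overlap only says that a read contacts at least one replica holding an
acknowledged write; it says nothing about writes that are still in flight.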

Of course, as long as you haven't received a valid response to your write
request at Quorum, the cluster may be in an inconsistent state, and you have
*to retry your write request.*

That said, Cassandra has some other important behaviors that tend to
shorten this inconsistent state:

   - the coordinator does not send the write only to the nodes needed to
   satisfy the CL, but to all nodes that should hold the data (of course,
   with RF=3, only A, B and C are involved)

   - during read requests, Cassandra asks one node for the data and the
   other nodes involved in the CL for a digest; if the digests do not all
   match, it asks them for the full data, handles the merge, and finally
   asks those nodes for a background repair. Your write may have succeeded
   during this time.

   - according to a chance ratio, Cassandra will *sometimes* send a read to
   all nodes holding the data, not only the ones involved in the CL, and
   execute background repairs

   - you have to schedule repairs regularly
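
The digest step in the read path described above can be sketched as follows.
This is a simplified model, not Cassandra's implementation: cells are
(value, timestamp) tuples, MD5 is just an illustrative hash, and
coordinator_read is a hypothetical name.

```python
import hashlib


def digest(cell):
    """Hash of a (value, timestamp) cell; MD5 is an illustrative choice."""
    value, timestamp = cell
    return hashlib.md5(f"{value}:{timestamp}".encode()).hexdigest()


def coordinator_read(replicas, contacted):
    """Read full data from the first contacted replica, digests from the rest.

    Returns (value, mismatch). On a digest mismatch, full data is fetched
    from every contacted replica and merged by newest timestamp.
    """
    data_node, *digest_nodes = contacted
    data = replicas[data_node]
    mismatch = any(digest(replicas[n]) != digest(data) for n in digest_nodes)
    if mismatch:
        data = max((replicas[n] for n in contacted), key=lambda cell: cell[1])
    return data[0], mismatch


replicas = {"A": ("v2", 2), "B": ("v1", 1), "C": ("v1", 1)}
print(coordinator_read(replicas, ["B", "A"]))  # ('v2', True)
print(coordinator_read(replicas, ["B", "C"]))  # ('v1', False)
```

Sending digests instead of full rows keeps the common case (all replicas
agree) cheap; the full data is only transferred when a mismatch is detected.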


I'd add that if some nodes fail to handle write requests in time, they are
probably under pressure, so there is only a small chance that they will
succeed on a read request :)

And finally, what is time? From where, and when? You may issue one read
after another but receive its result first. Writing at Quorum is not writing
within a transaction; you'll certainly have to make some tradeoffs.

Regards,

--
Nicolas





Re: race condition for quorum consistency

Posted by Alexander Dejanovski <al...@thelastpickle.com>.

My understanding of the described scenario is that the write hasn't
succeeded yet when the reads are fired, as B and C haven't processed the
mutation yet.

There would be 3 clients here, not 2: C1 writes, C2 and C3 read.

So the race condition could still happen in this particular case.

-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: race condition for quorum consistency

Posted by Work <jr...@codojo.me>.

Hi Alex:

Hmmm ... Assuming clock skew is eliminated, and assuming nodes are up and
available, and assuming quorum writes and quorum reads with everyone waiting
for success (which is NOT the OP's scenario), two different clients will be
guaranteed to see all successful writes, or be told that the read failed.

C1 writes at quorum to A,B
C2 reads at quorum.
So it tries to read from ALL nodes, A,B,C.
If A,B respond --> success
If A,C respond --> conflict
If B,C respond --> conflict
Because a quorum (2 nodes) responded, the coordinator will return the value
with the latest timestamp and may issue a read repair depending on YAML
settings.

So where do you see only one client having this guarantee?

Regards,

James


Re: race condition for quorum consistency

Posted by Alexander DEJANOVSKI <ad...@gmail.com>.

Hi,

The analysis is valid. Strong consistency the Cassandra way means that
one client writing at quorum, then reading at quorum, will always see its
previous write.
Two different clients have no guarantee of seeing the same data when using
quorum, as illustrated in your example.

The only options here are to route requests to specific clients based on
some id, to guarantee the sequence of operations outside of Cassandra (the
same client will always be responsible for a set of ids), or to raise the CL
to ALL at the expense of availability (you should not do that).
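
As a rough sketch of the routing option, assuming a fixed list of
application instances (the names below are hypothetical), a stable hash of
the id can pick the instance responsible for it:

```python
import hashlib


def owner_for(entity_id, workers):
    """Pick the same worker for a given id every time, using a stable hash."""
    h = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return workers[int.from_bytes(h[:8], "big") % len(workers)]


workers = ["app-1", "app-2", "app-3"]  # hypothetical application instances

# The same id always maps to the same instance while the list is unchanged.
print(owner_for("order-42", workers) == owner_for("order-42", workers))  # True
```

Note that with this simple modulo scheme, ids move to other instances
whenever the worker list changes.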


Cheers,

Alex
