Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2018/12/21 17:39:00 UTC

[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

    [ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726929#comment-16726929 ] 

Ariel Weisberg commented on CASSANDRA-12126:
--------------------------------------------

Having reviewed the code, I think what Benedict says is correct. The criterion we use for identifying whether there is an in-progress Paxos round that needs resolution is incorrect, because it assumes we have visibility into all accepted ballots when we only have visibility into a majority of acceptors.

I think this optimization can be done correctly, but it's a bit of surgery. Right now reads do a prepare and modify the promised ballot at each acceptor. If instead we only read the promised ballot from each acceptor, we could check that the promised ballot matches the most recent committed ballot. If they are the same, we know nothing is in progress: no ballot higher than the most recent accepted/committed ballot has been promised by a majority, so there can be no lingering accepted ballot, since a majority of promises must be collected before anything can be accepted.

If the ballots don't match, we can go ahead and do a prepare + propose to make them match, and subsequent reads won't have to do a propose.
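
A minimal sketch of the check described above, written in Python for brevity rather than taken from the Cassandra code base. The names `AcceptorState` and `needs_repair` are hypothetical, ballots are modeled as plain integers, and the responses are assumed to come from a majority of acceptors whose state was only read, not modified.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AcceptorState:
    """Paxos state read (not modified) from one acceptor. Hypothetical type."""
    promised: int   # highest ballot this acceptor has promised
    committed: int  # ballot of the most recent commit this acceptor knows of


def needs_repair(responses: Iterable[AcceptorState]) -> bool:
    """Return True if the read should fall back to prepare + propose.

    The read can be served directly only when every contacted acceptor's
    promised ballot equals the most recent committed ballot seen in the
    responses. Any mismatch means a round may be in progress, or an
    acceptor is behind, so the state has to be brought back in line first.
    """
    responses = list(responses)
    if not responses:
        raise ValueError("at least one acceptor response is required")
    latest_commit = max(r.committed for r in responses)
    return any(r.promised != latest_commit for r in responses)
```

For example, `needs_repair([AcceptorState(5, 5), AcceptorState(5, 5)])` is false, while a response with a promised ballot of 7 against a latest commit of 5 makes it true.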

This may also impact our choice of how many replicas to contact in each phase since we want them to have consistent paxos state so reads can be one roundtrip. I am not sure if we contact them all (like with mutations) or just a majority.

> CAS Reads Inconsistencies 
> --------------------------
>
>                 Key: CASSANDRA-12126
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: sankalp kohli
>            Priority: Major
>              Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies true to a propose and saves the commit in the accepted field. The other two machines, B and C, do not get to the accept phase.
> The current state is that machine A has this commit in the paxos table as accepted but not committed, and B and C do not.
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read the value written in step 1. This step behaves as if nothing is in flight.
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that there is something in flight from A and will propose and commit it with the current ballot. Now we can read the value written in step 1 as part of this CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value written in step 1.
> 4) Issue a CAS Write and it involves only B and C. This will succeed and commit a different value than step 1. The step 1 value will never be seen again and was never seen before.
> Section 2.3 of Lamport's “Paxos Made Simple” paper discusses this issue, namely how learners can find out whether a majority of the acceptors have accepted a proposal.
> In step 3, it is correct that we propose the value again, since we don't know whether it was accepted by a majority of acceptors. When we ask a majority of acceptors, and one or more acceptors but not a majority have something in flight, we have no way of knowing whether it was accepted by a majority of acceptors. So this behavior is correct.
> However, we need to fix step 2, since it causes reads to not be linearizable with respect to writes and other reads. In this case, we know that a majority of acceptors have no in-flight commit, which means a majority can attest that nothing was accepted by a majority. I think we should run a propose step here with an empty commit, which will cause the write from step 1 to never become visible afterwards.
> With this fix, we will either see the data written in step 1 on the next serial read or never see it, which is what we want.
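
The state after step 1 of the description can be modeled with a toy Python sketch. The replica names A, B and C come from the description; the dictionary `accepted`, the placeholder value `"v1"` and the helper `in_flight_seen_by` are assumptions made for illustration and do not correspond to Cassandra internals.

```python
from itertools import combinations

# Accepted-but-uncommitted value per replica after step 1
# (None means the replica has nothing in flight).
accepted = {"A": "v1", "B": None, "C": None}


def in_flight_seen_by(quorum):
    """Values a CAS read would discover as in flight when contacting `quorum`."""
    return {accepted[r] for r in quorum if accepted[r] is not None}


# Enumerate every majority (2 of 3) a CAS read could contact with RF=3.
for quorum in combinations(sorted(accepted), 2):
    seen = in_flight_seen_by(quorum)
    print(quorum, "->", seen or "nothing in flight")
```

Running it shows that the quorum (B, C) reports nothing in flight, as in step 2, while both quorums that include A discover the value from step 1, as in step 3.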



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org