Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2017/04/05 21:42:41 UTC

[jira] [Commented] (CASSANDRA-13419) Relax limit on number of pending endpoints during CAS

    [ https://issues.apache.org/jira/browse/CASSANDRA-13419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957793#comment-15957793 ] 

Ariel Weisberg commented on CASSANDRA-13419:
--------------------------------------------

[~slebresne] [~pauloricardomg]

Sylvain, I don't follow your suggestion.

So the starting point is that we need the quorum for the condition check or serial read to include at least one replica that responded to PREPARE. This fixes the stale read issue from CASSANDRA-8346.

So we might only consider a node pending for CAS for the duration of a Paxos round timeout, on the assumption that if it has been pending longer than that, it must have been part of the quorum for the PREPARE? What drives that guarantee?

Could we do something very simple like remember who was in the QUORUM for PREPARE and require a response from at least one of them when doing the condition check or read?
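To make the "remember who was in the QUORUM for PREPARE" idea concrete, here is a minimal sketch in Python. It is not Cassandra code: the function name `read_is_safe`, its parameters, and the endpoint labels are all hypothetical, and it only illustrates the overlap check, not how the coordinator would wait for or retry responses.

```python
def read_is_safe(prepare_responders, read_responders, required_reads):
    """Hypothetical check: accept the condition check / serial read only if
    enough replicas answered AND at least one of them also answered PREPARE
    in the same Paxos round."""
    prepare_set = set(prepare_responders)
    read_set = set(read_responders)
    has_enough = len(read_set) >= required_reads
    overlaps_prepare = bool(prepare_set & read_set)
    return has_enough and overlaps_prepare


# Illustrative scenario (endpoint names are made up):
# PREPARE was answered by one natural replica and two pending nodes.
prepare = {"A", "pending-1", "pending-2"}

# A read answered only by replicas that missed PREPARE is rejected.
stale_risk = read_is_safe(prepare, {"B", "C"}, required_reads=2)

# A read that includes a PREPARE responder is accepted.
ok = read_is_safe(prepare, {"A", "B"}, required_reads=2)
```

In this sketch a rejected read would need another response or a retry; which of those is appropriate is left open here.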

I don't see having the pending node be in a different state as being super hard either. We can record a timestamp when it first joins and then compare how long it has been when deciding whether it is pending for the purposes of Paxos. We are measuring time since what though? Since the coordinator first learned about the pending node via Gossip?
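The timestamp approach could look roughly like the sketch below. Everything here is an assumption for illustration: the `PendingTracker` class, its method names, and in particular the choice to measure from the moment the coordinator first observes the node as pending, which is exactly the open question above.

```python
class PendingTracker:
    """Hypothetical coordinator-local record of when each endpoint was
    first seen as pending. Times are plain millisecond integers."""

    def __init__(self, paxos_round_timeout_ms):
        self.paxos_round_timeout_ms = paxos_round_timeout_ms
        self._first_seen_ms = {}

    def observe_pending(self, endpoint, now_ms):
        # Keep the earliest observation; later sightings do not reset it.
        self._first_seen_ms.setdefault(endpoint, now_ms)

    def is_pending_for_paxos(self, endpoint, now_ms):
        # Treat the node as pending for CAS only while it has been pending
        # for less than one Paxos round timeout (assumed reference point:
        # the first local observation).
        first_seen = self._first_seen_ms.get(endpoint)
        if first_seen is None:
            return False
        return (now_ms - first_seen) < self.paxos_round_timeout_ms


tracker = PendingTracker(paxos_round_timeout_ms=1000)
tracker.observe_pending("10.0.0.4", now_ms=0)
tracker.observe_pending("10.0.0.4", now_ms=500)  # does not reset the clock
```

Because the timestamp is local to each coordinator, two coordinators could disagree about whether the same node still counts as pending, so this sketch does not by itself answer what drives the guarantee.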

> Relax limit on number of pending endpoints during CAS
> -----------------------------------------------------
>
>                 Key: CASSANDRA-13419
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13419
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination, CQL
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>
> CASSANDRA-8346 avoids stale reads during CAS when checking the condition or doing serial reads by disallowing more than one pending endpoint.
> It seems like it should be possible to allow more than one pending endpoint by being smarter about who we read from during the QUORUM read or about the state of pending nodes that are there for host replacement.
> Sylvain suggested:
> bq. Well, I guess things are working as they do for decently good reason here. That said, thinking about it, it could be that the solution from CASSANDRA-8346 is a bit of a big hammer: I believe it's enough to ensure that we read from at least one replica that responded to PREPARE 'in the same Paxos round'. But we have timeouts on the paxos round, so it could be it is possible to reduce drastically the time we consider a node pending for CAS so that it's not a real problem in practice. Something like having pending node move to an "almost there" state before becoming true replica, and staying in that state for basically the max time of a paxos round, and then Paxos might be able to replace "pending" nodes by those "almost there" for PREPARE.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)