Posted to commits@cassandra.apache.org by "Xiaolong Jiang (JIRA)" <ji...@apache.org> on 2017/03/13 23:43:41 UTC
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking

    [ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923200#comment-15923200 ] 

Xiaolong Jiang commented on CASSANDRA-10726:
--------------------------------------------

The patch tries to do 2 things:
1. Before this patch: take a quorum read with RF = 3 (replica1, replica2, replica3). The client request reads from 2 replicas (replica1, replica2). If there is a digest mismatch between these 2 replicas, read repair kicks in. Say the stale data is on replica2: read repair sends the correct data to replica2. If that write request times out for some reason, we send a "read timeout" to the client.
After this patch, we wait for the replica2 write for some time. If it doesn't come back, we send the correct data to replica3, whether or not replica3 already has the latest data. If the replica3 write succeeds, 2 replicas are guaranteed to have the correct data, so the read request returns success with data to the client, and the next quorum read will definitely read the correct data.

2. The second thing this patch does is make sure that, in the read repair path, we don't block on more replicas than the consistency level needs before replying, in the speculative retry and read repair chance cases. For example, take the same RF = 3 quorum read as above. It tries to read from replica1 and replica2, but replica2 is slow, so speculative retry kicks in and the read also goes to replica3. All 3 replica reads then come back, but there is a digest mismatch: both replica2 and replica3 have stale data. Before this patch, read repair blocks until both replica2 and replica3 finish the repair. There is no need to wait for both: one successful repair is enough to guarantee a successful quorum read, and the next quorum read will definitely read the latest data even if the replica3 read repair failed. The same applies to read repair chance. Say the read repair chance is "GLOBAL": we don't need to block for all replicas to finish repair, only for as many as the read consistency level needs.
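
To make the two cases above concrete, here is a small Python model of the intended behavior. It is only a sketch of the reasoning, not the patch's code: the function names (quorum, repair_acks_needed, read_succeeds) and the replica labels are hypothetical, and timeouts are reduced to a set of replicas that acknowledge the repair write in time.

```python
def quorum(rf: int) -> int:
    """Number of replicas a quorum read must hear from."""
    return rf // 2 + 1


def repair_acks_needed(block_for: int, up_to_date: int) -> int:
    """Repair acks to wait for so that block_for replicas hold the latest data."""
    return max(0, block_for - up_to_date)


def read_succeeds(rf, up_to_date, stale, spare, responsive):
    """Hypothetical model of a quorum read that hit a digest mismatch.

    up_to_date: contacted replicas that returned the latest data
    stale:      contacted replicas that returned stale data (repaired first)
    spare:      replicas the read did not contact (repaired only as a fallback)
    responsive: replicas that acknowledge a repair write before the timeout
    """
    needed = repair_acks_needed(quorum(rf), len(up_to_date))
    # Only count acks from stale replicas that answer in time.
    acked = [r for r in stale if r in responsive]
    if len(acked) < needed:
        # Case 1: not enough acks, so also send the data to a spare replica.
        acked += [r for r in spare if r in responsive]
    # Case 2: succeed as soon as enough acks arrive, not when all arrive.
    return len(acked) >= needed


# Case 1: replica2 is stale and drops the repair write, replica3 accepts it.
print(read_succeeds(3, {"replica1"}, ["replica2"], ["replica3"], {"replica3"}))

# Case 2: replica2 and replica3 are both stale, only replica2 acks in time.
print(read_succeeds(3, {"replica1"}, ["replica2", "replica3"], [], {"replica2"}))
```

In both calls one up-to-date replica plus one acknowledged repair gives the 2 replicas a quorum of RF = 3 requires, so the model reports success. Passing an empty spare list in case 1 mimics the old behavior, where the replica2 timeout fails the read.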

> Read repair inserts should not be blocking
> ------------------------------------------
>
>                 Key: CASSANDRA-10726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Richard Low
>            Assignee: Xiaolong Jiang
>             Fix For: 3.0.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert to update out of date replicas is blocking. This means, if it fails, the read fails with a timeout. If a node is dropping writes (maybe it is overloaded or the mutation stage is backed up for some other reason), all reads to a replica set could fail. Further, replicas dropping writes get more out of sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not be blocking or we should return success for the read even if the write times out.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)