Posted to commits@cassandra.apache.org by "Christian Spriegel (JIRA)" <ji...@apache.org> on 2018/06/18 15:32:00 UTC

[jira] [Commented] (CASSANDRA-14480) Digest mismatch requires all replicas to be responsive

    [ https://issues.apache.org/jira/browse/CASSANDRA-14480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515889#comment-16515889 ] 

Christian Spriegel commented on CASSANDRA-14480:
------------------------------------------------

I just saw this happening in a production system:

 
{noformat}
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ALL (8 responses were required but only 7 replica responded){noformat}
Our queries use LOCAL_QUORUM, but we are getting RTEs caused by read-repair. read_repair_chance = 0.1 is set, so the read-repair goes cross-DC :(
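
To illustrate why only some LOCAL_QUORUM reads end up waiting on replicas in other DCs, here is a minimal sketch of a per-read read-repair decision. It assumes one uniform random draw per read, compared first against read_repair_chance and then against dclocal_read_repair_chance. The class name, method name, and the dclocal value of 0.0 are hypothetical, and this is a simplified model rather than Cassandra's actual code.

```java
import java.util.concurrent.ThreadLocalRandom;

// Simplified, hypothetical model of a per-read read-repair decision.
// Not Cassandra's implementation; names are made up for illustration.
class ReadRepairDecisionSketch {
    enum Decision { NONE, DC_LOCAL, GLOBAL }

    // draw is assumed to be uniform in [0, 1).
    static Decision decide(double draw, double readRepairChance, double dcLocalReadRepairChance) {
        if (readRepairChance > draw) {
            return Decision.GLOBAL;   // contact replicas in all DCs
        }
        if (dcLocalReadRepairChance > draw) {
            return Decision.DC_LOCAL; // contact all replicas in the local DC only
        }
        return Decision.NONE;         // contact only what the consistency level needs
    }

    public static void main(String[] args) {
        double readRepairChance = 0.1; // value from the comment above
        double dcLocalChance = 0.0;    // assumption: not stated in the comment
        int reads = 100_000;
        int global = 0;
        for (int i = 0; i < reads; i++) {
            double draw = ThreadLocalRandom.current().nextDouble();
            if (decide(draw, readRepairChance, dcLocalChance) == Decision.GLOBAL) {
                global++;
            }
        }
        // Roughly 10% of reads are expected to become cross-DC read-repair candidates.
        System.out.printf("global read-repair share: %.3f%n", (double) global / reads);
    }
}
```

Under this model, about one read in ten contacts every replica in every DC, and a digest mismatch on such a read would explain a timeout reported at consistency ALL.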

 

> Digest mismatch requires all replicas to be responsive
> ------------------------------------------------------
>
>                 Key: CASSANDRA-14480
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14480
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Christian Spriegel
>            Priority: Major
>         Attachments: Reader.java, Writer.java, schema_14480.cql
>
>
> I ran across a scenario where a digest mismatch causes a read-repair that requires all up nodes to be able to respond. If one of these nodes is not responding, then the read-repair times out and is reported to the client as a ReadTimeoutException.
>  
> My expectation would be that a CL=QUORUM read will always succeed as long as 2 nodes are responding. But unfortunately the third node being "up" in the ring, but not being able to respond, does lead to an RTE.
>  
>  
> I came up with a scenario that reproduces the issue:
>  # set up a 3 node cluster using ccm
>  # increase the phi_convict_threshold to 16, so that nodes are permanently reported as up
>  # create attached schema
>  # run the attached reader & writer (which only connect to node1 & node2). This should already produce digest mismatches
>  # do a "ccm node3 pause"
>  # The reader will report a read-timeout with consistency QUORUM (2 responses were required but only 1 replica responded). Within the DigestMismatchException catch-block it can be seen that the repairHandler is waiting for 3 responses, even though the exception says that 2 responses are required.
>  
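
The arithmetic behind the reproduction steps above can be sketched as follows. This is a minimal illustration that assumes a replication factor of 3 (suggested by the 3-node cluster and the repair handler waiting for 3 responses, but not stated explicitly) and uses hypothetical class and method names. It only compares response counts and does not model Cassandra's actual handlers.

```java
// Hypothetical sketch: compares how many responses a QUORUM read needs
// with how many a repair that waits on every contacted replica needs.
class QuorumSketch {
    // Standard quorum size for a given replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // A wait succeeds only if at least blockFor replicas answer in time.
    static boolean satisfied(int responsiveReplicas, int blockFor) {
        return responsiveReplicas >= blockFor;
    }

    public static void main(String[] args) {
        int replicationFactor = 3;  // assumption: RF = 3 on the 3-node ccm cluster
        int responsiveReplicas = 2; // node3 is paused but still reported as up

        int quorumBlockFor = quorum(replicationFactor); // 2
        int repairBlockFor = replicationFactor;         // repair waits for all 3 contacted replicas

        System.out.println("QUORUM blockFor=" + quorumBlockFor
                + " satisfied=" + satisfied(responsiveReplicas, quorumBlockFor));
        System.out.println("repair blockFor=" + repairBlockFor
                + " satisfied=" + satisfied(responsiveReplicas, repairBlockFor));
    }
}
```

With two responsive replicas the QUORUM requirement is met, while a wait on all three replicas cannot complete, which matches the timeout described in the last step.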



