Posted to commits@cassandra.apache.org by "Yifan Cai (Jira)" <ji...@apache.org> on 2019/12/04 19:39:00 UTC

[jira] [Commented] (CASSANDRA-15442) Read repair implicitly increases read timeout value

    [ https://issues.apache.org/jira/browse/CASSANDRA-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988141#comment-16988141 ] 

Yifan Cai commented on CASSANDRA-15442:
---------------------------------------

Read repair must be blocking to guarantee monotonic reads, per CASSANDRA-2494.

According to the discussion in CASSANDRA-14635, making the repair write asynchronous is not a use case under consideration.
----
h4. Proposed Fix

To respect the timeout the client expects, the blocking operation and the internode messages in each step should use only the *remaining* timeout. The write issued by read repair is still part of the read, so it should share the same timeout.

Step 2 in the timeline already adjusts the internode request timeout to the remaining time.

The proposed fix is for step 3 to also use the remaining timeout, instead of a separate {{WriteRPCTimeout}}.
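
A minimal sketch of the "remaining timeout" idea, assuming a single deadline computed when the read starts. {{ReadDeadline}} and its methods are hypothetical names used for illustration only; they are not Cassandra classes, and the clock values are fixed so the example is deterministic.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not a Cassandra class): one deadline shared by every step of a read. */
final class ReadDeadline {
    private final long deadlineNanos;

    ReadDeadline(long startNanos, long timeout, TimeUnit unit) {
        this.deadlineNanos = startNanos + unit.toNanos(timeout);
    }

    /** Budget left at the given instant, never negative. */
    long remainingNanos(long nowNanos) {
        return Math.max(0L, deadlineNanos - nowNanos);
    }

    public static void main(String[] args) {
        // Assumed example value: a 5 s read timeout, with the request starting at t = 0.
        ReadDeadline deadline = new ReadDeadline(0L, 5, TimeUnit.SECONDS);

        // Suppose step 1 (regular read) finished 3 s after the request started.
        long afterStep1 = TimeUnit.SECONDS.toNanos(3);
        // Step 2 (read repair read) may then wait only for the remaining 2000 ms.
        System.out.println("step 2 budget (ms): "
                + TimeUnit.NANOSECONDS.toMillis(deadline.remainingNanos(afterStep1)));

        // Suppose step 2 finished 4.5 s after the request started.
        long afterStep2 = TimeUnit.MILLISECONDS.toNanos(4500);
        // Step 3 (read repair write) gets the remaining 500 ms, not a separate write timeout.
        System.out.println("step 3 budget (ms): "
                + TimeUnit.NANOSECONDS.toMillis(deadline.remainingNanos(afterStep2)));
    }
}
```

In real code the instants would come from a monotonic clock such as {{System.nanoTime()}}, and the remaining value would be passed to whatever blocking wait each step performs.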
h4. The Impact

- With the existing {{ReadRPCTimeout}}, read timeouts caused by blocking read repair may occur more frequently. The read timeout may need to be configured higher to allow the blocking read repair to complete. In effect, the higher value reflects the actual time taken; the time spent on the write is simply not counted toward the read as of now.
- Increasing the read timeout also lets genuinely slow read queries (with no read repair) run longer, which negatively impacts throughput.
 

> Read repair implicitly increases read timeout value
> ---------------------------------------------------
>
>                 Key: CASSANDRA-15442
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15442
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>
> When read repair occurs during a read, internally, it starts several _blocking_ operations in sequence. See {{org.apache.cassandra.service.StorageProxy#fetchRows}}. 
>  The timeline of the blocking operations:
>  # Regular read, wait for full data/digest read response to complete. {{reads[*].awaitResponses();}}
>  # Read repair read, wait for full data read response to complete. {{reads[*].awaitReadRepair();}}
>  # Read repair write, wait for write response to complete. {{concatAndBlockOnRepair(results, repairs);}}
> Steps 1 and 2 each wait for up to the read timeout, say 5 s.
>  Step 3 waits for up to the write timeout, say 2 s.
>  In the worst case, the actual time taken for a read could accumulate to ~12 s, as long as no individual step exceeds its own timeout.
>  From the client's perspective, a request is not expected to take much longer than the timeout configured in the database.
>  This scenario is especially bad for clients that have set up client-side timeout monitoring close to the configured value. Such clients think the operation timed out and abort, but it is in fact still running on the server.
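
The worst-case figure in the description can be checked with a short calculation. This is only an illustrative sketch using the example values from the description (5 s read timeout, 2 s write timeout); {{WorstCaseReadLatency}} and its methods are hypothetical names, not part of Cassandra.

```java
/** Hypothetical illustration of how per-step timeouts add up. */
final class WorstCaseReadLatency {

    /** Steps 1 and 2 each wait up to the read timeout, step 3 up to the write timeout. */
    static long independentTimeoutsMillis(long readTimeoutMillis, long writeTimeoutMillis) {
        return readTimeoutMillis + readTimeoutMillis + writeTimeoutMillis;
    }

    /** With one shared deadline, the three steps together wait no longer than the read timeout. */
    static long sharedDeadlineMillis(long readTimeoutMillis) {
        return readTimeoutMillis;
    }

    public static void main(String[] args) {
        long readTimeoutMillis = 5_000;  // example value from the description
        long writeTimeoutMillis = 2_000; // example value from the description

        // prints 12000
        System.out.println(independentTimeoutsMillis(readTimeoutMillis, writeTimeoutMillis));
        // prints 5000
        System.out.println(sharedDeadlineMillis(readTimeoutMillis));
    }
}
```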



