You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sergio Bossa (JIRA)" <ji...@apache.org> on 2016/06/22 18:41:24 UTC

[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

    [ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344958#comment-15344958 ] 

Sergio Bossa commented on CASSANDRA-9318:
-----------------------------------------

I would like to reopen this ticket and propose the following patch to implement coordinator-based back-pressure:

| [3.0 patch|https://github.com/apache/cassandra/compare/cassandra-3.0...sbtourist:CASSANDRA-9318-3.0?expand=1] | [testall|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-3.0-testall/] | [dtest|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-3.0-dtest/] |

The above patch provides full-blown, end-to-end, replica-to-coordinator-to-client *write* back-pressure, based on the following main concepts:
* The replica itself has no back-pressure knowledge: it keeps trying to write mutations as fast as possible, and still applies load shedding.
* The coordinator tracks the back-pressure state *per replica*, which in the current implementation consists of the incoming and outgoing rate of messages from/to the replica.
* The coordinator is configured with a back-pressure strategy that based on the back-pressure state, applies a given back-pressure algorithm when sending mutations to each replica.
* The provided default strategy is based on the incoming/outgoing message ratio, used to rate limit outgoing messages towards a given replica.
* The back-pressure strategy is also in charge of signalling the coordinator when a given replica is considered "overloaded", in which case an {{OverloadedException}} is thrown to the client for all mutation requests deemed as "overloading", until the strategy considers such overloaded state over.
* The provided default strategy uses configurable low/high thresholds to either rate limit or throw exception back to clients.

While all of that might seem too complex, the patch is actually surprisingly simple. I provided as many unit tests as possible, and I've also tested it on a 2-nodes CCM cluster, using [ByteMan|http://byteman.jboss.org/] to simulate a slow replica, and I'd say results are quite promising: as an example, see attached ByteMan script and plots showing a cluster with no back-pressure ending up dropping ~200k mutations, while a cluster with back-pressure enabled only ~2k, which means less coordinator overload and an easily recoverable replica state via hints.

I can foresee at least two open points:
* We might want to track more "back-pressure state" to allow implementing different strategies; I personally believe strategies based on in/out rates are the most appropriate ones to avoid *both* the overloading and dropped mutations problems, but people might think differently.
* When the {{OverloadedException}} is (eventually) thrown, some requests might have been already sent, which is exactly what currently happens with hint overloading too: we might want to check both kinds of overloading before actually sending any mutations to replicas.

Thoughts?

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)