Posted to commits@cassandra.apache.org by "Peter Schuller (JIRA)" <ji...@apache.org> on 2011/04/13 00:20:05 UTC

[jira] [Commented] (CASSANDRA-2338) C* consistency level needs to be pluggable

    [ https://issues.apache.org/jira/browse/CASSANDRA-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019088#comment-13019088 ] 

Peter Schuller commented on CASSANDRA-2338:
-------------------------------------------

A related concern, about performance rather than consistency itself, is that one may wish to control to what extent requests are sent to more nodes than the consistency level requires.

For example, a very nice property of running with QUORUM + read-repair turned fully on (and let's say RF=3) is that for any request, it's totally fine for a single node to be slow, for example, without causing application-visible poor latency. If the dynamic snitch is engaged, the slow node should usually not be the one considered closest, so it's not the one asked for the data read.

Turning off read-repair negates that, since without read repair messages are sent only to as many nodes as the consistency level requires. So if any of the contacted nodes is slow, the request will be slow.
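
To make the difference concrete, here is a minimal sketch of the arithmetic in Python. The response times and the `quorum_latency` helper are made up for illustration and are not Cassandra code; the only assumption is that the coordinator returns once the required number of contacted replicas have replied.

```python
def quorum_latency(latencies_ms, contacted, required):
    """Time until `required` of the contacted replicas have replied."""
    replies = sorted(latencies_ms[node] for node in contacted)
    return replies[required - 1]

# Hypothetical per-replica response times in ms; node "c" is the slow one.
latencies = {"a": 5, "b": 7, "c": 400}

# Read repair on: all three replicas are messaged, QUORUM waits for the fastest two.
all_replicas = quorum_latency(latencies, ["a", "b", "c"], required=2)

# Read repair off: only two replicas are messaged. If the slow node is one of
# them, the request has to wait for it.
only_two = quorum_latency(latencies, ["a", "c"], required=2)

print(all_replicas, only_two)
```

With RF=3 and all replicas contacted, the slow node never sits on the critical path; with only two contacted, it does whenever it happens to be picked.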

Also related: for reads that are expected to be small, the digest-only optimization may be irrelevant. In many cases, the disk I/O and perhaps CPU cost is going to be a lot more relevant than the overhead of sending an extra ~40 bytes or so over the network. In such cases, it is probably often preferable to send read commands to all, or at least multiple, nodes, so that one does not depend on a specific node being up and fast in order to return the data.

For example, suppose I have a low-consistency situation where I care about good latency in terms of avoiding outliers. While CL.ONE is the typical suggestion, outliers could be avoided better if CL.ONE is used but each node is sent a full read command, such that the request can complete as soon as any node responds, without waiting for a timeout (or for a slow response that stays just short of timing out) from the node that happens to be considered closest.
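
A rough sketch of that "first response wins" behaviour, in Python rather than Cassandra's own code. `read_from` is a hypothetical stand-in for a full data read against one replica, and the delays are invented; the point is only that the caller returns as soon as any replica answers.

```python
import concurrent.futures as cf
import time

def read_from(node, delay_s, value):
    """Hypothetical stand-in for a full data read against one replica."""
    time.sleep(delay_s)
    return node, value

def read_first_response(replicas):
    """Send a full read to every replica and return whichever answers first."""
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(read_from, node, delay, value)
               for node, delay, value in replicas]
    try:
        done, _pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Do not block on the slower replicas; their replies are simply ignored.
        pool.shutdown(wait=False)

# "n1" is the node considered closest but happens to be slow this time.
winner, value = read_first_response(
    [("n1", 0.3, "row"), ("n2", 0.01, "row"), ("n3", 0.2, "row")]
)
print(winner, value)
```

The cost is that every replica does the full read, which is the trade-off described above: more disk I/O and CPU in exchange for not depending on any one node.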

This may be out of scope for this ticket, but it may be worth at least thinking about. If Cassandra can offer, at reasonable complexity for the application writer, detailed choices for all of these at the same time:

(1) Minimum number of endpoints required for consistency, at a per-DC level (to control consistency).
(2) Maximum number of endpoints allowed, at a per-DC level (to control latency).
(3) Pessimistic "over-messaging" (to control latency, in particular outliers).

... it should be enough to cover a great many cases (and from a PR perspective, assuming the complexity cost is not too high, it would really showcase what kind of detailed control and specific tuning is fundamentally possible given the data and messaging model).
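
As a thought experiment only, the three knobs could be expressed roughly like this. Every name here (`DcPolicy`, `plan_requests`, `Unavailable`) is hypothetical and nothing like it exists in Cassandra; the sketch assumes `max_contacted >= min_acks` and uses the ticket's example of RF DC1:3, DC2:2.

```python
from dataclasses import dataclass

class Unavailable(Exception):
    """Raised when a DC has too few live replicas to meet its minimum."""

@dataclass(frozen=True)
class DcPolicy:
    min_acks: int       # (1) fewest replies needed from this DC for consistency
    max_contacted: int  # (2) most replicas to message in this DC

def plan_requests(policies, live, over_message):
    """Choose which live replicas to message in each DC.

    over_message is knob (3): if True, message up to max_contacted replicas
    per DC; if False, message only the min_acks that consistency requires.
    """
    targets = {}
    for dc, policy in policies.items():
        nodes = live.get(dc, [])
        if len(nodes) < policy.min_acks:
            raise Unavailable(f"{dc}: need {policy.min_acks}, have {len(nodes)}")
        count = policy.max_contacted if over_message else policy.min_acks
        targets[dc] = nodes[:count]
    return targets

# Local quorum in DC1 plus at least one replica in DC2.
policies = {"DC1": DcPolicy(min_acks=2, max_contacted=3),
            "DC2": DcPolicy(min_acks=1, max_contacted=2)}
live = {"DC1": ["a", "b", "c"], "DC2": ["d"]}  # one DC2 replica is down

pessimistic = plan_requests(policies, live, over_message=True)
minimal = plan_requests(policies, live, over_message=False)
print(pessimistic, minimal)
```

Note that the request is still satisfiable with one DC2 replica down, which is the case EACH_QUORUM cannot handle in the ticket's description.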


> C* consistency level needs to be pluggable
> ------------------------------------------
>
>                 Key: CASSANDRA-2338
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2338
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Matthew F. Dennis
>            Priority: Minor
>
> For cases where people want to run C* across multiple DCs for disaster recovery et cetera, where normal operations only happen in the first DC (e.g. no writes/reads happen in the remote DC under normal operation), neither LOCAL_QUORUM nor EACH_QUORUM really suffices.
> Consider the case with RF of DC1:3 DC2:2
> LOCAL_QUORUM doesn't provide any guarantee that data is in the remote DC.
> EACH_QUORUM requires that both nodes in the remote DC are up.
> It would be useful in some situations to be able to specify a strategy where LOCAL_QUORUM is used for the local DC and at least one replica in a remote DC (and/or at least one in *each* remote DC).
