Posted to commits@cassandra.apache.org by "ZhaoYang (Jira)" <ji...@apache.org> on 2020/04/29 16:29:00 UTC

[jira] [Created] (CASSANDRA-15774) Improve range reads to query by endpoints instead of vnodes to reduce number of remote requests

ZhaoYang created CASSANDRA-15774:
------------------------------------

             Summary: Improve range reads to query by endpoints instead of vnodes to reduce number of remote requests
                 Key: CASSANDRA-15774
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15774
             Project: Cassandra
          Issue Type: Improvement
          Components: Legacy/Coordination
            Reporter: ZhaoYang


Currently, range reads query in batches; see {{StorageProxy.RangeCommandIterator#sendNextRequests()}}. For each batch, the coordinator computes a list of merged vnode ranges, up to the concurrency factor, and queries each merged vnode range asynchronously. (Note: consecutive vnode ranges can be merged if they share enough replicas to satisfy the consistency level requirement.)

This works fine in general, but when the concurrency factor is high, because the returned row count is small compared to the query limit or because index filtering is used, the coordinator may send too many concurrent remote range requests in a batch.

We can improve this by grouping remote range requests by endpoint, so that each endpoint returns a response covering multiple non-consecutive ranges. With endpoint grouping, the number of remote range requests should be largely reduced, and it is always capped by the number of nodes in the cluster instead of by the number of ranges, which is capped by the concurrency factor.

Let's look at an example on a 5-node cluster with 10 ranges (a, b, c, d, e, f, g, h, i, j) and RF=3.

The following is the range-to-replica mapping using round robin, which should work well with the consecutive range merger (the consecutive range merger doesn't work well with a fully random replica mapping, because consecutive ranges are then less likely to have overlapping replicas):
{code:java}
   range-a replicas: 1, 2, 3
   range-b replicas: 2, 3, 4
   range-c replicas: 3, 4, 5
   range-d replicas: 1, 4, 5
   range-e replicas: 1, 2, 5
   range-f replicas: 1, 2, 3
   range-g replicas: 2, 3, 4
   range-h replicas: 3, 4, 5
   range-i replicas: 1, 4, 5
   range-j replicas: 1, 2, 5
{code}
With the default range read implementation and the consecutive range merger, we need 10 replica read requests (2 for each merged range) for quorum:
{code:java}
     range (a,b] on node [2, 3]
     range (c,d] on node [4, 5]
     range (e,f] on node [1, 2]
     range (g,h] on node [3, 4]
     range (i,j] on node [1, 5]
{code}
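The merged ranges above can be reproduced with a small sketch. This is only an illustration of the merge rule described earlier (keep extending a run of consecutive ranges while the replicas shared by all of them still satisfy the consistency level); it is not the actual Cassandra merger. The class and method names are hypothetical, ranges are reduced to letters, and quorum for RF=3 is passed in as {{required = 2}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only; not Cassandra's actual range merger.
class ConsecutiveRangeMergerSketch {

    /** A run of consecutive ranges plus the replicas shared by all of them. */
    static final class MergedRange {
        final List<String> ranges = new ArrayList<>();
        final TreeSet<Integer> sharedReplicas = new TreeSet<>();
    }

    /** Round-robin mapping from the example: 5 nodes, 10 ranges, RF=3. */
    static Map<String, List<Integer>> exampleReplicas() {
        Map<String, List<Integer>> m = new LinkedHashMap<>();
        m.put("a", Arrays.asList(1, 2, 3));
        m.put("b", Arrays.asList(2, 3, 4));
        m.put("c", Arrays.asList(3, 4, 5));
        m.put("d", Arrays.asList(1, 4, 5));
        m.put("e", Arrays.asList(1, 2, 5));
        m.put("f", Arrays.asList(1, 2, 3));
        m.put("g", Arrays.asList(2, 3, 4));
        m.put("h", Arrays.asList(3, 4, 5));
        m.put("i", Arrays.asList(1, 4, 5));
        m.put("j", Arrays.asList(1, 2, 5));
        return m;
    }

    /**
     * Walks the ranges in ring order and extends the current run while the
     * replicas common to the whole run still number at least `required`.
     */
    static List<MergedRange> merge(Map<String, List<Integer>> replicasByRange, int required) {
        List<MergedRange> merged = new ArrayList<>();
        MergedRange current = null;
        for (Map.Entry<String, List<Integer>> e : replicasByRange.entrySet()) {
            if (current != null) {
                TreeSet<Integer> shared = new TreeSet<>(current.sharedReplicas);
                shared.retainAll(e.getValue());
                if (shared.size() >= required) {
                    current.ranges.add(e.getKey());
                    current.sharedReplicas.retainAll(shared);
                    continue;
                }
            }
            // Not enough overlap (or first range): start a new run.
            current = new MergedRange();
            current.ranges.add(e.getKey());
            current.sharedReplicas.addAll(e.getValue());
            merged.add(current);
        }
        return merged;
    }

    /** Each merged range is read from `required` replicas. */
    static int requestCount(List<MergedRange> merged, int required) {
        return merged.size() * required;
    }
}
```

Calling {{merge(exampleReplicas(), 2)}} gives five runs (a-b, c-d, e-f, g-h, i-j) with shared replicas [2, 3], [4, 5], [1, 2], [3, 4] and [1, 5], so {{requestCount}} is 10, matching the list above.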
With queries grouped by endpoint, we only need 4 replica read requests for quorum:
{code:java}
    * node 1: a, d, e, f, i, j
    * node 2: a, b, e, f, g, j
    * node 3: b, c, g, h
    * node 4: c, d, h, i
{code}
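One possible way to arrive at such a grouping is a greedy heuristic: repeatedly pick the endpoint that can serve the most ranges still short of the required replica count. The sketch below is an assumption made for illustration, not the algorithm proposed for the patch; the class and method names are hypothetical, and short-read protection is ignored. On the example mapping it also ends up with 4 endpoints, although the per-endpoint assignment differs slightly from the one listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; a greedy heuristic, not the proposed patch.
class EndpointGroupingSketch {

    /** Round-robin mapping from the example: 5 nodes, 10 ranges, RF=3. */
    static Map<String, List<Integer>> exampleReplicas() {
        Map<String, List<Integer>> m = new LinkedHashMap<>();
        m.put("a", Arrays.asList(1, 2, 3));
        m.put("b", Arrays.asList(2, 3, 4));
        m.put("c", Arrays.asList(3, 4, 5));
        m.put("d", Arrays.asList(1, 4, 5));
        m.put("e", Arrays.asList(1, 2, 5));
        m.put("f", Arrays.asList(1, 2, 3));
        m.put("g", Arrays.asList(2, 3, 4));
        m.put("h", Arrays.asList(3, 4, 5));
        m.put("i", Arrays.asList(1, 4, 5));
        m.put("j", Arrays.asList(1, 2, 5));
        return m;
    }

    /**
     * Returns one request per chosen endpoint, listing the ranges it should read,
     * such that every range is read from exactly `required` of its replicas.
     */
    static Map<Integer, List<String>> groupByEndpoint(Map<String, List<Integer>> replicasByRange,
                                                      int required) {
        // How many more replicas each range still needs.
        Map<String, Integer> missing = new LinkedHashMap<>();
        // Ranges replicated by each endpoint, in ring order.
        Map<Integer, List<String>> owned = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : replicasByRange.entrySet()) {
            missing.put(e.getKey(), required);
            for (int endpoint : e.getValue())
                owned.computeIfAbsent(endpoint, k -> new ArrayList<>()).add(e.getKey());
        }

        Map<Integer, List<String>> requests = new LinkedHashMap<>();
        while (missing.values().stream().anyMatch(n -> n > 0)) {
            Integer best = null;
            List<String> bestRanges = new ArrayList<>();
            for (Map.Entry<Integer, List<String>> e : owned.entrySet()) {
                if (requests.containsKey(e.getKey()))
                    continue; // each endpoint receives at most one request
                List<String> useful = new ArrayList<>();
                for (String range : e.getValue())
                    if (missing.get(range) > 0)
                        useful.add(range);
                // Strict '>' keeps the lowest endpoint id on ties.
                if (useful.size() > bestRanges.size()) {
                    best = e.getKey();
                    bestRanges = useful;
                }
            }
            if (best == null)
                throw new IllegalStateException("not enough replicas to satisfy the consistency level");
            for (String range : bestRanges)
                missing.merge(range, -1, Integer::sum);
            requests.put(best, bestRanges);
        }
        return requests;
    }
}
```

Because each endpoint receives at most one request, the number of requests can never exceed the number of nodes, regardless of how many ranges the batch covers.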
 
Note that there are some complexities around short-read protection, which needs to know whether a replica has more rows available for the current range.
 


