Posted to commits@cassandra.apache.org by "Branimir Lambov (JIRA)" <ji...@apache.org> on 2015/03/25 17:32:53 UTC

[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

    [ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380174#comment-14380174 ] 

Branimir Lambov commented on CASSANDRA-7032:
--------------------------------------------

Patch is up for review [here|https://github.com/apache/cassandra/compare/trunk...blambov:7032-vnode-assignment]. It adds the option to specify an "allocate_tokens_keyspace" when bringing up a node. The node's tokens are then allocated to optimize the load distribution for the replication strategy of that keyspace.
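Optimizing for a keyspace's replication strategy means looking at how much data each node stores once replicas are counted, not only at its primary token ranges. A minimal sketch of that measure for SimpleStrategy, assuming a ring normalized to [0, 1) instead of the Murmur3 token range; `replicated_ownership` is a hypothetical helper, not code from the patch:

```python
from collections import defaultdict

def replicated_ownership(tokens, rf):
    """Fraction of a normalized [0, 1) ring that each node stores under SimpleStrategy.

    tokens: dict mapping token -> node. The range (previous token, token] is stored
    on the node owning `token` and on the next distinct nodes clockwise, up to rf.
    """
    ring = sorted(tokens)
    replicas_per_range = min(rf, len(set(tokens.values())))
    owned = defaultdict(float)
    for i, token in enumerate(ring):
        # Wrap-around: the first range starts at the last token on the ring.
        size = (token - ring[i - 1]) % 1.0 if len(ring) > 1 else 1.0
        replicas, j = [], i
        while len(replicas) < replicas_per_range:
            node = tokens[ring[j % len(ring)]]
            if node not in replicas:  # a node with several vnodes counts once
                replicas.append(node)
            j += 1
        for node in replicas:
            owned[node] += size
    return dict(owned)

balanced = {0.0: "A", 0.25: "B", 0.5: "C", 0.75: "D"}
skewed = {0.0: "A", 0.1: "B", 0.5: "C"}
print(replicated_ownership(balanced, 2))  # every node stores half of the ring
print(replicated_ownership(skewed, 2))
```

In the skewed three-node example with RF 2, A ends up storing roughly 0.9 of the ring against 0.5 for C, which is the kind of imbalance a replication-aware allocator is meant to avoid.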

The allocation is currently restricted to Murmur3Partitioner and to SimpleStrategy or NetworkTopologyStrategy (is there anything else we need to support?). With NetworkTopologyStrategy it cannot deal with datacentres where the number of racks is more than one but less than the replication factor, which should not be a common case.
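Why the 1 < racks < RF case is awkward can be seen from rack-aware replica placement. The sketch below is a simplified, single-datacentre version of the NetworkTopologyStrategy walk: nodes on racks that already hold a replica are held back until every rack is covered, then used in ring order. Assumptions: tokens are given as `(token, node, rack)` tuples sorted by token, and `nts_replicas` is a hypothetical helper.

```python
def nts_replicas(ring, start, rf):
    """Replicas for the range ending at ring[start], within a single datacentre.

    ring: list of (token, node, rack) tuples sorted by token. Simplified sketch of
    rack-aware placement: walk clockwise, take nodes on racks not used yet, and
    hold back nodes on already-used racks until every rack holds a replica.
    """
    all_racks = {rack for _, _, rack in ring}
    target = min(rf, len({node for _, node, _ in ring}))
    replicas, seen_racks, skipped = [], set(), []
    for step in range(len(ring)):
        if len(replicas) >= target:
            break
        _, node, rack = ring[(start + step) % len(ring)]
        if node in replicas or node in skipped:
            continue  # a node with several vnodes counts once
        if rack not in seen_racks:
            replicas.append(node)
            seen_racks.add(rack)
            if seen_racks == all_racks:
                # Every rack is covered: held-back nodes are used in ring order.
                while skipped and len(replicas) < target:
                    replicas.append(skipped.pop(0))
        elif seen_racks == all_racks:
            replicas.append(node)
        else:
            skipped.append(node)
    return replicas

ring = [(0, "n1", "r1"), (1, "n2", "r1"), (2, "n3", "r1"), (3, "n4", "r2")]
print(nts_replicas(ring, 0, 3))  # ['n1', 'n4', 'n2']
```

With racks r1, r1, r1, r2 and RF 3, the range ending at n1 is stored on n1, n4 and n2 rather than on the next three nodes, so a node's load depends on where the other rack's nodes sit. With one rack the walk reduces to SimpleStrategy, and with at least RF racks every replica lands on a different rack; the mixed case fits neither simpler model, which plausibly explains why it is harder to optimize.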

There are a couple of things still left to do or explore, possibly in separate patches:
- add a dtest starting several nodes with allocation
- run cstar_perf to see whether it shows an improvement for RF 2 in a 3-node cluster
- optimization of the selection for the first RF nodes in the cluster to guarantee good distribution later (see ReplicationAwareTokenAllocator.testNewCluster)
- (if deemed worthwhile) multiple different replication factors in one datacentre; the current code works ok when asked to allocate for them alternately, but this could be improved by considering all relevant strategies in parallel

> Improve vnode allocation
> ------------------------
>
>                 Key: CASSANDRA-7032
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>              Labels: performance, vnodes
>             Fix For: 3.0
>
>         Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java
>
>
> It's been known for a little while that random vnode allocation causes hotspots of ownership. It should be possible to improve dramatically on this with deterministic allocation. I have quickly thrown together a simple greedy algorithm that allocates vnodes efficiently, gradually repairs hotspots in a randomly allocated cluster as more nodes are added, and also ensures that token ranges are fairly evenly spread between nodes (somewhat tunably so). The allocation still permits slight discrepancies in ownership, but they are bounded by the inverse of the size of the cluster (as opposed to random allocation, which strangely gets worse as the cluster size increases). I'm sure there is a decent dynamic programming solution to this that would be even better.
> If, on joining the ring, a new node ran this (or a similar) algorithm and then used CAS to update a shared table holding the canonical allocation of token ranges, we could get guaranteed bounds on the ownership distribution in a cluster. This will also help for CASSANDRA-6696.
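The attached TestVNodeAllocation.java is not reproduced here. As a stand-in, a much simpler greedy heuristic (always split the currently largest token range at its midpoint) already shows how deterministic allocation avoids the hotspots of random allocation. This is not the algorithm from the attachment; assumptions are a ring normalized to [0, 1), RF 1, the first node's vnodes evenly spaced, and hypothetical helper names throughout.

```python
import random

RING = 1.0  # hypothetical normalized ring [0, 1) standing in for the Murmur3 token range

def ranges(tokens):
    """Map each token to the size of the range (previous token, token] it owns."""
    ring = sorted(tokens)
    if len(ring) == 1:
        return {ring[0]: RING}
    return {t: (t - ring[i - 1]) % RING for i, t in enumerate(ring)}

def ownership(tokens):
    """Fraction of the ring owned by each node with RF 1; tokens maps token -> node."""
    owned = {}
    for t, size in ranges(tokens).items():
        owned[tokens[t]] = owned.get(tokens[t], 0.0) + size
    return owned

def imbalance(tokens):
    """Ratio between the most and least loaded node (1.0 is perfect)."""
    owned = ownership(tokens).values()
    return max(owned) / min(owned)

def random_allocate(num_nodes, vnodes, seed=0):
    """Baseline: every vnode gets a uniformly random token."""
    rng = random.Random(seed)
    tokens = {}
    for n in range(num_nodes):
        for _ in range(vnodes):
            tokens[rng.random()] = f"node{n}"
    return tokens

def greedy_allocate(num_nodes, vnodes):
    """Greedy stand-in: each new vnode splits the currently largest range in half."""
    tokens = {i / vnodes: "node0" for i in range(vnodes)}  # first node: evenly spaced
    for n in range(1, num_nodes):
        for _ in range(vnodes):
            sizes = ranges(tokens)
            end = max(sizes, key=sizes.get)  # token closing the largest range
            tokens[(end - sizes[end] / 2) % RING] = f"node{n}"  # its midpoint
    return tokens

for n in (3, 6, 12, 24):
    print(n, round(imbalance(random_allocate(n, 8)), 2), round(imbalance(greedy_allocate(n, 8)), 2))
```

Every node keeps exactly one range per token, and always splitting the largest range keeps all range sizes within a factor of two of each other, so per-node ownership under this heuristic never differs by more than 2x and is exact whenever all ranges are equal (for example 4 nodes with 8 vnodes each). That is weaker than the 1/N bound described for the algorithm above; random allocation has no such bound at all.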

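The CAS step suggested in the issue description is an optimistic-concurrency loop: read the canonical allocation together with its version, compute new tokens, and write back only if nobody else has written in the meantime. A minimal in-memory sketch, where `AllocationTable`, `join_ring` and the toy allocator `next_token` are hypothetical; in Cassandra itself the conditional write would presumably be a lightweight transaction (an `UPDATE ... IF` statement).

```python
import threading

class AllocationTable:
    """Hypothetical stand-in for a shared table holding the canonical token allocation."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0
        self.tokens = {}  # token -> node

    def read(self):
        with self._lock:
            return self.version, dict(self.tokens)

    def compare_and_set(self, expected_version, new_tokens):
        """Install new_tokens only if the table is still at expected_version."""
        with self._lock:
            if self.version != expected_version:
                return False
            self.tokens = dict(new_tokens)
            self.version += 1
            return True

def join_ring(table, node, allocate):
    """Allocate tokens for `node` against the canonical state, retrying on conflict."""
    while True:
        version, tokens = table.read()
        proposed = allocate(tokens, node)
        if table.compare_and_set(version, proposed):
            return proposed

def next_token(tokens, node):
    """Toy allocator: two nodes reading the same version pick the same token,
    so the CAS forces one of them to re-read and retry."""
    return {**tokens, float(len(tokens)): node}

table = AllocationTable()
workers = [threading.Thread(target=join_ring, args=(table, f"n{i}", next_token)) for i in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(table.read())
```

Without the version check, two concurrently joining nodes could both claim the same token and one would silently overwrite the other's allocation.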


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)