Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2014/12/03 11:39:15 UTC

[jira] [Comment Edited] (CASSANDRA-7032) Improve vnode allocation

    [ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232857#comment-14232857 ] 

Benedict edited comment on CASSANDRA-7032 at 12/3/14 10:38 AM:
---------------------------------------------------------------

Well, NetworkTopologyStrategy already enforces some degree of balance across racks, and absolutely guarantees balance across DCs as far as replication ownership is concerned. It _would_ be nice to migrate this behaviour to the token selection so that we could reason about ownership a bit more clearly (NTS might enforce our general ownership constraints, but having a predictably cheap generation strategy for end points would be great, as the amount of state necessary to route queries could shrink dramatically if, for instance, we could rely on a sequence of adjacent tokens ensuring these properties). However, the simpler goal of ensuring that, for any given arbitrary slice of the global token range, all nodes have a share of the range that is within epsilon of perfect should be more than sufficient.

TL;DR; our goal should probably be: "for any given arbitrary slice of the global token range, all nodes have a share of the range that is within epsilon* of perfect"

* with epsilon probably inversely proportional to the size of the slice
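
To make the stated goal concrete, here is a minimal sketch of how the per-node share of a slice might be measured. It is only an illustration under several assumptions: it counts primary ownership only (replication and NTS placement are ignored), each token owns the range (previous token, token], the slice does not wrap around the ring, and the names ownership_in_slice and max_imbalance are hypothetical rather than anything in Cassandra.

```python
from collections import defaultdict


def ownership_in_slice(tokens, start, end, ring_size):
    """Return {node: width} of primary ownership inside the slice (start, end].

    tokens    -- iterable of (token, node) pairs, tokens in [0, ring_size)
    start/end -- slice bounds with 0 <= start < end <= ring_size (no wrap-around)
    """
    ordered = sorted(tokens)
    shares = defaultdict(int)
    # The first token's range starts at the last token, wrapped below zero.
    prev = ordered[-1][0] - ring_size
    for token, node in ordered:
        lo, hi = max(prev, start), min(token, end)
        if hi > lo:
            shares[node] += hi - lo
        prev = token
    # Anything past the last token wraps around to the first token's owner.
    last = ordered[-1][0]
    if end > last:
        shares[ordered[0][1]] += end - max(last, start)
    return dict(shares)


def max_imbalance(tokens, start, end, ring_size):
    """Largest relative deviation of any node's share from a perfect split."""
    shares = ownership_in_slice(tokens, start, end, ring_size)
    nodes = {node for _, node in tokens}
    ideal = (end - start) / len(nodes)
    return max(abs(shares.get(node, 0) - ideal) / ideal for node in nodes)
```

Under these assumptions, the goal amounts to keeping max_imbalance below some epsilon for every slice, with the permitted epsilon depending on the slice width as noted in the footnote above.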


> Improve vnode allocation
> ------------------------
>
>                 Key: CASSANDRA-7032
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>              Labels: performance, vnodes
>             Fix For: 3.0
>
>         Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java
>
>
> It's been known for a little while that random vnode allocation causes hotspots of ownership. It should be possible to improve dramatically on this with deterministic allocation. I have quickly thrown together a simple greedy algorithm that allocates vnodes efficiently, and will repair hotspots in a randomly allocated cluster gradually as more nodes are added, and also ensures that token ranges are fairly evenly spread between nodes (somewhat tunably so). The allocation still permits slight discrepancies in ownership, but they are bounded by the inverse of the size of the cluster (as opposed to random allocation, which strangely gets worse as the cluster size increases). I'm sure there is a decent dynamic programming solution to this that would be even better.
> If on joining the ring a new node were to CAS a shared table where a canonical allocation of token ranges lives after running this (or a similar) algorithm, we could then get guaranteed bounds on the ownership distribution in a cluster. This will also help for CASSANDRA-6696.
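
For readers unfamiliar with the idea, the following is a rough sketch of what a greedy, deterministic allocation could look like. It is not the algorithm in the attached TestVNodeAllocation.java; it is a deliberately simple stand-in in which each new vnode token is placed at the midpoint of the widest existing range. The function name add_node_greedy and the tiny ring size used in the example are assumptions made purely for illustration, and the sketch ignores replication and which node currently owns the range being split.

```python
import bisect


def add_node_greedy(tokens, node, vnodes, ring_size):
    """Return a new sorted ring with `vnodes` tokens added for `node`.

    tokens -- iterable of (token, node) pairs, tokens in [0, ring_size)
    Each new token bisects the widest range currently on the ring
    (a simple heuristic, assumed here only for illustration).
    """
    ring = sorted(tokens)
    for _ in range(vnodes):
        if not ring:
            ring.append((0, node))  # arbitrary starting point on an empty ring
            continue
        best_width, best_start = -1, 0
        for i, (token, _) in enumerate(ring):
            prev = ring[i - 1][0]  # i == 0 wraps to the last token
            # A single token owns the whole ring, hence the `or ring_size`.
            width = (token - prev) % ring_size or ring_size
            if width > best_width:
                best_width, best_start = width, prev
        if best_width < 2:
            raise ValueError("no range is wide enough to split")
        new_token = (best_start + best_width // 2) % ring_size
        bisect.insort(ring, (new_token, node))
    return ring
```

For example, starting from an empty ring of size 100, adding node "a" and then node "b" with two vnodes each gives tokens 0 and 50 to "a" and 25 and 75 to "b", so each node ends up with half of the ring. A real allocator would also need to weigh per-node ownership and replication, which this sketch does not attempt.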


