Posted to dev@helix.apache.org by "Kanak Biscuitwala (JIRA)" <ji...@apache.org> on 2013/08/07 20:10:53 UTC

[jira] [Commented] (HELIX-196) Research libcrunch, add it as a new rebalancing strategy

    [ https://issues.apache.org/jira/browse/HELIX-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732243#comment-13732243 ] 

Kanak Biscuitwala commented on HELIX-196:
-----------------------------------------

This is pretty interesting: it solves many of the same problems that our auto-rebalancer tries to solve, while also accounting for physical node locations when we care about them.

They provide several mapping algorithms, all based on the CRUSH algorithm that came out of UCSC. Some of them also consider a metric they call RDF, which captures not only how many replicas there are but also how many nodes share a chunk of data.
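To make the RDF idea concrete, here is a minimal Java sketch of one interpretation of it: for each node, count how many distinct other nodes hold a replica of at least one partition that this node also holds. This is an assumption based on the description above, not libcrunch's API or its exact definition of RDF; `replicaPeers` and the partition-to-nodes map are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class RdfSketch {
    /**
     * Hypothetical helper (not a libcrunch API): for each node, the number of distinct
     * other nodes that share at least one partition with it.
     */
    public static Map<String, Integer> replicaPeers(Map<String, List<String>> partitionToNodes) {
        Map<String, Set<String>> peers = new HashMap<>();
        for (List<String> replicas : partitionToNodes.values()) {
            for (String node : replicas) {
                Set<String> nodePeers = peers.computeIfAbsent(node, k -> new HashSet<>());
                for (String other : replicas) {
                    if (!other.equals(node)) {
                        nodePeers.add(other); // "other" shares this partition with "node"
                    }
                }
            }
        }
        Map<String, Integer> result = new TreeMap<>(); // sorted for stable output
        peers.forEach((node, set) -> result.put(node, set.size()));
        return result;
    }

    public static void main(String[] args) {
        // Three partitions, two replicas each, spread over four nodes.
        Map<String, List<String>> assignment = new TreeMap<>();
        assignment.put("p0", List.of("n1", "n2"));
        assignment.put("p1", List.of("n1", "n3"));
        assignment.put("p2", List.of("n2", "n4"));
        System.out.println(replicaPeers(assignment)); // prints {n1=2, n2=2, n3=1, n4=1}
    }
}
```

A higher count means a node's data is spread across more peers, so its replicas can be rebuilt from more sources in parallel if it fails.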

It's still under development, and documentation is still a work in progress, but here's what I think are the key considerations from trying out the current release:

1. We may need some tricks to ensure that capacity constraints are satisfied. They currently appear to be respected when possible but ignored otherwise. In our case, it's better to leave a replica unserved than to have a node take it on and become overloaded. One way to achieve this is to advertise fewer replicas to the algorithm than actually exist.
2. We'd need to decide which topology to expose to the algorithm. The simplest option is to make every node a direct child of the root.
3. What is the optimal RDF for our purposes?
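Points 1 and 2 can be illustrated together with a small Java sketch under stated assumptions: a flat topology where every node is a direct child of the root, and a hard per-node capacity that is never exceeded, so a replica is left unassigned when too few nodes have room. Placement here uses rendezvous (highest-random-weight) hashing as a plain stand-in for CRUSH; it is not libcrunch's algorithm or API, and `FlatCappedPlacement`, `assign`, and `capacityPerNode` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlatCappedPlacement {
    // MurmurHash3 fmix32 finalizer: spreads String.hashCode() values into well-mixed weights.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /**
     * Hypothetical sketch: flat topology (all nodes under the root), rendezvous hashing
     * instead of CRUSH, and a hard cap per node. If fewer than `replicas` nodes have room,
     * the partition simply gets fewer owners rather than overloading a node.
     */
    public static Map<String, List<String>> assign(List<String> partitions, List<String> nodes,
                                                   int replicas, int capacityPerNode) {
        Map<String, Integer> load = new HashMap<>();
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String partition : partitions) {
            List<String> ranked = new ArrayList<>(nodes);
            // Highest weight first; ties broken by node name so the order is deterministic.
            ranked.sort(Comparator.comparingInt((String n) -> mix((partition + "/" + n).hashCode()))
                    .reversed()
                    .thenComparing(Comparator.<String>naturalOrder()));
            List<String> chosen = new ArrayList<>();
            for (String node : ranked) {
                if (chosen.size() == replicas) {
                    break;
                }
                if (load.getOrDefault(node, 0) < capacityPerNode) {
                    chosen.add(node);
                    load.merge(node, 1, Integer::sum);
                }
            }
            result.put(partition, chosen); // may hold fewer than `replicas` owners
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> partitions = List.of("p0", "p1", "p2", "p3", "p4");
        List<String> nodes = List.of("n1", "n2", "n3");
        // 5 partitions x 2 replicas = 10 requested slots, but only 3 nodes x 3 = 9 available,
        // so at least one partition ends up with fewer than 2 owners.
        Map<String, List<String>> plan = assign(partitions, nodes, 2, 3);
        plan.forEach((p, owners) -> System.out.println(p + " -> " + owners));
    }
}
```

The design choice here is to treat capacity as a hard limit rather than a preference, which matches the stated requirement that an unserved replica is preferable to an overloaded node. Passing a smaller `replicas` value than the true replica count is roughly analogous to advertising fewer replicas to the algorithm.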
                
> Research libcrunch, add it as a new rebalancing strategy
> --------------------------------------------------------
>
>                 Key: HELIX-196
>                 URL: https://issues.apache.org/jira/browse/HELIX-196
>             Project: Apache Helix
>          Issue Type: Improvement
>          Components: helix-core
>            Reporter: Kanak Biscuitwala
>            Assignee: Kanak Biscuitwala
>            Priority: Minor
>
> Twitter just open-sourced libcrunch, so it may be of interest to add it as a new rebalancing strategy.
> Source: https://github.com/twitter/libcrunch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira