Posted to user@helix.apache.org by kishore g <g....@gmail.com> on 2014/10/10 21:25:42 UTC

Writing custom rebalancer

Hi,

Even though we have a couple of ways of writing a custom rebalancer (one on
the participant side and another on the controller side), I don't think it's
trivial for someone to write one without understanding all the internal
details of Helix.

I am starting this thread to see if others have any thoughts on making it
easier for someone to get started and write their own rebalancer as part of
the quick start.

thanks,
Kishore G

Re: Writing custom rebalancer

Posted by Bob Schulman <bo...@schulman.com>.
I was thinking not so much about the decomposition as about the types of
problems to be solved with custom rebalancers. Here are a few that come
to mind that would make interesting test cases. It would be nice if these
were relatively easy to implement: they all require custom logic, and they
would be super useful for many systems.

- Bob

- balance due to a hot spot.  The idea would be to split the load so the
hot spot is cut down to size.
- balance due to workload patterns, such as some partitions hot during the
day and others at night, say because of US vs non-US traffic, or
read-heavy by day and write-heavy by night as updates are computed offline
and reloaded in bulk
- change balancing for isolation in a multi-tenant situation
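
As a rough illustration of the first case, here is a minimal sketch, assuming
per-partition load numbers are available from some monitoring source and that
each partition has a single replica. It relieves a hot host by moving whole
partitions to the coolest host, which is only one possible reading of
"splitting the load". The names rebalance_hot_spot, host_loads, assignment and
load are hypothetical and are not part of Helix.

```python
def host_loads(assignment, load, hosts):
    """Sum the per-partition load carried by each host."""
    totals = {h: 0.0 for h in hosts}
    for partition, host in assignment.items():
        totals[host] = totals.get(host, 0.0) + load.get(partition, 0.0)
    return totals


def rebalance_hot_spot(assignment, load, hosts):
    """Greedily move partitions from the hottest host to the coolest one.

    assignment: dict partition -> host (hypothetical, one replica each)
    load:       dict partition -> measured load (assumed to come from monitoring)
    hosts:      list of live hosts
    Returns a new assignment; the input dict is not modified.
    """
    result = dict(assignment)
    while True:
        totals = host_loads(result, load, hosts)
        hottest = max(hosts, key=totals.get)
        coolest = min(hosts, key=totals.get)
        gap = totals[hottest] - totals[coolest]
        # Only moves with 0 < load < gap strictly reduce the imbalance,
        # which also guarantees that the loop terminates.
        candidates = [
            p for p, h in result.items()
            if h == hottest and 0 < load.get(p, 0.0) < gap
        ]
        if not candidates:
            return result
        # Prefer the partition whose load is closest to half the gap.
        best = min(candidates, key=lambda p: abs(load[p] - gap / 2))
        result[best] = coolest


if __name__ == "__main__":
    assignment = {"p0": "A", "p1": "A", "p2": "A", "p3": "B"}
    load = {"p0": 50, "p1": 30, "p2": 10, "p3": 10}
    new_assignment = rebalance_hot_spot(assignment, load, ["A", "B"])
    print(host_loads(new_assignment, load, ["A", "B"]))
```

The returned mapping is what a custom rebalancer would publish as its target
assignment. The time-of-day and multi-tenant cases could reuse the same shape
with a different load input or an extra placement constraint.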

> Here are my two cents. Currently we mostly run the rebalancer inside the
> controller pipeline, so the rebalancer is triggered by ZooKeeper change
> notifications and gets a free copy of the ZooKeeper cluster data snapshot.
> However, a rebalancer may also be triggered in other ways, such as timers,
> system load changes, or other external signals. In addition, a rebalancer
> may need to access data from monitoring systems or external services like
> MySQL.
>
> We could probably separate the logic of the rebalancer from the controller.
> The rebalancer is all about setting the ideal-state, i.e. the target
> mappings of partition-->(host, state). The rebalancer logic will be mostly
> application specific. The controller, on the other hand, is all about
> bringing the current-state to the ideal-state. The controller logic
> includes using a semi-greedy algorithm (e.g. shortest path) to calculate
> the next mappings from the current mapping (i.e. current-state) to the
> target mapping (i.e. ideal-state), applying constraints, figuring out
> optimal parallelism, etc. The controller logic will be mostly generic to
> all applications. The only protocol between rebalancer and controller is
> the ideal-state (i.e. the target partition-->(host, state) mappings). In
> this sense, every rebalancer is customized, and we can provide some default
> implementations like the auto or semi-auto ones. A rebalancer can also run
> anywhere, provided that only one instance is running. This can be achieved
> through leader election, running it with the controller, or using a
> custom-code invoker.
>
> On Fri, Oct 10, 2014 at 12:25 PM, kishore g <g....@gmail.com> wrote:
>
>> Hi,
>>
>> Even though we have a couple of ways of writing a custom rebalancer (one
>> on the participant side and another on the controller side), I don't
>> think it's trivial for someone to write one without understanding all the
>> internal details of Helix.
>>
>> I am starting this thread to see if others have any thoughts on making
>> it easier for someone to get started and write their own rebalancer as
>> part of the quick start.
>>
>> thanks, Kishore G
>>
>



Re: Writing custom rebalancer

Posted by Zhen Zhang <ne...@gmail.com>.
Here are my two cents. Currently we mostly run the rebalancer inside the
controller pipeline, so the rebalancer is triggered by ZooKeeper change
notifications and gets a free copy of the ZooKeeper cluster data snapshot.
However, a rebalancer may also be triggered in other ways, such as timers,
system load changes, or other external signals. In addition, a rebalancer
may need to access data from monitoring systems or external services like
MySQL.

We could probably separate the logic of the rebalancer from the controller.
The rebalancer is all about setting the ideal-state, i.e. the target mappings
of partition-->(host, state). The rebalancer logic will be mostly
application specific. The controller, on the other hand, is all about
bringing the current-state to the ideal-state. The controller logic includes
using a semi-greedy algorithm (e.g. shortest path) to calculate the next
mappings from the current mapping (i.e. current-state) to the target mapping
(i.e. ideal-state), applying constraints, figuring out optimal parallelism,
etc. The controller logic will be mostly generic to all applications. The
only protocol between rebalancer and controller is the ideal-state (i.e. the
target partition-->(host, state) mappings). In this sense, every rebalancer
is customized, and we can provide some default implementations like the auto
or semi-auto ones. A rebalancer can also run anywhere, provided that only one
instance is running. This can be achieved through leader election, running
it with the controller, or using a custom-code invoker.
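
The split described above can be sketched in a few lines. This is an
illustration under stated assumptions, not Helix code: the three-state model
in TRANSITIONS, the round-robin placement, and the function names rebalancer,
controller_step and next_state are all made up for the example. The
rebalancer only produces the target partition-->(host, state) mapping, and
the controller step only looks at the current and target mappings and emits
the next single-hop transition for each replica, using a shortest path
through the state model.

```python
from collections import deque

# Assumed state model for illustration: legal single-step transitions.
TRANSITIONS = {
    "OFFLINE": ["SLAVE"],
    "SLAVE": ["MASTER", "OFFLINE"],
    "MASTER": ["SLAVE"],
}


def next_state(current, target):
    """Return the first hop on a shortest path from current to target (BFS)."""
    if current == target:
        return current
    queue = deque([(current, None)])
    seen = {current}
    while queue:
        state, first_hop = queue.popleft()
        for nxt in TRANSITIONS.get(state, []):
            if nxt in seen:
                continue
            hop = first_hop or nxt
            if nxt == target:
                return hop
            seen.add(nxt)
            queue.append((nxt, hop))
    raise ValueError(f"no path from {current} to {target}")


def rebalancer(partitions, hosts):
    """Application-specific part: compute the ideal state.

    Hypothetical policy: round-robin, one MASTER and one SLAVE per
    partition. Needs at least two hosts.
    """
    ideal = {}
    for i, partition in enumerate(partitions):
        master = hosts[i % len(hosts)]
        slave = hosts[(i + 1) % len(hosts)]
        ideal[partition] = {master: "MASTER", slave: "SLAVE"}
    return ideal


def controller_step(current, ideal):
    """Generic part: one (partition, host, from, to) message per replica
    that has not reached its target. Constraints and throttling are omitted."""
    messages = []
    for partition in sorted(set(current) | set(ideal)):
        cur = current.get(partition, {})
        tgt = ideal.get(partition, {})
        for host in sorted(set(cur) | set(tgt)):
            c = cur.get(host, "OFFLINE")
            t = tgt.get(host, "OFFLINE")
            if c != t:
                messages.append((partition, host, c, next_state(c, t)))
    return messages


if __name__ == "__main__":
    ideal = rebalancer(["p0", "p1"], ["A", "B"])
    current = {"p0": {"A": "OFFLINE", "B": "MASTER"}}
    for message in controller_step(current, ideal):
        print(message)
```

Because controller_step depends only on the two mappings, the rebalancer
could run on a timer, in a separate process, or behind a leader election
without the controller side changing.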

On Fri, Oct 10, 2014 at 12:25 PM, kishore g <g....@gmail.com> wrote:

> Hi,
>
> Even though we have a couple of ways of writing a custom rebalancer (one on
> the participant side and another on the controller side), I don't think it's
> trivial for someone to write one without understanding all the internal
> details of Helix.
>
> I am starting this thread to see if others have any thoughts on making it
> easier for someone to get started and write their own rebalancer as part of
> the quick start.
>
> thanks,
> Kishore G
>
