Posted to dev@helix.apache.org by Kanak Biscuitwala <ka...@hotmail.com> on 2013/11/05 20:24:10 UTC

Participant Disable Semantics

Hi,

We've identified some use cases that do not necessarily fit with how participants are disabled today in full-auto (auto rebalance) mode. For reference, here's what Helix currently supports (with a sketch of the corresponding admin calls after the list):

- Disable participant: all replicas this participant currently serves go to the initial state, and no transitions out of the initial state are started until the participant is enabled once more (depending on the rebalancing algorithm). The rebalancing algorithm may choose to serve these replicas on other participants while the participant is disabled.

- Disable partition for participant: the replica this participant currently serves for that partition goes to the initial state and remains there until the partition is enabled once more (depending on the rebalancing algorithm). The rebalancing algorithm may choose to serve this replica on another participant while the partition is disabled.
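
For concreteness, here is roughly how those two operations are invoked through HelixAdmin. This is a minimal sketch; the ZooKeeper address and the cluster/instance/resource/partition names are all placeholders:

    import java.util.Arrays;

    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;

    public class DisableSketch {
      public static void main(String[] args) {
        // Placeholder ZooKeeper address and names throughout.
        HelixAdmin admin = new ZKHelixAdmin("localhost:2181");

        // Disable participant: every replica it serves drops to the
        // initial state.
        admin.enableInstance("MyCluster", "localhost_12913", false);

        // Disable one partition on a participant: only that replica
        // drops to the initial state.
        admin.enablePartition(false, "MyCluster", "localhost_12913",
            "MyResource", Arrays.asList("MyResource_0"));
      }
    }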

However, in some cases, when we disable a partition, we may not want the replicas to be reassigned to other participants, because of a maintenance operation or other task. For instance, if we use OnlineOffline and we disable a partition on the node that has that partition ONLINE, then no replica of that partition would be online until the partition is enabled again.
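
Continuing the sketch above, that effect would be visible in the external view: after the disable, the partition's state map should show no instance in ONLINE.

    import org.apache.helix.model.ExternalView;

    // Reuses the 'admin' handle and placeholder names from the sketch
    // above. After disabling the only ONLINE replica's partition, the
    // state map for that partition should contain no ONLINE entry.
    ExternalView view = admin.getResourceExternalView("MyCluster", "MyResource");
    System.out.println(view.getStateMap("MyResource_0"));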

One way to get this behavior is to just disable the partition across the cluster. However, for state models that have multiple states, the right semantics aren't obvious. For instance, what should this do in MasterSlave? If we have 1 master and 2 slaves and disable the partition on the master, should there be just 2 slaves, 1 master and 1 slave, or 1 master and 2 slaves with another node taking up the mastership?
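
For reference, the ambiguity comes from the constraints in the MasterSlave state model: MASTER is bounded at 1 and SLAVE at the replica count, so a cluster-wide disable forces the rebalancer to pick which bound to give up. Here is an abbreviated sketch of how such a definition is built; it mirrors the stock model but is illustrative, not authoritative:

    import org.apache.helix.model.StateModelDefinition;

    // Abbreviated MasterSlave definition (DROPPED etc. omitted).
    StateModelDefinition.Builder builder =
        new StateModelDefinition.Builder("MasterSlave");
    builder.addState("MASTER", 1);   // priority 1 (highest)
    builder.addState("SLAVE", 2);
    builder.addState("OFFLINE", 3);
    builder.initialState("OFFLINE");
    builder.addTransition("OFFLINE", "SLAVE");
    builder.addTransition("SLAVE", "MASTER");
    builder.addTransition("MASTER", "SLAVE");
    builder.addTransition("SLAVE", "OFFLINE");
    builder.upperBound("MASTER", 1);          // at most 1 master
    builder.dynamicUpperBound("SLAVE", "R");  // up to replica count
    StateModelDefinition masterSlave = builder.build();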

Does anyone have any thoughts on the utility of this new use case, what the semantics should be, and potential interfaces that could make sense here?

Thanks,
Kanak

Re: Participant Disable Semantics

Posted by kishore g <g....@gmail.com>.
I think we definitely need a way to disable a partition across the entire
cluster. I am not sure it's worth having functionality to disable a
partition for a participant but not reassign that partition to another node.

What we can do instead is provide a temporary override for setting the
locations for a partition. So if someone wants to do maintenance on a
partition on a participant, they can (see the sketch after these steps):

- set the preferred locations for that partition (the override)
- disable the partition on the participant
- do the maintenance
- enable the partition
- remove the override
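
A rough sketch of that sequence, assuming the override is expressed by pinning the partition's preference list in the ideal state (names are placeholders, error handling omitted):

    import java.util.Arrays;
    import java.util.List;

    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.IdealState;

    HelixAdmin admin = new ZKHelixAdmin("localhost:2181");

    // 1. Override: pin the partition's preferred locations, saving the
    //    current list so we can restore it later.
    IdealState is = admin.getResourceIdealState("MyCluster", "MyResource");
    List<String> saved = is.getPreferenceList("MyResource_0");
    is.setPreferenceList("MyResource_0",
        Arrays.asList("localhost_12913", "localhost_12914"));
    admin.setResourceIdealState("MyCluster", "MyResource", is);

    // 2. Disable the partition on the participant under maintenance.
    admin.enablePartition(false, "MyCluster", "localhost_12913",
        "MyResource", Arrays.asList("MyResource_0"));

    // 3. ... perform the maintenance ...

    // 4. Re-enable the partition.
    admin.enablePartition(true, "MyCluster", "localhost_12913",
        "MyResource", Arrays.asList("MyResource_0"));

    // 5. Remove the override by restoring the saved preference list.
    is = admin.getResourceIdealState("MyCluster", "MyResource");
    is.setPreferenceList("MyResource_0", saved);
    admin.setResourceIdealState("MyCluster", "MyResource", is);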

We can have a JIRA for this, but I don't think it's needed at this time.
