Posted to commits@helix.apache.org by "Kanak Biscuitwala (JIRA)" <ji...@apache.org> on 2013/10/23 03:15:42 UTC

[jira] [Commented] (HELIX-276) Allow FULL_AUTO mode to favor some transitions

    [ https://issues.apache.org/jira/browse/HELIX-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802490#comment-13802490 ] 

Kanak Biscuitwala commented on HELIX-276:
-----------------------------------------

There needs to be some more specification here. Here is an example case where transition preferences may not make sense:

1. The first node is launched
2. Helix assigns one replica of each partition to the node, and puts them all in Master state (as is governed by state priorities)
3. A second node is launched
4. Helix assigns a second replica of each partition to the second node

Here, we probably want some of the replicas on the second node to be in Master state. Otherwise, a single node failure would force a large number of Slave --> Master transitions at once. However, moving Master state onto the second node would violate potential transition preferences.
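To make the concern concrete, here is a rough sketch that counts how many Slave --> Master promotions a single node failure would force under a skewed versus a balanced assignment. It is plain Python rather than Helix code; the node names, the partition count, and the `promotions_on_failure` helper are all made up for illustration.

```python
# Illustrative only: plain Python, not the Helix API. Node names and the
# partition count are assumptions made up for this example.

def promotions_on_failure(assignment, failed_node):
    """Count partitions whose Master lived on failed_node.

    Each such partition needs a Slave --> Master transition on a
    surviving node. assignment maps partition -> {node: state}.
    """
    return sum(
        1
        for replicas in assignment.values()
        if replicas.get(failed_node) == "MASTER"
    )


partitions = range(8)

# Steps 1-4 with strict transition preferences: node1 keeps every Master.
skewed = {p: {"node1": "MASTER", "node2": "SLAVE"} for p in partitions}

# Same replicas, but Master state is spread evenly across both nodes.
balanced = {
    p: {"node1": "MASTER", "node2": "SLAVE"}
    if p % 2 == 0
    else {"node1": "SLAVE", "node2": "MASTER"}
    for p in partitions
}

print(promotions_on_failure(skewed, "node1"))    # 8
print(promotions_on_failure(balanced, "node1"))  # 4
```

With the skewed layout every partition has to be promoted at once when node1 fails; with the balanced layout only half of them do.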

An alternative is that we "prefer" existing replicas, but not at the expense of state balance. This is more of a "we'll try our best not to force multiple transitions" decision.
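One possible reading of this "prefer existing replicas, but not at the expense of balance" rule is a two-level comparison: balance decides first, and transition cost only breaks ties. The sketch below is a guess at what that could look like, not Helix's rebalancer; `pick_master`, `TRANSITION_COST`, and the cost values are hypothetical.

```python
# Illustrative only: plain Python, not Helix's rebalancer.
# Lower cost = fewer transitions needed to reach MASTER. The values are
# assumptions chosen for this example.
TRANSITION_COST = {"MASTER": 0, "SLAVE": 1, "OFFLINE": 2}


def pick_master(current_states, master_counts):
    """Pick the node that should hold Master for one partition.

    current_states: node -> current state of this partition on that node
                    (live candidate nodes only).
    master_counts:  node -> number of Masters already placed on that node.
    Balance is compared first; transition cost only breaks ties.
    """
    return min(
        current_states,
        key=lambda node: (
            master_counts.get(node, 0),             # 1. keep Masters balanced
            TRANSITION_COST[current_states[node]],  # 2. prefer cheaper transitions
            node,                                   # 3. deterministic last resort
        ),
    )


candidates = {"node2": "SLAVE", "node3": "OFFLINE"}

# Equal load: the existing Slave wins over the Offline node.
print(pick_master(candidates, {"node2": 1, "node3": 1}))  # node2

# Unequal load: balance wins, even though node3 needs more transitions.
print(pick_master(candidates, {"node2": 3, "node3": 1}))  # node3
```

Swapping the first two key components would give the opposite policy, where transition preferences always win and balance is only the tiebreaker.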

Another alternative is that we only try to do this when nodes are removed (but not added).

In any case, the "right" thing to do is probably to list out a few more scenarios, come up with some configuration properties associated with those scenarios, and then expose those as an API. I'm inclined to leave the "default" behavior more or less as-is, with an additional tiebreaker, while a config API would help individual apps choose the right policy for their needs.

I will work on a design for this API and will add updates to this ticket.

> Allow FULL_AUTO mode to favor some transitions
> ----------------------------------------------
>
>                 Key: HELIX-276
>                 URL: https://issues.apache.org/jira/browse/HELIX-276
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: Matthieu Morel
>            Assignee: Kanak Biscuitwala
>
> In FULL_AUTO mode, Helix computes both partitioning and states.
> Currently, in a master-replica model, when rebalancing due to a failure of the master node, Helix does not promote an existing replica to master, but instead assigns a new master (i.e. offline -> replica -> master).
> The current algorithm optimizes for minimal partition movement and even distribution of state. However, it should also take into account the priorities between states, or provide a way to customize it. For instance, when it is more costly (number of transitions, priorities) to perform offline -> master than replica -> master, the algorithm could favor replica -> master transitions.
> One application would be quick failover: master ops are logged to a journal, a replica builds its state by tailing the journal, and upon failure of the master, recovery is fast since only a few operations may have to be replayed to reach the latest state of the master. If a new node is assigned the master role from scratch, the whole journal must be replayed.
> More context in this thread:
> http://markmail.org/message/inq6tnlnk5ckscwr


