Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/10/15 19:22:08 UTC

[GitHub] clintropolis commented on issue #6349: maintenance mode for Historical

clintropolis commented on issue #6349: maintenance mode for Historical
URL: https://github.com/apache/incubator-druid/pull/6349#issuecomment-429981063
 
 
   @egor-ryashin Since the `BalancerStrategy` is the central point of decision making for where segments are placed in the cluster, I think it _must_ be aware of such a fundamental aspect of its operation; otherwise everything that calls into it must be aware of maintenance mode and independently handle whether or not servers are in it. TL;DR: I don't think the `BalancerStrategy` _can_ operate correctly without knowledge of maintenance mode.
   
   I think I've already spotted a flaw in this PR in its current state that I believe can result in undefined/unpredictable coordinator behavior, and that would have been remedied by the cost balancer being aware of maintenance: [dropping segments in the `LoadRule` phase](https://github.com/apache/incubator-druid/blob/1cec73b22f52413c9cbfb2396463529f64b04fd9/server/src/main/java/org/apache/druid/server/coordinator/rules/LoadRule.java#L341). Because `drop` doesn't obtain its list of candidate server holders the same way segment assignment does, the "best" server to drop from according to the balancer strategy might _not_ be a server that is marked as in maintenance, even when one was a candidate. That produces an action that potentially conflicts with a decision that assignment or movement might make. I think that a server in maintenance should always be the 'best' to drop from and the 'worst' to assign to, and I don't think there is really a clean way to handle the dropping case from within `LoadRule`, because a filter on the list of servers alone won't produce correct behavior.
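
To make the drop ordering concrete, here is a minimal sketch of "a server in maintenance is always the best to drop from". Everything in it is hypothetical: `MaintenanceAwareDrop`, `Server`, and `pickServerToDropFrom` are stand-ins invented for illustration, not Druid's actual classes or the PR's code, and the single `cost` number (assumed here to mean "higher is a better drop candidate") only approximates whatever the real strategy computes.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these types are Druid's real classes.
final class MaintenanceAwareDrop {

    /** Minimal stand-in for a server holder. */
    static final class Server {
        final String name;
        final boolean inMaintenance;
        final double cost; // assumption: higher cost = better candidate to drop from

        Server(String name, boolean inMaintenance, double cost) {
            this.name = name;
            this.inMaintenance = inMaintenance;
            this.cost = cost;
        }
    }

    /**
     * Picks the server to drop a segment from. Any server in maintenance
     * outranks every active server; cost only breaks ties within a group.
     */
    static Optional<Server> pickServerToDropFrom(List<Server> holders) {
        return holders.stream().max(
            Comparator.comparing((Server s) -> s.inMaintenance) // false < true
                      .thenComparingDouble(s -> s.cost));
    }

    public static void main(String[] args) {
        List<Server> holders = Arrays.asList(
            new Server("historical-1", false, 9.0),
            new Server("historical-2", true, 1.0));
        // The maintenance server wins even though its cost is lower.
        System.out.println(
            pickServerToDropFrom(holders).map(s -> s.name).orElse("none"));
    }
}
```

The point of the sketch is that the ordering lives in one comparator inside the strategy, so a caller cannot get it wrong by passing a differently filtered list.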
   
   On the other hand, if the cost balancer logic _were_ aware of maintenance, we could get by without `LoadRule`, or anything else that uses the cost balancer to add/move/drop segments, even being aware that maintenance mode for historical servers exists at all. And _everything_ should be using the balancer strategy to make these sorts of decisions, in order to produce behavior that is as consistent and predictable as possible. If the balancer strategy were aware of maintenance, [this check becomes a totally optional, slight optimization to operate on a subset of servers instead of the whole](https://github.com/apache/incubator-druid/blob/1cec73b22f52413c9cbfb2396463529f64b04fd9/server/src/main/java/org/apache/druid/server/coordinator/rules/LoadRule.java#L156), instead of a requirement to ensure that assignment doesn't assign to a server that is in maintenance.
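
As a rough illustration of why the caller-side check would become optional, the sketch below has the strategy itself skip servers in maintenance when choosing where to load a segment. The names (`MaintenanceAwareAssign`, `Server`, `pickServerToLoadOn`) are hypothetical and not Druid's API; treating maintenance servers as ineligible rather than merely lowest-ranked, and using a single `cost` where lower is better, are assumptions made for brevity.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these types are Druid's real classes.
final class MaintenanceAwareAssign {

    /** Minimal stand-in for a server holder. */
    static final class Server {
        final String name;
        final boolean inMaintenance;
        final double cost; // assumption: lower cost = better placement

        Server(String name, boolean inMaintenance, double cost) {
            this.name = name;
            this.inMaintenance = inMaintenance;
            this.cost = cost;
        }
    }

    /**
     * Picks the server to load a segment on. Servers in maintenance are
     * never chosen, so callers may pass the full, unfiltered server list.
     */
    static Optional<Server> pickServerToLoadOn(List<Server> holders) {
        return holders.stream()
            .filter(s -> !s.inMaintenance)
            .min(Comparator.comparingDouble((Server s) -> s.cost));
    }

    /** The caller-side pre-filter, now only an optimization. */
    static List<Server> withoutMaintenance(List<Server> holders) {
        return holders.stream()
            .filter(s -> !s.inMaintenance)
            .collect(Collectors.toList());
    }
}
```

Under these assumptions, `pickServerToLoadOn(all)` and `pickServerToLoadOn(withoutMaintenance(all))` select the same server, which is the sense in which the `LoadRule` check stops being a correctness requirement.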
