Posted to issues@mesos.apache.org by "James Peach (JIRA)" <ji...@apache.org> on 2015/09/05 01:15:45 UTC

[jira] [Comment Edited] (MESOS-3157) only perform batch resource allocations

    [ https://issues.apache.org/jira/browse/MESOS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731589#comment-14731589 ] 

James Peach edited comment on MESOS-3157 at 9/4/15 11:15 PM:
-------------------------------------------------------------

First let's look at the case of {{addSlave}} and {{updateSlave}}. It is possible to implement these by caching a set of pending {{slaveID}}s and then batching them into a single allocation pass. The problem with this is knowing when to trigger that allocation pass. You want to trigger it once you have more than a few {{slaveID}}s ready, but before the batch allocation kicks in. You also want to wait as long as possible so that you can batch as many slaves as possible. This seems tricky; I can't think of a way to know that no more {{addSlave}} or {{updateSlave}} events are going to come.
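
Here is a minimal sketch of that batching idea, in plain C++ with hypothetical names ({{Allocator}}, {{allocateFor}}, {{pending}}); it is illustrative only, not the real {{HierarchicalAllocatorProcess}} code:

{code}
// Illustrative only: addSlave()/updateSlave() just record the slave,
// and a later batch pass performs one allocation over everything that
// accumulated. The open question above is when to run that pass.
#include <set>
#include <string>

class Allocator
{
public:
  void addSlave(const std::string& slaveId)
  {
    pending.insert(slaveId); // No allocation pass here; just remember it.
  }

  void updateSlave(const std::string& slaveId)
  {
    pending.insert(slaveId);
  }

  void batch()
  {
    allocateFor(pending); // One pass covers all pending slaves.
    pending.clear();
  }

private:
  void allocateFor(const std::set<std::string>& slaveIds)
  {
    // Perform the allocation for these slaves.
  }

  std::set<std::string> pending;
};
{code}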

In my environment, slaves don't come and go often enough to cause allocation performance problems. I can imagine that cloud deployments might be different, but I'd still be surprised if people are getting multiple slave events a minute.

Now, consider frameworks. All framework events trigger full allocations. Our scheduler generates a lot of frameworks and also declines offers regularly, so this is the primary source of allocation events. It should be possible to use a counter to effectively batch these full allocations. The idea would be to increment a {{pending-allocation}} counter when the caller triggers an allocation, then dispatch an allocation attempt to the allocator process. If the counter is 1 when the dispatched attempt runs, it is the last pending attempt, so you actually perform the allocation; otherwise you skip it, since a later attempt will cover it. Either way the attempt decrements the counter, and you have to wait until after the allocation pass to do so.
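
To make that concrete, here is a toy model of the counter scheme in plain C++. The names ({{trigger}}, {{attempt}}, {{pending}}) and the explicit event queue are hypothetical stand-ins for libprocess dispatch, which already runs these attempts serially:

{code}
// Illustrative sketch of counter-based coalescing of full allocations.
#include <functional>
#include <iostream>
#include <queue>

class Allocator
{
public:
  // Called whenever a framework event would trigger an allocation.
  void trigger()
  {
    ++pending;
    events.push([this] { attempt(); });
  }

  // Drain the event queue; stands in for the libprocess event loop.
  void run()
  {
    while (!events.empty()) {
      std::function<void()> f = events.front();
      events.pop();
      f();
    }
  }

private:
  void attempt()
  {
    if (pending == 1) {
      std::cout << "performing one full allocation pass" << std::endl;
    }
    // Decrement only after the pass; skipped attempts rely on the
    // last pending attempt (the one that sees the counter at 1).
    --pending;
  }

  int pending = 0;
  std::queue<std::function<void()>> events;
};

int main()
{
  Allocator allocator;

  // A burst of three framework events collapses into one pass.
  allocator.trigger();
  allocator.trigger();
  allocator.trigger();
  allocator.run(); // Prints the message once, not three times.
}
{code}

In a burst of triggers, every attempt except the last sees a counter greater than one and falls through, so the burst collapses into a single full pass.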

Note that if we run with only batch allocations, we get the same behaviour with less complexity. The cost is increased latency, but since we run with an allocation interval of 5 seconds, we are not really saving that much. Maybe we could squeeze one more allocation cycle into that interval, but if we wanted to do that we could just decrease the allocation interval. I wonder if there's a use case for implementing the above and then running with a much longer batch interval? My initial reaction is that it is probably not desirable.
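
For comparison, batch-only allocation reduces to a periodic pass. In Mesos that is {{HierarchicalAllocatorProcess::batch}} running on the allocation interval; the loop below is only an illustrative stand-in, not the libprocess implementation:

{code}
// Illustrative stand-in: one full allocation pass per interval and no
// event-triggered passes at all. Worst-case offer latency is bounded
// by the interval (5 seconds in our deployment).
#include <chrono>
#include <iostream>
#include <thread>

int main()
{
  const std::chrono::seconds interval(5);

  for (int pass = 0; pass < 3; ++pass) { // Bounded here for the demo.
    std::cout << "full allocation pass " << pass << std::endl;
    std::this_thread::sleep_for(interval);
  }

  return 0;
}
{code}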

Anyway, I can take a crack at the framework case, but I'm not really convinced that it is better than the simple, predictable batch allocation :)



> only perform batch resource allocations
> ---------------------------------------
>
>                 Key: MESOS-3157
>                 URL: https://issues.apache.org/jira/browse/MESOS-3157
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>            Reporter: James Peach
>            Assignee: James Peach
>
> Our deployment environments have a lot of churn, with many short-lived frameworks that often revive offers. Running the allocator takes a long time (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the allocator process to get very long, and the allocator effectively becomes unresponsive (e.g. a revive offers message takes too long to come to the head of the queue).
> We have been running a patch to remove all the event-triggered allocations and only allocate from the batch task {{HierarchicalAllocatorProcess::batch}}. This works great and really improves responsiveness.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)