Posted to issues@mesos.apache.org by "Andrei Sekretenko (Jira)" <ji...@apache.org> on 2019/10/21 18:09:00 UTC
[jira] [Commented] (MESOS-10015)
HierarchicalAllocatorProcess::updateAvailable() can stall the allocator
with a huge number of reservations on an agent.
[ https://issues.apache.org/jira/browse/MESOS-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956346#comment-16956346 ]
Andrei Sekretenko commented on MESOS-10015:
-------------------------------------------
https://issues.apache.org/jira/browse/MESOS-9942 and related work will fix the `(total number of frameworks)` factor of the cost.
To fix the quadratic growth in the number of reservations, we can avoid using `Resources::operator+=`, `Resources::operator-=` and `Resources::contains()` when re-adding a slave to a framework sorter.
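A minimal sketch of why this matters, assuming a simplified model rather than the real Mesos `Resources` class: `SimpleResource`, `mergeLinear` and `mergeIndexed` are hypothetical names, and a single string `key` stands in for everything that decides whether two resource objects are addable (name, role, reservations, etc.). Merging entries one at a time into a flat vector with a linear scan, roughly how repeated `+=` on a vector-backed collection behaves, costs one check per existing entry. Indexing entries by key in a hash map makes each insertion expected O(1).

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a resource object: two entries can be merged
// only if their keys match. The key is an assumption representing name,
// role, reservations, disk info, etc.
struct SimpleResource {
  std::string key;
  double amount;
};

// Quadratic approach: each incoming entry is compared against every entry
// already present. `comparisons` counts the addable-style checks.
std::vector<SimpleResource> mergeLinear(
    const std::vector<SimpleResource>& input, std::size_t& comparisons) {
  std::vector<SimpleResource> result;
  for (const SimpleResource& r : input) {
    bool merged = false;
    for (SimpleResource& existing : result) {
      ++comparisons;
      if (existing.key == r.key) {
        existing.amount += r.amount;
        merged = true;
        break;
      }
    }
    if (!merged) {
      result.push_back(r);
    }
  }
  return result;
}

// Hash-indexed approach: one expected O(1) lookup per incoming entry.
std::unordered_map<std::string, double> mergeIndexed(
    const std::vector<SimpleResource>& input, std::size_t& lookups) {
  std::unordered_map<std::string, double> result;
  for (const SimpleResource& r : input) {
    ++lookups;
    result[r.key] += r.amount;
  }
  return result;
}
```

For an agent with 300 resource objects that all have distinct keys, `mergeLinear` performs 300 * 299 / 2 = 44,850 comparisons while `mergeIndexed` performs 300 lookups. The trade-off is that a keyed index needs a well-defined notion of resource identity, which in this sketch is simply assumed to exist.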
> HierarchicalAllocatorProcess::updateAvailable() can stall the allocator with a huge number of reservations on an agent.
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: MESOS-10015
> URL: https://issues.apache.org/jira/browse/MESOS-10015
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.5.3, 1.6.2, 1.7.2, 1.8.1, 1.9.0
> Reporter: Andrei Sekretenko
> Assignee: Andrei Sekretenko
> Priority: Critical
> Labels: resource-management
>
> Currently, updateAvailable() called with a single-object Resources for a single framework on a single slave requires `(total number of frameworks) * (number of resource objects on this slave)^2` calls of `Resource::addable()`.
> In a cluster with a large number of frameworks, this severely degrades allocator performance when many RESERVE/UNRESERVE operations occur for an agent with hundreds of unique resources.
> On our testing cluster we observed task scheduling delays of up to 30 minutes due to the allocator being occupied with processing UNRESERVE operations.
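The `(total number of frameworks) * (number of resource objects)^2` cost can be reproduced with a toy model. This is a sketch under stated assumptions, not the Mesos code path: it assumes each framework sorter rebuilds the agent's total by adding its n unique resource objects one at a time with a linear scan, and `ToyResource`, `addLinear`, `simulateReAdd` and `predictedAddableCalls` are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy model (an assumption, not the Mesos code): an agent's total is a list
// of resource objects, and the addable check is a key comparison.
struct ToyResource {
  std::string key;
  double amount;
};

// Adds one resource with a linear scan, counting each addable-style check.
void addLinear(std::vector<ToyResource>& total, const ToyResource& r,
               std::uint64_t& addableCalls) {
  for (ToyResource& existing : total) {
    ++addableCalls;
    if (existing.key == r.key) {
      existing.amount += r.amount;
      return;
    }
  }
  total.push_back(r);
}

// Simulates re-adding an agent with `n` unique resource objects to the
// sorters of `frameworks` frameworks; returns the number of addable checks.
std::uint64_t simulateReAdd(std::uint64_t frameworks, std::uint64_t n) {
  std::vector<ToyResource> agent;
  for (std::uint64_t i = 0; i < n; ++i) {
    agent.push_back({"reservation-" + std::to_string(i), 1.0});
  }
  std::uint64_t addableCalls = 0;
  for (std::uint64_t f = 0; f < frameworks; ++f) {
    std::vector<ToyResource> total;  // rebuilt once per framework sorter
    for (const ToyResource& r : agent) {
      addLinear(total, r, addableCalls);
    }
  }
  return addableCalls;
}

// Closed form of the simulation above: frameworks * n * (n - 1) / 2.
std::uint64_t predictedAddableCalls(std::uint64_t frameworks, std::uint64_t n) {
  return n == 0 ? 0 : frameworks * n * (n - 1) / 2;
}
```

With illustrative numbers (not measurements from the report) of 1,000 frameworks and 500 unique resource objects on one agent, the closed form gives 1000 * 500 * 499 / 2 = 124,750,000 checks for a single update, which is why a burst of RESERVE/UNRESERVE operations can occupy the allocator for a long time.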
--
This message was sent by Atlassian Jira
(v8.3.4#803005)