Posted to issues@mesos.apache.org by "Jie Yu (JIRA)" <ji...@apache.org> on 2015/06/19 02:03:52 UTC
[jira] [Updated] (MESOS-2891) Performance regression in hierarchical allocator.
[ https://issues.apache.org/jira/browse/MESOS-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jie Yu updated MESOS-2891:
--------------------------
Attachment: Perf 'addSlave' Benchmark.png
> Performance regression in hierarchical allocator.
> -------------------------------------------------
>
> Key: MESOS-2891
> URL: https://issues.apache.org/jira/browse/MESOS-2891
> Project: Mesos
> Issue Type: Bug
> Components: allocation, master
> Reporter: Benjamin Mahler
> Priority: Blocker
> Labels: twitter
> Attachments: Screen Shot 2015-06-18 at 5.02.26 PM.png
>
>
> For large clusters, the 0.23.0 allocator cannot keep up with the volume of slaves. After the following slave was re-registered, it took the allocator a long time to work through the backlog of slaves to add:
> {noformat:title=45 minute delay}
> I0618 18:55:40.738399 10172 master.cpp:3419] Re-registered slave 20150422-211121-2148346890-5050-3253-S4695
> I0618 19:40:14.960636 10164 hierarchical.hpp:496] Added slave 20150422-211121-2148346890-5050-3253-S4695
> {noformat}
> Empirically, [addSlave|https://github.com/apache/mesos/blob/dda49e688c7ece603ac7a04a977fc7085c713dd1/src/master/allocator/mesos/hierarchical.hpp#L462] and [updateSlave|https://github.com/apache/mesos/blob/dda49e688c7ece603ac7a04a977fc7085c713dd1/src/master/allocator/mesos/hierarchical.hpp#L533] have become expensive.
> Some timings from a production cluster reveal that the allocator is spending in the low tens of milliseconds on each call to {{addSlave}} and {{updateSlave}}. With tens of thousands of slaves, this amounts to the large delay seen above.
> We also saw a slow steady increase in memory consumption, hinting further at a queue backup in the allocator.
> A synthetic benchmark like the one we did for the registrar would be prudent here, along with visibility into the allocator's queue size.
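
As a rough sanity check on the numbers in the description, the sketch below computes the delay between the two log lines quoted above and compares it with a back-of-the-envelope backlog estimate. The call counts and per-call costs in the loop are hypothetical: the report gives only "tens of thousands" of slaves and "low tens of milliseconds" per call. The helper names (`glog_time`, `delay_seconds`, `backlog_seconds`) are made up for illustration, and the estimate assumes the allocator works through its queued calls one at a time.

```python
from datetime import datetime


def glog_time(stamp: str) -> datetime:
    """Parse the 'HH:MM:SS.ffffff' time-of-day field of a glog line."""
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def delay_seconds(start: str, end: str) -> float:
    """Seconds elapsed between two same-day glog timestamps."""
    return (glog_time(end) - glog_time(start)).total_seconds()


def backlog_seconds(pending_calls: int, seconds_per_call: float) -> float:
    """Time to drain a queue of calls that are processed serially."""
    return pending_calls * seconds_per_call


# Timestamps copied from the "Re-registered slave" and "Added slave" lines.
observed = delay_seconds("18:55:40.738399", "19:40:14.960636")
print(f"observed delay: {observed / 60:.1f} minutes")

# Hypothetical (pending calls, milliseconds per call) pairs, not measured values.
for calls, millis in [(20_000, 20), (50_000, 50)]:
    estimate = backlog_seconds(calls, millis / 1000)
    print(f"{calls} calls at {millis} ms each: {estimate / 60:.1f} minutes")
```

Under these assumptions, the upper end of the hypothetical range lands in the same order of magnitude as the observed delay, which is consistent with a queue backup rather than a single slow call.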
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)