Posted to dev@mesos.apache.org by Benjamin Mahler <bm...@apache.org> on 2018/09/20 06:35:51 UTC
[Performance WG] Meeting Notes - September 19

Thanks to those who joined: Yan Xu, Chun-Hung Hsiao, Meng Zhu, Carl Dellar

Notes:

(1) I forgot to mention during the meeting that further progress has been
made on parallel reads of the master state for the other read-only
endpoints. Alex or Benno can reply to this thread with an update. [1]

(2) Work is ongoing to improve allocation cycle performance [2]:

  (a) The patches that make the Resources wrapper copy-on-write are ready
to land [3]. These improve the performance of common filtering operations.
Meng presented some allocation cycle data based on these patches:

https://docs.google.com/spreadsheets/d/1GmBdialteknPDf8IdumzPbF4bGmu75mVHIpiXLf3xHc

The data shows that copy-on-write Resources significantly improves
allocation cycle time, but as the number of frameworks grows, the Sorter
starts to dominate the time spent in the allocation cycle and the relative
benefit decreases.
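To illustrate what copy-on-write buys in (a), here is a minimal,
single-threaded C++ sketch. It is NOT the actual Mesos Resources class: the
`Resource` struct, `CowResources`, and `sharesStorageWith` are hypothetical
names. Copies share one vector through a std::shared_ptr and only clone it
on the first mutation; as an assumed optimization, a filter whose predicate
keeps every entry hands back a shared copy instead of duplicating entries.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a single resource entry (not Mesos's protobuf).
struct Resource {
  std::string name;
  double value;
};

// Hypothetical copy-on-write wrapper: copies share one vector until one of
// them is mutated, at which point the mutating copy clones it.
class CowResources {
 public:
  CowResources() : data_(std::make_shared<std::vector<Resource>>()) {}

  // Copying only bumps a reference count; no Resource is copied here.
  CowResources(const CowResources&) = default;
  CowResources& operator=(const CowResources&) = default;

  const std::vector<Resource>& get() const { return *data_; }

  void add(Resource r) {
    detach();
    data_->push_back(std::move(r));
  }

  // Returns the entries matching `pred`. If every entry matches, the result
  // shares storage with *this instead of copying each entry.
  template <typename Pred>
  CowResources filter(Pred pred) const {
    bool keepsAll = true;
    for (const Resource& r : *data_) {
      if (!pred(r)) { keepsAll = false; break; }
    }
    if (keepsAll) return *this;

    CowResources result;
    for (const Resource& r : *data_) {
      if (pred(r)) result.data_->push_back(r);
    }
    return result;
  }

  bool sharesStorageWith(const CowResources& other) const {
    return data_ == other.data_;
  }

 private:
  // Clone the underlying vector only if another copy still references it.
  void detach() {
    if (data_.use_count() > 1) {
      data_ = std::make_shared<std::vector<Resource>>(*data_);
    }
  }

  std::shared_ptr<std::vector<Resource>> data_;
};
```

Relying on use_count() is only sound when no other thread copies the same
object concurrently, which is why this sketch is single-threaded.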

  (b) To improve allocation cycle time further by addressing the sorter
performance issues, I have sent out (or will soon send out) a few patches
[4]. The two that provide the most benefit are introducing an efficient
ScalarResourceQuantities type to make sorting itself faster, and avoiding
dirtying the sorter upon allocation so that the allocation cycle doesn't
have to keep re-sorting. The latter also requires changing how framework
sorters are used, so that the total they use is the entire cluster rather
than the role's allocation.
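The effect of not dirtying the sorter on allocation can be sketched with a
generic lazily sorted container. `LazySorter` below is hypothetical and is
NOT the Mesos Sorter interface: it caches its order behind a dirty flag, so
sort() only pays the O(n log n) cost after something has invalidated the
cache. The `invalidate` parameter on `allocated()` is an assumption used
only to contrast the two behaviors; the sketch models neither
ScalarResourceQuantities nor the framework sorter total change.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sorter: clients are ordered by ascending share, and the
// sorted order is cached until the dirty flag is set.
class LazySorter {
 public:
  void add(const std::string& client, double share) {
    shares_[client] = share;
    dirty_ = true;  // membership changed, so the cached order is stale
  }

  // Records an allocation for an already-added client. With
  // invalidate == false the cached order is reused by the next sort().
  void allocated(const std::string& client, double delta, bool invalidate) {
    shares_[client] += delta;
    if (invalidate) dirty_ = true;
  }

  // Re-sorts only when dirty; otherwise returns the cached order.
  const std::vector<std::string>& sort() {
    if (dirty_) {
      order_.clear();
      for (const auto& entry : shares_) order_.push_back(entry.first);
      std::sort(order_.begin(), order_.end(),
                [this](const std::string& a, const std::string& b) {
                  double sa = shares_.at(a), sb = shares_.at(b);
                  return sa != sb ? sa < sb : a < b;  // tie-break by name
                });
      dirty_ = false;
      ++sortCount_;
    }
    return order_;
  }

  // Number of full sorts performed so far.
  int sortCount() const { return sortCount_; }

 private:
  std::unordered_map<std::string, double> shares_;
  std::vector<std::string> order_;
  bool dirty_ = false;
  int sortCount_ = 0;
};
```

In this sketch, the cost of skipping invalidation is that, within a cycle,
clients are visited in an order computed from slightly stale shares.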

(3) There have also been significant improvements to the master's offer
fan-out path [5]. We don't have a benchmark for this yet, but I'll try to
demonstrate the improvement in 1.8.

(4) Meng showed a new allocator benchmark test fixture, which Kapil worked
on, that makes it easier to set up a "cluster" with a particular
configuration and measure allocator scenarios of interest [6].

(5) We chatted briefly about the master's call ingestion performance.
There's a benchmark [7] that uses the reconciliation call to send a big
message; Ilya looked into the results some time ago, but we should revisit
it and gather performance data.

(6) I'm nearly done with the 1.7.0 performance blog post; I'm just waiting
on some data from Alex and Benno.

Agenda Doc: https://docs.google.com/document/d/12hWGuzbqyNWc2l1ysbPcXwc0pzHEy4bodagrlNGCuQU

Ben

[1] https://issues.apache.org/jira/browse/MESOS-9158
[2] https://issues.apache.org/jira/browse/MESOS-9087
[3] https://issues.apache.org/jira/browse/MESOS-6765
[4] https://issues.apache.org/jira/browse/MESOS-9239
[5] https://issues.apache.org/jira/browse/MESOS-9234
[6] https://issues.apache.org/jira/browse/MESOS-9187
[7] https://github.com/apache/mesos/blob/1.7.0/src/tests/scheduler_tests.cpp#L2164-L2230