Posted to dev@mesos.apache.org by Benjamin Mahler <bm...@apache.org> on 2018/05/18 23:40:13 UTC

High Level Design Doc: Offer Starvation

Hi folks,

One of the long-standing issues with running many frameworks on Mesos is
the presence of what is called "offer starvation". This is when some
role/framework that has unsatisfied demand is not receiving offers, while
Mesos continually sends offers to other roles/frameworks that don't want
them. This was originally captured via:

https://issues.apache.org/jira/browse/MESOS-3202

It's currently not possible to program a well-behaved scheduler to avoid
this issue, since the only mechanisms schedulers have today are to SUPPRESS
if they have no work to do and otherwise to filter offers that aren't
needed for a timeout. However, a scheduler that has short-lived workloads
must REVIVE frequently (which clears all of its filters). With a sufficient
number of these frameworks, Mesos may not be able to allocate all the
available resources. See the Background section of the document for more
details.

This document goes over the background of the issue and covers various
solutions for addressing it. Some of them are longer term and would merit
their own design doc:

https://docs.google.com/document/d/1uvTmBo_21Ul9U_mijgWyh7hE0E_yZXrFr43JIB9OCl8

The current thinking is that it would be simplest in the short term to
provide an alternative sorter to DRF that can be chosen when starting the
master (e.g. random). In the medium term, we may add demand-awareness, and
in the long term migrate to shared-state scheduling.
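
To make the intuition concrete, here is a toy sketch in Python (not Mesos
code). Everything in it is an assumption for illustration: the role names,
the fixed shares, the one-offer-per-round model, and the premise that the
idle role's filters are always cleared by a REVIVE before the next round.
It only counts how often a role with unsatisfied demand is offered
resources first under a lowest-share-first ordering versus a random one.

```python
import random


def drf_like(share, rng):
    """Hypothetical sorter: the role with the lowest share goes first."""
    return min(share, key=share.get)


def random_sorter(share, rng):
    """Hypothetical sorter: pick the first role uniformly at random."""
    return rng.choice(sorted(share))


def simulate(pick_first, rounds=1000, seed=0):
    """Count how many rounds the role with demand is offered resources.

    Toy model (assumed, not how the Mesos allocator is implemented):
    - "idle" holds nothing and declines every offer it receives;
    - "busy" already holds resources and still wants more;
    - shares are held fixed, and declined resources are re-offered in
      the next round because the idle role's filter has been cleared.
    """
    rng = random.Random(seed)
    share = {"idle": 0.0, "busy": 0.5}
    offers_to_busy = 0
    for _ in range(rounds):
        if pick_first(share, rng) == "busy":
            offers_to_busy += 1
    return offers_to_busy


if __name__ == "__main__":
    print("lowest-share-first:", simulate(drf_like))
    print("random:", simulate(random_sorter))
```

In this model the lowest-share-first ordering never reaches the busy role,
because the idle role always sorts ahead of it, while the random ordering
reaches it in roughly half of the rounds. The real allocator is far more
involved; see the design doc for the actual behavior.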

Please share any feedback or questions, thanks!

Ben