Posted to issues@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2016/12/30 01:19:58 UTC

[jira] [Created] (MESOS-6844) Avoid offer fragmentation between multiple frameworks / within a single framework.

Benjamin Mahler created MESOS-6844:
--------------------------------------

             Summary: Avoid offer fragmentation between multiple frameworks / within a single framework.
                 Key: MESOS-6844
                 URL: https://issues.apache.org/jira/browse/MESOS-6844
             Project: Mesos
          Issue Type: Epic
          Components: allocation
            Reporter: Benjamin Mahler


The current allocation strategy is to make "coarse-grained" offers to the frameworks, wherein each offer will contain all of the resources currently available on the agent to the framework.

However, this "coarse-grained" invariant does not hold over time: as resources are freed and additional offers can be made, we make another "coarse-grained" offer without rescinding any existing outstanding offers.

This leads to fragmentation of the offers for an agent (i.e. it is possible for there to be multiple offers, to one or more frameworks, for the available resources on an agent). There are a number of issues with this:

(1) In the case where the fragmented offers have been sent to multiple frameworks, it's possible for none of the frameworks to have sufficient resources to run anything. As the schedulers decline or hold on to these offers, it may take a long time to make progress.

(2) A simple scheduler may be implemented without holding and merging offers, since doing so adds complexity (e.g. how long should offers be held? Offer management and matching also become more involved). For such a scheduler there are pathological cases where the framework might not receive the single un-fragmented offer: each time the allocator makes an offer, it still sees an outstanding offer because the scheduler's DECLINE has not yet been processed.
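
To make issue (1) concrete, here is a toy Python model of the behavior described above. It is not Mesos code: the `Agent` class, its `offer_available` and `release` methods, the framework names, and the CPU counts are all hypothetical and chosen only for illustration. Each allocation cycle offers whatever is currently free and leaves earlier offers outstanding.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy agent: tracks free CPUs and the offers currently outstanding."""
    free_cpus: float
    outstanding: list = field(default_factory=list)  # (framework, cpus) pairs

    def offer_available(self, framework):
        """Offer everything currently free; earlier offers are NOT rescinded."""
        if self.free_cpus > 0:
            self.outstanding.append((framework, self.free_cpus))
            self.free_cpus = 0.0

    def release(self, cpus):
        """A task finished, so its CPUs become available again."""
        self.free_cpus += cpus


agent = Agent(free_cpus=1.0)
agent.offer_available("framework-A")  # A is offered the 1 free CPU
agent.release(3.0)                    # 3 more CPUs are freed
agent.offer_available("framework-B")  # B is offered 3 CPUs; A still holds 1

task_cpus = 4.0  # hypothetical task that needs the whole agent
can_launch = [fw for fw, cpus in agent.outstanding if cpus >= task_cpus]
```

In this sketch the agent has 4 CPUs on offer in total, yet `can_launch` is empty because no single offer is large enough for the task.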

The suggestion in this ticket is to explore imposing the "coarse-grained" invariant by avoiding fragmenting the offers across multiple frameworks and even for the same framework (we should look at these somewhat separately). This can be achieved if the allocator has visibility into the offers and rescinds outstanding offers for the agent prior to offering additionally freed resources on the agent.
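
As a rough sketch of the rescind-before-offer idea, the toy Python model below reclaims every outstanding offer on an agent before making a new one. It is not the Mesos allocator: the `CoalescingAgent` class and its methods are hypothetical, and the choice of which framework receives the merged offer (a fairness decision in practice) is simply passed in as a parameter.

```python
from dataclasses import dataclass, field


@dataclass
class CoalescingAgent:
    """Toy agent whose allocator rescinds old offers before offering again."""
    free_cpus: float
    outstanding: dict = field(default_factory=dict)  # framework -> offered cpus
    rescinded: list = field(default_factory=list)    # log of rescinded offers

    def release(self, cpus):
        """A task finished, so its CPUs become available again."""
        self.free_cpus += cpus

    def offer_available(self, framework):
        """Rescind all outstanding offers, then make one merged offer."""
        for fw, cpus in self.outstanding.items():
            self.rescinded.append((fw, cpus))
            self.free_cpus += cpus  # reclaimed resources rejoin the free pool
        self.outstanding.clear()
        if self.free_cpus > 0:
            self.outstanding[framework] = self.free_cpus
            self.free_cpus = 0.0


agent = CoalescingAgent(free_cpus=1.0)
agent.offer_available("framework-A")  # A is offered 1 CPU
agent.release(3.0)                    # 3 more CPUs are freed
agent.offer_available("framework-B")  # A's offer is rescinded; B is offered 4
```

After the second cycle there is a single outstanding offer covering all 4 CPUs, at the cost of one rescinded offer to `framework-A`.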

Note, however, that this also has some negative implications for scheduling throughput. Consider the case where there is a high degree of churn on an agent due to a large number of small, short-lived tasks. In this case, the framework would experience a lot of scheduling interference: it tries to accept offers, but the offers are rescinded frequently as the allocator attempts to un-fragment them. There may be ways to mitigate this. For example, we could allow operations on the rescinded offers so long as the operation can still be applied and the allocation constraints (fairness / quota) are not violated, but this needs more exploration. It may be that different frameworks desire different behavior here.
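
One way to picture the mitigation mentioned above is as a check performed when an operation arrives against an already-rescinded offer. The Python sketch below is purely illustrative: the function name and parameters are hypothetical, and "fairness / quota" is reduced to a single per-framework CPU cap, which is a large simplification of any real allocation constraint.

```python
def can_apply_after_rescind(requested_cpus, agent_free_cpus,
                            framework_allocated_cpus, framework_cpu_limit):
    """Decide whether an operation on a rescinded offer could still proceed.

    Assumption: the operation is allowed only if the agent still has the
    requested resources free AND the framework stays within its cap.
    """
    still_fits = requested_cpus <= agent_free_cpus
    within_limit = (framework_allocated_cpus + requested_cpus
                    <= framework_cpu_limit)
    return still_fits and within_limit


# Resources are still free and the framework is under its cap: allowed.
allowed = can_apply_after_rescind(1.0, 4.0, 2.0, 8.0)

# The resources were handed to someone else in the meantime: rejected.
no_room = can_apply_after_rescind(5.0, 4.0, 0.0, 8.0)

# The operation would push the framework over its cap: rejected.
over_cap = can_apply_after_rescind(2.0, 4.0, 7.0, 8.0)
```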

This problem should also be examined in the context of optimistic resource allocation.


