Posted to issues@mesos.apache.org by "Yan Xu (JIRA)" <ji...@apache.org> on 2016/12/10 03:40:58 UTC

[jira] [Created] (MESOS-6774) Role sorter and quota role sorter can have more copies of shared resources in allocations than in total.

Yan Xu created MESOS-6774:
-----------------------------

             Summary: Role sorter and quota role sorter can have more copies of shared resources in allocations than in total.
                 Key: MESOS-6774
                 URL: https://issues.apache.org/jira/browse/MESOS-6774
             Project: Mesos
          Issue Type: Improvement
          Components: allocation
            Reporter: Yan Xu


Shared resource support in the allocator works by allocating multiple copies of a shared resource so that multiple frameworks can receive it. Multiple copies of the same shared resource don't affect the quantity of the sorter's allocations or total pool, so they have no impact on DRF.
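
The distinction between copies and quantity can be sketched roughly as follows. This is a toy model, not the Mesos {{Sorter}} API: {{SharedPool}}, {{addCopy}}, {{quantity}} and the 100 MB disk are all hypothetical names and values chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy pool, not the Mesos Sorter API. Each shared resource is
// identified by name and has a fixed size; the pool tracks how many copies
// of it are present.
struct SharedPool {
  std::map<std::string, double> size;  // Size of each distinct resource.
  std::map<std::string, int> copies;   // Number of copies held.

  void addCopy(const std::string& name, double resourceSize) {
    size[name] = resourceSize;
    ++copies[name];
  }

  // Quantity counts each distinct shared resource once, however many
  // copies are held, so extra copies leave the input to DRF unchanged.
  double quantity() const {
    double sum = 0.0;
    for (const auto& entry : copies) {
      if (entry.second > 0) {
        sum += size.at(entry.first);
      }
    }
    return sum;
  }
};

// Two copies of the same (assumed) 100 MB shared disk still count as 100 MB.
inline double quantityAfterTwoCopies() {
  SharedPool pool;
  pool.addCopy("shared-disk", 100.0);
  pool.addCopy("shared-disk", 100.0);
  assert(pool.copies.at("shared-disk") == 2);
  return pool.quantity();
}
```

In this sketch the number of copies grows with each allocation while the quantity stays at the size of the one underlying disk.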

To make resource accounting work, though, when copies of the same resource are added to a framework's allocation, we also increase the size of the total pool in the sorter (again, adding these copies doesn't affect quantity) so that the *allocations in a sorter are always bounded by the total pool in the sorter*. This invariant is a requirement for the following logic in the allocator to work:

{code:title=Remove the resources from the framework sorter when they are unallocated from the framework}
      frameworkSorters[role]->unallocated(
          frameworkId.value(), slaveId, resources);
      frameworkSorters[role]->remove(slaveId, resources);
{code}

For example, if there are 2 copies of a shared disk allocated to framework1, the sorter's total pool has 2 copies of the disk as well.

However, we currently do this only for the framework sorter below a role, because the allocator (implicitly) assumes that the role sorter, being the root-level sorter, has a total pool that is unchanged during allocation or resource recovery. This is not a problem right now because, given that assumption, {{Sorter::add(const SlaveID& slaveId, const Resources& resources)}} and {{Sorter::remove(const SlaveID& slaveId, const Resources& resources)}} are not called on the role sorter during allocation or resource recovery.
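
The difference between the two sorters can be sketched with a toy model that only counts copies. This is an illustration under stated assumptions, not the Mesos {{Sorter}} implementation: {{ToySorter}}, {{bounded}} and the two helper functions are hypothetical, and resources are reduced to a name and a copy count.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy sorter, not the Mesos Sorter API. It only counts copies
// of each resource in the total pool and in the allocations.
struct ToySorter {
  std::map<std::string, int> total;
  std::map<std::string, int> allocation;

  void add(const std::string& r) { ++total[r]; }
  void allocated(const std::string& r) { ++allocation[r]; }

  void remove(const std::string& r) {
    assert(total[r] > 0);  // Cannot remove a copy that is not in the pool.
    --total[r];
  }

  void unallocated(const std::string& r) {
    assert(allocation[r] > 0);  // Cannot unallocate an unallocated copy.
    --allocation[r];
  }

  // True if no resource has more allocated copies than copies in total.
  bool bounded() const {
    for (const auto& entry : allocation) {
      auto it = total.find(entry.first);
      const int inTotal = (it == total.end()) ? 0 : it->second;
      if (entry.second > inTotal) {
        return false;
      }
    }
    return true;
  }
};

// Framework sorter pattern: every allocated copy is also added to the total.
inline bool frameworkSorterStaysBounded() {
  ToySorter sorter;
  for (int copy = 0; copy < 2; ++copy) {
    sorter.add("shared-disk");
    sorter.allocated("shared-disk");
  }
  const bool ok = sorter.bounded();  // 2 allocated, 2 in total.

  sorter.unallocated("shared-disk");
  sorter.remove("shared-disk");
  return ok && sorter.bounded();  // 1 allocated, 1 in total.
}

// Role sorter pattern: the total is assumed fixed at one physical copy, so
// a second allocated copy exceeds it.
inline bool roleSorterStaysBounded() {
  ToySorter sorter;
  sorter.add("shared-disk");        // Added once, outside of allocation.
  sorter.allocated("shared-disk");
  sorter.allocated("shared-disk");  // Second copy, total unchanged.
  return sorter.bounded();          // 2 allocated, 1 in total.
}
```

In this model {{frameworkSorterStaysBounded()}} returns true and {{roleSorterStaysBounded()}} returns false, which mirrors the situation described above: the role sorter can hold more copies in its allocations than in its total, and that is harmless only as long as nothing removes from its total on recovery.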

This will likely change with MESOS-6375, when role sorters become hierarchical and not all of them are bound to the physical size of the cluster. We should revisit the shared resource allocation logic then to make sure the invariant *allocations in a sorter are always bounded by the total pool in the sorter* holds.


