You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Anindya Sinha <an...@apple.com> on 2016/11/29 00:25:43 UTC

Re: Review Request 53096: Fix handling in shared count in total resources in the sorter.


> On Nov. 19, 2016, 12:30 a.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, line 757
> > <https://reviews.apache.org/r/53096/diff/4/?file=1566655#file1566655line757>
> >
> >     So if `operation` contians multiple copies, the result will end up having multiple copies right?

This piece of code is executed for non LAUNCH operations only. So we should have at most a single copy of shared resources at this point for a specific operation.


> On Nov. 19, 2016, 12:30 a.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 752-755
> > <https://reviews.apache.org/r/53096/diff/4/?file=1566655#file1566655line752>
> >
> >     This seems the wrong place to do it, multiple copies of the same shared resources are added to the allocation of the role (in roleSorter and quotaRoleSorter) from multiple places:
> >     
> >     - updateAllocation
> >     - addFramework
> >     - addSlave
> >     - allocate
> >     
> >     If the rule is to not have more than one copy the shared resources to roleSorter and quotaRoleSorter, they should be invariants, i.e., we should prevent them from being added instead of deduping them here.
> >     
> >     Let's chat about the design.

That is indeed the case. We can have multiple copies of shared resources in allocations for role sorter and quota sorter; but only a single copy of shared resources in the total maintained in role sorter and quota sorter. The issue here is that we use the shared count in allocations in framework sorter to remove appropriate resources in the role and quota sorter's total resources. Since total resources in role sorter and quota sorter is atmost 1, but allocations in framework sorter can be more which results in `CHECK(total_.resources[slaveId].contains(resources))` failing.

So, this fix seems to take care of the inconsistencies in the shared count. However, as discussed it would be better to use the agent's total resources (before and after applying the appropriate `operation`) to account for the total resources in the role and quota sorter, i.e. something similar to the approach in `HierarchicalAllocatorProcess::updateAvailable()`, as follows:

```
  // Update the total resources.
  Try<Resources> updatedTotal = slaves[slaveId].total.apply(operations);
  CHECK_SOME(updatedTotal);

  const Resources oldTotal = slaves[slaveId].total;
  slaves[slaveId].total = updatedTotal.get();

  // Now, update the total resources in the role sorters by removing
  // the previous resources at this slave and adding the new resources.
  roleSorter->remove(slaveId, oldTotal);
  roleSorter->add(slaveId, updatedTotal.get());

  // See comment at `quotaRoleSorter` declaration regarding non-revocable.
  quotaRoleSorter->remove(slaveId, oldTotal.nonRevocable());
  quotaRoleSorter->add(slaveId, updatedTotal.get().nonRevocable());
```


- Anindya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53096/#review156326
-----------------------------------------------------------


On Nov. 18, 2016, 5:14 a.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53096/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2016, 5:14 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6444
>     https://issues.apache.org/jira/browse/MESOS-6444
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> We maintain a single copy of shared resource in the role and quota
> sorter's total resources. So, when we update these resources, we ensure
> that we only count a single copy even though the framework sorter
> may be returned multiple copies of a shared resource.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.cpp c8f9492ee1b69e125a1e841116d22a578a9b524e 
>   src/tests/persistent_volume_tests.cpp 8651b2c5455089041f16d091c90a7e0d948191b8 
> 
> Diff: https://reviews.apache.org/r/53096/diff/
> 
> 
> Testing
> -------
> 
> Tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>