Posted to dev@mesos.apache.org by Thomas Marshall <tw...@gmail.com> on 2013/08/16 19:30:04 UTC
Review Request 13620: Fix the Allocator to recover resources when a
slave/framework is removed
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13620/
-----------------------------------------------------------
Review request for mesos, Benjamin Hindman and Vinod Kone.
Bugs: MESOS-621
https://issues.apache.org/jira/browse/MESOS-621
Repository: mesos-git
Description
-------
Previously, when a slave or framework was removed, the allocator didn't recover the associated resources itself; it relied on the master calling Allocator::resourcesRecovered for every allocated resource. This was difficult to reason about, and it meant the allocator's state could be inconsistent with the actual state of the cluster (for example, a framework could still have resources allocated to it on a slave that had already been removed). With this patch, the allocator recovers those resources itself when the slave or framework is removed.
This also solves a problem with the upcoming implementation of revocation, where resources were recovered from a framework after it had been removed and the allocator no longer knew that framework's role.
Diffs
-----
src/master/hierarchical_allocator_process.hpp 183b205
src/master/master.cpp d53b8bb
Diff: https://reviews.apache.org/r/13620/diff/
Testing
-------
make check
Thanks,
Thomas Marshall
Re: Review Request 13620: Fix the Allocator to recover resources when a
slave/framework is removed
Posted by Thomas Marshall <tw...@gmail.com>.
> On Aug. 19, 2013, 3:49 p.m., Benjamin Hindman wrote:
> > src/master/hierarchical_allocator_process.hpp, lines 471-472
> > <https://reviews.apache.org/r/13620/diff/1/?file=342088#file342088line471>
> >
> > What's the difference between these two? Or why isn't 'unallocated' also "removing" the resources?
Sorter::unallocated tells the sorter that those resources are no longer allocated to that framework. Sorter::remove tells the sorter that those resources are no longer in the total pool that the allocated resources come out of.
For the role-level sorter, Sorter::remove is only called when a slave is removed. For the framework-level sorters, we call Sorter::remove each time resources are deallocated from a framework, so that a framework's share is calculated as a portion of the total resources allocated to all frameworks in that role, rather than as a portion of the total cluster resources.
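To make the distinction concrete, here is a minimal sketch rather than the actual Mesos Sorter. SimpleSorter and deallocateFromFramework are hypothetical names, resources are reduced to a single scalar (say, CPUs), and the sketch assumes the pool was grown by the same amount when the resources were first allocated within the role.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for a sorter. Resources are a single
// scalar here instead of Mesos' Resources type.
class SimpleSorter
{
public:
  // Grow or shrink the total pool that shares are measured against.
  void add(double resources) { total += resources; }

  void remove(double resources)
  {
    assert(resources <= total);
    total -= resources;
  }

  // Record that `client` now holds, or no longer holds, `resources`.
  void allocated(const std::string& client, double resources)
  {
    allocations[client] += resources;
  }

  void unallocated(const std::string& client, double resources)
  {
    assert(allocations[client] >= resources);
    allocations[client] -= resources;
  }

  // A client's share is its allocation as a fraction of the total pool.
  double share(const std::string& client) const
  {
    auto it = allocations.find(client);
    if (it == allocations.end() || total == 0.0) {
      return 0.0;
    }
    return it->second / total;
  }

private:
  double total = 0.0;
  std::map<std::string, double> allocations;
};

// Framework-level pattern: the pool only tracks what is currently allocated
// within the role, so it shrinks on every deallocation.
void deallocateFromFramework(
    SimpleSorter& sorter, const std::string& framework, double resources)
{
  sorter.unallocated(framework, resources);
  sorter.remove(resources);
}
```

With a pool of 8 split 6/2 between two frameworks, deallocating 4 from the first leaves both at a share of 0.5, because the pool shrinks to 4 along with the allocation. Calling only unallocated would instead leave the first framework at 2/8.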
- Thomas
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13620/#review25303
-----------------------------------------------------------
Re: Review Request 13620: Fix the Allocator to recover resources when a
slave/framework is removed
Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13620/#review25303
-----------------------------------------------------------
I like adding 'allocations', but how about some helpers in Slave and Framework too?
src/master/hierarchical_allocator_process.hpp
<https://reviews.apache.org/r/13620/#comment49642>
What's the difference between these two? Or why isn't 'unallocated' also "removing" the resources?
src/master/hierarchical_allocator_process.hpp
<https://reviews.apache.org/r/13620/#comment49643>
Pull this up to previous line?
src/master/hierarchical_allocator_process.hpp
<https://reviews.apache.org/r/13620/#comment49644>
What about:
void Slave::allocate(const FrameworkID& frameworkId, const Resources& resources)
{
  available -= resources;
  allocations[frameworkId] += resources;
}
and:
void Slave::deallocate(const FrameworkID& frameworkId, const Resources& resources)
{
  available += resources;
  allocations[frameworkId] -= resources;
  // Drop the entry once nothing allocatable is left for this framework.
  if (!allocations[frameworkId].allocatable()) {
    allocations.erase(frameworkId);
  }
}
The 'allocations.erase' part should enable you to use this code above where you remove a framework too.
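A self-contained sketch of how these helpers could also serve framework removal, under loud simplifications: FrameworkID is a plain string, Resources is a single scalar, "nothing allocatable left" is approximated as the amount reaching zero, and recover is a hypothetical helper that is not part of the patch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical simplifications: IDs are strings, resources are one scalar.
using FrameworkID = std::string;
using Resources = double;

struct Slave
{
  Resources available = 0.0;
  std::map<FrameworkID, Resources> allocations;

  void allocate(const FrameworkID& frameworkId, Resources resources)
  {
    assert(resources <= available);
    available -= resources;
    allocations[frameworkId] += resources;
  }

  void deallocate(const FrameworkID& frameworkId, Resources resources)
  {
    auto it = allocations.find(frameworkId);
    assert(it != allocations.end() && resources <= it->second);
    available += resources;
    it->second -= resources;
    // Erase the entry once the framework holds nothing on this slave.
    if (it->second <= 0.0) {
      allocations.erase(it);
    }
  }

  // Hypothetical helper: give back everything a removed framework held
  // on this slave and report how much that was.
  Resources recover(const FrameworkID& frameworkId)
  {
    auto it = allocations.find(frameworkId);
    if (it == allocations.end()) {
      return 0.0;
    }
    Resources held = it->second; // Copy first; deallocate erases the entry.
    deallocate(frameworkId, held);
    return held;
  }
};
```

Because deallocate erases the entry when it reaches zero, removing a framework reduces to deallocating whatever it still holds, and no stale key is left behind in allocations.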
src/master/hierarchical_allocator_process.hpp
<https://reviews.apache.org/r/13620/#comment49645>
You could add a Framework::allocate/deallocate for consistency too:
slaves[slaveId].allocate(frameworkId, resources);
frameworks[frameworkId].allocate(slaveId, resources);
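The symmetric bookkeeping could look roughly like the sketch below. Ledger and ToyAllocator are hypothetical names, IDs are plain strings, resources are a single scalar, and the slave's `available` counter is left out to keep the example short. It illustrates the idea only and is not the code in the patch.

```cpp
#include <cassert>
#include <map>
#include <string>

using ID = std::string;     // Hypothetical stand-in for SlaveID / FrameworkID.
using Resources = double;   // Hypothetical stand-in for Mesos' Resources.

// Per-entity record of allocations, keyed by the ID of the counterpart:
// a slave keys by framework, a framework keys by slave.
struct Ledger
{
  std::map<ID, Resources> allocations;

  void allocate(const ID& other, Resources resources)
  {
    allocations[other] += resources;
  }

  void deallocate(const ID& other, Resources resources)
  {
    auto it = allocations.find(other);
    assert(it != allocations.end() && resources <= it->second);
    it->second -= resources;
    if (it->second <= 0.0) {
      allocations.erase(it);
    }
  }
};

struct ToyAllocator
{
  std::map<ID, Ledger> slaves;
  std::map<ID, Ledger> frameworks;

  // Both sides are updated together, so they cannot drift apart.
  void allocate(const ID& slaveId, const ID& frameworkId, Resources resources)
  {
    slaves[slaveId].allocate(frameworkId, resources);
    frameworks[frameworkId].allocate(slaveId, resources);
  }

  // Removing a slave also clears what each framework held on it.
  void slaveRemoved(const ID& slaveId)
  {
    auto it = slaves.find(slaveId);
    if (it == slaves.end()) {
      return;
    }
    for (const auto& entry : it->second.allocations) {
      frameworks[entry.first].deallocate(slaveId, entry.second);
    }
    slaves.erase(it);
  }
};
```

After slaveRemoved, no framework still records resources on the removed slave, which is the kind of inconsistency the review description mentions.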
- Benjamin Hindman
Re: Review Request 13620: Fix the Allocator to recover resources when
a slave/framework is removed
Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13620/#review87673
-----------------------------------------------------------
Bad patch!
Reviews applied: [13620]
Failed command: ./support/apply-review.sh -n -r 13620
Error:
2015-06-12 02:04:37 URL:https://reviews.apache.org/r/13620/diff/raw/ [8945/8945] -> "13620.patch" [1]
error: src/master/hierarchical_allocator_process.hpp: does not exist in index
error: patch failed: src/master/master.cpp:564
error: src/master/master.cpp: patch does not apply
Failed to apply patch
- Mesos ReviewBot
Re: Review Request 13620: Fix the Allocator to recover resources when
a slave/framework is removed
Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13620/#review87671
-----------------------------------------------------------
Ben, is this still relevant?
- Niklas Nielsen
Re: Review Request 13620: Fix the Allocator to recover resources when a
slave/framework is removed
Posted by Thomas Marshall <tw...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13620/
-----------------------------------------------------------
(Updated Aug. 19, 2013, 8:39 p.m.)
Review request for mesos, Benjamin Hindman and Vinod Kone.
Changes
-------
Addressed Ben's review: added helpers to the Framework and Slave structs for allocating and deallocating.
Bugs: MESOS-621
https://issues.apache.org/jira/browse/MESOS-621
Repository: mesos-git
Description
-------
Previously, when a slave or framework was removed, the allocator didn't recover the associated resources itself; it relied on the master calling Allocator::resourcesRecovered for every allocated resource. This was difficult to reason about, and it meant the allocator's state could be inconsistent with the actual state of the cluster (for example, a framework could still have resources allocated to it on a slave that had already been removed). With this patch, the allocator recovers those resources itself when the slave or framework is removed.
This also solves a problem with the upcoming implementation of revocation, where resources were recovered from a framework after it had been removed and the allocator no longer knew that framework's role.
Diffs (updated)
-----
src/master/hierarchical_allocator_process.hpp 183b205
src/master/master.cpp d53b8bb
Diff: https://reviews.apache.org/r/13620/diff/
Testing
-------
make check
Thanks,
Thomas Marshall