Posted to dev@mesos.apache.org by Thomas Marshall <tw...@gmail.com> on 2013/08/16 19:30:04 UTC

Review Request 13620: Fix the Allocator to recover resources when a slave/framework is removed

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13620/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Vinod Kone.


Bugs: MESOS-621
    https://issues.apache.org/jira/browse/MESOS-621


Repository: mesos-git


Description
-------

Previously, when a slave or framework was removed, the allocator didn't recover the associated resources. Instead it relied on the master calling Allocator::resourcesRecovered for every allocated resource. This was difficult to reason about and meant that the allocator's state was sometimes inconsistent with the actual state of the cluster (for example, a framework could have resources allocated to it on a slave that had already been removed). This patch makes the allocator recover those resources itself when a slave or framework is removed.

This also solves a problem with the upcoming implementation of revocation, where resources were recovered from a removed framework and the allocator no longer knew that framework's role because the framework had already been removed.
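
As an illustration of the behaviour described above (not the code in this patch), here is a minimal sketch of an allocator that recovers resources itself when a slave is removed. ToyAllocator, its members, and the integer "CPUs only" Resources type are all hypothetical simplifications.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for Mesos' Resources: a CPU count only.
using Resources = int;

// Hypothetical toy allocator, not the hierarchical allocator from the patch.
struct ToyAllocator {
  // slaveId -> (frameworkId -> resources allocated on that slave).
  std::map<std::string, std::map<std::string, Resources>> onSlave;

  // frameworkId -> total resources currently allocated to that framework.
  std::map<std::string, Resources> allocated;

  void allocate(
      const std::string& slaveId,
      const std::string& frameworkId,
      Resources resources) {
    assert(resources >= 0);
    onSlave[slaveId][frameworkId] += resources;
    allocated[frameworkId] += resources;
  }

  // Recover everything that was allocated on the slave as part of removing
  // it, instead of waiting for one recovery call per allocation.
  void slaveRemoved(const std::string& slaveId) {
    auto it = onSlave.find(slaveId);
    if (it == onSlave.end()) {
      return;
    }
    for (const auto& entry : it->second) {
      allocated[entry.first] -= entry.second;
    }
    onSlave.erase(it);
  }
};
```

With this bookkeeping, a framework can no longer appear to hold resources on a slave that is gone, because the per-slave record is erased in the same step that adjusts the framework's total.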


Diffs
-----

  src/master/hierarchical_allocator_process.hpp 183b205 
  src/master/master.cpp d53b8bb 

Diff: https://reviews.apache.org/r/13620/diff/


Testing
-------

make check


Thanks,

Thomas Marshall


Re: Review Request 13620: Fix the Allocator to recover resources when a slave/framework is removed

Posted by Thomas Marshall <tw...@gmail.com>.

> On Aug. 19, 2013, 3:49 p.m., Benjamin Hindman wrote:
> > src/master/hierarchical_allocator_process.hpp, lines 471-472
> > <https://reviews.apache.org/r/13620/diff/1/?file=342088#file342088line471>
> >
> >     What's the difference between these two? Or why isn't 'unallocated' also "removing" the resources?

Sorter::unallocated tells the sorter that those resources are no longer allocated to that framework. Sorter::remove tells the sorter that those resources are no longer part of the total pool from which allocations are made.

For the role-level sorter, Sorter::remove is only called when a slave is removed. For the framework-level sorters, we call Sorter::remove each time resources are deallocated from a framework, so that the framework's share is calculated as a portion of the total resources allocated to all frameworks in that role, rather than as a portion of the total cluster resources.
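
To make the distinction concrete, here is a minimal sketch of a sorter that tracks a total pool and per-client allocations. ToySorter and its double-valued resources are hypothetical simplifications, not the Mesos Sorter interface. Only the names remove and unallocated come from the discussion above; add, allocated and share are assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy sorter: resources are a single double (e.g. CPUs).
struct ToySorter {
  double total = 0.0;                        // Pool that shares are measured against.
  std::map<std::string, double> allocation;  // Per-client allocated amount.

  // Grow or shrink the total pool.
  void add(double resources) {
    assert(resources >= 0.0);
    total += resources;
  }
  void remove(double resources) { total -= resources; }

  // Record that a client gained or lost an allocation from the pool.
  void allocated(const std::string& client, double resources) {
    allocation[client] += resources;
  }
  void unallocated(const std::string& client, double resources) {
    allocation[client] -= resources;
  }

  // A client's share is its allocation as a fraction of the total pool.
  double share(const std::string& client) const {
    auto it = allocation.find(client);
    if (it == allocation.end() || total <= 0.0) {
      return 0.0;
    }
    return it->second / total;
  }
};
```

For a framework-level sorter in this sketch, the pool is whatever is currently allocated to the role, so deallocating from a framework calls both unallocated (the framework holds less) and remove (the role's pool shrinks by the same amount).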


- Thomas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13620/#review25303
-----------------------------------------------------------




Re: Review Request 13620: Fix the Allocator to recover resources when a slave/framework is removed

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13620/#review25303
-----------------------------------------------------------


I like adding 'allocations', but how about some helpers in Slave and Framework too?


src/master/hierarchical_allocator_process.hpp
<https://reviews.apache.org/r/13620/#comment49642>

    What's the difference between these two? Or why isn't 'unallocated' also "removing" the resources?



src/master/hierarchical_allocator_process.hpp
<https://reviews.apache.org/r/13620/#comment49643>

    Pull this up to previous line?



src/master/hierarchical_allocator_process.hpp
<https://reviews.apache.org/r/13620/#comment49644>

    What about:
    
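    void Slave::allocate(
        const FrameworkID& frameworkId,
        const Resources& resources)
    {
      available -= resources;
      allocations[frameworkId] += resources;
    }
    
    and:
    
    void Slave::deallocate(
        const FrameworkID& frameworkId,
        const Resources& resources)
    {
      available += resources;
      allocations[frameworkId] -= resources;
    
      // Drop the entry once nothing allocatable remains for this framework.
      if (!allocations[frameworkId].allocatable()) {
        allocations.erase(frameworkId);
      }
    }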
    
    The 'allocations.erase' part should enable you to use this code above where you remove a framework too.
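
A self-contained version of the helpers suggested above might look like the following sketch. It assumes a simplified Resources type (an integer CPU count) and a string FrameworkID, neither of which matches the real Mesos types. The "nothing left" check is a plain comparison with zero rather than a call to allocatable().

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical simplifications of the Mesos types.
using FrameworkID = std::string;
using Resources = int;  // CPUs only.

struct Slave {
  Resources available = 0;
  std::map<FrameworkID, Resources> allocations;

  // Move resources from the free pool to the given framework.
  void allocate(const FrameworkID& frameworkId, Resources resources) {
    assert(resources <= available);
    available -= resources;
    allocations[frameworkId] += resources;
  }

  // Return resources to the free pool and drop the framework's entry
  // once it holds nothing on this slave.
  void deallocate(const FrameworkID& frameworkId, Resources resources) {
    auto it = allocations.find(frameworkId);
    assert(it != allocations.end() && resources <= it->second);
    available += resources;
    it->second -= resources;
    if (it->second == 0) {
      allocations.erase(it);
    }
  }
};
```

Erasing empty entries keeps the allocations map equal to the set of frameworks that actually hold resources on the slave, which is what makes it reusable when a framework is removed.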



src/master/hierarchical_allocator_process.hpp
<https://reviews.apache.org/r/13620/#comment49645>

    You could add a Framework::allocate/deallocate for consistency too:
    
    slaves[slaveId].allocate(frameworkId, resources);
    frameworks[frameworkId].allocate(slaveId, resources);
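
The paired bookkeeping suggested above can be sketched as follows, together with one way it could be used to recover resources when a framework is removed. The Bookkeeping struct, the removeFramework function, and the integer Resources type are hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical simplifications of the Mesos types.
using SlaveID = std::string;
using FrameworkID = std::string;
using Resources = int;  // CPUs only.

// Generic "subtract, then erase the key at zero" helper used by both structs.
template <typename Key>
void release(std::map<Key, Resources>& allocations, const Key& key, Resources r) {
  auto it = allocations.find(key);
  assert(it != allocations.end() && r <= it->second);
  it->second -= r;
  if (it->second == 0) {
    allocations.erase(it);
  }
}

struct Slave {
  Resources available = 0;
  std::map<FrameworkID, Resources> allocations;

  void allocate(const FrameworkID& frameworkId, Resources r) {
    available -= r;
    allocations[frameworkId] += r;
  }
  void deallocate(const FrameworkID& frameworkId, Resources r) {
    release(allocations, frameworkId, r);
    available += r;
  }
};

struct Framework {
  std::map<SlaveID, Resources> allocations;

  void allocate(const SlaveID& slaveId, Resources r) { allocations[slaveId] += r; }
  void deallocate(const SlaveID& slaveId, Resources r) { release(allocations, slaveId, r); }
};

struct Bookkeeping {
  std::map<SlaveID, Slave> slaves;
  std::map<FrameworkID, Framework> frameworks;

  // Record the allocation on both sides, as in the two calls above.
  void allocate(const SlaveID& slaveId, const FrameworkID& frameworkId, Resources r) {
    slaves[slaveId].allocate(frameworkId, r);
    frameworks[frameworkId].allocate(slaveId, r);
  }

  // Give every slave back what the framework held, then forget the framework.
  void removeFramework(const FrameworkID& frameworkId) {
    auto it = frameworks.find(frameworkId);
    if (it == frameworks.end()) {
      return;
    }
    for (const auto& entry : it->second.allocations) {
      slaves[entry.first].deallocate(frameworkId, entry.second);
    }
    frameworks.erase(it);
  }
};
```

Keeping the two maps symmetric means either side can drive recovery: a removed framework walks its own allocations to find the affected slaves, and a removed slave could do the same in the other direction.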


- Benjamin Hindman




Re: Review Request 13620: Fix the Allocator to recover resources when a slave/framework is removed

Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13620/#review87673
-----------------------------------------------------------


Bad patch!

Reviews applied: [13620]

Failed command: ./support/apply-review.sh -n -r 13620

Error:
 2015-06-12 02:04:37 URL:https://reviews.apache.org/r/13620/diff/raw/ [8945/8945] -> "13620.patch" [1]
error: src/master/hierarchical_allocator_process.hpp: does not exist in index
error: patch failed: src/master/master.cpp:564
error: src/master/master.cpp: patch does not apply
Failed to apply patch

- Mesos ReviewBot




Re: Review Request 13620: Fix the Allocator to recover resources when a slave/framework is removed

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13620/#review87671
-----------------------------------------------------------


Ben, is this still relevant?

- Niklas Nielsen




Re: Review Request 13620: Fix the Allocator to recover resources when a slave/framework is removed

Posted by Thomas Marshall <tw...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13620/
-----------------------------------------------------------

(Updated Aug. 19, 2013, 8:39 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Ben's review - added helpers to the Framework and Slave structs for allocating and deallocating.


Bugs: MESOS-621
    https://issues.apache.org/jira/browse/MESOS-621


Repository: mesos-git


Description
-------

Previously, when a slave or framework was removed, the allocator didn't recover the associated resources. Instead it relied on the master calling Allocator::resourcesRecovered for every allocated resource. This was difficult to reason about and meant that the allocator's state was sometimes inconsistent with the actual state of the cluster (for example, a framework could have resources allocated to it on a slave that had already been removed). This patch makes the allocator recover those resources itself when a slave or framework is removed.

This also solves a problem with the upcoming implementation of revocation, where resources were recovered from a removed framework and the allocator no longer knew that framework's role because the framework had already been removed.


Diffs (updated)
-----

  src/master/hierarchical_allocator_process.hpp 183b205 
  src/master/master.cpp d53b8bb 

Diff: https://reviews.apache.org/r/13620/diff/


Testing
-------

make check


Thanks,

Thomas Marshall