Posted to reviews@mesos.apache.org by Meng Zhu <mz...@mesosphere.io> on 2018/05/07 19:28:07 UTC

Re: Review Request 66984: Added the ability to exhaustively allocate declined resources.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66984/
-----------------------------------------------------------

(Updated May 7, 2018, 12:28 p.m.)


Review request for mesos, Benjamin Mahler, Kapil Arya, Till Toenshoff, and Vinod Kone.


Summary (updated)
-----------------

Added the ability to exhaustively allocate declined resources.


Repository: mesos


Description (updated)
-------

Schedulers that are below their fair share will continue to get
allocated resources even if they don't need those resources. The
expected behavior here is that schedulers will decline those resources
for a long time (or forever). Not all schedulers do this, however,
which means that some schedulers might get _starved_ of
resources. Technically these schedulers are already at or above their
fair share, but some operators feel that this is keeping the cluster
underutilized.

Rather than guarantee that all schedulers will decline with large
timeouts, this patch ensures that all other schedulers will see
resources from the declined agent before they see resources from that
agent again, even if there are now more resources available on that
agent.


Diffs (updated)
-----

  src/master/allocator/mesos/hierarchical.hpp 955ae3e6a9e3c790fb260311ec4b4ef725a826d3 
  src/master/allocator/mesos/hierarchical.cpp 1000968be6a2935a4cac571414d7f06d7df7acf0 


Diff: https://reviews.apache.org/r/66984/diff/2/

Changes: https://reviews.apache.org/r/66984/diff/1-2/


Testing (updated)
-------

make check
Need to add dedicated tests.


Thanks,

Meng Zhu


Re: Review Request 66984: Added the ability to exhaustively allocate declined resources.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66984/#review202673
-----------------------------------------------------------



The description could use some more explanation around allocating up to the quota guarantee vs. above the guarantee up to the limit, and how these pertain to exhaustiveness. This also seems to have an implication on the code (e.g. whether we quota-check candidates). I already forgot what the thinking is there, so I certainly won't remember a week from now :)


src/master/allocator/mesos/hierarchical.hpp
Lines 440-445 (patched)
<https://reviews.apache.org/r/66984/#comment284619>

    I think we can just update the above paragraph to clarify it's per role, rather than having it as a separate note here.



src/master/allocator/mesos/hierarchical.hpp
Lines 447-448 (patched)
<https://reviews.apache.org/r/66984/#comment284620>

    Let's clearly specify the conditions under which it gets cleared, is this the only event outside of the set filling up?
    
    I'm inclined to clear filters for the agent attribute change case but keep it in the exclusion set, so that we don't hit the starvation problem this feature aims to address.



src/master/allocator/mesos/hierarchical.hpp
Lines 450-451 (patched)
<https://reviews.apache.org/r/66984/#comment284621>

    Let's note here that if we introduce cases where we clear the set too frequently, we run into the problem this feature aims to address.



src/master/allocator/mesos/hierarchical.hpp
Lines 452 (patched)
<https://reviews.apache.org/r/66984/#comment284623>

    hashset? If not, can you clarify where we rely on the ordering?



src/master/allocator/mesos/hierarchical.cpp
Lines 652-654 (patched)
<https://reviews.apache.org/r/66984/#comment284626>

    See my comment above.



src/master/allocator/mesos/hierarchical.cpp
Lines 1215-1217 (patched)
<https://reviews.apache.org/r/66984/#comment284638>

    It's a little odd that it's not cleared here but it's cleared on other insertions. If we make the framework candidates a function can we then do the clearing here as well? Or is it too expensive to compute on each recovery of resources?



src/master/allocator/mesos/hierarchical.cpp
Lines 1216 (patched)
<https://reviews.apache.org/r/66984/#comment284628>

    Just curious, what would it mean if it were already in the set? Would it be a bug? Do you want to check for that?



src/master/allocator/mesos/hierarchical.cpp
Lines 2067-2087 (patched)
<https://reviews.apache.org/r/66984/#comment284645>

    Can you contain this in a lambda?
    
    ```
    hashmap<string, set<FrameworkID>> roleFrameworkCandidates = [&]() {
      ...
      
      return candidates;
    }();
    ```
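For illustration, the immediately-invoked lambda pattern suggested here might look like the following, with toy string keys and values standing in for the actual Mesos types:

```cpp
#include <map>
#include <set>
#include <string>

// Build the candidate map in one expression via an immediately-invoked
// lambda; the result can then be const, and the setup logic stays scoped.
const std::map<std::string, std::set<std::string>> roleFrameworkCandidates =
  []() {
    std::map<std::string, std::set<std::string>> candidates;

    // ... population logic would go here; toy data for illustration.
    candidates["roleA"] = {"framework1", "framework2"};
    candidates["roleB"] = {"framework3"};

    return candidates;
  }();
```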



src/master/allocator/mesos/hierarchical.cpp
Lines 2084 (patched)
<https://reviews.apache.org/r/66984/#comment284646>

    std::move it into the map
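A minimal sketch of this suggestion, with toy types in place of the role-keyed FrameworkID sets the patch actually builds:

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Moving a locally built set into the map transfers its storage
// instead of copying every element.
std::map<std::string, std::set<std::string>> buildCandidates() {
  std::map<std::string, std::set<std::string>> candidates;

  std::set<std::string> frameworks = {"framework1", "framework2"};

  // After the move, `frameworks` is valid but unspecified; it must
  // not be read again without being reassigned.
  candidates["roleA"] = std::move(frameworks);

  return candidates;
}
```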



src/master/allocator/mesos/hierarchical.cpp
Lines 2089-2107 (patched)
<https://reviews.apache.org/r/66984/#comment284648>

    Do you actually need subset? The way this is used it seems like you just need to use equality?



src/master/allocator/mesos/hierarchical.cpp
Line 2061 (original)
<https://reviews.apache.org/r/66984/#comment284643>

    The removal of this looks like an optimization to avoid re-lookup of the agent, so we could pull that out in front of this change to get it quickly committed and minimize this diff.



src/master/allocator/mesos/hierarchical.cpp
Lines 2145 (patched)
<https://reviews.apache.org/r/66984/#comment284649>

    Can you use .at for read-only access here and elsewhere?
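The distinction being suggested, shown with std::map (stout's hashmap is built on the standard unordered containers, so its at() accessor behaves analogously):

```cpp
#include <map>
#include <string>

// operator[] on a non-const map default-constructs and inserts a value
// for a missing key, silently mutating the container. .at() never
// inserts and works on const containers, so it documents read-only
// intent (and throws std::out_of_range on a missing key).
int readOnlyLookup(const std::map<std::string, int>& weights,
                   const std::string& role) {
  return weights.at(role);
}
```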



src/master/allocator/mesos/hierarchical.cpp
Lines 2154-2160 (patched)
<https://reviews.apache.org/r/66984/#comment284622>

    > In this case, the agent will keep offering resources to the frameworks that are omitted in the exclusion set, starving others.
    
    Hm.. is this right? It looks more like no one would be getting the offer? This invariant looks like: exclusion set for role != framework candidate set for role.
    
    I guess the more difficult invariant to check is: how do we avoid getting "stuck" with a partially full exclusion set? In some cases, it seems like this is actually the implemented behavior: if a framework keeps accepting and launching small, very short-lived tasks on an agent, another framework that declined it might never get it back.



src/master/allocator/mesos/hierarchical.cpp
Lines 2169-2180 (patched)
<https://reviews.apache.org/r/66984/#comment284650>

    Is this required for correctness or is this an optimization?



src/master/allocator/mesos/hierarchical.cpp
Lines 2188-2199 (patched)
<https://reviews.apache.org/r/66984/#comment284651>

    Is this required for correctness or is this an optimization?



src/master/allocator/mesos/hierarchical.cpp
Lines 2104-2151 (original), 2230-2282 (patched)
<https://reviews.apache.org/r/66984/#comment284631>

    Moving this up seems like an optimization that we could pull out in front of this change to minimize this diff?



src/master/allocator/mesos/hierarchical.cpp
Lines 2287-2299 (patched)
<https://reviews.apache.org/r/66984/#comment284634>

    Is this required for correctness? Or is this an optimization?



src/master/allocator/mesos/hierarchical.cpp
Lines 2315 (patched)
<https://reviews.apache.org/r/66984/#comment284652>

    See my comment above, is subset actually needed here or is equality sufficient?



src/master/allocator/mesos/hierarchical.cpp
Lines 2343-2349 (patched)
<https://reviews.apache.org/r/66984/#comment284642>

    Rather than the bool indirection, is it possible to check the two sets we care about? That would be more readable.


- Benjamin Mahler


On May 8, 2018, 4:55 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66984/
> -----------------------------------------------------------
> 
> (Updated May 8, 2018, 4:55 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Kapil Arya, Till Toenshoff, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Schedulers that are below their fair share will continue to get
> allocated resources even if they don't need those resources. The
> expected behavior here is that schedulers will decline those resources
> for a long time (or forever). Not all schedulers do this, however,
> which means that some schedulers might get _starved_ of
> resources. Technically these schedulers are already at or above their
> fair share, but some operators feel that this is keeping the cluster
> underutilized.
> 
> Rather than guarantee that all schedulers will decline with large
> timeouts, this patch ensures that all other schedulers will see
> resources from the declined agent before they see resources from that
> agent again, even if there are now more resources available on that
> agent.
> 
> To this end, each agent maintains an exclusion set of frameworks
> that declined its allocation earlier and skips them during the
> allocation. Note, a framework here is associated with each of
> its roles. If a framework declined resources allocated to one
> of its roles, it can still get allocations for its other roles.
> 
> Note, an agent's exclusion set will be cleared if its attribute
> changes.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.hpp 955ae3e6a9e3c790fb260311ec4b4ef725a826d3 
>   src/master/allocator/mesos/hierarchical.cpp 1000968be6a2935a4cac571414d7f06d7df7acf0 
> 
> 
> Diff: https://reviews.apache.org/r/66984/diff/5/
> 
> 
> Testing
> -------
> 
> make check
> Dedicated test in #66994
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>


Re: Review Request 66984: Added the ability to exhaustively allocate declined resources.

Posted by Meng Zhu <mz...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66984/
-----------------------------------------------------------

(Updated May 8, 2018, 9:55 a.m.)


Review request for mesos, Benjamin Mahler, Kapil Arya, Till Toenshoff, and Vinod Kone.


Changes
-------

Slip fix.


Repository: mesos


Description
-------

Schedulers that are below their fair share will continue to get
allocated resources even if they don't need those resources. The
expected behavior here is that schedulers will decline those resources
for a long time (or forever). Not all schedulers do this, however,
which means that some schedulers might get _starved_ of
resources. Technically these schedulers are already at or above their
fair share, but some operators feel that this is keeping the cluster
underutilized.

Rather than guarantee that all schedulers will decline with large
timeouts, this patch ensures that all other schedulers will see
resources from the declined agent before they see resources from that
agent again, even if there are now more resources available on that
agent.

To this end, each agent maintains an exclusion set of frameworks
that declined its allocation earlier and skips them during the
allocation. Note, a framework here is associated with each of
its roles. If a framework declined resources allocated to one
of its roles, it can still get allocations for its other roles.

Note, an agent's exclusion set will be cleared if its attribute
changes.
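The mechanism described above can be sketched as a toy data structure (hypothetical names; the actual change lives in src/master/allocator/mesos/hierarchical.hpp/.cpp):

```cpp
#include <set>
#include <string>
#include <utility>

// Toy sketch of the per-agent exclusion set described above.
// Entries are (framework, role) pairs, so a framework that declined
// resources for one of its roles can still receive allocations for
// its other roles.
struct AgentExclusionSet {
  std::set<std::pair<std::string, std::string>> declined;

  // Record that `framework` declined this agent's resources for `role`.
  void decline(const std::string& framework, const std::string& role) {
    declined.insert({framework, role});
  }

  // The allocator would skip this (framework, role) pair for the agent.
  bool shouldSkip(const std::string& framework,
                  const std::string& role) const {
    return declined.count({framework, role}) > 0;
  }

  // Cleared when the agent's attributes change (and, per the review
  // discussion, once every candidate has declined, i.e. the set fills up).
  void clear() { declined.clear(); }
};
```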


Diffs (updated)
-----

  src/master/allocator/mesos/hierarchical.hpp 955ae3e6a9e3c790fb260311ec4b4ef725a826d3 
  src/master/allocator/mesos/hierarchical.cpp 1000968be6a2935a4cac571414d7f06d7df7acf0 


Diff: https://reviews.apache.org/r/66984/diff/5/

Changes: https://reviews.apache.org/r/66984/diff/4-5/


Testing
-------

make check
Dedicated test in #66994


Thanks,

Meng Zhu


Re: Review Request 66984: Added the ability to exhaustively allocate declined resources.

Posted by Meng Zhu <mz...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66984/
-----------------------------------------------------------

(Updated May 8, 2018, 9:36 a.m.)


Review request for mesos, Benjamin Mahler, Kapil Arya, Till Toenshoff, and Vinod Kone.


Changes
-------

Patch updated:
- Made exclusion set role aware;
- Moved exclusion set to slave struct;
- Added invariant check; 
- Added a todo in the quota allocation stage;
- Documented a design decision in the description.


Repository: mesos


Description (updated)
-------

Schedulers that are below their fair share will continue to get
allocated resources even if they don't need those resources. The
expected behavior here is that schedulers will decline those resources
for a long time (or forever). Not all schedulers do this, however,
which means that some schedulers might get _starved_ of
resources. Technically these schedulers are already at or above their
fair share, but some operators feel that this is keeping the cluster
underutilized.

Rather than guarantee that all schedulers will decline with large
timeouts, this patch ensures that all other schedulers will see
resources from the declined agent before they see resources from that
agent again, even if there are now more resources available on that
agent.

To this end, each agent maintains an exclusion set of frameworks
that declined its allocation earlier and skips them during the
allocation. Note, a framework here is associated with each of
its roles. If a framework declined resources allocated to one
of its roles, it can still get allocations for its other roles.

Note, an agent's exclusion set will be cleared if its attribute
changes.


Diffs (updated)
-----

  src/master/allocator/mesos/hierarchical.hpp 955ae3e6a9e3c790fb260311ec4b4ef725a826d3 
  src/master/allocator/mesos/hierarchical.cpp 1000968be6a2935a4cac571414d7f06d7df7acf0 


Diff: https://reviews.apache.org/r/66984/diff/4/

Changes: https://reviews.apache.org/r/66984/diff/3-4/


Testing
-------

make check
Dedicated test in #66994


Thanks,

Meng Zhu


Re: Review Request 66984: Added the ability to exhaustively allocate declined resources.

Posted by Meng Zhu <mz...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66984/
-----------------------------------------------------------

(Updated May 7, 2018, 2:02 p.m.)


Review request for mesos, Benjamin Mahler, Kapil Arya, Till Toenshoff, and Vinod Kone.


Changes
-------

Added a todo to check invariants.


Repository: mesos


Description
-------

Schedulers that are below their fair share will continue to get
allocated resources even if they don't need those resources. The
expected behavior here is that schedulers will decline those resources
for a long time (or forever). Not all schedulers do this, however,
which means that some schedulers might get _starved_ of
resources. Technically these schedulers are already at or above their
fair share, but some operators feel that this is keeping the cluster
underutilized.

Rather than guarantee that all schedulers will decline with large
timeouts, this patch ensures that all other schedulers will see
resources from the declined agent before they see resources from that
agent again, even if there are now more resources available on that
agent.


Diffs (updated)
-----

  src/master/allocator/mesos/hierarchical.hpp 955ae3e6a9e3c790fb260311ec4b4ef725a826d3 
  src/master/allocator/mesos/hierarchical.cpp 1000968be6a2935a4cac571414d7f06d7df7acf0 


Diff: https://reviews.apache.org/r/66984/diff/3/

Changes: https://reviews.apache.org/r/66984/diff/2-3/


Testing
-------

make check
Need to add dedicated tests.


Thanks,

Meng Zhu