Posted to reviews@mesos.apache.org by Andrei Sekretenko <as...@mesosphere.io> on 2019/10/29 15:30:54 UTC

Review Request 71697: Optimized tracking of cluster resource totals.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71697/
-----------------------------------------------------------

Review request for mesos, Benjamin Mahler and Meng Zhu.


Bugs: MESOS-10015
    https://issues.apache.org/jira/browse/MESOS-10015


Repository: mesos


Description
-------

This patch addresses poor performance of
`HierarchicalAllocatorProcess::updateAllocation()` for agents with
a huge number of non-addable resources when many frameworks are
present (see MESOS-10015).

Sorter methods for totals tracking that modify `Resources` of an agent
in the Sorter are replaced with methods that add/remove resource
quantities of an agent as a whole (which was actually the only use case
of the old methods). Thus, subtracting/adding `Resources` of a whole
agent no longer occurs when updating resources of an agent in a Sorter.

Further, this patch completely removes agent resource tracking logic
from the random sorter (which itself makes no use of it) by
implementing cluster totals tracking in the allocator.
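
To illustrate the idea of tracking totals as per-agent quantities rather
than as full `Resources` objects, here is a minimal, self-contained sketch.
It is not the code from this patch: `Quantities`, `ClusterTotals`,
`addAgent()`, `removeAgent()` and `get()` are hypothetical names, and a plain
map of scalar values stands in for the real Mesos types.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for scalar resource quantities of one agent,
// e.g. {"cpus": 4, "mem": 1024}. Not the real Mesos type.
using Quantities = std::map<std::string, double>;

// Hypothetical tracker that keeps only summed quantities for the cluster.
// It never stores or subtracts per-agent resource objects.
class ClusterTotals
{
public:
  // Adds the quantities of a whole agent to the cluster totals.
  void addAgent(const Quantities& agentTotal)
  {
    for (const auto& kv : agentTotal) {
      totals_[kv.first] += kv.second;
    }
  }

  // Removes the quantities of a whole agent from the cluster totals.
  void removeAgent(const Quantities& agentTotal)
  {
    for (const auto& kv : agentTotal) {
      auto it = totals_.find(kv.first);
      assert(it != totals_.end() && it->second >= kv.second);
      it->second -= kv.second;
      if (it->second == 0.0) {
        totals_.erase(it); // Drop names that no agent contributes to.
      }
    }
  }

  // Returns the tracked total for a resource name, or 0 if absent.
  double get(const std::string& name) const
  {
    auto it = totals_.find(name);
    return it == totals_.end() ? 0.0 : it->second;
  }

private:
  Quantities totals_;
};
```

In this sketch the cost of adding or removing an agent depends on the number
of distinct resource names, not on the number of individual resource objects
(such as reservations) that the agent holds.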

Results of `*BENCHMARK_WithReservationParam.UpdateAllocation*`
(for the DRF sorter):

1.8.x branch:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 1.938801227secs
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 13.861857374secs
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 2.13412983136667mins

1.8.x branch + this patch:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 214.063821ms
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 425.278671ms
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 1.136214374secs
...
Agent resources size: 6400 (1600 frameworks)
Made 20 reserve and unreserve operations in 50.094194999secs
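
For convenience, the speedup implied by the totals above can be computed
with a small sketch like the one below. The timings are copied from the
results listed above (the 800-resource "before" figure is converted from
minutes to seconds); `Run`, `kRuns`, `speedup()` and `printSpeedups()` are
hypothetical names used only for this illustration.

```cpp
#include <cstdio>

// One benchmark configuration: total duration of 20 reserve and
// unreserve operations before and after the patch, in seconds.
struct Run
{
  int resources;
  double beforeSecs;
  double afterSecs;
};

// Timings taken from the results above.
static const Run kRuns[] = {
  {200, 1.938801227, 0.214063821},
  {400, 13.861857374, 0.425278671},
  {800, 2.13412983136667 * 60.0, 1.136214374}, // minutes -> seconds
};

// Ratio of the "before" duration to the "after" duration.
inline double speedup(const Run& run)
{
  return run.beforeSecs / run.afterSecs;
}

// Prints one line per configuration.
inline void printSpeedups()
{
  for (const Run& run : kRuns) {
    std::printf(
        "Agent resources size %d: %.1fx faster\n",
        run.resources,
        speedup(run));
  }
}
```

The ratios come out at roughly 9x, 33x and 113x for 200, 400 and 800
resources respectively, so the gain grows with the agent resources size.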

This is a backport of https://reviews.apache.org/r/71646


Diffs
-----

  src/master/allocator/mesos/hierarchical.hpp 4f716820748e070569e988f8dad15670367a74b7 
  src/master/allocator/mesos/hierarchical.cpp 061b70258f4874f4f2b26a57705b9ba1543c7553 
  src/master/allocator/sorter/drf/sorter.hpp 7daf1bfd2dfe88e2d8e0af07c8af8aa823f80935 
  src/master/allocator/sorter/drf/sorter.cpp 9367469132e426f0b4b66a80ad300c157fba6bf2 
  src/master/allocator/sorter/random/sorter.hpp c8e777be256b4faf931bf1a106185d7f91b3ba6f 
  src/master/allocator/sorter/random/sorter.cpp 9899cfd570607a60dbd7980d340a8e7d9d3e6df5 
  src/master/allocator/sorter/sorter.hpp d56a1166a9e82b034564842ac071874ec2885004 
  src/tests/sorter_tests.cpp 1e4a7893411d2107049a7bb92ee159526588c58c 


Diff: https://reviews.apache.org/r/71697/diff/1/


Testing
-------

make check

`*BENCHMARK_WithReservationParam.UpdateAllocation*`:

**Before:**

Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 1.938801227secs
Average UNRESERVE duration: 49.161884ms
Average RESERVE duration: 47.778177ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 13.861857374secs
Average UNRESERVE duration: 346.822609ms
Average RESERVE duration: 346.270259ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.13412983136667mins
Average UNRESERVE duration: 3.200348465secs
Average RESERVE duration: 3.202041028secs

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
(killed after several minutes)

**After:**

Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 214.063821ms
Average UNRESERVE duration: 5.134867ms
Average RESERVE duration: 5.568323ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 425.278671ms
Average UNRESERVE duration: 10.201193ms
Average RESERVE duration: 11.06274ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 1.136214374secs
Average UNRESERVE duration: 28.336427ms
Average RESERVE duration: 28.474291ms

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 3.773618637secs
Average UNRESERVE duration: 93.619424ms
Average RESERVE duration: 95.061507ms

Agent resources size: 3200 (800 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 13.881966194secs
Average UNRESERVE duration: 350.46368ms
Average RESERVE duration: 343.634628ms

Agent resources size: 6400 (1600 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 50.094194999secs
Average UNRESERVE duration: 1.252057472secs
Average RESERVE duration: 1.252652277secs


Thanks,

Andrei Sekretenko


Re: Review Request 71697: Optimized tracking of cluster resource totals.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71697/#review218435
-----------------------------------------------------------



Bad review!

Reviews applied: [71697]

Error:
2019-10-29 16:48:01 URL:https://reviews.apache.org/r/71697/diff/raw/ [39429/39429] -> "71697.patch" [1]
error: patch failed: src/master/allocator/mesos/hierarchical.hpp:529
error: src/master/allocator/mesos/hierarchical.hpp: patch does not apply
error: patch failed: src/master/allocator/mesos/hierarchical.cpp:570
error: src/master/allocator/mesos/hierarchical.cpp: patch does not apply
error: src/master/allocator/sorter/drf/sorter.hpp: does not exist in index
error: src/master/allocator/sorter/drf/sorter.cpp: does not exist in index
error: src/master/allocator/sorter/random/sorter.hpp: does not exist in index
error: src/master/allocator/sorter/random/sorter.cpp: does not exist in index
error: src/master/allocator/sorter/sorter.hpp: does not exist in index

- Mesos Reviewbot

