Posted to reviews@mesos.apache.org by Andrei Sekretenko <as...@mesosphere.io> on 2019/10/24 16:52:54 UTC

Re: Review Request 71646: Modified Sorter interface to add/remove agent resources as a whole.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71646/
-----------------------------------------------------------

(Updated Oct. 24, 2019, 4:52 p.m.)


Review request for mesos, Benjamin Mahler and Meng Zhu.


Changes
-------

Addressed issues; modified sorters to store per-agent `resourceQuantities` instead of `Resources`.


Summary (updated)
-----------------

Modified Sorter interface to add/remove agent resources as a whole.


Bugs: MESOS-10015
    https://issues.apache.org/jira/browse/MESOS-10015


Repository: mesos


Description (updated)
-------

This patch replaces Sorter methods which modify `Resources` of an agent
in the Sorter with methods which add/remove an agent as a whole (which
was actually the only use case of the old methods). Thus, subtracting/
adding `Resources` of the whole agent no longer occurs when updating
resources of the agent in the Sorter.

This mitigates the issue with poor performance of
`HierarchicalAllocatorProcess::updateAllocation()` for agents with
a huge number of non-addable resources (see MESOS-10015).
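
A minimal sketch of the idea described above, not the actual Mesos code: `AgentId`, `Quantities`, and `TotalsTracker` are hypothetical stand-ins for the agent ID, the per-agent resource quantities, and the totals tracking in the Sorter. It only illustrates that an agent is added or removed as a whole, so an update never has to add or subtract a long list of individual resources.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins; the real Mesos types differ.
using AgentId = std::string;
using Quantities = std::map<std::string, double>; // resource name -> amount

class TotalsTracker {
public:
  // Adds an agent as a whole. The agent must not be tracked yet.
  void addAgent(const AgentId& id, const Quantities& quantities) {
    const bool inserted = agents_.emplace(id, quantities).second;
    assert(inserted && "Attempted to add an already known agent");
    (void)inserted;
    for (const auto& kv : quantities) {
      totals_[kv.first] += kv.second;
    }
  }

  // Removes an agent as a whole. The agent must be tracked.
  void removeAgent(const AgentId& id) {
    auto it = agents_.find(id);
    assert(it != agents_.end() && "Attempted to remove an unknown agent");
    for (const auto& kv : it->second) {
      totals_[kv.first] -= kv.second;
    }
    agents_.erase(it);
  }

  // Returns the cluster-wide total for one resource name (0 if unknown).
  double total(const std::string& name) const {
    auto it = totals_.find(name);
    return it == totals_.end() ? 0.0 : it->second;
  }

private:
  std::map<AgentId, Quantities> agents_;
  Quantities totals_;
};
```

In this sketch, updating an agent is a `removeAgent` followed by an `addAgent` with the new quantities, and its cost depends on the number of distinct resource names rather than on the number of individual `Resources` entries.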

Results of `*BENCHMARK_WithReservationParam.UpdateAllocation*`:

Master:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 2.08586secs
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 13.8449005secs
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 2.19253121188333mins

Master + this patch:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 471.781183ms
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 1.022879058secs
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 2.622324521secs
...
Agent resources size: 6400 (1600 frameworks)
Made 20 reserve and unreserve operations in 2.04261335795mins


Diffs (updated)
-----

  src/master/allocator/mesos/hierarchical.cpp 21010de363f25c516bb031e4ae48888e53621128 
  src/master/allocator/mesos/sorter/drf/sorter.hpp 3f6c7413f1b76f3fa86388360983763c8b76079f 
  src/master/allocator/mesos/sorter/drf/sorter.cpp ef79083b710fba628b4a7e93f903883899f8a71b 
  src/master/allocator/mesos/sorter/random/sorter.hpp a3097be98d175d2b47714eb8b70b1ce8c5c2bba8 
  src/master/allocator/mesos/sorter/random/sorter.cpp 86aeb1b8136eaffd2d52d3b603636b01383a9024 
  src/master/allocator/mesos/sorter/sorter.hpp 6b6b4a1811ba36e0212de17b9a6e63a6f8678a7f 
  src/tests/sorter_tests.cpp d7fdee8f2cab4c930230750f0bd1a55eb08f89bb 


Diff: https://reviews.apache.org/r/71646/diff/2/

Changes: https://reviews.apache.org/r/71646/diff/1-2/


Testing (updated)
-------

**make check**

**Variant of `ReservationParam/HierarchicalAllocator__BENCHMARK_WithReservationParam`**
from https://reviews.apache.org/r/71639/ (work in progress)
shows a significant improvement: the duration now grows as O(number_of_roles^2) rather than O(number_of_roles^3):
**Before:**
Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.08586secs
Average UNRESERVE duration: 51.491561ms
Average RESERVE duration: 52.801438ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 13.8449005secs
Average UNRESERVE duration: 347.624639ms
Average RESERVE duration: 344.620385ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.19253121188333mins
Average UNRESERVE duration: 3.285422441secs
Average RESERVE duration: 3.292171194secs

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
(killed after several minutes)

**After:**
Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 471.781183ms
Average UNRESERVE duration: 12.112223ms
Average RESERVE duration: 11.476835ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 1.022879058secs
Average UNRESERVE duration: 25.53819ms
Average RESERVE duration: 25.605762ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.622324521secs
Average UNRESERVE duration: 65.166039ms
Average RESERVE duration: 65.950186ms

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 8.419455875secs
Average UNRESERVE duration: 209.886948ms
Average RESERVE duration: 211.085845ms

Agent resources size: 3200 (800 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 32.82614382secs
Average UNRESERVE duration: 823.126069ms
Average RESERVE duration: 818.181121ms

Agent resources size: 6400 (1600 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.04261335795mins
Average UNRESERVE duration: 3.063538394secs
Average RESERVE duration: 3.064301679secs
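
As a rough check of the complexity estimate, the exponent k in t ~ n^k can be approximated from two runs in which the number of roles doubles, as log2(t2 / t1). The helper below is a hypothetical sketch of that arithmetic; it is not part of the benchmark.

```cpp
#include <cmath>

// Estimates k in t ~ n^k from two timings taken at n and 2n roles.
// Both timings must be positive and use the same unit.
double scalingExponent(double timeAtN, double timeAt2N) {
  return std::log2(timeAt2N / timeAtN);
}
```

Applied to the total durations above (minutes converted to seconds), the 400 -> 800 step before the patch gives roughly 3.2, and the 3200 -> 6400 step after the patch gives roughly 1.9.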

**No significant performance changes in `QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota`.**

**Before:**

Added 30 agents in 1.175593ms
Added 30 frameworks in 6.829173ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
Made 36 allocations in 8.294832ms
Made 0 allocation in 3.674923ms

Added 300 agents in 7.860046ms
Added 300 frameworks in 149.743858ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
Made 350 allocations in 132.796102ms
Made 0 allocation in 107.887758ms

Added 3000 agents in 36.944587ms
Added 3000 frameworks in 10.688501403secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
Made 3500 allocations in 12.6020582secs
Made 0 allocation in 9.716229696secs

Added 30 agents in 1.010362ms
Added 30 frameworks in 6.272027ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
Made 38 allocations in 9.119976ms
Made 0 allocation in 5.460369ms

Added 300 agents in 7.442897ms
Added 300 frameworks in 152.016597ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
Made 391 allocations in 195.242282ms
Made 0 allocation in 139.638551ms

Added 3000 agents in 36.003028ms
Added 3000 frameworks in 11.203697649secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
Made 3856 allocations in 17.807913455secs
Made 0 allocation in 13.524946653secs

**After:**

Added 30 agents in 1.196576ms
Added 30 frameworks in 6.814792ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
Made 36 allocations in 8.263036ms
Made 0 allocation in 3.947283ms

Added 300 agents in 8.497121ms
Added 300 frameworks in 156.578165ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
Made 350 allocations in 168.745307ms
Made 0 allocation in 95.505069ms

Added 3000 agents in 38.074525ms
Added 3000 frameworks in 11.249150205secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
Made 3500 allocations in 12.772526049secs
Made 0 allocation in 10.132801781secs

Added 30 agents in 799844ns
Added 30 frameworks in 5.8663ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
Made 38 allocations in 9.612524ms
Made 0 allocation in 5.150924ms

Added 300 agents in 5.560583ms
Added 300 frameworks in 138.469712ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
Made 391 allocations in 175.021255ms
Made 0 allocation in 138.181869ms

Added 3000 agents in 42.921689ms
Added 3000 frameworks in 10.825018278secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
Made 3856 allocations in 15.29232742secs
Made 0 allocation in 14.202057473secs


Thanks,

Andrei Sekretenko


Re: Review Request 71646: Modified Sorter interface to add/remove agent resources as a whole.

Posted by Meng Zhu <mz...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71646/#review218401
-----------------------------------------------------------


Ship it!

- Meng Zhu




Re: Review Request 71646: Optimized tracking of cluster resource totals.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71646/#review218468
-----------------------------------------------------------




src/master/allocator/mesos/sorter/drf/sorter.cpp
Line 464 (original), 455-456 (patched)
<https://reviews.apache.org/r/71646/#comment306214>

    Whitespace is off here?


- Benjamin Mahler


On Oct. 29, 2019, 2:56 p.m., Andrei Sekretenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71646/
> -----------------------------------------------------------
> 
> (Updated Oct. 29, 2019, 2:56 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-10015
>     https://issues.apache.org/jira/browse/MESOS-10015
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch addresses poor performance of
> `HierarchicalAllocatorProcess::updateAllocation()` for agents with
> a huge number of non-addable resources in a many-framework case
> (see MESOS-10015).
> 
> Sorter methods for totals tracking that modify `Resources` of an agent
> in the Sorter are replaced with methods that add/remove resource
> quantities of an agent as a whole (which was actually the only use case
> of the old methods). Thus, subtracting/adding `Resources` of a whole
> agent no longer occurs when updating resources of an agent in a Sorter.
> 
> Further, this patch completely removes agent resource tracking logic
> from the random sorter (which by itself makes no use of them) by
> implementing cluster totals tracking in the allocator.
> 
> Results of `*BENCHMARK_WithReservationParam.UpdateAllocation*`
> (for the DRF sorter):
> 
> Master:
> Agent resources size: 200 (50 frameworks)
> Made 20 reserve and unreserve operations in 2.08586secs
> Agent resources size: 400 (100 frameworks)
> Made 20 reserve and unreserve operations in 13.8449005secs
> Agent resources size: 800 (200 frameworks)
> Made 20 reserve and unreserve operations in 2.19253121188333mins
> 
> Master + this patch:
> Agent resources size: 200 (50 frameworks)
> Made 20 reserve and unreserve operations in 468.482366ms
> Agent resources size: 400 (100 frameworks)
> Made 20 reserve and unreserve operations in 925.725947ms
> Agent resources size: 800 (200 frameworks)
> Made 20 reserve and unreserve operations in 2.110337109secs
> ...
> Agent resources size: 6400 (1600 frameworks)
> Made 20 reserve and unreserve operations in 1.50141861756667mins
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.hpp 9d0fbe771868ea60e66b9e25b0c666d5416d6e85 
>   src/master/allocator/mesos/hierarchical.cpp 21010de363f25c516bb031e4ae48888e53621128 
>   src/master/allocator/mesos/sorter/drf/sorter.hpp 3f6c7413f1b76f3fa86388360983763c8b76079f 
>   src/master/allocator/mesos/sorter/drf/sorter.cpp ef79083b710fba628b4a7e93f903883899f8a71b 
>   src/master/allocator/mesos/sorter/random/sorter.hpp a3097be98d175d2b47714eb8b70b1ce8c5c2bba8 
>   src/master/allocator/mesos/sorter/random/sorter.cpp 86aeb1b8136eaffd2d52d3b603636b01383a9024 
>   src/master/allocator/mesos/sorter/sorter.hpp 6b6b4a1811ba36e0212de17b9a6e63a6f8678a7f 
>   src/tests/sorter_tests.cpp d7fdee8f2cab4c930230750f0bd1a55eb08f89bb 
> 
> 
> Diff: https://reviews.apache.org/r/71646/diff/4/
> 
> 
> Testing
> -------
> 
> **make check**
> 
> **Variant of `ReservationParam/HierarchicalAllocator__BENCHMARK_WithReservationParam`**
> from https://reviews.apache.org/r/71639/ (work in progress)
> shows a significant improvement: the duration now grows as O(number_of_roles^2) rather than O(number_of_roles^3):
> **Before:**
> Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 2.08586secs
> Average UNRESERVE duration: 51.491561ms
> Average RESERVE duration: 52.801438ms
> 
> Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 13.8449005secs
> Average UNRESERVE duration: 347.624639ms
> Average RESERVE duration: 344.620385ms
> 
> Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 2.19253121188333mins
> Average UNRESERVE duration: 3.285422441secs
> Average RESERVE duration: 3.292171194secs
> 
> Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
> (killed after several minutes)
> 
> **After:**
> Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 468.482366ms
> Average UNRESERVE duration: 10.979921ms
> Average RESERVE duration: 12.444196ms
> 
> Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 925.725947ms
> Average UNRESERVE duration: 23.377155ms
> Average RESERVE duration: 22.909141ms
> 
> Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 2.110337109secs
> Average UNRESERVE duration: 52.53835ms
> Average RESERVE duration: 52.978505ms
> 
> Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 6.524451736secs
> Average UNRESERVE duration: 162.464708ms
> Average RESERVE duration: 163.757877ms
> 
> Agent resources size: 3200 (800 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 24.696928676secs
> Average UNRESERVE duration: 609.666416ms
> Average RESERVE duration: 625.180017ms
> 
> Agent resources size: 6400 (1600 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 1.50141861756667mins
> Average UNRESERVE duration: 2.269904993secs
> Average RESERVE duration: 2.234350859secs
> 
> **No significant performance changes in `QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota`.**
> 
> **Before:**
> 
> Added 30 agents in 1.175593ms
> Added 30 frameworks in 6.829173ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
> Made 36 allocations in 8.294832ms
> Made 0 allocation in 3.674923ms
> 
> Added 300 agents in 7.860046ms
> Added 300 frameworks in 149.743858ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
> Made 350 allocations in 132.796102ms
> Made 0 allocation in 107.887758ms
> 
> Added 3000 agents in 36.944587ms
> Added 3000 frameworks in 10.688501403secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
> Made 3500 allocations in 12.6020582secs
> Made 0 allocation in 9.716229696secs
> 
> Added 30 agents in 1.010362ms
> Added 30 frameworks in 6.272027ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
> Made 38 allocations in 9.119976ms
> Made 0 allocation in 5.460369ms
> 
> Added 300 agents in 7.442897ms
> Added 300 frameworks in 152.016597ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
> Made 391 allocations in 195.242282ms
> Made 0 allocation in 139.638551ms
> 
> Added 3000 agents in 36.003028ms
> Added 3000 frameworks in 11.203697649secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3856 allocations in 17.807913455secs
> Made 0 allocation in 13.524946653secs
> 
> **After:**
> 
> Added 30 agents in 1.196576ms
> Added 30 frameworks in 6.814792ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
> Made 36 allocations in 8.263036ms
> Made 0 allocation in 3.947283ms
> 
> Added 300 agents in 8.497121ms
> Added 300 frameworks in 156.578165ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
> Made 350 allocations in 168.745307ms
> Made 0 allocation in 95.505069ms
> 
> Added 3000 agents in 38.074525ms
> Added 3000 frameworks in 11.249150205secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
> Made 3500 allocations in 12.772526049secs
> Made 0 allocation in 10.132801781secs
> 
> Added 30 agents in 799844ns
> Added 30 frameworks in 5.8663ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
> Made 38 allocations in 9.612524ms
> Made 0 allocation in 5.150924ms
> 
> Added 300 agents in 5.560583ms
> Added 300 frameworks in 138.469712ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
> Made 391 allocations in 175.021255ms
> Made 0 allocation in 138.181869ms
> 
> Added 3000 agents in 42.921689ms
> Added 3000 frameworks in 10.825018278secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3856 allocations in 15.29232742secs
> Made 0 allocation in 14.202057473secs
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>


Re: Review Request 71646: Optimized tracking of cluster resource totals.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71646/#review218467
-----------------------------------------------------------


Fix it, then Ship it!





src/master/allocator/mesos/hierarchical.cpp
Lines 2930-2931 (patched)
<https://reviews.apache.org/r/71646/#comment306211>

    Strictly speaking, this opens a window of inconsistency.

    When there are allocations on the agent, one would hope that removing that agent would fail a CHECK (trying to remove an agent which still has allocation info tracked in the sorter).

    As I think Meng mentioned, an update call that goes from old -> new in a single call would allow the remove agent case to enforce that no allocations remain.
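
To illustrate the suggestion, here is a hypothetical sketch, not the Mesos implementation: `AgentTotals`, `updateAgent`, `allocate`, and the other names are invented for the example, and plain `assert` stands in for CHECK. With a single old -> new update call the agent stays tracked throughout, so `removeAgent` is free to insist that no allocations remain.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in types.
using AgentId = std::string;
using Quantities = std::map<std::string, double>;

class AgentTotals {
public:
  void addAgent(const AgentId& id, const Quantities& q) {
    const bool inserted = agents_.emplace(id, q).second;
    assert(inserted);
    (void)inserted;
  }

  // Replaces the agent's quantities in one step; the agent is never
  // removed, so existing allocations always refer to a known agent.
  void updateAgent(const AgentId& id, const Quantities& q) {
    auto it = agents_.find(id);
    assert(it != agents_.end());
    it->second = q;
  }

  // Because updates no longer go through remove + add, removal can
  // require that nothing is allocated on the agent anymore.
  void removeAgent(const AgentId& id) {
    assert(allocated_.count(id) == 0);
    const std::size_t erased = agents_.erase(id);
    assert(erased == 1);
    (void)erased;
  }

  void allocate(const AgentId& id) { allocated_.insert(id); }
  void unallocate(const AgentId& id) { allocated_.erase(id); }

  bool known(const AgentId& id) const { return agents_.count(id) > 0; }

  double quantity(const AgentId& id, const std::string& name) const {
    return agents_.at(id).at(name);
  }

private:
  std::map<AgentId, Quantities> agents_;
  std::set<AgentId> allocated_; // agents that still have allocations
};
```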



src/master/allocator/mesos/sorter/drf/sorter.cpp
Line 466 (original), 458 (patched)
<https://reviews.apache.org/r/71646/#comment306212>

    Rather than using inserted.first->second, perhaps just use scalarQuantities here? That also allows you to have `inserted` be a bool.
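
A small sketch of this suggestion using standard containers; `agents`, `totals`, and `addAgent` are hypothetical names, not the code under review. Taking `.second` of the `emplace` result leaves a plain `bool`, and the totals are updated from the `scalarQuantities` argument directly.

```cpp
#include <cassert>
#include <map>
#include <string>

using Quantities = std::map<std::string, double>; // hypothetical stand-in

std::map<std::string, Quantities> agents; // agent id -> its quantities
Quantities totals;                        // cluster-wide totals

void addAgent(const std::string& id, const Quantities& scalarQuantities) {
  // Keep only the success flag of emplace(), not the iterator.
  const bool inserted = agents.emplace(id, scalarQuantities).second;
  assert(inserted);
  (void)inserted;

  // Use the argument itself instead of inserted.first->second.
  for (const auto& kv : scalarQuantities) {
    totals[kv.first] += kv.second;
  }
}
```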



src/master/allocator/mesos/sorter/drf/sorter.cpp
Line 485 (original), 472 (patched)
<https://reviews.apache.org/r/71646/#comment306213>

    "Attempted to remove unknown slave XXX"


- Benjamin Mahler


On Oct. 29, 2019, 2:56 p.m., Andrei Sekretenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71646/
> -----------------------------------------------------------
> 
> (Updated Oct. 29, 2019, 2:56 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-10015
>     https://issues.apache.org/jira/browse/MESOS-10015
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch addresses poor performance of
> `HierarchicalAllocatorProcess::updateAllocation()` for agents with
> a huge number of non-addable resources in a many-framework case
> (see MESOS-10015).
> 
> Sorter methods for totals tracking that modify `Resources` of an agent
> in the Sorter are replaced with methods that add/remove resource
> quantities of an agent as a whole (which was actually the only use case
> of the old methods). Thus, subtracting/adding `Resources` of a whole
> agent no longer occurs when updating resources of an agent in a Sorter.
> 
> Further, this patch completely removes agent resource tracking logic
> from the random sorter (which by itself makes no use of them) by
> implementing cluster totals tracking in the allocator.
> 
> Results of `*BENCHMARK_WithReservationParam.UpdateAllocation*`
> (for the DRF sorter):
> 
> Master:
> Agent resources size: 200 (50 frameworks)
> Made 20 reserve and unreserve operations in 2.08586secs
> Agent resources size: 400 (100 frameworks)
> Made 20 reserve and unreserve operations in 13.8449005secs
> Agent resources size: 800 (200 frameworks)
> Made 20 reserve and unreserve operations in 2.19253121188333mins
> 
> Master + this patch:
> Agent resources size: 200 (50 frameworks)
> Made 20 reserve and unreserve operations in 468.482366ms
> Agent resources size: 400 (100 frameworks)
> Made 20 reserve and unreserve operations in 925.725947ms
> Agent resources size: 800 (200 frameworks)
> Made 20 reserve and unreserve operations in 2.110337109secs
> ...
> Agent resources size: 6400 (1600 frameworks)
> Made 20 reserve and unreserve operations in 1.50141861756667mins
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.hpp 9d0fbe771868ea60e66b9e25b0c666d5416d6e85 
>   src/master/allocator/mesos/hierarchical.cpp 21010de363f25c516bb031e4ae48888e53621128 
>   src/master/allocator/mesos/sorter/drf/sorter.hpp 3f6c7413f1b76f3fa86388360983763c8b76079f 
>   src/master/allocator/mesos/sorter/drf/sorter.cpp ef79083b710fba628b4a7e93f903883899f8a71b 
>   src/master/allocator/mesos/sorter/random/sorter.hpp a3097be98d175d2b47714eb8b70b1ce8c5c2bba8 
>   src/master/allocator/mesos/sorter/random/sorter.cpp 86aeb1b8136eaffd2d52d3b603636b01383a9024 
>   src/master/allocator/mesos/sorter/sorter.hpp 6b6b4a1811ba36e0212de17b9a6e63a6f8678a7f 
>   src/tests/sorter_tests.cpp d7fdee8f2cab4c930230750f0bd1a55eb08f89bb 
> 
> 
> Diff: https://reviews.apache.org/r/71646/diff/4/
> 
> 
> Testing
> -------
> 
> **make check**
> 
> **Variant of `ReservationParam/HierarchicalAllocator__BENCHMARK_WithReservationParam`**
> from https://reviews.apache.org/r/71639/ (work in progress)
> shows a significant improvement: the duration now grows as O(number_of_roles^2) rather than O(number_of_roles^3):
> **Before:**
> Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 2.08586secs
> Average UNRESERVE duration: 51.491561ms
> Average RESERVE duration: 52.801438ms
> 
> Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 13.8449005secs
> Average UNRESERVE duration: 347.624639ms
> Average RESERVE duration: 344.620385ms
> 
> Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 2.19253121188333mins
> Average UNRESERVE duration: 3.285422441secs
> Average RESERVE duration: 3.292171194secs
> 
> Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
> (killed after several minutes)
> 
> **After:**
> Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 468.482366ms
> Average UNRESERVE duration: 10.979921ms
> Average RESERVE duration: 12.444196ms
> 
> Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 925.725947ms
> Average UNRESERVE duration: 23.377155ms
> Average RESERVE duration: 22.909141ms
> 
> Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 2.110337109secs
> Average UNRESERVE duration: 52.53835ms
> Average RESERVE duration: 52.978505ms
> 
> Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 6.524451736secs
> Average UNRESERVE duration: 162.464708ms
> Average RESERVE duration: 163.757877ms
> 
> Agent resources size: 3200 (800 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 24.696928676secs
> Average UNRESERVE duration: 609.666416ms
> Average RESERVE duration: 625.180017ms
> 
> Agent resources size: 6400 (1600 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 1.50141861756667mins
> Average UNRESERVE duration: 2.269904993secs
> Average RESERVE duration: 2.234350859secs
> 
> **No significant performance changes in `QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota`.**
> 
> **Before:**
> 
> Added 30 agents in 1.175593ms
> Added 30 frameworks in 6.829173ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
> Made 36 allocations in 8.294832ms
> Made 0 allocation in 3.674923ms
> 
> Added 300 agents in 7.860046ms
> Added 300 frameworks in 149.743858ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
> Made 350 allocations in 132.796102ms
> Made 0 allocation in 107.887758ms
> 
> Added 3000 agents in 36.944587ms
> Added 3000 frameworks in 10.688501403secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
> Made 3500 allocations in 12.6020582secs
> Made 0 allocation in 9.716229696secs
> 
> Added 30 agents in 1.010362ms
> Added 30 frameworks in 6.272027ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
> Made 38 allocations in 9.119976ms
> Made 0 allocation in 5.460369ms
> 
> Added 300 agents in 7.442897ms
> Added 300 frameworks in 152.016597ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
> Made 391 allocations in 195.242282ms
> Made 0 allocation in 139.638551ms
> 
> Added 3000 agents in 36.003028ms
> Added 3000 frameworks in 11.203697649secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3856 allocations in 17.807913455secs
> Made 0 allocation in 13.524946653secs
> 
> **After:**
> 
> Added 30 agents in 1.196576ms
> Added 30 frameworks in 6.814792ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
> Made 36 allocations in 8.263036ms
> Made 0 allocation in 3.947283ms
> 
> Added 300 agents in 8.497121ms
> Added 300 frameworks in 156.578165ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
> Made 350 allocations in 168.745307ms
> Made 0 allocation in 95.505069ms
> 
> Added 3000 agents in 38.074525ms
> Added 3000 frameworks in 11.249150205secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
> Made 3500 allocations in 12.772526049secs
> Made 0 allocation in 10.132801781secs
> 
> Added 30 agents in 799844ns
> Added 30 frameworks in 5.8663ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
> Made 38 allocations in 9.612524ms
> Made 0 allocation in 5.150924ms
> 
> Added 300 agents in 5.560583ms
> Added 300 frameworks in 138.469712ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
> Made 391 allocations in 175.021255ms
> Made 0 allocation in 138.181869ms
> 
> Added 3000 agents in 42.921689ms
> Added 3000 frameworks in 10.825018278secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3856 allocations in 15.29232742secs
> Made 0 allocation in 14.202057473secs
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>


Re: Review Request 71646: Optimized tracking of cluster resource totals.

Posted by Andrei Sekretenko <as...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71646/
-----------------------------------------------------------

(Updated Oct. 29, 2019, 2:56 p.m.)


Review request for mesos, Benjamin Mahler and Meng Zhu.


Changes
-------

Improved the commit message and updated the test results.


Summary (updated)
-----------------

Optimized tracking of cluster resource totals.


Bugs: MESOS-10015
    https://issues.apache.org/jira/browse/MESOS-10015


Repository: mesos


Description (updated)
-------

This patch addresses poor performance of
`HierarchicalAllocatorProcess::updateAllocation()` for agents with
a huge number of non-addable resources in a many-framework case
(see MESOS-10015).

Sorter methods for totals tracking that modify `Resources` of an agent
in the Sorter are replaced with methods that add/remove resource
quantities of an agent as a whole (which was actually the only use case
of the old methods). Thus, subtracting/adding `Resources` of a whole
agent no longer occurs when updating resources of an agent in a Sorter.

Further, this patch completely removes agent resource tracking logic
from the random sorter (which by itself makes no use of them) by
implementing cluster totals tracking in the allocator.
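
A minimal sketch of the idea described above, not the code from this patch: the cluster totals are kept as plain scalar quantities, and an agent is added or removed as a whole. The names `Quantities`, `TotalsTracker`, `addAgent`, `removeAgent` and `total` are hypothetical and chosen only for illustration; the real Sorter and allocator interfaces may look different.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for scalar resource quantities (name -> amount).
// Reservation metadata is deliberately not represented here.
using Quantities = std::map<std::string, double>;

// Hypothetical tracker of cluster-wide totals (illustration only).
class TotalsTracker
{
public:
  // Adds an agent as a whole. The cost depends on the number of distinct
  // resource names, not on the number of reservations on the agent.
  void addAgent(const std::string& agentId, const Quantities& quantities)
  {
    assert(agents.count(agentId) == 0);
    for (const auto& entry : quantities) {
      totals[entry.first] += entry.second;
    }
    agents[agentId] = quantities;
  }

  // Removes an agent as a whole, using the quantities stored when it
  // was added, so no per-resource subtraction of `Resources` is needed.
  void removeAgent(const std::string& agentId)
  {
    auto it = agents.find(agentId);
    assert(it != agents.end());
    for (const auto& entry : it->second) {
      totals[entry.first] -= entry.second;
    }
    agents.erase(it);
  }

  // Returns the cluster total for one resource name (0 if unknown).
  double total(const std::string& name) const
  {
    auto it = totals.find(name);
    return it == totals.end() ? 0.0 : it->second;
  }

private:
  std::map<std::string, Quantities> agents;
  Quantities totals;
};
```

In this sketch, an operation that only changes how an agent's resources are reserved leaves the stored quantities untouched, which is presumably why working with quantities avoids the expensive add/subtract on large `Resources` objects.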

Results of `*BENCHMARK_WithReservationParam.UpdateAllocation*`
(for the DRF sorter):

Master:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 2.08586secs
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 13.8449005secs
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 2.19253121188333mins

Master + this patch:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 468.482366ms
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 925.725947ms
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 2.110337109secs
...
Agent resources size: 6400 (1600 frameworks)
Made 20 reserve and unreserve operations in 1.50141861756667mins


Diffs (updated)
-----

  src/master/allocator/mesos/hierarchical.hpp 9d0fbe771868ea60e66b9e25b0c666d5416d6e85 
  src/master/allocator/mesos/hierarchical.cpp 21010de363f25c516bb031e4ae48888e53621128 
  src/master/allocator/mesos/sorter/drf/sorter.hpp 3f6c7413f1b76f3fa86388360983763c8b76079f 
  src/master/allocator/mesos/sorter/drf/sorter.cpp ef79083b710fba628b4a7e93f903883899f8a71b 
  src/master/allocator/mesos/sorter/random/sorter.hpp a3097be98d175d2b47714eb8b70b1ce8c5c2bba8 
  src/master/allocator/mesos/sorter/random/sorter.cpp 86aeb1b8136eaffd2d52d3b603636b01383a9024 
  src/master/allocator/mesos/sorter/sorter.hpp 6b6b4a1811ba36e0212de17b9a6e63a6f8678a7f 
  src/tests/sorter_tests.cpp d7fdee8f2cab4c930230750f0bd1a55eb08f89bb 


Diff: https://reviews.apache.org/r/71646/diff/4/

Changes: https://reviews.apache.org/r/71646/diff/3-4/


Testing (updated)
-------

**make check**

**Variant of `ReservationParam/HierarchicalAllocator__BENCHMARK_WithReservationParam`**
from https://reviews.apache.org/r/71639/ (work in progress) 
shows a significant improvement, with scaling changing from roughly O(number_of_roles^3) to O(number_of_roles^2):
**Before**:
Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.08586secs
Average UNRESERVE duration: 51.491561ms
Average RESERVE duration: 52.801438ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 13.8449005secs
Average UNRESERVE duration: 347.624639ms
Average RESERVE duration: 344.620385ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.19253121188333mins
Average UNRESERVE duration: 3.285422441secs
Average RESERVE duration: 3.292171194secs

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
(killed after several minutes)

**After:**
Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 468.482366ms
Average UNRESERVE duration: 10.979921ms
Average RESERVE duration: 12.444196ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 925.725947ms
Average UNRESERVE duration: 23.377155ms
Average RESERVE duration: 22.909141ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.110337109secs
Average UNRESERVE duration: 52.53835ms
Average RESERVE duration: 52.978505ms

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 6.524451736secs
Average UNRESERVE duration: 162.464708ms
Average RESERVE duration: 163.757877ms

Agent resources size: 3200 (800 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 24.696928676secs
Average UNRESERVE duration: 609.666416ms
Average RESERVE duration: 625.180017ms

Agent resources size: 6400 (1600 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 1.50141861756667mins
Average UNRESERVE duration: 2.269904993secs
Average RESERVE duration: 2.234350859secs
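
The scaling claim can be cross-checked against the totals reported above. The sketch below is an illustration only (the `Sample` struct and both helper functions are hypothetical names, not part of the benchmark): it estimates the exponent k in t ~ roles^k from two consecutive measurements as log(t2/t1) / log(n2/n1).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One benchmark measurement: number of roles and total time in seconds.
struct Sample
{
  double roles;
  double seconds;
};

// Estimated exponent k for t ~ roles^k between two samples.
double scalingExponent(const Sample& a, const Sample& b)
{
  return std::log(b.seconds / a.seconds) / std::log(b.roles / a.roles);
}

// Exponent estimated from the two largest measurements of a series.
double lastExponent(const std::vector<Sample>& samples)
{
  assert(samples.size() >= 2);
  return scalingExponent(
      samples[samples.size() - 2], samples[samples.size() - 1]);
}

// Totals for 20 reserve and unreserve operations, copied from the
// "Before" results above (minutes converted to seconds).
const std::vector<Sample> before = {
  {50.0, 2.08586},
  {100.0, 13.8449005},
  {200.0, 2.19253121188333 * 60.0},
};

// Totals copied from the "After" results above.
const std::vector<Sample> after = {
  {50.0, 0.468482366},
  {100.0, 0.925725947},
  {200.0, 2.110337109},
  {400.0, 6.524451736},
  {800.0, 24.696928676},
  {1600.0, 1.50141861756667 * 60.0},
};
```

With these numbers, `lastExponent(before)` comes out a little above 3 and `lastExponent(after)` a little below 2, which is in line with the cubic-to-quadratic change stated above. The smaller "After" sizes show a lower exponent, so the quadratic term appears to dominate only for larger role counts.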

**No significant performance changes in `QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota`.**

**Before:**

Added 30 agents in 1.175593ms
Added 30 frameworks in 6.829173ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
Made 36 allocations in 8.294832ms
Made 0 allocation in 3.674923ms

Added 300 agents in 7.860046ms
Added 300 frameworks in 149.743858ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
Made 350 allocations in 132.796102ms
Made 0 allocation in 107.887758ms

Added 3000 agents in 36.944587ms
Added 3000 frameworks in 10.688501403secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
Made 3500 allocations in 12.6020582secs
Made 0 allocation in 9.716229696secs

Added 30 agents in 1.010362ms
Added 30 frameworks in 6.272027ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
Made 38 allocations in 9.119976ms
Made 0 allocation in 5.460369ms

Added 300 agents in 7.442897ms
Added 300 frameworks in 152.016597ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
Made 391 allocations in 195.242282ms
Made 0 allocation in 139.638551ms

Added 3000 agents in 36.003028ms
Added 3000 frameworks in 11.203697649secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
Made 3856 allocations in 17.807913455secs
Made 0 allocation in 13.524946653secs

**After:**

Added 30 agents in 1.196576ms
Added 30 frameworks in 6.814792ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
Made 36 allocations in 8.263036ms
Made 0 allocation in 3.947283ms

Added 300 agents in 8.497121ms
Added 300 frameworks in 156.578165ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
Made 350 allocations in 168.745307ms
Made 0 allocation in 95.505069ms

Added 3000 agents in 38.074525ms
Added 3000 frameworks in 11.249150205secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
Made 3500 allocations in 12.772526049secs
Made 0 allocation in 10.132801781secs

Added 30 agents in 799844ns
Added 30 frameworks in 5.8663ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
Made 38 allocations in 9.612524ms
Made 0 allocation in 5.150924ms

Added 300 agents in 5.560583ms
Added 300 frameworks in 138.469712ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
Made 391 allocations in 175.021255ms
Made 0 allocation in 138.181869ms

Added 3000 agents in 42.921689ms
Added 3000 frameworks in 10.825018278secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
Made 3856 allocations in 15.29232742secs
Made 0 allocation in 14.202057473secs


Thanks,

Andrei Sekretenko


Re: Review Request 71646: Got rid of `Resources::add()/subtract()` for agent resources in Sorter.

Posted by Andrei Sekretenko <as...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71646/
-----------------------------------------------------------

(Updated Oct. 28, 2019, 6:45 p.m.)


Review request for mesos, Benjamin Mahler and Meng Zhu.


Changes
-------

Merged https://reviews.apache.org/r/71672 and https://reviews.apache.org/r/71673/ into this patch.


Summary (updated)
-----------------

Got rid of `Resources::add()/subtract()` for agent resources in Sorter.


Bugs: MESOS-10015
    https://issues.apache.org/jira/browse/MESOS-10015


Repository: mesos


Description (updated)
-------

This patch addresses the issue with poor performance of
`HierarchicalAllocatorProcess::updateAllocation()` for agents with
a huge number of non-addable resources (see MESOS-10015).

Sorter methods that modify `Resources` of an agent in the Sorter
are replaced with methods that add/remove resource quantities of an
agent as a whole (which was actually the only use case of the old
methods). Thus, subtracting/adding `Resources` of the whole agent
no longer occurs when updating resources of the agent in the Sorter.

Further, this patch fully removes agent resource tracking from the
random sorter by tracking the cluster totals inside the allocator.

Results of `*BENCHMARK_WithReservationParam.UpdateAllocation*`
(for the DRF sorter):

Master:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 2.08586secs
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 13.8449005secs
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 2.19253121188333mins

Master + this patch:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 463.223084ms
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 930.097972ms
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 2.160506847secs
...
Agent resources size: 6400 (1600 frameworks)
Made 20 reserve and unreserve operations in 1.50729369885mins


Diffs (updated)
-----

  src/master/allocator/mesos/hierarchical.hpp 9d0fbe771868ea60e66b9e25b0c666d5416d6e85 
  src/master/allocator/mesos/hierarchical.cpp 21010de363f25c516bb031e4ae48888e53621128 
  src/master/allocator/mesos/sorter/drf/sorter.hpp 3f6c7413f1b76f3fa86388360983763c8b76079f 
  src/master/allocator/mesos/sorter/drf/sorter.cpp ef79083b710fba628b4a7e93f903883899f8a71b 
  src/master/allocator/mesos/sorter/random/sorter.hpp a3097be98d175d2b47714eb8b70b1ce8c5c2bba8 
  src/master/allocator/mesos/sorter/random/sorter.cpp 86aeb1b8136eaffd2d52d3b603636b01383a9024 
  src/master/allocator/mesos/sorter/sorter.hpp 6b6b4a1811ba36e0212de17b9a6e63a6f8678a7f 
  src/tests/sorter_tests.cpp d7fdee8f2cab4c930230750f0bd1a55eb08f89bb 


Diff: https://reviews.apache.org/r/71646/diff/3/

Changes: https://reviews.apache.org/r/71646/diff/2-3/


Testing
-------

**make check**

**Variant of `ReservationParam/HierarchicalAllocator__BENCHMARK_WithReservationParam`**
from https://reviews.apache.org/r/71639/ (work in progress) 
shows a significant improvement, with scaling changing from roughly O(number_of_roles^3) to O(number_of_roles^2):
**Before**:
Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.08586secs
Average UNRESERVE duration: 51.491561ms
Average RESERVE duration: 52.801438ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 13.8449005secs
Average UNRESERVE duration: 347.624639ms
Average RESERVE duration: 344.620385ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.19253121188333mins
Average UNRESERVE duration: 3.285422441secs
Average RESERVE duration: 3.292171194secs

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
(killed after several minutes)

**After:**
Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 471.781183ms
Average UNRESERVE duration: 12.112223ms
Average RESERVE duration: 11.476835ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 1.022879058secs
Average UNRESERVE duration: 25.53819ms
Average RESERVE duration: 25.605762ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.622324521secs
Average UNRESERVE duration: 65.166039ms
Average RESERVE duration: 65.950186ms

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 8.419455875secs
Average UNRESERVE duration: 209.886948ms
Average RESERVE duration: 211.085845ms

Agent resources size: 3200 (800 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 32.82614382secs
Average UNRESERVE duration: 823.126069ms
Average RESERVE duration: 818.181121ms

Agent resources size: 6400 (1600 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.04261335795mins
Average UNRESERVE duration: 3.063538394secs
Average RESERVE duration: 3.064301679secs

**No significant performance changes in `QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota`.**

**Before:**

Added 30 agents in 1.175593ms
Added 30 frameworks in 6.829173ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
Made 36 allocations in 8.294832ms
Made 0 allocation in 3.674923ms

Added 300 agents in 7.860046ms
Added 300 frameworks in 149.743858ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
Made 350 allocations in 132.796102ms
Made 0 allocation in 107.887758ms

Added 3000 agents in 36.944587ms
Added 3000 frameworks in 10.688501403secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
Made 3500 allocations in 12.6020582secs
Made 0 allocation in 9.716229696secs

Added 30 agents in 1.010362ms
Added 30 frameworks in 6.272027ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
Made 38 allocations in 9.119976ms
Made 0 allocation in 5.460369ms

Added 300 agents in 7.442897ms
Added 300 frameworks in 152.016597ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
Made 391 allocations in 195.242282ms
Made 0 allocation in 139.638551ms

Added 3000 agents in 36.003028ms
Added 3000 frameworks in 11.203697649secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
Made 3856 allocations in 17.807913455secs
Made 0 allocation in 13.524946653secs

**After:**

Added 30 agents in 1.196576ms
Added 30 frameworks in 6.814792ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
Made 36 allocations in 8.263036ms
Made 0 allocation in 3.947283ms

Added 300 agents in 8.497121ms
Added 300 frameworks in 156.578165ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
Made 350 allocations in 168.745307ms
Made 0 allocation in 95.505069ms

Added 3000 agents in 38.074525ms
Added 3000 frameworks in 11.249150205secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
Made 3500 allocations in 12.772526049secs
Made 0 allocation in 10.132801781secs

Added 30 agents in 799844ns
Added 30 frameworks in 5.8663ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
Made 38 allocations in 9.612524ms
Made 0 allocation in 5.150924ms

Added 300 agents in 5.560583ms
Added 300 frameworks in 138.469712ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
Made 391 allocations in 175.021255ms
Made 0 allocation in 138.181869ms

Added 3000 agents in 42.921689ms
Added 3000 frameworks in 10.825018278secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
Made 3856 allocations in 15.29232742secs
Made 0 allocation in 14.202057473secs


Thanks,

Andrei Sekretenko