You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@mesos.apache.org by Meng Zhu <mz...@mesosphere.io> on 2019/05/01 01:36:50 UTC

Re: Review Request 70497: Optimize the random sorter with random sampling.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70497/
-----------------------------------------------------------

(Updated April 30, 2019, 6:36 p.m.)

Review request for mesos, Andrei Sekretenko and Benjamin Mahler.

Changes
-------

Rebased and updated benchmark result.

Bugs: MESOS-9725
https://issues.apache.org/jira/browse/MESOS-9725

Repository: mesos

Description
-------

This patch optimizes the random sorter by avoiding
clients shuffling. The sorter now does random sampling
as the caller asks for the next client.

The random sampling works in two steps. Clients with the same
relative weights are grouped together. In the first phase, we
randomly pick a group of clients. This requires the generation
of a weighted random number. In the second phase, a client within
the group is picked. Since all clients in the group have the same
weight, this can be done in constant time. This two step approach
minimizes the cost of generating weighted random number which
could be expensive given the size of the weights.

Diffs (updated)
-----

src/master/allocator/sorter/random/sorter.hpp 55e22d7705f163fe47d5aa47416ee0714c5a87c0
src/master/allocator/sorter/random/sorter.cpp 813f5b55d38dd9fa822de53ee944c3f72251a69d

Diff: https://reviews.apache.org/r/70497/diff/2/

Changes: https://reviews.apache.org/r/70497/diff/1-2/

Testing (updated)
-------

make check

Benchmarking: ran `QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/5` with optimized build

Before:

Added 3000 agents in 101.884509ms
Added 3000 frameworks in 19.711779311secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
Made 3856 allocations in 16.283607645secs
Made 0 allocation in 16.31197771secs

After r/70419, r/70576 and r/70497 :

Added 3000 agents in 87.147084ms
Added 3000 frameworks in 19.246494668secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
Made 3872 allocations in 12.230518989secs
Made 0 allocation in 12.012211914secs

Thanks,

Meng Zhu

Re: Review Request 70497: Optimize the random sorter with random sampling.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70497/#review215030
-----------------------------------------------------------



Patch looks great!

Reviews applied: [70419, 70576, 70497]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On May 1, 2019, 1:36 a.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70497/
> -----------------------------------------------------------
> 
> (Updated May 1, 2019, 1:36 a.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9725
>     https://issues.apache.org/jira/browse/MESOS-9725
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch optimizes the random sorter by avoiding
> clients shuffling. The sorter now does random sampling
> as the caller asks for the next client.
> 
> The random sampling works in two steps. Clients with the same
> relative weights are grouped together. In the first phase, we
> randomly pick a group of clients. This requires the generation
> of a weighted random number. In the second phase, a client within
> the group is picked. Since all clients in the group have the same
> weight, this can be done in constant time. This two step approach
> minimizes the cost of generating weighted random number which
> could be expensive given the size of the weights.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/sorter/random/sorter.hpp 55e22d7705f163fe47d5aa47416ee0714c5a87c0 
>   src/master/allocator/sorter/random/sorter.cpp 813f5b55d38dd9fa822de53ee944c3f72251a69d 
> 
> 
> Diff: https://reviews.apache.org/r/70497/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> Benchmarking: ran `QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/5` with optimized build
> 
> Before:
> 
> Added 3000 agents in 101.884509ms
> Added 3000 frameworks in 19.711779311secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3856 allocations in 16.283607645secs
> Made 0 allocation in 16.31197771secs
> 
> After r/70419, r/70576 and r/70497 :
> 
> Added 3000 agents in 87.147084ms
> Added 3000 frameworks in 19.246494668secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3872 allocations in 12.230518989secs
> Made 0 allocation in 12.012211914secs
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>

Re: Review Request 70497: Optimize the random sorter with random sampling.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70497/#review214975
-----------------------------------------------------------



FAIL: Failed to get dependent review IDs for the current patch.

Failed command: `python.exe D:\DCOS\mesos\mesos\support\get-review-ids.py -r 70497 -o C:\Users\jenkins\AppData\Local\Temp\mesos_dependent_review_ids`

All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/3305/mesos-review-70497

Relevant logs:

- [get-review-ids.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/3305/mesos-review-70497/logs/get-review-ids.log):

```
Dependent review: https://reviews.apache.org/api/review-requests/70576/
Dependent review: https://reviews.apache.org/api/review-requests/70419/
Dependent review: https://reviews.apache.org/api/review-requests/70418/
The review request 70418 is already submitted
Dependent review: https://reviews.apache.org/api/review-requests/70419/
Traceback (most recent call last):
  File "D:\DCOS\mesos\mesos\support\get-review-ids.py", line 62, in <module>
    main()
  File "D:\DCOS\mesos\mesos\support\get-review-ids.py", line 51, in main
    review_ids = handler.get_dependent_review_ids(review_request)
  File "D:\DCOS\mesos\mesos\support\common.py", line 93, in get_dependent_review_ids
    self._review_ids(review_request, review_ids)
  File "D:\DCOS\mesos\mesos\support\common.py", line 62, in _review_ids
    self._review_ids(dependent_review, review_ids)
  File "D:\DCOS\mesos\mesos\support\common.py", line 62, in _review_ids
    self._review_ids(dependent_review, review_ids)
  File "D:\DCOS\mesos\mesos\support\common.py", line 62, in _review_ids
    self._review_ids(dependent_review, review_ids)
  File "D:\DCOS\mesos\mesos\support\common.py", line 61, in _review_ids
    "field." % review_request["id"])
common.ReviewError: Circular dependency detected for review 70418. Please fix the 'depends_on' field.
```

- Mesos Reviewbot Windows


On May 1, 2019, 1:36 a.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70497/
> -----------------------------------------------------------
> 
> (Updated May 1, 2019, 1:36 a.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9725
>     https://issues.apache.org/jira/browse/MESOS-9725
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch optimizes the random sorter by avoiding
> clients shuffling. The sorter now does random sampling
> as the caller asks for the next client.
> 
> The random sampling works in two steps. Clients with the same
> relative weights are grouped together. In the first phase, we
> randomly pick a group of clients. This requires the generation
> of a weighted random number. In the second phase, a client within
> the group is picked. Since all clients in the group have the same
> weight, this can be done in constant time. This two step approach
> minimizes the cost of generating weighted random number which
> could be expensive given the size of the weights.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/sorter/random/sorter.hpp 55e22d7705f163fe47d5aa47416ee0714c5a87c0 
>   src/master/allocator/sorter/random/sorter.cpp 813f5b55d38dd9fa822de53ee944c3f72251a69d 
> 
> 
> Diff: https://reviews.apache.org/r/70497/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> Benchmarking: ran `QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/5` with optimized build
> 
> Before:
> 
> Added 3000 agents in 101.884509ms
> Added 3000 frameworks in 19.711779311secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3856 allocations in 16.283607645secs
> Made 0 allocation in 16.31197771secs
> 
> After r/70419, r/70576 and r/70497 :
> 
> Added 3000 agents in 87.147084ms
> Added 3000 frameworks in 19.246494668secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3872 allocations in 12.230518989secs
> Made 0 allocation in 12.012211914secs
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>

Re: Review Request 70497: Optimize the random sorter with random sampling.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70497/#review214973
-----------------------------------------------------------



Bad review!

Reviews applied: [70497, 70576, 70419, 70418]

Error:
Circular dependency detected for review 70419.Please fix the 'depends_on' field.

- Mesos Reviewbot


On May 1, 2019, 1:36 a.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70497/
> -----------------------------------------------------------
> 
> (Updated May 1, 2019, 1:36 a.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9725
>     https://issues.apache.org/jira/browse/MESOS-9725
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch optimizes the random sorter by avoiding
> clients shuffling. The sorter now does random sampling
> as the caller asks for the next client.
> 
> The random sampling works in two steps. Clients with the same
> relative weights are grouped together. In the first phase, we
> randomly pick a group of clients. This requires the generation
> of a weighted random number. In the second phase, a client within
> the group is picked. Since all clients in the group have the same
> weight, this can be done in constant time. This two step approach
> minimizes the cost of generating weighted random number which
> could be expensive given the size of the weights.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/sorter/random/sorter.hpp 55e22d7705f163fe47d5aa47416ee0714c5a87c0 
>   src/master/allocator/sorter/random/sorter.cpp 813f5b55d38dd9fa822de53ee944c3f72251a69d 
> 
> 
> Diff: https://reviews.apache.org/r/70497/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> Benchmarking: ran `QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/5` with optimized build
> 
> Before:
> 
> Added 3000 agents in 101.884509ms
> Added 3000 frameworks in 19.711779311secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3856 allocations in 16.283607645secs
> Made 0 allocation in 16.31197771secs
> 
> After r/70419, r/70576 and r/70497 :
> 
> Added 3000 agents in 87.147084ms
> Added 3000 frameworks in 19.246494668secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3872 allocations in 12.230518989secs
> Made 0 allocation in 12.012211914secs
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>

Re: Review Request 70497: Optimize the random sorter with random sampling.

Posted by Benjamin Mahler <bm...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70497/#review214981
-----------------------------------------------------------



This was a pretty nice read, thanks!

Mostly just left some readablity suggestions. The only thing that's a bit worrying is the double precision equality bucketing.


src/master/allocator/sorter/random/sorter.hpp
Line 146 (original), 146 (patched)
<https://reviews.apache.org/r/70497/#comment301264>

    How about `clientsByWeight()` here and elsewhere?



src/master/allocator/sorter/random/sorter.hpp
Line 164 (original), 163 (patched)
<https://reviews.apache.org/r/70497/#comment301263>

    double as a map key scares me.. :)
    
    the double multiplication to produce these keys is a bit concerning, I wonder if we should do something to keep only a significant number of digits, or CHECK that all nodes at same level with same ancestor weights are in the same bucket (but.. if that check fails we're screwed)



src/master/allocator/sorter/random/sorter.cpp
Lines 629 (patched)
<https://reviews.apache.org/r/70497/#comment301253>

    Won't show up in benchmarks since these will be 1 item vectors since we don't benchmark mixed weights, especially since they're usually small, but might as well reserve:
    
    weights.reserve(weightToClients.size());
    clients.reserve(weightToClients.size());



src/master/allocator/sorter/random/sorter.cpp
Lines 634 (patched)
<https://reviews.apache.org/r/70497/#comment301262>

    Can you do this in the initializer list?



src/master/allocator/sorter/random/sorter.cpp
Lines 647-648 (patched)
<https://reviews.apache.org/r/70497/#comment301255>

    Can we frame step 1 here and below as choosing a bucket? Bucket seems to convey to the reader that they are bucketed by weight.



src/master/allocator/sorter/random/sorter.cpp
Lines 650-652 (patched)
<https://reviews.apache.org/r/70497/#comment301256>

    Ditto here, I think this would read better if it talks about the chosen bucket rather than vectors of clients and clients[i] vectors?
    
    vector is the container but bucket seems like the logical way to think about it, clients are bucketed by weight



src/master/allocator/sorter/random/sorter.cpp
Lines 658-661 (patched)
<https://reviews.apache.org/r/70497/#comment301258>

    How about:
    
    ```
      // Step 1: Pick a client bucket.
      size_t bucketIndex =
        std::discrete_distribution<>(weightBuckets.begin(), weightBuckets.end())(*generator);
        
      vector<Node*>& clientBucket = clientBuckets[bucketIndex];
      
      // Step 2: Pick a client from the bucket.
      size_t clientIndex =
        std::uniform_int_distribution<>(0, clientBucket.size() - 1)(*generator);
        
      const Node* client = clientBucket[clientIndex];
    ```



src/master/allocator/sorter/random/sorter.cpp
Lines 675 (patched)
<https://reviews.apache.org/r/70497/#comment301259>

    maybe this is more readable since it shows the swap-with-last intent more clearly?
    
    ```
    std::swap(clientBucket[clientIndex], clientBucket[clientBucket.size() - 1]);
    clientBucket.pop_back();
    ```
    
    There won't be any efficiency difference either from the current code which avoids moving the item into the last position.



src/master/allocator/sorter/random/sorter.cpp
Lines 678-684 (patched)
<https://reviews.apache.org/r/70497/#comment301260>

    Ditto here with std::swap to show the swap-with-last intent, note that it will std::move them so won't be any less efficient.



src/master/allocator/sorter/random/sorter.cpp
Lines 686 (patched)
<https://reviews.apache.org/r/70497/#comment301261>

    hm.. you should be able to just do:
    
    ```
    return cref(chosenClient->clientPath());
    ```
    
    https://en.cppreference.com/w/cpp/utility/functional/ref


- Benjamin Mahler


On May 1, 2019, 1:36 a.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70497/
> -----------------------------------------------------------
> 
> (Updated May 1, 2019, 1:36 a.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9725
>     https://issues.apache.org/jira/browse/MESOS-9725
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch optimizes the random sorter by avoiding
> clients shuffling. The sorter now does random sampling
> as the caller asks for the next client.
> 
> The random sampling works in two steps. Clients with the same
> relative weights are grouped together. In the first phase, we
> randomly pick a group of clients. This requires the generation
> of a weighted random number. In the second phase, a client within
> the group is picked. Since all clients in the group have the same
> weight, this can be done in constant time. This two step approach
> minimizes the cost of generating weighted random number which
> could be expensive given the size of the weights.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/sorter/random/sorter.hpp 55e22d7705f163fe47d5aa47416ee0714c5a87c0 
>   src/master/allocator/sorter/random/sorter.cpp 813f5b55d38dd9fa822de53ee944c3f72251a69d 
> 
> 
> Diff: https://reviews.apache.org/r/70497/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> Benchmarking: ran `QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/5` with optimized build
> 
> Before:
> 
> Added 3000 agents in 101.884509ms
> Added 3000 frameworks in 19.711779311secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3856 allocations in 16.283607645secs
> Made 0 allocation in 16.31197771secs
> 
> After r/70419, r/70576 and r/70497 :
> 
> Added 3000 agents in 87.147084ms
> Added 3000 frameworks in 19.246494668secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3872 allocations in 12.230518989secs
> Made 0 allocation in 12.012211914secs
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>