Posted to reviews@mesos.apache.org by Alexander Rukletsov <ru...@gmail.com> on 2018/07/31 13:39:56 UTC

Review Request 68132: Batch '/state' requests on Master.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/
-----------------------------------------------------------

Review request for mesos, Benno Evers and Benjamin Mahler.


Bugs: MESOS-9122
    https://issues.apache.org/jira/browse/MESOS-9122


Repository: mesos


Description
-------

With this patch handlers for '/state' requests are not scheduled
directly after authorization, but are accumulated and then scheduled
for later parallel processing.

This approach allows, if there are N '/state' requests in the Master's
mailbox and T is the request response time, to block the Master actor
only once for time O(T) instead of blocking it for time N*T prior to
this patch.

This batching technique reduces both the time Master is spending
answering '/state' requests and the average request response time
in presence of multiple requests in the Master's mailbox. However,
for seldom '/state' requests that don't accumulate in the Master's
mailbox, the response time might increase due to an added trip
through the mailbox.
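
The batching idea described above can be sketched roughly as follows. This is not the code from the patch: `StateBatcher`, `enqueue`, `processBatch`, `MasterState` and `renderState` are hypothetical names, and plain `std::async` stands in for libprocess actors and futures. The sketch only illustrates that queued requests are drained once and answered in parallel from the same read-only state.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the master's state; only read while a batch runs.
struct MasterState {
  int agents = 0;
  int tasks = 0;
};

// Hypothetical request: the caller keeps the future, the batcher the promise.
struct StateRequest {
  std::string principal;
  std::promise<std::string> response;
};

// Serializes the state for one requester (placeholder for real JSON output).
inline std::string renderState(const MasterState& s, const std::string& who) {
  return who + ":" + std::to_string(s.agents) + "/" + std::to_string(s.tasks);
}

class StateBatcher {
public:
  // Called once per authorized request; nothing is rendered yet.
  std::future<std::string> enqueue(std::string principal) {
    StateRequest request;
    request.principal = std::move(principal);
    std::future<std::string> result = request.response.get_future();
    pending_.push_back(std::move(request));
    return result;
  }

  // Called once per batch: the owner of `state` is blocked for roughly the
  // time of the slowest request rather than the sum of all requests.
  void processBatch(const MasterState& state) {
    std::vector<StateRequest> batch = std::move(pending_);
    pending_.clear();

    std::vector<std::future<void>> workers;
    for (StateRequest& request : batch) {
      workers.push_back(std::async(std::launch::async, [&state, &request] {
        request.response.set_value(renderState(state, request.principal));
      }));
    }

    // Wait for all workers before returning, so that `state` may be
    // mutated again only after every response has been produced.
    for (std::future<void>& worker : workers) {
      worker.get();
    }
  }

private:
  std::vector<StateRequest> pending_;
};
```

In this sketch a caller would invoke `enqueue` for each incoming request and `processBatch` once per drained mailbox; a request that arrives alone still pays for the extra hop, which mirrors the caveat about infrequent requests.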


Diffs
-----

  src/master/http.cpp 6947031da3ce3523408d69d6dac60551a39a4601 
  src/master/master.hpp 0353d550308816f219aedb6afe15c643fc8bb340 
  src/master/master.cpp 2af976f7ea7f81d4b06a45ce13286dbd61b9b144 


Diff: https://reviews.apache.org/r/68132/diff/1/


Testing
-------

`make check` on Mac OS 10.13.5 and various Linux distros.

Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.

**Without this patch**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
'/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
'/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
```

**With this patch**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
'/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
'/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
```


Thanks,

Alexander Rukletsov

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.


> On Aug. 1, 2018, 12:42 a.m., Benjamin Mahler wrote:
> > A couple of comments on the benchmark information before looking at the code, these probably belong on the previous review, but since the numbers are only shown in this one I'll leave these here:
> > 
> > * Can we compare percentiles (e.g. min, q1, q3, max) across the approaches instead of averages? i.e. how much better does min,q1,q3,max get? Averages are generally a poor fit for performance data because it doesn't tell us about the distribution (e.g. if we make p90 3x worse for a 10% benefit to average that's not ok), we can include p50 if we're interested in the half-way point.
> > * Can you include the cpu model of the box you ran this on? I'm interested in how many physical/virtual cores there are.
> > * Can you also include the regular state query benchmark measurements to make sure we're not regressing too much on the single request case? (no need to get the non-optimized build numbers).
> > * Some of the numbers don't look very good, e.g. Before `[min: 1.578161651secs, max: 8.789315237secs]` After: `[4.047655443secs, 6.00752698secs]`. Can we see the distribution here? Do you understand exactly why the lowest measurement is so much higher? Looking at the non-optimized numbers, the minimum didn't get worse? Is the data highly variable between runs?
> > * Can you also include perf data for the optimized run? http://mesos.apache.org/documentation/latest/performance-profiling/

* Sure. Updated https://reviews.apache.org/r/68131/ (see also preparatory work in https://reviews.apache.org/r/68224/ and https://reviews.apache.org/r/68225/).
* Done.
* Will do, stay tuned.
* This is fine and expected. Due to the batching there is an extra defer in the master queue, which affects the response time of the first request. Nothing to worry about, IMO.
* Will do, stay tuned.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206711
-----------------------------------------------------------


On Aug. 6, 2018, 10:30 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2018, 10:30 a.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch handlers for '/state' requests are not scheduled
> directly after authorization, but are accumulated and then scheduled
> for later parallel processing.
> 
> This approach allows, if there are N '/state' requests in the Master's
> mailbox and T is the request response time, to block the Master actor
> only once for time O(T) instead of blocking it for time N*T prior to
> this patch.
> 
> This batching technique reduces both the time Master is spending
> answering '/state' requests and the average request response time
> in presence of multiple requests in the Master's mailbox. However,
> for seldom '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> The change preserves the read-your-writes consistency model.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8 
>   src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/2/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.
> 
> **Setup**
> Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
> Total Number of Cores: 4
> Total Number of Logical Cores (with Hyper-Threading): 8
> L2 Cache (per Core): 256 KB  
> 
> Average improvement without optimization: 62%–70%.
> Average improvement with optimization: 17%–62%.
> 
> **[No batching, no optimization](https://dobianchi.files.wordpress.com/2011/11/no-barrique-no-berlusconi.jpg?w=638)**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
> '/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
> '/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
> ```
> 
> **With batching but no optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
> '/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
> '/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
> ```
> 
> **No batching but `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 2.396221ms, 10 responses are in [1.628583ms, 2.816639ms]
> '/state' response on average took 113.469574ms, 10 responses are in [104.218099ms, 134.477062ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 3.892615876secs, 10 responses are in [2.480517ms, 7.630934838secs]
> '/state' response on average took 5.205245306secs, 10 responses are in [1.578161651secs, 8.789315237secs]
> ```
> 
> **Batching and `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.973573ms, 10 responses are in [1.221193ms, 2.694713ms]
> '/state' response on average took 113.331551ms, 10 responses are in [102.593397ms, 142.028555ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.475842691secs, 10 responses are in [2.437217ms, 3.815589561secs]
> '/state' response on average took 4.742303751secs, 10 responses are in [4.047655443secs, 6.00752698secs]
> ```
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Benjamin Mahler <bm...@apache.org>.


> On Aug. 1, 2018, 12:42 a.m., Benjamin Mahler wrote:
> > A couple of comments on the benchmark information before looking at the code, these probably belong on the previous review, but since the numbers are only shown in this one I'll leave these here:
> > 
> > * Can we compare percentiles (e.g. min, q1, q3, max) across the approaches instead of averages? i.e. how much better does min,q1,q3,max get? Averages are generally a poor fit for performance data because it doesn't tell us about the distribution (e.g. if we make p90 3x worse for a 10% benefit to average that's not ok), we can include p50 if we're interested in the half-way point.
> > * Can you include the cpu model of the box you ran this on? I'm interested in how many physical/virtual cores there are.
> > * Can you also include the regular state query benchmark measurements to make sure we're not regressing too much on the single request case? (no need to get the non-optimized build numbers).
> > * Some of the numbers don't look very good, e.g. Before `[min: 1.578161651secs, max: 8.789315237secs]` After: `[4.047655443secs, 6.00752698secs]`. Can we see the distribution here? Do you understand exactly why the lowest measurement is so much higher? Looking at the non-optimized numbers, the minimum didn't get worse? Is the data highly variable between runs?
> > * Can you also include perf data for the optimized run? http://mesos.apache.org/documentation/latest/performance-profiling/
> 
> Alexander Rukletsov wrote:
>     * Sure. Updated https://reviews.apache.org/r/68131/ (see also preparatory work in https://reviews.apache.org/r/68224/ and https://reviews.apache.org/r/68225/).
>     * Done.
>     * Will do, stay tuned.
>     * This is fine and expected. Due to the batching there is an extra defer in the master queue, which affects the response time of the first request. Nothing to worry about, IMO.
>     * Will do, stay tuned.

> This is fine and expected. Due to the batching there is an extra defer in the master queue, which affects the response time of the first request. Nothing to worry about, IMO.

Looking at the numbers for the single request benchmark case (thanks for posting those), the batching overhead only seems to be about 1.4% or 150ms (10.946s -> 11.096s). This means that it's not just the extra defer that's causing an 87% or 1.3 second slowdown (1.512s -> 2.820s) in the minimum request processing time. It must be something else, like the request is now ending up in a batch (probably with more than 4 requests so that we're entering hyperthreading territory). Let's make sure we understand why it's happening.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206711
-----------------------------------------------------------


On Aug. 7, 2018, 12:11 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated Aug. 7, 2018, 12:11 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch handlers for '/state' requests are not scheduled
> directly after authorization, but are accumulated and then scheduled
> for later parallel processing.
> 
> This approach allows, if there are N '/state' requests in the Master's
> mailbox and T is the request response time, to block the Master actor
> only once for time O(T) instead of blocking it for time N*T prior to
> this patch.
> 
> This batching technique reduces both the time Master is spending
> answering '/state' requests and the average request response time
> in presence of multiple requests in the Master's mailbox. However,
> for seldom '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> The change preserves the read-your-writes consistency model.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8 
>   src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/2/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark and `MasterStateQuery_BENCHMARK_Test.GetState`, see below.
> 
> **Setup**
> Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
> Total Number of Cores: 4
> Total Number of Logical Cores (with Hyper-Threading): 8
> L2 Cache (per Core): 256 KB  
> 
> Compiler: Apple LLVM version 9.1.0 (clang-902.0.39.2)
> Optimization: -O2
> 
> **MasterStateQuery_BENCHMARK_Test.GetState, v0 '/state' response time**
> 
> setup                                                    | no batching | batching
> ---------------------------------------------------------|-------------|----------
>  1000 agents,  10000 running, and  10000 completed tasks | 146.496ms   | 158.319ms
> 10000 agents, 100000 running, and 100000 completed tasks | 1.795s      | 1.899s
> 20000 agents, 200000 running, and 200000 completed tasks | 3.742s      | 4.427s
> 40000 agents, 400000 running, and 400000 completed tasks | 10.946s     | 11.096s
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 1**
> Test setup 1: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 50 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **50 measurements** per endpoint.
> 
> percentile | /flags, no batching | /flags, batching | /state, no batching | /state, batching
> -----------|---------------------|------------------|---------------------|-----------------
>        min |  1.598ms            | 1.475ms          | 100.627ms           | 105.383ms
>        p25 |  2.370ms            | 2.452ms          | 102.206ms           | 107.184ms
>        p50 |  2.520ms            | 2.562ms          | 103.213ms           | 108.468ms
>        p75 |  2.623ms            | 2.665ms          | 104.100ms           | 109.808ms
>        p90 |  2.803ms            | 2.731ms          | 106.079ms           | 111.043ms
>        max | 84.957ms            | 2.934ms          | 153.438ms           | 154.636ms
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 2**
> Test setup 2: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **10 measurements** per endpoint.
> 
> percentile | /flags, no batching | /flags, batching | /state, no batching | /state, batching
> -----------|---------------------|------------------|---------------------|-----------------
>        min | 2.309ms             |   1.579ms        | 1.512s              | 2.820s
>        p25 | 1.547s              | 373.609ms        | 3.262s              | 3.588s
>        p50 | 3.189s              | 831.261ms        | 5.052s              | 4.253s
>        p75 | 5.346s              |   2.215s         | 6.846s              | 4.510s
>        p90 | 5.854s              |   2.351s         | 7.883s              | 4.705s
>        max | 7.237s              |   2.444s         | 8.517s              | 4.934s
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Benjamin Mahler <bm...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206711
-----------------------------------------------------------



A couple of comments on the benchmark information before looking at the code, these probably belong on the previous review, but since the numbers are only shown in this one I'll leave these here:

* Can we compare percentiles (e.g. min, q1, q3, max) across the approaches instead of averages? i.e. how much better does min,q1,q3,max get? Averages are generally a poor fit for performance data because it doesn't tell us about the distribution (e.g. if we make p90 3x worse for a 10% benefit to average that's not ok), we can include p50 if we're interested in the half-way point.
* Can you include the cpu model of the box you ran this on? I'm interested in how many physical/virtual cores there are.
* Can you also include the regular state query benchmark measurements to make sure we're not regressing too much on the single request case? (no need to get the non-optimized build numbers).
* Some of the numbers don't look very good, e.g. Before `[min: 1.578161651secs, max: 8.789315237secs]` After: `[4.047655443secs, 6.00752698secs]`. Can we see the distribution here? Do you understand exactly why the lowest measurement is so much higher? Looking at the non-optimized numbers, the minimum didn't get worse? Is the data highly variable between runs?
* Can you also include perf data for the optimized run? http://mesos.apache.org/documentation/latest/performance-profiling/
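
To illustrate the first point, here is a minimal sketch of how the per-request measurements could be summarized. It uses the nearest-rank definition of a percentile; other definitions (e.g. with interpolation) give slightly different values. The `percentile` and `summarize` helpers and the `Summary` struct are hypothetical and not part of the Mesos benchmark code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Expects 0 < p <= 100 and a
// non-empty input; the input is taken by value because it is sorted here.
inline double percentile(std::vector<double> samples, double p) {
  assert(!samples.empty() && p > 0.0 && p <= 100.0);
  std::sort(samples.begin(), samples.end());
  const std::size_t rank = static_cast<std::size_t>(
      std::ceil(p * static_cast<double>(samples.size()) / 100.0));
  return samples[rank - 1];
}

// The figures one would compare across approaches instead of the average.
struct Summary {
  double min;
  double p25;
  double p50;
  double p75;
  double p90;
  double max;
};

inline Summary summarize(const std::vector<double>& samples) {
  assert(!samples.empty());
  Summary summary;
  summary.min = *std::min_element(samples.begin(), samples.end());
  summary.p25 = percentile(samples, 25);
  summary.p50 = percentile(samples, 50);
  summary.p75 = percentile(samples, 75);
  summary.p90 = percentile(samples, 90);
  summary.max = *std::max_element(samples.begin(), samples.end());
  return summary;
}
```

Calling `summarize` on the response times of each run and comparing the resulting rows side by side shows whether an improved average hides a worse tail.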

- Benjamin Mahler


On July 31, 2018, 5:24 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated July 31, 2018, 5:24 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch handlers for '/state' requests are not scheduled
> directly after authorization, but are accumulated and then scheduled
> for later parallel processing.
> 
> This approach allows, if there are N '/state' requests in the Master's
> mailbox and T is the request response time, to block the Master actor
> only once for time O(T) instead of blocking it for time N*T prior to
> this patch.
> 
> This batching technique reduces both the time Master is spending
> answering '/state' requests and the average request response time
> in presence of multiple requests in the Master's mailbox. However,
> for seldom '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 6947031da3ce3523408d69d6dac60551a39a4601 
>   src/master/master.hpp 0353d550308816f219aedb6afe15c643fc8bb340 
>   src/master/master.cpp 2af976f7ea7f81d4b06a45ce13286dbd61b9b144 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/1/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.
> 
> Average improvement without optimization: 62%–70%.
> Average improvement with optimization: 17%–62%.
> 
> **[No batching, no optimization](https://dobianchi.files.wordpress.com/2011/11/no-barrique-no-berlusconi.jpg?w=638)**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
> '/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
> '/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
> ```
> 
> **With batching but no optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
> '/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
> '/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
> ```
> 
> **No batching but `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 2.396221ms, 10 responses are in [1.628583ms, 2.816639ms]
> '/state' response on average took 113.469574ms, 10 responses are in [104.218099ms, 134.477062ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 3.892615876secs, 10 responses are in [2.480517ms, 7.630934838secs]
> '/state' response on average took 5.205245306secs, 10 responses are in [1.578161651secs, 8.789315237secs]
> ```
> 
> **Batching and `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.973573ms, 10 responses are in [1.221193ms, 2.694713ms]
> '/state' response on average took 113.331551ms, 10 responses are in [102.593397ms, 142.028555ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.475842691secs, 10 responses are in [2.437217ms, 3.815589561secs]
> '/state' response on average took 4.742303751secs, 10 responses are in [4.047655443secs, 6.00752698secs]
> ```
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.


> On Aug. 1, 2018, 1:59 a.m., Benjamin Mahler wrote:
> > src/master/http.cpp
> > Line 2812 (original), 2815 (patched)
> > <https://reviews.apache.org/r/68132/diff/1/?file=2065562#file2065562line2815>
> >
> >     Can we keep the existing name? I believe the idea is to have them match the path, so "/state" -> Http::state seems ideal as is.
> 
> Alexander Rukletsov wrote:
>     No : ). I would like readers to understand that this endpoint is somewhat different from the rest. Why not reflect it in the name?

Jokes aside, I'm in favour of the second scheme you have suggested:
```
state
deferStateRequest
processStateRequestBatch
```


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206714
-----------------------------------------------------------


On Aug. 3, 2018, 1:46 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated Aug. 3, 2018, 1:46 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch handlers for '/state' requests are not scheduled
> directly after authorization, but are accumulated and then scheduled
> for later parallel processing.
> 
> This approach allows, if there are N '/state' requests in the Master's
> mailbox and T is the request response time, to block the Master actor
> only once for time O(T) instead of blocking it for time N*T prior to
> this patch.
> 
> This batching technique reduces both the time Master is spending
> answering '/state' requests and the average request response time
> in presence of multiple requests in the Master's mailbox. However,
> for seldom '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> The change preserves the read-your-writes consistency model.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 6947031da3ce3523408d69d6dac60551a39a4601 
>   src/master/master.hpp 0353d550308816f219aedb6afe15c643fc8bb340 
>   src/master/master.cpp 2af976f7ea7f81d4b06a45ce13286dbd61b9b144 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/1/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.
> 
> **Setup**
> Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
> Total Number of Cores: 4
> Total Number of Logical Cores (with Hyper-Threading): 8
> L2 Cache (per Core): 256 KB  
> 
> Average improvement without optimization: 62%–70%.
> Average improvement with optimization: 17%–62%.
> 
> **[No batching, no optimization](https://dobianchi.files.wordpress.com/2011/11/no-barrique-no-berlusconi.jpg?w=638)**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
> '/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
> '/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
> ```
> 
> **With batching but no optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
> '/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
> '/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
> ```
> 
> **No batching but `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 2.396221ms, 10 responses are in [1.628583ms, 2.816639ms]
> '/state' response on average took 113.469574ms, 10 responses are in [104.218099ms, 134.477062ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 3.892615876secs, 10 responses are in [2.480517ms, 7.630934838secs]
> '/state' response on average took 5.205245306secs, 10 responses are in [1.578161651secs, 8.789315237secs]
> ```
> 
> **Batching and `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.973573ms, 10 responses are in [1.221193ms, 2.694713ms]
> '/state' response on average took 113.331551ms, 10 responses are in [102.593397ms, 142.028555ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.475842691secs, 10 responses are in [2.437217ms, 3.815589561secs]
> '/state' response on average took 4.742303751secs, 10 responses are in [4.047655443secs, 6.00752698secs]
> ```
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.


> On Aug. 1, 2018, 1:59 a.m., Benjamin Mahler wrote:
> > src/master/http.cpp
> > Lines 2887-2889 (original), 2849-2851 (patched)
> > <https://reviews.apache.org/r/68132/diff/1/?file=2065562#file2065562line2890>
> >
> >     Can you move in the request and the promise (without Owned) here? (the lambda will need to be `mutable` for request to be moved here).

I'm not sure I can move the request since it is passed by const ref from the very beginning, but I can surely do it for the promise, if you fancy.
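
To illustrate the const ref concern: `std::move` on a `const T&` yields a `const T&&`, which a move constructor cannot accept, so the copy constructor is selected instead. Likewise, a by-value lambda capture is const inside the body unless the lambda is `mutable`. A minimal sketch of both follows, assuming C++14 init-captures; `Request` here is a hypothetical copy/move-counting type, not the real `http::Request`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a request; counts copies and moves.
struct Request
{
  std::string body;
  static int copies;
  static int moves;

  explicit Request(std::string b) : body(std::move(b)) {}
  Request(const Request& that) : body(that.body) { ++copies; }
  Request(Request&& that) noexcept : body(std::move(that.body)) { ++moves; }
};

int Request::copies = 0;
int Request::moves = 0;

// `std::move` on a const reference produces `const Request&&`, which only
// the copy constructor can accept, so this copies.
Request takeFromConstRef(const Request& request)
{
  return Request(std::move(request));
}

// A by-value capture can be moved out of only if the lambda is `mutable`;
// without `mutable` the capture is const and the inner move would copy.
Request takeFromMutableLambda(Request request)
{
  auto handler = [request = std::move(request)]() mutable {
    return Request(std::move(request));
  };
  return handler();
}
```

So a request received by const ref can only be copied into the lambda; moving it would require taking it by value or by rvalue reference further up the call chain.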


> On Aug. 1, 2018, 1:59 a.m., Benjamin Mahler wrote:
> > src/master/http.cpp
> > Lines 5319 (patched)
> > <https://reviews.apache.org/r/68132/diff/1/?file=2065562#file2065562line5441>
> >
> >     This makes batchedRequest not so const :), might as well have it come in as a `BatchedRequest&&` unless `process::async` doesn't support moving yet?

I'm not sure I understand your suggestion here.


> On Aug. 1, 2018, 1:59 a.m., Benjamin Mahler wrote:
> > src/master/http.cpp
> > Lines 5327 (patched)
> > <https://reviews.apache.org/r/68132/diff/1/?file=2065562#file2065562line5449>
> >
> >     Can we move the request in here? If async doesn't support it, can you add a TODO?

Do you think it will impact performance? I think passing the request by const ref is fine, and definitely safer than moving parts of a struct.


> On Aug. 1, 2018, 1:59 a.m., Benjamin Mahler wrote:
> > src/master/master.hpp
> > Lines 1842 (patched)
> > <https://reviews.apache.org/r/68132/diff/1/?file=2065563#file2065563line1842>
> >
> >     Can we avoid Owned and std::move this struct instead of copying it?

I'm not sure I understand which struct you suggest moving.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206714
-----------------------------------------------------------


On Aug. 3, 2018, 1:46 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated Aug. 3, 2018, 1:46 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch handlers for '/state' requests are not scheduled
> directly after authorization, but are accumulated and then scheduled
> for later parallel processing.
> 
> This approach allows, if there are N '/state' requests in the Master's
> mailbox and T is the request response time, to block the Master actor
> only once for time O(T) instead of blocking it for time N*T prior to
> this patch.
> 
> This batching technique reduces both the time Master is spending
> answering '/state' requests and the average request response time
> in the presence of multiple requests in the Master's mailbox. However,
> for infrequent '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> The change preserves the read-your-writes consistency model.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 6947031da3ce3523408d69d6dac60551a39a4601 
>   src/master/master.hpp 0353d550308816f219aedb6afe15c643fc8bb340 
>   src/master/master.cpp 2af976f7ea7f81d4b06a45ce13286dbd61b9b144 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/1/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.
> 
> **Setup**
> Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
> Total Number of Cores: 4
> Total Number of Threads: 8
> L2 Cache (per Core): 256 KB  
> 
> Average improvement in '/flags' response time without optimization: 62%–70%.
> Average improvement in '/flags' response time with optimization: 17%–62%.
> 
> **[No batching, no optimization](https://dobianchi.files.wordpress.com/2011/11/no-barrique-no-berlusconi.jpg?w=638)**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
> '/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
> '/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
> ```
> 
> **With batching but no optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
> '/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
> '/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
> ```
> 
> **No batching but `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 2.396221ms, 10 responses are in [1.628583ms, 2.816639ms]
> '/state' response on average took 113.469574ms, 10 responses are in [104.218099ms, 134.477062ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 3.892615876secs, 10 responses are in [2.480517ms, 7.630934838secs]
> '/state' response on average took 5.205245306secs, 10 responses are in [1.578161651secs, 8.789315237secs]
> ```
> 
> **Batching and `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.973573ms, 10 responses are in [1.221193ms, 2.694713ms]
> '/state' response on average took 113.331551ms, 10 responses are in [102.593397ms, 142.028555ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.475842691secs, 10 responses are in [2.437217ms, 3.815589561secs]
> '/state' response on average took 4.742303751secs, 10 responses are in [4.047655443secs, 6.00752698secs]
> ```
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.


> On Aug. 1, 2018, 1:59 a.m., Benjamin Mahler wrote:
> > Overall the approach looks good, didn't see any bugs, so just minor comments below.
> > 
> > * I'm not sure where to put it but it seems we need a TODO to de-duplicate response processing when the principal is identical? E.g. if "ben" asks for state three times in one batch, ideally we only compute the response for "ben" once since they're all identical within a principal?
> > * Can you document the consistency model in the description?

* yes
* yes


> On Aug. 1, 2018, 1:59 a.m., Benjamin Mahler wrote:
> > src/master/http.cpp
> > Line 2812 (original), 2815 (patched)
> > <https://reviews.apache.org/r/68132/diff/1/?file=2065562#file2065562line2815>
> >
> >     Can we keep the existing name? I believe the idea is to have them match the path, so "/state" -> Http::state seems ideal as is.

No : ). I would like readers to understand that this endpoint is somewhat different from the rest. Why not reflect it in the name?


> On Aug. 1, 2018, 1:59 a.m., Benjamin Mahler wrote:
> > src/master/http.cpp
> > Lines 5323-5328 (patched)
> > <https://reviews.apache.org/r/68132/diff/1/?file=2065562#file2065562line5445>
> >
> >     It seems a little odd to have the lambda have to know about the batch struct and do promise setting, instead of just returning the Response:
> >     
> >     ```
> >     auto response = [this](Owned<ObjectApprovers> approvers) {
> >       ...
> >       
> >       return http::OK(...);
> >     };
> >     ```
> >     
> >     Then this code here is the one that deals with promise setting, e.g.
> >     
> >     ```
> >       // Fire off the workers.
> >       foreach (BatchedStateRequest& request, batchedStateRequests) {
> >         request.promise.associate(process::async(response, request.approvers));
> >       }
> >       
> >       // Wait for all responses to transition.
> >       vector<Future<Response>> responses;
> >       foreach (const BatchedStateRequest& request, batchedStateRequests) {
> >         responses.push_back(request.promise.future());
> >       }
> >       process::await(responses).await();
> >     ```
> >     
> >     This lets us keep the response lambda agnostic of batching and we could more cleanly move it up in the future.

The reason I did so was to avoid carrying around a heavy `http::OK` object. However, I see the sentiment and will change the code.
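
For reference, a standalone sketch of the structure suggested above, written against the C++ standard library only rather than libprocess: `std::promise` and `std::async` stand in for `process::Promise` and `process::async`, and `Response`, `Approvers`, `BatchedStateRequest`, `respond`, and `processBatch` are hypothetical names. The response function knows nothing about batching; only the batch loop touches the promises.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real types.
using Response = std::string;
using Approvers = std::string;

struct BatchedStateRequest
{
  Approvers approvers;
  std::promise<Response> promise;
};

// Agnostic of batching: computes and returns a response.
Response respond(const Approvers& approvers)
{
  return "state for " + approvers;
}

// Processes the whole batch in parallel and blocks until every response is
// ready, so the caller is blocked once per batch rather than once per
// request.
void processBatch(std::vector<BatchedStateRequest> batch)
{
  // Fire off the workers.
  std::vector<std::future<Response>> workers;
  for (const BatchedStateRequest& request : batch) {
    workers.push_back(
        std::async(std::launch::async, respond, request.approvers));
  }

  // Wait for each worker and fulfil the matching promise.
  for (std::size_t i = 0; i < batch.size(); ++i) {
    batch[i].promise.set_value(workers[i].get());
  }
}
```

Each caller would hold the future obtained from its request's promise before the batch is handed over, and receive the response once `processBatch` has fulfilled it.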


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206714
-----------------------------------------------------------



Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Benjamin Mahler <bm...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206714
-----------------------------------------------------------



Overall the approach looks good, didn't see any bugs, so just minor comments below.

* I'm not sure where to put it but it seems we need a TODO to de-duplicate response processing when the principal is identical? E.g. if "ben" asks for state three times in one batch, ideally we only compute the response for "ben" once since they're all identical within a principal?
* Can you document the consistency model in the description?
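
A rough sketch of what the de-duplication TODO might look like, assuming the response depends only on the principal: each distinct principal's response is computed once per batch and then shared. All names here (`Principal`, `Response`, `computeResponse`, `respondToBatch`) are hypothetical, and only the standard library is used.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Principal = std::string;
using Response = std::string;

// Counts how many times the expensive computation actually ran.
static int computations = 0;

// Hypothetical expensive per-principal computation.
Response computeResponse(const Principal& principal)
{
  ++computations;
  return "state visible to " + principal;
}

// Returns one response per request, in request order, computing each
// distinct principal's response only once per batch.
std::vector<Response> respondToBatch(const std::vector<Principal>& batch)
{
  std::map<Principal, Response> cache;
  std::vector<Response> responses;
  responses.reserve(batch.size());

  for (const Principal& principal : batch) {
    auto it = cache.find(principal);
    if (it == cache.end()) {
      it = cache.emplace(principal, computeResponse(principal)).first;
    }
    responses.push_back(it->second);
  }

  return responses;
}
```

With a batch of "ben", "ben", "alex", "ben", the computation would run twice rather than four times.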


src/master/http.cpp
Line 2812 (original), 2815 (patched)
<https://reviews.apache.org/r/68132/#comment289793>

    Can we keep the existing name? I believe the idea is to have them match the path, so "/state" -> Http::state seems ideal as is.



src/master/http.cpp
Lines 2838-2910 (original), 2841-2853 (patched)
<https://reviews.apache.org/r/68132/#comment289799>

    It seems a little brittle to inline the batching-related logic and deal with promises here; could we use a function?
    
    ```
    .then(... {
      return _state(request, approvers);
    })
    ```
    
    ```
    Future<Response> _state(request, approvers)
    {
      bool scheduleBatch = batchedStateRequests.empty();
      
      ... Add entry and grab `future` ...
      
      if (scheduleBatch) { dispatch ... }
      
      return future;
    }
    ```
    
    We could name it differently, e.g. `s/_state/deferStateRequest/`. This way the handler doesn't have to inline batching and promise logic.



src/master/http.cpp
Lines 2887-2889 (original), 2849-2851 (patched)
<https://reviews.apache.org/r/68132/#comment289796>

    Can you move in the request and the promise (without Owned) here? (the lambda will need to be `mutable` for request to be moved here).



src/master/http.cpp
Lines 5173 (patched)
<https://reviews.apache.org/r/68132/#comment289800>

    Per my comment above, I guess we could name these like:
    
    ```
    state
    _state
    __state
    ```
    
    Or:
    
    ```
    state
    deferStateRequest
    processStateRequestBatch
    ```
    
    The former makes the flow a little easier to guess; the latter tries to name the functions a bit more meaningfully (which can often make the flow harder to see from function names alone).



src/master/http.cpp
Lines 5175-5178 (patched)
<https://reviews.apache.org/r/68132/#comment289795>

    No need for the special case and the early return? The code will handle 0 items correctly.
    
    If this is trying to let us know in the future about a bug where the batching is firing incorrectly such that there are 0 items, we could CHECK:
    
    ```
    CHECK(!batchedStateRequests.empty())
      << "Bug in state batching logic";
    ```
    
    Seems ok without the CHECK to me as well.



src/master/http.cpp
Lines 5319 (patched)
<https://reviews.apache.org/r/68132/#comment289797>

    This makes batchedRequest not so const :), might as well have it come in as a `BatchedRequest&&` unless `process::async` doesn't support moving yet?



src/master/http.cpp
Lines 5323-5328 (patched)
<https://reviews.apache.org/r/68132/#comment289801>

    It seems a little odd to have the lambda have to know about the batch struct and do promise setting, instead of just returning the Response:
    
    ```
    auto response = [this](Owned<ObjectApprovers> approvers) {
      ...
      
      return http::OK(...);
    };
    ```
    
    Then this code here is the one that deals with promise setting, e.g.
    
    ```
      // Fire off the workers.
      foreach (BatchedStateRequest& request, batchedStateRequests) {
        request.promise.associate(process::async(response, request.approvers));
      }
      
      // Wait for all responses to transition.
      vector<Future<Response>> responses;
      foreach (const BatchedStateRequest& request, batchedStateRequests) {
        responses.push_back(request.promise.future());
      }
      process::await(responses).await();
    ```
    
    This lets us keep the response lambda agnostic of batching and we could more cleanly move it up in the future.



src/master/http.cpp
Lines 5327 (patched)
<https://reviews.apache.org/r/68132/#comment289798>

    Can we move the request in here? If async doesn't support it, can you add a TODO?



src/master/master.hpp
Lines 1842 (patched)
<https://reviews.apache.org/r/68132/#comment289794>

    Can we avoid Owned and std::move this struct instead of copying it?


- Benjamin Mahler


On July 31, 2018, 5:24 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated July 31, 2018, 5:24 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch handlers for '/state' requests are not scheduled
> directly after authorization, but are accumulated and then scheduled
> for later parallel processing.
> 
> This approach allows, if there are N '/state' requests in the Master's
> mailbox and T is the request response time, to block the Master actor
> only once for time O(T) instead of blocking it for time N*T prior to
> this patch.
> 
> This batching technique reduces both the time Master is spending
> answering '/state' requests and the average request response time
> in the presence of multiple requests in the Master's mailbox. However,
> for infrequent '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 6947031da3ce3523408d69d6dac60551a39a4601 
>   src/master/master.hpp 0353d550308816f219aedb6afe15c643fc8bb340 
>   src/master/master.cpp 2af976f7ea7f81d4b06a45ce13286dbd61b9b144 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/1/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.
> 
> Average improvement in '/flags' response time without optimization: 62%–70%.
> Average improvement in '/flags' response time with optimization: 17%–62%.
> 
> **[No batching, no optimization](https://dobianchi.files.wordpress.com/2011/11/no-barrique-no-berlusconi.jpg?w=638)**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
> '/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
> '/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
> ```
> 
> **With batching but no optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
> '/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
> '/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
> ```
> 
> **No batching but `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 2.396221ms, 10 responses are in [1.628583ms, 2.816639ms]
> '/state' response on average took 113.469574ms, 10 responses are in [104.218099ms, 134.477062ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 3.892615876secs, 10 responses are in [2.480517ms, 7.630934838secs]
> '/state' response on average took 5.205245306secs, 10 responses are in [1.578161651secs, 8.789315237secs]
> ```
> 
> **Batching and `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.973573ms, 10 responses are in [1.221193ms, 2.694713ms]
> '/state' response on average took 113.331551ms, 10 responses are in [102.593397ms, 142.028555ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.475842691secs, 10 responses are in [2.437217ms, 3.815589561secs]
> '/state' response on average took 4.742303751secs, 10 responses are in [4.047655443secs, 6.00752698secs]
> ```
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.


> On Aug. 9, 2018, 1:11 p.m., Benno Evers wrote:
> > src/master/master.hpp
> > Line 1467 (original), 1470 (patched)
> > <https://reviews.apache.org/r/68132/diff/2/?file=2068405#file2068405line1470>
> >
> >     Since the `batchedStateRequests` vector acts similarly to a cache, maybe we should keep the `const` here and make the vector `mutable`?

This is a good suggestion; however, I don't see this pattern anywhere in our codebase. Are you aware of other places where we could apply the `const + mutable` pattern?
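
For readers unfamiliar with the pattern under discussion, a minimal sketch: a logically read-only method stays `const` while a `mutable` member records pending work. `Handler`, `enqueue`, and `pending` are hypothetical names, not the Mesos declarations, and single-threaded access (as within one actor) is assumed, since `mutable` provides no synchronization by itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class Handler
{
public:
  // Logically read-only from the caller's point of view, so the method
  // stays `const`; only the bookkeeping vector is modified.
  std::size_t enqueue(const std::string& request) const
  {
    pending.push_back(request);
    return pending.size();
  }

  std::size_t size() const { return pending.size(); }

private:
  // `mutable` lets `const` methods modify this member.
  mutable std::vector<std::string> pending;
};
```

The trade-off is that the `const` signature no longer guarantees the object is untouched, which is presumably why the pattern is usually reserved for caches and similar bookkeeping.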


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206879
-----------------------------------------------------------


On Aug. 7, 2018, 12:11 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated Aug. 7, 2018, 12:11 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch handlers for '/state' requests are not scheduled
> directly after authorization, but are accumulated and then scheduled
> for later parallel processing.
> 
> This approach allows, if there are N '/state' requests in the Master's
> mailbox and T is the request response time, to block the Master actor
> only once for time O(T) instead of blocking it for time N*T prior to
> this patch.
> 
> This batching technique reduces both the time Master is spending
> answering '/state' requests and the average request response time
> in the presence of multiple requests in the Master's mailbox. However,
> for infrequent '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> The change preserves the read-your-writes consistency model.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8 
>   src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/2/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark and `MasterStateQuery_BENCHMARK_Test.GetState`, see below.
> 
> **Setup**
> Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
> Total Number of Cores: 4
> Total Number of Threads: 8
> L2 Cache (per Core): 256 KB  
> 
> Compiler: Apple LLVM version 9.1.0 (clang-902.0.39.2)
> Optimization: -O2
> 
> **MasterStateQuery_BENCHMARK_Test.GetState, v0 '/state' response time**
> 
> setup                                                    | no batching | batching
> ---------------------------------------------------------|-------------|----------
>  1000 agents,  10000 running, and  10000 completed tasks | 146.496ms   | 158.319ms
> 10000 agents, 100000 running, and 100000 completed tasks | 1.795s      | 1.899s
> 20000 agents, 200000 running, and 200000 completed tasks | 3.742s      | 4.427s
> 40000 agents, 400000 running, and 400000 completed tasks | 10.946s     | 11.096s
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 1**
> Test setup 1: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 50 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **50 measurements** per endpoint.
> 
> /flags | no batching | batching       /state | no batching | batching
> -------------------------------   *   --------------------------------
>    min |  1.598ms    | 1.475ms           min | 100.627ms   | 105.383ms
>    p25 |  2.370ms    | 2.452ms           p25 | 102.206ms   | 107.184ms
>    p50 |  2.520ms    | 2.562ms           p50 | 103.213ms   | 108.468ms
>    p75 |  2.623ms    | 2.665ms           p75 | 104.100ms   | 109.808ms
>    p90 |  2.803ms    | 2.731ms           p90 | 106.079ms   | 111.043ms
>    max | 84.957ms    | 2.934ms           max | 153.438ms   | 154.636ms
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 2**
> Test setup 2: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **10 measurements** per endpoint.
> 
> /flags | no batching | batching       /state | no batching | batching
> --------------------------------  *   -------------------------------
>    min | 2.309ms     |   1.579ms         min | 1.512s      | 2.820s
>    p25 | 1.547s      | 373.609ms         p25 | 3.262s      | 3.588s
>    p50 | 3.189s      | 831.261ms         p50 | 5.052s      | 4.253s
>    p75 | 5.346s      |   2.215s          p75 | 6.846s      | 4.510s
>    p90 | 5.854s      |   2.351s          p90 | 7.883s      | 4.705s
>    max | 7.237s      |   2.444s          max | 8.517s      | 4.934s
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Benno Evers <be...@mesosphere.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206879
-----------------------------------------------------------


Fix it, then Ship it!





src/master/http.cpp
Lines 2874 (patched)
<https://reviews.apache.org/r/68132/#comment290184>

    It seems a bit unfortunate that the state JSON serialization is mixed together with the request batch processing in the same function; maybe we can extract it into a free function `void json(JSON::ObjectWriter*, const Master&)`.
    
    On the other hand, it will be changed in the next review anyway, so maybe we can leave it as is for now.



src/master/http.cpp
Lines 2881 (patched)
<https://reviews.apache.org/r/68132/#comment289983>

    Intuitively I'd expect a variable `response` to be of type `http::Response`.



src/master/master.hpp
Line 1467 (original), 1470 (patched)
<https://reviews.apache.org/r/68132/#comment290183>

    Since the `batchedStateRequests` vector acts similarly to a cache, maybe we should keep the `const` here and make the vector `mutable`?
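
A minimal sketch of that suggestion, in plain C++ rather than Mesos code. Only the name `batchedStateRequests` comes from the review; `Handler`, `Request`, `state` and `drain` are hypothetical. It assumes the vector is bookkeeping rather than part of the object's logical state, which is what would justify keeping the handler `const`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical request type, for illustration only.
struct Request { std::string path; };

class Handler
{
public:
  // Stays `const`: only the internal accumulator changes.
  void state(Request request) const
  {
    batchedStateRequests.push_back(std::move(request));
  }

  // Hand the accumulated requests over and reset the batch.
  std::vector<Request> drain() const
  {
    std::vector<Request> batch;
    batch.swap(batchedStateRequests);
    return batch;
  }

private:
  // `mutable` permits modification from `const` member functions.
  mutable std::vector<Request> batchedStateRequests;
};
```

Note that `mutable` by itself gives no thread safety; the sketch relies on the handler being called from a single thread at a time.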


- Benno Evers


On Aug. 7, 2018, 12:11 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated Aug. 7, 2018, 12:11 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch, handlers for '/state' requests are not scheduled
> directly after authorization, but are accumulated and then scheduled
> for later parallel processing.
> 
> If there are N '/state' requests in the Master's mailbox and T is the
> request response time, this approach blocks the Master actor only once
> for time O(T), instead of for time N*T as before this patch.
> 
> This batching technique reduces both the time the Master spends
> answering '/state' requests and the average request response time
> in the presence of multiple requests in the Master's mailbox. However,
> for infrequent '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> The change preserves the read-your-writes consistency model.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8 
>   src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/2/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark and `MasterStateQuery_BENCHMARK_Test.GetState`, see below.
> 
> **Setup**
> Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
> Total Number of Cores: 4
> Total Number of Logical Cores: 8
> L2 Cache (per Core): 256 KB  
> 
> Compiler: Apple LLVM version 9.1.0 (clang-902.0.39.2)
> Optimization: -O2
> 
> **MasterStateQuery_BENCHMARK_Test.GetState, v0 '/state' response time**
> 
> setup                                                    | no batching | batching
> ---------------------------------------------------------|-------------|----------
>  1000 agents,  10000 running, and  10000 completed tasks | 146.496ms   | 158.319ms
> 10000 agents, 100000 running, and 100000 completed tasks | 1.795s      | 1.899s
> 20000 agents, 200000 running, and 200000 completed tasks | 3.742s      | 4.427s
> 40000 agents, 400000 running, and 400000 completed tasks | 10.946s     | 11.096s
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 1**
> Test setup 1: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 50 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **50 measurements** per endpoint.
> 
> /flags | no batching | batching       /state | no batching | batching
> -------------------------------   *   --------------------------------
>    min |  1.598ms    | 1.475ms           min | 100.627ms   | 105.383ms
>    p25 |  2.370ms    | 2.452ms           p25 | 102.206ms   | 107.184ms
>    p50 |  2.520ms    | 2.562ms           p50 | 103.213ms   | 108.468ms
>    p75 |  2.623ms    | 2.665ms           p75 | 104.100ms   | 109.808ms
>    p90 |  2.803ms    | 2.731ms           p90 | 106.079ms   | 111.043ms
>    max | 84.957ms    | 2.934ms           max | 153.438ms   | 154.636ms
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 2**
> Test setup 2: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **10 measurements** per endpoint.
> 
> /flags | no batching | batching       /state | no batching | batching
> --------------------------------  *   -------------------------------
>    min | 2.309ms     |   1.579ms         min | 1.512s      | 2.820s
>    p25 | 1.547s      | 373.609ms         p25 | 3.262s      | 3.588s
>    p50 | 3.189s      | 831.261ms         p50 | 5.052s      | 4.253s
>    p75 | 5.346s      |   2.215s          p75 | 6.846s      | 4.510s
>    p90 | 5.854s      |   2.351s          p90 | 7.883s      | 4.705s
>    max | 7.237s      |   2.444s          max | 8.517s      | 4.934s
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206942
-----------------------------------------------------------



PASS: Mesos patch 68132 was successfully built and tested.

Reviews applied: `['68224', '68225', '68131', '68132']`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2112/mesos-review-68132

- Mesos Reviewbot Windows


On Aug. 7, 2018, 12:11 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated Aug. 7, 2018, 12:11 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch, handlers for '/state' requests are not scheduled
> directly after authorization, but are accumulated and then scheduled
> for later parallel processing.
> 
> If there are N '/state' requests in the Master's mailbox and T is the
> request response time, this approach blocks the Master actor only once
> for time O(T), instead of for time N*T as before this patch.
> 
> This batching technique reduces both the time the Master spends
> answering '/state' requests and the average request response time
> in the presence of multiple requests in the Master's mailbox. However,
> for infrequent '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> The change preserves the read-your-writes consistency model.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8 
>   src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/2/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark and `MasterStateQuery_BENCHMARK_Test.GetState`, see below.
> 
> **Setup**
> Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
> Total Number of Cores: 4
> Total Number of Logical Cores: 8
> L2 Cache (per Core): 256 KB  
> 
> Compiler: Apple LLVM version 9.1.0 (clang-902.0.39.2)
> Optimization: -O2
> 
> **MasterStateQuery_BENCHMARK_Test.GetState, v0 '/state' response time**
> 
> setup                                                    | no batching | batching
> ---------------------------------------------------------|-------------|----------
>  1000 agents,  10000 running, and  10000 completed tasks | 146.496ms   | 158.319ms
> 10000 agents, 100000 running, and 100000 completed tasks | 1.795s      | 1.899s
> 20000 agents, 200000 running, and 200000 completed tasks | 3.742s      | 4.427s
> 40000 agents, 400000 running, and 400000 completed tasks | 10.946s     | 11.096s
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 1**
> Test setup 1: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 50 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **50 measurements** per endpoint.
> 
> /flags | no batching | batching       /state | no batching | batching
> -------------------------------   *   --------------------------------
>    min |  1.598ms    | 1.475ms           min | 100.627ms   | 105.383ms
>    p25 |  2.370ms    | 2.452ms           p25 | 102.206ms   | 107.184ms
>    p50 |  2.520ms    | 2.562ms           p50 | 103.213ms   | 108.468ms
>    p75 |  2.623ms    | 2.665ms           p75 | 104.100ms   | 109.808ms
>    p90 |  2.803ms    | 2.731ms           p90 | 106.079ms   | 111.043ms
>    max | 84.957ms    | 2.934ms           max | 153.438ms   | 154.636ms
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 2**
> Test setup 2: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **10 measurements** per endpoint.
> 
> /flags | no batching | batching       /state | no batching | batching
> --------------------------------  *   -------------------------------
>    min | 2.309ms     |   1.579ms         min | 1.512s      | 2.820s
>    p25 | 1.547s      | 373.609ms         p25 | 3.262s      | 3.588s
>    p50 | 3.189s      | 831.261ms         p50 | 5.052s      | 4.253s
>    p75 | 5.346s      |   2.215s          p75 | 6.846s      | 4.510s
>    p90 | 5.854s      |   2.351s          p90 | 7.883s      | 4.705s
>    max | 7.237s      |   2.444s          max | 8.517s      | 4.934s
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206978
-----------------------------------------------------------



Patch looks great!

Reviews applied: [68224, 68225, 68131, 68132]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Aug. 7, 2018, 12:11 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated Aug. 7, 2018, 12:11 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch, handlers for '/state' requests are not scheduled
> directly after authorization, but are accumulated and then scheduled
> for later parallel processing.
> 
> If there are N '/state' requests in the Master's mailbox and T is the
> request response time, this approach blocks the Master actor only once
> for time O(T), instead of for time N*T as before this patch.
> 
> This batching technique reduces both the time the Master spends
> answering '/state' requests and the average request response time
> in the presence of multiple requests in the Master's mailbox. However,
> for infrequent '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> The change preserves the read-your-writes consistency model.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8 
>   src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/2/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark and `MasterStateQuery_BENCHMARK_Test.GetState`, see below.
> 
> **Setup**
> Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
> Total Number of Cores: 4
> Total Number of Logical Cores: 8
> L2 Cache (per Core): 256 KB  
> 
> Compiler: Apple LLVM version 9.1.0 (clang-902.0.39.2)
> Optimization: -O2
> 
> **MasterStateQuery_BENCHMARK_Test.GetState, v0 '/state' response time**
> 
> setup                                                    | no batching | batching
> ---------------------------------------------------------|-------------|----------
>  1000 agents,  10000 running, and  10000 completed tasks | 146.496ms   | 158.319ms
> 10000 agents, 100000 running, and 100000 completed tasks | 1.795s      | 1.899s
> 20000 agents, 200000 running, and 200000 completed tasks | 3.742s      | 4.427s
> 40000 agents, 400000 running, and 400000 completed tasks | 10.946s     | 11.096s
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 1**
> Test setup 1: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 50 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **50 measurements** per endpoint.
> 
> /flags | no batching | batching       /state | no batching | batching
> -------------------------------   *   --------------------------------
>    min |  1.598ms    | 1.475ms           min | 100.627ms   | 105.383ms
>    p25 |  2.370ms    | 2.452ms           p25 | 102.206ms   | 107.184ms
>    p50 |  2.520ms    | 2.562ms           p50 | 103.213ms   | 108.468ms
>    p75 |  2.623ms    | 2.665ms           p75 | 104.100ms   | 109.808ms
>    p90 |  2.803ms    | 2.731ms           p90 | 106.079ms   | 111.043ms
>    max | 84.957ms    | 2.934ms           max | 153.438ms   | 154.636ms
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 2**
> Test setup 2: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **10 measurements** per endpoint.
> 
> /flags | no batching | batching       /state | no batching | batching
> --------------------------------  *   -------------------------------
>    min | 2.309ms     |   1.579ms         min | 1.512s      | 2.820s
>    p25 | 1.547s      | 373.609ms         p25 | 3.262s      | 3.588s
>    p50 | 3.189s      | 831.261ms         p50 | 5.052s      | 4.253s
>    p75 | 5.346s      |   2.215s          p75 | 6.846s      | 4.510s
>    p90 | 5.854s      |   2.351s          p90 | 7.883s      | 4.705s
>    max | 7.237s      |   2.444s          max | 8.517s      | 4.934s
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review207112
-----------------------------------------------------------



PASS: Mesos patch 68132 was successfully built and tested.

Reviews applied: `['68132']`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2136/mesos-review-68132

- Mesos Reviewbot Windows


On Aug. 11, 2018, 6:09 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated Aug. 11, 2018, 6:09 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch, handlers for '/state' requests are not scheduled
> directly after authorization, but are accumulated and then scheduled
> for later parallel processing.
> 
> If there are N '/state' requests in the Master's mailbox and T is the
> request response time, this approach blocks the Master actor only once
> for time O(T), instead of for time N*T as before this patch.
> 
> This batching technique reduces both the time the Master spends
> answering '/state' requests and the average request response time
> in the presence of multiple requests in the Master's mailbox. However,
> for infrequent '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> The change preserves the read-your-writes consistency model.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8 
>   src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/2/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark and `MasterStateQuery_BENCHMARK_Test.GetState`, see below.
> 
> **Setup**
> Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
> Total Number of Cores: 4
> Total Number of Logical Cores: 8
> L2 Cache (per Core): 256 KB  
> 
> Compiler: Apple LLVM version 9.1.0 (clang-902.0.39.2)
> Optimization: -O2
> 
> **MasterStateQuery_BENCHMARK_Test.GetState, v0 '/state' response time**
> 
> setup                                                    | no batching | batching
> ---------------------------------------------------------|-------------|----------
>  1000 agents,  10000 running, and  10000 completed tasks | 146.496ms   | 158.319ms
> 10000 agents, 100000 running, and 100000 completed tasks | 1.795s      | 1.899s
> 20000 agents, 200000 running, and 200000 completed tasks | 3.742s      | 4.427s
> 40000 agents, 400000 running, and 400000 completed tasks | 10.946s     | 11.096s
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 1**
> Test setup 1: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 50 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **50 measurements** per endpoint.
> 
> /flags | no batching | batching       /state | no batching | batching
> -------------------------------   *   --------------------------------
>    min |  1.598ms    | 1.475ms           min | 100.627ms   | 105.383ms
>    p25 |  2.370ms    | 2.452ms           p25 | 102.206ms   | 107.184ms
>    p50 |  2.520ms    | 2.562ms           p50 | 103.213ms   | 108.468ms
>    p75 |  2.623ms    | 2.665ms           p75 | 104.100ms   | 109.808ms
>    p90 |  2.803ms    | 2.731ms           p90 | 106.079ms   | 111.043ms
>    max | 84.957ms    | 2.934ms           max | 153.438ms   | 154.636ms
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 2**
> Test setup 2: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **10 measurements** per endpoint.
> 
> /flags | no batching | batching       /state | no batching | batching
> --------------------------------  *   -------------------------------
>    min | 2.309ms     |   1.579ms         min | 1.512s      | 2.820s
>    p25 | 1.547s      | 373.609ms         p25 | 3.262s      | 3.588s
>    p50 | 3.189s      | 831.261ms         p50 | 5.052s      | 4.253s
>    p75 | 5.346s      |   2.215s          p75 | 6.846s      | 4.510s
>    p90 | 5.854s      |   2.351s          p90 | 7.883s      | 4.705s
>    max | 7.237s      |   2.444s          max | 8.517s      | 4.934s
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/
-----------------------------------------------------------

(Updated Aug. 11, 2018, 6:09 p.m.)


Review request for mesos, Benno Evers and Benjamin Mahler.


Bugs: MESOS-9122
    https://issues.apache.org/jira/browse/MESOS-9122


Repository: mesos


Description
-------

With this patch, handlers for '/state' requests are not scheduled
directly after authorization, but are accumulated and then scheduled
for later parallel processing.

If there are N '/state' requests in the Master's mailbox and T is the
request response time, this approach blocks the Master actor only once
for time O(T), instead of for time N*T as before this patch.

This batching technique reduces both the time the Master spends
answering '/state' requests and the average request response time
in the presence of multiple requests in the Master's mailbox. However,
for infrequent '/state' requests that don't accumulate in the Master's
mailbox, the response time might increase due to an added trip
through the mailbox.

The change preserves the read-your-writes consistency model.
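
To illustrate the batching idea, here is a minimal sketch in plain C++ that uses neither libprocess nor any Mesos code. `Request`, `State`, `Batcher`, `enqueue` and `processBatch` are hypothetical names, and `std::async` merely stands in for whatever parallel dispatch the patch actually uses. Requests are accumulated first; then all responses for the batch are produced in parallel against one read-only view of the state, so the owner of the state waits roughly as long as the slowest single request rather than the sum of all of them.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real patch works with libprocess HTTP types.
struct Request { std::string principal; };
struct State { std::size_t tasks = 0; };

class Batcher
{
public:
  // Accumulate a request instead of answering it right away.
  void enqueue(Request request) { pending.push_back(std::move(request)); }

  std::size_t size() const { return pending.size(); }

  // Answer every accumulated request in parallel. The caller is blocked
  // once, until the slowest response is ready; `state` must not be
  // modified while this function runs.
  std::vector<std::string> processBatch(const State& state)
  {
    std::vector<Request> batch;
    batch.swap(pending);

    std::vector<std::future<std::string>> futures;
    futures.reserve(batch.size());
    for (const Request& request : batch) {
      futures.push_back(std::async(
          std::launch::async,
          [&state, &request]() {
            return request.principal + ":" + std::to_string(state.tasks);
          }));
    }

    // Collect the responses in the order the requests were enqueued.
    std::vector<std::string> responses;
    responses.reserve(futures.size());
    for (std::future<std::string>& future : futures) {
      responses.push_back(future.get());
    }
    return responses;
  }

private:
  std::vector<Request> pending;
};
```

Because every response in a batch is computed from the same unmodified `state`, all requests in one batch observe the same snapshot.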


Diffs
-----

  src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8 
  src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11 


Diff: https://reviews.apache.org/r/68132/diff/2/


Testing
-------

`make check` on Mac OS 10.13.5 and various Linux distros.

Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark and `MasterStateQuery_BENCHMARK_Test.GetState`, see below.

**Setup**
Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
Total Number of Cores: 4
Total Number of Logical Cores: 8
L2 Cache (per Core): 256 KB  

Compiler: Apple LLVM version 9.1.0 (clang-902.0.39.2)
Optimization: -O2

**MasterStateQuery_BENCHMARK_Test.GetState, v0 '/state' response time**

setup                                                    | no batching | batching
---------------------------------------------------------|-------------|----------
 1000 agents,  10000 running, and  10000 completed tasks | 146.496ms   | 158.319ms
10000 agents, 100000 running, and 100000 completed tasks | 1.795s      | 1.899s
20000 agents, 200000 running, and 200000 completed tasks | 3.742s      | 4.427s
40000 agents, 400000 running, and 400000 completed tasks | 10.946s     | 11.096s

**MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 1**
Test setup 1: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 50 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **50 measurements** per endpoint.

/flags | no batching | batching       /state | no batching | batching
-------------------------------   *   --------------------------------
   min |  1.598ms    | 1.475ms           min | 100.627ms   | 105.383ms
   p25 |  2.370ms    | 2.452ms           p25 | 102.206ms   | 107.184ms
   p50 |  2.520ms    | 2.562ms           p50 | 103.213ms   | 108.468ms
   p75 |  2.623ms    | 2.665ms           p75 | 104.100ms   | 109.808ms
   p90 |  2.803ms    | 2.731ms           p90 | 106.079ms   | 111.043ms
   max | 84.957ms    | 2.934ms           max | 153.438ms   | 154.636ms

**MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 2**
Test setup 2: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **10 measurements** per endpoint.

/flags | no batching | batching       /state | no batching | batching
--------------------------------  *   -------------------------------
   min | 2.309ms     |   1.579ms         min | 1.512s      | 2.820s
   p25 | 1.547s      | 373.609ms         p25 | 3.262s      | 3.588s
   p50 | 3.189s      | 831.261ms         p50 | 5.052s      | 4.253s
   p75 | 5.346s      |   2.215s          p75 | 6.846s      | 4.510s
   p90 | 5.854s      |   2.351s          p90 | 7.883s      | 4.705s
   max | 7.237s      |   2.444s          max | 8.517s      | 4.934s


Thanks,

Alexander Rukletsov

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Benjamin Mahler <bm...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review207054
-----------------------------------------------------------


Ship it!




Looks good, thank you Alex! This will be a big win for users, and I'm looking forward to having all reads follow this approach.


src/master/http.cpp
Lines 2835-2838 (patched)
<https://reviews.apache.org/r/68132/#comment290240>

    thanks!



src/master/http.cpp
Lines 3022 (patched)
<https://reviews.apache.org/r/68132/#comment290242>

    How about:
    
    ```
    // Produce the responses in parallel.
    ```



src/master/http.cpp
Lines 3026-3028 (patched)
<https://reviews.apache.org/r/68132/#comment290245>

    Perhaps add a TODO to avoid the extra request copy, and move the request instead, once `process::async` supports moving arguments in.
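
For comparison, a small sketch of what moving an argument into an asynchronous task looks like with the standard library's `std::async`, which does accept rvalue arguments. This is not libprocess code, and `Request` and `measure` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <utility>

// Hypothetical request type, for illustration only.
struct Request { std::string body; };

// Computes the size of the request body on another thread. The request
// is moved into the task, so its body is never copied.
std::future<std::size_t> measure(Request request)
{
  return std::async(
      std::launch::async,
      [](Request r) { return r.body.size(); },
      std::move(request));
}
```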



src/master/http.cpp
Lines 3027 (patched)
<https://reviews.apache.org/r/68132/#comment290244>

    You may want to s/set/associate/ to make it clearer to the reader that we're associating the future instead of setting a response (I'm not sure if we'll keep the `Promise::set(Future)` overload).



src/master/http.cpp
Lines 3035-3036 (patched)
<https://reviews.apache.org/r/68132/#comment290243>

    The reader may not know what the deadlock issue is here; consider referencing [MESOS-8256](https://issues.apache.org/jira/browse/MESOS-8256).



src/master/master.hpp
Lines 1848-1850 (patched)
<https://reviews.apache.org/r/68132/#comment290241>

    Do we need this? Does it add any information that the reader can't see from the fields?


- Benjamin Mahler


On Aug. 7, 2018, 12:11 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated Aug. 7, 2018, 12:11 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch, handlers for '/state' requests are not scheduled
> directly after authorization, but are accumulated and then scheduled
> for later parallel processing.
> 
> If there are N '/state' requests in the Master's mailbox and T is the
> request response time, this approach blocks the Master actor only once
> for time O(T), instead of for time N*T as before this patch.
> 
> This batching technique reduces both the time the Master spends
> answering '/state' requests and the average request response time
> in the presence of multiple requests in the Master's mailbox. However,
> for infrequent '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> The change preserves the read-your-writes consistency model.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8 
>   src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/2/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark and `MasterStateQuery_BENCHMARK_Test.GetState`, see below.
> 
> **Setup**
> Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
> Total Number of Cores: 4
> Total Number of Logical Cores: 8
> L2 Cache (per Core): 256 KB  
> 
> Compiler: Apple LLVM version 9.1.0 (clang-902.0.39.2)
> Optimization: -O2
> 
> **MasterStateQuery_BENCHMARK_Test.GetState, v0 '/state' response time**
> 
> setup                                                    | no batching | batching
> ---------------------------------------------------------|-------------|----------
>  1000 agents,  10000 running, and  10000 completed tasks | 146.496ms   | 158.319ms
> 10000 agents, 100000 running, and 100000 completed tasks | 1.795s      | 1.899s
> 20000 agents, 200000 running, and 200000 completed tasks | 3.742s      | 4.427s
> 40000 agents, 400000 running, and 400000 completed tasks | 10.946s     | 11.096s
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 1**
> Test setup 1: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 50 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **50 measurements** per endpoint.
> 
> /flags | no batching | batching       /state | no batching | batching
> -------------------------------   *   --------------------------------
>    min |  1.598ms    | 1.475ms           min | 100.627ms   | 105.383ms
>    p25 |  2.370ms    | 2.452ms           p25 | 102.206ms   | 107.184ms
>    p50 |  2.520ms    | 2.562ms           p50 | 103.213ms   | 108.468ms
>    p75 |  2.623ms    | 2.665ms           p75 | 104.100ms   | 109.808ms
>    p90 |  2.803ms    | 2.731ms           p90 | 106.079ms   | 111.043ms
>    max | 84.957ms    | 2.934ms           max | 153.438ms   | 154.636ms
> 
> **MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 2**
> Test setup 2: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **10 measurements** per endpoint.
> 
> /flags | no batching | batching       /state | no batching | batching
> --------------------------------  *   -------------------------------
>    min | 2.309ms     |   1.579ms         min | 1.512s      | 2.820s
>    p25 | 1.547s      | 373.609ms         p25 | 3.262s      | 3.588s
>    p50 | 3.189s      | 831.261ms         p50 | 5.052s      | 4.253s
>    p75 | 5.346s      |   2.215s          p75 | 6.846s      | 4.510s
>    p90 | 5.854s      |   2.351s          p90 | 7.883s      | 4.705s
>    max | 7.237s      |   2.444s          max | 8.517s      | 4.934s
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/
-----------------------------------------------------------

(Updated Aug. 7, 2018, 12:11 p.m.)


Review request for mesos, Benno Evers and Benjamin Mahler.


Bugs: MESOS-9122
    https://issues.apache.org/jira/browse/MESOS-9122


Repository: mesos


Description
-------

With this patch, handlers for '/state' requests are not scheduled
directly after authorization, but are accumulated and then scheduled
for later parallel processing.

If there are N '/state' requests in the Master's mailbox and T is the
request response time, this approach blocks the Master actor only
once, for time O(T), instead of blocking it for time N*T as was the
case prior to this patch.

This batching technique reduces both the time the Master spends
answering '/state' requests and the average request response time
in the presence of multiple requests in the Master's mailbox. However,
for infrequent '/state' requests that don't accumulate in the Master's
mailbox, the response time might increase due to an added trip
through the mailbox.

The change preserves the read-your-writes consistency model.
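
A minimal sketch of the accumulate-then-process-in-parallel idea described above, using only the C++ standard library instead of libprocess. `State`, `StateRequestBatcher`, `enqueue` and `process` are hypothetical names chosen for illustration; they are not taken from the patch, and the real Master code may be structured quite differently.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Master's state; not a Mesos type.
struct State {
  int agents = 0;
  int tasks = 0;
};

// Hypothetical batcher: read-only handlers are accumulated instead of
// being run one by one, then executed in parallel in a single pass.
class StateRequestBatcher {
public:
  using Handler = std::function<std::string(const State&)>;

  // Called at the point where a handler would otherwise be run
  // directly (e.g. after authorization): only remember it.
  void enqueue(Handler handler) { pending.push_back(std::move(handler)); }

  std::size_t size() const { return pending.size(); }

  // Runs all accumulated handlers in parallel against the same
  // read-only view of the state and waits until all have finished.
  // The caller is blocked once, roughly for the duration of the
  // slowest handler, rather than for the sum of all durations.
  std::vector<std::string> process(const State& state) {
    std::vector<std::future<std::string>> futures;
    futures.reserve(pending.size());
    for (const Handler& handler : pending) {
      futures.push_back(std::async(
          std::launch::async,
          [&handler, &state] { return handler(state); }));
    }

    // Collect responses in the order the requests were enqueued.
    std::vector<std::string> responses;
    responses.reserve(futures.size());
    for (std::future<std::string>& future : futures) {
      responses.push_back(future.get());
    }

    pending.clear();
    return responses;
  }

private:
  std::vector<Handler> pending;
};
```

A caller would `enqueue` one handler per incoming request and later call `process` once with the current state; because every handler in the batch only reads the same state, they can safely run concurrently. On some toolchains `std::async` requires linking with the threading library (e.g. `-pthread`).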


Diffs
-----

  src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8 
  src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11 


Diff: https://reviews.apache.org/r/68132/diff/2/


Testing (updated)
-------

`make check` on Mac OS 10.13.5 and various Linux distros.

Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark and `MasterStateQuery_BENCHMARK_Test.GetState`, see below.

**Setup**
Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
Total Number of Cores: 4
Total Number of Threads: 8
L2 Cache (per Core): 256 KB  

Compiler: Apple LLVM version 9.1.0 (clang-902.0.39.2)
Optimization: -O2

**MasterStateQuery_BENCHMARK_Test.GetState, v0 '/state' response time**

setup                                                    | no batching | batching
---------------------------------------------------------|-------------|----------
 1000 agents,  10000 running, and  10000 completed tasks | 146.496ms   | 158.319ms
10000 agents, 100000 running, and 100000 completed tasks | 1.795s      | 1.899s
20000 agents, 200000 running, and 200000 completed tasks | 3.742s      | 4.427s
40000 agents, 400000 running, and 400000 completed tasks | 10.946s     | 11.096s

**MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 1**
Test setup 1: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 50 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **50 measurements** per endpoint.

/flags | no batching | batching       /state | no batching | batching
-------------------------------   *   --------------------------------
   min |  1.598ms    | 1.475ms           min | 100.627ms   | 105.383ms
   p25 |  2.370ms    | 2.452ms           p25 | 102.206ms   | 107.184ms
   p50 |  2.520ms    | 2.562ms           p50 | 103.213ms   | 108.468ms
   p75 |  2.623ms    | 2.665ms           p75 | 104.100ms   | 109.808ms
   p90 |  2.803ms    | 2.731ms           p90 | 106.079ms   | 111.043ms
   max | 84.957ms    | 2.934ms           max | 153.438ms   | 154.636ms

**MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 2**
Test setup 2: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **10 measurements** per endpoint.

/flags | no batching | batching       /state | no batching | batching
--------------------------------  *   -------------------------------
   min | 2.309ms     |   1.579ms         min | 1.512s      | 2.820s
   p25 | 1.547s      | 373.609ms         p25 | 3.262s      | 3.588s
   p50 | 3.189s      | 831.261ms         p50 | 5.052s      | 4.253s
   p75 | 5.346s      |   2.215s          p75 | 6.846s      | 4.510s
   p90 | 5.854s      |   2.351s          p90 | 7.883s      | 4.705s
   max | 7.237s      |   2.444s          max | 8.517s      | 4.934s


Thanks,

Alexander Rukletsov

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/
-----------------------------------------------------------

(Updated Aug. 7, 2018, 12:10 p.m.)


Review request for mesos, Benno Evers and Benjamin Mahler.


Bugs: MESOS-9122
    https://issues.apache.org/jira/browse/MESOS-9122


Repository: mesos


Description
-------

With this patch, handlers for '/state' requests are not scheduled
directly after authorization, but are accumulated and then scheduled
for later parallel processing.

If there are N '/state' requests in the Master's mailbox and T is the
request response time, this approach blocks the Master actor only
once, for time O(T), instead of blocking it for time N*T as was the
case prior to this patch.

This batching technique reduces both the time the Master spends
answering '/state' requests and the average request response time
in the presence of multiple requests in the Master's mailbox. However,
for infrequent '/state' requests that don't accumulate in the Master's
mailbox, the response time might increase due to an added trip
through the mailbox.

The change preserves the read-your-writes consistency model.


Diffs
-----

  src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8 
  src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11 


Diff: https://reviews.apache.org/r/68132/diff/2/


Testing (updated)
-------

`make check` on Mac OS 10.13.5 and various Linux distros.

Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark and `MasterStateQuery_BENCHMARK_Test.GetState`, see below.

**Setup**
Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
Total Number of Cores: 4
Total Number of Threads: 8
L2 Cache (per Core): 256 KB  

Compiler: Apple LLVM version 9.1.0 (clang-902.0.39.2)
Optimization: -O2

**MasterStateQuery_BENCHMARK_Test.GetState, v0 '/state' response time**

setup                                                    | no batching | batching
---------------------------------------------------------|-------------|----------
 1000 agents,  10000 running, and  10000 completed tasks | 146.496ms   | 158.319ms
10000 agents, 100000 running, and 100000 completed tasks | 1.795s      | 1.899s
20000 agents, 200000 running, and 200000 completed tasks | 3.742s      | 4.427s
40000 agents, 400000 running, and 400000 completed tasks | 10.946s     | 11.096s

**MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 1**
Test setup 1: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 50 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **50 measurements** per endpoint.

/flags | no batching | batching       /state | no batching | batching
-------------------------------   *   --------------------------------
   min |  1.598ms    | 1.475ms           min | 100.627ms   | 105.383ms
   p25 |  2.370ms    | 2.452ms           p25 | 102.206ms   | 107.184ms
   p50 |  2.520ms    | 2.562ms           p50 | 103.213ms   | 108.468ms
   p75 |  2.623ms    | 2.665ms           p75 | 104.100ms   | 109.808ms
   p90 |  2.803ms    | 2.731ms           p90 | 106.079ms   | 111.043ms
   max | 84.957ms    | 2.934ms           max | 153.438ms   | 154.636ms

**MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 2**
Test setup 2: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent in parallel with 200ms interval, i.e., total **10 measurements** per endpoint.

/flags | no batching | batching       /state | no batching | batching
--------------------------------   *  -------------------------------
   min | 2.309ms     |   1.579ms         min | 1.512s      | 2.820s
   p25 | 1.547s      | 373.609ms         p25 | 3.262s      | 3.588s
   p50 | 3.189s      | 831.261ms         p50 | 5.052s      | 4.253s
   p75 | 5.346s      |   2.215s          p75 | 6.846s      | 4.510s
   p90 | 5.854s      |   2.351s          p90 | 7.883s      | 4.705s
   max | 7.237s      |   2.444s          max | 8.517s      | 4.934s


Thanks,

Alexander Rukletsov

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/
-----------------------------------------------------------

(Updated Aug. 7, 2018, 11:31 a.m.)


Review request for mesos, Benno Evers and Benjamin Mahler.


Bugs: MESOS-9122
    https://issues.apache.org/jira/browse/MESOS-9122


Repository: mesos


Description
-------

With this patch, handlers for '/state' requests are not scheduled
directly after authorization, but are accumulated and then scheduled
for later parallel processing.

If there are N '/state' requests in the Master's mailbox and T is the
request response time, this approach blocks the Master actor only
once, for time O(T), instead of blocking it for time N*T as was the
case prior to this patch.

This batching technique reduces both the time the Master spends
answering '/state' requests and the average request response time
in the presence of multiple requests in the Master's mailbox. However,
for infrequent '/state' requests that don't accumulate in the Master's
mailbox, the response time might increase due to an added trip
through the mailbox.

The change preserves the read-your-writes consistency model.


Diffs
-----

  src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8 
  src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11 


Diff: https://reviews.apache.org/r/68132/diff/2/


Testing (updated)
-------

`make check` on Mac OS 10.13.5 and various Linux distros.

Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark and `MasterStateQuery_BENCHMARK_Test.GetState`, see below.

**Setup**
Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
Total Number of Cores: 4
Total Number of Threads: 8
L2 Cache (per Core): 256 KB  

Compiler: Apple LLVM version 9.1.0 (clang-902.0.39.2)
Optimization: -O2

**MasterStateQuery_BENCHMARK_Test.GetState, v0 '/state' response time**

setup                                                    | no batching | batching
---------------------------------------------------------|-------------|----------
 1000 agents,  10000 running, and  10000 completed tasks | 146.496ms   | 158.319ms
10000 agents, 100000 running, and 100000 completed tasks | 1.795s      | 1.899s
20000 agents, 200000 running, and 200000 completed tasks | 3.742s      | 4.427s
40000 agents, 400000 running, and 400000 completed tasks | 10.946s     | 11.096s

**MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 1**
Test setup 1: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 50 '/state' and '/flags' requests will be sent with 200ms interval, i.e., total **50 measurements** per endpoint.

/flags | no batching | batching       /state | no batching | batching
-------------------------------   *   --------------------------------
   min |  1.598ms    | 1.475ms           min | 100.627ms   | 105.383ms
   p25 |  2.370ms    | 2.452ms           p25 | 102.206ms   | 107.184ms
   p50 |  2.520ms    | 2.562ms           p50 | 103.213ms   | 108.468ms
   p75 |  2.623ms    | 2.665ms           p75 | 104.100ms   | 109.808ms
   p90 |  2.803ms    | 2.731ms           p90 | 106.079ms   | 111.043ms
   max | 84.957ms    | 2.934ms           max | 153.438ms   | 154.636ms

**MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 2**
Test setup 2: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval, i.e., total **10 measurements** per endpoint.

/flags | no batching | batching       /state | no batching | batching
--------------------------------   *  -------------------------------
   min | 2.309ms     |   1.579ms         min | 1.512s      | 2.820s
   p25 | 1.547s      | 373.609ms         p25 | 3.262s      | 3.588s
   p50 | 3.189s      | 831.261ms         p50 | 5.052s      | 4.253s
   p75 | 5.346s      |   2.215s          p75 | 6.846s      | 4.510s
   p90 | 5.854s      |   2.351s          p90 | 7.883s      | 4.705s
   max | 7.237s      |   2.444s          max | 8.517s      | 4.934s


Thanks,

Alexander Rukletsov

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/
-----------------------------------------------------------

(Updated Aug. 6, 2018, 10:30 a.m.)


Review request for mesos, Benno Evers and Benjamin Mahler.


Bugs: MESOS-9122
    https://issues.apache.org/jira/browse/MESOS-9122


Repository: mesos


Description
-------

With this patch, handlers for '/state' requests are not scheduled
directly after authorization, but are accumulated and then scheduled
for later parallel processing.

If there are N '/state' requests in the Master's mailbox and T is the
request response time, this approach blocks the Master actor only
once, for time O(T), instead of blocking it for time N*T as was the
case prior to this patch.

This batching technique reduces both the time the Master spends
answering '/state' requests and the average request response time
in the presence of multiple requests in the Master's mailbox. However,
for infrequent '/state' requests that don't accumulate in the Master's
mailbox, the response time might increase due to an added trip
through the mailbox.

The change preserves the read-your-writes consistency model.


Diffs (updated)
-----

  src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8 
  src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11 


Diff: https://reviews.apache.org/r/68132/diff/2/

Changes: https://reviews.apache.org/r/68132/diff/1-2/


Testing
-------

`make check` on Mac OS 10.13.5 and various Linux distros.

Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.

**Setup**
Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
Total Number of Cores: 4
Total Number of Threads: 8
L2 Cache (per Core): 256 KB  

Average '/flags' response time improvement without compiler optimization: 62%–70%.
Average '/flags' response time improvement with `-O3` optimization: 17%–62%.

**[No batching, no optimization](https://dobianchi.files.wordpress.com/2011/11/no-barrique-no-berlusconi.jpg?w=638)**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
'/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
'/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
```

**With batching but no optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
'/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
'/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
```

**No batching but `-O3` optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 2.396221ms, 10 responses are in [1.628583ms, 2.816639ms]
'/state' response on average took 113.469574ms, 10 responses are in [104.218099ms, 134.477062ms]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 3.892615876secs, 10 responses are in [2.480517ms, 7.630934838secs]
'/state' response on average took 5.205245306secs, 10 responses are in [1.578161651secs, 8.789315237secs]
```

**Batching and `-O3` optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.973573ms, 10 responses are in [1.221193ms, 2.694713ms]
'/state' response on average took 113.331551ms, 10 responses are in [102.593397ms, 142.028555ms]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.475842691secs, 10 responses are in [2.437217ms, 3.815589561secs]
'/state' response on average took 4.742303751secs, 10 responses are in [4.047655443secs, 6.00752698secs]
```
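
The "average improvement" ranges quoted above appear to match the relative reduction of the average '/flags' response time between the runs without and with batching. The sketch below redoes that arithmetic; the averages are copied from the logs above (converted to milliseconds), while `Sample`, `improvementPercent` and `flagsAverages` are hypothetical helper names introduced only for this illustration.

```cpp
#include <vector>

// One before/after pair of average response times, in milliseconds.
struct Sample {
  const char* label;
  double beforeMs;  // Average without batching.
  double afterMs;   // Average with batching.
};

// Relative reduction of `afterMs` with respect to `beforeMs`, in percent.
double improvementPercent(double beforeMs, double afterMs) {
  return (1.0 - afterMs / beforeMs) * 100.0;
}

// Average '/flags' response times as reported in the logs.
std::vector<Sample> flagsAverages() {
  return {
      {"no optimization, 100 agents", 1102.349605, 417.211022},
      {"no optimization, 1000 agents", 18436.968137, 5439.950928},
      {"-O3, 100 agents", 2.396221, 1.973573},
      {"-O3, 1000 agents", 3892.615876, 1475.842691},
  };
}
```

Applying `improvementPercent` to each entry of `flagsAverages()` gives roughly 62% and 70% for the unoptimized builds and roughly 18% and 62% for the `-O3` builds, which is in line with the quoted ranges.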


Thanks,

Alexander Rukletsov

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/
-----------------------------------------------------------

(Updated Aug. 3, 2018, 1:46 p.m.)


Review request for mesos, Benno Evers and Benjamin Mahler.


Bugs: MESOS-9122
    https://issues.apache.org/jira/browse/MESOS-9122


Repository: mesos


Description
-------

With this patch, handlers for '/state' requests are not scheduled
directly after authorization, but are accumulated and then scheduled
for later parallel processing.

If there are N '/state' requests in the Master's mailbox and T is the
request response time, this approach blocks the Master actor only
once, for time O(T), instead of blocking it for time N*T as was the
case prior to this patch.

This batching technique reduces both the time the Master spends
answering '/state' requests and the average request response time
in the presence of multiple requests in the Master's mailbox. However,
for infrequent '/state' requests that don't accumulate in the Master's
mailbox, the response time might increase due to an added trip
through the mailbox.

The change preserves the read-your-writes consistency model.


Diffs
-----

  src/master/http.cpp 6947031da3ce3523408d69d6dac60551a39a4601 
  src/master/master.hpp 0353d550308816f219aedb6afe15c643fc8bb340 
  src/master/master.cpp 2af976f7ea7f81d4b06a45ce13286dbd61b9b144 


Diff: https://reviews.apache.org/r/68132/diff/1/


Testing (updated)
-------

`make check` on Mac OS 10.13.5 and various Linux distros.

Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.

**Setup**
Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
Total Number of Cores: 4
Total Number of Threads: 8
L2 Cache (per Core): 256 KB  

Average '/flags' response time improvement without compiler optimization: 62%–70%.
Average '/flags' response time improvement with `-O3` optimization: 17%–62%.

**[No batching, no optimization](https://dobianchi.files.wordpress.com/2011/11/no-barrique-no-berlusconi.jpg?w=638)**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
'/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
'/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
```

**With batching but no optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
'/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
'/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
```

**No batching but `-O3` optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 2.396221ms, 10 responses are in [1.628583ms, 2.816639ms]
'/state' response on average took 113.469574ms, 10 responses are in [104.218099ms, 134.477062ms]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 3.892615876secs, 10 responses are in [2.480517ms, 7.630934838secs]
'/state' response on average took 5.205245306secs, 10 responses are in [1.578161651secs, 8.789315237secs]
```

**Batching and `-O3` optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.973573ms, 10 responses are in [1.221193ms, 2.694713ms]
'/state' response on average took 113.331551ms, 10 responses are in [102.593397ms, 142.028555ms]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.475842691secs, 10 responses are in [2.437217ms, 3.815589561secs]
'/state' response on average took 4.742303751secs, 10 responses are in [4.047655443secs, 6.00752698secs]
```


Thanks,

Alexander Rukletsov

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/
-----------------------------------------------------------

(Updated Aug. 3, 2018, 10:49 a.m.)


Review request for mesos, Benno Evers and Benjamin Mahler.


Bugs: MESOS-9122
    https://issues.apache.org/jira/browse/MESOS-9122


Repository: mesos


Description (updated)
-------

With this patch, handlers for '/state' requests are not scheduled
directly after authorization, but are accumulated and then scheduled
for later parallel processing.

If there are N '/state' requests in the Master's mailbox and T is the
request response time, this approach blocks the Master actor only
once, for time O(T), instead of blocking it for time N*T as was the
case prior to this patch.

This batching technique reduces both the time the Master spends
answering '/state' requests and the average request response time
in the presence of multiple requests in the Master's mailbox. However,
for infrequent '/state' requests that don't accumulate in the Master's
mailbox, the response time might increase due to an added trip
through the mailbox.

The change preserves the read-your-writes consistency model.


Diffs
-----

  src/master/http.cpp 6947031da3ce3523408d69d6dac60551a39a4601 
  src/master/master.hpp 0353d550308816f219aedb6afe15c643fc8bb340 
  src/master/master.cpp 2af976f7ea7f81d4b06a45ce13286dbd61b9b144 


Diff: https://reviews.apache.org/r/68132/diff/1/


Testing
-------

`make check` on Mac OS 10.13.5 and various Linux distros.

Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.

Average '/flags' response time improvement without compiler optimization: 62%–70%.
Average '/flags' response time improvement with `-O3` optimization: 17%–62%.

**[No batching, no optimization](https://dobianchi.files.wordpress.com/2011/11/no-barrique-no-berlusconi.jpg?w=638)**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
'/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
'/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
```

**With batching but no optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
'/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
'/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
```

**No batching but `-O3` optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 2.396221ms, 10 responses are in [1.628583ms, 2.816639ms]
'/state' response on average took 113.469574ms, 10 responses are in [104.218099ms, 134.477062ms]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 3.892615876secs, 10 responses are in [2.480517ms, 7.630934838secs]
'/state' response on average took 5.205245306secs, 10 responses are in [1.578161651secs, 8.789315237secs]
```

**Batching and `-O3` optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.973573ms, 10 responses are in [1.221193ms, 2.694713ms]
'/state' response on average took 113.331551ms, 10 responses are in [102.593397ms, 142.028555ms]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.475842691secs, 10 responses are in [2.437217ms, 3.815589561secs]
'/state' response on average took 4.742303751secs, 10 responses are in [4.047655443secs, 6.00752698secs]
```


Thanks,

Alexander Rukletsov

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206768
-----------------------------------------------------------



Bad patch!

Reviews applied: [68132, 68131]

Failed command: python support/apply-reviews.py -n -r 68131

Error:
The support scripts will be upgraded to Python 3 by July 1st.
Make sure to install Python 3.6 on your machine before.
2018-08-01 23:05:18 URL:https://reviews.apache.org/r/68131/diff/raw/ [6814/6814] -> "68131.patch" [1]
error: patch failed: src/tests/master_benchmarks.cpp:482
error: src/tests/master_benchmarks.cpp: patch does not apply

Full log: https://builds.apache.org/job/Mesos-Reviewbot/22988/console

- Mesos Reviewbot


On July 31, 2018, 5:24 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated July 31, 2018, 5:24 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch, handlers for '/state' requests are no longer scheduled
> directly after authorization; instead they are accumulated and then
> scheduled together for later parallel processing.
> 
> If there are N '/state' requests in the Master's mailbox and T is the
> request response time, this approach blocks the Master actor only once,
> for time O(T), instead of for time N*T as before this patch.
> 
> This batching technique reduces both the time the Master spends
> answering '/state' requests and the average request response time
> when multiple requests are in the Master's mailbox. However, for
> infrequent '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 6947031da3ce3523408d69d6dac60551a39a4601 
>   src/master/master.hpp 0353d550308816f219aedb6afe15c643fc8bb340 
>   src/master/master.cpp 2af976f7ea7f81d4b06a45ce13286dbd61b9b144 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/1/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.
> 
> Average improvement without optimization: 62%–70%.
> Average improvement with optimization: 17%–62%.
> 
> **[No batching, no optimization](https://dobianchi.files.wordpress.com/2011/11/no-barrique-no-berlusconi.jpg?w=638)**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
> '/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
> '/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
> ```
> 
> **With batching but no optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
> '/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
> '/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
> ```
> 
> **No batching but `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 2.396221ms, 10 responses are in [1.628583ms, 2.816639ms]
> '/state' response on average took 113.469574ms, 10 responses are in [104.218099ms, 134.477062ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 3.892615876secs, 10 responses are in [2.480517ms, 7.630934838secs]
> '/state' response on average took 5.205245306secs, 10 responses are in [1.578161651secs, 8.789315237secs]
> ```
> 
> **Batching and `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.973573ms, 10 responses are in [1.221193ms, 2.694713ms]
> '/state' response on average took 113.331551ms, 10 responses are in [102.593397ms, 142.028555ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.475842691secs, 10 responses are in [2.437217ms, 3.815589561secs]
> '/state' response on average took 4.742303751secs, 10 responses are in [4.047655443secs, 6.00752698secs]
> ```
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/
-----------------------------------------------------------

(Updated July 31, 2018, 5:24 p.m.)


Review request for mesos, Benno Evers and Benjamin Mahler.


Bugs: MESOS-9122
    https://issues.apache.org/jira/browse/MESOS-9122


Repository: mesos


Description
-------

With this patch, handlers for '/state' requests are no longer scheduled
directly after authorization; instead they are accumulated and then
scheduled together for later parallel processing.

If there are N '/state' requests in the Master's mailbox and T is the
request response time, this approach blocks the Master actor only once,
for time O(T), instead of for time N*T as before this patch.

This batching technique reduces both the time the Master spends
answering '/state' requests and the average request response time
when multiple requests are in the Master's mailbox. However, for
infrequent '/state' requests that don't accumulate in the Master's
mailbox, the response time might increase due to an added trip
through the mailbox.
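
The accumulate-then-process idea in the description can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code from the patch: the patch itself is C++ on top of libprocess, while the sketch uses Python for brevity, and `ToyMaster`, `state_request`, `process_batch` and `render` are hypothetical names made up for illustration.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class ToyMaster:
    """Hypothetical single-threaded actor with a mailbox; not Mesos code."""

    def __init__(self, state):
        self.state = state        # only read while a batch is being served
        self.mailbox = deque()    # pending actor messages (callables)
        self.batch = []           # accumulated '/state' request ids
        self.responses = {}       # request id -> rendered response
        self.blocking_turns = 0   # times the actor was blocked serving state

    def state_request(self, request_id):
        """Runs after authorization: accumulate instead of serving."""
        self.batch.append(request_id)
        if len(self.batch) == 1:
            # First request of a new batch: schedule a single processing
            # turn. This is the extra trip through the mailbox.
            self.mailbox.append(self.process_batch)

    def process_batch(self):
        """Serve all accumulated requests in parallel, blocking once."""
        batch, self.batch = self.batch, []
        self.blocking_turns += 1
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            futures = [pool.submit(self.render, rid) for rid in batch]
            for rid, future in zip(batch, futures):
                self.responses[rid] = future.result()

    def render(self, request_id):
        # Workers only read the state, so no locking is needed here.
        return {"request": request_id, "tasks": len(self.state["tasks"])}

    def run(self):
        """Drain the mailbox one message at a time, like an actor would."""
        while self.mailbox:
            self.mailbox.popleft()()


master = ToyMaster({"tasks": list(range(10000))})
# Ten authorized '/state' requests are already waiting in the mailbox.
for i in range(10):
    master.mailbox.append(lambda i=i: master.state_request(i))
master.run()
print(master.blocking_turns, len(master.responses))
```

In this toy run all ten requests are served in one blocking turn. A lone request would still go through `state_request` and then `process_batch`, which mirrors the added mailbox trip mentioned above.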


Diffs
-----

  src/master/http.cpp 6947031da3ce3523408d69d6dac60551a39a4601 
  src/master/master.hpp 0353d550308816f219aedb6afe15c643fc8bb340 
  src/master/master.cpp 2af976f7ea7f81d4b06a45ce13286dbd61b9b144 


Diff: https://reviews.apache.org/r/68132/diff/1/


Testing (updated)
-------

`make check` on Mac OS 10.13.5 and various Linux distros.

Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.

Average improvement without optimization: 62%–70%.
Average improvement with optimization: 17%–62%.
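
As a quick sanity check of the two ranges above: they appear to correspond to the average '/flags' response times in the runs below, assuming "improvement" means the relative reduction in average response time. The helper `improvement` is a hypothetical name; the timings are copied from the benchmark output in this message.

```python
def improvement(before, after):
    """Relative reduction in response time, in percent."""
    return (1.0 - after / before) * 100.0


# Average '/flags' response times in seconds: (no batching, batching).
flags = {
    "no optimization, 100 agents": (1.102349605, 0.417211022),
    "no optimization, 1000 agents": (18.436968137, 5.439950928),
    "-O3, 100 agents": (0.002396221, 0.001973573),
    "-O3, 1000 agents": (3.892615876, 1.475842691),
}

for label, (before, after) in flags.items():
    print(f"{label}: {improvement(before, after):.1f}%")
```

The same helper applied to the '/state' averages gives smaller reductions, so the quoted ranges should probably be read as '/flags' figures.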

**[No batching, no optimization](https://dobianchi.files.wordpress.com/2011/11/no-barrique-no-berlusconi.jpg?w=638)**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
'/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
'/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
```

**With batching but no optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
'/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
'/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
```

**No batching but `-O3` optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 2.396221ms, 10 responses are in [1.628583ms, 2.816639ms]
'/state' response on average took 113.469574ms, 10 responses are in [104.218099ms, 134.477062ms]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 3.892615876secs, 10 responses are in [2.480517ms, 7.630934838secs]
'/state' response on average took 5.205245306secs, 10 responses are in [1.578161651secs, 8.789315237secs]
```

**Batching and `-O3` optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.973573ms, 10 responses are in [1.221193ms, 2.694713ms]
'/state' response on average took 113.331551ms, 10 responses are in [102.593397ms, 142.028555ms]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.475842691secs, 10 responses are in [2.437217ms, 3.815589561secs]
'/state' response on average took 4.742303751secs, 10 responses are in [4.047655443secs, 6.00752698secs]
```


Thanks,

Alexander Rukletsov

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/
-----------------------------------------------------------

(Updated July 31, 2018, 5:22 p.m.)


Review request for mesos, Benno Evers and Benjamin Mahler.


Bugs: MESOS-9122
    https://issues.apache.org/jira/browse/MESOS-9122


Repository: mesos


Description
-------

With this patch, handlers for '/state' requests are no longer scheduled
directly after authorization; instead they are accumulated and then
scheduled together for later parallel processing.

If there are N '/state' requests in the Master's mailbox and T is the
request response time, this approach blocks the Master actor only once,
for time O(T), instead of for time N*T as before this patch.

This batching technique reduces both the time the Master spends
answering '/state' requests and the average request response time
when multiple requests are in the Master's mailbox. However, for
infrequent '/state' requests that don't accumulate in the Master's
mailbox, the response time might increase due to an added trip
through the mailbox.


Diffs
-----

  src/master/http.cpp 6947031da3ce3523408d69d6dac60551a39a4601 
  src/master/master.hpp 0353d550308816f219aedb6afe15c643fc8bb340 
  src/master/master.cpp 2af976f7ea7f81d4b06a45ce13286dbd61b9b144 


Diff: https://reviews.apache.org/r/68132/diff/1/


Testing (updated)
-------

`make check` on Mac OS 10.13.5 and various Linux distros.

Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.

Average improvement without optimization: 62%–70%.
Average improvement with optimization: 17%–62%.

**No batching, no optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
'/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
'/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
```

**With batching but no optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
'/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
'/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
```

**No batching but `-O3` optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 2.396221ms, 10 responses are in [1.628583ms, 2.816639ms]
'/state' response on average took 113.469574ms, 10 responses are in [104.218099ms, 134.477062ms]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 3.892615876secs, 10 responses are in [2.480517ms, 7.630934838secs]
'/state' response on average took 5.205245306secs, 10 responses are in [1.578161651secs, 8.789315237secs]
```

**Batching and `-O3` optimization**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.973573ms, 10 responses are in [1.221193ms, 2.694713ms]
'/state' response on average took 113.331551ms, 10 responses are in [102.593397ms, 142.028555ms]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.475842691secs, 10 responses are in [2.437217ms, 3.815589561secs]
'/state' response on average took 4.742303751secs, 10 responses are in [4.047655443secs, 6.00752698secs]
```


Thanks,

Alexander Rukletsov

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206662
-----------------------------------------------------------



PASS: Mesos patch 68132 was successfully built and tested.

Reviews applied: `['68131', '68132']`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2011/mesos-review-68132

- Mesos Reviewbot Windows


On July 31, 2018, 3:05 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated July 31, 2018, 3:05 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch, handlers for '/state' requests are no longer scheduled
> directly after authorization; instead they are accumulated and then
> scheduled together for later parallel processing.
> 
> If there are N '/state' requests in the Master's mailbox and T is the
> request response time, this approach blocks the Master actor only once,
> for time O(T), instead of for time N*T as before this patch.
> 
> This batching technique reduces both the time the Master spends
> answering '/state' requests and the average request response time
> when multiple requests are in the Master's mailbox. However, for
> infrequent '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 6947031da3ce3523408d69d6dac60551a39a4601 
>   src/master/master.hpp 0353d550308816f219aedb6afe15c643fc8bb340 
>   src/master/master.cpp 2af976f7ea7f81d4b06a45ce13286dbd61b9b144 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/1/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.
> 
> **Without this patch**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
> '/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
> '/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
> ```
> 
> **With this patch**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
> '/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
> '/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
> ```
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 68132: Batch '/state' requests on Master.

Posted by Alexander Rukletsov <ru...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/
-----------------------------------------------------------

(Updated July 31, 2018, 3:05 p.m.)


Review request for mesos, Benno Evers and Benjamin Mahler.


Bugs: MESOS-9122
    https://issues.apache.org/jira/browse/MESOS-9122


Repository: mesos


Description (updated)
-------

With this patch, handlers for '/state' requests are no longer scheduled
directly after authorization; instead they are accumulated and then
scheduled together for later parallel processing.

If there are N '/state' requests in the Master's mailbox and T is the
request response time, this approach blocks the Master actor only once,
for time O(T), instead of for time N*T as before this patch.

This batching technique reduces both the time the Master spends
answering '/state' requests and the average request response time
when multiple requests are in the Master's mailbox. However, for
infrequent '/state' requests that don't accumulate in the Master's
mailbox, the response time might increase due to an added trip
through the mailbox.


Diffs
-----

  src/master/http.cpp 6947031da3ce3523408d69d6dac60551a39a4601 
  src/master/master.hpp 0353d550308816f219aedb6afe15c643fc8bb340 
  src/master/master.cpp 2af976f7ea7f81d4b06a45ce13286dbd61b9b144 


Diff: https://reviews.apache.org/r/68132/diff/1/


Testing
-------

`make check` on Mac OS 10.13.5 and various Linux distros.

Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.

**Without this patch**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
'/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms, 2.633971927secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms, 33.210561732secs]
'/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs, 43.008091744secs]
```

**With this patch**
```
Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
'/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]

Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests will be sent with 200ms interval
Launching 10 '/state' requests in background
Launching 10 '/flags' requests
'/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
'/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs, 18.461983916secs]
```


Thanks,

Alexander Rukletsov