Posted to reviews@mesos.apache.org by Jiang Yan Xu <ya...@jxu.me> on 2017/10/19 23:28:41 UTC

Review Request 63174: Added a benchmark for agent reregistration during master failover.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/
-----------------------------------------------------------

Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.


Bugs: MESOS-8098
    https://issues.apache.org/jira/browse/MESOS-8098


Repository: mesos


Description
-------

The current benchmark is very simple: it involves no frameworks and no agent retries. It is possible to add a number of other benchmarks, though, so I am creating a new file for them.


Diffs
-----

  src/Makefile.am 936bc49ddfca03b9278ab11b6d317f3ff635cb00 
  src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
  src/tests/master_benchmarks.cpp PRE-CREATION 


Diff: https://reviews.apache.org/r/63174/diff/1/


Testing
-------

Benchmark based off https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a (close to current HEAD).

```
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks in 45.075488ms
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (48126 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks in 14.172361ms
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (45979 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks in 413.508328ms
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (49487 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (143596 ms total)

...

[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks in 32.787363ms
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (48266 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks in 19.735003ms
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (46169 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks in 321.267267ms
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (51550 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (145987 ms total)
```

Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
```
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks in 85.800335ms
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (59247 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks in 35.342066ms
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (93662 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks in 798.738642ms
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (116078 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (268987 ms total)

...

[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks in 66.270249ms
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (59925 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks in 50.146349ms
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (88631 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks in 807.621964ms
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (109941 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (258497 ms total)
```

The recent patches cut the time by nearly 50%. These were built with `--enable-optimize`.
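
As a rough cross-check of the "nearly 50%" figure, the reduction can be computed from the reregistration times reported in the first run of each log above. This is only a sketch: `percentReduction` and `printReductions` are hypothetical helpers, not part of the patch, and the numbers are copied from the output above.

```cpp
#include <cstdio>

// Hypothetical helper (not part of the patch): percentage by which
// `after` is smaller than `before`.
double percentReduction(double before, double after)
{
  return (before - after) / before * 100.0;
}

// Prints the reduction for the first run of each parameter set, using
// the reregistration times (in ms) copied from the logs above.
void printReductions()
{
  const double before[] = {85.800335, 35.342066, 798.738642};
  const double after[] = {45.075488, 14.172361, 413.508328};

  for (int i = 0; i < 3; ++i) {
    std::printf(
        "AgentReregistrationDelay/%d: %.1f%% less time\n",
        i,
        percentReduction(before[i], after[i]));
  }
}
```

For these three data points the reduction comes out at roughly 47%, 60%, and 48%; the second run of each log gives somewhat different values, so the figure is best read as approximate.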

I can also get some flame graphs.


Thanks,

Jiang Yan Xu


Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189082
-----------------------------------------------------------



PASS: Mesos patch 63174 was successfully built and tested.

Reviews applied: `['63174']`

All the build artifacts are available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/63174

- Mesos Reviewbot Windows


On Oct. 24, 2017, 2:05 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> -----------------------------------------------------------
> 
> (Updated Oct. 24, 2017, 2:05 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
>     https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The current benchmark is very simple: it involves no frameworks and no agent retries. It is possible to add a number of other benchmarks, though, so I am creating a new file for them.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am b60a54a031260de6f1fb43584ae5083df2dc7e31 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/2/
> 
> 
> Testing
> -------
> 
> Benchmark based off https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a (close to current HEAD).
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.188008209secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (22404 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 20.868372615secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (37981 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 15.354579251secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (33766 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (94151 ms total)
> 
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.045441129secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (19959 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 21.324309077secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (38490 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 14.68607521secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (32073 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (90523 ms total)
> 
> ```
> 
> Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 23.217901878secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (38327 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 46.158610597secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (75280 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 38.56781112secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (68006 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (181613 ms total)
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 25.752844224secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (43509 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 45.190859035secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (73966 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 36.322992753secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (66946 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (184421 ms total)
> ```
> 
> The recent patches cut the time by over 50%. These were built with `--enable-optimize --enable-lock-free-run-queue --enable-lock-free-event-queue --enable-last-in-first-out-fixed-size-semaphore`.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189079
-----------------------------------------------------------



Patch looks great!

Reviews applied: [63174]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot




Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On Oct. 24, 2017, 2:55 p.m., Benjamin Mahler wrote:
> > A few suggestions for reducing the benchmark overhead:
> > 
> > (1) Upgrade protobuf to 3.4.x, which comes with move support and rvalue setters for fields. This will avoid some copies in the benchmark code and improve performance elsewhere too :) In the interim, you could manually use `Swap(T*)`, but it means we'd probably want to rewrite the code once move support is available (so that doesn't seem like a good option).
> > 
> > (2) You could try using an arena for the test fixture, although I don't know if it's worth the complexity. Probably just reducing copying is simpler.
> > 
> > (3) We can avoid re-parsing resources for each task and agent.

I'm using `Swap` for now and will clean up after the upgrade to protobuf 3.4.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189093
-----------------------------------------------------------


On Nov. 1, 2017, 3:06 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> -----------------------------------------------------------
> 
> (Updated Nov. 1, 2017, 3:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
>     https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The current benchmark is very simple: it involves no frameworks and no agent retries. It is possible to add a number of other benchmarks, though, so I am creating a new file for them.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 1c97b1fd8151f87c4e9e6d62884b0ef7d582c312 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/3/
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189093
-----------------------------------------------------------



A few suggestions for reducing the benchmark overhead:

(1) Upgrade protobuf to 3.4.x, which comes with move support and rvalue setters for fields. This will avoid some copies in the benchmark code and improve performance elsewhere too :) In the interim, you could manually use `Swap(T*)`, but it means we'd probably want to rewrite the code once move support is available (so that doesn't seem like a good option).

(2) You could try using an arena for the test fixture, although I don't know if it's worth the complexity. Probably just reducing copying is simpler.

(3) We can avoid re-parsing resources for each task and agent.


src/tests/master_benchmarks.cpp
Lines 63 (patched)
<https://reviews.apache.org/r/63174/#comment266020>

    Can you avoid parsing resources for each agent?



src/tests/master_benchmarks.cpp
Lines 84 (patched)
<https://reviews.apache.org/r/63174/#comment266018>

    Can you avoid parsing resources for every task?
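
A minimal sketch of what avoiding the repeated parsing could look like: parse the resource string once and reuse the parsed value for every task. Everything here is an assumption made for illustration. `ParsedResources`, `parseResources`, `Task`, and `createTasks` are hypothetical stand-ins, not the Mesos `Resources` API or the code under review.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for a parsed resources object; the real benchmark would use
// the Mesos `Resources` type instead.
using ParsedResources = std::map<std::string, double>;

// Hypothetical parser for strings like "cpus:1;mem:128". It stands in
// for the (more expensive) parsing done in the benchmark setup.
ParsedResources parseResources(const std::string& text)
{
  ParsedResources result;
  std::istringstream stream(text);
  std::string item;
  while (std::getline(stream, item, ';')) {
    const std::size_t colon = item.find(':');
    if (colon == std::string::npos) {
      continue; // Skip malformed entries in this sketch.
    }
    result[item.substr(0, colon)] = std::stod(item.substr(colon + 1));
  }
  return result;
}

// Hypothetical task record holding a copy of the parsed resources.
struct Task
{
  std::size_t id;
  ParsedResources resources;
};

// Parses the resource string once and reuses the result for every
// task, instead of calling `parseResources` inside the loop.
std::vector<Task> createTasks(std::size_t count, const std::string& text)
{
  const ParsedResources parsed = parseResources(text); // Parsed once.

  std::vector<Task> tasks;
  tasks.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    tasks.push_back(Task{i, parsed}); // Copy of the parsed value.
  }
  return tasks;
}
```

The same hoisting applies to the per-agent case: the parsed value is computed before the loop over agents and copied (or shared) inside it.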



src/tests/master_benchmarks.cpp
Lines 91-92 (patched)
<https://reviews.apache.org/r/63174/#comment266019>

    Code written this way is nice because it will automatically benefit from move support when we upgrade protobuf to 3.4.x. :)
    
    Maybe you can write more of the test in such a manner that it would benefit from an upgrade to 3.4.x? I would be happy to review a 3.4.x upgrade since we need it for other performance improvements. We can see who wants to pick that up, I think Dmitry might be interested.



src/tests/master_benchmarks.cpp
Lines 139-140 (patched)
<https://reviews.apache.org/r/63174/#comment266015>

    Here's an example of where you could move into `message.frameworks` if you upgrade to protobuf 3.4.x:
    
    ```
    message.mutable_frameworks()->Add(createFrameworkInfo(frameworkId));
    ```
    
    Alternatively, pre-3.4.x, you can swap. Taking the address of the temporary (`&createFrameworkInfo(frameworkId)`) won't compile, so use a named local:
    
    ```
    FrameworkInfo f = createFrameworkInfo(frameworkId);
    message.add_frameworks()->Swap(&f);
    ```
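
The swap pattern can be tried out without protobuf using stand-in types. The sketch below is an assumption-laden illustration, not the generated protobuf API: `FrameworkInfo` and `Message` are hand-written structs that only mimic the shape of `Swap(T*)` and `add_frameworks()`, and `createMessage` is a hypothetical helper.

```cpp
#include <string>
#include <vector>

// Stand-in for a generated protobuf message; only the pieces needed
// for this sketch are modeled.
struct FrameworkInfo
{
  std::string id;
  std::string name;

  // Mirrors the shape of protobuf's `Swap(T*)`: exchanges contents
  // without copying the underlying strings.
  void Swap(FrameworkInfo* other)
  {
    id.swap(other->id);
    name.swap(other->name);
  }
};

// Stand-in for a message with a repeated `frameworks` field.
struct Message
{
  std::vector<FrameworkInfo> frameworks;

  // Appends a default-constructed element and returns a pointer to it,
  // like the generated `add_frameworks()`.
  FrameworkInfo* add_frameworks()
  {
    frameworks.emplace_back();
    return &frameworks.back();
  }
};

// Hypothetical factory, named after the one in the review comment.
FrameworkInfo createFrameworkInfo(const std::string& id)
{
  FrameworkInfo info;
  info.id = id;
  info.name = "framework-" + id;
  return info;
}

// Builds a message with `count` frameworks, swapping each one in.
Message createMessage(int count)
{
  Message message;
  for (int i = 0; i < count; ++i) {
    // A named local is required: `&createFrameworkInfo(...)` would take
    // the address of a temporary, which does not compile.
    FrameworkInfo f = createFrameworkInfo(std::to_string(i));
    message.add_frameworks()->Swap(&f);
  }
  return message;
}
```

After the swap, the local `f` holds the default-constructed contents of the new element and is simply discarded at the end of the loop iteration.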



src/tests/master_benchmarks.cpp
Lines 143-147 (patched)
<https://reviews.apache.org/r/63174/#comment266016>

    Ditto copying here.



src/tests/master_benchmarks.cpp
Lines 163-167 (patched)
<https://reviews.apache.org/r/63174/#comment266017>

    Ditto copying here and elsewhere.



src/tests/master_benchmarks.cpp
Lines 241-243 (patched)
<https://reviews.apache.org/r/63174/#comment266013>

    Can you add a comment about why you're using the replicated log here?



src/tests/master_benchmarks.cpp
Lines 261 (patched)
<https://reviews.apache.org/r/63174/#comment266012>

    I'm a little concerned about this pattern, because if the test were to fail an assertion, the process would be destructed without terminating / waiting on it.
    
    Can you use a wrapper around the process that terminates and waits?
    
    Alternatively, if we had a SCOPE_EXIT { ... } abstraction (I had a review but never committed it), we could just do:
    
    ```
    SCOPE_EXIT { process::terminate(pid); wait(pid); };
    ```
    
    E.g. https://github.com/facebook/folly/blob/v2017.10.23.00/folly/ScopeGuard.h#L285-L287
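
For illustration, a destructor-based guard along these lines could look like the sketch below. It is not the uncommitted review mentioned above nor folly's implementation: `ScopeGuard`, `FakeProcess`, and `runTestBody` are hypothetical names, and the process is replaced by a stand-in that only records whether cleanup ran.

```cpp
#include <functional>
#include <utility>

// Minimal scope guard: runs the stored callback when it goes out of
// scope, including when the enclosing function returns early.
class ScopeGuard
{
public:
  explicit ScopeGuard(std::function<void()> callback)
    : callback_(std::move(callback)) {}

  ~ScopeGuard()
  {
    if (callback_) {
      callback_();
    }
  }

  // Non-copyable so the callback runs exactly once.
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
  std::function<void()> callback_;
};

// Hypothetical stand-in for a libprocess process; it only records
// whether cleanup happened.
struct FakeProcess
{
  bool terminated = false;
  bool waited = false;
};

// Simulates a test body that bails out early (as a failed fatal
// assertion would); the guard still performs the cleanup.
bool runTestBody(FakeProcess* process, bool failEarly)
{
  ScopeGuard guard([process]() {
    process->terminated = true; // Would be process::terminate(pid).
    process->waited = true;     // Would be process::wait(pid).
  });

  if (failEarly) {
    return false; // Early exit; the guard's destructor still runs.
  }
  return true;
}
```

Because the cleanup lives in a destructor, it runs on every exit path from the test body, which is the property the comment above is asking for.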


- Benjamin Mahler


On Oct. 24, 2017, 6:05 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> -----------------------------------------------------------
> 
> (Updated Oct. 24, 2017, 6:05 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
>     https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The current benchmark is very simple: it involves no frameworks and no agent retries. It is possible to add a number of other benchmarks, though, so I am creating a new file for them.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am b60a54a031260de6f1fb43584ae5083df2dc7e31 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/2/
> 
> 
> Testing
> -------
> 
> Benchmark based off https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a (close to current HEAD).
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.188008209secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (22404 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 20.868372615secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (37981 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 15.354579251secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (33766 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (94151 ms total)
> 
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.045441129secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (19959 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 21.324309077secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (38490 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 14.68607521secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (32073 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (90523 ms total)
> 
> ```
> 
> Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 23.217901878secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (38327 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 46.158610597secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (75280 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 38.56781112secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (68006 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (181613 ms total)
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 25.752844224secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (43509 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 45.190859035secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (73966 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 36.322992753secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (66946 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (184421 ms total)
> ```
> 
> The recent patches cut the time down by over 50%. These were built with `--enable-optimize --enable-lock-free-run-queue --enable-lock-free-event-queue --enable-last-in-first-out-fixed-size-semaphore`.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189978
-----------------------------------------------------------


Ship it!




Are you able to also upload some flame graphs to MESOS-8098 for posterity? To avoid including unnecessary data, I guess you could temporarily tweak the benchmark test to sleep before and after the timed section so that you can start/stop profiling for just the parts we care about.
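
A rough sketch of what such a temporary tweak might look like, using only the standard library rather than the Mesos helpers. The name `timeWithProfilingWindow` and the `pause` parameter are assumptions for illustration, and `timedSection` stands in for the reregistration step that the benchmark times.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: pauses before and after `timedSection` so that a
// profiler can be started and stopped around it. Only the section itself
// is included in the returned duration.
std::chrono::steady_clock::duration timeWithProfilingWindow(
    const std::function<void()>& timedSection,
    std::chrono::milliseconds pause)
{
  std::this_thread::sleep_for(pause); // Window to start profiling.

  const auto start = std::chrono::steady_clock::now();
  timedSection();
  const auto elapsed = std::chrono::steady_clock::now() - start;

  std::this_thread::sleep_for(pause); // Window to stop profiling.

  return elapsed;
}
```

Because the pauses sit outside the two clock reads, they do not change the reported time; they only make the timed section easy to isolate in the profile.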


src/tests/master_benchmarks.cpp
Lines 65-66 (patched)
<https://reviews.apache.org/r/63174/#comment267233>

    Can you add a comment saying we use a static here to avoid the cost of re-parsing?
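
The code at lines 65-66 is not shown in this thread, so the following is only a sketch of the general pattern under assumed names: `parseResources`, `defaultResources`, and the `"cpus:1;mem:128"` string are hypothetical stand-ins, and `parseCount` exists purely to make the effect observable.

```cpp
#include <map>
#include <sstream>
#include <string>

// Counts how often the hypothetical parser runs, for demonstration only.
int parseCount = 0;

// Hypothetical stand-in for an expensive parse: turns a string such as
// "cpus:1;mem:128" into name/value pairs.
std::map<std::string, double> parseResources(const std::string& text)
{
  ++parseCount;

  std::map<std::string, double> result;
  std::istringstream in(text);
  std::string pair;

  while (std::getline(in, pair, ';')) {
    const std::string::size_type colon = pair.find(':');
    if (colon != std::string::npos) {
      result[pair.substr(0, colon)] = std::stod(pair.substr(colon + 1));
    }
  }

  return result;
}

const std::map<std::string, double>& defaultResources()
{
  // We use a static here to avoid the cost of re-parsing on every call:
  // the initializer runs only the first time control reaches this line.
  static const std::map<std::string, double> resources =
    parseResources("cpus:1;mem:128");

  return resources;
}
```

In a benchmark that builds hundreds of thousands of tasks, calling a helper like `defaultResources()` repeatedly then costs one parse in total instead of one per call.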



src/tests/master_benchmarks.cpp
Lines 88 (patched)
<https://reviews.apache.org/r/63174/#comment267234>

    Ditto here



src/tests/master_benchmarks.cpp
Lines 316 (patched)
<https://reviews.apache.org/r/63174/#comment267232>

    Do you need to print this?


- Benjamin Mahler


On Nov. 1, 2017, 10:06 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> -----------------------------------------------------------
> 
> (Updated Nov. 1, 2017, 10:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
>     https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The current benchmark is very simple, with no framework involvement and no agent retries, but it's possible to add a number of others, so I am creating a new file for them.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 1c97b1fd8151f87c4e9e6d62884b0ef7d582c312 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/3/
> 
> 
> Testing
> -------
> 
> Benchmark based off https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a (close to current HEAD).
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.188008209secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (22404 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 20.868372615secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (37981 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 15.354579251secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (33766 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (94151 ms total)
> 
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.045441129secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (19959 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 21.324309077secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (38490 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 14.68607521secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (32073 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (90523 ms total)
> 
> ```
> 
> Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 23.217901878secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (38327 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 46.158610597secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (75280 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 38.56781112secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (68006 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (181613 ms total)
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 25.752844224secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (43509 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 45.190859035secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (73966 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 36.322992753secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (66946 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (184421 ms total)
> ```
> 
> The recent patches cut the time down by over 50%. These were built with `--enable-optimize --enable-lock-free-run-queue --enable-lock-free-event-queue --enable-last-in-first-out-fixed-size-semaphore`.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/
-----------------------------------------------------------

(Updated Nov. 3, 2017, 11:10 a.m.)


Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.


Changes
-------

Addressed comment. NNFR.


Bugs: MESOS-8098
    https://issues.apache.org/jira/browse/MESOS-8098


Repository: mesos


Description
-------

The current benchmark is very simple, with no framework involvement and no agent retries, but it's possible to add a number of others, so I am creating a new file for them.


Diffs (updated)
-----

  src/Makefile.am 1c97b1fd8151f87c4e9e6d62884b0ef7d582c312 
  src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
  src/tests/master_benchmarks.cpp PRE-CREATION 


Diff: https://reviews.apache.org/r/63174/diff/4/

Changes: https://reviews.apache.org/r/63174/diff/3-4/


Testing
-------

Benchmark based off https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a (close to current HEAD).

```
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.188008209secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (22404 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 20.868372615secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (37981 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 15.354579251secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (33766 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (94151 ms total)


[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.045441129secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (19959 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 21.324309077secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (38490 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 14.68607521secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (32073 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (90523 ms total)

```

Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)

```
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 23.217901878secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (38327 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 46.158610597secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (75280 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 38.56781112secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (68006 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (181613 ms total)

[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 25.752844224secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (43509 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 45.190859035secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (73966 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 36.322992753secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (66946 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (184421 ms total)
```

The recent patches cut the time down by over 50%. These were built with `--enable-optimize --enable-lock-free-run-queue --enable-lock-free-event-queue --enable-last-in-first-out-fixed-size-semaphore`.
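
As a quick check of the "over 50%" figure, the reduction can be computed from the timed sections reported above. This is only a sketch: `Timing`, `reduction`, and `firstRunTimings` are names made up for the example, and the numbers are copied from the first run at each commit.

```cpp
#include <vector>

// One benchmark case: timed section in seconds before and after the patches.
struct Timing
{
  double beforeSecs;
  double afterSecs;
};

// Fraction of time saved, e.g. 0.5 means the time was halved.
double reduction(const Timing& t)
{
  return 1.0 - t.afterSecs / t.beforeSecs;
}

// Timings from the first run at each commit above, for cases /0, /1 and /2.
std::vector<Timing> firstRunTimings()
{
  return {
    {23.217901878, 11.188008209},
    {46.158610597, 20.868372615},
    {38.56781112, 15.354579251},
  };
}
```

Each of the three cases comes out above 0.5 (roughly 0.52, 0.55 and 0.60).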


Thanks,

Jiang Yan Xu


Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189867
-----------------------------------------------------------



FAIL: Mesos tests failed to build.

Reviews applied: `['63174']`

Failed command: `cmake.exe --build . --target mesos-tests --config Debug`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/63174

Relevant logs:

- [mesos-tests-build-cmake-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/63174/logs/mesos-tests-build-cmake-stdout.log):

```
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(59): warning C4996: 'GetVersionExW': was declared deprecated (compiling source file C:\DCOS\mesos\mesos\src\tests\uri_fetcher_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(448): warning C4996: 'GetVersionExW': was declared deprecated (compiling source file C:\DCOS\mesos\mesos\src\tests\uri_fetcher_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\src\master/master.hpp(2070): warning C4244: 'return': conversion from 'unsigned __int64' to 'double', possible loss of data (compiling source file C:\DCOS\mesos\mesos\src\tests\values_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\src\tests\values_tests.cpp(51): warning C4244: 'argument': conversion from 'double' to 'float', possible loss of data [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\src\tests\values_tests.cpp(51): warning C4305: 'argument': truncation from 'double' to 'float' [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(59): warning C4996: 'GetVersionExW': was declared deprecated (compiling source file C:\DCOS\mesos\mesos\src\tests\common\recordio_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(448): warning C4996: 'GetVersionExW': was declared deprecated (compiling source file C:\DCOS\mesos\mesos\src\tests\common\recordio_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(59): warning C4996: 'GetVersionExW': was declared deprecated (compiling source file C:\DCOS\mesos\mesos\src\tests\common\http_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(448): warning C4996: 'GetVersionExW': was declared deprecated (compiling source file C:\DCOS\mesos\mesos\src\tests\common\http_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(59): warning C4996: 'GetVersionExW': was declared deprecated (compiling source file C:\DCOS\mesos\mesos\src\tests\common\type_utils_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(448): warning C4996: 'GetVersionExW': was declared deprecated (compiling source file C:\DCOS\mesos\mesos\src\tests\common\type_utils_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\src\master/master.hpp(2070): warning C4244: 'return': conversion from 'unsigned __int64' to 'double', possible loss of data (compiling source file C:\DCOS\mesos\mesos\src\tests\common\type_utils_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(59): warning C4996: 'GetVersionExW': was declared deprecated (compiling source file C:\DCOS\mesos\mesos\src\tests\containerizer\docker_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(448): warning C4996: 'GetVersionExW': was declared deprecated (compiling source file C:\DCOS\mesos\mesos\src\tests\containerizer\docker_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(59): warning C4996: 'GetVersionExW': was declared deprecated (compiling source file C:\DCOS\mesos\mesos\src\tests\containerizer\containerizer_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(448): warning C4996: 'GetVersionExW': was declared deprecated (compiling source file C:\DCOS\mesos\mesos\src\tests\containerizer\containerizer_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\src\master/master.hpp(2070): warning C4244: 'return': conversion from 'unsigned __int64' to 'double', possible loss of data (compiling source file C:\DCOS\mesos\mesos\src\tests\containerizer\containerizer_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\src\master/master.hpp(2070): warning C4244: 'return': conversion from 'unsigned __int64' to 'double', possible loss of data (compiling source file C:\DCOS\mesos\mesos\src\tests\containerizer\docker_tests.cpp) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]


"C:\DCOS\mesos\src\tests\mesos-tests.vcxproj" (default target) (1) ->
(Link target) -> 
  default_executor_tests.obj : error LNK2001: unresolved external symbol "public: static char const * const mesos::internal::tests::KillPolicyTestHelper::NAME" (?NAME@KillPolicyTestHelper@tests@internal@mesos@@2QBDB) [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\src\mesos-tests.exe : fatal error LNK1120: 1 unresolved externals [C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]

    452 Warning(s)
    2 Error(s)

Time Elapsed 01:10:30.58
```

- [mesos-tests-CMakeOutput.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/63174/logs/mesos-tests-CMakeOutput.log):

```
  Creating directory "C:\DCOS\mesos\CMakeFiles\CMakeTmp\Debug\".

  Creating directory "cmTC_d8194.dir\Debug\cmTC_d8194.tlog\".

InitializeBuildStatus:

  Creating "cmTC_d8194.dir\Debug\cmTC_d8194.tlog\unsuccessfulbuild" because "AlwaysCreate" was specified.

ClCompile:

  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\bin\HostX64\x64\CL.exe /c /Zi /W3 /WX- /diagnostics:classic /Od /Ob0 /D WIN32 /D _WINDOWS /D COMPILER_SUPPORTS_CXX11 /D "CMAKE_INTDIR=\"Debug\"" /D _MBCS /Gm- /EHsc /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /GR /Fo"cmTC_d8194.dir\Debug\" /Fd"cmTC_d8194.dir\Debug\vc141.pdb" /Gd /TP /errorReport:queue C:\DCOS\mesos\CMakeFiles\CMakeTmp\src.cxx

  Microsoft (R) C/C++ Optimizing Compiler Version 19.10.25019 for x64

  Copyright (C) Microsoft Corporation.  All rights reserved.

  

  cl /c /Zi /W3 /WX- /diagnostics:classic /Od /Ob0 /D WIN32 /D _WINDOWS /D COMPILER_SUPPORTS_CXX11 /D "CMAKE_INTDIR=\"Debug\"" /D _MBCS /Gm- /EHsc /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /GR /Fo"cmTC_d8194.dir\Debug\" /Fd"cmTC_d8194.dir\Debug\vc141.pdb" /Gd /TP /errorReport:queue C:\DCOS\mesos\CMakeFiles\CMakeTmp\src.cxx

  src.cxx

  

Link:

  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\bin\HostX64\x64\link.exe /ERRORREPORT:QUEUE /OUT:"C:\DCOS\mesos\CMakeFiles\CMakeTmp\Debug\cmTC_d8194.exe" /INCREMENTAL /NOLOGO kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /manifest:embed /DEBUG /PDB:"C:/DCOS/mesos/CMakeFiles/CMakeTmp/Debug/cmTC_d8194.pdb" /SUBSYSTEM:CONSOLE /TLBID:1 /DYNAMICBASE /NXCOMPAT /IMPLIB:"C:/DCOS/mesos/CMakeFiles/CMakeTmp/Debug/cmTC_d8194.lib" /MACHINE:X64  /machine:x64 cmTC_d8194.dir\Debug\src.obj

  cmTC_d8194.vcxproj -> C:\DCOS\mesos\CMakeFiles\CMakeTmp\Debug\cmTC_d8194.exe

  cmTC_d8194.vcxproj -> C:/DCOS/mesos/CMakeFiles/CMakeTmp/Debug/cmTC_d8194.pdb (Full PDB)

FinalizeBuildStatus:

  Deleting file "cmTC_d8194.dir\Debug\cmTC_d8194.tlog\unsuccessfulbuild".

  Touching "cmTC_d8194.dir\Debug\cmTC_d8194.tlog\cmTC_d8194.lastbuildstate".

Done Building Project "C:\DCOS\mesos\CMakeFiles\CMakeTmp\cmTC_d8194.vcxproj" (default targets).



Build succeeded.

    0 Warning(s)

    0 Error(s)



Time Elapsed 00:00:02.52


Source file was:
int main() { return 0; }
```

- [mesos-tests-CMakeError.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/63174/logs/mesos-tests-CMakeError.log):

```
PrepareForBuild:

  Creating directory "cmTC_21b2c.dir\Debug\".

  Creating directory "C:\DCOS\mesos\CMakeFiles\CMakeTmp\Debug\".

  Creating directory "cmTC_21b2c.dir\Debug\cmTC_21b2c.tlog\".

InitializeBuildStatus:

  Creating "cmTC_21b2c.dir\Debug\cmTC_21b2c.tlog\unsuccessfulbuild" because "AlwaysCreate" was specified.

ClCompile:

  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\bin\HostX64\x64\CL.exe /c /Zi /W3 /WX- /diagnostics:classic /MP /Od /Ob0 /D WIN32 /D _WINDOWS /D UNICODE /D _UNICODE /D "CMAKE_INTDIR=\"Debug\"" /D _UNICODE /D UNICODE /Gm- /RTC1 /MTd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /Fo"cmTC_21b2c.dir\Debug\" /Fd"cmTC_21b2c.dir\Debug\vc141.pdb" /Gd /TC /errorReport:queue C:\DCOS\mesos\CMakeFiles\CMakeTmp\CheckIncludeFile.c

  Microsoft (R) C/C++ Optimizing Compiler Version 19.10.25019 for x64

  Copyright (C) Microsoft Corporation.  All rights reserved.

  

  CheckIncludeFile.c

  cl /c /Zi /W3 /WX- /diagnostics:classic /MP /Od /Ob0 /D WIN32 /D _WINDOWS /D UNICODE /D _UNICODE /D "CMAKE_INTDIR=\"Debug\"" /D _UNICODE /D UNICODE /Gm- /RTC1 /MTd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /Fo"cmTC_21b2c.dir\Debug\" /Fd"cmTC_21b2c.dir\Debug\vc141.pdb" /Gd /TC /errorReport:queue C:\DCOS\mesos\CMakeFiles\CMakeTmp\CheckIncludeFile.c

  

C:\DCOS\mesos\CMakeFiles\CMakeTmp\CheckIncludeFile.c(1): fatal error C1083: Cannot open include file: 'pthread.h': No such file or directory [C:\DCOS\mesos\CMakeFiles\CMakeTmp\cmTC_21b2c.vcxproj]

Done Building Project "C:\DCOS\mesos\CMakeFiles\CMakeTmp\cmTC_21b2c.vcxproj" (default targets) -- FAILED.



Build FAILED.



"C:\DCOS\mesos\CMakeFiles\CMakeTmp\cmTC_21b2c.vcxproj" (default target) (1) ->

(ClCompile target) -> 

  C:\DCOS\mesos\CMakeFiles\CMakeTmp\CheckIncludeFile.c(1): fatal error C1083: Cannot open include file: 'pthread.h': No such file or directory [C:\DCOS\mesos\CMakeFiles\CMakeTmp\cmTC_21b2c.vcxproj]



    0 Warning(s)

    1 Error(s)



Time Elapsed 00:00:01.66



```

- Mesos Reviewbot Windows


On Nov. 1, 2017, 10:06 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> -----------------------------------------------------------
> 
> (Updated Nov. 1, 2017, 10:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
>     https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The current benchmark is very simple, with no framework involvement and no agent retries, but it's possible to add a number of others, so I am creating a new file for them.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 1c97b1fd8151f87c4e9e6d62884b0ef7d582c312 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/3/
> 
> 
> Testing
> -------
> 
> Benchmark based off https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a (close to current HEAD).
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.188008209secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (22404 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 20.868372615secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (37981 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 15.354579251secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (33766 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (94151 ms total)
> 
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.045441129secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (19959 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 21.324309077secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (38490 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 14.68607521secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (32073 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (90523 ms total)
> 
> ```
> 
> Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 23.217901878secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (38327 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 46.158610597secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (75280 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 38.56781112secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (68006 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (181613 ms total)
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 25.752844224secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (43509 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 45.190859035secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (73966 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 36.322992753secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (66946 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (184421 ms total)
> ```
> 
> The recent patches cut the time by over 50%. These were built with `--enable-optimize --enable-lock-free-run-queue --enable-lock-free-event-queue --enable-last-in-first-out-fixed-size-semaphore`.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189892
-----------------------------------------------------------



Patch looks great!

Reviews applied: [63174]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Nov. 1, 2017, 10:06 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> -----------------------------------------------------------
> 
> (Updated Nov. 1, 2017, 10:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
>     https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The current benchmark is very simple: it has no framework involvement and no agent retries. It's possible to add a number of others, so I am creating a new file for them.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 1c97b1fd8151f87c4e9e6d62884b0ef7d582c312 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/3/
> 
> 
> Testing
> -------
> 
> Benchmark based off https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a (close to current HEAD).
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.188008209secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (22404 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 20.868372615secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (37981 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 15.354579251secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (33766 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (94151 ms total)
> 
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.045441129secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (19959 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 21.324309077secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (38490 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 14.68607521secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (32073 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (90523 ms total)
> 
> ```
> 
> Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 23.217901878secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (38327 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 46.158610597secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (75280 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 38.56781112secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (68006 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (181613 ms total)
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 25.752844224secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (43509 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 45.190859035secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (73966 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 36.322992753secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (66946 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (184421 ms total)
> ```
> 
> The recent patches cut the time by over 50%. These were built with `--enable-optimize --enable-lock-free-run-queue --enable-lock-free-event-queue --enable-last-in-first-out-fixed-size-semaphore`.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/
-----------------------------------------------------------

(Updated Nov. 1, 2017, 3:06 p.m.)


Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.


Changes
-------

Addressed review comments. This reduces the benchmark overhead by about 10 secs (roughly 10%).

```
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 10.387637507secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (17506 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 21.918619408secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (41810 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 14.680627873secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (24801 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (84117 ms total)

...

[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 10.434383702secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (17788 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 21.597951218secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (36953 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 14.982351549secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (25360 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (80101 ms total)
```


Bugs: MESOS-8098
    https://issues.apache.org/jira/browse/MESOS-8098


Repository: mesos


Description
-------

The current benchmark is very simple: it has no framework involvement and no agent retries. It's possible to add a number of others, so I am creating a new file for them.


Diffs (updated)
-----

  src/Makefile.am 1c97b1fd8151f87c4e9e6d62884b0ef7d582c312 
  src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
  src/tests/master_benchmarks.cpp PRE-CREATION 


Diff: https://reviews.apache.org/r/63174/diff/3/

Changes: https://reviews.apache.org/r/63174/diff/2-3/


Testing
-------

Benchmark based off https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a (close to current HEAD).

```
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.188008209secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (22404 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 20.868372615secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (37981 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 15.354579251secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (33766 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (94151 ms total)


[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.045441129secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (19959 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 21.324309077secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (38490 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 14.68607521secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (32073 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (90523 ms total)

```

Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)

```
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 23.217901878secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (38327 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 46.158610597secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (75280 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 38.56781112secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (68006 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (181613 ms total)

[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 25.752844224secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (43509 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 45.190859035secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (73966 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 36.322992753secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (66946 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (184421 ms total)
```

The recent patches cut the time by over 50%. These were built with `--enable-optimize --enable-lock-free-run-queue --enable-lock-free-event-queue --enable-last-in-first-out-fixed-size-semaphore`.
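
To make the "over 50%" figure easy to re-check, here is a small Python sketch that recomputes it from the reregistration times in the Testing logs above. The times are copied from those logs. The script itself, the `mean` and `reductions` helpers, and the choice to average the two runs per commit are illustrative assumptions and are not part of the patch.

```python
# Reregistration times in seconds, copied from the two runs per commit
# in the Testing logs. Keys are the gtest parameter indices (/0, /1, /2).
after = {   # commit 41193181 (close to HEAD at the time)
    0: [11.188008209, 11.045441129],
    1: [20.868372615, 21.324309077],
    2: [15.354579251, 14.68607521],
}
before = {  # commit d9c90bf1 (before MESOS-7713 was merged)
    0: [23.217901878, 25.752844224],
    1: [46.158610597, 45.190859035],
    2: [38.56781112, 36.322992753],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def reductions(before, after):
    """Fractional reduction in mean reregistration time, per test case."""
    return {
        case: 1.0 - mean(after[case]) / mean(before[case])
        for case in before
    }


if __name__ == "__main__":
    for case, reduction in sorted(reductions(before, after).items()):
        print(f"AgentReregistrationDelay/{case}: {reduction:.1%} less time")
```

Averaged this way, each of the three cases comes out above 50% (roughly 54% to 60%). With only two runs per commit this is a rough check, not a statistical claim.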


Thanks,

Jiang Yan Xu


Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/
-----------------------------------------------------------

(Updated Oct. 24, 2017, 11:05 a.m.)


Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.


Changes
-------

Refactored to put the message preparation work inside each TestSlave actor so that it can be parallelized. Also fixed a bug where the test didn't actually wait for all the `SlaveReregisteredMessage`s.
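
For readers unfamiliar with the pattern, the two changes can be pictured with a rough Python analogy. This is not the code in `master_benchmarks.cpp`: Python threads stand in for the TestSlave actors, and `AckCounter` and `run_agents` are hypothetical names made up for this sketch. Each worker prepares its own message, and the caller blocks until every acknowledgement has been counted instead of returning early.

```python
import threading


class AckCounter:
    """Counts acknowledgements and signals once all expected ones arrived."""

    def __init__(self, expected):
        self._expected = expected
        self._count = 0
        self._lock = threading.Lock()
        self._done = threading.Event()
        if expected == 0:
            self._done.set()  # nothing to wait for

    def ack(self):
        with self._lock:
            self._count += 1
            if self._count == self._expected:
                self._done.set()

    def wait(self, timeout=None):
        """Returns True only if every expected acknowledgement was seen."""
        return self._done.wait(timeout)


def run_agents(num_agents, tasks_per_agent):
    counter = AckCounter(num_agents)
    messages = [None] * num_agents

    def agent(index):
        # Each agent builds its own message, so preparation is not
        # serialized in the test body.
        messages[index] = [
            f"agent{index}-task{t}" for t in range(tasks_per_agent)
        ]
        # Stand-in for receiving the master's reply to this agent.
        counter.ack()

    threads = [
        threading.Thread(target=agent, args=(i,)) for i in range(num_agents)
    ]
    for thread in threads:
        thread.start()
    # Wait for *all* acknowledgements before stopping the clock.
    all_acked = counter.wait(timeout=30)
    for thread in threads:
        thread.join()
    return all_acked, messages
```

In CPython the threads do not actually run the preparation step in parallel, so the sketch only shows the structure: per-agent preparation plus a wait on the full count.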


Bugs: MESOS-8098
    https://issues.apache.org/jira/browse/MESOS-8098


Repository: mesos


Description
-------

The current benchmark is very simple: it has no framework involvement and no agent retries. It's possible to add a number of others, so I am creating a new file for them.


Diffs (updated)
-----

  src/Makefile.am b60a54a031260de6f1fb43584ae5083df2dc7e31 
  src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
  src/tests/master_benchmarks.cpp PRE-CREATION 


Diff: https://reviews.apache.org/r/63174/diff/2/

Changes: https://reviews.apache.org/r/63174/diff/1-2/


Testing (updated)
-------

Benchmark based off https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a (close to current HEAD).

```
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.188008209secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (22404 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 20.868372615secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (37981 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 15.354579251secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (33766 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (94151 ms total)


[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.045441129secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (19959 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 21.324309077secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (38490 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 14.68607521secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (32073 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (90523 ms total)

```

Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)

```
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 23.217901878secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (38327 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 46.158610597secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (75280 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 38.56781112secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (68006 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (181613 ms total)

[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 25.752844224secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (43509 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 45.190859035secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (73966 ms)
[ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 36.322992753secs
[       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (66946 ms)
[----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (184421 ms total)
```

The recent patches cut the time by over 50%. These were built with `--enable-optimize --enable-lock-free-run-queue --enable-lock-free-event-queue --enable-last-in-first-out-fixed-size-semaphore`.


Thanks,

Jiang Yan Xu


Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188819
-----------------------------------------------------------



PASS: Mesos patch 63174 was successfully built and tested.

Reviews applied: `['63174']`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/63174

- Mesos Reviewbot Windows


On Oct. 19, 2017, 11:28 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> -----------------------------------------------------------
> 
> (Updated Oct. 19, 2017, 11:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
>     https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The current benchmark is very simple: it has no framework involvement and no agent retries. It's possible to add a number of others, so I am creating a new file for them.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 936bc49ddfca03b9278ab11b6d317f3ff635cb00 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/1/
> 
> 
> Testing
> -------
> 
> Benchmark based off https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a (close to current HEAD).
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks in 45.075488ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (48126 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks in 14.172361ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (45979 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks in 413.508328ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (49487 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (143596 ms total)
> 
> ...
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks in 32.787363ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (48266 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks in 19.735003ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (46169 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks in 321.267267ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (51550 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (145987 ms total)
> ```
> 
> Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks in 85.800335ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (59247 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks in 35.342066ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (93662 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks in 798.738642ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (116078 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (268987 ms total)
> 
> ...
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks in 66.270249ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (59925 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks in 50.146349ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (88631 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks in 807.621964ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (109941 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (258497 ms total)
> ```
> 
> The recent patches cut the time by nearly 50%. These were built with `--enable-optimize`.
> 
> I can also get some flame graphs.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188802
-----------------------------------------------------------



Patch looks great!

Reviews applied: [63174]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot




Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On Oct. 19, 2017, 6:38 p.m., Benjamin Mahler wrote:
> > Thanks Yan! I will dig in soon.
> > 
> > Just some quick questions:
> > 
> > (1) I thought during the meeting you said it was taking a minute, but looking at all the benchmark timings they're all under a second? Is it only the benchmark setup that's expensive here?
> > (2) Is this with the lock free event & run queues? If not, how much do they help?
> > (3) As an aside, it has come up before, but it would be useful to be able to force the messages to go through the remote stack rather than the local stack. No need to think about this yet, but just something to keep in mind as not being accurate in this benchmark.

1) Yeah, looks like it. I used to include the setup time in the measurement, so the number was large.
2) Yes, I built with `--enable-optimize --enable-lock-free-run-queue --enable-lock-free-event-queue --enable-last-in-first-out-fixed-size-semaphore`. I could compare against the performance without them.
3) Right, I think we should keep that in mind, and we should have tests that cover the remote stack. For the case here I thought the local stack would be a simple and good-enough start, since it already covers the proto (de)serialization and the rest of the libprocess optimizations that we recently made.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188799
-----------------------------------------------------------




Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On Oct. 19, 2017, 6:38 p.m., Benjamin Mahler wrote:
> > Thanks Yan! I will dig in soon.
> > 
> > Just some quick questions:
> > 
> > (1) I thought during the meeting you said it was taking a minute, but looking at all the benchmark timings they're all under a second? Is it only the benchmark setup that's expensive here?
> > (2) Is this with the lock free event & run queues? If not, how much do they help?
> > (3) As an aside, it has come up before, but it would be useful to be able to force the messages to go through the remote stack rather than the local stack. No need to think about this yet, but just something to keep in mind as not being accurate in this benchmark.
> 
> Jiang Yan Xu wrote:
>     1) Yeah, looks like it. I used to include the setup time in the measurement, so the number was large.
>     2) Yes, I built with `--enable-optimize --enable-lock-free-run-queue --enable-lock-free-event-queue --enable-last-in-first-out-fixed-size-semaphore`. I could compare against the performance without them.
>     3) Right, I think we should keep that in mind, and we should have tests that cover the remote stack. For the case here I thought the local stack would be a simple and good-enough start, since it already covers the proto (de)serialization and the rest of the libprocess optimizations that we recently made.

Haha... actually the sub-second numbers in revision 1 were totally meaningless. I called `process::await(reregistered)`, which only returns a future, instead of `process::await(reregistered).await();`, so the benchmark never actually waited for the results...
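
To illustrate the pitfall, here is a minimal sketch that uses standard C++ futures as a stand-in for libprocess futures. This is an assumption made to keep the example self-contained; `startWork` and both timing helpers are hypothetical names, not code from the patch. Stopping the clock right after obtaining the future measures only how long it takes to start the work, while waiting on the future first measures the work itself.

```cpp
#include <chrono>
#include <future>
#include <thread>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::duration<double, std::milli>;

// Hypothetical stand-in for the asynchronous reregistration work:
// it just sleeps for `duration` on another thread.
std::future<void> startWork(std::chrono::milliseconds duration) {
  return std::async(std::launch::async, [duration] {
    std::this_thread::sleep_for(duration);
  });
}

// Mirrors the revision 1 mistake: the clock is stopped as soon as the
// future has been obtained, before the work has finished.
double elapsedWithoutWaitMs(std::chrono::milliseconds duration) {
  const auto start = Clock::now();
  std::future<void> done = startWork(duration);
  const auto stop = Clock::now();
  done.wait();  // Clean up only; this wait is not part of the measurement.
  return Millis(stop - start).count();
}

// Intended measurement: block on the future, then stop the clock.
double elapsedWithWaitMs(std::chrono::milliseconds duration) {
  const auto start = Clock::now();
  std::future<void> done = startWork(duration);
  done.wait();
  const auto stop = Clock::now();
  return Millis(stop - start).count();
}
```

Calling both helpers with the same duration, for example `std::chrono::milliseconds(200)`, should show the first returning only the cost of launching the work and the second returning at least the requested duration.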

I did some optimizations in rev 2, e.g., parallelizing the message preparation and allocating from the stack instead of the heap, but I had to reduce the number of tasks to keep the benchmark from running too long.
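
As a rough sketch of what parallelizing the message preparation could look like, the example below splits the agents into contiguous chunks and lets each worker thread fill its own slice of a preallocated vector. Everything here is an assumption for illustration: `Message`, `prepareMessage`, and `prepareAll` are hypothetical names, plain `std::thread` is used rather than anything from libprocess, and the stack-versus-heap change is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical placeholder for a per-agent reregistration message.
struct Message {
  std::size_t agentId = 0;
  std::string payload;
};

// Hypothetical per-agent preparation step (the expensive part in practice).
Message prepareMessage(std::size_t agentId) {
  return Message{agentId, "agent-" + std::to_string(agentId)};
}

// Prepares one message per agent using up to `workers` threads. Each thread
// writes to a disjoint index range, so no locking is needed.
std::vector<Message> prepareAll(std::size_t agents, std::size_t workers) {
  std::vector<Message> messages(agents);
  workers = std::max<std::size_t>(1, workers);
  const std::size_t chunk = (agents + workers - 1) / workers;

  std::vector<std::thread> threads;
  for (std::size_t w = 0; w < workers; ++w) {
    const std::size_t begin = w * chunk;
    const std::size_t end = std::min(agents, begin + chunk);
    if (begin >= end) {
      break;  // No agents left for this worker.
    }
    threads.emplace_back([&messages, begin, end] {
      for (std::size_t i = begin; i < end; ++i) {
        messages[i] = prepareMessage(i);
      }
    });
  }

  for (std::thread& t : threads) {
    t.join();
  }
  return messages;
}
```

Preallocating the output and giving each thread its own range keeps the preparation free of shared mutable state, so the setup cost scales with the number of workers rather than being serialized.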

PTAL.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188799
-----------------------------------------------------------


On Oct. 24, 2017, 11:05 a.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> -----------------------------------------------------------
> 
> (Updated Oct. 24, 2017, 11:05 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
>     https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The current benchmark is very simple: without framework involvement and without agent retries but it's possible to add a number of others so I am creating a new file for them.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am b60a54a031260de6f1fb43584ae5083df2dc7e31 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/2/
> 
> 
> Testing
> -------
> 
> Benchmark based off https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a (close to current HEAD).
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.188008209secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (22404 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 20.868372615secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (37981 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 15.354579251secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (33766 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (94151 ms total)
> 
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 11.045441129secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (19959 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 21.324309077secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (38490 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 14.68607521secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (32073 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (90523 ms total)
> 
> ```
> 
> Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 23.217901878secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (38327 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 46.158610597secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (75280 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 38.56781112secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (68006 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (181613 ms total)
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000 completed tasks in 25.752844224secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (43509 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed tasks in 45.190859035secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (73966 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0 completed tasks in 36.322992753secs
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (66946 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (184421 ms total)
> ```
> 
> The recent patches cut down the time by over 50%. These were built with `--enable-optimize --enable-lock-free-run-queue --enable-lock-free-event-queue --enable-last-in-first-out-fixed-size-semaphore`.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>
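
The "over 50%" figure in the revision 2 results can be checked directly from the logged timings. The sketch below does this for the first run of each parameterized test, comparing the build before MESOS-7713 with the build close to HEAD; `reductionPercent`, `Sample`, and `printReductions` are hypothetical helper names, and only the first of the two runs per build is used.

```cpp
#include <cstdio>

// Percentage by which `afterSecs` is smaller than `beforeSecs`.
double reductionPercent(double beforeSecs, double afterSecs) {
  return (1.0 - afterSecs / beforeSecs) * 100.0;
}

struct Sample {
  const char* name;
  double beforeSecs;  // Build before MESOS-7713 was merged.
  double afterSecs;   // Build close to HEAD.
};

// First run of each test, copied from the revision 2 logs.
const Sample kSamples[] = {
  {"AgentReregistrationDelay/0", 23.217901878, 11.188008209},
  {"AgentReregistrationDelay/1", 46.158610597, 20.868372615},
  {"AgentReregistrationDelay/2", 38.56781112, 15.354579251},
};

// Prints the reduction in reregistration time for each test.
void printReductions() {
  for (const Sample& s : kSamples) {
    std::printf("%s: %.1f%% less time\n",
                s.name,
                reductionPercent(s.beforeSecs, s.afterSecs));
  }
}
```

With these inputs the reductions come out at roughly 52%, 55%, and 60%, which is consistent with the "over 50%" summary.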


Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188799
-----------------------------------------------------------



Thanks Yan! I will dig in soon.

Just some quick questions:

(1) I thought during the meeting you said it was taking a minute, but looking at all the benchmark timings they're all under a second? Is it only the benchmark setup that's expensive here?
(2) Is this with the lock free event & run queues? If not, how much do they help?
(3) As an aside, it has come up before, but it would be useful to be able to force the messages to go through the remote stack rather than the local stack. No need to think about this yet, but just something to keep in mind as not being accurate in this benchmark.

- Benjamin Mahler

