Posted to reviews@mesos.apache.org by Benjamin Bannier <be...@mesosphere.io> on 2018/03/27 18:32:52 UTC

Review Request 66313: Fixed an oversubscription test for agent registration backoff.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66313/
-----------------------------------------------------------

Review request for mesos, Alexander Rukletsov and Till Toenshoff.


Bugs: MESOS-8733
    https://issues.apache.org/jira/browse/MESOS-8733


Repository: mesos


Description
-------

In the `OversubscriptionTest.ForwardUpdateSlaveMessage` test we
observe a single `UpdateSlaveMessage` to make sure the agent has fully
recovered. This message is sent from the resource provider-capable
agent to communicate its (empty) set of resource providers after
registration.

Since the message was sent with a running clock, it is possible for the
agent to hit a timeout of its registration backoff timer. The agent
would then register again, triggering another similar message which
the test does not expect.

This patch adjusts the test to always run with a paused clock,
eliminating this particular scenario.
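
To illustrate the race described above, here is a minimal, self-contained sketch. It is not the patch itself and does not use libprocess: `FakeClock`, `countUpdateMessages`, and the 1000 ms backoff are hypothetical names and values chosen only for this illustration. The model assumes that the agent sends one update message per registration and that an expired backoff timer triggers one more registration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Toy stand-in for a test clock (hypothetical; NOT libprocess's clock).
class FakeClock {
public:
  void pause() { paused_ = true; }
  void resume() { paused_ = false; }

  // Run `callback` once `delayMs` of clock time has passed.
  void schedule(long delayMs, std::function<void()> callback) {
    assert(delayMs >= 0);
    timers_.emplace(nowMs_ + delayMs, std::move(callback));
  }

  // Models real time passing while the test runs.
  // A paused clock ignores it, so no timer can expire by accident.
  void elapse(long ms) {
    if (!paused_) {
      advance(ms);
    }
  }

  // Explicitly moves time forward and fires every timer that is due.
  void advance(long ms) {
    nowMs_ += ms;
    while (!timers_.empty() && timers_.begin()->first <= nowMs_) {
      std::function<void()> callback = std::move(timers_.begin()->second);
      timers_.erase(timers_.begin());
      callback();
    }
  }

private:
  bool paused_ = false;
  long nowMs_ = 0;
  std::multimap<long, std::function<void()>> timers_;
};

// Counts the update messages a (hypothetical) agent sends during a test
// that takes longer in real time than the agent's registration backoff.
int countUpdateMessages(bool pauseClock) {
  FakeClock clock;
  if (pauseClock) {
    clock.pause();
  }

  int updates = 0;
  const long backoffMs = 1000; // Assumed value, for illustration only.

  // Initial registration: one update message, and the backoff timer is armed.
  ++updates;
  clock.schedule(backoffMs, [&updates]() { ++updates; });

  // The rest of the test takes a while to run.
  clock.elapse(2 * backoffMs);

  return updates;
}
```

In this model `countUpdateMessages(true)` yields a single message, while `countUpdateMessages(false)` yields two, which corresponds to the unexpected second message the test tripped over.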


Diffs
-----

  src/tests/oversubscription_tests.cpp 47c51e3d035eb5143d00efb466675eb02236b52e 


Diff: https://reviews.apache.org/r/66313/diff/1/


Testing
-------

`make check`

Before this patch, this test would fail after a handful of iterations; with this patch, I was able to execute this test >110,000 times without issues.


Thanks,

Benjamin Bannier


Re: Review Request 66313: Fixed an oversubscription test for agent registration backoff.

Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66313/#review200075
-----------------------------------------------------------


Ship it!

- Alexander Rukletsov




Re: Review Request 66313: Fixed an oversubscription test for agent registration backoff.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66313/#review200064
-----------------------------------------------------------



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['66313']`

Failed command: `Start-MesosCITesting`

All the build artifacts are available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66313

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66313/logs/mesos-tests-stdout.log):

```
[       OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (112 ms)
[----------] 9 tests from Endpoint/SlaveEndpointTest (1020 ms total)

[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (33 ms)
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (37 ms)
[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (72 ms total)

[----------] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN      ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[       OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (761 ms)
[----------] 1 test from IsolationFlag/CpuIsolatorTest (781 ms total)

[----------] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN      ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[       OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (743 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (767 ms total)

[----------] Global test environment tear-down
[==========] 949 tests from 94 test cases ran. (439765 ms total)
[  PASSED  ] 948 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 214 DISABLED TESTS

```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66313/logs/mesos-tests-stderr.log):

```
I0327 19:32:51.571627 10064 master.cpp:10446] Updating the state of task c3e41ee2-e5d8-49e7-94ad-6109a50d8e0f of framework 8785e0d5-1b32-4cb7-85df-13d08ef19fcb-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0327 19:32:51.383610  7052 exec.cpp:162] Version: 1.6.0
I0327 19:32:51.413622 10132 exec.cpp:236] Executor registered on agent 8785e0d5-1b32-4cb7-85df-13d08ef19fcb-S0
I0327 19:32:51.417731 10788 executor.cpp:176] Received SUBSCRIBED event
I0327 19:32:51.422617 10788 executor.cpp:180] Subscribed executor on winbldsrv-02.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0327 19:32:51.422617 10788 executor.cpp:176] Received LAUNCH event
I0327 19:32:51.428632 10788 executor.cpp:648] Starting task c3e41ee2-e5d8-49e7-94ad-6109a50d8e0f
I0327 19:32:51.511649 10788 executor.cpp:483] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch <POSSIBLY-SENSITIVE-DATA>'
I0327 19:32:51.542621 10788 executor.cpp:661] Forked command at 2792
I0327 19:32:51.573623 10792 exec.cpp:445] Executor asked to shutdown
I0327 19:32:51.574631  5012 executor.cpp:176] Received SHUTDOWN event
I0327 19:32:51.574631  5012 executor.cpp:758] Shutting down
I0327 19:32:51.574631  5012 executor.cpp:868] Sending SIGTERM to process tree at pid 2
I0327 19:32:51.571627 10052 slave.cpp:3873] Shutting down framework 8785e0d5-1b32-4cb7-85df-13d08ef19fcb-0000
I0327 19:32:51.572621 10052 slave.cpp:6566] Shutting down executor 'c3e41ee2-e5d8-49e7-94ad-6109a50d8e0f' of framework 8785e0d5-1b32-4cb7-85df-13d08ef19fcb-0000 at executor(1)@10.3.1.5:50421
I0327 19:32:51.573623  9732 slave.cpp:919] Agent terminating
W0327 19:32:51.573623  9732 slave.cpp:3869] Ignoring shutdown framework 8785e0d5-1b32-4cb7-85df-13d08ef19fcb-0000 because it is terminating
I0327 19:32:51.575619 10064 master.cpp:10545] Removing task c3e41ee2-e5d8-49e7-94ad-6109a50d8e0f with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework 8785e0d5-1b32-4cb7-85df-13d08ef19fcb-0000 on agent 8785e0d5-1b32-4cb7-85df-13d08ef19fcb-S0 at slave(418)@10.3.1.5:50400 (winbldsrv-02.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0327 19:32:51.577610  3548 containerizer.cpp:2338] Destroying container 450e454a-bf3e-4dcf-beea-2d16db192da0 in RUNNING state
I0327 19:32:51.577610  3548 containerizer.cpp:2952] Transitioning the state of container 450e454a-bf3e-4dcf-beea-2d16db192da0 from RUNNING to DESTROYING
I0327 19:32:51.577610 10064 master.cpp:1295] Agent 8785e0d5-1b32-4cb7-85df-13d08ef19fcb-S0 at slave(418)@10.3.1.5:50400 (winbldsrv-02.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0327 19:32:51.578611 10064 master.cpp:3283] Disconnecting agent 8785e0d5-1b32-4cb7-85df-13d08ef19fcb-S0 at slave(418)@10.3.1.5:50400 (winbldsrv-02.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0327 19:32:51.578611  9768 hierarchical.cpp:344] Removed framework 8785e0d5-1b32-4cb7-85df-13d08ef19fcb-0000
I0327 19:32:51.578611 10064 master.cpp:3302] Deactivating agent 8785e0d5-1b32-4cb7-85df-13d08ef19fcb-S0 at slave(418)@10.3.1.5:50400 (winbldsrv-02.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0327 19:32:51.578611  3548 launcher.cpp:156] Asked to destroy container 450e454a-bf3e-4dcf-beea-2d16db192da0
I0327 19:32:51.578611  7808 hierarchical.cpp:766] Agent 8785e0d5-1b32-4cb7-85df-13d08ef19fcb-S0 deactivated
I0327 19:32:51.605891  9280 containerizer.cpp:2791] Container 450e454a-bf3e-4dcf-beea-2d16db192da0 has exited
I0327 19:32:51.641590  9732 master.cpp:1137] Master terminating
I0327 19:32:51.643599  7808 hierarchical.cpp:609] Removed agent 8785e0d5-1b32-4cb7-85df-13d08ef19fcb-S0
I0327 19:32:52.051609  9456 process.cpp:929] Stopped the socket accept loop
```

- Mesos Reviewbot Windows




Re: Review Request 66313: Fixed an oversubscription test for agent registration backoff.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66313/#review200068
-----------------------------------------------------------



Patch looks great!

Reviews applied: [66313]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot

