Posted to reviews@mesos.apache.org by Benno Evers <be...@mesosphere.com> on 2017/12/12 12:00:13 UTC

Review Request 64536: Removed race from SlaveRecoveryTest.ReconnectExecutor.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64536/
-----------------------------------------------------------

Review request for mesos and Alexander Rukletsov.


Repository: mesos


Description
-------

Since the executor now sends two status updates in potentially
rapid succession, there was a race in which the slave could
receive a TASK_RUNNING update before shutting down, throwing off
the later checks.
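
The race described above can be illustrated with a small standalone model.
This is only a sketch under stated assumptions, not the code from the diff:
`FakeAgent`, `dropsRemaining`, `dropAll` and `updatesSeenBeforeShutdown` are
hypothetical names, and the name of the first update (`TASK_STARTING`) is an
assumption, since the description only mentions TASK_RUNNING. The model takes
the worst case, in which both updates arrive before the agent shuts down, and
compares intercepting a single update with intercepting every update.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an agent that records the status updates
// that get past the test's message filter.
struct FakeAgent {
  std::vector<std::string> received;
  std::size_t dropsRemaining = 0;  // Intercept only this many updates.
  bool dropAll = false;            // Intercept every update.

  void deliver(const std::string& state) {
    if (dropAll) {
      return;  // Filter stays in place for all updates.
    }
    if (dropsRemaining > 0) {
      --dropsRemaining;  // One-shot filter consumed by this update.
      return;
    }
    received.push_back(state);
  }
};

// Models an executor that sends two updates back to back, both of
// which reach the agent before it shuts down (the worst case).
// The agent is taken by value so the caller's copy is left untouched.
std::vector<std::string> updatesSeenBeforeShutdown(FakeAgent agent) {
  agent.deliver("TASK_STARTING");  // Assumed name of the first update.
  agent.deliver("TASK_RUNNING");
  return agent.received;
}
```

With `dropsRemaining = 1` the second update slips through, which matches the
symptom in the description; with `dropAll = true` the agent sees nothing
before shutdown, whatever the timing. Whether the diff uses this approach or
another one is not stated here.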


Diffs
-----

  src/tests/slave_recovery_tests.cpp 253b0fc2ff7ec1f00937d42636151553c46d5175 


Diff: https://reviews.apache.org/r/64536/diff/1/


Testing
-------

`./mesos-tests --gtest_filter="SlaveRecoveryTest/0.ReconnectExecutor" --gtest_repeat=500 --gtest_break_on_failure`


Thanks,

Benno Evers


Re: Review Request 64536: Removed race from SlaveRecoveryTest.ReconnectExecutor.

Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64536/#review193708
-----------------------------------------------------------


Ship it!

- Alexander Rukletsov


On Dec. 12, 2017, noon, Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64536/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2017, noon)
> 
> 
> Review request for mesos and Alexander Rukletsov.
> 
> 
> Bugs: MESOS-8245
>     https://issues.apache.org/jira/browse/MESOS-8245
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Since the executor now sends two status updates in potentially
> rapid succession, there was a race in which the slave could
> receive a TASK_RUNNING update before shutting down, throwing off
> the later checks.
> 
> 
> Diffs
> -----
> 
>   src/tests/slave_recovery_tests.cpp 253b0fc2ff7ec1f00937d42636151553c46d5175 
> 
> 
> Diff: https://reviews.apache.org/r/64536/diff/1/
> 
> 
> Testing
> -------
> 
> `./mesos-tests --gtest_filter="SlaveRecoveryTest/0.ReconnectExecutor" --gtest_repeat=500 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 64536: Removed race from SlaveRecoveryTest.ReconnectExecutor.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64536/#review193535
-----------------------------------------------------------



FAIL: Some Mesos tests failed.

Reviews applied: `['64536']`

Failed command: `D:\DCOS\mesos\src\mesos-tests.exe --verbose`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64536

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64536/logs/mesos-tests-stdout.log):

```

[----------] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN      ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[       OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2203 ms)
[----------] 1 test from IsolationFlag/CpuIsolatorTest (2226 ms total)

[----------] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN      ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[       OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2232 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (2255 ms total)

[----------] Global test environment tear-down
[==========] 829 tests from 84 test cases ran. (306069 ms total)
[  PASSED  ] 819 tests.
[  FAILED  ] 10 tests, listed below:
[  FAILED  ] OfferOperationStatusUpdateManagerTest.UpdateAndAckNonTerminalUpdate
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverCheckpointedStream
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverEmptyFile
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverTerminatedStream
[  FAILED  ] OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdate
[  FAILED  ] OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdateAfterRecover
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RejectDuplicateAck
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RejectDuplicateAckAfterRecover
[  FAILED  ] OfferOperationStatusUpdateManagerTest.NonStrictRecoveryCorruptedFile
[  FAILED  ] SlaveTest.ResourceProviderPublishAll

10 FAILED TESTS
  YOU HAVE 204 DISABLED TESTS

```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64536/logs/mesos-tests-stderr.log):

```
I1212 12:57:17.451431  7052 slave.cpp:3400] Shutting down framework 841bed6f-0c12-4c5f-b3af-74557bd90411-0000
I1212 12:57:17.451431  9148 master.cpp:10114] Updating the state of task 6d538a57-09b1-4fa1-b70d-6175711b5a35 of framework 841bed6f-0c12-4c5f-b3af-74557bd90411-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I1212 12:57:17.451431  7052 slave.cpp:6091] Shutting down executor '6d538a57-09b1-4fa1-b70d-6175711b5a35' of framework 841bed6f-0c12-4c5f-b3af-74557bd90411-0000 at executor(1)@10.3.1.5:57016
I1212 12:57:17.452409  7052 slave.cpp:909] Agent terminating
I1212 12:57:16.787437  1416 exec.cpp:162] Version: 1.5.0
I1212 12:57:16.810417  6384 exec.cpp:237] Executor registered on agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0
I1212 12:57:16.813416  8364 executor.cpp:171] Received SUBSCRIBED event
I1212 12:57:16.817443  8364 executor.cpp:175] Subscribed executor on build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I1212 12:57:16.817443  8364 executor.cpp:171] Received LAUNCH event
I1212 12:57:16.821440  8364 executor.cpp:637] Starting task 6d538a57-09b1-4fa1-b70d-6175711b5a35
I1212 12:57:16.908439  8364 executor.cpp:477] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch <POSSIBLY-SENSITIVE-DATA>'
I1212 12:57:17.428408  8364 executor.cpp:650] Forked command at 144
I1212 12:57:17.453408  1456 exec.cpp:435] Executor asked to shutdown
I1212 12:57:17.453408  4956 executor.cpp:171] Received SHUTDOWN event
I1212 12:57:17.453408  4956 executor.cpp:747] Shutting down
I1212 12:57:17.454408  4956 executor.cpp:854] Sending SIGTERM to process tree at pid 14
W1212 12:57:17.453408  7052 slave.cpp:3396] Ignoring shutdown framework 841bed6f-0c12-4c5f-b3af-74557bd90411-0000 because it is terminating
I1212 12:57:17.454408  9148 master.cpp:10220] Removing task 6d538a57-09b1-4fa1-b70d-6175711b5a35 with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework 841bed6f-0c12-4c5f-b3af-74557bd90411-0000 on agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0 at slave(326)@10.3.1.5:56995 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1212 12:57:17.456408  7052 containerizer.cpp:2328] Destroying container 42def7e4-36d8-431b-8c19-9e491da11c58 in RUNNING state
I1212 12:57:17.456408  9148 master.cpp:1305] Agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0 at slave(326)@10.3.1.5:56995 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I1212 12:57:17.456408  7052 containerizer.cpp:2930] Transitioning the state of container 42def7e4-36d8-431b-8c19-9e491da11c58 from RUNNING to DESTROYING
I1212 12:57:17.456408  9148 master.cpp:3364] Disconnecting agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0 at slave(326)@10.3.1.5:56995 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1212 12:57:17.456408  9148 master.cpp:3383] Deactivating agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0 at slave(326)@10.3.1.5:56995 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1212 12:57:17.456408  7136 hierarchical.cpp:344] Removed framework 841bed6f-0c12-4c5f-b3af-74557bd90411-0000
I1212 12:57:17.457409  7136 hierarchical.cpp:766] Agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0 deactivated
I1212 12:57:17.457409  7052 launcher.cpp:156] Asked to destroy container 42def7e4-36d8-431b-8c19-9e491da11c58
I1212 12:57:17.481518  7356 containerizer.cpp:2779] Container 42def7e4-36d8-431b-8c19-9e491da11c58 has exited
I1212 12:57:17.508533  4392 master.cpp:1147] Master terminating
I1212 12:57:17.510571  7696 hierarchical.cpp:609] Removed agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0
I1212 12:57:17.805527    92 process.cpp:887] Failed to accept socket: future discarded
```

- Mesos Reviewbot Windows


On Dec. 12, 2017, 5:30 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64536/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2017, 5:30 p.m.)
> 