Posted to reviews@mesos.apache.org by Benno Evers <be...@mesosphere.com> on 2017/12/12 12:00:13 UTC
Review Request 64536: Removed race from SlaveRecoveryTest.ReconnectExecutor.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64536/
-----------------------------------------------------------
Review request for mesos and Alexander Rukletsov.
Repository: mesos
Description
-------
Since the executor now sends two status updates in potentially
rapid succession, there was a race in which the slave could
successfully receive a TASK_RUNNING update before shutting down,
throwing off the later checks.
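A minimal sketch of the race described above, as a self-contained toy model rather than the actual change in the diff. Everything in it is an assumption made for illustration: the function name `updatesSeenByAgent`, the `dropUpdates` flag (standing in for a test that intercepts every update up front), the `arrivedBeforeShutdown` parameter (standing in for scheduling luck), and the name of the first update, which the description does not give.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model, not Mesos code. The executor emits two updates;
// `arrivedBeforeShutdown` models how many of them happen to reach the agent
// before the test shuts it down (timing-dependent in a real run).
// `dropUpdates` models a test that intercepts every update before delivery.
std::vector<std::string> updatesSeenByAgent(
    bool dropUpdates, std::size_t arrivedBeforeShutdown)
{
  // The first update name is an assumption; the description only names
  // TASK_RUNNING.
  const std::vector<std::string> sent = {"TASK_STARTING", "TASK_RUNNING"};

  std::vector<std::string> seen;
  for (std::size_t i = 0; i < sent.size() && i < arrivedBeforeShutdown; ++i) {
    if (!dropUpdates) {
      seen.push_back(sent[i]);
    }
  }
  return seen;
}
```

Without interception, the result depends on `arrivedBeforeShutdown`, which is the race: the agent may or may not have seen TASK_RUNNING when it shuts down. With interception, the agent sees nothing regardless of timing, so later checks start from a known state.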
Diffs
-----
src/tests/slave_recovery_tests.cpp 253b0fc2ff7ec1f00937d42636151553c46d5175
Diff: https://reviews.apache.org/r/64536/diff/1/
Testing
-------
`./mesos-tests --gtest_filter="SlaveRecoveryTest/0.ReconnectExecutor" --gtest_repeat=500 --gtest_break_on_failure`
Thanks,
Benno Evers
Re: Review Request 64536: Removed race from SlaveRecoveryTest.ReconnectExecutor.
Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64536/#review193708
-----------------------------------------------------------
Ship it!
- Alexander Rukletsov
On Dec. 12, 2017, noon, Benno Evers wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64536/
> -----------------------------------------------------------
>
> (Updated Dec. 12, 2017, noon)
>
>
> Review request for mesos and Alexander Rukletsov.
>
>
> Bugs: MESOS-8245
> https://issues.apache.org/jira/browse/MESOS-8245
Re: Review Request 64536: Removed race from SlaveRecoveryTest.ReconnectExecutor.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64536/#review193535
-----------------------------------------------------------
FAIL: Some Mesos tests failed.
Reviews applied: `['64536']`
Failed command: `D:\DCOS\mesos\src\mesos-tests.exe --verbose`
All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64536
Relevant logs:
- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64536/logs/mesos-tests-stdout.log):
```
[----------] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[ OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2203 ms)
[----------] 1 test from IsolationFlag/CpuIsolatorTest (2226 ms total)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[ OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2232 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (2255 ms total)
[----------] Global test environment tear-down
[==========] 829 tests from 84 test cases ran. (306069 ms total)
[ PASSED ] 819 tests.
[ FAILED ] 10 tests, listed below:
[ FAILED ] OfferOperationStatusUpdateManagerTest.UpdateAndAckNonTerminalUpdate
[ FAILED ] OfferOperationStatusUpdateManagerTest.RecoverCheckpointedStream
[ FAILED ] OfferOperationStatusUpdateManagerTest.RecoverEmptyFile
[ FAILED ] OfferOperationStatusUpdateManagerTest.RecoverTerminatedStream
[ FAILED ] OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdate
[ FAILED ] OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdateAfterRecover
[ FAILED ] OfferOperationStatusUpdateManagerTest.RejectDuplicateAck
[ FAILED ] OfferOperationStatusUpdateManagerTest.RejectDuplicateAckAfterRecover
[ FAILED ] OfferOperationStatusUpdateManagerTest.NonStrictRecoveryCorruptedFile
[ FAILED ] SlaveTest.ResourceProviderPublishAll
10 FAILED TESTS
YOU HAVE 204 DISABLED TESTS
```
- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64536/logs/mesos-tests-stderr.log):
```
I1212 12:57:17.451431 7052 slave.cpp:3400] Shutting down framework 841bed6f-0c12-4c5f-b3af-74557bd90411-0000
I1212 12:57:17.451431 9148 master.cpp:10114] Updating the state of task 6d538a57-09b1-4fa1-b70d-6175711b5a35 of framework 841bed6f-0c12-4c5f-b3af-74557bd90411-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I1212 12:57:17.451431 7052 slave.cpp:6091] Shutting down executor '6d538a57-09b1-4fa1-b70d-6175711b5a35' of framework 841bed6f-0c12-4c5f-b3af-74557bd90411-0000 at executor(1)@10.3.1.5:57016
I1212 12:57:17.452409 7052 slave.cpp:909] Agent terminating
I1212 12:57:16.787437 1416 exec.cpp:162] Version: 1.5.0
I1212 12:57:16.810417 6384 exec.cpp:237] Executor registered on agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0
I1212 12:57:16.813416 8364 executor.cpp:171] Received SUBSCRIBED event
I1212 12:57:16.817443 8364 executor.cpp:175] Subscribed executor on build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I1212 12:57:16.817443 8364 executor.cpp:171] Received LAUNCH event
I1212 12:57:16.821440 8364 executor.cpp:637] Starting task 6d538a57-09b1-4fa1-b70d-6175711b5a35
I1212 12:57:16.908439 8364 executor.cpp:477] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch <POSSIBLY-SENSITIVE-DATA>'
I1212 12:57:17.428408 8364 executor.cpp:650] Forked command at 144
I1212 12:57:17.453408 1456 exec.cpp:435] Executor asked to shutdown
I1212 12:57:17.453408 4956 executor.cpp:171] Received SHUTDOWN event
I1212 12:57:17.453408 4956 executor.cpp:747] Shutting down
I1212 12:57:17.454408 4956 executor.cpp:854] Sending SIGTERM to process tree at pid 14
W1212 12:57:17.453408 7052 slave.cpp:3396] Ignoring shutdown framework 841bed6f-0c12-4c5f-b3af-74557bd90411-0000 because it is terminating
I1212 12:57:17.454408 9148 master.cpp:10220] Removing task 6d538a57-09b1-4fa1-b70d-6175711b5a35 with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework 841bed6f-0c12-4c5f-b3af-74557bd90411-0000 on agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0 at slave(326)@10.3.1.5:56995 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1212 12:57:17.456408 7052 containerizer.cpp:2328] Destroying container 42def7e4-36d8-431b-8c19-9e491da11c58 in RUNNING state
I1212 12:57:17.456408 9148 master.cpp:1305] Agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0 at slave(326)@10.3.1.5:56995 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I1212 12:57:17.456408 7052 containerizer.cpp:2930] Transitioning the state of container 42def7e4-36d8-431b-8c19-9e491da11c58 from RUNNING to DESTROYING
I1212 12:57:17.456408 9148 master.cpp:3364] Disconnecting agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0 at slave(326)@10.3.1.5:56995 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1212 12:57:17.456408 9148 master.cpp:3383] Deactivating agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0 at slave(326)@10.3.1.5:56995 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1212 12:57:17.456408 7136 hierarchical.cpp:344] Removed framework 841bed6f-0c12-4c5f-b3af-74557bd90411-0000
I1212 12:57:17.457409 7136 hierarchical.cpp:766] Agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0 deactivated
I1212 12:57:17.457409 7052 launcher.cpp:156] Asked to destroy container 42def7e4-36d8-431b-8c19-9e491da11c58
I1212 12:57:17.481518 7356 containerizer.cpp:2779] Container 42def7e4-36d8-431b-8c19-9e491da11c58 has exited
I1212 12:57:17.508533 4392 master.cpp:1147] Master terminating
I1212 12:57:17.510571 7696 hierarchical.cpp:609] Removed agent 841bed6f-0c12-4c5f-b3af-74557bd90411-S0
I1212 12:57:17.805527 92 process.cpp:887] Failed to accept socket: future discarded
```
- Mesos Reviewbot Windows
On Dec. 12, 2017, 5:30 p.m., Benno Evers wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64536/
> -----------------------------------------------------------
>
> (Updated Dec. 12, 2017, 5:30 p.m.)
>