You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Benno Evers <be...@mesosphere.com> on 2019/01/04 16:57:23 UTC
Review Request 69669: Notified frameworks when operations are marked
as unreachable.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/
-----------------------------------------------------------
Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
Repository: mesos
Description
-------
When an agent is being marked as unreachable due to missing
the reregistration timeout, all operations on that agent
are implicilty transitioned to status `OPERATION_UNREACHABLE`.
This commit adds an explicit notification for this transition
to frameworks which opted-in to operation feedback.
Diffs
-----
src/master/master.hpp 99549ab857b16d722f0dd991f98dbe54e9ed19a1
src/master/master.cpp b4faf2b077a0288ba36195b7a21402932489d316
src/tests/api_tests.cpp b6064cd749e42e45c2b471c71e9769a41b59f726
Diff: https://reviews.apache.org/r/69669/diff/1/
Testing
-------
Thanks,
Benno Evers
Re: Review Request 69669: Notified frameworks when operations are
marked as unreachable.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211789
-----------------------------------------------------------
FAIL: Some of the unit tests failed. Please check the relevant logs.
Reviews applied: `['69575', '69597', '69669']`
Failed command: `Start-MesosCITesting`
All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2728/mesos-review-69669
Relevant logs:
- [mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2728/mesos-review-69669/logs/mesos-tests.log):
```
I0109 18:58:57.001946 5572 master.cpp:1271] Agent b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0 at slave(468)@192.10.1.4:52219 (windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net) disconnected
I0109 18:58:57.001946 5572 master.cpp:3274] Disconnecting agent b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0 at slave(468)@192.10.1.4:52219 (windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0109 18:58:57.002950 5572 master.cpp:3293] Deactivating agent b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0 at slave(468)@192.10.1.4:52219 (windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0109 18:58:57.002950 6192 hierarchical.cpp:358] Removed framework b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-0000
I0109 18:58:57.003962 6192 hierarchical.cpp:802] Agent b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0 deactivated
I0109 18:58:57.003962 5572 containerizer.cpp:2463] Destroying container d726a03e-24ba-426d-95fd-07dcf876572d in RUNNING state
I0109 18:58:57.003962 5572 containerizer.cpp:3130] Transitioning the state of container d726a03e-24ba-426d-95fd-07dcf876572d from RUNNING to DESTROYING
I0109 18:58:57.004971 5572 launcher.cpp:161] Asked to destroy container d726a03e-24ba-426d-95fd-07dcf876572d
W0109 18:58:57.004971 6352 process.cpp:838] Failed to recv on socket WindowsFD::Type::SOCKET=1788 to peer '192.10.1.4:54061': IO failed with error code: The specified network name is no longer available.
W0109 18:58:57.005954 6352 process.cpp:1423] Failed to recv on socket WindowsFD::Type::SOCKET=1800 to peer '192.10.1.4:54060': IO failed with error code: The specified network name is no longer available.
I0109 18:58:57.084885 6192 containerizer.cpp:2969] Container d726a03e-24ba-426d-95fd-07dcf876572d has exited
I0109 18:58:57.113801 6064 master.cpp:11[ OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (686 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (704 ms total)
[----------] Global test environment tear-down
[==========] 1086 tests from 104 test cases ran. (499593 ms total)
[ PASSED ] 1084 tests.
[ FAILED ] 2 tests, listed below:
[ FAILED ] DockerFetcherPluginTest.INTERNET_CURL_FetchImage
[ FAILED ] ContentType/MasterAPITest.OperationUpdatesUponAgentGone/1, where GetParam() = application/json
2 FAILED TESTS
YOU HAVE 231 DISABLED TESTS
11] Master terminating
I0109 18:58:57.115829 5176 hierarchical.cpp:644] Removed agent b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0
I0109 18:58:57.374857 6352 process.cpp:927] Stopped the socket accept loop
```
- Mesos Reviewbot Windows
On Jan. 4, 2019, 4:57 p.m., Benno Evers wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> -----------------------------------------------------------
>
> (Updated Jan. 4, 2019, 4:57 p.m.)
>
>
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
>
>
> Bugs: MESOS-8783
> https://issues.apache.org/jira/browse/MESOS-8783
>
>
> Repository: mesos
>
>
> Description
> -------
>
> When an agent is being marked as unreachable due to missing
> the reregistration timeout, all operations on that agent
> are implicilty transitioned to status `OPERATION_UNREACHABLE`.
>
> This commit adds an explicit notification for this transition
> to frameworks which opted-in to operation feedback.
>
>
> Diffs
> -----
>
> src/master/master.hpp 99549ab857b16d722f0dd991f98dbe54e9ed19a1
> src/master/master.cpp b4faf2b077a0288ba36195b7a21402932489d316
> src/tests/api_tests.cpp b6064cd749e42e45c2b471c71e9769a41b59f726
>
>
> Diff: https://reviews.apache.org/r/69669/diff/1/
>
>
> Testing
> -------
>
> Internal CI run.
>
>
> Thanks,
>
> Benno Evers
>
>
Re: Review Request 69669: Notified frameworks when operations are
marked as unreachable.
Posted by Benno Evers <be...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211765
-----------------------------------------------------------
src/master/master.hpp
Lines 904 (patched)
<https://reviews.apache.org/r/69669/#comment297362>
I thought about making this a member function of `struct Slave` instead, but it felt a bit unclean to suddenly add networking code to it because it currently acts mostly as a fancy data storage.
- Benno Evers
On Jan. 4, 2019, 4:57 p.m., Benno Evers wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> -----------------------------------------------------------
>
> (Updated Jan. 4, 2019, 4:57 p.m.)
>
>
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
>
>
> Bugs: MESOS-8783
> https://issues.apache.org/jira/browse/MESOS-8783
>
>
> Repository: mesos
>
>
> Description
> -------
>
> When an agent is being marked as unreachable due to missing
> the reregistration timeout, all operations on that agent
> are implicilty transitioned to status `OPERATION_UNREACHABLE`.
>
> This commit adds an explicit notification for this transition
> to frameworks which opted-in to operation feedback.
>
>
> Diffs
> -----
>
> src/master/master.hpp 99549ab857b16d722f0dd991f98dbe54e9ed19a1
> src/master/master.cpp b4faf2b077a0288ba36195b7a21402932489d316
> src/tests/api_tests.cpp b6064cd749e42e45c2b471c71e9769a41b59f726
>
>
> Diff: https://reviews.apache.org/r/69669/diff/1/
>
>
> Testing
> -------
>
> Internal CI run.
>
>
> Thanks,
>
> Benno Evers
>
>
Re: Review Request 69669: Notified frameworks when operations are
marked as unreachable.
Posted by Benno Evers <be...@mesosphere.com>.
> On Jan. 10, 2019, 8:53 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Line 8954 (original), 8982-8984 (patched)
> > <https://reviews.apache.org/r/69669/diff/1/?file=2117537#file2117537line8982>
> >
> > Nit: fits on one line.
I'm curious, do we have an (informal) guideline that says when things fit on one line we should try to fit them there?
I actually intentionally spread this out, because I found my eyes were skipping over this line while reading, and due to that I was wondering why we only loop over `slave->resourceProviders` below.
> On Jan. 10, 2019, 8:53 p.m., Greg Mann wrote:
> > src/tests/api_tests.cpp
> > Lines 5127 (patched)
> > <https://reviews.apache.org/r/69669/diff/1/?file=2117538#file2117538line5127>
> >
> > Since we don't reference the contents of `slaveFlags` anywhere, you can omit this variable; `StartSlave()` will use the default argument value of `None()` and create the slave flags itself before calling `cluster::Slave::create()`.
Dropping this since we have to use the `slaveFlags` when pausing the clock in this test.
- Benno
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211836
-----------------------------------------------------------
On Jan. 11, 2019, 2:24 p.m., Benno Evers wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> -----------------------------------------------------------
>
> (Updated Jan. 11, 2019, 2:24 p.m.)
>
>
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
>
>
> Bugs: MESOS-8783
> https://issues.apache.org/jira/browse/MESOS-8783
>
>
> Repository: mesos
>
>
> Description
> -------
>
> When an agent is being marked as unreachable due to missing
> the reregistration timeout, all operations on that agent
> are implicilty transitioned to status `OPERATION_UNREACHABLE`.
>
> This commit adds an explicit notification for this transition
> to frameworks which opted-in to operation feedback.
>
>
> Diffs
> -----
>
> src/master/master.hpp 8de73124dbfb81d6edd0d1d5193adc21756f3fad
> src/master/master.cpp 9624832d017dae717da803ca2ff2f8fc5135ea9d
> src/tests/api_tests.cpp c597243e2e210e83a4ab7441fbcfa3198b43d849
>
>
> Diff: https://reviews.apache.org/r/69669/diff/2/
>
>
> Testing
> -------
>
> Internal CI run.
>
> `./src/mesos-tests --gtest_filter="*OperationUpdatesUponUnreachable*" --verbose --gtest_repeat=5000`
>
>
> Thanks,
>
> Benno Evers
>
>
Re: Review Request 69669: Notified frameworks when operations are
marked as unreachable.
Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211836
-----------------------------------------------------------
src/master/master.cpp
Line 8954 (original), 8982-8984 (patched)
<https://reviews.apache.org/r/69669/#comment297419>
Nit: fits on one line.
src/tests/api_tests.cpp
Lines 5127 (patched)
<https://reviews.apache.org/r/69669/#comment297420>
Since we don't reference the contents of `slaveFlags` anywhere, you can omit this variable; `StartSlave()` will use the default argument value of `None()` and create the slave flags itself before calling `cluster::Slave::create()`.
src/tests/api_tests.cpp
Lines 5161-5164 (patched)
<https://reviews.apache.org/r/69669/#comment297421>
I think this variable is unused?
src/tests/api_tests.cpp
Lines 5203-5204 (patched)
<https://reviews.apache.org/r/69669/#comment297424>
Let's get rid of the parentheses:
"Try to reserve the resources managed by the resource provider, because currently operation feedback is only supported for that case."
src/tests/api_tests.cpp
Lines 5236-5239 (patched)
<https://reviews.apache.org/r/69669/#comment297426>
Could we just pause the clock for the whole test? It might be necessary to retain the `slaveFlags` variable if you do this.
We should also resume the clock at the end of the test.
- Greg Mann
On Jan. 4, 2019, 4:57 p.m., Benno Evers wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> -----------------------------------------------------------
>
> (Updated Jan. 4, 2019, 4:57 p.m.)
>
>
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
>
>
> Bugs: MESOS-8783
> https://issues.apache.org/jira/browse/MESOS-8783
>
>
> Repository: mesos
>
>
> Description
> -------
>
> When an agent is being marked as unreachable due to missing
> the reregistration timeout, all operations on that agent
> are implicilty transitioned to status `OPERATION_UNREACHABLE`.
>
> This commit adds an explicit notification for this transition
> to frameworks which opted-in to operation feedback.
>
>
> Diffs
> -----
>
> src/master/master.hpp 99549ab857b16d722f0dd991f98dbe54e9ed19a1
> src/master/master.cpp b4faf2b077a0288ba36195b7a21402932489d316
> src/tests/api_tests.cpp b6064cd749e42e45c2b471c71e9769a41b59f726
>
>
> Diff: https://reviews.apache.org/r/69669/diff/1/
>
>
> Testing
> -------
>
> Internal CI run.
>
>
> Thanks,
>
> Benno Evers
>
>
Re: Review Request 69669: Notified frameworks when operations are
marked as unreachable.
Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211978
-----------------------------------------------------------
Ship it!
Ship It!
- Greg Mann
On Jan. 11, 2019, 2:24 p.m., Benno Evers wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> -----------------------------------------------------------
>
> (Updated Jan. 11, 2019, 2:24 p.m.)
>
>
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
>
>
> Bugs: MESOS-8783
> https://issues.apache.org/jira/browse/MESOS-8783
>
>
> Repository: mesos
>
>
> Description
> -------
>
> When an agent is being marked as unreachable due to missing
> the reregistration timeout, all operations on that agent
> are implicilty transitioned to status `OPERATION_UNREACHABLE`.
>
> This commit adds an explicit notification for this transition
> to frameworks which opted-in to operation feedback.
>
>
> Diffs
> -----
>
> src/master/master.hpp 8de73124dbfb81d6edd0d1d5193adc21756f3fad
> src/master/master.cpp 9624832d017dae717da803ca2ff2f8fc5135ea9d
> src/tests/api_tests.cpp c597243e2e210e83a4ab7441fbcfa3198b43d849
>
>
> Diff: https://reviews.apache.org/r/69669/diff/3/
>
>
> Testing
> -------
>
> Internal CI run.
>
> `./src/mesos-tests --gtest_filter="*OperationUpdatesUponUnreachable*" --verbose --gtest_repeat=5000`
>
>
> Thanks,
>
> Benno Evers
>
>
Re: Review Request 69669: Notified frameworks when operations are
marked as unreachable.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211889
-----------------------------------------------------------
PASS: Mesos patch 69669 was successfully built and tested.
Reviews applied: `['69669']`
All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2761/mesos-review-69669
- Mesos Reviewbot Windows
On Jan. 11, 2019, 2:24 p.m., Benno Evers wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> -----------------------------------------------------------
>
> (Updated Jan. 11, 2019, 2:24 p.m.)
>
>
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
>
>
> Bugs: MESOS-8783
> https://issues.apache.org/jira/browse/MESOS-8783
>
>
> Repository: mesos
>
>
> Description
> -------
>
> When an agent is being marked as unreachable due to missing
> the reregistration timeout, all operations on that agent
> are implicilty transitioned to status `OPERATION_UNREACHABLE`.
>
> This commit adds an explicit notification for this transition
> to frameworks which opted-in to operation feedback.
>
>
> Diffs
> -----
>
> src/master/master.hpp 8de73124dbfb81d6edd0d1d5193adc21756f3fad
> src/master/master.cpp 9624832d017dae717da803ca2ff2f8fc5135ea9d
> src/tests/api_tests.cpp c597243e2e210e83a4ab7441fbcfa3198b43d849
>
>
> Diff: https://reviews.apache.org/r/69669/diff/2/
>
>
> Testing
> -------
>
> Internal CI run.
>
> `./src/mesos-tests --gtest_filter="*OperationUpdatesUponUnreachable*" --verbose --gtest_repeat=5000`
>
>
> Thanks,
>
> Benno Evers
>
>
Re: Review Request 69669: Notified frameworks when operations are
marked as unreachable.
Posted by Benno Evers <be...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/
-----------------------------------------------------------
(Updated Jan. 11, 2019, 2:24 p.m.)
Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
Changes
-------
Address comments and rebase onto latest master.
Bugs: MESOS-8783
https://issues.apache.org/jira/browse/MESOS-8783
Repository: mesos
Description
-------
When an agent is being marked as unreachable due to missing
the reregistration timeout, all operations on that agent
are implicilty transitioned to status `OPERATION_UNREACHABLE`.
This commit adds an explicit notification for this transition
to frameworks which opted-in to operation feedback.
Diffs (updated)
-----
src/master/master.hpp 8de73124dbfb81d6edd0d1d5193adc21756f3fad
src/master/master.cpp 9624832d017dae717da803ca2ff2f8fc5135ea9d
src/tests/api_tests.cpp c597243e2e210e83a4ab7441fbcfa3198b43d849
Diff: https://reviews.apache.org/r/69669/diff/2/
Changes: https://reviews.apache.org/r/69669/diff/1-2/
Testing (updated)
-------
Internal CI run.
`./src/mesos-tests --gtest_filter="*OperationUpdatesUponUnreachable*" --verbose --gtest_repeat=5000`
Thanks,
Benno Evers