You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Benno Evers <be...@mesosphere.com> on 2019/01/04 16:57:23 UTC

Review Request 69669: Notified frameworks when operations are marked as unreachable.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/
-----------------------------------------------------------

Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.


Repository: mesos


Description
-------

When an agent is being marked as unreachable due to missing
the reregistration timeout, all operations on that agent
are implicilty transitioned to status `OPERATION_UNREACHABLE`.

This commit adds an explicit notification for this transition
to frameworks which opted-in to operation feedback.


Diffs
-----

  src/master/master.hpp 99549ab857b16d722f0dd991f98dbe54e9ed19a1 
  src/master/master.cpp b4faf2b077a0288ba36195b7a21402932489d316 
  src/tests/api_tests.cpp b6064cd749e42e45c2b471c71e9769a41b59f726 


Diff: https://reviews.apache.org/r/69669/diff/1/


Testing
-------


Thanks,

Benno Evers


Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211789
-----------------------------------------------------------



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['69575', '69597', '69669']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2728/mesos-review-69669

Relevant logs:

- [mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2728/mesos-review-69669/logs/mesos-tests.log):

```
I0109 18:58:57.001946  5572 master.cpp:1271] Agent b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0 at slave(468)@192.10.1.4:52219 (windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net) disconnected
I0109 18:58:57.001946  5572 master.cpp:3274] Disconnecting agent b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0 at slave(468)@192.10.1.4:52219 (windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0109 18:58:57.002950  5572 master.cpp:3293] Deactivating agent b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0 at slave(468)@192.10.1.4:52219 (windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0109 18:58:57.002950  6192 hierarchical.cpp:358] Removed framework b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-0000
I0109 18:58:57.003962  6192 hierarchical.cpp:802] Agent b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0 deactivated
I0109 18:58:57.003962  5572 containerizer.cpp:2463] Destroying container d726a03e-24ba-426d-95fd-07dcf876572d in RUNNING state
I0109 18:58:57.003962  5572 containerizer.cpp:3130] Transitioning the state of container d726a03e-24ba-426d-95fd-07dcf876572d from RUNNING to DESTROYING
I0109 18:58:57.004971  5572 launcher.cpp:161] Asked to destroy container d726a03e-24ba-426d-95fd-07dcf876572d
W0109 18:58:57.004971  6352 process.cpp:838] Failed to recv on socket WindowsFD::Type::SOCKET=1788 to peer '192.10.1.4:54061': IO failed with error code: The specified network name is no longer available.

W0109 18:58:57.005954  6352 process.cpp:1423] Failed to recv on socket WindowsFD::Type::SOCKET=1800 to peer '192.10.1.4:54060': IO failed with error code: The specified network name is no longer available.

I0109 18:58:57.084885  6192 containerizer.cpp:2969] Container d726a03e-24ba-426d-95fd-07dcf876572d has exited
I0109 18:58:57.113801  6064 master.cpp:11[       OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (686 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (704 ms total)

[----------] Global test environment tear-down
[==========] 1086 tests from 104 test cases ran. (499593 ms total)
[  PASSED  ] 1084 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] DockerFetcherPluginTest.INTERNET_CURL_FetchImage
[  FAILED  ] ContentType/MasterAPITest.OperationUpdatesUponAgentGone/1, where GetParam() = application/json

 2 FAILED TESTS
  YOU HAVE 231 DISABLED TESTS

11] Master terminating
I0109 18:58:57.115829  5176 hierarchical.cpp:644] Removed agent b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0
I0109 18:58:57.374857  6352 process.cpp:927] Stopped the socket accept loop
```

- Mesos Reviewbot Windows


On Jan. 4, 2019, 4:57 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> -----------------------------------------------------------
> 
> (Updated Jan. 4, 2019, 4:57 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8783
>     https://issues.apache.org/jira/browse/MESOS-8783
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When an agent is being marked as unreachable due to missing
> the reregistration timeout, all operations on that agent
> are implicilty transitioned to status `OPERATION_UNREACHABLE`.
> 
> This commit adds an explicit notification for this transition
> to frameworks which opted-in to operation feedback.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 99549ab857b16d722f0dd991f98dbe54e9ed19a1 
>   src/master/master.cpp b4faf2b077a0288ba36195b7a21402932489d316 
>   src/tests/api_tests.cpp b6064cd749e42e45c2b471c71e9769a41b59f726 
> 
> 
> Diff: https://reviews.apache.org/r/69669/diff/1/
> 
> 
> Testing
> -------
> 
> Internal CI run.
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

Posted by Benno Evers <be...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211765
-----------------------------------------------------------




src/master/master.hpp
Lines 904 (patched)
<https://reviews.apache.org/r/69669/#comment297362>

    I thought about making this a member function of `struct Slave` instead, but it felt a bit unclean to suddenly add networking code to it because it currently acts mostly as a fancy data storage.


- Benno Evers


On Jan. 4, 2019, 4:57 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> -----------------------------------------------------------
> 
> (Updated Jan. 4, 2019, 4:57 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8783
>     https://issues.apache.org/jira/browse/MESOS-8783
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When an agent is being marked as unreachable due to missing
> the reregistration timeout, all operations on that agent
> are implicilty transitioned to status `OPERATION_UNREACHABLE`.
> 
> This commit adds an explicit notification for this transition
> to frameworks which opted-in to operation feedback.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 99549ab857b16d722f0dd991f98dbe54e9ed19a1 
>   src/master/master.cpp b4faf2b077a0288ba36195b7a21402932489d316 
>   src/tests/api_tests.cpp b6064cd749e42e45c2b471c71e9769a41b59f726 
> 
> 
> Diff: https://reviews.apache.org/r/69669/diff/1/
> 
> 
> Testing
> -------
> 
> Internal CI run.
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

Posted by Benno Evers <be...@mesosphere.com>.

> On Jan. 10, 2019, 8:53 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Line 8954 (original), 8982-8984 (patched)
> > <https://reviews.apache.org/r/69669/diff/1/?file=2117537#file2117537line8982>
> >
> >     Nit: fits on one line.

I'm curious, do we have an (informal) guideline that says when things fit on one line we should try to fit them there?

I actually intentionally spread this out, because I found my eyes were skipping over this line while reading, and due to that I was wondering why we only loop over `slave->resourceProviders` below.


> On Jan. 10, 2019, 8:53 p.m., Greg Mann wrote:
> > src/tests/api_tests.cpp
> > Lines 5127 (patched)
> > <https://reviews.apache.org/r/69669/diff/1/?file=2117538#file2117538line5127>
> >
> >     Since we don't reference the contents of `slaveFlags` anywhere, you can omit this variable; `StartSlave()` will use the default argument value of `None()` and create the slave flags itself before calling `cluster::Slave::create()`.

Dropping this since we have to use the `slaveFlags` when pausing the clock in this test.


- Benno


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211836
-----------------------------------------------------------


On Jan. 11, 2019, 2:24 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> -----------------------------------------------------------
> 
> (Updated Jan. 11, 2019, 2:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8783
>     https://issues.apache.org/jira/browse/MESOS-8783
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When an agent is being marked as unreachable due to missing
> the reregistration timeout, all operations on that agent
> are implicilty transitioned to status `OPERATION_UNREACHABLE`.
> 
> This commit adds an explicit notification for this transition
> to frameworks which opted-in to operation feedback.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 8de73124dbfb81d6edd0d1d5193adc21756f3fad 
>   src/master/master.cpp 9624832d017dae717da803ca2ff2f8fc5135ea9d 
>   src/tests/api_tests.cpp c597243e2e210e83a4ab7441fbcfa3198b43d849 
> 
> 
> Diff: https://reviews.apache.org/r/69669/diff/2/
> 
> 
> Testing
> -------
> 
> Internal CI run.
> 
> `./src/mesos-tests --gtest_filter="*OperationUpdatesUponUnreachable*" --verbose --gtest_repeat=5000`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211836
-----------------------------------------------------------




src/master/master.cpp
Line 8954 (original), 8982-8984 (patched)
<https://reviews.apache.org/r/69669/#comment297419>

    Nit: fits on one line.



src/tests/api_tests.cpp
Lines 5127 (patched)
<https://reviews.apache.org/r/69669/#comment297420>

    Since we don't reference the contents of `slaveFlags` anywhere, you can omit this variable; `StartSlave()` will use the default argument value of `None()` and create the slave flags itself before calling `cluster::Slave::create()`.



src/tests/api_tests.cpp
Lines 5161-5164 (patched)
<https://reviews.apache.org/r/69669/#comment297421>

    I think this variable is unused?



src/tests/api_tests.cpp
Lines 5203-5204 (patched)
<https://reviews.apache.org/r/69669/#comment297424>

    Let's get rid of the parentheses:
    
    "Try to reserve the resources managed by the resource provider, because currently operation feedback is only supported for that case."



src/tests/api_tests.cpp
Lines 5236-5239 (patched)
<https://reviews.apache.org/r/69669/#comment297426>

    Could we just pause the clock for the whole test? It might be necessary to retain the `slaveFlags` variable if you do this.
    
    We should also resume the clock at the end of the test.


- Greg Mann


On Jan. 4, 2019, 4:57 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> -----------------------------------------------------------
> 
> (Updated Jan. 4, 2019, 4:57 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8783
>     https://issues.apache.org/jira/browse/MESOS-8783
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When an agent is being marked as unreachable due to missing
> the reregistration timeout, all operations on that agent
> are implicilty transitioned to status `OPERATION_UNREACHABLE`.
> 
> This commit adds an explicit notification for this transition
> to frameworks which opted-in to operation feedback.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 99549ab857b16d722f0dd991f98dbe54e9ed19a1 
>   src/master/master.cpp b4faf2b077a0288ba36195b7a21402932489d316 
>   src/tests/api_tests.cpp b6064cd749e42e45c2b471c71e9769a41b59f726 
> 
> 
> Diff: https://reviews.apache.org/r/69669/diff/1/
> 
> 
> Testing
> -------
> 
> Internal CI run.
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211978
-----------------------------------------------------------


Ship it!




Ship It!

- Greg Mann


On Jan. 11, 2019, 2:24 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> -----------------------------------------------------------
> 
> (Updated Jan. 11, 2019, 2:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8783
>     https://issues.apache.org/jira/browse/MESOS-8783
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When an agent is being marked as unreachable due to missing
> the reregistration timeout, all operations on that agent
> are implicilty transitioned to status `OPERATION_UNREACHABLE`.
> 
> This commit adds an explicit notification for this transition
> to frameworks which opted-in to operation feedback.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 8de73124dbfb81d6edd0d1d5193adc21756f3fad 
>   src/master/master.cpp 9624832d017dae717da803ca2ff2f8fc5135ea9d 
>   src/tests/api_tests.cpp c597243e2e210e83a4ab7441fbcfa3198b43d849 
> 
> 
> Diff: https://reviews.apache.org/r/69669/diff/3/
> 
> 
> Testing
> -------
> 
> Internal CI run.
> 
> `./src/mesos-tests --gtest_filter="*OperationUpdatesUponUnreachable*" --verbose --gtest_repeat=5000`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211889
-----------------------------------------------------------



PASS: Mesos patch 69669 was successfully built and tested.

Reviews applied: `['69669']`

All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2761/mesos-review-69669

- Mesos Reviewbot Windows


On Jan. 11, 2019, 2:24 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> -----------------------------------------------------------
> 
> (Updated Jan. 11, 2019, 2:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8783
>     https://issues.apache.org/jira/browse/MESOS-8783
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When an agent is being marked as unreachable due to missing
> the reregistration timeout, all operations on that agent
> are implicilty transitioned to status `OPERATION_UNREACHABLE`.
> 
> This commit adds an explicit notification for this transition
> to frameworks which opted-in to operation feedback.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 8de73124dbfb81d6edd0d1d5193adc21756f3fad 
>   src/master/master.cpp 9624832d017dae717da803ca2ff2f8fc5135ea9d 
>   src/tests/api_tests.cpp c597243e2e210e83a4ab7441fbcfa3198b43d849 
> 
> 
> Diff: https://reviews.apache.org/r/69669/diff/2/
> 
> 
> Testing
> -------
> 
> Internal CI run.
> 
> `./src/mesos-tests --gtest_filter="*OperationUpdatesUponUnreachable*" --verbose --gtest_repeat=5000`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

Posted by Benno Evers <be...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/
-----------------------------------------------------------

(Updated Jan. 11, 2019, 2:24 p.m.)


Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and Joseph Wu.


Changes
-------

Address comments and rebase onto latest master.


Bugs: MESOS-8783
    https://issues.apache.org/jira/browse/MESOS-8783


Repository: mesos


Description
-------

When an agent is being marked as unreachable due to missing
the reregistration timeout, all operations on that agent
are implicilty transitioned to status `OPERATION_UNREACHABLE`.

This commit adds an explicit notification for this transition
to frameworks which opted-in to operation feedback.


Diffs (updated)
-----

  src/master/master.hpp 8de73124dbfb81d6edd0d1d5193adc21756f3fad 
  src/master/master.cpp 9624832d017dae717da803ca2ff2f8fc5135ea9d 
  src/tests/api_tests.cpp c597243e2e210e83a4ab7441fbcfa3198b43d849 


Diff: https://reviews.apache.org/r/69669/diff/2/

Changes: https://reviews.apache.org/r/69669/diff/1-2/


Testing (updated)
-------

Internal CI run.

`./src/mesos-tests --gtest_filter="*OperationUpdatesUponUnreachable*" --verbose --gtest_repeat=5000`


Thanks,

Benno Evers