Posted to reviews@mesos.apache.org by Chun-Hung Hsiao <ch...@mesosphere.io> on 2017/09/14 17:46:02 UTC

Review Request 62336: Kicked in disk monitoring early during recovery.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62336/
-----------------------------------------------------------

Review request for mesos, Benjamin Mahler, Jie Yu, and Vinod Kone.


Bugs: MESOS-7939
    https://issues.apache.org/jira/browse/MESOS-7939


Repository: mesos


Description
-------

This patch calls `Slave::checkDiskUsage` after `Slave::recoverFramework`,
so that terminated containers have already been scheduled for GC, but
before anything is checkpointed, so that there is enough disk space for
checkpointing.
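To make the ordering concrete, here is a minimal C++ sketch. It assumes that the disk check turns the current usage into a maximum sandbox age of `gc_delay * max(0, 1 - gc_disk_headroom - usage)` and then prunes every directory whose remaining GC delay is at most `gc_delay - age`. The `Flags`, `GarbageCollector`, and `recover` names are hypothetical stand-ins rather than the agent's actual classes, and durations are plain hours instead of libprocess `Duration`s.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-ins for the agent flags --gc_delay and
// --gc_disk_headroom; durations are plain hours for simplicity.
struct Flags {
  double gcDelayHours = 7 * 24.0;  // one week
  double gcDiskHeadroom = 0.1;
};

// A sandbox directory scheduled for GC and the time left before removal.
struct Scheduled {
  std::string path;
  double remainingHours;
};

// Toy garbage collector. prune(d) removes every path whose remaining
// removal time is at most d (assumed to mirror the agent's GC).
struct GarbageCollector {
  std::vector<Scheduled> paths;
  std::vector<std::string> removed;

  void schedule(const std::string& path, double delayHours) {
    paths.push_back({path, delayHours});
  }

  void prune(double d) {
    // Keep paths with more than d hours left; collect the rest.
    auto firstRemoved = std::partition(
        paths.begin(), paths.end(),
        [d](const Scheduled& s) { return s.remainingHours > d; });
    for (auto it = firstRemoved; it != paths.end(); ++it) {
      removed.push_back(it->path);
    }
    paths.erase(firstRemoved, paths.end());
  }
};

// Max allowed sandbox age for a disk usage in [0.0, 1.0].
double age(const Flags& flags, double usage) {
  return flags.gcDelayHours *
         std::max(0.0, 1.0 - flags.gcDiskHeadroom - usage);
}

// One-shot disk check: since sandboxes are scheduled gc_delay into the
// future, pruning within (gc_delay - age) deletes those older than age.
void checkDiskUsage(const Flags& flags, GarbageCollector& gc, double usage) {
  gc.prune(flags.gcDelayHours - age(flags, usage));
}

// Hypothetical recovery sequence in the order described above.
std::vector<std::string> recover(
    const Flags& flags, GarbageCollector& gc, double usage) {
  std::vector<std::string> steps;

  // 1. recoverFramework: a terminated container is scheduled for GC.
  gc.schedule("executors/default/runs/terminated", flags.gcDelayHours);
  steps.push_back("recoverFramework");

  // 2. checkDiskUsage: prune scheduled sandboxes if the disk is tight.
  checkDiskUsage(flags, gc, usage);
  steps.push_back("checkDiskUsage");

  // 3. Only now is anything checkpointed.
  steps.push_back("checkpoint");
  return steps;
}
```

With usage at 95% the maximum age drops to zero, so the freshly scheduled sandbox is deleted before step 3; at 20% it stays scheduled for later GC.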

To avoid introducing a separate one-time disk monitoring function,
`Slave::checkDiskUsage` no longer reschedules itself through a delayed
recursive call. Instead, we use `process::loop` to monitor disk usage
periodically.
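The refactoring can be pictured with the following standard-C++ sketch. The `loop` template is a hypothetical, synchronous stand-in that only loosely mimics the shape of libprocess's `process::loop` (an `iterate` step that yields the next value and a `body` that decides whether to continue or break); in the agent, the iterate step would presumably wait for `--disk_watch_interval`, which is replaced here by a list of simulated usage samples. `DiskMonitor` and `monitor` are illustrative names, and the age formula is an assumption. Because the check no longer re-arms its own timer, it can be called once during recovery and also be driven periodically by the loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Before (schematically), the check re-armed itself when it finished:
//   void Slave::checkDiskUsage() {
//     ...; delay(flags.disk_watch_interval, self(), &Slave::checkDiskUsage);
//   }
// so an extra call during recovery would start a second timer chain.

// After: a plain one-shot check with no self-rescheduling.
struct DiskMonitor {
  double gcDelayHours = 168.0;  // assumed --gc_delay of one week
  double headroom = 0.1;        // assumed --gc_disk_headroom
  std::vector<double> maxAges;  // max allowed sandbox age per check

  void checkDiskUsage(double usage) {
    maxAges.push_back(gcDelayHours * std::max(0.0, 1.0 - headroom - usage));
  }
};

// Hypothetical synchronous stand-in loosely shaped like process::loop:
// `iterate` yields the next input, and `body` returns std::nullopt to
// continue or a value to break out of the loop with that result.
template <typename Iterate, typename Body>
auto loop(Iterate iterate, Body body) {
  while (true) {
    auto control = body(iterate());
    if (control.has_value()) {
      return *control;
    }
  }
}

// Drives the monitor over simulated usage samples instead of real timers;
// returns the number of periodic checks performed.
std::size_t monitor(DiskMonitor& m, const std::vector<double>& samples) {
  if (samples.empty()) {
    return 0;
  }
  std::size_t next = 0;
  return loop(
      [&]() { return samples[next++]; },
      [&](double usage) -> std::optional<std::size_t> {
        m.checkDiskUsage(usage);
        if (next == samples.size()) {
          return next;  // Break: no samples left.
        }
        return std::nullopt;  // Continue with the next sample.
      });
}
```

For example, calling `m.checkDiskUsage(0.95)` once during recovery and then `monitor(m, {0.2, 0.5, 0.95})` records four checks from a single, non-recursive function (requires C++17 for `std::optional`).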


Diffs
-----

  src/slave/slave.cpp 6d1516a5d5b5db684f79385e60d892ff75fd00fd 


Diff: https://reviews.apache.org/r/62336/diff/1/


Testing
-------

sudo make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 62336: Kicked in disk monitoring early during recovery.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62336/#review185469
-----------------------------------------------------------



FAIL: Some Mesos tests failed.

Reviews applied: `['62252', '62230', '62343', '62344', '62336']`

Failed command: `C:\mesos\src\mesos-tests.exe --verbose`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/62336

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/62336/logs/mesos-tests-stdout.log):

```
[ RUN      ] ContentType/SchedulerTest.SchedulerReconnect/0
[       OK ] ContentType/SchedulerTest.SchedulerReconnect/0 (230 ms)
[ RUN      ] ContentType/SchedulerTest.SchedulerReconnect/1
[       OK ] ContentType/SchedulerTest.SchedulerReconnect/1 (248 ms)
[----------] 30 tests from ContentType/SchedulerTest (25595 ms total)

[----------] 2 tests from ContentTypeAndSSLConfig/SchedulerSSLTest
[ RUN      ] ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/0
[       OK ] ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/0 (950 ms)
[ RUN      ] ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/1
[       OK ] ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/1 (1096 ms)
[----------] 2 tests from ContentTypeAndSSLConfig/SchedulerSSLTest (2169 ms total)

[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (138 ms)
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (158 ms)
[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (340 ms total)

[----------] Global test environment tear-down
[==========] 627 tests from 66 test cases ran. (346326 ms total)
[  PASSED  ] 626 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ContentType/MasterAPITest.EventAuthorizationFiltering/1, where GetParam() = application/json

 1 FAILED TEST
  YOU HAVE 174 DISABLED TESTS

```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/62336/logs/mesos-tests-stderr.log):

```
I0915 05:44:03.072856 16404 master.cpp:8418] Removing framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000 (default)
I0915 05:44:03.072856 16404 master.cpp:3267] Deactivating framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000 (default)
I0915 05:44:03.073856 15504 hierarchical.cpp:412] Deactivated framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000
I0915 05:44:03.073856 17008 slave.cpp:3246] Shutting down framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000
I0915 05:44:03.073856 16404 master.cpp:8993] Updating the state of task 89b835d3-f611-4dc3-911f-2c4c43e4a324 of framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0915 05:44:03.086850 17008 slave.cpp:5742] Shutting down executor 'default' of framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000 (via HTTP)
I0915 05:44:03.090852 16404 master.cpp:9087] Removing task 89b835d3-f611-4dc3-911f-2c4c43e4a324 with resources [{"allocation_info":{"role":"*"},"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"allocation_info":{"role":"*"},"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"allocation_info":{"role":"*"},"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"allocation_info":{"role":"*"},"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}] of framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000 on agent 66b909dc-a90d-4311-8431-0d6b99a5345f-S0 at slave(254)@10.3.1.5:60615 (mesos-bld-s1.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0915 05:44:03.111853 16404 master.cpp:9116] Removing executor 'default' with resources [] of framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000 on agent 66b909dc-a90d-4311-8431-0d6b99a5345f-S0 at slave(254)@10.3.1.5:60615 (mesos-bld-s1.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0915 05:44:03.114852 16312 hierarchical.cpp:355] Removed framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000
E0915 05:44:03.115852 18072 scheduler.cpp:649] End-Of-File received from master. The master closed the event stream
I0915 05:44:03.116854 15504 scheduler.cpp:444] Re-detecting master
I0915 05:44:03.119853 17008 scheduler.cpp:470] New master detected at master@10.3.1.5:60615
I0915 05:44:03.138854 17804 slave.cpp:5418] Executor 'default' of framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000 exited with status 0
I0915 05:44:03.138854 17804 slave.cpp:5522] Cleaning up executor 'default' of framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000 (via HTTP)
W0915 05:44:03.139853 16404 master.cpp:7021] Ignoring unknown exited executor 'default' of framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000 on agent 66b909dc-a90d-4311-8431-0d6b99a5345f-S0 at slave(254)@10.3.1.5:60615 (mesos-bld-s1.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0915 05:44:03.152487 16312 gc.cpp:93] Scheduling 'C:\Users\mesos\AppData\Local\Temp\2\fil8Od\slaves\66b909dc-a90d-4311-8431-0d6b99a5345f-S0\frameworks\66b909dc-a90d-4311-8431-0d6b99a5345f-0000\executors\default\runs\1327ab88-0feb-4bce-b14b-bee0af28f108' for gc 6.99998666101926days in the future
I0915 05:44:03.153486 17804 slave.cpp:5618] Cleaning up framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000
I0915 05:44:03.153486 18072 gc.cpp:93] Scheduling 'C:\Users\mesos\AppData\Local\Temp\2\fil8Od\slaves\66b909dc-a90d-4311-8431-0d6b99a5345f-S0\frameworks\66b909dc-a90d-4311-8431-0d6b99a5345f-0000\executors\default' for gc 6.99998664945482days in the future
I0915 05:44:03.154486 17188 status_update_manager.cpp:285] Closing status update streams for framework 66b909dc-a90d-4311-8431-0d6b99a5345f-0000
I0915 05:44:03.155485 17008 gc.cpp:93] Scheduling 'C:\Users\mesos\AppData\Local\Temp\2\fil8Od\slaves\66b909dc-a90d-4311-8431-0d6b99a5345f-S0\frameworks\66b909dc-a90d-4311-8431-0d6b99a5345f-0000' for gc 6.99998663788148days in the future
I0915 05:44:03.155485 17804 slave.cpp:872] Agent terminating
I0915 05:44:03.171486 15504 master.cpp:1321] Agent 66b909dc-a90d-4311-8431-0d6b99a5345f-S0 at slave(254)@10.3.1.5:60615 (mesos-bld-s1.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0915 05:44:03.171486 15504 master.cpp:3304] Disconnecting agent 66b909dc-a90d-4311-8431-0d6b99a5345f-S0 at slave(254)@10.3.1.5:60615 (mesos-bld-s1.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0915 05:44:03.172487 15504 master.cpp:3323] Deactivating agent 66b909dc-a90d-4311-8431-0d6b99a5345f-S0 at slave(254)@10.3.1.5:60615 (mesos-bld-s1.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0915 05:44:03.172487 16404 hierarchical.cpp:690] Agent 66b909dc-a90d-4311-8431-0d6b99a5345f-S0 deactivated
I0915 05:44:03.203493 16364 master.cpp:1163] Master terminating
I0915 05:44:03.207487 16312 hierarchical.cpp:626] Removed agent 66b909dc-a90d-4311-8431-0d6b99a5345f-S0
W0915 05:44:03.219494 16364 master.hpp:2761] Failed to close HTTP pipe for 66b909dc-a90d-4311-8431-0d6b99a5345f-0000 (default)
I0915 05:44:03.934675 16660 process.cpp:1068] Failed to accept socket: future discarded
```

- Mesos Reviewbot Windows


On Sept. 15, 2017, 2 a.m., Chun-Hung Hsiao wrote:
> Diff: https://reviews.apache.org/r/62336/diff/2/


Re: Review Request 62336: Kicked in disk monitoring early during recovery.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62336/#review185496
-----------------------------------------------------------



Patch looks great!

Reviews applied: [62252, 62230, 62343, 62344, 62336]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Sept. 15, 2017, 2 a.m., Chun-Hung Hsiao wrote:
> Diff: https://reviews.apache.org/r/62336/diff/2/


Re: Review Request 62336: Kicked in disk monitoring early during recovery.

Posted by Chun-Hung Hsiao <ch...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62336/
-----------------------------------------------------------

(Updated Sept. 15, 2017, 2 a.m.)


Review request for mesos, Benjamin Mahler, Jie Yu, and Vinod Kone.


Changes
-------

Reworked based on Vinod's feedback.


Summary (updated)
-----------------

Kicked in disk monitoring early during recovery.


Bugs: MESOS-7939
    https://issues.apache.org/jira/browse/MESOS-7939


Repository: mesos


Description (updated)
-------

This patch calls `Slave::checkDiskUsage` after `Slave::recover`, so that
completed executors, which are scheduled for GC by that point, can be
cleaned up to free disk space for the status updates of terminated
executors in `Slave::_recover`.
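The benefit of this ordering can be illustrated with a toy model: a nearly full work directory, completed-executor sandboxes that recovery has scheduled for GC but not yet deleted, and a `_recover` step that needs room to checkpoint status updates. Everything below is a hypothetical simplification, not the agent's code: sizes are in MB, and the age-based pruning is reduced to "delete everything scheduled once free space drops below `--gc_disk_headroom`".

```cpp
#include <vector>

// Toy model of the agent's work directory; sizes are in MB.
struct Disk {
  double capacity;
  double used;

  double usage() const { return used / capacity; }

  // Fails (like ENOSPC) when the write does not fit.
  bool write(double mb) {
    if (used + mb > capacity) {
      return false;
    }
    used += mb;
    return true;
  }
};

// Hypothetical stand-ins for the agent's recovery steps.
struct Agent {
  Disk disk;
  double headroom = 0.1;            // assumed --gc_disk_headroom
  std::vector<double> gcScheduled;  // sandboxes awaiting GC (MB)

  // Like Slave::recover: completed executors' sandboxes are scheduled
  // for GC, but they still occupy disk space.
  void recover(const std::vector<double>& completedSandboxes) {
    gcScheduled = completedSandboxes;
  }

  // Simplified Slave::checkDiskUsage: once free space falls below the
  // headroom, every scheduled sandbox is deleted right away.
  void checkDiskUsage() {
    if (disk.usage() >= 1.0 - headroom) {
      for (double mb : gcScheduled) {
        disk.used -= mb;
      }
      gcScheduled.clear();
    }
  }

  // Like Slave::_recover: checkpointing status updates of terminated
  // executors needs free space.
  bool _recover(double statusUpdatesMb) {
    return disk.write(statusUpdatesMb);
  }
};

// Recovers a 99%-full agent; `checkEarly` toggles the early disk check.
bool runRecovery(bool checkEarly) {
  Agent agent{Disk{1000.0, 990.0}};
  agent.recover({200.0, 150.0});
  if (checkEarly) {
    agent.checkDiskUsage();
  }
  return agent._recover(50.0);
}
```

In this model, without the early check the 50 MB write fails because the prunable sandboxes still occupy the disk; with it, 350 MB is freed before `_recover` runs.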


Diffs (updated)
-----

  src/slave/slave.hpp 7d07868451e93d34ba694d40216c1e4036fd4094 
  src/slave/slave.cpp 6d1516a5d5b5db684f79385e60d892ff75fd00fd 


Diff: https://reviews.apache.org/r/62336/diff/2/

Changes: https://reviews.apache.org/r/62336/diff/1-2/


Testing
-------

sudo make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 62336: [WIP] Kicked in disk monitoring early during recovery.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62336/#review185459
-----------------------------------------------------------



FAIL: Some Mesos tests failed.

Reviews applied: `['62336']`

Failed command: `C:\mesos\src\mesos-tests.exe --verbose`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/62336

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/62336/logs/mesos-tests-stdout.log):

```
[ RUN      ] ContentType/SchedulerTest.SchedulerReconnect/0
[       OK ] ContentType/SchedulerTest.SchedulerReconnect/0 (247 ms)
[ RUN      ] ContentType/SchedulerTest.SchedulerReconnect/1
[       OK ] ContentType/SchedulerTest.SchedulerReconnect/1 (259 ms)
[----------] 30 tests from ContentType/SchedulerTest (25400 ms total)

[----------] 2 tests from ContentTypeAndSSLConfig/SchedulerSSLTest
[ RUN      ] ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/0
[       OK ] ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/0 (942 ms)
[ RUN      ] ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/1
[       OK ] ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/1 (1056 ms)
[----------] 2 tests from ContentTypeAndSSLConfig/SchedulerSSLTest (2088 ms total)

[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (136 ms)
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (151 ms)
[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (343 ms total)

[----------] Global test environment tear-down
[==========] 627 tests from 66 test cases ran. (346585 ms total)
[  PASSED  ] 626 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ContentType/MasterAPITest.EventAuthorizationFiltering/1, where GetParam() = application/json

 1 FAILED TEST
  YOU HAVE 174 DISABLED TESTS

```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/62336/logs/mesos-tests-stderr.log):

```
I0914 23:44:16.061240 17112 master.cpp:8418] Removing framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000 (default)
I0914 23:44:16.061240 17112 master.cpp:3267] Deactivating framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000 (default)
I0914 23:44:16.062239 13412 hierarchical.cpp:412] Deactivated framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000
I0914 23:44:16.062239 16704 slave.cpp:3241] Shutting down framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000
I0914 23:44:16.062239 17112 master.cpp:8993] Updating the state of task 3e8e92e0-5e09-4e4e-ae7d-93afa841d270 of framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0914 23:44:16.063241 16704 slave.cpp:5737] Shutting down executor 'default' of framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000 (via HTTP)
I0914 23:44:16.083897 17112 master.cpp:9087] Removing task 3e8e92e0-5e09-4e4e-ae7d-93afa841d270 with resources [{"allocation_info":{"role":"*"},"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"allocation_info":{"role":"*"},"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"allocation_info":{"role":"*"},"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"allocation_info":{"role":"*"},"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}] of framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000 on agent 07ade0c7-98d5-4a53-a737-74c7275f434a-S0 at slave(254)@10.3.1.5:58444 (mesos-bld-s1.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0914 23:44:16.094899 17112 master.cpp:9116] Removing executor 'default' with resources [] of framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000 on agent 07ade0c7-98d5-4a53-a737-74c7275f434a-S0 at slave(254)@10.3.1.5:58444 (mesos-bld-s1.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0914 23:44:16.106900 16704 hierarchical.cpp:355] Removed framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000
E0914 23:44:16.106900 17236 scheduler.cpp:649] End-Of-File received from master. The master closed the event stream
I0914 23:44:16.108899 16564 scheduler.cpp:444] Re-detecting master
I0914 23:44:16.115906 16564 scheduler.cpp:470] New master detected at master@10.3.1.5:58444
I0914 23:44:16.129900 16012 slave.cpp:5413] Executor 'default' of framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000 exited with status 0
I0914 23:44:16.130900 16012 slave.cpp:5517] Cleaning up executor 'default' of framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000 (via HTTP)
W0914 23:44:16.130900 17112 master.cpp:7021] Ignoring unknown exited executor 'default' of framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000 on agent 07ade0c7-98d5-4a53-a737-74c7275f434a-S0 at slave(254)@10.3.1.5:58444 (mesos-bld-s1.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0914 23:44:16.131901 16564 gc.cpp:91] Scheduling 'C:\Users\mesos\AppData\Local\Temp\2\HVA3Fc\slaves\07ade0c7-98d5-4a53-a737-74c7275f434a-S0\frameworks\07ade0c7-98d5-4a53-a737-74c7275f434a-0000\executors\default\runs\f6c2d656-53ec-4d0b-9e44-998cb46ff00f' for gc 6.99998689928296days in the future
I0914 23:44:16.134902 16012 slave.cpp:5613] Cleaning up framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000
I0914 23:44:16.140902 17376 status_update_manager.cpp:285] Closing status update streams for framework 07ade0c7-98d5-4a53-a737-74c7275f434a-0000
I0914 23:44:16.140902 16564 gc.cpp:91] Scheduling 'C:\Users\mesos\AppData\Local\Temp\2\HVA3Fc\slaves\07ade0c7-98d5-4a53-a737-74c7275f434a-S0\frameworks\07ade0c7-98d5-4a53-a737-74c7275f434a-0000\executors\default' for gc 6.99998687613333days in the future
I0914 23:44:16.140902 16564 gc.cpp:91] Scheduling 'C:\Users\mesos\AppData\Local\Temp\2\HVA3Fc\slaves\07ade0c7-98d5-4a53-a737-74c7275f434a-S0\frameworks\07ade0c7-98d5-4a53-a737-74c7275f434a-0000' for gc 6.99998679510519days in the future
I0914 23:44:16.141901 16012 slave.cpp:867] Agent terminating
I0914 23:44:16.149586 17236 master.cpp:1321] Agent 07ade0c7-98d5-4a53-a737-74c7275f434a-S0 at slave(254)@10.3.1.5:58444 (mesos-bld-s1.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0914 23:44:16.150491 17236 master.cpp:3304] Disconnecting agent 07ade0c7-98d5-4a53-a737-74c7275f434a-S0 at slave(254)@10.3.1.5:58444 (mesos-bld-s1.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0914 23:44:16.150491 17236 master.cpp:3323] Deactivating agent 07ade0c7-98d5-4a53-a737-74c7275f434a-S0 at slave(254)@10.3.1.5:58444 (mesos-bld-s1.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0914 23:44:16.151501 13412 hierarchical.cpp:690] Agent 07ade0c7-98d5-4a53-a737-74c7275f434a-S0 deactivated
I0914 23:44:16.178504 15748 master.cpp:1163] Master terminating
I0914 23:44:16.182503 17376 hierarchical.cpp:626] Removed agent 07ade0c7-98d5-4a53-a737-74c7275f434a-S0
W0914 23:44:16.186509 15748 master.hpp:2761] Failed to close HTTP pipe for 07ade0c7-98d5-4a53-a737-74c7275f434a-0000 (default)
I0914 23:44:16.889554 15044 process.cpp:1068] Failed to accept socket: future discarded
```

- Mesos Reviewbot Windows


On Sept. 14, 2017, 7:17 p.m., Chun-Hung Hsiao wrote:
> Diff: https://reviews.apache.org/r/62336/diff/1/


Re: Review Request 62336: [WIP] Kicked in disk monitoring early during recovery.

Posted by Chun-Hung Hsiao <ch...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62336/
-----------------------------------------------------------

(Updated Sept. 14, 2017, 7:17 p.m.)


Review request for mesos, Benjamin Mahler, Jie Yu, and Vinod Kone.


Summary (updated)
-----------------

[WIP] Kicked in disk monitoring early during recovery.


Bugs: MESOS-7939
    https://issues.apache.org/jira/browse/MESOS-7939


Repository: mesos


Description
-------

This patch calls `Slave::checkDiskUsage` after `Slave::recoverFramework`,
so that terminated containers have already been scheduled for GC, but
before anything is checkpointed, so that there is enough disk space for
checkpointing.

To avoid introducing a separate one-time disk monitoring function,
`Slave::checkDiskUsage` no longer reschedules itself through a delayed
recursive call. Instead, we use `process::loop` to monitor disk usage
periodically.


Diffs
-----

  src/slave/slave.cpp 6d1516a5d5b5db684f79385e60d892ff75fd00fd 


Diff: https://reviews.apache.org/r/62336/diff/1/


Testing
-------

sudo make check


Thanks,

Chun-Hung Hsiao