You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2013/03/04 22:51:12 UTC
Re: Review Request: Added registration timeout for executors.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8077/
-----------------------------------------------------------
(Updated March 4, 2013, 9:51 p.m.)
Review request for mesos, Benjamin Hindman and Ben Mahler.
Changes
-------
Rebased off of "Properly cleanup cgroups" https://reviews.apache.org/r/9408/.
Summary (updated)
-----------------
Added registration timeout for executors.
Description
-------
This paves the way for a slave to shutdown executors that are not responsive during launch (e.g. HDFS hang during fetching executor).
This fix is not enough for cgroups isolation module (enough for the process based isolation module) because it currently blocks while launching an executor.
This addresses bug MESOS-310.
https://issues.apache.org/jira/browse/MESOS-310
Diffs (updated)
-----
src/master/hierarchical_allocator_process.hpp 33e059c10bb87db88b1fb9a100b7504bc89ac896
src/master/master.cpp 814a6e1af2c7caf77452748dfc9e6935d8386d8f
src/slave/cgroups_isolation_module.hpp a04fc46b15d2741886f5847cadbdc9bed351c588
src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4
src/slave/constants.cpp ecb96fd42afdb74aca394411c6ff60276f509cfd
src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7
src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8
src/slave/reaper.cpp c0ee4b4c07fd792bcb39455b666808b712eb32c2
src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47
src/slave/slave.cpp c13d268c5e0e107902f64e30304a18128927a571
src/tests/master_tests.cpp 104610cef3fdaaea75271657c42c4fd2b6b61618
third_party/libprocess/include/process/delay.hpp cb2fa9ab372e62befec01a95b3ca401567a1a626
Diff: https://reviews.apache.org/r/8077/diff/
Testing
-------
make check
Thanks,
Vinod Kone
Re: Review Request: Added registration timeout for executors.
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8077/#review18109
-----------------------------------------------------------
Ship it!
src/slave/constants.cpp
<https://reviews.apache.org/r/8077/#comment38227>
plumb it through flags.
src/slave/slave.cpp
<https://reviews.apache.org/r/8077/#comment38228>
kill comment
src/slave/slave.cpp
<https://reviews.apache.org/r/8077/#comment38229>
kill comment
src/slave/slave.cpp
<https://reviews.apache.org/r/8077/#comment38230>
s/WARNING/INFO
rephrase
src/slave/slave.cpp
<https://reviews.apache.org/r/8077/#comment38231>
flags.
- Vinod Kone
On March 16, 2013, 1:49 a.m., Vinod Kone wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8077/
> -----------------------------------------------------------
>
> (Updated March 16, 2013, 1:49 a.m.)
>
>
> Review request for mesos, Benjamin Hindman and Ben Mahler.
>
>
> Description
> -------
>
> This paves the way for a slave to shutdown executors that are not responsive during launch (e.g. HDFS hang during fetching executor, executor process hung after fork).
>
>
> This addresses bug MESOS-310.
> https://issues.apache.org/jira/browse/MESOS-310
>
>
> Diffs
> -----
>
> src/master/hierarchical_allocator_process.hpp c1d6f54fb0c93d255f95ee85e1305022db1ce3cc
> src/master/master.cpp 87bbfac692c3f37f4ef1112b1a44e2afa01296d8
> src/slave/cgroups_isolator.hpp 049d50e306d936708d2bde72b1f16f8accaf9a13
> src/slave/cgroups_isolator.cpp c06baecac5a89886673a36d3dad934e06a577875
> src/slave/constants.hpp 02c45398cbc2686de22750c38f69761e0fccb24c
> src/slave/constants.cpp 8ef3642d1624db71f40a07be86373f44afa93f21
> src/slave/process_isolator.cpp e719a0711157f37cf87b92127fd4f50287b1cf21
> src/slave/slave.hpp dadd06850565d07b46cea2f581661fb57a121118
> src/slave/slave.cpp 912499e4cb34afa9bcc1556fd9536581001933ca
> src/tests/master_tests.cpp 9d13ebe2f1d3cbc1d39fd8bdd5060a15e2767eb0
> third_party/libprocess/include/process/delay.hpp cb2fa9ab372e62befec01a95b3ca401567a1a626
>
> Diff: https://reviews.apache.org/r/8077/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Vinod Kone
>
>
Re: Review Request: Added registration timeout for executors.
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8077/
-----------------------------------------------------------
(Updated March 20, 2013, 1:17 a.m.)
Review request for mesos, Benjamin Hindman and Ben Mahler.
Changes
-------
benh's comments. no need for further review, but need a shipit.
Description
-------
This paves the way for a slave to shutdown executors that are not responsive during launch (e.g. HDFS hang during fetching executor, executor process hung after fork).
This addresses bug MESOS-310.
https://issues.apache.org/jira/browse/MESOS-310
Diffs (updated)
-----
src/master/hierarchical_allocator_process.hpp c1d6f54fb0c93d255f95ee85e1305022db1ce3cc
src/master/master.cpp 87bbfac692c3f37f4ef1112b1a44e2afa01296d8
src/slave/cgroups_isolator.hpp 049d50e306d936708d2bde72b1f16f8accaf9a13
src/slave/cgroups_isolator.cpp c06baecac5a89886673a36d3dad934e06a577875
src/slave/constants.hpp 02c45398cbc2686de22750c38f69761e0fccb24c
src/slave/constants.cpp 8ef3642d1624db71f40a07be86373f44afa93f21
src/slave/flags.hpp 546edb0c34ca59e2d84c4ba8a35a66d1411422bd
src/slave/process_isolator.cpp e719a0711157f37cf87b92127fd4f50287b1cf21
src/slave/slave.hpp dadd06850565d07b46cea2f581661fb57a121118
src/slave/slave.cpp 912499e4cb34afa9bcc1556fd9536581001933ca
src/tests/master_tests.cpp 9d13ebe2f1d3cbc1d39fd8bdd5060a15e2767eb0
third_party/libprocess/include/process/delay.hpp cb2fa9ab372e62befec01a95b3ca401567a1a626
Diff: https://reviews.apache.org/r/8077/diff/
Testing
-------
make check
Thanks,
Vinod Kone
Re: Review Request: Added registration timeout for executors.
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8077/
-----------------------------------------------------------
(Updated March 16, 2013, 1:49 a.m.)
Review request for mesos, Benjamin Hindman and Ben Mahler.
Changes
-------
rebased off trunk. ready for review!
also s/reason/message/ in cgroups isolator per discussion with benh.
Description (updated)
-------
This paves the way for a slave to shutdown executors that are not responsive during launch (e.g. HDFS hang during fetching executor, executor process hung after fork).
This addresses bug MESOS-310.
https://issues.apache.org/jira/browse/MESOS-310
Diffs (updated)
-----
src/master/hierarchical_allocator_process.hpp c1d6f54fb0c93d255f95ee85e1305022db1ce3cc
src/master/master.cpp 87bbfac692c3f37f4ef1112b1a44e2afa01296d8
src/slave/cgroups_isolator.hpp 049d50e306d936708d2bde72b1f16f8accaf9a13
src/slave/cgroups_isolator.cpp c06baecac5a89886673a36d3dad934e06a577875
src/slave/constants.hpp 02c45398cbc2686de22750c38f69761e0fccb24c
src/slave/constants.cpp 8ef3642d1624db71f40a07be86373f44afa93f21
src/slave/process_isolator.cpp e719a0711157f37cf87b92127fd4f50287b1cf21
src/slave/slave.hpp dadd06850565d07b46cea2f581661fb57a121118
src/slave/slave.cpp 912499e4cb34afa9bcc1556fd9536581001933ca
src/tests/master_tests.cpp 9d13ebe2f1d3cbc1d39fd8bdd5060a15e2767eb0
third_party/libprocess/include/process/delay.hpp cb2fa9ab372e62befec01a95b3ca401567a1a626
Diff: https://reviews.apache.org/r/8077/diff/
Testing
-------
make check
Thanks,
Vinod Kone
Re: Review Request: Added registration timeout for executors.
Posted by Benjamin Mahler <bm...@twitter.com>.
My 'ship it' stands, unless there's anything else you think needs more
reviewing.
On Mon, Mar 4, 2013 at 1:51 PM, Vinod Kone <vi...@gmail.com> wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8077/
> -----------------------------------------------------------
>
> (Updated March 4, 2013, 9:51 p.m.)
>
>
> Review request for mesos, Benjamin Hindman and Ben Mahler.
>
>
> Changes
> -------
>
> Rebased off of "Properly cleanup cgroups" https://reviews.apache.org/r/9408/.
>
>
> Summary (updated)
> -----------------
>
> Added registration timeout for executors.
>
>
> Description
> -------
>
> This paves the way for a slave to shutdown executors that are not responsive during launch (e.g. HDFS hang during fetching executor).
>
> This fix is not enough for cgroups isolation module (enough for the process based isolation module) because it currently blocks while launching an executor.
>
>
> This addresses bug MESOS-310.
> https://issues.apache.org/jira/browse/MESOS-310
>
>
> Diffs (updated)
> -----
>
> src/master/hierarchical_allocator_process.hpp 33e059c10bb87db88b1fb9a100b7504bc89ac896
> src/master/master.cpp 814a6e1af2c7caf77452748dfc9e6935d8386d8f
> src/slave/cgroups_isolation_module.hpp a04fc46b15d2741886f5847cadbdc9bed351c588
> src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4
> src/slave/constants.cpp ecb96fd42afdb74aca394411c6ff60276f509cfd
> src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7
> src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8
> src/slave/reaper.cpp c0ee4b4c07fd792bcb39455b666808b712eb32c2
> src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47
> src/slave/slave.cpp c13d268c5e0e107902f64e30304a18128927a571
> src/tests/master_tests.cpp 104610cef3fdaaea75271657c42c4fd2b6b61618
> third_party/libprocess/include/process/delay.hpp cb2fa9ab372e62befec01a95b3ca401567a1a626
>
> Diff: https://reviews.apache.org/r/8077/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Vinod Kone
>