Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/08/24 04:36:14 UTC
Review Request 13791: Added a recovery timeout for executor driver self-termination.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13791/
-----------------------------------------------------------
Review request for mesos, Benjamin Hindman and Vinod Kone.
Repository: mesos-git
Description
-------
Before slave recovery was introduced, the executor driver self-terminated upon disconnection from the slave.
With slave recovery, the executor driver waits indefinitely after disconnecting from the slave.
This change adds a timeout (default: 15 minutes) that the executor driver waits before self-terminating. The slave therefore now has a time limit on how long it can stay down before its executor drivers self-terminate. The timeout is configurable via a slave flag.
Note that this timeout is essential when using the process isolator (where it's possible for processes to escape) and when the slave never comes back up.
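The timeout logic can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in src/exec/exec.cpp: the class `RecoveryTimeout` and its methods `disconnected()`, `reregistered()` and `onTimeout()` are hypothetical names, and the per-disconnection token used to discard stale timers is an assumption of this sketch (the actual change may use a different mechanism). Only the 15-minute default comes from the description above; how the slave's flag value reaches the executor is not shown.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <iostream>

// Hypothetical sketch of an executor-side recovery timeout. The class and
// method names are illustrative only; they are not taken from Mesos.
class RecoveryTimeout {
 public:
  // The default mirrors the 15 minutes stated in the review request; the
  // review says the real value is configurable via a slave flag.
  explicit RecoveryTimeout(
      std::chrono::seconds timeout = std::chrono::minutes(15))
      : timeout_(timeout) {
    assert(timeout_.count() > 0);
  }

  std::chrono::seconds timeout() const { return timeout_; }

  // The connection to the slave was lost. Returns a token that the caller
  // hands back to onTimeout() once timeout() has elapsed, e.g. from a
  // delayed callback scheduled on the driver's event loop (assumed).
  std::uint64_t disconnected() {
    connected_ = false;
    return ++generation_;
  }

  // The slave recovered and the executor re-registered with it.
  void reregistered() { connected_ = true; }

  // Called when a timer armed by disconnected() fires. Returns true (and
  // marks the driver as terminated) only if the driver is still
  // disconnected and no later disconnection has re-armed the timer.
  bool onTimeout(std::uint64_t token) {
    if (connected_ || token != generation_) {
      return false;  // Stale timer: the slave came back, or a newer timer exists.
    }
    std::cerr << "Recovery timeout of " << timeout_.count()
              << "s exceeded; self-terminating" << std::endl;
    terminated_ = true;
    return true;
  }

  bool terminated() const { return terminated_; }

 private:
  std::chrono::seconds timeout_;
  std::uint64_t generation_ = 0;
  bool connected_ = true;
  bool terminated_ = false;
};
```

The token matters when the slave comes back, the executor re-registers, and the slave then goes down again: the timer armed during the first outage may fire partway through the second one, and without the check the driver would self-terminate before the full timeout had elapsed for the latest disconnection.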
Diffs
-----
src/exec/exec.cpp ca61892127cd5f977658bbbf3a67cfa82d12dddf
src/launcher/launcher.hpp 637c9bcdfd9c3ee4c071cc46ba8fd274a06873cf
src/launcher/launcher.cpp 004d90e4a21aa9c96a115327c98c3a949eee57c2
src/launcher/main.cpp 5674afb7eeded167af97a953d174f4045860a4c8
src/slave/cgroups_isolator.cpp d4ccd114bdcafcaff2e5b12b3881e46daa46f932
src/slave/constants.hpp 901fdf220a902de9241511393530eb19fdfc3244
src/slave/constants.cpp e8d16ca3307249a8b49720eaf8dcb0e7555fca7a
src/slave/flags.hpp 616be9b3ecb6fc4165be99da580a1c9876e51d81
src/slave/process_isolator.cpp 24a7fb2be63003d50b24848061cc7be313319eb9
src/tests/slave_recovery_tests.cpp 548e8c09875e4d911ac626b15a2556ff4fd8ff4b
Diff: https://reviews.apache.org/r/13791/diff/
Testing
-------
Added a test to verify self-termination.
Thanks,
Ben Mahler
Re: Review Request 13791: Added a recovery timeout for executor driver self-termination.
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13791/#review25530
-----------------------------------------------------------
Ship it!
src/exec/exec.cpp
<https://reviews.apache.org/r/13791/#comment49979>
s/registrations/re-registrations/
src/exec/exec.cpp
<https://reviews.apache.org/r/13791/#comment49983>
A log statement here would be great!
src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/13791/#comment49980>
s/ack/_statusUpdateAcknowledgement/
src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/13791/#comment49982>
s/recover/_recover/
- Vinod Kone
Re: Review Request 13791: Added a recovery timeout for executor driver self-termination.
Posted by Ben Mahler <be...@gmail.com>.
(Updated Aug. 26, 2013, 7:48 p.m.)
Review request for mesos, Benjamin Hindman and Vinod Kone.
Changes
-------
Vinod's review.
Diffs (updated)
-----
src/exec/exec.cpp ca61892127cd5f977658bbbf3a67cfa82d12dddf
src/launcher/launcher.hpp 637c9bcdfd9c3ee4c071cc46ba8fd274a06873cf
src/launcher/launcher.cpp 004d90e4a21aa9c96a115327c98c3a949eee57c2
src/launcher/main.cpp 5674afb7eeded167af97a953d174f4045860a4c8
src/slave/cgroups_isolator.cpp d4ccd114bdcafcaff2e5b12b3881e46daa46f932
src/slave/constants.hpp 901fdf220a902de9241511393530eb19fdfc3244
src/slave/constants.cpp e8d16ca3307249a8b49720eaf8dcb0e7555fca7a
src/slave/flags.hpp 616be9b3ecb6fc4165be99da580a1c9876e51d81
src/slave/process_isolator.cpp 24a7fb2be63003d50b24848061cc7be313319eb9
src/tests/slave_recovery_tests.cpp 548e8c09875e4d911ac626b15a2556ff4fd8ff4b
Diff: https://reviews.apache.org/r/13791/diff/
Testing
-------
Added a test to verify self-termination.
Thanks,
Ben Mahler