Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/08/24 04:36:14 UTC

Review Request 13791: Added a recovery timeout for executor driver self-termination.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13791/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Vinod Kone.


Repository: mesos-git


Description
-------

Before slave recovery, the executor driver self-terminated upon disconnection from the slave.

With slave recovery, the executor driver waits forever upon disconnection from the slave.

This change adds a timeout (default: 15 minutes) that bounds how long the executor driver waits before self-terminating. In other words, the slave now has a limit on how long it can stay down before its executor drivers self-terminate. The timeout is configurable via a flag.

Note that this timeout is essential when using the process isolator (where processes can escape) and when the slave never comes back up.
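
As a rough illustration of the behavior described above, here is a minimal C++ sketch of the timeout logic. It is not the code from this patch: the class `RecoveryTimer` and its methods are hypothetical names, and only the 15-minute default is taken from the description. Time is passed in explicitly so the logic can be exercised without actually waiting.

```cpp
#include <chrono>

// Hypothetical model, NOT Mesos code: tracks how long an executor driver
// has been disconnected from the slave and decides when to give up.
class RecoveryTimer {
public:
  using Clock = std::chrono::steady_clock;

  // The 15-minute default mirrors the default stated in the description.
  explicit RecoveryTimer(
      std::chrono::seconds timeout = std::chrono::minutes(15))
    : timeout_(timeout), disconnected_(false), since_() {}

  // Called when the connection to the slave is lost. Only the first call
  // after a (re-)connection records the time, so repeated notifications
  // do not extend the deadline.
  void disconnected(Clock::time_point now) {
    if (!disconnected_) {
      disconnected_ = true;
      since_ = now;
    }
  }

  // Called when the slave comes back and the executor re-registers.
  void reconnected() { disconnected_ = false; }

  // True once the slave has been away for at least the timeout.
  bool shouldTerminate(Clock::time_point now) const {
    return disconnected_ && (now - since_) >= timeout_;
  }

private:
  std::chrono::seconds timeout_;
  bool disconnected_;
  Clock::time_point since_;
};
```

A driver built along these lines would call `disconnected()` when the link to the slave drops, call `reconnected()` on re-registration, and check `shouldTerminate()` from a periodic timer, exiting when it returns true.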


Diffs
-----

  src/exec/exec.cpp ca61892127cd5f977658bbbf3a67cfa82d12dddf 
  src/launcher/launcher.hpp 637c9bcdfd9c3ee4c071cc46ba8fd274a06873cf 
  src/launcher/launcher.cpp 004d90e4a21aa9c96a115327c98c3a949eee57c2 
  src/launcher/main.cpp 5674afb7eeded167af97a953d174f4045860a4c8 
  src/slave/cgroups_isolator.cpp d4ccd114bdcafcaff2e5b12b3881e46daa46f932 
  src/slave/constants.hpp 901fdf220a902de9241511393530eb19fdfc3244 
  src/slave/constants.cpp e8d16ca3307249a8b49720eaf8dcb0e7555fca7a 
  src/slave/flags.hpp 616be9b3ecb6fc4165be99da580a1c9876e51d81 
  src/slave/process_isolator.cpp 24a7fb2be63003d50b24848061cc7be313319eb9 
  src/tests/slave_recovery_tests.cpp 548e8c09875e4d911ac626b15a2556ff4fd8ff4b 

Diff: https://reviews.apache.org/r/13791/diff/


Testing
-------

Added a test to verify self-termination.


Thanks,

Ben Mahler


Re: Review Request 13791: Added a recovery timeout for executor driver self-termination.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13791/#review25530
-----------------------------------------------------------

Ship it!



src/exec/exec.cpp
<https://reviews.apache.org/r/13791/#comment49979>

    s/registrations/re-registrations/



src/exec/exec.cpp
<https://reviews.apache.org/r/13791/#comment49983>

    A log statement here would be great!



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/13791/#comment49980>

    s/ack/_statusUpdateAcknowledgement/



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/13791/#comment49982>

    s/recover/_recover/


- Vinod Kone




Re: Review Request 13791: Added a recovery timeout for executor driver self-termination.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13791/
-----------------------------------------------------------

(Updated Aug. 26, 2013, 7:48 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Vinod's review.


Repository: mesos-git


Description
-------

Before slave recovery, the executor driver self-terminated upon disconnection from the slave.

With slave recovery, the executor driver waits forever upon disconnection from the slave.

This change adds a timeout (default: 15 minutes) that bounds how long the executor driver waits before self-terminating. In other words, the slave now has a limit on how long it can stay down before its executor drivers self-terminate. The timeout is configurable via a flag.

Note that this timeout is essential when using the process isolator (where processes can escape) and when the slave never comes back up.


Diffs (updated)
-----

  src/exec/exec.cpp ca61892127cd5f977658bbbf3a67cfa82d12dddf 
  src/launcher/launcher.hpp 637c9bcdfd9c3ee4c071cc46ba8fd274a06873cf 
  src/launcher/launcher.cpp 004d90e4a21aa9c96a115327c98c3a949eee57c2 
  src/launcher/main.cpp 5674afb7eeded167af97a953d174f4045860a4c8 
  src/slave/cgroups_isolator.cpp d4ccd114bdcafcaff2e5b12b3881e46daa46f932 
  src/slave/constants.hpp 901fdf220a902de9241511393530eb19fdfc3244 
  src/slave/constants.cpp e8d16ca3307249a8b49720eaf8dcb0e7555fca7a 
  src/slave/flags.hpp 616be9b3ecb6fc4165be99da580a1c9876e51d81 
  src/slave/process_isolator.cpp 24a7fb2be63003d50b24848061cc7be313319eb9 
  src/tests/slave_recovery_tests.cpp 548e8c09875e4d911ac626b15a2556ff4fd8ff4b 

Diff: https://reviews.apache.org/r/13791/diff/


Testing
-------

Added a test to verify self-termination.


Thanks,

Ben Mahler