You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2013/03/04 22:51:12 UTC

Re: Review Request: Added registration timeout for executors.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8077/
-----------------------------------------------------------

(Updated March 4, 2013, 9:51 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Rebased off of "Properly cleanup cgroups" https://reviews.apache.org/r/9408/.


Summary (updated)
-----------------

Added registration timeout for executors.


Description
-------

This paves the way for a slave to shutdown executors that are not responsive during launch (e.g. HDFS hang during fetching executor).

This fix is not enough for cgroups isolation module (enough for the process based isolation module) because it currently blocks while launching an executor.


This addresses bug MESOS-310.
    https://issues.apache.org/jira/browse/MESOS-310


Diffs (updated)
-----

  src/master/hierarchical_allocator_process.hpp 33e059c10bb87db88b1fb9a100b7504bc89ac896 
  src/master/master.cpp 814a6e1af2c7caf77452748dfc9e6935d8386d8f 
  src/slave/cgroups_isolation_module.hpp a04fc46b15d2741886f5847cadbdc9bed351c588 
  src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
  src/slave/constants.cpp ecb96fd42afdb74aca394411c6ff60276f509cfd 
  src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7 
  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
  src/slave/reaper.cpp c0ee4b4c07fd792bcb39455b666808b712eb32c2 
  src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47 
  src/slave/slave.cpp c13d268c5e0e107902f64e30304a18128927a571 
  src/tests/master_tests.cpp 104610cef3fdaaea75271657c42c4fd2b6b61618 
  third_party/libprocess/include/process/delay.hpp cb2fa9ab372e62befec01a95b3ca401567a1a626 

Diff: https://reviews.apache.org/r/8077/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Added registration timeout for executors.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8077/#review18109
-----------------------------------------------------------

Ship it!



src/slave/constants.cpp
<https://reviews.apache.org/r/8077/#comment38227>

    plumb it through flags.



src/slave/slave.cpp
<https://reviews.apache.org/r/8077/#comment38228>

    kill comment



src/slave/slave.cpp
<https://reviews.apache.org/r/8077/#comment38229>

    kill comment



src/slave/slave.cpp
<https://reviews.apache.org/r/8077/#comment38230>

    s/WARNING/INFO
    
    rephrase



src/slave/slave.cpp
<https://reviews.apache.org/r/8077/#comment38231>

    flags.


- Vinod Kone


On March 16, 2013, 1:49 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8077/
> -----------------------------------------------------------
> 
> (Updated March 16, 2013, 1:49 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This paves the way for a slave to shutdown executors that are not responsive during launch (e.g. HDFS hang during fetching executor, executor process hung after fork).
> 
> 
> This addresses bug MESOS-310.
>     https://issues.apache.org/jira/browse/MESOS-310
> 
> 
> Diffs
> -----
> 
>   src/master/hierarchical_allocator_process.hpp c1d6f54fb0c93d255f95ee85e1305022db1ce3cc 
>   src/master/master.cpp 87bbfac692c3f37f4ef1112b1a44e2afa01296d8 
>   src/slave/cgroups_isolator.hpp 049d50e306d936708d2bde72b1f16f8accaf9a13 
>   src/slave/cgroups_isolator.cpp c06baecac5a89886673a36d3dad934e06a577875 
>   src/slave/constants.hpp 02c45398cbc2686de22750c38f69761e0fccb24c 
>   src/slave/constants.cpp 8ef3642d1624db71f40a07be86373f44afa93f21 
>   src/slave/process_isolator.cpp e719a0711157f37cf87b92127fd4f50287b1cf21 
>   src/slave/slave.hpp dadd06850565d07b46cea2f581661fb57a121118 
>   src/slave/slave.cpp 912499e4cb34afa9bcc1556fd9536581001933ca 
>   src/tests/master_tests.cpp 9d13ebe2f1d3cbc1d39fd8bdd5060a15e2767eb0 
>   third_party/libprocess/include/process/delay.hpp cb2fa9ab372e62befec01a95b3ca401567a1a626 
> 
> Diff: https://reviews.apache.org/r/8077/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Added registration timeout for executors.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8077/
-----------------------------------------------------------

(Updated March 20, 2013, 1:17 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benh's comments. no need for further review, but need a shipit.


Description
-------

This paves the way for a slave to shutdown executors that are not responsive during launch (e.g. HDFS hang during fetching executor, executor process hung after fork).


This addresses bug MESOS-310.
    https://issues.apache.org/jira/browse/MESOS-310


Diffs (updated)
-----

  src/master/hierarchical_allocator_process.hpp c1d6f54fb0c93d255f95ee85e1305022db1ce3cc 
  src/master/master.cpp 87bbfac692c3f37f4ef1112b1a44e2afa01296d8 
  src/slave/cgroups_isolator.hpp 049d50e306d936708d2bde72b1f16f8accaf9a13 
  src/slave/cgroups_isolator.cpp c06baecac5a89886673a36d3dad934e06a577875 
  src/slave/constants.hpp 02c45398cbc2686de22750c38f69761e0fccb24c 
  src/slave/constants.cpp 8ef3642d1624db71f40a07be86373f44afa93f21 
  src/slave/flags.hpp 546edb0c34ca59e2d84c4ba8a35a66d1411422bd 
  src/slave/process_isolator.cpp e719a0711157f37cf87b92127fd4f50287b1cf21 
  src/slave/slave.hpp dadd06850565d07b46cea2f581661fb57a121118 
  src/slave/slave.cpp 912499e4cb34afa9bcc1556fd9536581001933ca 
  src/tests/master_tests.cpp 9d13ebe2f1d3cbc1d39fd8bdd5060a15e2767eb0 
  third_party/libprocess/include/process/delay.hpp cb2fa9ab372e62befec01a95b3ca401567a1a626 

Diff: https://reviews.apache.org/r/8077/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Added registration timeout for executors.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8077/
-----------------------------------------------------------

(Updated March 16, 2013, 1:49 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased off trunk. ready for review!

also s/reason/message/ in cgroups isolator per discussion with benh.


Description (updated)
-------

This paves the way for a slave to shutdown executors that are not responsive during launch (e.g. HDFS hang during fetching executor, executor process hung after fork).


This addresses bug MESOS-310.
    https://issues.apache.org/jira/browse/MESOS-310


Diffs (updated)
-----

  src/master/hierarchical_allocator_process.hpp c1d6f54fb0c93d255f95ee85e1305022db1ce3cc 
  src/master/master.cpp 87bbfac692c3f37f4ef1112b1a44e2afa01296d8 
  src/slave/cgroups_isolator.hpp 049d50e306d936708d2bde72b1f16f8accaf9a13 
  src/slave/cgroups_isolator.cpp c06baecac5a89886673a36d3dad934e06a577875 
  src/slave/constants.hpp 02c45398cbc2686de22750c38f69761e0fccb24c 
  src/slave/constants.cpp 8ef3642d1624db71f40a07be86373f44afa93f21 
  src/slave/process_isolator.cpp e719a0711157f37cf87b92127fd4f50287b1cf21 
  src/slave/slave.hpp dadd06850565d07b46cea2f581661fb57a121118 
  src/slave/slave.cpp 912499e4cb34afa9bcc1556fd9536581001933ca 
  src/tests/master_tests.cpp 9d13ebe2f1d3cbc1d39fd8bdd5060a15e2767eb0 
  third_party/libprocess/include/process/delay.hpp cb2fa9ab372e62befec01a95b3ca401567a1a626 

Diff: https://reviews.apache.org/r/8077/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Added registration timeout for executors.

Posted by Benjamin Mahler <bm...@twitter.com>.
My 'ship it' stands, unless there's anything else you think needs more
reviewing.

On Mon, Mar 4, 2013 at 1:51 PM, Vinod Kone <vi...@gmail.com> wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8077/
> -----------------------------------------------------------
>
> (Updated March 4, 2013, 9:51 p.m.)
>
>
> Review request for mesos, Benjamin Hindman and Ben Mahler.
>
>
> Changes
> -------
>
> Rebased off of "Properly cleanup cgroups" https://reviews.apache.org/r/9408/.
>
>
> Summary (updated)
> -----------------
>
> Added registration timeout for executors.
>
>
> Description
> -------
>
> This paves the way for a slave to shutdown executors that are not responsive during launch (e.g. HDFS hang during fetching executor).
>
> This fix is not enough for cgroups isolation module (enough for the process based isolation module) because it currently blocks while launching an executor.
>
>
> This addresses bug MESOS-310.
>     https://issues.apache.org/jira/browse/MESOS-310
>
>
> Diffs (updated)
> -----
>
>   src/master/hierarchical_allocator_process.hpp 33e059c10bb87db88b1fb9a100b7504bc89ac896
>   src/master/master.cpp 814a6e1af2c7caf77452748dfc9e6935d8386d8f
>   src/slave/cgroups_isolation_module.hpp a04fc46b15d2741886f5847cadbdc9bed351c588
>   src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4
>   src/slave/constants.cpp ecb96fd42afdb74aca394411c6ff60276f509cfd
>   src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8
>   src/slave/reaper.cpp c0ee4b4c07fd792bcb39455b666808b712eb32c2
>   src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47
>   src/slave/slave.cpp c13d268c5e0e107902f64e30304a18128927a571
>   src/tests/master_tests.cpp 104610cef3fdaaea75271657c42c4fd2b6b61618
>   third_party/libprocess/include/process/delay.hpp cb2fa9ab372e62befec01a95b3ca401567a1a626
>
> Diff: https://reviews.apache.org/r/8077/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Vinod Kone
>