Posted to dev@mesos.apache.org by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org> on 2012/03/23 02:18:22 UTC

[jira] [Commented] (MESOS-110) Mesos deploys should not restart tasks

    [ https://issues.apache.org/jira/browse/MESOS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236253#comment-13236253 ] 

jiraposter@reviews.apache.org commented on MESOS-110:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4462/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and John Sirois.


Summary
-------

Sorry for the huge CL!

Slave restarts now support recovery!
--> Non-disruptive restart means running tasks are not lost
--> Re-connects to live executors
--> Checkpoints and reliably sends status updates
--> Ability to kill executors if the slave upgrade is incompatible with running executors
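
To illustrate the "checkpoints and reliably sends status updates" point, here is a minimal sketch of the general idea: persist each update before sending it, persist each acknowledgement, and on restart replay the log to find updates that still need to be resent. This is NOT the code in this review. The Update struct, the UpdateLog class, and the line-based file format are all hypothetical names invented for the example.

```cpp
#include <cassert>
#include <fstream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for illustration; not the Mesos StatusUpdate message.
struct Update {
  std::string id;      // unique id of this update
  std::string taskId;  // task the update refers to
  std::string state;   // e.g. "TASK_RUNNING"
};

// Hypothetical append-only checkpoint log.
// Each line is either "UPDATE <id> <taskId> <state>" or "ACK <id>".
class UpdateLog {
public:
  explicit UpdateLog(const std::string& path) : path_(path) {}

  // Write the update to disk before it is sent, so a restart cannot lose it.
  bool checkpoint(const Update& u) {
    // The format is whitespace-delimited, so fields must not contain spaces.
    assert(u.id.find(' ') == std::string::npos);
    assert(u.taskId.find(' ') == std::string::npos);
    assert(u.state.find(' ') == std::string::npos);
    return append("UPDATE " + u.id + " " + u.taskId + " " + u.state);
  }

  // Record that the receiver acknowledged the update with this id.
  bool acknowledge(const std::string& id) { return append("ACK " + id); }

  // Replay the log and return, in original order, every update that was
  // checkpointed but never acknowledged. These are the ones to resend.
  std::vector<Update> recover() const {
    std::ifstream in(path_);
    std::vector<Update> all;
    std::set<std::string> acked;
    std::string line;
    while (std::getline(in, line)) {
      std::istringstream fields(line);
      std::string kind;
      fields >> kind;
      if (kind == "UPDATE") {
        Update u;
        if (fields >> u.id >> u.taskId >> u.state) all.push_back(u);
      } else if (kind == "ACK") {
        std::string id;
        if (fields >> id) acked.insert(id);
      }
    }
    std::vector<Update> pending;
    for (const Update& u : all) {
      if (acked.count(u.id) == 0) pending.push_back(u);
    }
    return pending;
  }

private:
  // Append one line. flush() only pushes data to the OS; a real
  // implementation would also need fsync() for durability across crashes.
  bool append(const std::string& line) {
    std::ofstream out(path_, std::ios::app);
    if (!out) return false;
    out << line << '\n';
    out.flush();
    return out.good();
  }

  std::string path_;
};
```

In this sketch, a restarted process would construct an UpdateLog on the same path, call recover(), and resend whatever comes back until each update is acknowledged.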


This addresses bug MESOS-110.
    https://issues.apache.org/jira/browse/mesos-110


Diffs
-----

  src/Makefile.am 090c07a 
  src/common/hashset.hpp 1feb610 
  src/common/utils.hpp 015fef8 
  src/exec/exec.cpp 53f7e54 
  src/launcher/launcher.cpp 98a4847 
  src/local/local.hpp 55f9eaf 
  src/local/local.cpp affe432 
  src/master/master.cpp 6ae8aef 
  src/messages/messages.proto 7f9cffe 
  src/sched/sched.cpp 43d9717 
  src/slave/constants.hpp f0c8679 
  src/slave/isolation_module.hpp c896908 
  src/slave/lxc_isolation_module.hpp b7beefe 
  src/slave/lxc_isolation_module.cpp 8c25dd4 
  src/slave/main.cpp ac780c4 
  src/slave/process_based_isolation_module.hpp f6f9554 
  src/slave/process_based_isolation_module.cpp e0f3ee8 
  src/slave/slave.hpp b7bc45a 
  src/slave/slave.cpp 9332caa 
  src/tests/fault_tolerance_tests.cpp 130218d 
  src/tests/slave_restart_tests.cpp PRE-CREATION 
  src/tests/utils.hpp 8c038ce 

Diff: https://reviews.apache.org/r/4462/diff


Testing
-------

make check.

Note that only the new test in tests/slave_restart_tests.cpp exercises recovery!

Recovery is disabled for old tests (though they still checkpoint relevant info!)


Thanks,

Vinod


                
> Mesos deploys should not restart tasks
> --------------------------------------
>
>                 Key: MESOS-110
>                 URL: https://issues.apache.org/jira/browse/MESOS-110
>             Project: Mesos
>          Issue Type: Improvement
>          Components: framework
>            Reporter: Rob Benson
>            Assignee: Vinod Kone
>
> Running a long-lived service on Mesos has a significant drawback right now: deploying a new Mesos build restarts your tasks. This could lead to nontrivial outages for services that have a high warm-up time. Essentially every service would need a graceful restart mechanism that allows a shutdown/restart with a new version of the code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira