Posted to dev@mesos.apache.org by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org> on 2012/04/03 22:58:31 UTC
[jira] [Commented] (MESOS-110) Mesos deploys should not restart tasks
[ https://issues.apache.org/jira/browse/MESOS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245731#comment-13245731 ]
jiraposter@reviews.apache.org commented on MESOS-110:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4462/
-----------------------------------------------------------
(Updated 2012-04-03 20:58:17.982746)
Review request for mesos, Benjamin Hindman and John Sirois.
Changes
-------
merged with trunk
Summary
-------
Sorry for the huge CL!
Slave restarts now support recovery!
--> Non-disruptive restart means running tasks are not lost
--> Re-connects with live executors
--> Checkpoints and reliably sends status updates
--> Ability to kill executors if the slave upgrade is incompatible with running executors
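To illustrate the "checkpoints and reliably sends status updates" item above, here is a minimal sketch of the general idea, not the code in this patch. It assumes a line-oriented checkpoint log in which every status update is recorded before it is forwarded, and every acknowledgement is recorded when it arrives. The names StatusUpdate, UpdateLog and recoverPending, and the record format itself, are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; not the Mesos protobuf message.
struct StatusUpdate {
  std::string taskId;
  std::string state;  // e.g. "TASK_RUNNING"
};

// Appends one whitespace-delimited record per line to a checkpoint log:
//   "UPDATE <taskId> <state>"  when an update arrives from an executor
//   "ACK <taskId> <state>"     when that update has been acknowledged
class UpdateLog {
public:
  explicit UpdateLog(std::ostream& out) : out_(out) {}

  void update(const StatusUpdate& u) { append("UPDATE", u); }
  void ack(const StatusUpdate& u) { append("ACK", u); }

private:
  void append(const char* kind, const StatusUpdate& u) {
    // Fields must not contain spaces, since the format is space-delimited.
    assert(u.taskId.find(' ') == std::string::npos);
    assert(u.state.find(' ') == std::string::npos);
    out_ << kind << ' ' << u.taskId << ' ' << u.state << '\n';
    out_.flush();  // Hand the record to the stream before forwarding it.
  }

  std::ostream& out_;
};

// Replays a checkpoint log and returns the updates that were recorded but
// never acknowledged, in arrival order. A restarted slave would resend these.
std::vector<StatusUpdate> recoverPending(std::istream& in) {
  std::vector<StatusUpdate> pending;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string kind, taskId, state;
    if (!(fields >> kind >> taskId >> state)) {
      continue;  // Skip a truncated record, e.g. from a crash mid-write.
    }
    if (kind == "UPDATE") {
      pending.push_back({taskId, state});
    } else if (kind == "ACK") {
      // Drop the oldest matching update; it no longer needs resending.
      for (auto it = pending.begin(); it != pending.end(); ++it) {
        if (it->taskId == taskId && it->state == state) {
          pending.erase(it);
          break;
        }
      }
    }
  }
  return pending;
}
```

In this sketch a std::stringstream can stand in for the on-disk file: write a few updates and acks through UpdateLog, then pass the same stream to recoverPending to see which updates a restarted slave would still have to send. A real implementation would also need to force records to disk (flushing a stream alone does not guarantee durability) and to handle retries and ordering per task.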
This addresses bug mesos-110.
https://issues.apache.org/jira/browse/mesos-110
Diffs (updated)
-----
src/Makefile.am d5edaa2
src/common/hashset.hpp 1feb610
src/common/utils.hpp 1d81e21
src/exec/exec.cpp e8db407
src/launcher/launcher.cpp a141b9a
src/local/local.hpp 55f9eaf
src/local/local.cpp affe432
src/master/master.cpp 4dc9ee0
src/messages/messages.proto 87e1548
src/sched/sched.cpp dcadb10
src/slave/constants.hpp f0c8679
src/slave/isolation_module.hpp c896908
src/slave/lxc_isolation_module.hpp b7beefe
src/slave/lxc_isolation_module.cpp 66a2a89
src/slave/main.cpp 85cba25
src/slave/process_based_isolation_module.hpp f6f9554
src/slave/process_based_isolation_module.cpp 2b37d42
src/slave/slave.hpp 279bc7b
src/slave/slave.cpp 3358ec4
src/tests/fault_tolerance_tests.cpp 6772daf
src/tests/slave_restart_tests.cpp PRE-CREATION
src/tests/utils.hpp e81ec82
Diff: https://reviews.apache.org/r/4462/diff
Testing
-------
make check.
Note that only the new test in tests/slave_restart_tests.cpp exercises recovery!
Recovery is disabled for the old tests (though they still checkpoint the relevant info!)
Thanks,
Vinod
> Mesos deploys should not restart tasks
> --------------------------------------
>
> Key: MESOS-110
> URL: https://issues.apache.org/jira/browse/MESOS-110
> Project: Mesos
> Issue Type: Improvement
> Components: framework
> Reporter: Rob Benson
> Assignee: Vinod Kone
>
> Running a long-lived service on Mesos currently has a significant drawback: deploying a new Mesos build restarts your tasks. This can cause nontrivial outages for services with a long warm-up time. Essentially, everything would need a graceful restart mechanism that allows a shutdown/restart with a new version of the code.