Posted to commits@mesos.apache.org by vi...@apache.org on 2016/01/28 21:45:56 UTC
mesos git commit: Corrected the documentation about slave removal semantics.
Repository: mesos
Updated Branches:
refs/heads/master 9f06e3407 -> 05533f43f
Corrected the documentation about slave removal semantics.
Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/05533f43
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/05533f43
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/05533f43
Branch: refs/heads/master
Commit: 05533f43fe8b1a5ebb99ab9b8ac5df83034a7d98
Parents: 9f06e34
Author: Vinod Kone <vi...@gmail.com>
Authored: Thu Jan 28 12:45:23 2016 -0800
Committer: Vinod Kone <vi...@gmail.com>
Committed: Thu Jan 28 12:45:23 2016 -0800
----------------------------------------------------------------------
docs/high-availability-framework-guide.md | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/mesos/blob/05533f43/docs/high-availability-framework-guide.md
----------------------------------------------------------------------
diff --git a/docs/high-availability-framework-guide.md b/docs/high-availability-framework-guide.md
index 1f65a3c..faae27d 100644
--- a/docs/high-availability-framework-guide.md
+++ b/docs/high-availability-framework-guide.md
@@ -214,8 +214,9 @@ that the agent has failed and takes steps to remove it from the cluster. Specifi
* If the framework is [checkpointing](slave-recovery.md): No immediate action is taken. The agent is
given a chance to reconnect until health checks time out.
- * If the framework is not-checkpointing: All the framework's tasks and executors are immediately marked
- as "failed" and resources recovered. The agent is given a chance to reconnect until health checks timeout.
+ * If the framework is not-checkpointing: All the framework's tasks and executors are considered lost. Master
+ immediately sends `TASK_LOST` status updates for the tasks. These updates are not delivered reliably to the
+ scheduler (see NOTE below). The agent is given a chance to reconnect until health checks timeout.
* If the agent fails health checks it is scheduled for removal. The removals can be rate limited by the master
(see `---slave_removal_rate_limit` master flag) to avoid removing a slew of slaves at once (e.g., during a
@@ -227,18 +228,15 @@ that the agent has failed and takes steps to remove it from the cluster. Specifi
and the agent asked to shutdown. A shutting down agent shuts down all running tasks and executors,
but any persistent volumes and dynamic reservations are still preserved.
- * To allow the failed agent node to rejoin the cluster, a new `mesos-slave`
+ * To allow the removed agent node to rejoin the cluster, a new `mesos-slave`
process can be started. This will ensure the agent receives a new agent ID and register with master
possibly with previously created persistent volumes and dynamic reservations. In effect, the agent will
be treated as a newly joined agent.
-* For each agent that is marked "failed" the scheduler receives
+* For each agent that is marked "removed" the scheduler receives a `slaveLost` callback and `TASK_LOST` status
+ updates for each task that was running on the agent
- * `slaveLost` callback
- * `executorLost` callback for each custom executor that was running on the agent
- * `TASK_LOST` status updates for each task that was running on the agent
-
- >NOTE: None of these callbacks or updates are reliably delivered by the master. For example if
+ >NOTE: Neither the callback nor the updates are reliably delivered by the master. For example if
the master or scheduler fails over or there is a network connection issue during the delivery
of these messages, they will not be resent.
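
Because the NOTE in the hunk above says that neither the `slaveLost` callback nor the `TASK_LOST` updates are resent, a scheduler presumably has to keep its own view of task state and ask the master about tasks it has not heard from. The following is only a minimal sketch of that bookkeeping, not code from the Mesos tree: `TaskTracker` and all of its methods are hypothetical names, and the state strings are used as plain labels. A real scheduler would hand the result of `tasks_to_reconcile` to the driver's task reconciliation call.

```python
import time

# States after which no further updates are expected for a task.
TERMINAL_STATES = {
    "TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST", "TASK_ERROR",
}


class TaskTracker:
    """Hypothetical scheduler-side bookkeeping; not part of the Mesos API."""

    def __init__(self, clock=time.time):
        self._clock = clock
        # task_id -> (agent_id, last known state, time of last update)
        self._tasks = {}

    def task_launched(self, task_id, agent_id):
        self._tasks[task_id] = (agent_id, "TASK_STAGING", self._clock())

    def status_update(self, task_id, state):
        # Record a status update, ignoring tasks this tracker never launched.
        if task_id in self._tasks:
            agent_id, _, _ = self._tasks[task_id]
            self._tasks[task_id] = (agent_id, state, self._clock())

    def agent_lost(self, agent_id):
        """On slaveLost, mark the agent's live tasks lost without waiting
        for per-task TASK_LOST updates, which may never arrive."""
        lost = []
        for task_id, (task_agent, state, _) in list(self._tasks.items()):
            if task_agent == agent_id and state not in TERMINAL_STATES:
                self._tasks[task_id] = (task_agent, "TASK_LOST", self._clock())
                lost.append(task_id)
        return sorted(lost)

    def tasks_to_reconcile(self, max_age_seconds):
        """Return non-terminal tasks with no update for max_age_seconds."""
        now = self._clock()
        return sorted(
            task_id
            for task_id, (_, state, updated) in self._tasks.items()
            if state not in TERMINAL_STATES and now - updated >= max_age_seconds
        )

    def state_of(self, task_id):
        return self._tasks[task_id][1]
```

Polling `tasks_to_reconcile` on a timer also covers the case where the `slaveLost` callback itself was dropped, since the affected tasks simply stop producing updates and eventually age past the threshold.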