Posted to commits@mesos.apache.org by mp...@apache.org on 2016/07/03 15:38:04 UTC

mesos git commit: Documented behavior when framework and master both failover.

Repository: mesos
Updated Branches:
  refs/heads/master f7e88508a -> 096937397


Documented behavior when framework and master both failover.

Review: https://reviews.apache.org/r/49442/


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/09693739
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/09693739
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/09693739

Branch: refs/heads/master
Commit: 096937397c3120ce4f00638cc2ffb88908f95d38
Parents: f7e8850
Author: Neil Conway <ne...@gmail.com>
Authored: Sun Jul 3 17:37:26 2016 +0200
Committer: Michael Park <mp...@apache.org>
Committed: Sun Jul 3 17:37:26 2016 +0200

----------------------------------------------------------------------
 docs/high-availability-framework-guide.md | 9 +++++++++
 1 file changed, 9 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/09693739/docs/high-availability-framework-guide.md
----------------------------------------------------------------------
diff --git a/docs/high-availability-framework-guide.md b/docs/high-availability-framework-guide.md
index 0c73e14..ae5617b 100644
--- a/docs/high-availability-framework-guide.md
+++ b/docs/high-availability-framework-guide.md
@@ -127,6 +127,15 @@ Highly available framework designs typically follow a few common patterns:
     generous value. To avoid accidental destruction of tasks in production
     environments, many frameworks use a `failover_timeout` of 1 week or more.
 
+      * In the current implementation, a framework's `failover_timeout` is not
+        preserved during master failover. If a framework fails and the leading
+        master also fails before the `failover_timeout` is reached, the newly
+        elected leading master won't know that the framework's tasks should be
+        killed after a period of time. As a result, if the framework never
+        reregisters, those tasks will continue to run indefinitely but will be
+        orphaned. This behavior will likely be fixed in a future version of
+        Mesos ([MESOS-4659](https://issues.apache.org/jira/browse/MESOS-4659)).
+
 4. After connecting to the Mesos master, the new leading scheduler should ensure
    that its local state is consistent with the current state of the cluster. For
    example, suppose that the previous leading scheduler attempted to launch a
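
The behavior documented in this commit can be illustrated with a toy model. The sketch below is not Mesos code: `ToyMaster` and its methods are hypothetical names, and it assumes that the kill deadline lives only in the leading master's memory while the task list is recovered by the new master. Under those assumptions, a master failover drops the deadline, so the tasks outlive the `failover_timeout`.

```python
from dataclasses import dataclass, field

# failover_timeout is expressed in seconds; one week, as the guide suggests.
ONE_WEEK_SECS = 7 * 24 * 60 * 60.0


@dataclass
class ToyMaster:
    """Hypothetical stand-in for a leading master (not the Mesos implementation)."""

    # framework_id -> list of running task names
    tasks: dict = field(default_factory=dict)
    # framework_id -> time after which the framework's tasks are killed
    deadlines: dict = field(default_factory=dict)

    def framework_disconnected(self, framework_id, now, failover_timeout):
        # Start the failover timer for a framework that has gone away.
        self.deadlines[framework_id] = now + failover_timeout

    def tick(self, now):
        # Kill the tasks of every framework whose deadline has passed.
        for framework_id, deadline in list(self.deadlines.items()):
            if now >= deadline:
                self.tasks.pop(framework_id, None)
                del self.deadlines[framework_id]

    def fail_over(self):
        # Assumption: the new master relearns the running tasks,
        # but the in-memory deadlines are not carried over.
        return ToyMaster(tasks=dict(self.tasks))


old_master = ToyMaster(tasks={"fw-1": ["task-a"]})
old_master.framework_disconnected("fw-1", now=0.0, failover_timeout=ONE_WEEK_SECS)

new_master = old_master.fail_over()
new_master.tick(now=2 * ONE_WEEK_SECS)  # well past the original deadline
```

In this model `new_master.tasks` still contains `fw-1` after the tick, which corresponds to the orphaned tasks described above. Calling `tick` on `old_master` at the same time would have removed them.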