Posted to commits@aurora.apache.org by zm...@apache.org on 2016/10/06 00:38:37 UTC

aurora git commit: Add support for receiving min_consecutive_successes in health checker

Repository: aurora
Updated Branches:
  refs/heads/master 640f07bab -> e91130e49


Add support for receiving min_consecutive_successes in health checker

- Add support for receiving a new HealthCheckConfig attribute
  "min_consecutive_successes" in health checker.
- Add an entry in release note that describes the health check driven update
  feature.

This patch is related to https://reviews.apache.org/r/52094/, in which I added a
new configuration value "min_consecutive_successes" in HealthCheckConfig.

Testing Done:
./build-support/jenkins/build.sh

./pants test.pytest src/test/python/apache/aurora/executor::

./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh

Bugs closed: AURORA-894

Reviewed at https://reviews.apache.org/r/52453/


Project: http://git-wip-us.apache.org/repos/asf/aurora/repo
Commit: http://git-wip-us.apache.org/repos/asf/aurora/commit/e91130e4
Tree: http://git-wip-us.apache.org/repos/asf/aurora/tree/e91130e4
Diff: http://git-wip-us.apache.org/repos/asf/aurora/diff/e91130e4

Branch: refs/heads/master
Commit: e91130e49445c3933b6e27f5fde18c3a0e61b87a
Parents: 640f07b
Author: Kai Huang <te...@hotmail.com>
Authored: Wed Oct 5 17:38:28 2016 -0700
Committer: Zameer Manji <zm...@apache.org>
Committed: Wed Oct 5 17:38:28 2016 -0700

----------------------------------------------------------------------
 RELEASE-NOTES.md                                |  5 ++++
 docs/features/job-updates.md                    | 26 +++++++++++++++++---
 .../apache/aurora/client/api/updater_util.py    |  4 +--
 .../aurora/executor/common/health_checker.py    |  3 ++-
 4 files changed, 32 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/aurora/blob/e91130e4/RELEASE-NOTES.md
----------------------------------------------------------------------
diff --git a/RELEASE-NOTES.md b/RELEASE-NOTES.md
index 97f05d5..6968bb5 100644
--- a/RELEASE-NOTES.md
+++ b/RELEASE-NOTES.md
@@ -2,6 +2,11 @@
 =========================
 
 ### New/updated:
+
+- Aurora scheduler job updater can now rely on health check status rather than `watch_secs` timeout
+  when deciding an individual instance update state. This will potentially speed up updates as the
+  `minWaitInInstanceRunningMs` will no longer have to be chosen based on the worst observed instance
+  startup/warmup delay but rather as a desired health check duration.
 - A task's tier is now mapped to a label on the Mesos `TaskInfo` proto.
 
 ### Deprecations and removals:

http://git-wip-us.apache.org/repos/asf/aurora/blob/e91130e4/docs/features/job-updates.md
----------------------------------------------------------------------
diff --git a/docs/features/job-updates.md b/docs/features/job-updates.md
index 792f2ae..c4ec42e 100644
--- a/docs/features/job-updates.md
+++ b/docs/features/job-updates.md
@@ -49,9 +49,29 @@ and performing these operations:
   the new config instance.
 
 The Aurora client continues through the instance list until all tasks are
-updated, in `RUNNING,` and healthy for a configurable amount of time.
-If the client determines the update is not going well (a percentage of health
-checks have failed), it cancels the update.
+updated. If the client determines the update is not going well (a percentage
+of health checks have failed), it cancels the update.
+
+Currently, the scheduler job updater uses two mechanisms to determine when
+to stop monitoring instance update state: a time-based grace interval and health
+check status.
+
+Job updates with health checks disabled (e.g. no ‘health’ port is defined
+in .aurora portmap) will rely on a time-based grace interval called [watch_secs]
+(../reference/configuration.md#updateconfig-objects).
+An instance will start executing task content when reaching `STARTING`
+state. Once the task sandbox is created, the instance is moved into `RUNNING`
+state. Afterward, the job updater will start the watch_secs countdown to ensure
+an instance is healthy, and then complete the update.
+
+Job updates with health check enabled will rely on health check status. When instance
+reaching `STARTING` state, health checks are performed periodically by the executor
+to ensure the instance is healthy. An instance is moved into `RUNNING` state only if
+a minimum number of consecutive successful health checks are performed
+during the initial warmup period (defined by [initial_interval_secs]
+(../reference/configuration.md#healthcheckconfig-objects)). If watch_secs is
+set as zero, the scheduler job updater will complete the update immediately.
+Otherwise, it will complete the update after the watch_secs expires.
 
 Update cancellation runs a procedure similar to the described above
 update sequence, but in reverse order. New instance configs are swapped

http://git-wip-us.apache.org/repos/asf/aurora/blob/e91130e4/src/main/python/apache/aurora/client/api/updater_util.py
----------------------------------------------------------------------
diff --git a/src/main/python/apache/aurora/client/api/updater_util.py b/src/main/python/apache/aurora/client/api/updater_util.py
index c649316..ebeddab 100644
--- a/src/main/python/apache/aurora/client/api/updater_util.py
+++ b/src/main/python/apache/aurora/client/api/updater_util.py
@@ -35,8 +35,8 @@ class UpdaterConfig(object):
 
     if batch_size <= 0:
       raise ValueError('Batch size should be greater than 0')
-    if watch_secs <= 0:
-      raise ValueError('Watch seconds should be greater than 0')
+    if watch_secs < 0:
+      raise ValueError('Watch seconds should not be negative')
     if pulse_interval_secs is not None and pulse_interval_secs < self.MIN_PULSE_INTERVAL_SECONDS:
       raise ValueError('Pulse interval seconds must be at least %s seconds.'
                        % self.MIN_PULSE_INTERVAL_SECONDS)
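
For illustration, the relaxed `watch_secs` check in the hunk above can be isolated as shown below. This is only a sketch: `validate_watch_secs` is a hypothetical helper that does not exist in `updater_util.py`. It mirrors the new condition, under which zero is accepted and only negative values are rejected.

```python
def validate_watch_secs(watch_secs):
  """Hypothetical helper mirroring the relaxed check; not part of the commit."""
  # Zero is now a valid value; only negative values are rejected.
  if watch_secs < 0:
    raise ValueError('Watch seconds should not be negative')
  return watch_secs
```

Allowing zero matches the documentation change in this commit, which describes completing the update immediately when watch_secs is set to zero and health checks are enabled.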

http://git-wip-us.apache.org/repos/asf/aurora/blob/e91130e4/src/main/python/apache/aurora/executor/common/health_checker.py
----------------------------------------------------------------------
diff --git a/src/main/python/apache/aurora/executor/common/health_checker.py b/src/main/python/apache/aurora/executor/common/health_checker.py
index 03fbffd..1e0be10 100644
--- a/src/main/python/apache/aurora/executor/common/health_checker.py
+++ b/src/main/python/apache/aurora/executor/common/health_checker.py
@@ -331,6 +331,7 @@ class HealthCheckerProvider(StatusCheckerProvider):
       sandbox,
       interval_secs=health_check_config.get('interval_secs'),
       initial_interval_secs=health_check_config.get('initial_interval_secs'),
-      max_consecutive_failures=health_check_config.get('max_consecutive_failures'))
+      max_consecutive_failures=health_check_config.get('max_consecutive_failures'),
+      min_consecutive_successes=health_check_config.get('min_consecutive_successes'))
 
     return health_checker
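
This diff only passes `min_consecutive_successes` through to the health checker; how the checker applies it is not shown here. One way such a threshold might be applied is sketched below. `ConsecutiveSuccessTracker` and its methods are hypothetical names introduced for illustration, and the reset-on-failure behavior is an assumption rather than the executor's confirmed logic.

```python
class ConsecutiveSuccessTracker(object):
  """Hypothetical illustration; not part of health_checker.py."""

  def __init__(self, min_consecutive_successes=1):
    self.min_consecutive_successes = min_consecutive_successes
    self.current_successes = 0

  def record(self, success):
    # Assumption: a failed check resets the streak, a passing one extends it.
    self.current_successes = self.current_successes + 1 if success else 0
    return self.healthy

  @property
  def healthy(self):
    # Healthy only once the streak reaches the configured minimum.
    return self.current_successes >= self.min_consecutive_successes


tracker = ConsecutiveSuccessTracker(min_consecutive_successes=2)
results = [tracker.record(r) for r in (True, False, True, True)]
```

With a minimum of two, the sequence pass, fail, pass, pass is reported healthy only after the final check, because the failure in between resets the streak.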