Posted to commits@aurora.apache.org by wf...@apache.org on 2014/07/23 01:29:26 UTC

git commit: Documentation for task health checking and graceful shutdown.

Repository: incubator-aurora
Updated Branches:
  refs/heads/master d53ff2ad6 -> 813794b0e


Documentation for task health checking and graceful shutdown.

Bugs closed: AURORA-574

Reviewed at https://reviews.apache.org/r/23316/


Project: http://git-wip-us.apache.org/repos/asf/incubator-aurora/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-aurora/commit/813794b0
Tree: http://git-wip-us.apache.org/repos/asf/incubator-aurora/tree/813794b0
Diff: http://git-wip-us.apache.org/repos/asf/incubator-aurora/diff/813794b0

Branch: refs/heads/master
Commit: 813794b0ef7367f0cc2b40d459528beae61ed81f
Parents: d53ff2a
Author: Bill Farner <wf...@apache.org>
Authored: Tue Jul 22 16:29:24 2014 -0700
Committer: Bill Farner <wf...@apache.org>
Committed: Tue Jul 22 16:29:24 2014 -0700

----------------------------------------------------------------------
 docs/user-guide.md | 71 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 57 insertions(+), 14 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-aurora/blob/813794b0/docs/user-guide.md
----------------------------------------------------------------------
diff --git a/docs/user-guide.md b/docs/user-guide.md
index 583a41f..6c703a3 100644
--- a/docs/user-guide.md
+++ b/docs/user-guide.md
@@ -1,17 +1,19 @@
 Aurora User Guide
 -----------------
 
-- [Overview](#overview)
-- [Job Lifecycle](#job-lifecycle)
-  - [Life Of A Task](#life-of-a-task)
-  - [PENDING to RUNNING states](#pending-to-running-states)
-  - [Task Updates](#task-updates)
-  - [Giving Priority to Production Tasks: PREEMPTING](#giving-priority-to-production-tasks-preempting)
-  - [Natural Termination: FINISHED, FAILED](#natural-termination-finished-failed)
-  - [Forceful Termination: KILLING, RESTARTING](#forceful-termination-killing-restarting)
-- [Configuration](#configuration)
-- [Creating Jobs](#creating-jobs)
-- [Interacting With Jobs](#interacting-with-jobs)
+- [Overview](#user-content-overview)
+- [Job Lifecycle](#user-content-job-lifecycle)
+	- [Life Of A Task](#user-content-life-of-a-task)
+	- [PENDING to RUNNING states](#user-content-pending-to-running-states)
+	- [Task Updates](#user-content-task-updates)
+	- [HTTP Health Checking and Graceful Shutdown](#user-content-http-health-checking-and-graceful-shutdown)
+		- [Tearing a task down](#user-content-tearing-a-task-down)
+	- [Giving Priority to Production Tasks: PREEMPTING](#user-content-giving-priority-to-production-tasks-preempting)
+	- [Natural Termination: FINISHED, FAILED](#user-content-natural-termination-finished-failed)
+	- [Forceful Termination: KILLING, RESTARTING](#user-content-forceful-termination-killing-restarting)
+- [Configuration](#user-content-configuration)
+- [Creating Jobs](#user-content-creating-jobs)
+- [Interacting With Jobs](#user-content-interacting-with-jobs)
 
 Overview
 --------
@@ -107,9 +109,6 @@ When Aurora reads a configuration file and finds a `Job` definition, it:
 4.  The scheduler puts the `Task`s into `PENDING` state, starting each
     `Task`'s life cycle.
 
-**Note**: It is not currently possible to create an Aurora job from
-within an Aurora job.
-
 ### Life Of A Task
 
 ![Life of a task](images/lifeofatask.png)
@@ -186,6 +185,50 @@ with old instance configs and batch updates proceed backwards
 from the point where the update failed. E.g.; (0,1,2) (3,4,5) (6,7,
 8-FAIL) results in a rollback in order (8,7,6) (5,4,3) (2,1,0).
 
+### HTTP Health Checking and Graceful Shutdown
+
+The Executor implements a protocol for rudimentary control of a task via HTTP.  Tasks opt in to
+this protocol by declaring a port named `health`.  Take for example this configuration snippet:
+
+    nginx = Process(
+      name = 'nginx',
+      cmdline = './run_nginx.sh -port {{thermos.ports[health]}}')
+
+When this Process is included in a job, the job will be allocated a port, and the command line
+will be replaced with something like:
+
+    ./run_nginx.sh -port 42816
+
+where 42816 happens to be the allocated port.  Typically, the Executor monitors Processes within
+a task only by the liveness of the forked process.  However, when a `health` port is allocated, it
+also sends periodic HTTP health checks.  A task requesting a `health` port must handle the following
+requests:
+
+| HTTP request            | Description                             |
+| ------------            | -----------                             |
+| `GET /health`           | Inquires whether the task is healthy.   |
+| `POST /quitquitquit`    | Task should initiate graceful shutdown. |
+| `POST /abortabortabort` | Final warning: task is being killed.    |
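As a hedged illustration of the task side of this protocol, here is a minimal sketch built only on Python's standard-library `http.server`. The `LifecycleHandler` class, the `serve` helper, and the assumption that a healthy task answers `GET /health` with status 200 and the body `ok` are illustrative choices, not the Executor's documented contract.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class LifecycleHandler(BaseHTTPRequestHandler):
    """Hypothetical handler for the three endpoints in the table above."""

    # Shared state; a real task would wire these to its own shutdown logic.
    healthy = True
    quit_requested = threading.Event()
    abort_requested = threading.Event()

    def _reply(self, code, body=b""):
        self.send_response(code)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/health":
            # Assumption: 200 with body "ok" signals a healthy task.
            if self.healthy:
                self._reply(200, b"ok")
            else:
                self._reply(503, b"unhealthy")
        else:
            self._reply(404)

    def do_POST(self):
        if self.path == "/quitquitquit":
            # Begin graceful shutdown: stop taking work, drain, then exit.
            self.quit_requested.set()
            self._reply(200)
        elif self.path == "/abortabortabort":
            # Final warning: SIGTERM and SIGKILL may follow shortly.
            self.abort_requested.set()
            self._reply(200)
        else:
            self._reply(404)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def serve(port):
    """Start the handler on `port` in a background thread (hypothetical helper)."""
    server = ThreadingHTTPServer(("", port), LifecycleHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A task could pass the port allocated for `{{thermos.ports[health]}}` to `serve` and check `quit_requested` from its main loop to decide when to begin draining.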
+
+Please see the
+[configuration reference](configuration-reference.md#user-content-healthcheckconfig-objects) for
+the options that configure this feature.
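The periodic checking described above can be pictured as a small polling loop. This is a sketch of the general technique, not the Executor's implementation: the names `http_probe` and `watch`, the parameters `interval_secs` and `max_consecutive_failures`, and the "200 with body `ok`" success criterion are all assumptions. The configuration reference lists the real option names.

```python
import http.client
import time
import urllib.request


def http_probe(port, timeout=1.0):
    """Return True if GET /health answers 200 with body 'ok' (assumed contract)."""
    url = "http://127.0.0.1:%d/health" % port
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip().lower() == b"ok"
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts and HTTP error statuses all count as failures.
        return False


def watch(probe, interval_secs=10.0, max_consecutive_failures=3,
          sleep=time.sleep, max_checks=None):
    """Poll probe() periodically (hypothetical parameters).

    Returns True once max_consecutive_failures checks in a row have failed,
    or False if max_checks probes ran without reaching that threshold.
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            failures = 0  # any success resets the failure streak
        else:
            failures += 1
            if failures >= max_consecutive_failures:
                return True  # task is considered unhealthy
        sleep(interval_secs)
    return False
```

Requiring several consecutive failures, rather than acting on a single one, keeps one slow or dropped request from marking a task unhealthy.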
+
+#### Tearing a task down
+
+The Executor follows an escalation sequence when killing a running task:
+
+  1. If no `health` port is present, skip to (5)
+  2. `POST /quitquitquit`
+  3. Wait 5 seconds
+  4. `POST /abortabortabort`
+  5. Send SIGTERM (`kill`)
+  6. Send SIGKILL (`kill -9`)
+
+If the Executor notices during this sequence that all Processes in a Task have exited, it does not
+proceed with the remaining steps.  Note that graceful shutdown is best-effort: in a distributed
+system it may not be performed at all, for example if the host running the task fails.
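The escalation sequence above can be sketched as a single function. The callables it receives (`post`, `all_exited`, `send_signal`) and the grace period between SIGTERM and SIGKILL (`term_grace_secs`) are hypothetical; the steps only specify the 5-second wait after `/quitquitquit`. Passing the side effects in as arguments keeps the control flow easy to test.

```python
import signal
import time


def tear_down(has_health_port, post, all_exited, send_signal,
              quit_wait_secs=5.0, term_grace_secs=5.0, sleep=time.sleep):
    """Escalate through the kill sequence; return the steps actually taken."""
    steps = []

    if has_health_port:
        post("/quitquitquit")            # step 2: ask for graceful shutdown
        steps.append("quitquitquit")
        sleep(quit_wait_secs)            # step 3: wait 5 seconds
        if all_exited():
            return steps                 # task exited; stop escalating
        post("/abortabortabort")         # step 4: final warning
        steps.append("abortabortabort")
        if all_exited():
            return steps

    send_signal(signal.SIGTERM)          # step 5
    steps.append("SIGTERM")
    sleep(term_grace_secs)               # assumption: grace period before SIGKILL
    if all_exited():
        return steps

    send_signal(signal.SIGKILL)          # step 6: cannot be caught or ignored
    steps.append("SIGKILL")
    return steps
```

Checking `all_exited()` after each step mirrors the rule that the Executor stops escalating once every Process in the Task has exited.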
+
 ### Giving Priority to Production Tasks: PREEMPTING
 
 Sometimes a Task needs to be interrupted, such as when a non-production