Posted to commits@aurora.apache.org by wf...@apache.org on 2014/07/23 01:29:26 UTC
git commit: Documentation for task health checking and graceful
shutdown.
Repository: incubator-aurora
Updated Branches:
refs/heads/master d53ff2ad6 -> 813794b0e
Documentation for task health checking and graceful shutdown.
Bugs closed: AURORA-574
Reviewed at https://reviews.apache.org/r/23316/
Project: http://git-wip-us.apache.org/repos/asf/incubator-aurora/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-aurora/commit/813794b0
Tree: http://git-wip-us.apache.org/repos/asf/incubator-aurora/tree/813794b0
Diff: http://git-wip-us.apache.org/repos/asf/incubator-aurora/diff/813794b0
Branch: refs/heads/master
Commit: 813794b0ef7367f0cc2b40d459528beae61ed81f
Parents: d53ff2a
Author: Bill Farner <wf...@apache.org>
Authored: Tue Jul 22 16:29:24 2014 -0700
Committer: Bill Farner <wf...@apache.org>
Committed: Tue Jul 22 16:29:24 2014 -0700
----------------------------------------------------------------------
docs/user-guide.md | 71 +++++++++++++++++++++++++++++++++++++++----------
1 file changed, 57 insertions(+), 14 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-aurora/blob/813794b0/docs/user-guide.md
----------------------------------------------------------------------
diff --git a/docs/user-guide.md b/docs/user-guide.md
index 583a41f..6c703a3 100644
--- a/docs/user-guide.md
+++ b/docs/user-guide.md
@@ -1,17 +1,19 @@
Aurora User Guide
-----------------
-- [Overview](#overview)
-- [Job Lifecycle](#job-lifecycle)
- - [Life Of A Task](#life-of-a-task)
- - [PENDING to RUNNING states](#pending-to-running-states)
- - [Task Updates](#task-updates)
- - [Giving Priority to Production Tasks: PREEMPTING](#giving-priority-to-production-tasks-preempting)
- - [Natural Termination: FINISHED, FAILED](#natural-termination-finished-failed)
- - [Forceful Termination: KILLING, RESTARTING](#forceful-termination-killing-restarting)
-- [Configuration](#configuration)
-- [Creating Jobs](#creating-jobs)
-- [Interacting With Jobs](#interacting-with-jobs)
+- [Overview](#user-content-overview)
+- [Job Lifecycle](#user-content-job-lifecycle)
+ - [Life Of A Task](#user-content-life-of-a-task)
+ - [PENDING to RUNNING states](#user-content-pending-to-running-states)
+ - [Task Updates](#user-content-task-updates)
+ - [HTTP Health Checking and Graceful Shutdown](#user-content-http-health-checking-and-graceful-shutdown)
+ - [Tearing a task down](#user-content-tearing-a-task-down)
+ - [Giving Priority to Production Tasks: PREEMPTING](#user-content-giving-priority-to-production-tasks-preempting)
+ - [Natural Termination: FINISHED, FAILED](#user-content-natural-termination-finished-failed)
+ - [Forceful Termination: KILLING, RESTARTING](#user-content-forceful-termination-killing-restarting)
+- [Configuration](#user-content-configuration)
+- [Creating Jobs](#user-content-creating-jobs)
+- [Interacting With Jobs](#user-content-interacting-with-jobs)
Overview
--------
@@ -107,9 +109,6 @@ When Aurora reads a configuration file and finds a `Job` definition, it:
4. The scheduler puts the `Task`s into `PENDING` state, starting each
`Task`'s life cycle.
-**Note**: It is not currently possible to create an Aurora job from
-within an Aurora job.
-
### Life Of A Task
![Life of a task](images/lifeofatask.png)
@@ -186,6 +185,50 @@ with old instance configs and batch updates proceed backwards
from the point where the update failed. E.g.; (0,1,2) (3,4,5) (6,7,
8-FAIL) results in a rollback in order (8,7,6) (5,4,3) (2,1,0).
+### HTTP Health Checking and Graceful Shutdown
+
+The Executor implements a protocol for rudimentary control of a task via HTTP. Tasks opt in to
+this protocol by declaring a port named `health`. Take for example this configuration snippet:
+
+ nginx = Process(
+ name = 'nginx',
+ cmdline = './run_nginx.sh -port {{thermos.ports[http]}}')
+
+When this Process is included in a job, the job will be allocated a port, and the command line
+will be replaced with something like:
+
+ ./run_nginx.sh -port 42816
+
+Where 42816 happens to be the allocated port. Typically, the Executor monitors Processes within
+a task only by liveness of the forked process. However, when a `health` port is allocated, it will
+also send periodic HTTP health checks. A task requesting a `health` port must handle the following
+requests:
+
+| HTTP request | Description |
+| ------------ | ----------- |
+| `GET /health` | Inquires whether the task is healthy. |
+| `POST /quitquitquit` | Task should initiate graceful shutdown. |
+| `POST /abortabortabort` | Final warning that the task is being killed. |
+
+Please see the
+[configuration reference](configuration-reference.md#user-content-healthcheckconfig-objects) for
+configuration options for this feature.
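As a rough illustration of the task side of this protocol, the following minimal sketch serves the three requests from the table using only the Python standard library. The names `TaskControlHandler` and `serve` are hypothetical, and the `ok` response body for `GET /health` is an assumption: the text above names the endpoints but does not say what a healthy response must contain.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class TaskControlHandler(BaseHTTPRequestHandler):
    """Hypothetical handler for the three requests listed in the table."""

    # Set once a graceful shutdown has been requested.
    shutdown_requested = threading.Event()
    # Set once the final warning has been received.
    abort_received = threading.Event()

    def _reply(self, status, body):
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        if self.path == "/health":
            # Assumption: a 200 response with body "ok" signals health.
            self._reply(200, "ok")
        else:
            self._reply(404, "not found")

    def do_POST(self):
        if self.path == "/quitquitquit":
            # The application should begin its graceful shutdown here.
            self.shutdown_requested.set()
            self._reply(200, "shutting down")
        elif self.path == "/abortabortabort":
            # Last notice before the task is killed.
            self.abort_received.set()
            self._reply(200, "aborting")
        else:
            self._reply(404, "not found")

    def log_message(self, format, *args):
        pass  # keep the example quiet


def serve(port):
    """Start the handler on `port` in a background thread; return the server."""
    server = HTTPServer(("127.0.0.1", port), TaskControlHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a real task, `port` would be the value substituted for the `health` port, and the main program would watch `shutdown_requested` to decide when to stop accepting work.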
+
+#### Tearing a task down
+
+The Executor follows an escalation sequence when killing a running task:
+
+ 1. If the `health` port is not present, skip to (5)
+ 2. POST /quitquitquit
+ 3. wait 5 seconds
+ 4. POST /abortabortabort
+ 5. Send SIGTERM (`kill`)
+ 6. Send SIGKILL (`kill -9`)
+
+If the Executor notices that all Processes in a Task have aborted during this sequence, it will
+not proceed with subsequent steps. Note that graceful shutdown is best-effort, and due to the many
+inevitable realities of distributed systems, it may not be performed.
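The escalation sequence above can be sketched as a small function. This is an illustration under stated assumptions, not the Executor's actual code: `tear_down` and its callable parameters (`post`, `send_signal`, `all_exited`, `sleep`) are hypothetical hooks, "aborted" is read here as "exited", and no delay is inserted between SIGTERM and SIGKILL because the text does not specify one.

```python
import signal
import time


def tear_down(has_health_port, post, send_signal, all_exited, sleep=time.sleep):
    """Walk the escalation steps, stopping once every process has exited.

    All callables are hypothetical hooks supplied by the caller:
    post(path) issues an HTTP POST to the task's `health` port,
    send_signal(sig) signals the task's processes, and all_exited()
    reports whether every process in the task is gone.
    Returns the names of the steps that were actually performed.
    """
    performed = []

    def steps():
        if has_health_port:  # step 1: without a health port, go straight to SIGTERM
            yield "POST /quitquitquit", lambda: post("/quitquitquit")
            yield "wait 5 seconds", lambda: sleep(5)
            yield "POST /abortabortabort", lambda: post("/abortabortabort")
        yield "SIGTERM", lambda: send_signal(signal.SIGTERM)
        yield "SIGKILL", lambda: send_signal(signal.SIGKILL)

    for name, action in steps():
        if all_exited():
            break  # nothing left to kill, so skip the remaining steps
        action()
        performed.append(name)
    return performed
```

Passing the side effects in as callables keeps the ordering logic separate from HTTP and signal handling, so the sequence can be exercised without a running task. `signal.SIGKILL` is only available on POSIX platforms.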
+
### Giving Priority to Production Tasks: PREEMPTING
Sometimes a Task needs to be interrupted, such as when a non-production