Posted to commits@aurora.apache.org by se...@apache.org on 2016/02/06 23:25:40 UTC
aurora git commit: Move lifecycle documentation into separate file
Repository: aurora
Updated Branches:
refs/heads/master a9e7a35a2 -> 2d59b697a
Move lifecycle documentation into separate file
In addition to the move, a couple of related additions and adjustments have been made:
* slight reorganization
* documentation of missing states (THROTTLED, DRAINING)
* custom section on reconciliation
* remark regarding the uniqueness of an instance
* updated documentation of the teardown of a task (HTTPLifecycleConfig and finalization_wait)
Bugs closed: AURORA-1068, AURORA-1262, AURORA-734
Reviewed at https://reviews.apache.org/r/43013/
Project: http://git-wip-us.apache.org/repos/asf/aurora/repo
Commit: http://git-wip-us.apache.org/repos/asf/aurora/commit/2d59b697
Tree: http://git-wip-us.apache.org/repos/asf/aurora/tree/2d59b697
Diff: http://git-wip-us.apache.org/repos/asf/aurora/diff/2d59b697
Branch: refs/heads/master
Commit: 2d59b697a745f9540d53da1659cda4683c929b34
Parents: a9e7a35
Author: Stephan Erb <se...@apache.org>
Authored: Sat Feb 6 23:22:21 2016 +0100
Committer: Stephan Erb <st...@dev.static-void.de>
Committed: Sat Feb 6 23:22:21 2016 +0100
----------------------------------------------------------------------
docs/README.md | 1 +
docs/configuration-reference.md | 19 ++---
docs/task-lifecycle.md | 146 +++++++++++++++++++++++++++++++++++
docs/user-guide.md | 125 ++----------------------------
4 files changed, 164 insertions(+), 127 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/aurora/blob/2d59b697/docs/README.md
----------------------------------------------------------------------
diff --git a/docs/README.md b/docs/README.md
index 8ebc061..78f062a 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -11,6 +11,7 @@ We encourage you to ask questions on the [Aurora user list](http://aurora.apache
* [Install Aurora on virtual machines on your private machine](vagrant.md)
* [Hello World Tutorial](tutorial.md)
* [User Guide](user-guide.md)
+ * [Task Lifecycle](task-lifecycle.md)
* [Configuration Tutorial](configuration-tutorial.md)
* [Aurora + Thermos Reference](configuration-reference.md)
* [Command Line Client](client-commands.md)
http://git-wip-us.apache.org/repos/asf/aurora/blob/2d59b697/docs/configuration-reference.md
----------------------------------------------------------------------
diff --git a/docs/configuration-reference.md b/docs/configuration-reference.md
index 995f706..3f023d7 100644
--- a/docs/configuration-reference.md
+++ b/docs/configuration-reference.md
@@ -312,10 +312,10 @@ upon one final Process ("reducer") to tabulate the results:
#### finalization_wait
-Tasks have three active stages: `ACTIVE`, `CLEANING`, and `FINALIZING`. The
-`ACTIVE` stage is when ordinary processes run. This stage lasts as
-long as Processes are running and the Task is healthy. The moment either
-all Processes have finished successfully or the Task has reached a
+Process execution is organized into three active stages: `ACTIVE`,
+`CLEANING`, and `FINALIZING`. The `ACTIVE` stage is when ordinary processes run.
+This stage lasts as long as Processes are running and the Task is healthy.
+The moment either all Processes have finished successfully or the Task has reached a
maximum Process failure limit, it goes into `CLEANING` stage and sends
SIGTERMs to all currently running Processes and their process trees.
Once all Processes have terminated, the Task goes into `FINALIZING` stage
@@ -327,10 +327,7 @@ finish during that time, all remaining Processes are sent SIGKILLs
(or if they depend upon uncompleted Processes, are
never invoked.)
-Client applications with higher priority may force a shorter
-finalization wait (e.g. through parameters to `thermos kill`), so this
-is mostly a best-effort signal.
-
+When running on Aurora, the `finalization_wait` is capped at 60 seconds.
### Constraint Object
@@ -515,7 +512,7 @@ Describes the container the job's processes will run inside.
### Docker Parameter Object
Docker CLI parameters. This needs to be enabled by the scheduler `enable_docker_parameters` option.
-See [Docker Command Line Reference](https://docs.docker.com/reference/commandline/run/) for valid parameters.
+See [Docker Command Line Reference](https://docs.docker.com/reference/commandline/run/) for valid parameters.
param | type | description
----- | :----: | -----------
@@ -611,6 +608,10 @@ to distinguish between Task replicas.
| ```instance``` | Integer | The instance number of the created task. A job with 5 replicas has instance numbers 0, 1, 2, 3, and 4.
| ```hostname``` | String | The instance hostname that the task was launched on.
+Please note that there is no uniqueness guarantee for `instance` in the presence of
+network partitions. If that is required, it should be baked in at the application
+level using a distributed coordination service such as Zookeeper.
+
### thermos Namespace
The `thermos` namespace contains variables that work directly on the
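The new wording above says the `finalization_wait` is capped at 60 seconds when running on Aurora. A minimal sketch of that capping rule, for illustration only — this is not Aurora or Thermos source, and the function name and `on_aurora` flag are invented here:

```python
# Illustrative sketch (not Aurora code): derive the effective length of the
# FINALIZING stage. On Aurora, the user-supplied finalization_wait is capped.
AURORA_FINALIZATION_CAP_SECS = 60  # cap stated in the docs above

def effective_finalization_wait(configured_wait_secs, on_aurora=True):
    """Return the number of seconds the FINALIZING stage may last."""
    if configured_wait_secs < 0:
        raise ValueError("finalization_wait must be non-negative")
    if on_aurora:
        # Aurora enforces an upper bound regardless of the Job configuration.
        return min(configured_wait_secs, AURORA_FINALIZATION_CAP_SECS)
    return configured_wait_secs
```

For example, a job configured with `finalization_wait = 120` would still only get 60 seconds of finalization time under Aurora.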
http://git-wip-us.apache.org/repos/asf/aurora/blob/2d59b697/docs/task-lifecycle.md
----------------------------------------------------------------------
diff --git a/docs/task-lifecycle.md b/docs/task-lifecycle.md
new file mode 100644
index 0000000..e85e754
--- /dev/null
+++ b/docs/task-lifecycle.md
@@ -0,0 +1,146 @@
+# Task Lifecycle
+
+When Aurora reads a configuration file and finds a `Job` definition, it:
+
+1. Evaluates the `Job` definition.
+2. Splits the `Job` into its constituent `Task`s.
+3. Sends those `Task`s to the scheduler.
+4. The scheduler puts the `Task`s into `PENDING` state, starting each
+ `Task`'s life cycle.
+
+
+![Life of a task](images/lifeofatask.png)
+
+Please note that a couple of the task states described below are missing
+from this state diagram.
+
+
+## PENDING to RUNNING states
+
+When a `Task` is in the `PENDING` state, the scheduler constantly
+searches for machines satisfying that `Task`'s resource request
+requirements (RAM, disk space, CPU time) while maintaining configuration
+constraints such as "a `Task` must run on machines dedicated to a
+particular role" or attribute limit constraints such as "at most 2
+`Task`s from the same `Job` may run on each rack". When the scheduler
+finds a suitable match, it assigns the `Task` to a machine and puts the
+`Task` into the `ASSIGNED` state.
+
+From the `ASSIGNED` state, the scheduler sends an RPC to the slave
+machine containing `Task` configuration, which the slave uses to spawn
+an executor responsible for the `Task`'s lifecycle. When the scheduler
+receives an acknowledgment that the machine has accepted the `Task`,
+the `Task` goes into `STARTING` state.
+
+`STARTING` state initializes a `Task` sandbox. When the sandbox is fully
+initialized, Thermos begins to invoke `Process`es. Also, the slave
+machine sends an update to the scheduler that the `Task` is
+in `RUNNING` state.
+
+
+
+## RUNNING to terminal states
+
+There are various ways that an active `Task` can transition into a terminal
+state. By definition, it can never leave this state. However, depending on
+the nature of the termination and the originating `Job` definition
+(e.g. `service`, `max_task_failures`), a replacement `Task` might be
+scheduled.
+
+### Natural Termination: FINISHED, FAILED
+
+A `RUNNING` `Task` can terminate without direct user interaction. For
+example, it may be a finite computation that finishes, even something as
+simple as `echo hello world`, or it could be an exceptional condition in
+a long-lived service. If the `Task` is successful (its underlying
+processes have succeeded with exit status `0` or finished without
+reaching failure limits) it moves into `FINISHED` state. If it finished
+after reaching a set of failure limits, it goes into `FAILED` state.
+
+A terminated `Task` which is subject to rescheduling will be temporarily
+`THROTTLED` if it is considered to be flapping. A task is flapping if its
+previous invocation was terminated after less than 5 minutes (scheduler
+default). The time penalty a task has to remain in the `THROTTLED` state
+before it is eligible for rescheduling increases with each consecutive
+failure.
+
+### Forceful Termination: KILLING, RESTARTING
+
+You can terminate a `Task` by issuing an `aurora job kill` command, which
+moves it into `KILLING` state. The scheduler then sends the slave a
+request to terminate the `Task`. If the scheduler receives a successful
+response, it moves the `Task` into `KILLED` state and never restarts it.
+
+If a `Task` is forced into the `RESTARTING` state via the `aurora job restart`
+command, the scheduler kills the underlying task but in parallel schedules
+an identical replacement for it.
+
+In any case, the responsible executor on the slave follows an escalation
+sequence when killing a running task:
+
+ 1. If an `HTTPLifecycleConfig` is not present, skip to (4).
+ 2. Send a POST to the `graceful_shutdown_endpoint` and wait 5 seconds.
+ 3. Send a POST to the `shutdown_endpoint` and wait 5 seconds.
+ 4. Send SIGTERM (`kill`) and wait at most `finalization_wait` seconds.
+ 5. Send SIGKILL (`kill -9`).
+
+If the executor notices that all `Process`es in a `Task` have aborted
+during this sequence, it will not proceed with subsequent steps.
+Note that graceful shutdown is best-effort, and due to the many
+inevitable realities of distributed systems, it may not be performed.
+
+### Unexpected Termination: LOST
+
+If a `Task` stays in a transient task state for too long (such as `ASSIGNED`
+or `STARTING`), the scheduler forces it into `LOST` state, creating a new
+`Task` in its place that's sent into `PENDING` state.
+
+In addition, if the Mesos core tells the scheduler that a slave has
+become unhealthy (or outright disappeared), the `Task`s assigned to that
+slave go into `LOST` state and new `Task`s are created in their place.
+From `PENDING` state, there is no guarantee a `Task` will be reassigned
+to the same machine unless job constraints explicitly force it there.
+
+### Giving Priority to Production Tasks: PREEMPTING
+
+Sometimes a Task needs to be interrupted, such as when a non-production
+Task's resources are needed by a higher priority production Task. This
+type of interruption is called a *pre-emption*. When this happens in
+Aurora, the non-production Task is killed and moved into
+the `PREEMPTING` state when both the following are true:
+
+- The task being killed is a non-production task.
+- The other task is a `PENDING` production task that hasn't been
+ scheduled due to a lack of resources.
+
+The scheduler UI shows the non-production task was preempted in favor of
+the production task. At some point, tasks in `PREEMPTING` move to `KILLED`.
+
+Note that non-production tasks consuming many resources are likely to be
+preempted in favor of production tasks.
+
+### Making Room for Maintenance: DRAINING
+
+Cluster operators can set a slave into maintenance mode. This will transition
+all `Task`s running on this slave into `DRAINING` and eventually to `KILLED`.
+Drained `Task`s will be restarted on other slaves for which no maintenance
+has been announced yet.
+
+
+
+## State Reconciliation
+
+Due to the many inevitable realities of distributed systems, there might
+be a mismatch of perceived and actual cluster state (e.g. a machine returns
+from a `netsplit` but the scheduler has already marked all its `Task`s as
+`LOST` and rescheduled them).
+
+Aurora regularly runs a state reconciliation process in order to detect
+and correct such issues (e.g. by killing the errant `RUNNING` tasks).
+By default, the proper detection of all failure scenarios and inconsistencies
+may take up to an hour.
+
+To emphasize this point: there is no uniqueness guarantee for a single
+instance of a job in the presence of network partitions. If the `Task`
+requires that, it should be baked in at the application level using a
+distributed coordination service such as Zookeeper.
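The kill escalation sequence documented in the new `task-lifecycle.md` can be modeled as an ordered list of steps. This is a sketch of the documented order only, not the executor's implementation; the function name and `(action, target, wait_secs)` tuple layout are assumptions made for illustration:

```python
# Illustrative model of the escalation order described in task-lifecycle.md.
# The real executor performs HTTP POSTs and sends signals, and stops early if
# all Processes in the Task have already aborted; here we only model the order.
def kill_escalation_steps(has_http_lifecycle_config, finalization_wait_secs):
    """Return (action, target, wait_secs) tuples in escalation order."""
    steps = []
    if has_http_lifecycle_config:
        # Steps (2) and (3): graceful HTTP shutdown attempts, 5 seconds each.
        steps.append(("POST", "graceful_shutdown_endpoint", 5))
        steps.append(("POST", "shutdown_endpoint", 5))
    # Steps (4) and (5): SIGTERM with a bounded wait, then SIGKILL.
    steps.append(("SIGTERM", None, finalization_wait_secs))
    steps.append(("SIGKILL", None, 0))
    return steps
```

Without an `HTTPLifecycleConfig`, the sequence degenerates to SIGTERM followed by SIGKILL, matching step (1)'s "skip to (4)".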
http://git-wip-us.apache.org/repos/asf/aurora/blob/2d59b697/docs/user-guide.md
----------------------------------------------------------------------
diff --git a/docs/user-guide.md b/docs/user-guide.md
index df63468..656296c 100644
--- a/docs/user-guide.md
+++ b/docs/user-guide.md
@@ -3,14 +3,8 @@ Aurora User Guide
- [Overview](#user-content-overview)
- [Job Lifecycle](#user-content-job-lifecycle)
- - [Life Of A Task](#user-content-life-of-a-task)
- - [PENDING to RUNNING states](#user-content-pending-to-running-states)
- [Task Updates](#user-content-task-updates)
- - [HTTP Health Checking and Graceful Shutdown](#user-content-http-health-checking-and-graceful-shutdown)
- - [Tearing a task down](#user-content-tearing-a-task-down)
- - [Giving Priority to Production Tasks: PREEMPTING](#user-content-giving-priority-to-production-tasks-preempting)
- - [Natural Termination: FINISHED, FAILED](#user-content-natural-termination-finished-failed)
- - [Forceful Termination: KILLING, RESTARTING](#user-content-forceful-termination-killing-restarting)
+ - [HTTP Health Checking](#user-content-http-health-checking)
- [Service Discovery](#user-content-service-discovery)
- [Configuration](#user-content-configuration)
- [Creating Jobs](#user-content-creating-jobs)
@@ -99,60 +93,13 @@ will be around forever, e.g. by building log saving or other
checkpointing mechanisms directly into your application or into your
`Job` description.
+
Job Lifecycle
-------------
-When Aurora reads a configuration file and finds a `Job` definition, it:
-
-1. Evaluates the `Job` definition.
-2. Splits the `Job` into its constituent `Task`s.
-3. Sends those `Task`s to the scheduler.
-4. The scheduler puts the `Task`s into `PENDING` state, starting each
- `Task`'s life cycle.
-
-### Life Of A Task
-
-![Life of a task](images/lifeofatask.png)
-
-### PENDING to RUNNING states
-
-When a `Task` is in the `PENDING` state, the scheduler constantly
-searches for machines satisfying that `Task`'s resource request
-requirements (RAM, disk space, CPU time) while maintaining configuration
-constraints such as "a `Task` must run on machines dedicated to a
-particular role" or attribute limit constraints such as "at most 2
-`Task`s from the same `Job` may run on each rack". When the scheduler
-finds a suitable match, it assigns the `Task` to a machine and puts the
-`Task` into the `ASSIGNED` state.
-
-From the `ASSIGNED` state, the scheduler sends an RPC to the slave
-machine containing `Task` configuration, which the slave uses to spawn
-an executor responsible for the `Task`'s lifecycle. When the scheduler
-receives an acknowledgement that the machine has accepted the `Task`,
-the `Task` goes into `STARTING` state.
-
-`STARTING` state initializes a `Task` sandbox. When the sandbox is fully
-initialized, Thermos begins to invoke `Process`es. Also, the slave
-machine sends an update to the scheduler that the `Task` is
-in `RUNNING` state.
-
-If a `Task` stays in `ASSIGNED` or `STARTING` for too long, the
-scheduler forces it into `LOST` state, creating a new `Task` in its
-place that's sent into `PENDING` state. This is technically true of any
-active state: if the Mesos core tells the scheduler that a slave has
-become unhealthy (or outright disappeared), the `Task`s assigned to that
-slave go into `LOST` state and new `Task`s are created in their place.
-From `PENDING` state, there is no guarantee a `Task` will be reassigned
-to the same machine unless job constraints explicitly force it there.
-
-If there is a state mismatch, (e.g. a machine returns from a `netsplit`
-and the scheduler has marked all its `Task`s `LOST` and rescheduled
-them), a state reconciliation process kills the errant `RUNNING` tasks,
-which may take up to an hour. But to emphasize this point: there is no
-uniqueness guarantee for a single instance of a job in the presence of
-network partitions. If the Task requires that, it should be baked in at
-the application level using a distributed coordination service such as
-Zookeeper.
+`Job`s and their `Task`s have various states that are described in the [Task Lifecycle](task-lifecycle.md).
+However, in day to day use, you'll be primarily concerned with launching new jobs and updating existing ones.
+
### Task Updates
@@ -186,14 +133,14 @@ with old instance configs and batch updates proceed backwards
from the point where the update failed. E.g., (0,1,2) (3,4,5) (6,7,
8-FAIL) results in a rollback in order (8,7,6) (5,4,3) (2,1,0).
-### HTTP Health Checking and Graceful Shutdown
+### HTTP Health Checking
The Executor implements a protocol for rudimentary control of a task via HTTP. Tasks subscribe for
this protocol by declaring a port named `health`. Take for example this configuration snippet:
nginx = Process(
name = 'nginx',
- cmdline = './run_nginx.sh -port {{thermos.ports[http]}}')
+ cmdline = './run_nginx.sh -port {{thermos.ports[health]}}')
When this Process is included in a job, the job will be allocated a port, and the command line
will be replaced with something like:
@@ -208,8 +155,6 @@ requests:
| HTTP request | Description |
| ------------ | ----------- |
| `GET /health` | Inquires whether the task is healthy. |
-| `POST /quitquitquit` | Task should initiate graceful shutdown. |
-| `POST /abortabortabort` | Final warning task is being killed. |
Please see the
[configuration reference](configuration-reference.md#user-content-healthcheckconfig-objects) for
@@ -227,62 +172,6 @@ process.
WARNING: Remember to remove this when you are done, otherwise your instance will have permanently
disabled health checks.
-#### Tearing a task down
-
-The Executor follows an escalation sequence when killing a running task:
-
- 1. If `health` port is not present, skip to (5)
- 2. POST /quitquitquit
- 3. wait 5 seconds
- 4. POST /abortabortabort
- 5. Send SIGTERM (`kill`)
- 6. Send SIGKILL (`kill -9`)
-
-If the Executor notices that all Processes in a Task have aborted during this sequence, it will
-not proceed with subsequent steps. Note that graceful shutdown is best-effort, and due to the many
-inevitable realities of distributed systems, it may not be performed.
-
-### Giving Priority to Production Tasks: PREEMPTING
-
-Sometimes a Task needs to be interrupted, such as when a non-production
-Task's resources are needed by a higher priority production Task. This
-type of interruption is called a *pre-emption*. When this happens in
-Aurora, the non-production Task is killed and moved into
-the `PREEMPTING` state when both the following are true:
-
-- The task being killed is a non-production task.
-- The other task is a `PENDING` production task that hasn't been
- scheduled due to a lack of resources.
-
-Since production tasks are much more important, Aurora kills off the
-non-production task to free up resources for the production task. The
-scheduler UI shows the non-production task was preempted in favor of the
-production task. At some point, tasks in `PREEMPTING` move to `KILLED`.
-
-Note that non-production tasks consuming many resources are likely to be
-preempted in favor of production tasks.
-
-### Natural Termination: FINISHED, FAILED
-
-A `RUNNING` `Task` can terminate without direct user interaction. For
-example, it may be a finite computation that finishes, even something as
-simple as `echo hello world. `Or it could be an exceptional condition in
-a long-lived service. If the `Task` is successful (its underlying
-processes have succeeded with exit status `0` or finished without
-reaching failure limits) it moves into `FINISHED` state. If it finished
-after reaching a set of failure limits, it goes into `FAILED` state.
-
-### Forceful Termination: KILLING, RESTARTING
-
-You can terminate a `Task` by issuing an `aurora job kill` command, which
-moves it into `KILLING` state. The scheduler then sends the slave a
-request to terminate the `Task`. If the scheduler receives a successful
-response, it moves the Task into `KILLED` state and never restarts it.
-
-The scheduler has access to a non-public `RESTARTING` state. If a `Task`
-is forced into the `RESTARTING` state, the scheduler kills the
-underlying task but in parallel schedules an identical replacement for
-it.
Configuration
-------------
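The rudimentary HTTP control protocol retained in the user-guide (a `GET /health` probe on the port named `health`) can be sketched as a pure request handler. This is an illustrative assumption about the contract (an HTTP 200 signalling health), not the executor's actual health-checking code:

```python
# Illustrative sketch of a task-side health responder for the GET /health
# probe described in the user-guide. Status codes and bodies are assumptions.
def handle_health_request(method, path, task_is_healthy):
    """Return (status_code, body) for the rudimentary control protocol."""
    if method == "GET" and path == "/health":
        return (200, "ok") if task_is_healthy else (503, "unhealthy")
    # Anything else is outside the protocol after the shutdown endpoints
    # were removed from this table.
    return (404, "not found")
```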