Posted to commits@aurora.apache.org by se...@apache.org on 2016/02/06 23:25:40 UTC

aurora git commit: Move lifecycle documentation into separate file

Repository: aurora
Updated Branches:
  refs/heads/master a9e7a35a2 -> 2d59b697a


Move lifecycle documentation into separate file

In addition to the move, a couple of related additions and adjustments have been made:

* slight reorganization
* documentation of missing states (THROTTLED, DRAINING)
* custom section on reconciliation
* remark regarding the uniqueness of an instance
* updated documentation of the teardown of a task (HTTPLifecycleConfig and finalization_wait)

Bugs closed: AURORA-1068, AURORA-1262, AURORA-734

Reviewed at https://reviews.apache.org/r/43013/


Project: http://git-wip-us.apache.org/repos/asf/aurora/repo
Commit: http://git-wip-us.apache.org/repos/asf/aurora/commit/2d59b697
Tree: http://git-wip-us.apache.org/repos/asf/aurora/tree/2d59b697
Diff: http://git-wip-us.apache.org/repos/asf/aurora/diff/2d59b697

Branch: refs/heads/master
Commit: 2d59b697a745f9540d53da1659cda4683c929b34
Parents: a9e7a35
Author: Stephan Erb <se...@apache.org>
Authored: Sat Feb 6 23:22:21 2016 +0100
Committer: Stephan Erb <st...@dev.static-void.de>
Committed: Sat Feb 6 23:22:21 2016 +0100

----------------------------------------------------------------------
 docs/README.md                  |   1 +
 docs/configuration-reference.md |  19 ++---
 docs/task-lifecycle.md          | 146 +++++++++++++++++++++++++++++++++++
 docs/user-guide.md              | 125 ++----------------------------
 4 files changed, 164 insertions(+), 127 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/aurora/blob/2d59b697/docs/README.md
----------------------------------------------------------------------
diff --git a/docs/README.md b/docs/README.md
index 8ebc061..78f062a 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -11,6 +11,7 @@ We encourage you to ask questions on the [Aurora user list](http://aurora.apache
  * [Install Aurora on virtual machines on your private machine](vagrant.md)
  * [Hello World Tutorial](tutorial.md)
  * [User Guide](user-guide.md)
+ * [Task Lifecycle](task-lifecycle.md)
  * [Configuration Tutorial](configuration-tutorial.md)
  * [Aurora + Thermos Reference](configuration-reference.md)
  * [Command Line Client](client-commands.md)

http://git-wip-us.apache.org/repos/asf/aurora/blob/2d59b697/docs/configuration-reference.md
----------------------------------------------------------------------
diff --git a/docs/configuration-reference.md b/docs/configuration-reference.md
index 995f706..3f023d7 100644
--- a/docs/configuration-reference.md
+++ b/docs/configuration-reference.md
@@ -312,10 +312,10 @@ upon one final Process ("reducer") to tabulate the results:
 
 #### finalization_wait
 
-Tasks have three active stages: `ACTIVE`, `CLEANING`, and `FINALIZING`. The
-`ACTIVE` stage is when ordinary processes run. This stage lasts as
-long as Processes are running and the Task is healthy. The moment either
-all Processes have finished successfully or the Task has reached a
+Process execution is organized into three active stages: `ACTIVE`,
+`CLEANING`, and `FINALIZING`. The `ACTIVE` stage is when ordinary processes run.
+This stage lasts as long as Processes are running and the Task is healthy.
+The moment either all Processes have finished successfully or the Task has reached a
 maximum Process failure limit, it goes into `CLEANING` stage and sends
 SIGTERMs to all currently running Processes and their process trees.
 Once all Processes have terminated, the Task goes into `FINALIZING` stage
@@ -327,10 +327,7 @@ finish during that time, all remaining Processes are sent SIGKILLs
 (or if they depend upon uncompleted Processes, are
 never invoked.)
 
-Client applications with higher priority may force a shorter
-finalization wait (e.g. through parameters to `thermos kill`), so this
-is mostly a best-effort signal.
-
+When running on Aurora, the `finalization_wait` is capped at 60 seconds.
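
A minimal sketch of how `finalization_wait` and a finalizing `Process` fit together (process names and commands are hypothetical):

    # Ordinary process, runs during the ACTIVE stage.
    server = Process(name = 'server', cmdline = './run_server.sh')

    # Finalizing process, runs during the FINALIZING stage.
    flush = Process(name = 'flush_logs', cmdline = './flush_logs.sh', final = True)

    server_task = Task(
      name = 'server',
      processes = [server, flush],
      resources = Resources(cpu = 1.0, ram = 128*MB, disk = 256*MB),
      finalization_wait = 120)  # effectively capped at 60 seconds under Aurora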
 
 ### Constraint Object
 
@@ -515,7 +512,7 @@ Describes the container the job's processes will run inside.
 ### Docker Parameter Object
 
 Docker CLI parameters. This needs to be enabled by the scheduler `enable_docker_parameters` option.
-See [Docker Command Line Reference](https://docs.docker.com/reference/commandline/run/) for valid parameters. 
+See [Docker Command Line Reference](https://docs.docker.com/reference/commandline/run/) for valid parameters.
 
   param            | type            | description
   -----            | :----:          | -----------
@@ -611,6 +608,10 @@ to distinguish between Task replicas.
 | ```instance```    | Integer    | The instance number of the created task. A job with 5 replicas has instance numbers 0, 1, 2, 3, and 4.
 | ```hostname``` | String | The instance hostname that the task was launched on.
 
+Please note that there is no uniqueness guarantee for `instance` in the presence
+of network partitions. If that is required, it should be baked in at the
+application level using a distributed coordination service such as ZooKeeper.
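
For example, a `cmdline` can use the variable to hand each replica its own shard number (a sketch; the script name is hypothetical):

    shard_server = Process(
      name = 'shard_server',
      # Instance 0 sees --shard=0, instance 1 sees --shard=1, and so on.
      cmdline = './run_server.sh --shard={{mesos.instance}}')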
+
 ### thermos Namespace
 
 The `thermos` namespace contains variables that work directly on the

http://git-wip-us.apache.org/repos/asf/aurora/blob/2d59b697/docs/task-lifecycle.md
----------------------------------------------------------------------
diff --git a/docs/task-lifecycle.md b/docs/task-lifecycle.md
new file mode 100644
index 0000000..e85e754
--- /dev/null
+++ b/docs/task-lifecycle.md
@@ -0,0 +1,146 @@
+# Task Lifecycle
+
+When Aurora reads a configuration file and finds a `Job` definition, it:
+
+1.  Evaluates the `Job` definition.
+2.  Splits the `Job` into its constituent `Task`s.
+3.  Sends those `Task`s to the scheduler.
+4.  The scheduler puts the `Task`s into `PENDING` state, starting each
+    `Task`'s life cycle.
+
+
+![Life of a task](images/lifeofatask.png)
+
+Please note that a couple of the task states described below are missing from
+this state diagram.
+
+
+## PENDING to RUNNING states
+
+When a `Task` is in the `PENDING` state, the scheduler constantly
+searches for machines satisfying that `Task`'s resource requirements
+(RAM, disk space, CPU time) while maintaining configuration
+constraints such as "a `Task` must run on machines dedicated to a
+particular role" or attribute limit constraints such as "at most 2
+`Task`s from the same `Job` may run on each rack". When the scheduler
+finds a suitable match, it assigns the `Task` to a machine and puts the
+`Task` into the `ASSIGNED` state.
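
Such constraints are declared on the `Job`. A sketch, assuming the `constraints` field from the configuration reference (cluster, role, and task names are illustrative):

    jobs = [Job(
      cluster = 'devcluster',
      role = 'www-data',
      environment = 'prod',
      name = 'hello',
      task = hello_task,  # assumed to be defined elsewhere
      # At most 2 instances of this job per rack:
      constraints = {'rack': 'limit:2'})]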
+
+From the `ASSIGNED` state, the scheduler sends an RPC to the slave
+machine containing `Task` configuration, which the slave uses to spawn
+an executor responsible for the `Task`'s lifecycle. When the scheduler
+receives an acknowledgment that the machine has accepted the `Task`,
+the `Task` goes into `STARTING` state.
+
+`STARTING` state initializes a `Task` sandbox. When the sandbox is fully
+initialized, Thermos begins to invoke `Process`es. Also, the slave
+machine sends an update to the scheduler that the `Task` is
+in `RUNNING` state.
+
+
+
+## RUNNING to terminal states
+
+There are various ways that an active `Task` can transition into a terminal
+state. By definition, it can never leave that state. However, depending on the
+nature of the termination and the originating `Job` definition
+(e.g. `service`, `max_task_failures`), a replacement `Task` might be
+scheduled.
+
+### Natural Termination: FINISHED, FAILED
+
+A `RUNNING` `Task` can terminate without direct user interaction. For
+example, it may be a finite computation that finishes, even something as
+simple as `echo hello world`, or it could be an exceptional condition in
+a long-lived service. If the `Task` is successful (its underlying
+processes have succeeded with exit status `0` or finished without
+reaching failure limits) it moves into `FINISHED` state. If it finished
+after reaching a set of failure limits, it goes into `FAILED` state.
+
+A terminated `Task` that is subject to rescheduling will be temporarily
+`THROTTLED` if it is considered to be flapping. A task is flapping if its
+previous invocation was terminated after less than 5 minutes (scheduler
+default). The time penalty a task has to remain in the `THROTTLED` state
+before it is eligible for rescheduling increases with each consecutive
+failure.
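
The escalation can be pictured as a capped exponential backoff. A sketch in Python with illustrative numbers; the actual delays are governed by scheduler settings:

    def throttle_penalty_secs(consecutive_failures, base=30, cap=300):
        # Penalty doubles with each consecutive flapping failure,
        # up to a fixed ceiling.
        return min(base * (2 ** consecutive_failures), cap)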
+
+### Forceful Termination: KILLING, RESTARTING
+
+You can terminate a `Task` by issuing an `aurora job kill` command, which
+moves it into `KILLING` state. The scheduler then sends the slave a
+request to terminate the `Task`. If the scheduler receives a successful
+response, it moves the Task into `KILLED` state and never restarts it.
+
+If a `Task` is forced into the `RESTARTING` state via the `aurora job restart`
+command, the scheduler kills the underlying task but in parallel schedules
+an identical replacement for it.
+
+In any case, the responsible executor on the slave follows an escalation
+sequence when killing a running task:
+
+  1. If an `HTTPLifecycleConfig` is not present, skip to (4).
+  2. Send a POST to the `graceful_shutdown_endpoint` and wait 5 seconds.
+  3. Send a POST to the `shutdown_endpoint` and wait 5 seconds.
+  4. Send SIGTERM (`kill`) and wait at most `finalization_wait` seconds.
+  5. Send SIGKILL (`kill -9`).
+
+If the executor notices that all `Process`es in a `Task` have aborted
+during this sequence, it will not proceed with subsequent steps.
+Note that graceful shutdown is best-effort, and due to the many
+inevitable realities of distributed systems, it may not be performed.
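
A sketch of wiring up these HTTP teardown hooks in a job definition, assuming the endpoint defaults shown here match the `HTTPLifecycleConfig` schema:

    teardown = HTTPLifecycleConfig(
      graceful_shutdown_endpoint = '/quitquitquit',  # step (2) above
      shutdown_endpoint = '/abortabortabort')        # step (3) above

    # Attached to a Job via its lifecycle field:
    #   lifecycle = Lifecycle(http = teardown)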
+
+### Unexpected Termination: LOST
+
+If a `Task` stays in a transient task state for too long (such as `ASSIGNED`
+or `STARTING`), the scheduler forces it into `LOST` state, creating a new
+`Task` in its place that's sent into `PENDING` state.
+
+In addition, if the Mesos core tells the scheduler that a slave has
+become unhealthy (or outright disappeared), the `Task`s assigned to that
+slave go into `LOST` state and new `Task`s are created in their place.
+From `PENDING` state, there is no guarantee a `Task` will be reassigned
+to the same machine unless job constraints explicitly force it there.
+
+### Giving Priority to Production Tasks: PREEMPTING
+
+Sometimes a Task needs to be interrupted, such as when a non-production
+Task's resources are needed by a higher priority production Task. This
+type of interruption is called a *pre-emption*. When this happens in
+Aurora, the non-production Task is killed and moved into
+the `PREEMPTING` state when both of the following are true:
+
+- The task being killed is a non-production task.
+- The other task is a `PENDING` production task that hasn't been
+  scheduled due to a lack of resources.
+
+The scheduler UI shows that the non-production task was preempted in favor of
+the production task. At some point, tasks in `PREEMPTING` move to `KILLED`.
+
+Note that non-production tasks consuming many resources are likely to be
+preempted in favor of production tasks.
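
Which side of that bargain a task is on comes from its `Job` definition. A sketch using the `production` flag (other fields are illustrative):

    batch_job = Job(
      cluster = 'devcluster',
      role = 'www-data',
      environment = 'devel',
      name = 'batch_crunch',
      task = crunch_task,   # assumed to be defined elsewhere
      production = False)   # may be preempted by PENDING production tasks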
+
+### Making Room for Maintenance: DRAINING
+
+Cluster operators can set a slave into maintenance mode. This transitions all
+`Task`s running on that slave into `DRAINING` and eventually to `KILLED`.
+Drained `Task`s will be restarted on other slaves for which no maintenance
+has been announced yet.
+
+
+
+## State Reconciliation
+
+Due to the many inevitable realities of distributed systems, there might
+be a mismatch of perceived and actual cluster state (e.g. a machine returns
+from a `netsplit` but the scheduler has already marked all its `Task`s as
+`LOST` and rescheduled them).
+
+Aurora regularly runs a state reconciliation process in order to detect
+and correct such issues (e.g. by killing the errant `RUNNING` tasks).
+By default, the proper detection of all failure scenarios and inconsistencies
+may take up to an hour.
+
+To emphasize this point: there is no uniqueness guarantee for a single
+instance of a job in the presence of network partitions. If the `Task`
+requires that, it should be baked in at the application level using a
+distributed coordination service such as ZooKeeper.
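
One way to bake that in is a distributed lock, sketched here with the `kazoo` ZooKeeper client (quorum address, lock path, and the serving function are hypothetical):

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='zk1.example.com:2181')
    zk.start()

    # Only one holder at a time, even if a partition leaves two copies
    # of instance 0 running somewhere in the cluster.
    lock = zk.Lock('/myservice/instance-0', 'task-id')
    with lock:
        serve_requests()  # hypothetical application entry point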

http://git-wip-us.apache.org/repos/asf/aurora/blob/2d59b697/docs/user-guide.md
----------------------------------------------------------------------
diff --git a/docs/user-guide.md b/docs/user-guide.md
index df63468..656296c 100644
--- a/docs/user-guide.md
+++ b/docs/user-guide.md
@@ -3,14 +3,8 @@ Aurora User Guide
 
 - [Overview](#user-content-overview)
 - [Job Lifecycle](#user-content-job-lifecycle)
-	- [Life Of A Task](#user-content-life-of-a-task)
-	- [PENDING to RUNNING states](#user-content-pending-to-running-states)
 	- [Task Updates](#user-content-task-updates)
-	- [HTTP Health Checking and Graceful Shutdown](#user-content-http-health-checking-and-graceful-shutdown)
-		- [Tearing a task down](#user-content-tearing-a-task-down)
-	- [Giving Priority to Production Tasks: PREEMPTING](#user-content-giving-priority-to-production-tasks-preempting)
-	- [Natural Termination: FINISHED, FAILED](#user-content-natural-termination-finished-failed)
-	- [Forceful Termination: KILLING, RESTARTING](#user-content-forceful-termination-killing-restarting)
+	- [HTTP Health Checking](#user-content-http-health-checking)
 - [Service Discovery](#user-content-service-discovery)
 - [Configuration](#user-content-configuration)
 - [Creating Jobs](#user-content-creating-jobs)
@@ -99,60 +93,13 @@ will be around forever, e.g. by building log saving or other
 checkpointing mechanisms directly into your application or into your
 `Job` description.
 
+
 Job Lifecycle
 -------------
 
-When Aurora reads a configuration file and finds a `Job` definition, it:
-
-1.  Evaluates the `Job` definition.
-2.  Splits the `Job` into its constituent `Task`s.
-3.  Sends those `Task`s to the scheduler.
-4.  The scheduler puts the `Task`s into `PENDING` state, starting each
-    `Task`'s life cycle.
-
-### Life Of A Task
-
-![Life of a task](images/lifeofatask.png)
-
-### PENDING to RUNNING states
-
-When a `Task` is in the `PENDING` state, the scheduler constantly
-searches for machines satisfying that `Task`'s resource request
-requirements (RAM, disk space, CPU time) while maintaining configuration
-constraints such as "a `Task` must run on machines  dedicated  to a
-particular role" or attribute limit constraints such as "at most 2
-`Task`s from the same `Job` may run on each rack". When the scheduler
-finds a suitable match, it assigns the `Task` to a machine and puts the
-`Task` into the `ASSIGNED` state.
-
-From the `ASSIGNED` state, the scheduler sends an RPC to the slave
-machine containing `Task` configuration, which the slave uses to spawn
-an executor responsible for the `Task`'s lifecycle. When the scheduler
-receives an acknowledgement that the machine has accepted the `Task`,
-the `Task` goes into `STARTING` state.
-
-`STARTING` state initializes a `Task` sandbox. When the sandbox is fully
-initialized, Thermos begins to invoke `Process`es. Also, the slave
-machine sends an update to the scheduler that the `Task` is
-in `RUNNING` state.
-
-If a `Task` stays in `ASSIGNED` or `STARTING` for too long, the
-scheduler forces it into `LOST` state, creating a new `Task` in its
-place that's sent into `PENDING` state. This is technically true of any
-active state: if the Mesos core tells the scheduler that a slave has
-become unhealthy (or outright disappeared), the `Task`s assigned to that
-slave go into `LOST` state and new `Task`s are created in their place.
-From `PENDING` state, there is no guarantee a `Task` will be reassigned
-to the same machine unless job constraints explicitly force it there.
-
-If there is a state mismatch, (e.g. a machine returns from a `netsplit`
-and the scheduler has marked all its `Task`s `LOST` and rescheduled
-them), a state reconciliation process kills the errant `RUNNING` tasks,
-which may take up to an hour. But to emphasize this point: there is no
-uniqueness guarantee for a single instance of a job in the presence of
-network partitions. If the Task requires that, it should be baked in at
-the application level using a distributed coordination service such as
-Zookeeper.
+`Job`s and their `Task`s have various states that are described in the [Task Lifecycle](task-lifecycle.md).
+However, in day-to-day use, you'll be primarily concerned with launching new jobs and updating existing ones.
+
 
 ### Task Updates
 
@@ -186,14 +133,14 @@ with old instance configs and batch updates proceed backwards
 from the point where the update failed. E.g., (0,1,2) (3,4,5) (6,7,
 8-FAIL) results in a rollback in order (8,7,6) (5,4,3) (2,1,0).
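
The batching behaviour described above is driven by the job's `UpdateConfig`. A sketch with illustrative values:

    update_policy = UpdateConfig(
      batch_size = 3,              # update (0,1,2), then (3,4,5), ...
      watch_secs = 45,             # instance must stay RUNNING this long
      max_per_shard_failures = 0)  # any instance failure triggers rollback

    # Attached to a Job via its update_config field:
    #   update_config = update_policy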
 
-### HTTP Health Checking and Graceful Shutdown
+### HTTP Health Checking
 
 The Executor implements a protocol for rudimentary control of a task via HTTP.  Tasks subscribe to
 this protocol by declaring a port named `health`.  Take for example this configuration snippet:
 
     nginx = Process(
       name = 'nginx',
-      cmdline = './run_nginx.sh -port {{thermos.ports[http]}}')
+      cmdline = './run_nginx.sh -port {{thermos.ports[health]}}')
 
 When this Process is included in a job, the job will be allocated a port, and the command line
 will be replaced with something like:
@@ -208,8 +155,6 @@ requests:
 | HTTP request            | Description                             |
 | ------------            | -----------                             |
 | `GET /health`           | Inquires whether the task is healthy.   |
-| `POST /quitquitquit`    | Task should initiate graceful shutdown. |
-| `POST /abortabortabort` | Final warning task is being killed.     |
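
One can probe a running task by hand as well. A sketch in Python (host and port are illustrative; a healthy task is expected to answer with the plain text `ok`):

    import urllib2

    # The same request the executor's health checker issues periodically.
    reply = urllib2.urlopen('http://slave-hostname:31337/health').read()
    print(reply)  # 'ok' when healthy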
 
 Please see the
 [configuration reference](configuration-reference.md#user-content-healthcheckconfig-objects) for
@@ -227,62 +172,6 @@ process.
 WARNING: Remember to remove this when you are done, otherwise your instance will have permanently
 disabled health checks.
 
-#### Tearing a task down
-
-The Executor follows an escalation sequence when killing a running task:
-
-  1. If `health` port is not present, skip to (5)
-  2. POST /quitquitquit
-  3. wait 5 seconds
-  4. POST /abortabortabort
-  5. Send SIGTERM (`kill`)
-  6. Send SIGKILL (`kill -9`)
-
-If the Executor notices that all Processes in a Task have aborted during this sequence, it will
-not proceed with subsequent steps.  Note that graceful shutdown is best-effort, and due to the many
-inevitable realities of distributed systems, it may not be performed.
-
-### Giving Priority to Production Tasks: PREEMPTING
-
-Sometimes a Task needs to be interrupted, such as when a non-production
-Task's resources are needed by a higher priority production Task. This
-type of interruption is called a *pre-emption*. When this happens in
-Aurora, the non-production Task is killed and moved into
-the `PREEMPTING` state  when both the following are true:
-
-- The task being killed is a non-production task.
-- The other task is a `PENDING` production task that hasn't been
-  scheduled due to a lack of resources.
-
-Since production tasks are much more important, Aurora kills off the
-non-production task to free up resources for the production task. The
-scheduler UI shows the non-production task was preempted in favor of the
-production task. At some point, tasks in `PREEMPTING` move to `KILLED`.
-
-Note that non-production tasks consuming many resources are likely to be
-preempted in favor of production tasks.
-
-### Natural Termination: FINISHED, FAILED
-
-A `RUNNING` `Task` can terminate without direct user interaction. For
-example, it may be a finite computation that finishes, even something as
-simple as `echo hello world. `Or it could be an exceptional condition in
-a long-lived service. If the `Task` is successful (its underlying
-processes have succeeded with exit status `0` or finished without
-reaching failure limits) it moves into `FINISHED` state. If it finished
-after reaching a set of failure limits, it goes into `FAILED` state.
-
-### Forceful Termination: KILLING, RESTARTING
-
-You can terminate a `Task` by issuing an `aurora job kill` command, which
-moves it into `KILLING` state. The scheduler then sends the slave  a
-request to terminate the `Task`. If the scheduler receives a successful
-response, it moves the Task into `KILLED` state and never restarts it.
-
-The scheduler has access to a non-public `RESTARTING` state. If a `Task`
-is forced into the `RESTARTING` state, the scheduler kills the
-underlying task but in parallel schedules an identical replacement for
-it.
 
 Configuration
 -------------