Posted to commits@aurora.apache.org by se...@apache.org on 2017/02/21 20:55:06 UTC

svn commit: r1783940 [20/20] - in /aurora/site: data/ publish/ publish/blog/ publish/blog/aurora-0-17-0-released/ publish/documentation/0.10.0/ publish/documentation/0.10.0/build-system/ publish/documentation/0.10.0/client-cluster-configuration/ publis...

Added: aurora/site/source/documentation/0.17.0/reference/task-lifecycle.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.17.0/reference/task-lifecycle.md?rev=1783940&view=auto
==============================================================================
--- aurora/site/source/documentation/0.17.0/reference/task-lifecycle.md (added)
+++ aurora/site/source/documentation/0.17.0/reference/task-lifecycle.md Tue Feb 21 20:54:58 2017
@@ -0,0 +1,148 @@
+# Task Lifecycle
+
+When Aurora reads a configuration file and finds a `Job` definition, it:
+
+1.  Evaluates the `Job` definition.
+2.  Splits the `Job` into its constituent `Task`s.
+3.  Sends those `Task`s to the scheduler.
+4.  The scheduler puts the `Task`s into `PENDING` state, starting each
+    `Task`'s life cycle.
+
+
+![Life of a task](../images/lifeofatask.png)
+
+Please note, a couple of task states described below are missing from
+this state diagram.
+
+
+## PENDING to RUNNING states
+
+When a `Task` is in the `PENDING` state, the scheduler constantly
+searches for machines satisfying that `Task`'s resource request
+requirements (RAM, disk space, CPU time) while maintaining configuration
+constraints such as "a `Task` must run on machines dedicated to a
+particular role" or attribute limit constraints such as "at most 2
+`Task`s from the same `Job` may run on each rack". When the scheduler
+finds a suitable match, it assigns the `Task` to a machine and puts the
+`Task` into the `ASSIGNED` state.
+
+From the `ASSIGNED` state, the scheduler sends an RPC containing the
+`Task` configuration to the agent machine, which the agent uses to spawn
+an executor responsible for the `Task`'s lifecycle. When the scheduler
+receives an acknowledgment that the machine has accepted the `Task`,
+the `Task` goes into `STARTING` state.
+
+`STARTING` state initializes a `Task` sandbox. When the sandbox is fully
+initialized, Thermos begins to invoke `Process`es. The agent machine
+reports to the scheduler that the `Task` is in `RUNNING` state only
+after the task satisfies the liveness requirements. See
+[Health Checking](../features/services#health-checking) for details on
+how to configure health checks.
+
+
+
+## RUNNING to terminal states
+
+There are various ways that an active `Task` can transition into a terminal
+state. By definition, a `Task` can never leave a terminal state. However,
+depending on the nature of the termination and the originating `Job`
+definition (e.g. `service`, `max_task_failures`), a replacement `Task`
+might be scheduled.
+
+### Natural Termination: FINISHED, FAILED
+
+A `RUNNING` `Task` can terminate without direct user interaction. For
+example, it may be a finite computation that finishes, even something as
+simple as `echo hello world`, or it could be an exceptional condition in
+a long-lived service. If the `Task` is successful (its underlying
+processes have succeeded with exit status `0` or finished without
+reaching failure limits) it moves into `FINISHED` state. If it finished
+after reaching a set of failure limits, it goes into `FAILED` state.
+
+A terminated `Task` that is subject to rescheduling will be temporarily
+`THROTTLED` if it is considered to be flapping. A task is flapping if its
+previous invocation was terminated after less than 5 minutes (scheduler
+default). The time penalty a task has to remain in the `THROTTLED` state
+before it is eligible for rescheduling increases with each consecutive
+failure.
+
+### Forceful Termination: KILLING, RESTARTING
+
+You can terminate a `Task` by issuing an `aurora job kill` command, which
+moves it into `KILLING` state. The scheduler then sends the agent a
+request to terminate the `Task`. If the scheduler receives a successful
+response, it moves the `Task` into `KILLED` state and never restarts it.
+
+If a `Task` is forced into the `RESTARTING` state via the `aurora job restart`
+command, the scheduler kills the underlying task but in parallel schedules
+an identical replacement for it.
+
+In any case, the responsible executor on the agent follows an escalation
+sequence when killing a running task:
+
+  1. If a `HttpLifecycleConfig` is not present, skip to (4).
+  2. Send a POST to the `graceful_shutdown_endpoint` and wait 5 seconds.
+  3. Send a POST to the `shutdown_endpoint` and wait 5 seconds.
+  4. Send SIGTERM (`kill`) and wait at most `finalization_wait` seconds.
+  5. Send SIGKILL (`kill -9`).
+
+If the executor notices that all `Process`es in a `Task` have aborted
+during this sequence, it will not proceed with subsequent steps.
+Note that graceful shutdown is best-effort, and due to the many
+inevitable realities of distributed systems, it may not be performed.
+
+### Unexpected Termination: LOST
+
+If a `Task` stays in a transient task state for too long (such as `ASSIGNED`
+or `STARTING`), the scheduler forces it into `LOST` state, creating a new
+`Task` in its place that's sent into `PENDING` state.
+
+In addition, if the Mesos core tells the scheduler that an agent has
+become unhealthy (or outright disappeared), the `Task`s assigned to that
+agent go into `LOST` state and new `Task`s are created in their place.
+From `PENDING` state, there is no guarantee a `Task` will be reassigned
+to the same machine unless job constraints explicitly force it there.
+
+### Giving Priority to Production Tasks: PREEMPTING
+
+Sometimes a `Task` needs to be interrupted, such as when a non-production
+`Task`'s resources are needed by a higher priority production `Task`. This
+type of interruption is called a *pre-emption*. When this happens in
+Aurora, the non-production `Task` is killed and moved into
+the `PREEMPTING` state when both the following are true:
+
+- The task being killed is a non-production task.
+- The other task is a `PENDING` production task that hasn't been
+  scheduled due to a lack of resources.
+
+The scheduler UI shows the non-production task was preempted in favor of
+the production task. At some point, tasks in `PREEMPTING` move to `KILLED`.
+
+Note that non-production tasks consuming many resources are likely to be
+preempted in favor of production tasks.
+
+### Making Room for Maintenance: DRAINING
+
+Cluster operators can set an agent into maintenance mode. This will transition
+all `Task`s running on this agent into `DRAINING` and eventually to `KILLED`.
+Drained `Task`s will be restarted on other agents for which no maintenance
+has been announced yet.
+
+
+
+## State Reconciliation
+
+Due to the many inevitable realities of distributed systems, there might
+be a mismatch of perceived and actual cluster state (e.g. a machine returns
+from a `netsplit` but the scheduler has already marked all its `Task`s as
+`LOST` and rescheduled them).
+
+Aurora regularly runs a state reconciliation process in order to detect
+and correct such issues (e.g. by killing the errant `RUNNING` tasks).
+By default, the proper detection of all failure scenarios and inconsistencies
+may take up to an hour.
+
+To emphasize this point: there is no uniqueness guarantee for a single
+instance of a job in the presence of network partitions. If the `Task`
+requires that, it should be baked in at the application level using a
+distributed coordination service such as ZooKeeper.

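The states described in task-lifecycle.md above can be summarized as a transition table. The following is a minimal Python sketch, not the scheduler's actual implementation: it lists only the transitions the text names, and `TRANSITIONS`, `TERMINAL`, `can_transition` and `walk` are hypothetical names chosen for illustration. The text does not say which state follows `RESTARTING`, so that state has no outgoing entry here.

```python
# Transitions named in the task lifecycle text; this is a sketch,
# not the scheduler's full state machine.
TRANSITIONS = {
    "PENDING": {"ASSIGNED"},
    "ASSIGNED": {"STARTING", "LOST"},
    "STARTING": {"RUNNING", "LOST"},
    "RUNNING": {
        "FINISHED", "FAILED", "KILLING", "RESTARTING",
        "LOST", "PREEMPTING", "DRAINING",
    },
    "KILLING": {"KILLED"},
    "PREEMPTING": {"KILLED"},
    "DRAINING": {"KILLED"},
    # Assumption: a throttled task re-enters PENDING when it is rescheduled.
    "THROTTLED": {"PENDING"},
}

# Terminal states: a task never leaves these.
TERMINAL = {"FINISHED", "FAILED", "KILLED", "LOST"}


def can_transition(current, target):
    """Return True if the text names a transition from current to target."""
    return target in TRANSITIONS.get(current, set())


def walk(path):
    """Return True if every consecutive pair of states in path is allowed."""
    return all(can_transition(a, b) for a, b in zip(path, path[1:]))


print(walk(["PENDING", "ASSIGNED", "STARTING", "RUNNING", "FINISHED"]))  # True
print(can_transition("FINISHED", "RUNNING"))  # False
```

Keeping the table as plain data makes it easy to extend if further transitions need to be modeled.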
Modified: aurora/site/source/documentation/latest/contributing.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/contributing.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/contributing.md (original)
+++ aurora/site/source/documentation/latest/contributing.md Tue Feb 21 20:54:58 2017
@@ -36,7 +36,7 @@ Post a review with `rbt`, fill out the f
 
     ./rbt post -o
 
-If you're unsure about who to add as a reviewer, you can default to adding Bill Farner (wfarner) and
+If you're unsure about who to add as a reviewer, you can default to adding Zameer Manji (zmanji) and
 Joshua Cohen (jcohen). They will take care of finding an appropriate reviewer for the patch.
 
 Once you've done this, you probably want to mark the associated Jira issue as Reviewable.

Modified: aurora/site/source/documentation/latest/development/client.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/client.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/development/client.md (original)
+++ aurora/site/source/documentation/latest/development/client.md Tue Feb 21 20:54:58 2017
@@ -17,6 +17,73 @@ are:
 If you want to build a source distribution of the client, you need to run `./build-support/release/make-python-sdists`.
 
 
+Creating Custom Builds
+----------------------
+
+There are situations where you may want to plug in custom logic to the Client that may not be
+applicable to the open source codebase. Rather than create a whole CLI from scratch, you can
+easily create your own custom, drop-in replacement aurora.pex using the pants build tool.
+
+First, create an AuroraCommandLine implementation as an entry-point for registering customizations:
+
+    from apache.aurora.client.cli.client import AuroraCommandLine
+
+    class CustomAuroraCommandLine(AuroraCommandLine):
+      """Custom AuroraCommandLine for your needs"""
+
+      @property
+      def name(self):
+        return "your-company-aurora"
+
+      @classmethod
+      def get_description(cls):
+        return 'Your Company internal Aurora client command line'
+
+      def __init__(self):
+        super(CustomAuroraCommandLine, self).__init__()
+        # Add custom plugins..
+        self.register_plugin(YourCustomPlugin())
+
+      def register_nouns(self):
+        super(CustomAuroraCommandLine, self).register_nouns()
+        # You can even add new commands / sub-commands!
+        self.register_noun(YourStartUpdateProxy())
+        self.register_noun(YourDeployWorkflowCommand())
+
+Secondly, create a main entry point:
+
+    import sys
+
+    def proxy_main():
+      client = CustomAuroraCommandLine()
+      # With no arguments, fall back to printing the help text.
+      if len(sys.argv) == 1:
+        sys.argv.append("-h")
+      sys.exit(client.execute(sys.argv[1:]))
+
+Finally, you can wire everything up with a pants BUILD file in your project directory:
+
+    python_binary(
+      name='aurora',
+      entry_point='your_company.aurora.client:proxy_main',
+      dependencies=[
+        ':client_lib'
+      ]
+    )
+
+    python_library(
+      name='client_lib',
+      sources = [
+        'client.py',
+        'custom_plugin.py',
+        'custom_command.py',
+      ],
+      dependencies = [
+        # The Apache Aurora client
+        # Any other dependencies for your custom code
+      ],
+    )
+
+Using the same commands to build the client as above (but obviously pointing to this BUILD file
+instead), you will have a drop-in replacement aurora.pex file with your customizations.
+
 Running/Debugging
 ------------------
 

Modified: aurora/site/source/documentation/latest/development/db-migration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/db-migration.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/development/db-migration.md (original)
+++ aurora/site/source/documentation/latest/development/db-migration.md Tue Feb 21 20:54:58 2017
@@ -14,7 +14,7 @@ When adding or altering tables or changi
 [schema.sql](../../src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql), a new
 migration class should be created under the org.apache.aurora.scheduler.storage.db.migration
 package. The class should implement the [MigrationScript](https://github.com/mybatis/migrations/blob/master/src/main/java/org/apache/ibatis/migration/MigrationScript.java)
-interface (see [V001_TestMigration](https://github.com/apache/aurora/blob/rel/0.16.0/src/test/java/org/apache/aurora/scheduler/storage/db/testmigration/V001_TestMigration.java)
+interface (see [V001_TestMigration](https://github.com/apache/aurora/blob/rel/0.17.0/src/test/java/org/apache/aurora/scheduler/storage/db/testmigration/V001_TestMigration.java)
 as an example). The upgrade and downgrade scripts are defined in this class. When restoring a
 snapshot the list of migrations on the classpath is compared to the list of applied changes in the
 DB. Any changes that have not yet been applied are executed and their downgrade script is stored

Modified: aurora/site/source/documentation/latest/development/design-documents.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/design-documents.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/development/design-documents.md (original)
+++ aurora/site/source/documentation/latest/development/design-documents.md Tue Feb 21 20:54:58 2017
@@ -11,7 +11,7 @@ Current and past documents:
 * [Command Hooks for the Aurora Client](../design/command-hooks/)
 * [Dynamic Reservations](https://docs.google.com/document/d/19gV8Po6DIHO14tOC7Qouk8RnboY8UCfRTninwn_5-7c/edit)
 * [GPU Resources in Aurora](https://docs.google.com/document/d/1J9SIswRMpVKQpnlvJAMAJtKfPP7ZARFknuyXl-2aZ-M/edit)
-* [Health Checks for Updates](https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit)
+* [Health Checks for Updates](https://docs.google.com/document/d/1KOO0LC046k75TqQqJ4c0FQcVGbxvrn71E10wAjMorVY/edit)
 * [JobUpdateDiff thrift API](https://docs.google.com/document/d/1Fc_YhhV7fc4D9Xv6gJzpfooxbK4YWZcvzw6Bd3qVTL8/edit)
 * [REST API RFC](https://docs.google.com/document/d/11_lAsYIRlD5ETRzF2eSd3oa8LXAHYFD8rSetspYXaf4/edit)
 * [Revocable Mesos offers in Aurora](https://docs.google.com/document/d/1r1WCHgmPJp5wbrqSZLsgtxPNj3sULfHrSFmxp2GyPTo/edit)

Modified: aurora/site/source/documentation/latest/development/thrift.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/thrift.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/development/thrift.md (original)
+++ aurora/site/source/documentation/latest/development/thrift.md Tue Feb 21 20:54:58 2017
@@ -6,7 +6,7 @@ client/server RPC protocol as well as fo
 correctly handling additions and renames of the existing members, field removals must be done
 carefully to ensure backwards compatibility and provide predictable deprecation cycle. This
 document describes general guidelines for making Thrift schema changes to the existing fields in
-[api.thrift](https://github.com/apache/aurora/blob/rel/0.16.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+[api.thrift](https://github.com/apache/aurora/blob/rel/0.17.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
 
 It is highly recommended to go through the
 [Thrift: The Missing Guide](http://diwakergupta.github.io/thrift-missing-guide/) first to refresh on
@@ -33,7 +33,7 @@ communicate with scheduler/client from v
 * Add a new field as an eventual replacement of the old one and implement a dual read/write
 anywhere the old field is used. If a thrift struct is mapped in the DB store make sure both columns
 are marked as `NOT NULL`
-* Check [storage.thrift](https://github.com/apache/aurora/blob/rel/0.16.0/api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if
+* Check [storage.thrift](https://github.com/apache/aurora/blob/rel/0.17.0/api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if
 the affected struct is stored in Aurora scheduler storage. If so, it's almost certainly also
 necessary to perform a [DB migration](../db-migration/).
 * Add a deprecation jira ticket into the vCurrent+1 release candidate

Modified: aurora/site/source/documentation/latest/features/containers.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/containers.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/containers.md (original)
+++ aurora/site/source/documentation/latest/features/containers.md Tue Feb 21 20:54:58 2017
@@ -66,7 +66,12 @@ Docker Containerizer
 
 The Docker containerizer launches container images using the Docker engine. It may often provide
 more advanced features than the native Mesos containerizer, but has to be installed separately to
-Mesos on each agent host,
+Mesos on each agent host.
+
+Starting with the 0.17.0 release, `image` can be specified with a `{{docker.image[name][tag]}}` binder so that
+the tag can be resolved to a concrete image digest. This ensures that the job always uses the same image
+across restarts, even if the version identified by the tag has been updated, guaranteeing that only job
+updates can mutate configuration.
 
 Example (available in the [Vagrant environment](../../getting-started/vagrant/)):
 
@@ -93,9 +98,28 @@ Example (available in the [Vagrant envir
         name = 'hello_docker',
         task = task,
         container = Docker(image = 'python:2.7')
+      ), Service(
+        cluster = 'devcluster',
+        environment = 'devel',
+        role = 'www-data',
+        name = 'hello_docker_engine_binding',
+        task = task,
+        container = Docker(image = '{{docker.image[library/python][2.7]}}')
       )
     ]
 
+Note that this feature requires a v2 Docker registry. If using a private Docker registry, its URL
+must be specified in the `clusters.json` configuration file under the key `docker_registry`.
+If not specified, `docker_registry` defaults to `https://registry-1.docker.io` (Docker Hub).
+
+Example:
+
+    # clusters.json
+    [{
+      "name": "devcluster",
+      ...
+      "docker_registry": "https://registry.example.com"
+    }]
+
 Details of how to use Docker via the Docker engine can be found in the
 [Reference Documentation](../../reference/configuration/#docker-object). Please note that in order to
 correctly execute processes inside a job, the Docker container must have Python 2.7 and potentially

Modified: aurora/site/source/documentation/latest/features/job-updates.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/job-updates.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/job-updates.md (original)
+++ aurora/site/source/documentation/latest/features/job-updates.md Tue Feb 21 20:54:58 2017
@@ -34,7 +34,7 @@ You may `abort` a job update regardless
 instruct the scheduler to completely abandon the job update and leave the job
 in the current (possibly partially-updated) state.
 
-For a configuration update, the Aurora Client calculates required changes
+For a configuration update, the Aurora Scheduler calculates required changes
 by examining the current job config state and the new desired job config.
 It then starts a *rolling batched update process* by going through every batch
 and performing these operations:
@@ -44,14 +44,13 @@ and performing these operations:
 - If an instance is not present in the scheduler but is present in
   the new config, then the instance is created.
 - If an instance is present in both the scheduler and the new config, then
-  the client diffs both task configs. If it detects any changes, it
+  the scheduler diffs both task configs. If it detects any changes, it
   performs an instance update by killing the old config instance and adds
   the new config instance.
 
-The Aurora client continues through the instance list until all tasks are
-updated, in `RUNNING,` and healthy for a configurable amount of time.
-If the client determines the update is not going well (a percentage of health
-checks have failed), it cancels the update.
+The Aurora Scheduler continues through the instance list until all tasks are
+updated and in `RUNNING`. If the scheduler determines the update is not going
+well (based on the criteria specified in the UpdateConfig), it cancels the update.
 
 Update cancellation runs a procedure similar to the update sequence
 described above, but in reverse order. New instance configs are swapped
@@ -59,7 +58,7 @@ with old instance configs and batch upda
 from the point where the update failed. E.g., (0,1,2) (3,4,5) (6,7,
 8-FAIL) results in a rollback in order (8,7,6) (5,4,3) (2,1,0).
 
-For details how to control a job update, please see the
+For details on how to control a job update, please see the
 [UpdateConfig](../../reference/configuration/#updateconfig-objects) configuration object.
 
 
@@ -71,7 +70,7 @@ acknowledging ("heartbeating") job updat
 service updates where explicit job health monitoring is vital during the entire job update
 lifecycle. Such job updates would rely on an external service (or a custom client) periodically
 pulsing an active coordinated job update via a
-[pulseJobUpdate RPC](https://github.com/apache/aurora/blob/rel/0.16.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+[pulseJobUpdate RPC](https://github.com/apache/aurora/blob/rel/0.17.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
 
 A coordinated update is defined by setting a positive
 [pulse_interval_secs](../../reference/configuration/#updateconfig-objects) value in job configuration

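The rollback order given in job-updates.md can be reproduced in a few lines. This is an illustrative Python sketch under the assumption that a rollback reverses both the batch order and the instance order within each batch, as the (0,1,2) (3,4,5) (6,7,8-FAIL) example suggests. `rollback_order` is a hypothetical helper, not part of the Aurora API.

```python
def rollback_order(batches):
    """Reverse the attempted batches and the instances inside each batch."""
    return [tuple(reversed(batch)) for batch in reversed(batches)]


# Batches attempted so far; the update failed on instance 8.
attempted = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
print(rollback_order(attempted))  # [(8, 7, 6), (5, 4, 3), (2, 1, 0)]
```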
Modified: aurora/site/source/documentation/latest/features/services.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/services.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/services.md (original)
+++ aurora/site/source/documentation/latest/features/services.md Tue Feb 21 20:54:58 2017
@@ -90,6 +90,23 @@ Please see the
 [configuration reference](../../reference/configuration/#healthcheckconfig-objects)
 for configuration options for this feature.
 
+Starting with the 0.17.0 release, job updates rely only on task health checks, made possible by
+a new `min_consecutive_successes` parameter on the HealthCheckConfig object. This parameter represents
+the number of successful health checks needed before a task is moved into the `RUNNING` state. Tasks
+that do not have enough successful health checks within the first `n` attempts are moved to the
+`FAILED` state, where `n = ceil(initial_interval_secs/interval_secs) + max_consecutive_failures +
+min_consecutive_successes`. In order to accommodate variability during task warm up, `initial_interval_secs`
+acts as a grace period. Any health-check failures during the first `m` attempts are ignored and
+do not count towards `max_consecutive_failures`, where `m = ceil(initial_interval_secs/interval_secs)`.
+
+As [job updates](../job-updates/) are based only on health-checks, it is not necessary to set
+`watch_secs` to the worst-case update time; it can instead be set to 0. The scheduler considers a
+task that is in the `RUNNING` state to be healthy and proceeds to updating the next batch of instances.
+For details on how to control health checks, please see the
+[HealthCheckConfig](../../reference/configuration/#healthcheckconfig-objects) configuration object.
+Existing jobs that do not configure a health-check can fall back to using `watch_secs` to
+monitor a task before considering it healthy.
+
 You can pause health checking by touching a file inside of your sandbox, named `.healthchecksnooze`.
 As long as that file is present, health checks will be disabled, enabling users to gather core
 dumps or other performance measurements without worrying about Aurora's health check killing

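The two formulas added to services.md above can be checked with a short calculation. The Python sketch below only evaluates `m` and `n` as the text defines them; `health_check_attempts` is a hypothetical helper, and the input values are illustrative rather than taken from any particular job configuration.

```python
import math


def health_check_attempts(initial_interval_secs, interval_secs,
                          max_consecutive_failures, min_consecutive_successes):
    """Return (m, n): the grace-period attempts and the total attempt budget."""
    # m = ceil(initial_interval_secs / interval_secs)
    m = int(math.ceil(float(initial_interval_secs) / interval_secs))
    # n = m + max_consecutive_failures + min_consecutive_successes
    n = m + max_consecutive_failures + min_consecutive_successes
    return m, n


# Illustrative values: a 15 second grace period checked every 10 seconds,
# no tolerated failures, one success required.
print(health_check_attempts(15, 10, 0, 1))  # (2, 3)
```

With these inputs, failures during the first 2 attempts are ignored, and the task must reach its required success count within the first 3 attempts or be moved to `FAILED`.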
Modified: aurora/site/source/documentation/latest/features/sla-metrics.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/sla-metrics.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/sla-metrics.md (original)
+++ aurora/site/source/documentation/latest/features/sla-metrics.md Tue Feb 21 20:54:58 2017
@@ -63,7 +63,7 @@ relevant to uptime calculations. By appl
 transition records, we can build a deterministic downtime trace for every given service instance.
 
 A task going through a state transition carries one of three possible SLA meanings
-(see [SlaAlgorithm.java](https://github.com/apache/aurora/blob/rel/0.16.0/src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for
+(see [SlaAlgorithm.java](https://github.com/apache/aurora/blob/rel/0.17.0/src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for
 sla-to-task-state mapping):
 
 * Task is UP: starts a period where the task is considered to be up and running from the Aurora
@@ -110,7 +110,7 @@ metric that helps track the dependency o
 * Per job - `sla_<job_key>_mtta_ms`
 * Per cluster - `sla_cluster_mtta_ms`
 * Per instance size (small, medium, large, x-large, xx-large). Size are defined in:
-[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.16.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
+[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.17.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
   * By CPU:
     * `sla_cpu_small_mtta_ms`
     * `sla_cpu_medium_mtta_ms`
@@ -147,7 +147,7 @@ for a task.*
 * Per job - `sla_<job_key>_mtts_ms`
 * Per cluster - `sla_cluster_mtts_ms`
 * Per instance size (small, medium, large, x-large, xx-large). Size are defined in:
-[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.16.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
+[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.17.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
   * By CPU:
     * `sla_cpu_small_mtts_ms`
     * `sla_cpu_medium_mtts_ms`
@@ -182,7 +182,7 @@ reflecting on the overall time it takes
 * Per job - `sla_<job_key>_mttr_ms`
 * Per cluster - `sla_cluster_mttr_ms`
 * Per instance size (small, medium, large, x-large, xx-large). Size are defined in:
-[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.16.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
+[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.17.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
   * By CPU:
     * `sla_cpu_small_mttr_ms`
     * `sla_cpu_medium_mttr_ms`

Modified: aurora/site/source/documentation/latest/operations/configuration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/operations/configuration.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/operations/configuration.md (original)
+++ aurora/site/source/documentation/latest/operations/configuration.md Tue Feb 21 20:54:58 2017
@@ -70,7 +70,7 @@ for Mesos replicated log files to ensure
 ### `-native_log_zk_group_path`
 ZooKeeper path used for Mesos replicated log quorum discovery.
 
-See [code](https://github.com/apache/aurora/blob/rel/0.16.0/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java) for
+See [code](https://github.com/apache/aurora/blob/rel/0.17.0/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java) for
 other available Mesos replicated log configuration options and default values.
 
 ### Changing the Quorum Size
@@ -131,7 +131,7 @@ the latter needs to be enabled via:
 
     -enable_revocable_ram=true
 
-Unless you want to use the [default](https://github.com/apache/aurora/blob/rel/0.16.0/src/main/resources/org/apache/aurora/scheduler/tiers.json)
+Unless you want to use the [default](https://github.com/apache/aurora/blob/rel/0.17.0/src/main/resources/org/apache/aurora/scheduler/tiers.json)
 tier configuration, you will also have to specify a file path:
 
     -tier_config=path/to/tiers/config.json

Modified: aurora/site/source/documentation/latest/operations/installation.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/operations/installation.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/operations/installation.md (original)
+++ aurora/site/source/documentation/latest/operations/installation.md Tue Feb 21 20:54:58 2017
@@ -61,8 +61,8 @@ Any machines that users submit jobs from
 
         sudo update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
 
-        wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-scheduler_0.15.0_amd64.deb
-        sudo dpkg -i aurora-scheduler_0.15.0_amd64.deb
+        wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-scheduler_0.17.0_amd64.deb
+        sudo dpkg -i aurora-scheduler_0.17.0_amd64.deb
 
 ### CentOS 7
 
@@ -83,8 +83,8 @@ Any machines that users submit jobs from
 
         sudo yum install -y wget
 
-        wget -c https://apache.bintray.com/aurora/centos-7/aurora-scheduler-0.15.0-1.el7.centos.aurora.x86_64.rpm
-        sudo yum install -y aurora-scheduler-0.15.0-1.el7.centos.aurora.x86_64.rpm
+        wget -c https://apache.bintray.com/aurora/centos-7/aurora-scheduler-0.17.0-1.el7.centos.aurora.x86_64.rpm
+        sudo yum install -y aurora-scheduler-0.17.0-1.el7.centos.aurora.x86_64.rpm
 
 ### Finalizing
 By default, the scheduler will start in an uninitialized mode.  This is because external
@@ -123,8 +123,8 @@ CentOS: `sudo systemctl start aurora`
         # for the python mesos native bindings.
         sudo apt-get -y install libcurl4-nss-dev
 
-        wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-executor_0.15.0_amd64.deb
-        sudo dpkg -i aurora-executor_0.15.0_amd64.deb
+        wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-executor_0.17.0_amd64.deb
+        sudo dpkg -i aurora-executor_0.17.0_amd64.deb
 
 ### CentOS 7
 
@@ -137,8 +137,8 @@ CentOS: `sudo systemctl start aurora`
 
         sudo yum install -y python2 wget
 
-        wget -c https://apache.bintray.com/aurora/centos-7/aurora-executor-0.15.0-1.el7.centos.aurora.x86_64.rpm
-        sudo yum install -y aurora-executor-0.15.0-1.el7.centos.aurora.x86_64.rpm
+        wget -c https://apache.bintray.com/aurora/centos-7/aurora-executor-0.17.0-1.el7.centos.aurora.x86_64.rpm
+        sudo yum install -y aurora-executor-0.17.0-1.el7.centos.aurora.x86_64.rpm
 
 ### Configuration
 The executor typically does not require configuration.  Command line arguments can
@@ -199,15 +199,15 @@ Make an edit to add the `--mesos-root` f
 
     sudo apt-get install -y python2.7 wget
 
-    wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-tools_0.15.0_amd64.deb
-    sudo dpkg -i aurora-tools_0.15.0_amd64.deb
+    wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-tools_0.17.0_amd64.deb
+    sudo dpkg -i aurora-tools_0.17.0_amd64.deb
 
 ### CentOS 7
 
     sudo yum install -y python2 wget
 
-    wget -c https://apache.bintray.com/aurora/centos-7/aurora-tools-0.15.0-1.el7.centos.aurora.x86_64.rpm
-    sudo yum install -y aurora-tools-0.15.0-1.el7.centos.aurora.x86_64.rpm
+    wget -c https://apache.bintray.com/aurora/centos-7/aurora-tools-0.17.0-1.el7.centos.aurora.x86_64.rpm
+    sudo yum install -y aurora-tools-0.17.0-1.el7.centos.aurora.x86_64.rpm
 
 ### Mac OS X
 
@@ -239,12 +239,12 @@ are identical for both.
     sudo apt-get -y update
 
     # Use `apt-cache showpkg mesos | grep [version]` to find the exact version.
-    sudo apt-get -y install mesos=0.28.2-2.0.27.ubuntu1404_amd64
+    sudo apt-get -y install mesos=1.1.0-2.0.107.ubuntu1404_amd64.deb
 
 ### Mesos on CentOS 7
 
     sudo rpm -Uvh https://repos.mesosphere.io/el/7/noarch/RPMS/mesosphere-el-repo-7-1.noarch.rpm
-    sudo yum -y install mesos-0.28.2
+    sudo yum -y install mesos-1.1.0
 
 
 

Modified: aurora/site/source/documentation/latest/operations/security.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/operations/security.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/operations/security.md (original)
+++ aurora/site/source/documentation/latest/operations/security.md Tue Feb 21 20:54:58 2017
@@ -21,10 +21,11 @@ controls for talking to ZooKeeper.
 		- [Caveats](#caveats)
 - [Implementing a Custom Realm](#implementing-a-custom-realm)
 	- [Packaging a realm module](#packaging-a-realm-module)
-- [Known Issues](#known-issues)
 - [Announcer Authentication](#announcer-authentication)
     - [ZooKeeper authentication configuration](#zookeeper-authentication-configuration)
     - [Executor settings](#executor-settings)
+- [Scheduler HTTPS](#scheduler-https)
+- [Known Issues](#known-issues)
 
 # Enabling Security
 
@@ -275,18 +276,6 @@ class name:
 -shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ,com.example.MyRealmModule
 ```
 
-# Known Issues
-
-While the APIs and SPIs we ship with are stable as of 0.8.0, we are aware of several incremental
-improvements. Please follow, vote, or send patches.
-
-Relevant tickets:
-* [AURORA-343](https://issues.apache.org/jira/browse/AURORA-343): HTTPS support
-* [AURORA-1248](https://issues.apache.org/jira/browse/AURORA-1248): Client retries 4xx errors
-* [AURORA-1279](https://issues.apache.org/jira/browse/AURORA-1279): Remove kerberos-specific build targets
-* [AURORA-1293](https://issues.apache.org/jira/browse/AURORA-1291): Consider defining a JSON format in place of INI
-* [AURORA-1179](https://issues.apache.org/jira/browse/AURORA-1179): Supported hashed passwords in security.ini
-* [AURORA-1295](https://issues.apache.org/jira/browse/AURORA-1295): Support security for the ReadOnlyScheduler service
 
 # Announcer Authentication
 The Thermos executor can be configured to authenticate with ZooKeeper and include
@@ -337,4 +326,37 @@ All properties of the `permissions` obje
 
 ## Executor settings
 To enable the executor to authenticate against ZK, `--announcer-zookeeper-auth-config` should be
-set to the configuration file.
\ No newline at end of file
+set to the configuration file.
+
+
+# Scheduler HTTPS
+
+The Aurora scheduler does not provide native HTTPS support ([AURORA-343](https://issues.apache.org/jira/browse/AURORA-343)).
+It is therefore recommended to deploy it behind an HTTPS-capable reverse proxy such as nginx or Apache2.
+
+A simple setup is to launch both the reverse proxy and the Aurora scheduler on the same port, but
+bind the reverse proxy to the public IP of the host and the scheduler to localhost:
+
+    -ip=127.0.0.1
+    -http_port=8081
+
+If your clients connect to the scheduler via [`proxy_url`](../../reference/scheduler-configuration/),
+you can update it to `https`. If you use ZooKeeper-based discovery instead, the scheduler
+needs to be launched via
+
+    -serverset_endpoint_name=https
+
+in order to announce its HTTPS support within ZooKeeper.
+
+
+# Known Issues
+
+While the APIs and SPIs we ship with are stable as of 0.8.0, we are aware of several incremental
+improvements. Please follow, vote, or send patches.
+
+Relevant tickets:
+* [AURORA-1248](https://issues.apache.org/jira/browse/AURORA-1248): Client retries 4xx errors
+* [AURORA-1279](https://issues.apache.org/jira/browse/AURORA-1279): Remove kerberos-specific build targets
+* [AURORA-1293](https://issues.apache.org/jira/browse/AURORA-1291): Consider defining a JSON format in place of INI
+* [AURORA-1179](https://issues.apache.org/jira/browse/AURORA-1179): Support hashed passwords in security.ini
+* [AURORA-1295](https://issues.apache.org/jira/browse/AURORA-1295): Support security for the ReadOnlyScheduler service

Modified: aurora/site/source/documentation/latest/reference/client-cluster-configuration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/client-cluster-configuration.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/reference/client-cluster-configuration.md (original)
+++ aurora/site/source/documentation/latest/reference/client-cluster-configuration.md Tue Feb 21 20:54:58 2017
@@ -35,6 +35,7 @@ The following properties may be set:
    **scheduler_uri**       | String   | URI of Aurora scheduler instance.
    **proxy_url**           | String   | Used by the client to format URLs for display.
    **auth_mechanism**      | String   | The authentication mechanism to use when communicating with the scheduler. (Default: UNAUTHENTICATED)
+   **docker_registry**     | String   | Used by the client to resolve docker tags.
 
 
 ## Details
@@ -91,3 +92,8 @@ URL of your VIP in a loadbalancer or a r
 The identifier of an authentication mechanism that the client should use when communicating with the
 scheduler. Support for values other than `UNAUTHENTICATED` requires a matching scheduler-side
 [security configuration](../../operations/security/).
+
+### `docker_registry`
+
+The URI of the Docker Registry that will be used by the Aurora client to resolve docker tags to concrete
+image ids, when using the docker binding helper, like `{{docker.image[name][tag]}}`.

Modified: aurora/site/source/documentation/latest/reference/configuration-tutorial.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/configuration-tutorial.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/reference/configuration-tutorial.md (original)
+++ aurora/site/source/documentation/latest/reference/configuration-tutorial.md Tue Feb 21 20:54:58 2017
@@ -243,6 +243,26 @@ The template for this Process is:
 
 Note: Be sure the extracted code archive has an executable.
 
+## Getting Environment Variables Into The Sandbox
+
+Every time a process is forked, the Thermos executor checks whether the
+`.thermos_profile` file exists. If it does, the file is sourced.
+You can use this mechanism to pass environment variables to the sandbox.
+
+An example for this Process is:
+
+    setup_env = Process(
+      name = 'setup',
+      cmdline = '''cat <<EOF > .thermos_profile
+                   export RESULT=hello
+                   EOF'''
+    )
+
+    read_env = Process(
+      name = 'read',
+      cmdline = 'echo $RESULT'
+    )
+
 ## Defining Task Objects
 
 Tasks are handled by Mesos. A task is a collection of processes that
@@ -508,4 +528,4 @@ Then issue the following commands to cre
 
     aurora job create cluster1/$USER/test/hello_world-cluster1 hello_world_productionized.aurora
 
-    aurora job kill cluster1/$USER/test/hello_world-cluster1
+    aurora job kill cluster1/$USER/test/hello_world-cluster1
\ No newline at end of file
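
The `.thermos_profile` behaviour described in the configuration-tutorial.md change above can be illustrated with a small stand-alone simulation. This is only a sketch, not Thermos code: `PROFILE_PRELUDE`, `run_in_sandbox` and `demo` are hypothetical names, a temporary directory stands in for the sandbox, and it assumes a POSIX `sh` is available.

```python
import os
import subprocess
import tempfile

# Hypothetical stand-in for the documented behaviour: before a process runs,
# source `.thermos_profile` from the working directory if the file exists.
PROFILE_PRELUDE = "if [ -f .thermos_profile ]; then . ./.thermos_profile; fi; "


def run_in_sandbox(sandbox, cmdline):
    """Run cmdline in the sandbox directory, sourcing the profile first."""
    # Drop any RESULT inherited from the caller so the demo is deterministic.
    env = {k: v for k, v in os.environ.items() if k != "RESULT"}
    result = subprocess.run(
        ["sh", "-c", PROFILE_PRELUDE + cmdline],
        cwd=sandbox, env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout


def demo():
    """Show that a variable written by one process is visible to the next."""
    with tempfile.TemporaryDirectory() as sandbox:
        before = run_in_sandbox(sandbox, 'echo "RESULT=$RESULT"')
        run_in_sandbox(sandbox, "echo 'export RESULT=hello' > .thermos_profile")
        after = run_in_sandbox(sandbox, 'echo "RESULT=$RESULT"')
    return before, after
```

In this simulation the first process sees an empty `RESULT`, while a process started after the profile has been written sees `hello`. That is also why the `setup` process has to run before the `read` process.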

Modified: aurora/site/source/documentation/latest/reference/configuration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/configuration.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/reference/configuration.md (original)
+++ aurora/site/source/documentation/latest/reference/configuration.md Tue Feb 21 20:54:58 2017
@@ -379,9 +379,10 @@ Parameters for controlling a task's heal
 | param                          | type      | description
 | -------                        | :-------: | --------
 | ```health_checker```           | HealthCheckerConfig | Configure what kind of health check to use.
-| ```initial_interval_secs```    | Integer   | Initial delay for performing a health check. (Default: 15)
+| ```initial_interval_secs```    | Integer   | Initial grace period (during which health-check failures are ignored) while performing health checks. (Default: 15)
 | ```interval_secs```            | Integer   | Interval on which to check the task's health. (Default: 10)
 | ```max_consecutive_failures``` | Integer   | Maximum number of consecutive failures that will be tolerated before considering a task unhealthy (Default: 0)
+| ```min_consecutive_successes``` | Integer   | Minimum number of consecutive successful health checks required before considering a task healthy (Default: 1)
 | ```timeout_secs```             | Integer   | Health check timeout. (Default: 1)
 
 ### HealthCheckerConfig Objects
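
The interplay of the health-check parameters changed in the configuration.md hunk above can be sketched as a small state function. This is an illustrative model built only from the table descriptions, not Aurora's actual health checker. The function name `health_state`, the state labels, and the choice that a failure during the grace period resets the success streak are all assumptions.

```python
def health_state(checks, initial_interval_secs=15,
                 max_consecutive_failures=0, min_consecutive_successes=1):
    """Return 'starting', 'healthy' or 'unhealthy' for a series of checks.

    checks is a list of (elapsed_secs, ok) tuples: the time since task start
    at which a check ran, and whether that check passed.
    """
    state = "starting"
    failures = 0
    successes = 0
    for elapsed_secs, ok in checks:
        if ok:
            successes += 1
            failures = 0
            # Healthy only after enough consecutive successes.
            if successes >= min_consecutive_successes:
                state = "healthy"
        else:
            successes = 0  # assumption: any failure breaks the success streak
            if elapsed_secs < initial_interval_secs:
                continue  # grace period: the failure is ignored
            failures += 1
            # Unhealthy once more failures occur than are tolerated.
            if failures > max_consecutive_failures:
                return "unhealthy"
    return state
```

With the defaults from the table, `health_state([(5, False), (10, False), (20, True)])` yields `'healthy'`, because both failures fall inside the 15-second grace period. A single failure after the grace period yields `'unhealthy'`, since `max_consecutive_failures` defaults to 0.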

Modified: aurora/site/source/documentation/latest/reference/scheduler-configuration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/scheduler-configuration.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/reference/scheduler-configuration.md (original)
+++ aurora/site/source/documentation/latest/reference/scheduler-configuration.md Tue Feb 21 20:54:58 2017
@@ -42,6 +42,8 @@ Required flags:
 	Endpoint specification for the ZooKeeper servers.
 
 Optional flags:
+-allow_container_volumes (default false)
+	Allow passing in volumes in the job. Enabling this could pose a privilege escalation threat.
 -allow_docker_parameters (default false)
 	Allow to pass docker container parameters in the job.
 -allow_gpu_resource (default false)
@@ -56,9 +58,11 @@ Optional flags:
 	The number of worker threads to process async task operations with.
 -backup_interval (default (1, hrs))
 	Minimum interval on which to write a storage backup.
--cron_scheduler_num_threads (default 100)
+-cron_scheduler_num_threads (default 10)
 	Number of threads to use for the cron scheduler thread pool.
--cron_start_initial_backoff (default (1, secs))
+-cron_scheduling_max_batch_size (default 10) [must be > 0]
+	The maximum number of triggered cron jobs that can be processed in a batch.
+-cron_start_initial_backoff (default (5, secs))
 	Initial backoff delay while waiting for a previous cron run to be killed.
 -cron_start_max_backoff (default (1, mins))
 	Max backoff delay while waiting for a previous cron run to be killed.
@@ -80,6 +84,8 @@ Optional flags:
 	Specifies the frequency at which snapshots of local storage are taken and written to the log.
 -enable_cors_for
 	List of domains for which CORS support should be enabled.
+-enable_db_metrics (default true)
+	Whether to use MyBatis interceptor to measure the timing of intercepted Statements.
 -enable_h2_console (default false)
 	Enable H2 DB management console.
 -enable_mesos_fetcher (default false)
@@ -100,8 +106,8 @@ Optional flags:
 	When 'framework_authentication_file' flag is set, the FrameworkInfo registered with the mesos master will also contain the principal. This is necessary if you intend to use mesos authorization via mesos ACLs. The default will change in a future release. Changing this value is backwards incompatible. For details, see MESOS-703.
 -framework_failover_timeout (default (21, days))
 	Time after which a framework is considered deleted.  SHOULD BE VERY HIGH.
--framework_name (default TwitterScheduler)
-	Name used to register the Aurora framework with Mesos. Changing this value can be backwards incompatible. For details, see MESOS-703.
+-framework_name (default Aurora)
+	Name used to register the Aurora framework with Mesos.
 -global_container_mounts (default [])
 	A comma separated list of mount points (in host:container form) to mount into all (non-mesos) containers.
 -history_max_per_job_threshold (default 100)
@@ -150,8 +156,12 @@ Optional flags:
 	Maximum delay between attempts to schedule a PENDING tasks.
 -max_status_update_batch_size (default 1000) [must be > 0]
 	The maximum number of status updates that can be processed in a batch.
+-max_task_event_batch_size (default 300) [must be > 0]
+	The maximum number of task state change events that can be processed in a batch.
 -max_tasks_per_job (default 4000) [must be > 0]
 	Maximum number of allowed tasks in a single job.
+-max_tasks_per_schedule_attempt (default 5) [must be > 0]
+	The maximum number of tasks to pick in a single scheduling attempt.
 -max_update_instance_failures (default 20000) [must be > 0]
 	Upper limit on the number of failures allowed during a job update. This helps cap potentially unbounded entries into storage.
 -min_offer_hold_time (default (5, mins))
@@ -200,9 +210,13 @@ Optional flags:
 	Difference between explicit and implicit reconciliation intervals intended to create a non-overlapping task reconciliation schedule.
 -require_docker_use_executor (default true)
 	If false, Docker tasks may run without an executor (EXPERIMENTAL)
+-scheduling_max_batch_size (default 3) [must be > 0]
+	The maximum number of scheduling attempts that can be processed in a batch.
+-serverset_endpoint_name (default http)
+	Name of the scheduler endpoint published in ZooKeeper.
 -shiro_ini_path
 	Path to shiro.ini for authentication and authorization configuration.
--shiro_realm_modules (default [org.apache.aurora.scheduler.app.MoreModules$1@158a8276])
+-shiro_realm_modules (default [class org.apache.aurora.scheduler.http.api.security.IniShiroRealmModule])
 	Guice modules for configuring Shiro Realms.
 -sla_non_prod_metrics (default [])
 	Metric categories collected for non production tasks.
@@ -214,6 +228,8 @@ Optional flags:
 	Log all queries that take at least this long to execute.
 -slow_query_log_threshold (default (25, ms))
 	Log all queries that take at least this long to execute.
+-snapshot_hydrate_stores (default [locks, hosts, quota, job_updates])
+	Which H2-backed stores to fully hydrate on the Snapshot.
 -stat_retention_period (default (1, hrs))
 	Time for a stat to be retained in memory before expiring.
 -stat_sampling_interval (default (1, secs))

Modified: aurora/site/source/documentation/latest/reference/task-lifecycle.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/task-lifecycle.md?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/reference/task-lifecycle.md (original)
+++ aurora/site/source/documentation/latest/reference/task-lifecycle.md Tue Feb 21 20:54:58 2017
@@ -35,7 +35,9 @@ the `Task` goes into `STARTING` state.
 `STARTING` state initializes a `Task` sandbox. When the sandbox is fully
 initialized, Thermos begins to invoke `Process`es. Also, the agent
 machine sends an update to the scheduler that the `Task` is
-in `RUNNING` state.
+in `RUNNING` state, but only after the task satisfies the liveness requirements.
+See [Health Checking](../features/services#health-checking) for details on
+how to configure health checks.
 
 
 

Modified: aurora/site/source/downloads.html.md.erb
URL: http://svn.apache.org/viewvc/aurora/site/source/downloads.html.md.erb?rev=1783940&r1=1783939&r2=1783940&view=diff
==============================================================================
--- aurora/site/source/downloads.html.md.erb (original)
+++ aurora/site/source/downloads.html.md.erb Tue Feb 21 20:54:58 2017
@@ -9,6 +9,8 @@ If you're new, consider starting with th
   Signature files are not propagated to mirrors, so we link directly to those.
 -->
 ## Binary distributions
+Ubuntu 16.04 [ ![Download](https://api.bintray.com/packages/apache/aurora/ubuntu-xenial/images/download.svg) ](https://bintray.com/apache/aurora/ubuntu-xenial/_latestVersion)
+
 Ubuntu 14.04 [ ![Download](https://api.bintray.com/packages/apache/aurora/debian-ubuntu-trusty/images/download.svg) ](https://bintray.com/apache/aurora/debian-ubuntu-trusty/_latestVersion)
 
 Debian Jessie [ ![Download](https://api.bintray.com/packages/apache/aurora/debian-jessie/images/download.svg) ](https://bintray.com/apache/aurora/debian-jessie/_latestVersion)