Posted to commits@aurora.apache.org by se...@apache.org on 2016/03/28 22:55:42 UTC

[1/7] aurora git commit: Reorganize Documentation

Repository: aurora
Updated Branches:
  refs/heads/master 095009596 -> f28f41a70


http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/security.md
----------------------------------------------------------------------
diff --git a/docs/security.md b/docs/security.md
deleted file mode 100644
index 32bea42..0000000
--- a/docs/security.md
+++ /dev/null
@@ -1,279 +0,0 @@
-Aurora integrates with [Apache Shiro](http://shiro.apache.org/) to provide security
-controls for its API. In addition to providing some useful features out of the box, Shiro
-also allows Aurora cluster administrators to adapt the security system to their organization’s
-existing infrastructure.
-
-- [Enabling Security](#enabling-security)
-- [Authentication](#authentication)
-	- [HTTP Basic Authentication](#http-basic-authentication)
-		- [Server Configuration](#server-configuration)
-		- [Client Configuration](#client-configuration)
-	- [HTTP SPNEGO Authentication (Kerberos)](#http-spnego-authentication-kerberos)
-		- [Server Configuration](#server-configuration-1)
-		- [Client Configuration](#client-configuration-1)
-- [Authorization](#authorization)
-	- [Using an INI file to define security controls](#using-an-ini-file-to-define-security-controls)
-		- [Caveats](#caveats)
-- [Implementing a Custom Realm](#implementing-a-custom-realm)
-	- [Packaging a realm module](#packaging-a-realm-module)
-- [Known Issues](#known-issues)
-
-# Enabling Security
-
-There are two major components of security:
-[authentication and authorization](http://en.wikipedia.org/wiki/Authentication#Authorization).  A
-cluster administrator may choose the approach used for each, and may also implement custom
-mechanisms for either.  Later sections describe the options available.
-
-# Authentication
-
-The scheduler must be configured with instructions for how to process authentication
-credentials at a minimum.  There are currently two built-in authentication schemes -
-[HTTP Basic Authentication](http://en.wikipedia.org/wiki/Basic_access_authentication), and
-[SPNEGO](http://en.wikipedia.org/wiki/SPNEGO) (Kerberos).
-
-## HTTP Basic Authentication
-
-Basic Authentication is a very quick way to add *some* security.  It is supported
-by all major browsers and HTTP client libraries with minimal work.  However,
-before relying on Basic Authentication you should be aware of the [security
-considerations](http://tools.ietf.org/html/rfc2617#section-4).
-
-### Server Configuration
-
-At a minimum you need to set three command-line flags on the scheduler:
-
-```
--http_authentication_mechanism=BASIC
--shiro_realm_modules=INI_AUTHNZ
--shiro_ini_path=path/to/security.ini
-```
-
-And create a security.ini file like so:
-
-```
-[users]
-sally = apple, admin
-
-[roles]
-admin = *
-```
-
-The details of the security.ini file are explained below. Note that this file contains plaintext,
-unhashed passwords.
-
-### Client Configuration
-
-To configure the client for HTTP Basic authentication, add an entry to ~/.netrc with your credentials
-
-```
-% cat ~/.netrc
-# ...
-
-machine aurora.example.com
-login sally
-password apple
-
-# ...
-```
-
-No changes are required to `clusters.json`.
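-
-To quickly verify that the credentials are accepted, you can (for example) query the scheduler's `/vars`
-endpoint with the same credentials. This is only a sketch; the host and port are assumptions about your
-deployment:
-
-```
-% curl -u sally:apple http://aurora.example.com:8081/vars | head
-```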
-
-## HTTP SPNEGO Authentication (Kerberos)
-
-### Server Configuration
-At a minimum you need to set five command-line flags on the scheduler:
-
-```
--http_authentication_mechanism=NEGOTIATE
--shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ
--kerberos_server_principal=HTTP/aurora.example.com@EXAMPLE.COM
--kerberos_server_keytab=path/to/aurora.example.com.keytab
--shiro_ini_path=path/to/security.ini
-```
-
-And create a security.ini file like so:
-
-```
-% cat path/to/security.ini
-[users]
-sally = _, admin
-
-[roles]
-admin = *
-```
-
-What's going on here? First, Aurora must be configured to request Kerberos credentials when presented with an
-unauthenticated request. This is achieved by setting
-
-```
--http_authentication_mechanism=NEGOTIATE
-```
-
-Next, a Realm module must be configured to **authenticate** the current request using the Kerberos
-credentials that were requested. Aurora ships with a realm module that can do this
-
-```
--shiro_realm_modules=KERBEROS5_AUTHN[,...]
-```
-
-The Kerberos5Realm requires a keytab file and a server principal name. The principal name will usually
-be in the form `HTTP/aurora.example.com@EXAMPLE.COM`.
-
-```
--kerberos_server_principal=HTTP/aurora.example.com@EXAMPLE.COM
--kerberos_server_keytab=path/to/aurora.example.com.keytab
-```
-
-The Kerberos5 realm module is authentication-only. For scheduler security to work you must also
-enable a realm module that provides an Authorizer implementation. For example, to do this using the
-IniShiroRealmModule:
-
-```
--shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ
-```
-
-You can then configure authorization using a security.ini file as described below
-(the password field is ignored). You must configure the realm module with the path to this file:
-
-```
--shiro_ini_path=path/to/security.ini
-```
-
-### Client Configuration
-To use Kerberos on the client-side you must build Kerberos-enabled client binaries. Do this with
-
-```
-./pants binary src/main/python/apache/aurora/kerberos:kaurora
-./pants binary src/main/python/apache/aurora/kerberos:kaurora_admin
-```
-
-You must also configure each cluster where you've enabled Kerberos on the scheduler
-to use Kerberos authentication. Do this by setting `auth_mechanism` to `KERBEROS`
-in `clusters.json`.
-
-```
-% cat ~/.aurora/clusters.json
-{
-    "devcluser": {
-        "auth_mechanism": "KERBEROS",
-        ...
-    },
-    ...
-}
-```
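-
-Once the Kerberos-enabled client is built and `clusters.json` is updated, you also need a valid Kerberos
-ticket before invoking the client. A minimal sketch (the principal and the `dist/` output path are
-assumptions about your environment and pants setup):
-
-```
-% kinit sally@EXAMPLE.COM
-% ./dist/kaurora.pex job list devcluster/www-data
-```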
-
-# Authorization
-Given a means to authenticate the entity a client claims they are, we need to define what privileges they have.
-
-## Using an INI file to define security controls
-
-The simplest security configuration for Aurora is an INI file on the scheduler.  For small
-clusters, or clusters where the users and access controls change relatively infrequently, this is
-likely the preferred approach.  However you may want to avoid this approach if access permissions
-are rapidly changing, or if your access control information already exists in another system.
-
-You can enable INI-based configuration with following scheduler command line arguments:
-
-```
--http_authentication_mechanism=BASIC
--shiro_ini_path=path/to/security.ini
-```
-
-*Note:* As the argument name reveals, this is using Shiro’s
-[IniRealm](http://shiro.apache.org/configuration.html#Configuration-INIConfiguration) behind
-the scenes.
-
-The INI file will contain two sections - users and roles.  Here’s an example for what might
-be in security.ini:
-
-```
-[users]
-sally = apple, admin
-jim = 123456, accounting
-becky = letmein, webapp
-larry = 654321,accounting
-steve = password
-
-[roles]
-admin = *
-accounting = thrift.AuroraAdmin:setQuota
-webapp = thrift.AuroraSchedulerManager:*:webapp
-```
-
-The users section defines user credentials and the role(s) they are members of.  These lines
-are of the format `<user> = <password>[, <role>...]`.  As you probably noticed, the passwords are
-in plaintext and, as a result, read access to this file should be restricted.
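-
-For example, on the scheduler host you might restrict the file to the account running the scheduler
-(the `aurora` user name here is an assumption about your deployment):
-
-```
-% chown aurora:aurora path/to/security.ini
-% chmod 600 path/to/security.ini
-```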
-
-In this configuration, each user has different privileges for actions in the cluster because
-of the roles they are a part of:
-
-* admin is granted all privileges
-* accounting may adjust the amount of resource quota for any role
-* webapp represents a collection of jobs that make up a service, and its members may create and modify any jobs owned by that role
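-
-The permission strings appear to follow a `thrift.<Service>:<method>[:<role>]` pattern, as the examples
-above suggest. As a hedged illustration (the `deployers` role is hypothetical and `killTasks` is used
-only as an example method name), a role limited to killing `webapp`-owned jobs might look like:
-
-```
-[roles]
-deployers = thrift.AuroraSchedulerManager:killTasks:webapp
-```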
-
-### Caveats
-You might find documentation on the Internet suggesting there are additional sections in `shiro.ini`,
-like `[main]` and `[urls]`. These are not supported by Aurora as it uses a different mechanism to configure
-those parts of Shiro. Think of Aurora's `security.ini` as a subset with only `[users]` and `[roles]` sections.
-
-## Implementing Delegated Authorization
-
-It is possible to leverage Shiro's `runAs` feature by implementing a custom Servlet Filter that provides
-the capability and passing its fully qualified class name to the command line argument
-`-shiro_after_auth_filter`. The filter is registered in the same filter chain as the Shiro auth filters
-and is placed after them, which ensures that it is invoked only after the Shiro filters have had a chance
-to authenticate the request.
-
-# Implementing a Custom Realm
-
-Since Aurora’s security is backed by [Apache Shiro](https://shiro.apache.org), you can implement a
-custom [Realm](http://shiro.apache.org/realm.html) to define organization-specific security behavior.
-
-In addition to using Shiro's standard APIs to implement a Realm you can link against Aurora to
-access the type-safe Permissions Aurora uses. See the Javadoc for `org.apache.aurora.scheduler.spi`
-for more information.
-
-## Packaging a realm module
-Package your custom Realm(s) with a Guice module that exposes a `Set<Realm>` multibinding.
-
-```java
-package com.example;
-
-import com.google.inject.AbstractModule;
-import com.google.inject.multibindings.Multibinder;
-import org.apache.shiro.realm.Realm;
-
-public class MyRealmModule extends AbstractModule {
-  @Override
-  public void configure() {
-    Realm myRealm = new MyRealm();
-
-    Multibinder.newSetBinder(binder(), Realm.class).addBinding().toInstance(myRealm);
-  }
-
-  static class MyRealm implements Realm {
-    // Realm implementation.
-  }
-}
-```
-
-To use your module in the scheduler, include it as a realm module based on its fully-qualified
-class name:
-
-```
--shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ,com.example.MyRealmModule
-```
-
-# Known Issues
-
-While the APIs and SPIs we ship with are stable as of 0.8.0, we are aware of several possible incremental
-improvements. Please follow the tickets below, vote, or send patches.
-
-Relevant tickets:
-* [AURORA-343](https://issues.apache.org/jira/browse/AURORA-343): HTTPS support
-* [AURORA-1248](https://issues.apache.org/jira/browse/AURORA-1248): Client retries 4xx errors
-* [AURORA-1279](https://issues.apache.org/jira/browse/AURORA-1279): Remove kerberos-specific build targets
-* [AURORA-1293](https://issues.apache.org/jira/browse/AURORA-1291): Consider defining a JSON format in place of INI
-* [AURORA-1179](https://issues.apache.org/jira/browse/AURORA-1179): Support hashed passwords in security.ini
-* [AURORA-1295](https://issues.apache.org/jira/browse/AURORA-1295): Support security for the ReadOnlyScheduler service

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/sla.md
----------------------------------------------------------------------
diff --git a/docs/sla.md b/docs/sla.md
deleted file mode 100644
index a558e00..0000000
--- a/docs/sla.md
+++ /dev/null
@@ -1,177 +0,0 @@
-Aurora SLA Measurement
---------------
-
-- [Overview](#overview)
-- [Metric Details](#metric-details)
-  - [Platform Uptime](#platform-uptime)
-  - [Job Uptime](#job-uptime)
-  - [Median Time To Assigned (MTTA)](#median-time-to-assigned-\(mtta\))
-  - [Median Time To Running (MTTR)](#median-time-to-running-\(mttr\))
-- [Limitations](#limitations)
-
-## Overview
-
-The primary goal of the feature is the collection and monitoring of Aurora job SLA (Service Level
-Agreement) metrics that define a contractual relationship between the Aurora/Mesos platform
-and hosted services.
-
-The Aurora SLA feature is by default only enabled for service (non-cron)
-production jobs (`"production = True"` in your `.aurora` config). It can be enabled for
-non-production services via the scheduler command line flag `-sla_non_prod_metrics`.
-
-Counters that track SLA measurements are computed periodically within the scheduler.
-The individual instance metrics are refreshed every minute (configurable via
-`sla_stat_refresh_interval`). The instance counters are subsequently aggregated by
-relevant grouping types before being exported to the scheduler `/vars` endpoint (when using `vagrant`
-that would be `http://192.168.33.7:8081/vars`).
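-
-For example, in the Vagrant environment you can list the exported SLA counters with a command like
-the following (a sketch; adjust the host and port to match your scheduler):
-
-```
-% curl -s http://192.168.33.7:8081/vars | grep sla_
-```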
-
-## Metric Details
-
-### Platform Uptime
-
-*Aggregate amount of time a job spends in a non-runnable state due to platform unavailability
-or scheduling delays. This metric tracks Aurora/Mesos uptime performance and reflects on any
-system-caused downtime events (tasks LOST or DRAINED). Any user-initiated task kills/restarts
-will not degrade this metric.*
-
-**Collection scope:**
-
-* Per job - `sla_<job_key>_platform_uptime_percent`
-* Per cluster - `sla_cluster_platform_uptime_percent`
-
-**Units:** percent
-
-A fault in the task environment may cause Aurora and Mesos to have different views of the task state
-or to lose track of the task's existence. In such cases, the service task is marked as LOST and
-rescheduled by Aurora. For example, this may happen when the task stays in ASSIGNED or STARTING
-for too long or the Mesos slave becomes unhealthy (or disappears completely). The time between the
-task entering LOST and its replacement reaching the RUNNING state is counted towards platform downtime.
-
-Another example of a platform downtime event is the administrator-requested task rescheduling. This
-happens during planned Mesos slave maintenance when all slave tasks are marked as DRAINED and
-rescheduled elsewhere.
-
-To accurately calculate Platform Uptime, we must separate platform-incurred downtime from user
-actions that put a service instance in a non-operational state. It is simpler to isolate
-user-incurred downtime and treat all other downtime as platform-incurred.
-
-Currently, a user can cause a healthy service (task) downtime in only two ways: via `killTasks`
-or `restartShards` RPCs. For both, their affected tasks leave an audit state transition trail
-relevant to uptime calculations. By applying a special "SLA meaning" to exposed task state
-transition records, we can build a deterministic downtime trace for every given service instance.
-
-A task going through a state transition carries one of three possible SLA meanings
-(see [SlaAlgorithm.java](../src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for
-sla-to-task-state mapping):
-
-* Task is UP: starts a period where the task is considered to be up and running from the Aurora
-  platform standpoint.
-
-* Task is DOWN: starts a period where the task cannot reach the UP state for some
-  non-user-related reason. Counts towards instance downtime.
-
-* Task is REMOVED from SLA: starts a period where the task is not expected to be UP due to
-  user initiated action or failure. We ignore this period for the uptime calculation purposes.
-
-This metric is recalculated over the last sampling period (last minute) to account for
-any UP/DOWN/REMOVED events. It ignores any UP/DOWN events not immediately adjacent to the
-sampling interval as well as adjacent REMOVED events.
-
-### Job Uptime
-
-*Percentage of the job instances considered to be in RUNNING state for the specified duration
-relative to request time. This is a purely application-side metric that considers the aggregate
-uptime of all RUNNING instances. Any user- or platform-initiated restarts directly affect
-this metric.*
-
-**Collection scope:** We currently expose job uptime values at 5 pre-defined
-percentiles (50th,75th,90th,95th and 99th):
-
-* `sla_<job_key>_job_uptime_50_00_sec`
-* `sla_<job_key>_job_uptime_75_00_sec`
-* `sla_<job_key>_job_uptime_90_00_sec`
-* `sla_<job_key>_job_uptime_95_00_sec`
-* `sla_<job_key>_job_uptime_99_00_sec`
-
-**Units:** seconds
-
-You can also get customized real-time stats from the aurora client. See `aurora sla -h` for
-more details.
-
-### Median Time To Assigned (MTTA)
-
-*Median time a job spends waiting for its tasks to be assigned to a host. This is a combined
-metric that helps track the dependency of scheduling performance on the requested resources
-(user scope) as well as the internal scheduler bin-packing algorithm efficiency (platform scope).*
-
-**Collection scope:**
-
-* Per job - `sla_<job_key>_mtta_ms`
-* Per cluster - `sla_cluster_mtta_ms`
-* Per instance size (small, medium, large, x-large, xx-large). Sizes are defined in:
-[ResourceAggregates.java](../src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java)
-  * By CPU:
-    * `sla_cpu_small_mtta_ms`
-    * `sla_cpu_medium_mtta_ms`
-    * `sla_cpu_large_mtta_ms`
-    * `sla_cpu_xlarge_mtta_ms`
-    * `sla_cpu_xxlarge_mtta_ms`
-  * By RAM:
-    * `sla_ram_small_mtta_ms`
-    * `sla_ram_medium_mtta_ms`
-    * `sla_ram_large_mtta_ms`
-    * `sla_ram_xlarge_mtta_ms`
-    * `sla_ram_xxlarge_mtta_ms`
-  * By DISK:
-    * `sla_disk_small_mtta_ms`
-    * `sla_disk_medium_mtta_ms`
-    * `sla_disk_large_mtta_ms`
-    * `sla_disk_xlarge_mtta_ms`
-    * `sla_disk_xxlarge_mtta_ms`
-
-**Units:** milliseconds
-
-MTTA only considers instances that have already reached ASSIGNED state and ignores those
-that are still PENDING. This ensures straggler instances (e.g. with unreasonable resource
-constraints) do not affect metric curves.
-
-### Median Time To Running (MTTR)
-
-*Median time a job waits for its tasks to reach RUNNING state. This is a comprehensive metric
-reflecting the overall time it takes for Aurora/Mesos to start executing user content.*
-
-**Collection scope:**
-
-* Per job - `sla_<job_key>_mttr_ms`
-* Per cluster - `sla_cluster_mttr_ms`
-* Per instance size (small, medium, large, x-large, xx-large). Sizes are defined in:
-[ResourceAggregates.java](../src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java)
-  * By CPU:
-    * `sla_cpu_small_mttr_ms`
-    * `sla_cpu_medium_mttr_ms`
-    * `sla_cpu_large_mttr_ms`
-    * `sla_cpu_xlarge_mttr_ms`
-    * `sla_cpu_xxlarge_mttr_ms`
-  * By RAM:
-    * `sla_ram_small_mttr_ms`
-    * `sla_ram_medium_mttr_ms`
-    * `sla_ram_large_mttr_ms`
-    * `sla_ram_xlarge_mttr_ms`
-    * `sla_ram_xxlarge_mttr_ms`
-  * By DISK:
-    * `sla_disk_small_mttr_ms`
-    * `sla_disk_medium_mttr_ms`
-    * `sla_disk_large_mttr_ms`
-    * `sla_disk_xlarge_mttr_ms`
-    * `sla_disk_xxlarge_mttr_ms`
-
-**Units:** milliseconds
-
-MTTR only considers instances in RUNNING state. This ensures straggler instances (e.g. with
-unreasonable resource constraints) do not affect metric curves.
-
-## Limitations
-
-* The availability of Aurora SLA metrics is bound by the scheduler availability.
-
-* All metrics are calculated at a pre-defined interval (currently set at 1 minute).
-  Scheduler restarts may result in missed collections.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/storage-config.md
----------------------------------------------------------------------
diff --git a/docs/storage-config.md b/docs/storage-config.md
deleted file mode 100644
index 7c64841..0000000
--- a/docs/storage-config.md
+++ /dev/null
@@ -1,153 +0,0 @@
-# Storage Configuration And Maintenance
-
-- [Overview](#overview)
-- [Scheduler storage configuration flags](#scheduler-storage-configuration-flags)
-  - [Mesos replicated log configuration flags](#mesos-replicated-log-configuration-flags)
-    - [-native_log_quorum_size](#-native_log_quorum_size)
-    - [-native_log_file_path](#-native_log_file_path)
-    - [-native_log_zk_group_path](#-native_log_zk_group_path)
-  - [Backup configuration flags](#backup-configuration-flags)
-    - [-backup_interval](#-backup_interval)
-    - [-backup_dir](#-backup_dir)
-    - [-max_saved_backups](#-max_saved_backups)
-- [Recovering from a scheduler backup](#recovering-from-a-scheduler-backup)
-  - [Summary](#summary)
-  - [Preparation](#preparation)
-  - [Cleanup and re-initialize Mesos replicated log](#cleanup-and-re-initialize-mesos-replicated-log)
-  - [Restore from backup](#restore-from-backup)
-  - [Cleanup](#cleanup)
-
-## Overview
-
-This document summarizes Aurora storage configuration and maintenance details and is
-intended for use by anyone deploying and/or maintaining Aurora.
-
-For a high level overview of the Aurora storage architecture refer to [this document](storage.md).
-
-## Scheduler storage configuration flags
-
-Below is a summary of scheduler storage configuration flags that either don't have default values
-or require attention before deploying in a production environment.
-
-### Mesos replicated log configuration flags
-
-#### -native_log_quorum_size
-Defines the Mesos replicated log quorum size. See
-[the replicated log configuration document](deploying-aurora-scheduler.md#replicated-log-configuration)
-on how to choose the right value.
-
-#### -native_log_file_path
-Location of the Mesos replicated log files. Consider allocating a dedicated disk (preferably SSD)
-for Mesos replicated log files to ensure optimal storage performance.
-
-#### -native_log_zk_group_path
-ZooKeeper path used for Mesos replicated log quorum discovery.
-
-See [code](../src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java) for
-other available Mesos replicated log configuration options and default values.
-
-### Backup configuration flags
-
-Configuration options for the Aurora scheduler backup manager.
-
-#### -backup_interval
-The interval on which the scheduler writes local storage backups.  The default is every hour.
-
-#### -backup_dir
-Directory to write backups to.
-
-#### -max_saved_backups
-Maximum number of backups to retain before deleting the oldest backup(s).
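-
-A combined example of these flags (the values are illustrative, not recommendations):
-
-```
--backup_interval=1hrs
--backup_dir=/var/lib/aurora/backups
--max_saved_backups=48
-```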
-
-## Recovering from a scheduler backup
-
-**Be sure to read the entire page before attempting to restore from a backup, as it may have
-unintended consequences.**
-
-### Summary
-
-The restoration procedure replaces the existing (possibly corrupted) Mesos replicated log with an
-earlier, backed-up version and requires all schedulers to be taken down temporarily while
-restoring. Once completed, the scheduler state resets to what it was when the backup was created.
-This means any jobs/tasks created or updated after the backup are unknown to the scheduler and will
-be killed shortly after the cluster restarts. All other tasks continue operating as normal.
-
-Usually, it is a bad idea to restore a backup that is not extremely recent (i.e. older than a few
-hours). This is because the scheduler will expect the cluster to look exactly as the backup does,
-so any tasks that have been rescheduled since the backup was taken will be killed.
-
-The instructions below have been verified in the [Vagrant environment](vagrant.md) and, with minor
-syntax/path changes, should be applicable to any Aurora cluster.
-
-### Preparation
-
-Follow these steps to prepare the cluster for restoring from a backup:
-
-* Stop all scheduler instances
-
-* Consider blocking external traffic on the port defined by `-http_port` for all schedulers to
-prevent users from interacting with the scheduler during the restoration process. This will help
-troubleshooting by reducing scheduler log noise and prevent users from making changes that will
-be erased after the backup snapshot is restored.
-
-* Configure `aurora_admin` access to run all commands listed in the
-  [Restore from backup](#restore-from-backup) section locally on the leading scheduler:
-  * Make sure the [clusters.json](client-commands.md#cluster-configuration) file is configured to
-    access the scheduler directly. Set the `scheduler_uri` setting and remove `zk`. Since the leader
-    can get re-elected during the restore steps, consider doing this on all scheduler replicas.
-  * Depending on your particular security approach you will need to either turn off scheduler
-    authorization by removing the scheduler's `-http_authentication_mechanism` flag or make sure that
-    direct scheduler access is properly authorized. E.g.: in the case of Kerberos you will need to
-    change `/etc/hosts` to map your local IP to the scheduler URL configured in the keytabs:
-
-        <local_ip> <scheduler_domain_in_keytabs>
-
-* The next steps are required to put the scheduler into a partially disabled state where it is still
-able to accept storage recovery requests but unable to schedule or change task states. This may be
-accomplished by updating the following scheduler configuration options (a combined example follows this list):
-  * Set `-mesos_master_address` to a non-existent zk address. This will prevent the scheduler from
-    registering with Mesos. E.g.: `-mesos_master_address=zk://localhost:1111/mesos/master`
-  * Set `-max_registration_delay` to a sufficiently long interval to prevent a registration timeout
-    and, as a result, scheduler suicide. E.g.: `-max_registration_delay=360mins`
-  * Make sure the `-reconciliation_initial_delay` option is set high enough (e.g.: `365days`) to
-    prevent accidental task GC. This is important as the scheduler will attempt to reconcile the cluster
-    state and will kill all tasks when restarted with an empty Mesos replicated log.
-
-* Restart all schedulers
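-
-Taken together, the partial-disable settings from the list above might look like this (values copied
-from the examples above):
-
-```
--mesos_master_address=zk://localhost:1111/mesos/master
--max_registration_delay=360mins
--reconciliation_initial_delay=365days
-```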
-
-### Cleanup and re-initialize Mesos replicated log
-
-Get rid of the corrupted files and re-initialize Mesos replicated log:
-
-* Stop schedulers
-* Delete all files under `-native_log_file_path` on all schedulers
-* Initialize Mesos replica's log file: `sudo mesos-log initialize --path=<-native_log_file_path>`
-* Start schedulers
-
-### Restore from backup
-
-At this point the scheduler is ready to rehydrate from the backup:
-
-* Identify the leading scheduler by:
-  * examining the `scheduler_lifecycle_LEADER_AWAITING_REGISTRATION` metric at the scheduler
-    `/vars` endpoint. The leader will have a value of 1; all other replicas, 0.
-  * examining scheduler logs
-  * or examining the ZooKeeper registration under the path defined by `-zk_endpoints`
-    and `-serverset_path`
-
-* Locate the desired backup file, copy it to the leading scheduler's `-backup_dir` folder and stage
-recovery by running the following command on the leader:
-`aurora_admin scheduler_stage_recovery --bypass-leader-redirect <cluster> scheduler-backup-<yyyy-MM-dd-HH-mm>`
-
-* At this point, the recovery snapshot is staged and available for manual verification/modification
-via `aurora_admin scheduler_print_recovery_tasks --bypass-leader-redirect` and
-`scheduler_delete_recovery_tasks --bypass-leader-redirect` commands.
-See `aurora_admin help <command>` for usage details.
-
-* Commit recovery. This instructs the scheduler to overwrite the existing Mesos replicated log with
-the provided backup snapshot and initiate a mandatory failover:
-`aurora_admin scheduler_commit_recovery --bypass-leader-redirect <cluster>`
-
-### Cleanup
-Undo any modifications made during the [Preparation](#preparation) sequence.
-

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/storage.md
----------------------------------------------------------------------
diff --git a/docs/storage.md b/docs/storage.md
deleted file mode 100644
index 6ffed54..0000000
--- a/docs/storage.md
+++ /dev/null
@@ -1,88 +0,0 @@
-# Aurora Scheduler Storage
-
-- [Overview](#overview)
-- [Reads, writes, modifications](#reads-writes-modifications)
-  - [Read lifecycle](#read-lifecycle)
-  - [Write lifecycle](#write-lifecycle)
-- [Atomicity, consistency and isolation](#atomicity-consistency-and-isolation)
-- [Population on restart](#population-on-restart)
-
-## Overview
-
-The Aurora scheduler maintains data that needs to be persisted to survive failovers and restarts.
-For example:
-
-* Task configurations and scheduled task instances
-* Job update configurations and update progress
-* Production resource quotas
-* Mesos resource offer host attributes
-
-Aurora solves its persistence needs by leveraging the Mesos implementation of a Paxos replicated
-log [[1]](https://ramcloud.stanford.edu/~ongaro/userstudy/paxos.pdf)
-[[2]](http://en.wikipedia.org/wiki/State_machine_replication) with key-value
-[LevelDB](https://github.com/google/leveldb) storage as the persistence medium.
-
-Conceptually, it can be represented by the following major components:
-
-* Volatile storage: in-memory cache of all available data. Implemented via in-memory
-[H2 Database](http://www.h2database.com/html/main.html) and accessed via
-[MyBatis](http://mybatis.github.io/mybatis-3/).
-* Log manager: interface between Aurora storage and Mesos replicated log. The default schema format
-is [thrift](https://github.com/apache/thrift). Data is stored in serialized binary form.
-* Snapshot manager: all data is periodically persisted to the Mesos replicated log in a single snapshot.
-This helps establish periodic recovery checkpoints and speeds up volatile storage recovery on
-restart.
-* Backup manager: as a precaution, snapshots are periodically written out into backup files.
-This solves a [disaster recovery problem](storage-config.md#recovering-from-a-scheduler-backup)
-in case of a complete loss or corruption of Mesos log files.
-
-![Storage hierarchy](images/storage_hierarchy.png)
-
-## Reads, writes, modifications
-
-All services in Aurora access data via a set of predefined store interfaces (aka stores) logically
-grouped by the type of data they serve. Every interface defines a specific set of operations allowed
-on the data, thus abstracting away the storage access and the actual persistence implementation. The
-latter is especially important given the general immutability of persisted data. With the Mesos
-replicated log as the underlying persistence solution, data can be read and written easily but not
-modified. All modifications are simulated by saving new versions of modified objects. This feature
-and general performance considerations justify the existence of the volatile in-memory store.
-
-### Read lifecycle
-
-There are two types of reads available in Aurora: consistent and weakly-consistent. The difference
-is explained [below](#atomicity-consistency-and-isolation).
-
-All reads are served from the volatile storage, making them generally cheap operations
-from the performance standpoint. The majority of the volatile stores are represented by the
-in-memory H2 database. This allows for rich schema definitions, queries and relationships that
-key-value storage is unable to match.
-
-### Write lifecycle
-
-Writes are more involved operations since, in addition to updating the volatile store, data has to be
-appended to the replicated log. Data is not available for reads until fully acknowledged by both the
-replicated log and volatile storage.
-
-## Atomicity, consistency and isolation
-
-Aurora uses [write-ahead logging](http://en.wikipedia.org/wiki/Write-ahead_logging) to ensure
-consistency between replicated and volatile storage. In Aurora, data is first written into the
-replicated log and only then updated in the volatile store.
-
-Aurora storage uses read-write locks to serialize data mutations and provide a consistent view of the
-available data. The `Storage` interface exposes 3 major types of operations:
-
-* `consistentRead` - access is locked using the reader's lock and provides a consistent view on read
-* `weaklyConsistentRead` - access is lock-less. Delivers the best contention performance but may result
-in stale reads
-* `write` - access is fully serialized using the writer's lock. Operation success requires both
-volatile and replicated writes to succeed.
-
-The consistency of the volatile store is enforced via H2 transactional isolation.
-
-## Population on restart
-
-Any time a scheduler restarts, it restores its volatile state from the most recent position recorded
-in the replicated log by restoring the snapshot and replaying individual log entries on top to fully
-recover the state up to the last write.
-

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/task-lifecycle.md
----------------------------------------------------------------------
diff --git a/docs/task-lifecycle.md b/docs/task-lifecycle.md
deleted file mode 100644
index 5d6456c..0000000
--- a/docs/task-lifecycle.md
+++ /dev/null
@@ -1,146 +0,0 @@
-# Task Lifecycle
-
-When Aurora reads a configuration file and finds a `Job` definition, it:
-
-1.  Evaluates the `Job` definition.
-2.  Splits the `Job` into its constituent `Task`s.
-3.  Sends those `Task`s to the scheduler.
-4.  The scheduler puts the `Task`s into `PENDING` state, starting each
-    `Task`'s life cycle.
-
-
-![Life of a task](images/lifeofatask.png)
-
-Please note that a couple of the task states described below are missing from
-this state diagram.
-
-
-## PENDING to RUNNING states
-
-When a `Task` is in the `PENDING` state, the scheduler constantly
-searches for machines satisfying that `Task`'s resource request
-requirements (RAM, disk space, CPU time) while maintaining configuration
-constraints such as "a `Task` must run on machines dedicated to a
-particular role" or attribute limit constraints such as "at most 2
-`Task`s from the same `Job` may run on each rack". When the scheduler
-finds a suitable match, it assigns the `Task` to a machine and puts the
-`Task` into the `ASSIGNED` state.
-
-From the `ASSIGNED` state, the scheduler sends an RPC to the slave
-machine containing `Task` configuration, which the slave uses to spawn
-an executor responsible for the `Task`'s lifecycle. When the scheduler
-receives an acknowledgment that the machine has accepted the `Task`,
-the `Task` goes into `STARTING` state.
-
-`STARTING` state initializes a `Task` sandbox. When the sandbox is fully
-initialized, Thermos begins to invoke `Process`es. Also, the slave
-machine sends an update to the scheduler that the `Task` is
-in `RUNNING` state.
-
-
-
-## RUNNING to terminal states
-
-There are various ways that an active `Task` can transition into a terminal
-state. By definition, it can never leave this state. However, depending on
-nature of the termination and the originating `Job` definition
-(e.g. `service`, `max_task_failures`), a replacement `Task` might be
-scheduled.
-
-### Natural Termination: FINISHED, FAILED
-
-A `RUNNING` `Task` can terminate without direct user interaction. For
-example, it may be a finite computation that finishes, even something as
-simple as `echo hello world.`, or it could be an exceptional condition in
-a long-lived service. If the `Task` is successful (its underlying
-processes have succeeded with exit status `0` or finished without
-reaching failure limits) it moves into `FINISHED` state. If it finished
-after reaching a set of failure limits, it goes into `FAILED` state.
-
-A terminated `Task` which is subject to rescheduling will be temporarily
-`THROTTLED` if it is considered to be flapping. A task is flapping if its
-previous invocation was terminated after less than 5 minutes (scheduler
-default). The time penalty a task has to remain in the `THROTTLED` state
-before it is eligible for rescheduling increases with each consecutive
-failure.
-
-### Forceful Termination: KILLING, RESTARTING
-
-You can terminate a `Task` by issuing an `aurora job kill` command, which
-moves it into `KILLING` state. The scheduler then sends the slave a
-request to terminate the `Task`. If the scheduler receives a successful
-response, it moves the `Task` into `KILLED` state and never restarts it.
-
-If a `Task` is forced into the `RESTARTING` state via the `aurora job restart`
-command, the scheduler kills the underlying task but in parallel schedules
-an identical replacement for it.
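-
-For reference, these transitions are typically triggered with client commands like the following
-(the job key and instance range are illustrative):
-
-```
-% aurora job kill devcluster/www-data/devel/hello_world/0-1
-% aurora job restart devcluster/www-data/devel/hello_world
-```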
-
-In any case, the responsible executor on the slave follows an escalation
-sequence when killing a running task:
-
-  1. If a `HttpLifecycleConfig` is not present, skip to (4).
-  2. Send a POST to the `graceful_shutdown_endpoint` and wait 5 seconds.
-  3. Send a POST to the `shutdown_endpoint` and wait 5 seconds.
-  4. Send SIGTERM (`kill`) and wait at most `finalization_wait` seconds.
-  5. Send SIGKILL (`kill -9`).
-
-If the executor notices that all `Process`es in a `Task` have aborted
-during this sequence, it will not proceed with subsequent steps.
-Note that graceful shutdown is best-effort, and due to the many
-inevitable realities of distributed systems, it may not be performed.
-
-### Unexpected Termination: LOST
-
-If a `Task` stays in a transient task state for too long (such as `ASSIGNED`
-or `STARTING`), the scheduler forces it into `LOST` state, creating a new
-`Task` in its place that's sent into `PENDING` state.
-
-In addition, if the Mesos core tells the scheduler that a slave has
-become unhealthy (or outright disappeared), the `Task`s assigned to that
-slave go into `LOST` state and new `Task`s are created in their place.
-From `PENDING` state, there is no guarantee a `Task` will be reassigned
-to the same machine unless job constraints explicitly force it there.
-
-### Giving Priority to Production Tasks: PREEMPTING
-
-Sometimes a Task needs to be interrupted, such as when a non-production
-Task's resources are needed by a higher priority production Task. This
-type of interruption is called a *pre-emption*. When this happens in
-Aurora, the non-production Task is killed and moved into
-the `PREEMPTING` state when both of the following are true:
-
-- The task being killed is a non-production task.
-- The other task is a `PENDING` production task that hasn't been
-  scheduled due to a lack of resources.
-
-The scheduler UI shows the non-production task was preempted in favor of
-the production task. At some point, tasks in `PREEMPTING` move to `KILLED`.
-
-Note that non-production tasks consuming many resources are likely to be
-preempted in favor of production tasks.
-
-### Making Room for Maintenance: DRAINING
-
-Cluster operators can put a slave into maintenance mode. This will transition
-all `Task`s running on that slave into `DRAINING` and eventually to `KILLED`.
-Drained `Task`s will be restarted on other slaves for which no maintenance
-has been announced yet.
-
-
-
-## State Reconciliation
-
-Due to the many inevitable realities of distributed systems, there might
-be a mismatch of perceived and actual cluster state (e.g. a machine returns
-from a `netsplit` but the scheduler has already marked all its `Task`s as
-`LOST` and rescheduled them).
-
-Aurora regularly runs a state reconciliation process in order to detect
-and correct such issues (e.g. by killing the errant `RUNNING` tasks).
-By default, the proper detection of all failure scenarios and inconsistencies
-may take up to an hour.
-
-To emphasize this point: there is no uniqueness guarantee for a single
-instance of a job in the presence of network partitions. If the `Task`
-requires that, it should be baked in at the application level using a
-distributed coordination service such as ZooKeeper.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/test-resource-generation.md
----------------------------------------------------------------------
diff --git a/docs/test-resource-generation.md b/docs/test-resource-generation.md
deleted file mode 100644
index e78e742..0000000
--- a/docs/test-resource-generation.md
+++ /dev/null
@@ -1,24 +0,0 @@
-# Generating test resources
-
-## Background
-The Aurora source repository and distributions contain several
-[binary files](../src/test/resources/org/apache/thermos/root/checkpoints) to
-qualify the backwards-compatibility of thermos with checkpoint data. Since
-thermos persists state to disk (to be read by the thermos observer), it is important that we have
-tests that prevent regressions affecting the ability to parse previously-written data.
-
-## Generating test files
-The files included represent persisted checkpoints that exercise different
-features of thermos. The existing files should not be modified unless
-we are accepting backwards incompatibility, such as with a major release.
-
-It is not practical to write source code to generate these files on the fly,
-as source would be vulnerable to drift (e.g. due to refactoring) in ways
-that would undermine the goal of ensuring backwards compatibility.
-
-The most common reason to add a new checkpoint file would be to provide
-coverage for new thermos features that alter the data format. This is
-accomplished by writing and running a
-[job configuration](configuration-reference.md) that exercises the feature, and
-copying the checkpoint file from the sandbox directory; by default this is
-`/var/run/thermos/checkpoints/<aurora task id>`.
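-
-For example (a sketch; the destination matches the resource path referenced above, and the task id is
-whatever your test job produced):
-
-```
-% cp /var/run/thermos/checkpoints/<aurora task id> \
-    src/test/resources/org/apache/thermos/root/checkpoints/
-```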

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/thrift-deprecation.md
----------------------------------------------------------------------
diff --git a/docs/thrift-deprecation.md b/docs/thrift-deprecation.md
deleted file mode 100644
index 62a71bc..0000000
--- a/docs/thrift-deprecation.md
+++ /dev/null
@@ -1,54 +0,0 @@
-# Thrift API Changes
-
-## Overview
-Aurora uses [Apache Thrift](https://thrift.apache.org/) for representing structured data in its
-client/server RPC protocol as well as for internal data storage. While Thrift is capable of
-correctly handling additions and renames of existing members, field removals must be done
-carefully to ensure backwards compatibility and provide a predictable deprecation cycle. This
-document describes general guidelines for making Thrift schema changes to the existing fields in
-[api.thrift](../api/src/main/thrift/org/apache/aurora/gen/api.thrift).
-
-It is highly recommended to go through the
-[Thrift: The Missing Guide](http://diwakergupta.github.io/thrift-missing-guide/) first to refresh on
-basic Thrift schema concepts.
-
-## Checklist
-Every existing Thrift schema modification is unique in its requirements and must be analyzed
-carefully to identify its scope and expected consequences. The following checklist may help in that
-analysis:
-
-* Is this a new field/struct? If yes, go ahead.
-* Is this a pure field/struct rename without any type/structure change? If yes, go ahead and rename.
-* Anything else? Read further to make sure your change is properly planned.
-
-## Deprecation cycle
-Any time a breaking change (e.g.: field replacement or removal) is required, the following cycle
-must be followed:
-
-### vCurrent
-The change is applied in a way that does not break the ability of a scheduler/client at this version
-to communicate with a scheduler/client from vCurrent-1.
-* Do not remove or rename the old field
-* Add a new field as an eventual replacement of the old one and implement a dual read/write
-anywhere the old field is used. If a thrift struct is mapped in the DB store, make sure both columns
-are marked as `NOT NULL`
-* Check [storage.thrift](../api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if the
-affected struct is stored in Aurora scheduler storage. If so, you most likely need to backfill
-existing data to ensure both fields are populated eagerly on startup. See
-[this patch](https://reviews.apache.org/r/43172) as a real-life example of thrift-struct
-backfilling. IMPORTANT: backfilling implementation needs to ensure both fields are populated. This
-is critical to enable graceful scheduler upgrade as well as rollback to the old version if needed.
-* Add a deprecation jira ticket into the vCurrent+1 release candidate
-* Add a TODO for the deprecated field mentioning the jira ticket
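-
-As a hedged illustration (the struct and field names below are hypothetical and not taken from
-`api.thrift`), a vCurrent change might look like:
-
-```
-struct ExampleConfig {
-  /** Deprecated, see the deprecation ticket. Dual read/write with newLimit until vCurrent+1. */
-  1: optional i64 oldLimit
-  /** Replacement for oldLimit. */
-  2: optional i64 newLimit
-}
-```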
-
-### vCurrent+1
-Finalize the change by removing the deprecated fields from the Thrift schema.
-* Drop any dual read/write routines added in the previous version
-* Remove thrift backfilling in scheduler
-* Remove the deprecated Thrift field
-
-## Testing
-It's always advisable to test your changes in the local vagrant environment to build more
-confidence that your change is backwards compatible. It's easy to simulate different
-client/scheduler versions by playing with the `aurorabuild` command. See [this document](vagrant.md)
-for more details.
-

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/tools.md
----------------------------------------------------------------------
diff --git a/docs/tools.md b/docs/tools.md
deleted file mode 100644
index 2ae550d..0000000
--- a/docs/tools.md
+++ /dev/null
@@ -1,16 +0,0 @@
-# Tools
-
-Various tools integrate with Aurora. Is there a tool missing? Let us know, or submit a patch to add it!
-
-* Load-balancing technology used to direct traffic to services running on Aurora
-  - [synapse](https://github.com/airbnb/synapse) based on HAProxy
-  - [aurproxy](https://github.com/tellapart/aurproxy) based on nginx
-  - [jobhopper](https://github.com/benley/aurora-jobhopper) performing HTTP redirects for easy developer and administrator access
-
-* Monitoring
-  - [collectd-aurora](https://github.com/zircote/collectd-aurora) for cluster monitoring using collectd
-  - [Prometheus Aurora exporter](https://github.com/tommyulfsparre/aurora_exporter) for cluster monitoring using Prometheus
-  - [Prometheus service discovery integration](http://prometheus.io/docs/operating/configuration/#zookeeper-serverset-sd-configurations-serverset_sd_config) for discovering and monitoring services running on Aurora
-
-* Packaging and deployment
-  - [aurora-packaging](https://github.com/apache/aurora-packaging), the source of the official Aurora packages

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/tutorial.md
----------------------------------------------------------------------
diff --git a/docs/tutorial.md b/docs/tutorial.md
deleted file mode 100644
index 95539ef..0000000
--- a/docs/tutorial.md
+++ /dev/null
@@ -1,260 +0,0 @@
-# Aurora Tutorial
-
-This tutorial shows how to use the Aurora scheduler to run (and "`printf-debug`")
-a hello world program on Mesos. This is the recommended document for new Aurora users
-to start getting up to speed on the system.
-
-- [Prerequisite](#setup-install-aurora)
-- [The Script](#the-script)
-- [Aurora Configuration](#aurora-configuration)
-- [Creating the Job](#creating-the-job)
-- [Watching the Job Run](#watching-the-job-run)
-- [Cleanup](#cleanup)
-- [Next Steps](#next-steps)
-
-
-## Prerequisite
-
-This tutorial assumes you are running [Aurora locally using Vagrant](vagrant.md).
-However, in general the instructions are also applicable to any other
-[Aurora installation](installing.md).
-
-Unless otherwise stated, all commands are to be run from the root of the aurora
-repository clone.
-
-
-## The Script
-
-Our "hello world" application is a simple Python script that loops
-forever, displaying the time every few seconds. Copy the code below and
-put it in a file named `hello_world.py` in the root of your Aurora repository clone
-(Note: this directory is the same as `/vagrant` inside the Vagrant VMs).
-
-The script has an intentional bug, which we will explain later on.
-
-<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh.
--->
-```python
-import time
-
-def main():
-  SLEEP_DELAY = 10
-  # Python ninjas - ignore this blatant bug.
-  for i in xrang(100):
-    print("Hello world! The time is now: %s. Sleeping for %d secs" % (
-      time.asctime(), SLEEP_DELAY))
-    time.sleep(SLEEP_DELAY)
-
-if __name__ == "__main__":
-  main()
-```
-
-## Aurora Configuration
-
-Once we have our script/program, we need to create a *configuration
-file* that tells Aurora how to manage and launch our Job. Save the below
-code in the file `hello_world.aurora`.
-
-<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh.
--->
-```python
-pkg_path = '/vagrant/hello_world.py'
-
-# we use a trick here to make the configuration change with
-# the contents of the file, for simplicity.  in a normal setting, packages would be
-# versioned, and the version number would be changed in the configuration.
-import hashlib
-with open(pkg_path, 'rb') as f:
-  pkg_checksum = hashlib.md5(f.read()).hexdigest()
-
-# copy hello_world.py into the local sandbox
-install = Process(
-  name = 'fetch_package',
-  cmdline = 'cp %s . && echo %s && chmod +x hello_world.py' % (pkg_path, pkg_checksum))
-
-# run the script
-hello_world = Process(
-  name = 'hello_world',
-  cmdline = 'python -u hello_world.py')
-
-# describe the task
-hello_world_task = SequentialTask(
-  processes = [install, hello_world],
-  resources = Resources(cpu = 1, ram = 1*MB, disk=8*MB))
-
-jobs = [
-  Service(cluster = 'devcluster',
-          environment = 'devel',
-          role = 'www-data',
-          name = 'hello_world',
-          task = hello_world_task)
-]
-```
-
-There is a lot going on in that configuration file:
-
-1. From a "big picture" viewpoint, it first defines two
-Processes. Then it defines a Task that runs the two Processes in the
-order specified in the Task definition, as well as specifying what
-computational and memory resources are available for them.  Finally,
-it defines a Job that will schedule the Task on available and suitable
-machines. This Job is the sole member of a list of Jobs; you can
-specify more than one Job in a config file.
-
-2. At the Process level, it specifies how to get your code into the
-local sandbox in which it will run. It then specifies how the code is
-actually run once the second Process starts.
-
-For more about Aurora configuration files, see the [Configuration
-Tutorial](configuration-tutorial.md) and the [Aurora + Thermos
-Reference](configuration-reference.md) (preferably after finishing this
-tutorial).
-
-
-## Creating the Job
-
-We're ready to launch our job! To do so, we use the Aurora Client to
-issue a Job creation request to the Aurora scheduler.
-
-Many Aurora Client commands take a *job key* argument, which uniquely
-identifies a Job. A job key consists of four parts, each separated by a
-"/". The four parts are  `<cluster>/<role>/<environment>/<jobname>`
-in that order:
-
-* Cluster refers to the name of a particular Aurora installation.
-* Role names are user accounts existing on the slave machines. If you
-don't know what accounts are available, contact your sysadmin.
-* Environment names are namespaces; you can count on `test`, `devel`,
-`staging` and `prod` existing.
-* Jobname is the custom name of your job.
-
-When comparing two job keys, if any of the four parts is different from
-its counterpart in the other key, then the two job keys identify two separate
-jobs. If all four values are identical, the job keys identify the same job.
-
-The `clusters.json` [client configuration](client-cluster-configuration.md)
-for the Aurora scheduler defines the available cluster names.
-For Vagrant, from the top-level of your Aurora repository clone, do:
-
-    $ vagrant ssh
-
-Followed by:
-
-    vagrant@aurora:~$ cat /etc/aurora/clusters.json
-
-You'll see something like the following. The `name` value shown here corresponds to a job key's cluster value.
-
-```javascript
-[{
-  "name": "devcluster",
-  "zk": "192.168.33.7",
-  "scheduler_zk_path": "/aurora/scheduler",
-  "auth_mechanism": "UNAUTHENTICATED",
-  "slave_run_directory": "latest",
-  "slave_root": "/var/lib/mesos"
-}]
-```
-
-The Aurora Client command that actually runs our Job is `aurora job create`. It creates a Job as
-specified by its job key and configuration file arguments and runs it.
-
-    aurora job create <cluster>/<role>/<environment>/<jobname> <config_file>
-
-Or for our example:
-
-    aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
-
-After entering our virtual machine using `vagrant ssh`, this returns:
-
-    vagrant@aurora:~$ aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
-     INFO] Creating job hello_world
-     INFO] Checking status of devcluster/www-data/devel/hello_world
-    Job create succeeded: job url=http://aurora.local:8081/scheduler/www-data/devel/hello_world
-
-
-## Watching the Job Run
-
-Now that our job is running, let's see what it's doing. Access the
-scheduler web interface at `http://$scheduler_hostname:$scheduler_port/scheduler`,
-or, when using `vagrant`, at `http://192.168.33.7:8081/scheduler`.
-First we see what Jobs are scheduled:
-
-![Scheduled Jobs](images/ScheduledJobs.png)
-
-Click on your user name, which in this case is `www-data`, and you will see the Jobs associated
-with that role:
-
-![Role Jobs](images/RoleJobs.png)
-
-If you click on your `hello_world` Job, you'll see:
-
-![hello_world Job](images/HelloWorldJob.png)
-
-Oops, looks like our first job didn't quite work! The task is temporarily throttled for
-having failed on every attempt of the Aurora scheduler to run it. We have to figure out
-what is going wrong.
-
-On the Completed tasks tab, we see all past attempts of the Aurora scheduler to run our job.
-
-![Completed tasks tab](images/CompletedTasks.png)
-
-We can navigate to the Task page of a failed run by clicking on the host link.
-
-![Task page](images/TaskBreakdown.png)
-
-Once there, we see that the `hello_world` process failed. The Task page
-captures the standard error and standard output streams and makes them available.
-Clicking through to `stderr` on the failed `hello_world` process, we see what happened.
-
-![stderr page](images/stderr.png)
-
-It looks like we made a typo in our Python script. We wanted `xrange`,
-not `xrang`. Edit the `hello_world.py` script to use the correct function
-and save it as `hello_world_v2.py`. Then update the `hello_world.aurora`
-configuration to the newest version.
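-
-For example, the key change in `hello_world.aurora` is pointing `pkg_path` at the new file; a sketch
-(you would also adjust the process cmdlines if they reference the old file name):
-
-```python
-pkg_path = '/vagrant/hello_world_v2.py'
-```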
-
-In order to try again, we can now instruct the scheduler to update our job:
-
-    vagrant@aurora:~$ aurora update start devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
-     INFO] Starting update for: hello_world
-    Job update has started. View your update progress at http://aurora.local:8081/scheduler/www-data/devel/hello_world/update/8ef38017-e60f-400d-a2f2-b5a8b724e95b
-
-This time, the task comes up.
-
-![Running Job](images/RunningJob.png)
-
-By again clicking on the host, we inspect the Task page, and see that the
-`hello_world` process is running.
-
-![Running Task page](images/runningtask.png)
-
-We then inspect the output by clicking on `stdout` and see our process'
-output:
-
-![stdout page](images/stdout.png)
-
-## Cleanup
-
-Now that we're done, we kill the job using the Aurora client:
-
-    vagrant@aurora:~$ aurora job killall devcluster/www-data/devel/hello_world
-     INFO] Killing tasks for job: devcluster/www-data/devel/hello_world
-     INFO] Instances to be killed: [0]
-    Successfully killed instances [0]
-    Job killall succeeded
-
-The job page now shows the `hello_world` tasks as completed.
-
-![Killed Task page](images/killedtask.png)
-
-## Next Steps
-
-Now that you've finished this Tutorial, you should read or do the following:
-
-- [The Aurora Configuration Tutorial](configuration-tutorial.md), which provides more examples
-  and best practices for writing Aurora configurations. You should also look at
-  the [Aurora + Thermos Configuration Reference](configuration-reference.md).
-- The [Aurora User Guide](user-guide.md) provides an overview of how Aurora, Mesos, and
-  Thermos work "under the hood".
-- Explore the Aurora Client - use `aurora -h`, and read the
-  [Aurora Client Commands](client-commands.md) document.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/user-guide.md
----------------------------------------------------------------------
diff --git a/docs/user-guide.md b/docs/user-guide.md
deleted file mode 100644
index 656296c..0000000
--- a/docs/user-guide.md
+++ /dev/null
@@ -1,244 +0,0 @@
-Aurora User Guide
------------------
-
-- [Overview](#user-content-overview)
-- [Job Lifecycle](#user-content-job-lifecycle)
-	- [Task Updates](#user-content-task-updates)
-	- [HTTP Health Checking](#user-content-http-health-checking)
-- [Service Discovery](#user-content-service-discovery)
-- [Configuration](#user-content-configuration)
-- [Creating Jobs](#user-content-creating-jobs)
-- [Interacting With Jobs](#user-content-interacting-with-jobs)
-
-Overview
---------
-
-This document gives an overview of how Aurora works under the hood.
-It assumes you've already worked through the "hello world" example
-job in the [Aurora Tutorial](tutorial.md). Specifics of how to use Aurora are **not**
- given here, but pointers to documentation about how to use Aurora are
-provided.
-
-Aurora is a Mesos framework used to schedule *jobs* onto Mesos. Mesos
-cares about individual *tasks*, but typical jobs consist of dozens or
-hundreds of task replicas. Aurora provides a layer on top of Mesos with
-its `Job` abstraction. An Aurora `Job` consists of a task template and
-instructions for creating near-identical replicas of that task (modulo
-things like "instance id" or specific port numbers which may differ from
-machine to machine).
-
-How many tasks make up a Job is complicated. On a basic level, a Job consists of
-one task template and instructions for creating near-identical replicas of that task
-(otherwise referred to as "instances" or "shards").
-
-However, since Jobs can be updated on the fly, a single Job identifier or *job key*
-can have multiple job configurations associated with it.
-
-For example, suppose I have a Job with 4 instances that each
-request 1 core of cpu, 1 GB of RAM, and 1 GB of disk space as specified
-in the configuration file `hello_world.aurora`. I want to
-update it so it requests 2 GB of RAM instead of 1. I create a new
-configuration file to do that called `new_hello_world.aurora` and
-issue an `aurora update start <job_key_value>/0-1 new_hello_world.aurora`
-command.
-
-This results in instances 0 and 1 having 1 cpu, 2 GB of RAM, and 1 GB of disk space,
-while instances 2 and 3 have 1 cpu, 1 GB of RAM, and 1 GB of disk space. If instance 3
-dies and restarts, it restarts with 1 cpu, 1 GB RAM, and 1 GB disk space.
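-
-For illustration, the only change in a hypothetical `new_hello_world.aurora` would be the
-`Resources` stanza of its task (a sketch; the surrounding `Task` and `Job` definitions are
-assumed to match the tutorial's configuration):
-
-    # Hypothetical excerpt from new_hello_world.aurora: only the RAM request changes.
-    resources = Resources(cpu = 1.0, ram = 2*GB, disk = 1*GB)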
-
-That means two task configurations exist simultaneously for the same Job,
-each valid for a different range of instances.
-
-This isn't a recommended pattern, but it is valid and supported by the
-Aurora scheduler. This most often manifests in the "canary pattern" where
-instance 0 runs with a different configuration than instances 1-N to test
-different code versions alongside the actual production job.
-
-A task can merely be a single *process* corresponding to a single
-command line, such as `python2.6 my_script.py`. However, a task can also
-consist of many separate processes, which all run within a single
-sandbox. For example, running multiple cooperating agents together,
-such as `logrotate`, `installer`, master, or slave processes. This is
-where Thermos comes in. While Aurora provides a `Job` abstraction on
-top of Mesos `Tasks`, Thermos provides a `Process` abstraction
-underneath Mesos `Task`s and serves as part of the Aurora framework's
-executor.
-
-You define `Job`s, `Task`s, and `Process`es in a configuration file.
-Configuration files are written in Python, and make use of the Pystachio
-templating language. They end in a `.aurora` extension.
-
-Pystachio is a type-checked dictionary templating library.
-
-> TL;DR
->
-> -   Aurora manages jobs made of tasks.
-> -   Mesos manages tasks made of processes.
-> -   Thermos manages processes.
-> -   All are defined in `.aurora` configuration files.
-
-![Aurora hierarchy](images/aurora_hierarchy.png)
-
-Each `Task` has a *sandbox* created when the `Task` starts and garbage
-collected when it finishes. All of a `Task`'s processes run in its
-sandbox, so processes can share state by using a shared current working
-directory.
-
-The sandbox garbage collection policy considers many factors, most
-importantly age and size. It makes a best-effort attempt to keep
-sandboxes around as long as possible post-task in order for service
-owners to inspect data and logs, should the `Task` have completed
-abnormally. However, you should not design your applications assuming sandboxes
-will be around forever; instead, build log saving or other
-checkpointing mechanisms directly into your application or into your
-`Job` description.
-
-
-Job Lifecycle
--------------
-
-`Job`s and their `Task`s have various states that are described in the [Task Lifecycle](task-lifecycle.md).
-However, in day to day use, you'll be primarily concerned with launching new jobs and updating existing ones.
-
-
-### Task Updates
-
-`Job` configurations can be updated at any point in their lifecycle.
-Usually updates are done incrementally using a process called a *rolling
-upgrade*, in which Tasks are upgraded in small groups, one group at a
-time.  Updates are done using various Aurora Client commands.
-
-For a configuration update, the Aurora Client calculates required changes
-by examining the current job config state and the new desired job config.
-It then starts a rolling batched update process by going through every batch
-and performing these operations:
-
-- If an instance is present in the scheduler but isn't in the new config,
-  then that instance is killed.
-- If an instance is not present in the scheduler but is present in
-  the new config, then the instance is created.
-- If an instance is present in both the scheduler and the new config, then
-  the client diffs both task configs. If it detects any changes, it
-  performs an instance update by killing the old config instance and adding
-  the new config instance.
-
-The Aurora client continues through the instance list until all tasks are
-updated, in `RUNNING`, and healthy for a configurable amount of time.
-If the client determines the update is not going well (a percentage of health
-checks have failed), it cancels the update.
-
-Update cancellation runs a procedure similar to the described above
-update sequence, but in reverse order. New instance configs are swapped
-with old instance configs and batch updates proceed backwards
-from the point where the update failed. E.g. batches (0,1,2) (3,4,5) (6,7,8-FAIL)
-result in a rollback in the order (8,7,6) (5,4,3) (2,1,0).
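-
-The per-instance reconciliation described above can be summarized in pseudocode (an
-illustrative sketch only; the real client additionally handles batching, health checking,
-and rollback):
-
-    def reconcile(instance_id, scheduler_configs, new_configs):
-        # Decide what the client does for one instance within a batch.
-        old, new = scheduler_configs.get(instance_id), new_configs.get(instance_id)
-        if new is None:
-            return 'kill'      # present in the scheduler, absent from the new config
-        if old is None:
-            return 'create'    # absent from the scheduler, present in the new config
-        return 'update' if old != new else 'noop'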
-
-### HTTP Health Checking
-
-The Executor implements a protocol for rudimentary control of a task via HTTP.  Tasks subscribe for
-this protocol by declaring a port named `health`.  Take for example this configuration snippet:
-
-    nginx = Process(
-      name = 'nginx',
-      cmdline = './run_nginx.sh -port {{thermos.ports[health]}}')
-
-When this Process is included in a job, the job will be allocated a port, and the command line
-will be replaced with something like:
-
-    ./run_nginx.sh -port 42816
-
-Where 42816 happens to be the allocated port.  Typically, the Executor monitors Processes within
-a task only by liveness of the forked process.  However, when a `health` port is allocated, it will
-also send periodic HTTP health checks.  A task requesting a `health` port must handle the following
-requests:
-
-| HTTP request            | Description                             |
-| ------------            | -----------                             |
-| `GET /health`           | Inquires whether the task is healthy.   |
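-
-A minimal sketch of a process satisfying this protocol is shown below. It assumes the checker
-only requires an HTTP 200 response with body `ok`; consult the configuration reference below
-for the authoritative behavior.
-
-    # health_server.py - a sketch; run as: python3 health_server.py {{thermos.ports[health]}}
-    import sys
-    from http.server import BaseHTTPRequestHandler, HTTPServer
-
-    class Health(BaseHTTPRequestHandler):
-        def do_GET(self):
-            healthy = self.path == '/health'
-            self.send_response(200 if healthy else 404)
-            self.end_headers()
-            if healthy:
-                self.wfile.write(b'ok')   # assumed healthy response body
-
-    HTTPServer(('0.0.0.0', int(sys.argv[1])), Health).serve_forever()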
-
-Please see the
-[configuration reference](configuration-reference.md#user-content-healthcheckconfig-objects) for
-configuration options for this feature.
-
-#### Snoozing Health Checks
-
-If you need to pause your health check, you can do so by touching a file inside of your sandbox,
-named `.healthchecksnooze`.
-
-As long as that file is present, health checks will be disabled, enabling users to gather core dumps
-or other performance measurements without worrying about Aurora's health check killing their
-process.
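-
-For example, assuming the tutorial's job key and that `aurora task run` starts in the task
-sandbox, you could snooze health checks on every instance with:
-
-    aurora task run devcluster/www-data/devel/hello_world 'touch .healthchecksnooze'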
-
-WARNING: Remember to remove this file when you are done; otherwise health checks for your
-instance will remain permanently disabled.
-
-
-Configuration
--------------
-
-You define and configure your Jobs (and their Tasks and Processes) in
-Aurora configuration files. Their filenames end with the `.aurora`
-suffix, and you write them in Python making use of the Pystachio
-templating language, along
-with specific Aurora, Mesos, and Thermos commands and methods. See the
-[Configuration Guide and Reference](configuration-reference.md) and
-[Configuration Tutorial](configuration-tutorial.md).
-
-Service Discovery
------------------
-
-It is possible for the Aurora executor to announce tasks into ServerSets for
-the purpose of service discovery.  ServerSets use the Zookeeper [group membership pattern](http://zookeeper.apache.org/doc/trunk/recipes.html#sc_outOfTheBox)
-of which there are several reference implementations:
-
-  - [C++](https://github.com/apache/mesos/blob/master/src/zookeeper/group.cpp)
-  - [Java](https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/zookeeper/ServerSetImpl.java#L221)
-  - [Python](https://github.com/twitter/commons/blob/master/src/python/twitter/common/zookeeper/serverset/serverset.py#L51)
-
-These can also be used natively in Finagle using the [ZookeeperServerSetCluster](https://github.com/twitter/finagle/blob/master/finagle-serversets/src/main/scala/com/twitter/finagle/zookeeper/ZookeeperServerSetCluster.scala).
-
-For more information about how to configure announcing, see the [Configuration Reference](configuration-reference.md).
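-
-As a minimal illustration, a job opts into announcing by adding an `Announcer` object to its
-configuration (a sketch; `hello_world_task` and the `http` port name are assumptions):
-
-    jobs = [Service(
-      cluster = 'devcluster',
-      role = 'www-data',
-      environment = 'prod',
-      name = 'hello',
-      task = hello_world_task,                     # assumed to be defined earlier in the file
-      announce = Announcer(primary_port = 'http')  # requires a port named 'http' in the task
-    )]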
-
-Creating Jobs
--------------
-
-You create and manipulate Aurora Jobs with the Aurora client, which starts all its
-command line commands with
-`aurora`. See [Aurora Client Commands](client-commands.md) for details
-about the Aurora Client.
-
-Interacting With Jobs
----------------------
-
-You interact with Aurora jobs either via:
-
-- Read-only Web UIs
-
-  Part of the output from creating a new Job is a URL for the Job's scheduler UI page.
-
-  For example:
-
-      vagrant@precise64:~$ aurora job create devcluster/www-data/prod/hello \
-      /vagrant/examples/jobs/hello_world.aurora
-      INFO] Creating job hello
-      INFO] Response from scheduler: OK (message: 1 new tasks pending for job www-data/prod/hello)
-      INFO] Job url: http://precise64:8081/scheduler/www-data/prod/hello
-
-  The "Job url" goes to the Job's scheduler UI page. To go to the overall scheduler UI page,
-  stop at the "scheduler" part of the URL, in this case `http://precise64:8081/scheduler`.
-
-  You can also reach the scheduler UI page via the Client command `aurora job open`:
-
-      aurora job open [<cluster>[/<role>[/<env>/<job_name>]]]
-
-  If only the cluster is specified, it goes directly to that cluster's scheduler main page.
-  If the role is specified, it goes to the top-level role page. If the full job key is specified,
-  it goes directly to the job page where you can inspect individual tasks.
-
-  Once you click through to a role page, you see Jobs grouped into pending, active,
-  and finished jobs. Jobs are arranged by role, typically a service account for production
-  jobs and user accounts for test or development jobs.
-
-- The Aurora client
-
-  See [client commands](client-commands.md).

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/vagrant.md
----------------------------------------------------------------------
diff --git a/docs/vagrant.md b/docs/vagrant.md
deleted file mode 100644
index 3bc201f..0000000
--- a/docs/vagrant.md
+++ /dev/null
@@ -1,137 +0,0 @@
-Getting Started
-===============
-
-This document shows you how to configure a complete cluster using a virtual machine. This setup
-replicates a real cluster in your development machine as closely as possible. After you complete
-the steps outlined here, you will be ready to create and run your first Aurora job.
-
-The following sections describe these steps in detail:
-
-1. [Overview](#user-content-overview)
-1. [Install VirtualBox and Vagrant](#user-content-install-virtualbox-and-vagrant)
-1. [Clone the Aurora repository](#user-content-clone-the-aurora-repository)
-1. [Start the local cluster](#user-content-start-the-local-cluster)
-1. [Log onto the VM](#user-content-log-onto-the-vm)
-1. [Run your first job](#user-content-run-your-first-job)
-1. [Rebuild components](#user-content-rebuild-components)
-1. [Shut down or delete your local cluster](#user-content-shut-down-or-delete-your-local-cluster)
-1. [Troubleshooting](#user-content-troubleshooting)
-
-
-Overview
---------
-
-The Aurora distribution includes a set of scripts that enable you to create a local cluster in
-your development machine. These scripts use [Vagrant](https://www.vagrantup.com/) and
-[VirtualBox](https://www.virtualbox.org/) to run and configure a virtual machine. Once the
-virtual machine is running, the scripts install and initialize Aurora and any required components
-to create the local cluster.
-
-
-Install VirtualBox and Vagrant
-------------------------------
-
-First, download and install [VirtualBox](https://www.virtualbox.org/) on your development machine.
-
-Then download and install [Vagrant](https://www.vagrantup.com/). To verify that the installation
-was successful, open a terminal window and type the `vagrant` command. You should see a list of
-common commands for this tool.
-
-
-Clone the Aurora repository
----------------------------
-
-To obtain the Aurora source distribution, clone its Git repository using the following command:
-
-     git clone git://git.apache.org/aurora.git
-
-
-Start the local cluster
------------------------
-
-Now change into the `aurora/` directory, which contains the Aurora source code and
-other scripts and tools:
-
-     cd aurora/
-
-To start the local cluster, type the following command:
-
-     vagrant up
-
-This command uses the configuration scripts in the Aurora distribution to:
-
-* Download a Linux system image.
-* Start a virtual machine (VM) and configure it.
-* Install the required build tools on the VM.
-* Install Aurora's requirements (like [Mesos](http://mesos.apache.org/) and
-[Zookeeper](http://zookeeper.apache.org/)) on the VM.
-* Build and install Aurora from source on the VM.
-* Start Aurora's services on the VM.
-
-This process takes several minutes to complete.
-
-To verify that Aurora is running on the cluster, visit the following URLs:
-
-* Scheduler - http://192.168.33.7:8081
-* Observer - http://192.168.33.7:1338
-* Mesos Master - http://192.168.33.7:5050
-* Mesos Slave - http://192.168.33.7:5051
-
-
-Log onto the VM
----------------
-
-To SSH into the VM, run the following command in your development machine:
-
-     vagrant ssh
-
-To verify that Aurora is installed in the VM, type the `aurora` command. You should see a list
-of arguments and possible commands.
-
-The `/vagrant` directory on the VM is mapped to the `aurora/` local directory
-from which you started the cluster. You can edit files inside this directory in your development
-machine and access them from the VM under `/vagrant`.
-
-A pre-installed `clusters.json` file refers to your local cluster as `devcluster`, which you
-will use in client commands.
-
-
-Run your first job
-------------------
-
-Now that your cluster is up and running, you are ready to define and run your first job in Aurora.
-For more information, see the [Aurora Tutorial](tutorial.md).
-
-
-Rebuild components
-------------------
-
-If you are changing Aurora code and would like to rebuild a component, you can use the `aurorabuild`
-command on the VM to build and restart a component.  This is considerably faster than destroying
-and rebuilding your VM.
-
-`aurorabuild` accepts a list of components to build and update. For example, to rebuild and
-restart the client:
-
-     vagrant ssh -c 'aurorabuild client'
-
-To get a list of supported components, invoke `aurorabuild` with no arguments.
-
-
-Shut down or delete your local cluster
---------------------------------------
-
-To shut down your local cluster, run the `vagrant halt` command in your development machine. To
-start it again, run the `vagrant up` command.
-
-Once you are finished with your local cluster, or if you would otherwise like to start from scratch,
-you can use the command `vagrant destroy` to power off and delete the virtual machine.
-
-
-Troubleshooting
----------------
-
-Most Vagrant-related problems can be fixed with the following steps (see the example sequence after this list):
-
-* Destroying the vagrant environment with `vagrant destroy`
-* Killing any orphaned VMs (see AURORA-499) with the VirtualBox UI or the `VBoxManage` command line tool
-* Cleaning the repository of build artifacts and other intermediate output with `git clean -fdx`
-* Bringing up the vagrant environment with `vagrant up`
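-
-For example (the `-f` flag skips Vagrant's confirmation prompt; check the VirtualBox UI or
-`VBoxManage list runningvms` for orphaned VMs between the first two steps):
-
-     vagrant destroy -f
-     git clean -fdx
-     vagrant up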


[7/7] aurora git commit: Reorganize Documentation

Posted by se...@apache.org.
Reorganize Documentation

This started as a spike to structure the documentation in a way that makes it more approachable.
In addition, I believe the new structure will allow us to extend and improve the documentation
more easily, as the different sections have more room to grow into something useful
(e.g. service discovery).

The new structure was inspired by the documentation of Hubspot's Singularity scheduler. What I
have done is mostly to cut & paste documentation and code examples and embed those into the
following:

* getting-started: the most basic information for all users
* features: proper explanation of our most important features. This should make it much easier
  for people to discover Aurora's unique selling points.
* operators: stuff only the operators will care about
* developers: stuff only contributors and committers care about.
* references: the details.

Reviewed at https://reviews.apache.org/r/45392/


Project: http://git-wip-us.apache.org/repos/asf/aurora/repo
Commit: http://git-wip-us.apache.org/repos/asf/aurora/commit/f28f41a7
Tree: http://git-wip-us.apache.org/repos/asf/aurora/tree/f28f41a7
Diff: http://git-wip-us.apache.org/repos/asf/aurora/diff/f28f41a7

Branch: refs/heads/master
Commit: f28f41a70568989ff39cadad94038f74527383e8
Parents: 0950095
Author: Stephan Erb <se...@apache.org>
Authored: Mon Mar 28 22:53:55 2016 +0200
Committer: Stephan Erb <st...@dev.static-void.de>
Committed: Mon Mar 28 22:53:55 2016 +0200

----------------------------------------------------------------------
 README.md                                      |  27 +-
 docs/README.md                                 |  95 +-
 docs/additional-resources/presentations.md     |  80 ++
 docs/additional-resources/tools.md             |  21 +
 docs/build-system.md                           | 100 --
 docs/client-cluster-configuration.md           |  70 --
 docs/client-commands.md                        | 389 --------
 docs/committers.md                             |  81 --
 docs/configuration-reference.md                | 722 ---------------
 docs/configuration-tutorial.md                 | 954 --------------------
 docs/cron-jobs.md                              | 131 ---
 docs/deploying-aurora-scheduler.md             | 379 --------
 docs/design-documents.md                       |  19 -
 docs/design/command-hooks.md                   | 102 ---
 docs/developing-aurora-client.md               |  93 --
 docs/developing-aurora-scheduler.md            | 163 ----
 docs/development/client.md                     |  81 ++
 docs/development/committers-guide.md           |  86 ++
 docs/development/design-documents.md           |  20 +
 docs/development/design/command-hooks.md       | 102 +++
 docs/development/scheduler.md                  | 118 +++
 docs/development/thermos.md                    | 126 +++
 docs/development/thrift.md                     |  57 ++
 docs/development/ui.md                         |  46 +
 docs/features/constraints.md                   | 126 +++
 docs/features/containers.md                    |  43 +
 docs/features/cron-jobs.md                     | 124 +++
 docs/features/job-updates.md                   | 111 +++
 docs/features/multitenancy.md                  |  62 ++
 docs/features/resource-isolation.md            | 167 ++++
 docs/features/service-discovery.md             |  14 +
 docs/features/services.md                      |  99 ++
 docs/features/sla-metrics.md                   | 178 ++++
 docs/getting-started/overview.md               | 108 +++
 docs/getting-started/tutorial.md               | 258 ++++++
 docs/getting-started/vagrant.md                | 137 +++
 docs/hooks.md                                  | 244 -----
 docs/installing.md                             | 335 -------
 docs/monitoring.md                             | 181 ----
 docs/operations/backup-restore.md              |  91 ++
 docs/operations/configuration.md               | 182 ++++
 docs/operations/installation.md                | 324 +++++++
 docs/operations/monitoring.md                  | 181 ++++
 docs/operations/security.md                    | 282 ++++++
 docs/operations/storage.md                     |  97 ++
 docs/presentations.md                          |  80 --
 docs/reference/client-cluster-configuration.md |  93 ++
 docs/reference/client-commands.md              | 326 +++++++
 docs/reference/client-hooks.md                 | 228 +++++
 docs/reference/configuration-best-practices.md | 187 ++++
 docs/reference/configuration-templating.md     | 306 +++++++
 docs/reference/configuration-tutorial.md       | 511 +++++++++++
 docs/reference/configuration.md                | 573 ++++++++++++
 docs/reference/scheduler-configuration.md      | 318 +++++++
 docs/reference/task-lifecycle.md               | 146 +++
 docs/resources.md                              | 164 ----
 docs/scheduler-configuration.md                | 318 -------
 docs/security.md                               | 279 ------
 docs/sla.md                                    | 177 ----
 docs/storage-config.md                         | 153 ----
 docs/storage.md                                |  88 --
 docs/task-lifecycle.md                         | 146 ---
 docs/test-resource-generation.md               |  24 -
 docs/thrift-deprecation.md                     |  54 --
 docs/tools.md                                  |  16 -
 docs/tutorial.md                               | 260 ------
 docs/user-guide.md                             | 244 -----
 docs/vagrant.md                                | 137 ---
 68 files changed, 6084 insertions(+), 6150 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index e2b5632..59260d2 100644
--- a/README.md
+++ b/README.md
@@ -15,27 +15,28 @@ you can instruct to do things like _run 100 of these, somewhere, forever_.
 Aurora is built for users _and_ operators.
 
 * User-facing Features:
-  - Management of long-running services
-  - Cron scheduling
-  - Resource quotas: provide guaranteed resources for specific applications
-  - Rolling job updates, with automatic rollback
-  - Multi-user support
-  - Sophisticated [DSL](docs/configuration-tutorial.md): supports templating, allowing you to
+  - Management of [long-running services](docs/features/services.md)
+  - [Cron jobs](docs/features/cron-jobs.md)
+  - [Resource quotas](docs/features/multitenancy.md): provide guaranteed resources for specific
+    applications
+  - [Rolling job updates](docs/features/job-updates.md), with automatic rollback
+  - [Multi-user support](docs/features/multitenancy.md)
+  - Sophisticated [DSL](docs/reference/configuration-tutorial.md): supports templating, allowing you to
     establish common patterns and avoid redundant configurations
-  - [Dedicated machines](docs/deploying-aurora-scheduler.md#dedicated-attribute):
+  - [Dedicated machines](docs/features/constraints.md#dedicated-attribute):
     for things like stateful services that must always run on the same machines
-  - Service registration: [announce](docs/configuration-reference.md#announcer-objects) services in
-    [ZooKeeper](http://zookeeper.apache.org/) for discovery by clients like
-    [finagle](https://twitter.github.io/finagle).
-  - [Scheduling constraints](docs/configuration-reference.md#specifying-scheduling-constraints)
+  - [Service registration](docs/features/service-discovery.md): announce services in
+    [ZooKeeper](http://zookeeper.apache.org/) for discovery by [various clients](docs/additional-resources/tools.md)
+  - [Scheduling constraints](docs/features/constraints.md)
     to run on specific machines, or to mitigate impact of issues like machine and rack failure
 
 * Under the hood, to help you rest easy:
-  - Preemption: important services can 'steal' resources when they need it
+  - [Preemption](docs/features/multitenancy.md): important services can 'steal' resources when they need it
   - High-availability: resists machine failures and disk failures
   - Scalable: proven to work in data center-sized clusters, with hundreds of users and thousands of
     jobs
-  - Instrumented: a wealth of information makes it easy to [monitor](docs/monitoring.md) and debug
+  - Instrumented: a wealth of information makes it easy to [monitor](docs/operations/monitoring.md)
+    and debug
 
 ### When and when not to use Aurora
 Aurora can take over for most uses of software like monit and chef.  Aurora can manage applications,

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/README.md
----------------------------------------------------------------------
diff --git a/docs/README.md b/docs/README.md
index 673c854..50eb7b2 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,46 +1,73 @@
 ## Introduction
-Apache Aurora is a service scheduler that runs on top of Apache Mesos, enabling you to run long-running services that take advantage of Apache Mesos' scalability, fault-tolerance, and resource isolation. This documentation has been organized into sections with three audiences in mind:
 
- * Users: General information about the project and to learn how to run an Aurora job.
- * Operators: For those that wish to manage and fine-tune an Aurora cluster.
- * Developers: All the information you need to start modifying Aurora and contributing back to the project.
+Apache Aurora is a service scheduler that runs on top of Apache Mesos, enabling you to run
+long-running services, cron jobs, and ad-hoc jobs that take advantage of Apache Mesos' scalability,
+fault-tolerance, and resource isolation.
 
-We encourage you to ask questions on the [Aurora user list](http://aurora.apache.org/community/) or the `#aurora` IRC channel on `irc.freenode.net`.
+We encourage you to ask questions on the [Aurora user list](http://aurora.apache.org/community/) or
+the `#aurora` IRC channel on `irc.freenode.net`.
 
-## Users
- * [Install Aurora on virtual machines on your private machine](vagrant.md)
- * [Hello World Tutorial](tutorial.md)
- * [User Guide](user-guide.md)
- * [Task Lifecycle](task-lifecycle.md)
- * [Configuration Tutorial](configuration-tutorial.md)
- * [Aurora + Thermos Reference](configuration-reference.md)
- * [Command Line Client](client-commands.md)
- * [Client cluster configuration](client-cluster-configuration.md)
- * [Cron Jobs](cron-jobs.md)
+
+## Getting Started
+Information for everyone new to Apache Aurora.
+
+ * [Aurora System Overview](getting-started/overview.md)
+ * [Hello World Tutorial](getting-started/tutorial.md)
+ * [Local cluster with Vagrant](getting-started/vagrant.md)
+
+## Features
+Description of important Aurora features.
+
+ * [Containers](features/containers.md)
+ * [Cron Jobs](features/cron-jobs.md)
+ * [Job Updates](features/job-updates.md)
+ * [Multitenancy](features/multitenancy.md)
+ * [Resource Isolation](features/resource-isolation.md)
+ * [Scheduling Constraints](features/constraints.md)
+ * [Services](features/services.md)
+ * [Service Discovery](features/service-discovery.md)
+ * [SLA Metrics](features/sla-metrics.md)
 
 ## Operators
- * [Installation](installing.md)
- * [Deployment and cluster configuration](deploying-aurora-scheduler.md)
- * [Security](security.md)
- * [Monitoring](monitoring.md)
- * [Hooks for Aurora Client API](hooks.md)
- * [Scheduler Configuration](scheduler-configuration.md)
- * [Scheduler Storage](storage.md)
- * [Scheduler Storage and Maintenance](storage-config.md)
- * [SLA Measurement](sla.md)
- * [Resource Isolation and Sizing](resources.md)
+For those that wish to manage and fine-tune an Aurora cluster.
+
+ * [Installation](operations/installation.md)
+ * [Configuration](operations/configuration.md)
+ * [Monitoring](operations/monitoring.md)
+ * [Security](operations/security.md)
+ * [Storage](operations/storage.md)
+ * [Backup](operations/backup-restore.md)
+
+## Reference
+The complete reference of commands, configuration options, and scheduler internals.
+
+ * [Task lifecycle](reference/task-lifecycle.md)
+ * Configuration (`.aurora` files)
+    - [Configuration Reference](reference/configuration.md)
+    - [Configuration Tutorial](reference/configuration-tutorial.md)
+    - [Configuration Best Practices](reference/configuration-best-practices.md)
+    - [Configuration Templating](reference/configuration-templating.md)
+ * Aurora Client
+    - [Client Commands](reference/client-commands.md)
+    - [Client Hooks](reference/client-hooks.md)
+    - [Client Cluster Configuration](reference/client-cluster-configuration.md)
+ * [Scheduler Configuration](reference/scheduler-configuration.md)
+
+## Additional Resources
+ * [Tools integrating with Aurora](additional-resources/tools.md)
+ * [Presentation videos and slides](additional-resources/presentations.md)
 
 ## Developers
+All the information you need to start modifying Aurora and contributing back to the project.
+
  * [Contributing to the project](../CONTRIBUTING.md)
- * [Developing the Aurora Scheduler](developing-aurora-scheduler.md)
- * [Developing the Aurora Client](developing-aurora-client.md)
- * [Committers Guide](committers.md)
+ * [Committer's Guide](development/committers-guide.md)
  * [Design Documents](design-documents.md)
- * [Deprecation Guide](thrift-deprecation.md)
- * [Build System](build-system.md)
- * [Generating test resources](test-resource-generation.md)
+ * Developing the Aurora components:
+     - [Client](development/client.md)
+     - [Scheduler](development/scheduler.md)
+     - [Scheduler UI](development/ui.md)
+     - [Thermos](development/thermos.md)
+     - [Thrift structures](development/thrift.md)
 
 
-## Additional Resources
- * [Tools integrating with Aurora](tools.md)
- * [Presentation videos and slides](presentations.md)

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/additional-resources/presentations.md
----------------------------------------------------------------------
diff --git a/docs/additional-resources/presentations.md b/docs/additional-resources/presentations.md
new file mode 100644
index 0000000..70623c6
--- /dev/null
+++ b/docs/additional-resources/presentations.md
@@ -0,0 +1,80 @@
+# Apache Aurora Presentations
+Video and slides from presentations and panel discussions about Apache Aurora.
+
+_(Listed in date descending order)_
+
+<table>
+
+	<tr>
+		<td><img src="../images/presentations/10_08_2015_mesos_aurora_on_a_small_scale_thumb.png" alt="Mesos and Aurora on a Small Scale Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=q5iIqhaCJ_o">Mesos &amp; Aurora on a Small Scale (Video)</a></strong>
+		<p>Presented by Florian Pfeiffer</p>
+		<p>October 8, 2015 at <a href="http://events.linuxfoundation.org/events/archive/2015/mesoscon-europe">#MesosCon Europe 2015</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="../images/presentations/10_08_2015_sla_aware_maintenance_for_operators_thumb.png" alt="SLA Aware Maintenance for Operators Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=tZ0-SISvCis">SLA Aware Maintenance for Operators (Video)</a></strong>
+		<p>Presented by Joe Smith</p>
+		<p>October 8, 2015 at <a href="http://events.linuxfoundation.org/events/archive/2015/mesoscon-europe">#MesosCon Europe 2015</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="../images/presentations/09_20_2015_shipping_code_with_aurora_thumb.png" alt="Shipping Code with Aurora Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=y1hi7K1lPkk">Shipping Code with Aurora (Video)</a></strong>
+		<p>Presented by Bill Farner</p>
+		<p>August 20, 2015 at <a href="http://events.linuxfoundation.org/events/archive/2015/mesoscon">#MesosCon 2015</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="../images/presentations/09_20_2015_twitter_production_scale_thumb.png" alt="Twitter Production Scale Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=nNrh-gdu9m4">Twitter’s Production Scale: Mesos and Aurora Operations (Video)</a></strong>
+		<p>Presented by Joe Smith</p>
+		<p>August 20, 2015 at <a href="http://events.linuxfoundation.org/events/archive/2015/mesoscon">#MesosCon 2015</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="../images/presentations/04_30_2015_monolith_to_microservices_thumb.png" alt="From Monolith to Microservices with Aurora Video Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=yXkOgnyK4Hw">From Monolith to Microservices w/ Aurora (Video)</a></strong>
+		<p>Presented by Thanos Baskous, Tony Dong, Dobromir Montauk</p>
+		<p>April 30, 2015 at <a href="http://www.meetup.com/Bay-Area-Apache-Aurora-Users-Group/events/221219480/">Bay Area Apache Aurora Users Group</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="../images/presentations/03_07_2015_aurora_mesos_in_practice_at_twitter_thumb.png" alt="Aurora + Mesos in Practice at Twitter Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=1XYJGX_qZVU">Aurora + Mesos in Practice at Twitter (Video)</a></strong>
+		<p>Presented by Bill Farner</p>
+		<p>March 07, 2015 at <a href="http://www.bigeng.io/aurora-mesos-in-practice-at-twitter">Bigcommerce TechTalk</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="../images/presentations/02_28_2015_apache_aurora_thumb.png" alt="Apache Auroraの始めかた Slideshow Thumbnail" /></td>
+		<td><strong><a href="http://www.slideshare.net/zembutsu/apache-aurora-introduction-and-tutorial-osc15tk">Apache Auroraの始めかた (Slides)</a></strong>
+		<p>Presented by Masahito Zembutsu</p>
+		<p>February 28, 2015 at <a href="http://www.ospn.jp/osc2015-spring/">Open Source Conference 2015 Tokyo Spring</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="../images/presentations/02_19_2015_aurora_adopters_panel_thumb.png" alt="Apache Aurora Adopters Panel Video Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=2Jsj0zFdRlg">Apache Aurora Adopters Panel (Video)</a></strong>
+		<p>Panelists Ben Staffin, Josh Adams, Bill Farner, Berk Demir</p>
+		<p>February 19, 2015 at <a href="http://www.meetup.com/Bay-Area-Mesos-User-Group/events/220279080/">Bay Area Mesos Users Group</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="../images/presentations/02_19_2015_aurora_at_twitter_thumb.png" alt="Operating Apache Aurora and Mesos at Twitter Video Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=E4lxX6epM_U">Operating Apache Aurora and Mesos at Twitter (Video)</a></strong>
+		<p>Presented by Joe Smith</p>
+		<p>February 19, 2015 at <a href="http://www.meetup.com/Bay-Area-Mesos-User-Group/events/220279080/">Bay Area Mesos Users Group</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="../images/presentations/02_19_2015_aurora_at_tellapart_thumb.png" alt="Apache Aurora and Mesos at TellApart" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=ZZXtXLvTXAE">Apache Aurora and Mesos at TellApart (Video)</a></strong>
+		<p>Presented by Steve Niemitz</p>
+		<p>February 19, 2015 at <a href="http://www.meetup.com/Bay-Area-Mesos-User-Group/events/220279080/">Bay Area Mesos Users Group</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="../images/presentations/08_21_2014_past_present_future_thumb.png" alt="Past, Present, and Future of the Aurora Scheduler Video Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=Dsc5CPhKs4o">Past, Present, and Future of the Aurora Scheduler (Video)</a></strong>
+		<p>Presented by Bill Farner</p>
+		<p>August 21, 2014 at <a href="http://events.linuxfoundation.org/events/archive/2014/mesoscon">#MesosCon 2014</a></p></td>
+	</tr>
+	<tr>
+		<td><img src="../images/presentations/03_25_2014_introduction_to_aurora_thumb.png" alt="Introduction to Apache Aurora Video Thumbnail" /></td>
+		<td><strong><a href="https://www.youtube.com/watch?v=asd_h6VzaJc">Introduction to Apache Aurora (Video)</a></strong>
+		<p>Presented by Bill Farner</p>
+		<p>March 25, 2014 at <a href="https://www.eventbrite.com/e/aurora-and-mesosframeworksmeetup-tickets-10850994617">Aurora and Mesos Frameworks Meetup</a></p></td>
+	</tr>
+</table>

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/additional-resources/tools.md
----------------------------------------------------------------------
diff --git a/docs/additional-resources/tools.md b/docs/additional-resources/tools.md
new file mode 100644
index 0000000..109f125
--- /dev/null
+++ b/docs/additional-resources/tools.md
@@ -0,0 +1,21 @@
+# Tools
+
+Various tools integrate with Aurora. Is there a tool missing? Let us know, or submit a patch to add it!
+
+* Load-balancing technology used to direct traffic to services running on Aurora:
+  - [synapse](https://github.com/airbnb/synapse) based on HAProxy
+  - [aurproxy](https://github.com/tellapart/aurproxy) based on nginx
+  - [jobhopper](https://github.com/benley/aurora-jobhopper) performing HTTP redirects for easy developer and administrator access
+
+* RPC libraries that integrate with Aurora's [service discovery mechanism](../features/service-discovery.md):
+  - [linkerd](https://linkerd.io/) RPC proxy
+  - [finagle](https://twitter.github.io/finagle) (Scala)
+  - [scales](https://github.com/steveniemitz/scales) (Python)
+
+* Monitoring:
+  - [collectd-aurora](https://github.com/zircote/collectd-aurora) for cluster monitoring using collectd
+  - [Prometheus Aurora exporter](https://github.com/tommyulfsparre/aurora_exporter) for cluster monitoring using Prometheus
+  - [Prometheus service discovery integration](http://prometheus.io/docs/operating/configuration/#zookeeper-serverset-sd-configurations-serverset_sd_config) for discovering and monitoring services running on Aurora
+
+* Packaging and deployment:
+  - [aurora-packaging](https://github.com/apache/aurora-packaging), the source of the official Aurora packages

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/build-system.md
----------------------------------------------------------------------
diff --git a/docs/build-system.md b/docs/build-system.md
deleted file mode 100644
index 39c231d..0000000
--- a/docs/build-system.md
+++ /dev/null
@@ -1,100 +0,0 @@
-The Python components of Aurora are built using [Pants](https://pantsbuild.github.io).
-
-Python Build Conventions
-========================
-The Python code is laid out according to the following conventions: 
-
-1. 1 `BUILD` per 3rd level directory. For a list of current top-level packages run:
-
-        % find src/main/python -maxdepth 3 -mindepth 3 -type d |\
-        while read dname; do echo $dname |\
-            sed 's@src/main/python/\(.*\)/\(.*\)/\(.*\).*@\1.\2.\3@'; done
-
-2.  Each `BUILD` file exports 1 
-    [`python_library`](https://pantsbuild.github.io/build_dictionary.html#bdict_python_library)
-    that provides a
-    [`setup_py`](https://pantsbuild.github.io/build_dictionary.html#setup_py)
-    containing each
-    [`python_binary`](https://pantsbuild.github.io/build_dictionary.html#python_binary)
-    in the `BUILD` file, named the same as the directory it's in so that it can be referenced
-    without a ':' character. The `sources` field in the `python_library` will almost always be
-    `rglobs('*.py')`.
-
-3.  Other BUILD files may only depend on this single public `python_library`
-    target. Any other target is considered a private implementation detail and
-    should be prefixed with an `_`.
-
-4.  `python_binary` targets are always named the same as the exported console script.
-
-5.  `python_binary` targets must have identical `dependencies` to the `python_library` exported
-    by the package and must use `entry_point`.
-
-    This means a PEX file generated by pants will contain exactly the same files that will be
-    available on the `PYTHONPATH` in the case of `pip install` of the corresponding library
-    target. This will help our migration off of Pants in the future.
-
-Annotated example - apache.thermos.runner
------------------------------------------
-```
-% find src/main/python/apache/thermos/runner
-src/main/python/apache/thermos/runner
-src/main/python/apache/thermos/runner/__init__.py
-src/main/python/apache/thermos/runner/thermos_runner.py
-src/main/python/apache/thermos/runner/BUILD
-% cat src/main/python/apache/thermos/runner/BUILD
-# License boilerplate omitted
-import os
-
-
-# Private target so that a setup_py can exist without a circular dependency. Only targets within
-# this file should depend on this.
-python_library(
-  name = '_runner',
-  # The target covers every python file under this directory and subdirectories.
-  sources = rglobs('*.py'),
-  dependencies = [
-    '3rdparty/python:twitter.common.app',
-    '3rdparty/python:twitter.common.log',
-    # Source dependencies are always referenced without a ':'.
-    'src/main/python/apache/thermos/common',
-    'src/main/python/apache/thermos/config',
-    'src/main/python/apache/thermos/core',
-  ],
-)
-
-# Binary target for thermos_runner.pex. Nothing should depend on this - it's only used as an
-# argument to ./pants binary.
-python_binary(
-  name = 'thermos_runner',
-  # Use entry_point, not source so the files used here are the same ones tests see.
-  entry_point = 'apache.thermos.bin.thermos_runner',
-  dependencies = [
-    # Notice that we depend only on the single private target from this BUILD file here.
-    ':_runner',
-  ],
-)
-
-# The public library that everyone importing the runner symbols uses.
-# The test targets and any other dependent source code should depend on this.
-python_library(
-  name = 'runner',
-  dependencies = [
-    # Again, notice that we depend only on the single private target from this BUILD file here.
-    ':_runner',
-  ],
-  # We always provide a setup_py. This will cause any dependee libraries to automatically
-  # reference this library in their requirements.txt rather than copy the source files into their
-  # sdist.
-  provides = setup_py(
-    # Conventionally named and versioned.
-    name = 'apache.thermos.runner',
-    version = open(os.path.join(get_buildroot(), '.auroraversion')).read().strip().upper(),
-  ).with_binaries({
-    # Every binary in this file should also be repeated here.
-    # Always use the dict-form of .with_binaries so that commands with dashes in their names are
-    # supported.
-    # The console script name is always the same as the PEX with .pex stripped.
-    'thermos_runner': ':thermos_runner',
-  }),
-)
-```

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/client-cluster-configuration.md
----------------------------------------------------------------------
diff --git a/docs/client-cluster-configuration.md b/docs/client-cluster-configuration.md
deleted file mode 100644
index 88b9d77..0000000
--- a/docs/client-cluster-configuration.md
+++ /dev/null
@@ -1,70 +0,0 @@
-# Client Cluster Configuration
-
-A cluster configuration file is used by the Aurora client to describe the Aurora clusters with
-which it can communicate. Ultimately this allows client users to reference clusters with short names
-like us-east and eu. The following properties may be set:
-
-  **Property**             | **Type** | **Description**
-  :------------------------| :------- | :--------------
-   **name**                | String   | Cluster name (Required)
-   **slave_root**          | String   | Path to mesos slave work dir (Required)
-   **slave_run_directory** | String   | Name of mesos slave run dir (Required)
-   **zk**                  | String   | Hostname of ZooKeeper instance used to resolve Aurora schedulers.
-   **zk_port**             | Integer  | Port of ZooKeeper instance used to locate Aurora schedulers (Default: 2181)
-   **scheduler_zk_path**   | String   | ZooKeeper path under which scheduler instances are registered.
-   **scheduler_uri**       | String   | URI of Aurora scheduler instance.
-   **proxy_url**           | String   | Used by the client to format URLs for display.
-   **auth_mechanism**      | String   | The authentication mechanism to use when communicating with the scheduler. (Default: UNAUTHENTICATED)
-
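-For illustration, a complete entry combining these properties might look like the following
-(all values are site-specific examples rather than defaults):
-
-    [{
-      "name": "devcluster",
-      "zk": "192.168.33.7",
-      "scheduler_zk_path": "/aurora/scheduler",
-      "slave_root": "/var/lib/mesos",
-      "slave_run_directory": "latest",
-      "proxy_url": "http://aurora.example.com",
-      "auth_mechanism": "UNAUTHENTICATED"
-    }]
-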
-#### name
-
-The name of the Aurora cluster represented by this entry. This name will be the `cluster` portion of
-any job keys identifying jobs running within the cluster.
-
-#### slave_root
-
-The path on the mesos slaves where executing tasks can be found. It is used in combination with the
-`slave_run_directory` property by `aurora task run` and `aurora task ssh` to change into the sandbox
-directory after connecting to the host. This value should match the value passed to `mesos-slave`
-as `-work_dir`.
-
-#### slave_run_directory
-
-The name of the directory where the task run can be found. This is used in combination with the
-`slave_root` property by `aurora task run` and `aurora task ssh` to change into the sandbox
-directory after connecting to the host. This should almost always be set to `latest`.
-
-#### zk
-
-The hostname of the ZooKeeper instance used to resolve the Aurora scheduler. Aurora uses ZooKeeper
-to elect a leader. The client will connect to this ZooKeeper instance to determine the current
-leader. This host should match the host passed to the scheduler as `-zk_endpoints`.
-
-#### zk_port
-
-The port on which the ZooKeeper instance is running. If not set this will default to the standard
-ZooKeeper port of 2181. This port should match the port in the host passed to the scheduler as
-`-zk_endpoints`.
-
-#### scheduler_zk_path
-
-The path on the ZooKeeper instance under which the Aurora serverset is registered. This value should
-match the value passed to the scheduler as `-serverset_path`.
-
-#### scheduler_uri
-
-The URI of the scheduler. This would be used in place of the ZooKeeper related configuration above
-in circumstances where direct communication with a single scheduler is needed (e.g. testing
-environments). It is strongly advised to **never** use this property for production deploys.
-
-#### proxy_url
-
-Instead of using the hostname of the leading scheduler as the base url, if `proxy_url` is set, its
-value will be used instead. In that scenario the value for `proxy_url` would be, for example, the
-URL of your VIP in a loadbalancer or a roundrobin DNS name.
-
-#### auth_mechanism
-
-The identifier of an authentication mechanism that the client should use when communicating with the
-scheduler. Support for values other than `UNAUTHENTICATED` requires a matching scheduler-side
-[security configuration](security.md).

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/client-commands.md
----------------------------------------------------------------------
diff --git a/docs/client-commands.md b/docs/client-commands.md
deleted file mode 100644
index 156fe4c..0000000
--- a/docs/client-commands.md
+++ /dev/null
@@ -1,389 +0,0 @@
-Aurora Client Commands
-======================
-
-- [Introduction](#introduction)
-- [Cluster Configuration](#cluster-configuration)
-- [Job Keys](#job-keys)
-- [Modifying Aurora Client Commands](#modifying-aurora-client-commands)
-- [Regular Jobs](#regular-jobs)
-    - [Creating and Running a Job](#creating-and-running-a-job)
-    - [Running a Command On a Running Job](#running-a-command-on-a-running-job)
-    - [Killing a Job](#killing-a-job)
-    - [Adding Instances](#adding-instances)
-    - [Updating a Job](#updating-a-job)
-        - [Coordinated job updates](#user-content-coordinated-job-updates)
-    - [Renaming a Job](#renaming-a-job)
-    - [Restarting Jobs](#restarting-jobs)
-- [Cron Jobs](#cron-jobs)
-- [Comparing Jobs](#comparing-jobs)
-- [Viewing/Examining Jobs](#viewingexamining-jobs)
-    - [Listing Jobs](#listing-jobs)
-    - [Inspecting a Job](#inspecting-a-job)
-    - [Versions](#versions)
-    - [Checking Your Quota](#checking-your-quota)
-    - [Finding a Job on Web UI](#finding-a-job-on-web-ui)
-    - [Getting Job Status](#getting-job-status)
-    - [Opening the Web UI](#opening-the-web-ui)
-    - [SSHing to a Specific Task Machine](#sshing-to-a-specific-task-machine)
-    - [Templating Command Arguments](#templating-command-arguments)
-
-Introduction
-------------
-
-Once you have written an `.aurora` configuration file that describes
-your Job and its parameters and functionality, you interact with Aurora
-using Aurora Client commands. This document describes all of these commands
-and how and when to use them. All Aurora Client commands start with
-`aurora`, followed by the name of the specific command and its
-arguments.
-
-*Job keys* are a very common argument to Aurora commands, as well as the
-gateway to useful information about a Job. Before using Aurora, you
-should read the next section which describes them in detail. The section
-after that briefly describes how you can modify the behavior of certain
-Aurora Client commands, linking to a detailed document about how to do
-that.
-
-This is followed by the Regular Jobs section, which describes the basic
-Client commands for creating, running, and manipulating Aurora Jobs.
-After that are sections on Comparing Jobs and Viewing/Examining Jobs. In
-other words, various commands for getting information and metadata about
-Aurora Jobs.
-
-Cluster Configuration
----------------------
-
-The client must be able to find a configuration file that specifies available clusters. This file
-declares shorthand names for clusters, which are in turn referenced by job configuration files
-and client commands.
-
-The client will load at most two configuration files, making both of their defined clusters
-available. The first is intended to be a system-installed cluster, using the path specified in
-the environment variable `AURORA_CONFIG_ROOT`, defaulting to `/etc/aurora/clusters.json` if the
-environment variable is not set. The second is a user-installed file, located at
-`~/.aurora/clusters.json`.
-
-A cluster configuration is formatted as JSON.  The simplest cluster configuration is one that
-communicates with a single (non-leader-elected) scheduler.  For example:
-
-```javascript
-[{
-  "name": "example",
-  "scheduler_uri": "http://localhost:55555",
-}]
-```
-
-A configuration for a leader-elected scheduler would contain something like:
-
-```javascript
-[{
-  "name": "example",
-  "zk": "192.168.33.7",
-  "scheduler_zk_path": "/aurora/scheduler"
-}]
-```
-
-For more details on cluster configuration see the
-[Client Cluster Configuration](client-cluster-configuration.md) documentation.
-
-Job Keys
---------
-
-A job key is a unique system-wide identifier for an Aurora-managed
-Job, for example `cluster1/web-team/test/experiment204`. It is a 4-tuple
-consisting of, in order, *cluster*, *role*, *environment*, and
-*jobname*, separated by /s. Cluster is the name of an Aurora
-cluster. Role is the Unix service account under which the Job
-runs. Environment is a namespace component like `devel`, `test`,
-`prod`, or `stagingN`. Jobname is the Job's name.
-
-The combination of all four values uniquely specifies the Job. If any
-one value is different from that of another job key, the two job keys
-refer to different Jobs. For example, job key
-`cluster1/tyg/prod/workhorse` is different from
-`cluster1/tyg/prod/workcamel` is different from
-`cluster2/tyg/prod/workhorse` is different from
-`cluster2/foo/prod/workhorse` is different from
-`cluster1/tyg/test/workhorse`.
-
-Role names are user accounts existing on the slave machines. If you don't know what accounts
-are available, contact your sysadmin.
-
-Environment names are namespaces; you can count on `prod`, `devel` and `test` existing.
-
-Modifying Aurora Client Commands
---------------------------------
-
-For certain Aurora Client commands, you can define hook methods that run
-either before or after an action that takes place during the command's
-execution, or depending on whether the action finished successfully or failed.
-Basically, a hook is code that lets you extend the
-command's actions. The hook executes on the client side, specifically on
-the machine executing Aurora commands.
-
-Hooks can be associated with these Aurora Client commands:
-
-  - `job create`
-  - `job kill`
-  - `job restart`
-
-The process for writing and activating them is complex enough
-that we explain it in a devoted document, [Hooks for Aurora Client API](hooks.md).
-
-Regular Jobs
-------------
-
-This section covers Aurora commands related to running, killing,
-renaming, updating, and restarting a basic Aurora Job.
-
-### Creating and Running a Job
-
-    aurora job create <job key> <configuration file>
-
-Creates and then runs a Job with the specified job key based on a `.aurora` configuration file.
-The configuration file may also contain and activate hook definitions.
-
-### Running a Command On a Running Job
-
-    aurora task run CLUSTER/ROLE/ENV/NAME[/INSTANCES] <cmd>
-
-Runs a shell command on all machines currently hosting shards of a
-single Job.
-
-`run` supports the same command line wildcards used to populate a Job's
-commands; i.e. anything in the `{{mesos.*}}` and `{{thermos.*}}`
-namespaces.
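-
-For example, assuming the tutorial's job key, you could check the uptime of every machine
-currently hosting an instance of the job:
-
-    aurora task run devcluster/www-data/devel/hello_world 'uptime'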
-
-### Killing a Job
-
-    aurora job killall CLUSTER/ROLE/ENV/NAME
-
-Kills all Tasks associated with the specified Job, blocking until all
-are terminated. Defaults to killing all instances in the Job.
-
-The `<configuration file>` argument for `kill` is optional. Use it only
-if it contains hook definitions and activations that affect the
-kill command.
-
-### Adding Instances
-
-    aurora job add CLUSTER/ROLE/ENV/NAME/INSTANCE <count>
-
-Adds `<count>` instances to the existing job. The configuration of the new instances is derived from
-an active job instance pointed by the `/INSTANCE` part of the job specification. This command is
-a simpler way to scale out an existing job when an instance with desired task configuration
-already exists. Use `aurora update start` to add instances with a new (updated) configuration.
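-
-For example, to add two instances cloned from the configuration of instance 0 of the tutorial
-job (a hypothetical invocation):
-
-    aurora job add devcluster/www-data/devel/hello_world/0 2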
-
-### Updating a Job
-
-There are several sub-commands to manage job updates:
-
-    aurora update start <job key> <configuration file>
-    aurora update info <job key>
-    aurora update pause <job key>
-    aurora update resume <job key>
-    aurora update abort <job key>
-    aurora update list <cluster>
-
-When you `start` a job update, the command will return once it has sent the
-instructions to the scheduler.  At that point, you may view detailed
-progress for the update with the `info` subcommand, in addition to viewing
-graphical progress in the web browser.  You may also get a full listing of
-in-progress updates in a cluster with `list`.
-
-Once an update has been started, you can `pause` to keep the update but halt
-progress.  This can be useful for doing things like debugging a partially-updated
-job to determine whether you would like to proceed.  You can `resume` to
-proceed.
-
-You may `abort` a job update regardless of the state it is in. This will
-instruct the scheduler to completely abandon the job update and leave the job
-in the current (possibly partially-updated) state.
-
-#### Coordinated job updates
-
-Some Aurora services may benefit from having more control over updates by explicitly
-acknowledging ("heartbeating") job update progress. This may be helpful for mission-critical
-service updates where explicit job health monitoring is vital during the entire job update
-lifecycle. Such job updates would rely on an external service (or a custom client) periodically
-pulsing an active coordinated job update via a
-[pulseJobUpdate RPC](../api/src/main/thrift/org/apache/aurora/gen/api.thrift).
-
-A coordinated update is defined by setting a positive
-[pulse_interval_secs](configuration-reference.md#updateconfig-objects) value in the job configuration
-file. If no pulses are received within the specified interval, the update will be blocked. A blocked
-update is unable to continue rolling forward (or rolling back) but retains its active status.
-It may only be unblocked by a fresh `pulseJobUpdate` call.
-
-NOTE: A coordinated update starts in `ROLL_FORWARD_AWAITING_PULSE` state and will not make any
-progress until the first pulse arrives. However, a paused update (`ROLL_FORWARD_PAUSED` or
-`ROLL_BACK_PAUSED`) is still considered active and upon resuming will immediately make progress
-provided the pulse interval has not expired.
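-
-A hypothetical opt-in from an `.aurora` configuration file might look like this (all other
-`UpdateConfig` settings omitted for brevity):
-
-    update_config = UpdateConfig(pulse_interval_secs = 60)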
-
-### Renaming a Job
-
-Renaming is a tricky operation as downstream clients must be informed of
-the new name. A conservative approach
-to renaming suitable for production services is:
-
-1.  Modify the Aurora configuration file to change the role,
-    environment, and/or name as appropriate to the standardized naming
-    scheme.
-2.  Check that only these naming components have changed
-    with `aurora diff`.
-
-        aurora job diff CLUSTER/ROLE/ENV/NAME <job_configuration>
-
-3.  Create the (identical) job at the new key. You may need to request a
-    temporary quota increase.
-
-        aurora job create CLUSTER/ROLE/ENV/NEW_NAME <job_configuration>
-
-4.  Migrate all clients over to the new job key. Update all links and
-    dashboards. Ensure that both job keys run identical versions of the
-    code while in this state.
-5.  After verifying that all clients have successfully moved over, kill
-    the old job.
-
-        aurora job killall CLUSTER/ROLE/ENV/NAME
-
-6.  If you received a temporary quota increase, be sure to let the
-    powers that be know you no longer need the additional capacity.
-
-### Restarting Jobs
-
-`restart` restarts all shards of the Job identified by a job key:
-
-    aurora job restart CLUSTER/ROLE/ENV/NAME[/INSTANCES]
-
-Restarts are controlled on the client side, so aborting
-the `job restart` command halts the restart operation.
-
-**Note**: `job restart` only applies its command line arguments and does not
-use and is not affected by `update.config`. Restarting
-does ***not*** involve a configuration change. To update the
-configuration, use `update.config`.
-
-The `--config` argument for restart is optional. Use it only
-if it contains hook definitions and activations that affect the
-`job restart` command.
-
-Cron Jobs
----------
-
-You can manage cron jobs using the `aurora cron` command.  Please see
-[cron-jobs.md](cron-jobs.md) for more details.
-
-You will see various commands and options relating to cron jobs in
-`aurora -h` and similar. Ignore them, as they're not yet implemented.
-
-Comparing Jobs
---------------
-
-    aurora job diff CLUSTER/ROLE/ENV/NAME <job configuration>
-
-Compares a job configuration against a running job. By default the diff
-is determined using `diff`, though you may choose an alternate
-diff program by specifying the `DIFF_VIEWER` environment variable.
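-
-For example, assuming `vimdiff` is installed, a side-by-side diff could be
-produced with:
-
-    DIFF_VIEWER=vimdiff aurora job diff devcluster/www-data/prod/hello hello_world.aurora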
-
-Viewing/Examining Jobs
-----------------------
-
-Above we discussed creating, killing, and updating Jobs. Here we discuss
-how to view and examine Jobs.
-
-### Listing Jobs
-
-    aurora config list <job configuration>
-
-Lists all Jobs registered with the Aurora scheduler in the named cluster for the named role.
-
-### Inspecting a Job
-
-    aurora job inspect CLUSTER/ROLE/ENV/NAME <job configuration>
-
-`inspect` verifies that its specified job can be parsed from a
-configuration file, and displays the parsed configuration.
-
-### Checking Your Quota
-
-    aurora quota get CLUSTER/ROLE
-
-Prints the production quota allocated to the role in the given
-cluster. Only non-[dedicated](deploying-aurora-scheduler.md#dedicated-attribute)
-[production](configuration-reference.md#job-objects) jobs consume quota.
-
-### Finding a Job on Web UI
-
-When you create a job, part of the output response contains a URL that goes
-to the job's scheduler UI page. For example:
-
-    vagrant@precise64:~$ aurora job create devcluster/www-data/prod/hello /vagrant/examples/jobs/hello_world.aurora
-    INFO] Creating job hello
-    INFO] Response from scheduler: OK (message: 1 new tasks pending for job www-data/prod/hello)
-    INFO] Job url: http://precise64:8081/scheduler/www-data/prod/hello
-
-You can go to the scheduler UI page for this job via `http://precise64:8081/scheduler/www-data/prod/hello`.
-You can go to the overall scheduler UI page by truncating that URL after `scheduler`: `http://precise64:8081/scheduler`.
-
-Once you click through to a role page, you see Jobs grouped
-separately into pending, active, and finished Jobs.
-Jobs are arranged by role, typically a service account for
-production jobs and user accounts for test or development jobs.
-
-### Getting Job Status
-
-    aurora job status <job_key>
-
-Returns the status of recent tasks associated with the Job identified by
-`job_key` in the given cluster. Typically this includes
-a mix of active tasks (running or assigned) and inactive tasks
-(successful, failed, and lost.)
-
-### Opening the Web UI
-
-Use the Job's web UI scheduler URL or the `aurora status` command to find out on which
-machines individual tasks are scheduled. You can open the web UI via the
-`open` command line command if invoked from your machine:
-
-    aurora job open [<cluster>[/<role>[/<env>/<job_name>]]]
-
-If only the cluster is specified, it goes directly to that cluster's
-scheduler main page. If the role is specified, it goes to the top-level
-role page. If the full job key is specified, it goes directly to the job
-page where you can inspect individual tasks.
-
-### SSHing to a Specific Task Machine
-
-    aurora task ssh <job_key> <shard number>
-
-You can have the Aurora client ssh directly to the machine that has been
-assigned a particular Job/shard number. This may be useful for quickly
-diagnosing issues such as performance problems or abnormal behavior on a
-particular machine.
-
-### Templating Command Arguments
-
-    aurora task run [-e] [-t THREADS] <job_key> -- <<command-line>>
-
-Given a job specification, run the supplied command on all hosts and
-return the output. You may use the standard Mustache templating rules:
-
-- `{{thermos.ports[name]}}` substitutes the specific named port of the
-  task assigned to this machine
-- `{{mesos.instance}}` substitutes the shard id of the job's task
-  assigned to this machine
-- `{{thermos.task_id}}` substitutes the task id of the job's task
-  assigned to this machine
-
-For example, the following type of pattern can be a powerful diagnostic
-tool:
-
-    aurora task run -t5 cluster1/tyg/devel/seizure -- \
-      'curl -s -m1 localhost:{{thermos.ports[http]}}/vars | grep uptime'
-
-By default, the command runs in the Task's sandbox. The `-e` option can
-run the command in the executor's sandbox. This is mostly useful for
-Aurora administrators.
-
-You can parallelize the runs by using the `-t` option.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/committers.md
----------------------------------------------------------------------
diff --git a/docs/committers.md b/docs/committers.md
deleted file mode 100644
index f69a898..0000000
--- a/docs/committers.md
+++ /dev/null
@@ -1,81 +0,0 @@
-Setting up your email account
------------------------------
-Once your Apache ID has been set up you can configure your account, add ssh keys, and set up an
-email forwarding address at
-
-  http://id.apache.org
-
-Additional instructions for setting up your new committer email can be found at
-
-  http://www.apache.org/dev/user-email.html
-
-The recommended setup is to configure all services (mailing lists, JIRA, ReviewBoard) to send
-emails to your @apache.org email address.
-
-
-Creating a gpg key for releases
--------------------------------
-In order to create a release candidate you will need a gpg key published to an external key server
-and that key will need to be added to our KEYS file as well.
-
-1. Create a key:
-
-               gpg --gen-key
-
-2. Add your gpg key to the Apache Aurora KEYS file:
-
-               git clone https://git-wip-us.apache.org/repos/asf/aurora.git
-               (gpg --list-sigs <KEY ID> && gpg --armor --export <KEY ID>) >> KEYS
-               git add KEYS && git commit -m "Adding gpg key for <APACHE ID>"
-               ./rbt post -o -g
-
-3. Publish the key to an external key server:
-
-               gpg --keyserver pgp.mit.edu --send-keys <KEY ID>
-
-4. Upload the updated KEYS file to the Apache Aurora svn dist locations listed below:
-
-               https://dist.apache.org/repos/dist/dev/aurora/KEYS
-               https://dist.apache.org/repos/dist/release/aurora/KEYS
-
-5. Add your key to git config for use with the release scripts:
-
-               git config --global user.signingkey <KEY ID>
-
-
-Creating a release
-------------------
-The following will guide you through the steps to create a release candidate, vote, and finally an
-official Apache Aurora release. Before starting, your gpg key should be in the KEYS file and you
-must have access to commit to the dist.a.o repositories.
-
-1. Ensure that all issues resolved for this release candidate are tagged with the correct Fix
-Version in Jira; the changelog script will use this to generate the CHANGELOG in step #2.
-
-2. Create a release candidate. This will automatically update the CHANGELOG and commit it, create a
-branch and update the current version within the trunk. To create a minor version update and publish
-it, run
-
-               ./build-support/release/release-candidate -l m -p
-
-3. Update, if necessary, the draft email created from the `release-candidate` script in step #2 and
-send the [VOTE] email to the dev@ mailing list. You can verify the release signature and checksums
-by running
-
-               ./build-support/release/verify-release-candidate
-
-4. Wait for the vote to complete. If the vote fails, close it by replying to the initial [VOTE]
-email sent in step #3, editing the subject to [RESULT][VOTE] ... and noting the failure reason
-(example [here](http://markmail.org/message/d4d6xtvj7vgwi76f)). Then address any issues, go back to
-step #1, and run again, this time using the -r flag to increment the release candidate
-version. This will automatically clean up the release candidate rc0 branch and source distribution.
-
-               ./build-support/release/release-candidate -l m -r 1 -p
-
-5. Once the vote has successfully passed, create the release
-
-               ./build-support/release/release
-
-6. Update the draft email created from the `release` script in step #5 to include the Apache IDs for
-all binding votes and send the [RESULT][VOTE] email to the dev@ mailing list.
-

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/configuration-reference.md
----------------------------------------------------------------------
diff --git a/docs/configuration-reference.md b/docs/configuration-reference.md
deleted file mode 100644
index 7bcf22d..0000000
--- a/docs/configuration-reference.md
+++ /dev/null
@@ -1,722 +0,0 @@
-Aurora + Thermos Configuration Reference
-========================================
-
-- [Aurora + Thermos Configuration Reference](#aurora--thermos-configuration-reference)
-- [Introduction](#introduction)
-- [Process Schema](#process-schema)
-    - [Process Objects](#process-objects)
-      - [name](#name)
-      - [cmdline](#cmdline)
-      - [max_failures](#max_failures)
-      - [daemon](#daemon)
-      - [ephemeral](#ephemeral)
-      - [min_duration](#min_duration)
-      - [final](#final)
-      - [logger](#logger)
-- [Task Schema](#task-schema)
-    - [Task Object](#task-object)
-      - [name](#name-1)
-      - [processes](#processes)
-        - [constraints](#constraints)
-      - [resources](#resources)
-      - [max_failures](#max_failures-1)
-      - [max_concurrency](#max_concurrency)
-      - [finalization_wait](#finalization_wait)
-    - [Constraint Object](#constraint-object)
-    - [Resource Object](#resource-object)
-- [Job Schema](#job-schema)
-    - [Job Objects](#job-objects)
-    - [Services](#services)
-    - [Revocable Jobs](#revocable-jobs)
-    - [UpdateConfig Objects](#updateconfig-objects)
-    - [HealthCheckConfig Objects](#healthcheckconfig-objects)
-    - [Announcer Objects](#announcer-objects)
-    - [Container Objects](#container)
-    - [LifecycleConfig Objects](#lifecycleconfig-objects)
-- [Specifying Scheduling Constraints](#specifying-scheduling-constraints)
-- [Executor Wrapper](#executor-wrapper)
-- [Template Namespaces](#template-namespaces)
-    - [mesos Namespace](#mesos-namespace)
-    - [thermos Namespace](#thermos-namespace)
-- [Basic Examples](#basic-examples)
-    - [hello_world.aurora](#hello_worldaurora)
-    - [Environment Tailoring](#environment-tailoring)
-      - [hello_world_productionized.aurora](#hello_world_productionizedaurora)
-
-Introduction
-============
-
-Don't know where to start? The Aurora configuration schema is very
-powerful, and configurations can become quite complex for advanced use
-cases.
-
-For examples of simple configurations to get something up and running
-quickly, check out the [Tutorial](tutorial.md). When you feel comfortable with the basics, move
-on to the [Configuration Tutorial](configuration-tutorial.md) for more in-depth coverage of
-configuration design.
-
-For additional basic configuration examples, see [the end of this document](#BasicExamples).
-
-Process Schema
-==============
-
-Process objects consist of required `name` and `cmdline` attributes. You can customize Process
-behavior with its optional attributes. Remember, Processes are handled by Thermos.
-
-### Process Objects
-
-  **Attribute Name**  | **Type**    | **Description**
-  ------------------- | :---------: | ---------------------------------
-   **name**           | String      | Process name (Required)
-   **cmdline**        | String      | Command line (Required)
-   **max_failures**   | Integer     | Maximum process failures (Default: 1)
-   **daemon**         | Boolean     | When True, this is a daemon process. (Default: False)
-   **ephemeral**      | Boolean     | When True, this is an ephemeral process. (Default: False)
-   **min_duration**   | Integer     | Minimum duration between process restarts in seconds. (Default: 15)
-   **final**          | Boolean     | When True, this process is a finalizing one that should run last. (Default: False)
-   **logger**         | Logger      | Struct defining the log behavior for the process. (Default: Empty)
-
-#### name
-
-The name is any valid UNIX filename string (specifically no
-slashes, NULLs or leading periods). Within a Task object, each Process name
-must be unique.
-
-#### cmdline
-
-The command line run by the process. The command line is invoked in a bash
-subshell, so it can involve full-blown bash scripts. However, nothing is
-supplied for command-line arguments so `$*` is unspecified.
-
-#### max_failures
-
-The maximum number of failures (non-zero exit statuses) this process can
-have before being marked permanently failed and not retried. If a
-process permanently fails, Thermos looks at the failure limit of the task
-containing the process (usually 1) to determine if the task has
-failed as well.
-
-Setting `max_failures` to 0 makes the process retry
-indefinitely until it achieves a successful (zero) exit status.
-It retries at most once every `min_duration` seconds to prevent
-an effective denial of service attack on the coordinating Thermos scheduler.
-
-#### daemon
-
-By default, Thermos processes are non-daemon. If `daemon` is set to True, a
-successful (zero) exit status does not prevent future process runs.
-Instead, the process reinvokes after `min_duration` seconds.
-However, the maximum failure limit still applies. A combination of
-`daemon=True` and `max_failures=0` causes a process to retry
-indefinitely regardless of exit status. This should be avoided
-for very short-lived processes because of the accumulation of
-checkpointed state for each process run. When running in Mesos
-specifically, `max_failures` is capped at 100.
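-
-For example, a sketch of a daemonized process that is re-run after every
-successful exit, waiting at least 30 seconds between runs (the command line is
-illustrative):
-
-    heartbeat = Process(
-      name = 'heartbeat',
-      cmdline = 'curl -s -m1 localhost:{{thermos.ports[http]}}/health',
-      daemon = True,
-      min_duration = 30)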
-
-#### ephemeral
-
-By default, Thermos processes are non-ephemeral. If `ephemeral` is set to
-True, the process' status is not used to determine if its containing task
-has completed. For example, consider a task with a non-ephemeral
-webserver process and an ephemeral logsaver process
-that periodically checkpoints its log files to a centralized data store.
-The task is considered finished once the webserver process has
-completed, regardless of the logsaver's current status.
-
-#### min_duration
-
-Processes may succeed or fail multiple times during a single task's
-duration. Each of these is called a *process run*. `min_duration` is
-the minimum number of seconds the scheduler waits before running the
-same process.
-
-#### final
-
-Processes can be grouped into two classes: ordinary processes and
-finalizing processes. By default, Thermos processes are ordinary. They
-run as long as the task is considered healthy (i.e., no failure
-limits have been reached.) But once all regular Thermos processes
-finish or the task reaches a certain failure threshold, it
-moves into a "finalization" stage and runs all finalizing
-processes. These are typically processes necessary for cleaning up the
-task, such as log checkpointers, or perhaps e-mail notifications that
-the task completed.
-
-Finalizing processes may not depend upon ordinary processes or
-vice-versa; however, finalizing processes may depend upon other
-finalizing processes and otherwise run as a typical process
-schedule.
-
-#### logger
-
-The default behavior of Thermos is to store stderr/stdout logs in files which grow unbounded.
-In the event that you have large log volume, you may want to configure Thermos to automatically rotate logs
-after they grow to a certain size, which can prevent your job from using more than its allocated
-disk space.
-
-A Logger union consists of a destination enum, a mode enum and a rotation policy.
-Use `destination` to set where the process logs should be sent. The default
-option is `file`. It's also possible to specify `console` to send log output
-to stdout/stderr, `none` to suppress any log output, or `both` to send logs to both files and
-console output. When using `none` or `console`, the rotation attributes are ignored.
-Rotation policies only apply to loggers whose mode is `rotate`. The acceptable values
-for the LoggerMode enum are `standard` and `rotate`. The rotation policy applies to both
-stderr and stdout.
-
-By default, all processes use the `standard` LoggerMode.
-
-  **Attribute Name**  | **Type**          | **Description**
-  ------------------- | :---------------: | ---------------------------------
-   **destination**    | LoggerDestination | Destination of logs. (Default: `file`)
-   **mode**           | LoggerMode        | Mode of the logger. (Default: `standard`)
-   **rotate**         | RotatePolicy      | An optional rotation policy.
-
-A RotatePolicy describes log rotation behavior for when `mode` is set to `rotate`. It is ignored
-otherwise.
-
-  **Attribute Name**  | **Type**     | **Description**
-  ------------------- | :----------: | ---------------------------------
-   **log_size**       | Integer      | Maximum size (in bytes) of an individual log file. (Default: 100 MiB)
-   **backups**        | Integer      | The maximum number of backups to retain. (Default: 5)
-
-An example process configuration is as follows:
-
-        process = Process(
-          name='process',
-          logger=Logger(
-            destination=LoggerDestination('both'),
-            mode=LoggerMode('rotate'),
-            rotate=RotatePolicy(log_size=5*MB, backups=5)
-          )
-        )
-
-Task Schema
-===========
-
-Tasks fundamentally consist of a `name` and a list of Process objects stored as the
-value of the `processes` attribute. Processes can be further constrained with
-`constraints`. By default, `name`'s value inherits from the first Process in the
-`processes` list, so for simple `Task` objects with one Process, `name`
-can be omitted. In Mesos, `resources` is also required.
-
-### Task Object
-
-   **param**               | **type**                         | **description**
-   ---------               | :---------:                      | ---------------
-   ```name```              | String                           | Process name (Required) (Default: ```processes0.name```)
-   ```processes```         | List of ```Process``` objects    | List of ```Process``` objects bound to this task. (Required)
-   ```constraints```       | List of ```Constraint``` objects | List of ```Constraint``` objects constraining processes.
-   ```resources```         | ```Resource``` object            | Resource footprint. (Required)
-   ```max_failures```      | Integer                          | Maximum process failures before being considered failed (Default: 1)
-   ```max_concurrency```   | Integer                          | Maximum number of concurrent processes (Default: 0, unlimited concurrency.)
-   ```finalization_wait``` | Integer                          | Amount of time allocated for finalizing processes, in seconds. (Default: 30)
-
-#### name
-`name` is a string denoting the name of this task. It defaults to the name of the first Process in
-the list of Processes associated with the `processes` attribute.
-
-#### processes
-
-`processes` is an unordered list of `Process` objects. To constrain the order
-in which they run, use `constraints`.
-
-##### constraints
-
-A list of `Constraint` objects. Currently it supports only one type,
-the `order` constraint. `order` is a list of process names
-that should run in the order given. For example,
-
-        process = Process(cmdline = "echo hello {{name}}")
-        task = Task(name = "echoes",
-                    processes = [process(name = "jim"), process(name = "bob")],
-                    constraints = [Constraint(order = ["jim", "bob"])])
-
-Constraints can be supplied ad-hoc and in duplicate. Not all
-Processes need be constrained, however Tasks with cycles are
-rejected by the Thermos scheduler.
-
-Use the `order` function as shorthand to generate `Constraint` lists.
-The following:
-
-        order(process1, process2)
-
-is shorthand for
-
-        [Constraint(order = [process1.name(), process2.name()])]
-
-The `order` function accepts Process name strings `('foo', 'bar')` or the processes
-themselves, e.g. `foo=Process(name='foo', ...)`, `bar=Process(name='bar', ...)`,
-`constraints=order(foo, bar)`.
-
-
-#### resources
-
-Takes a `Resource` object, which specifies the amounts of CPU, memory, and disk space resources
-to allocate to the Task.
-
-#### max_failures
-
-`max_failures` is the number of failed processes needed for the `Task` to be
-marked as failed.
-
-For example, assume a Task has two Processes and a `max_failures` value of `2`:
-
-        template = Process(max_failures=10)
-        task = Task(
-          name = "fail",
-          processes = [
-             template(name = "failing", cmdline = "exit 1"),
-             template(name = "succeeding", cmdline = "exit 0")
-          ],
-          max_failures=2)
-
-The `failing` Process could fail 10 times before being marked as permanently
-failed, and the `succeeding` Process could succeed on the first run. However,
-the task would succeed despite only allowing for two failed processes. To be more
-specific, there would be 10 failed process runs yet 1 failed process. Both processes
-would have to fail for the Task to fail.
-
-
-
-#### max_concurrency
-
-For Tasks with a number of expensive but otherwise independent
-processes, you may want to limit the amount of concurrency
-the Thermos scheduler provides rather than artificially constraining
-it via `order` constraints. For example, a test framework may
-generate a task with 100 test run processes, but wants to run it on
-a machine with only 4 cores. You can limit the amount of parallelism to
-4 by setting `max_concurrency=4` in your task configuration.
-
-For example, the following task spawns 180 Processes ("mappers")
-to compute individual elements of a 180 degree sine table, all dependent
-upon one final Process ("reducer") to tabulate the results:
-
-    def make_mapper(id):
-      return Process(
-        name = "mapper%03d" % id,
-        cmdline = "echo 'scale=50;s(%d*4*a(1)/180)' | bc -l >
-                   temp.sine_table.%03d" % (id, id))
-
-    def make_reducer():
-      return Process(name = "reducer", cmdline = "cat temp.* | nl > sine_table.txt
-                     && rm -f temp.*")
-
-    processes = map(make_mapper, range(180))
-
-    task = Task(
-      name = "mapreduce",
-      processes = processes + [make_reducer()],
-      constraints = [Constraint(order = [mapper.name(), 'reducer']) for mapper
-                     in processes],
-      max_concurrency = 8)
-
-#### finalization_wait
-
-Process execution is organized into three active stages: `ACTIVE`,
-`CLEANING`, and `FINALIZING`. The `ACTIVE` stage is when ordinary processes run.
-This stage lasts as long as Processes are running and the Task is healthy.
-The moment either all Processes have finished successfully or the Task has reached a
-maximum Process failure limit, it goes into the `CLEANING` stage and sends
-SIGTERMs to all currently running Processes and their process trees.
-Once all Processes have terminated, the Task goes into `FINALIZING` stage
-and invokes the schedule of all Processes with the "final" attribute set to True.
-
-This whole process from the end of `ACTIVE` stage to the end of `FINALIZING`
-must happen within `finalization_wait` seconds. If it does not
-finish during that time, all remaining Processes are sent SIGKILLs
-(or if they depend upon uncompleted Processes, are
-never invoked.)
-
-When running on Aurora, the `finalization_wait` is capped at 60 seconds.
-
-### Constraint Object
-
-Current constraint objects only support a single ordering constraint, `order`,
-which specifies that its processes run sequentially in the order given. By
-default, all processes run in parallel when bound to a `Task` without
-ordering constraints.
-
-   param | type           | description
-   ----- | :----:         | -----------
-   order | List of String | List of processes by name (String) that should be run serially.
-
-### Resource Object
-
-Specifies the amount of CPU, RAM, and disk resources the task needs. See the
-[Resource Isolation document](resources.md) for suggested values and to understand how
-resources are allocated.
-
-  param      | type    | description
-  -----      | :----:  | -----------
-  ```cpu```  | Float   | Fractional number of cores required by the task.
-  ```ram```  | Integer | Bytes of RAM required by the task.
-  ```disk``` | Integer | Bytes of disk required by the task.
-
-
-Job Schema
-==========
-
-### Job Objects
-
-   name | type | description
-   ------ | :-------: | -------
-  ```task``` | Task | The Task object to bind to this job. Required.
-  ```name``` | String | Job name. (Default: inherited from the task attribute's name)
-  ```role``` | String | Job role account. Required.
-  ```cluster``` | String | Cluster in which this job is scheduled. Required.
-   ```environment``` | String | Job environment, default ```devel```. Must be one of ```prod```, ```devel```, ```test``` or ```staging<number>```.
-  ```contact``` | String | Best email address to reach the owner of the job. For production jobs, this is usually a team mailing list.
-  ```instances```| Integer | Number of instances (sometimes referred to as replicas or shards) of the task to create. (Default: 1)
-   ```cron_schedule``` | String | Cron schedule in cron format. May only be used with non-service jobs. See [Cron Jobs](cron-jobs.md) for more information. Default: None (not a cron job.)
-  ```cron_collision_policy``` | String | Policy to use when a cron job is triggered while a previous run is still active. KILL_EXISTING Kill the previous run, and schedule the new run CANCEL_NEW Let the previous run continue, and cancel the new run. (Default: KILL_EXISTING)
-  ```update_config``` | ```UpdateConfig``` object | Parameters for controlling the rate and policy of rolling updates.
-  ```constraints``` | dict | Scheduling constraints for the tasks. See the section on the [constraint specification language](#Specifying-Scheduling-Constraints)
-  ```service``` | Boolean | If True, restart tasks regardless of success or failure. (Default: False)
-  ```max_task_failures``` | Integer | Maximum number of failures after which the task is considered to have failed (Default: 1) Set to -1 to allow for infinite failures
-  ```priority``` | Integer | Preemption priority to give the task (Default 0). Tasks with higher priorities may preempt tasks at lower priorities.
-  ```production``` | Boolean |  Whether or not this is a production task that may [preempt](resources.md#task-preemption) other tasks (Default: False). Production job role must have the appropriate [quota](resources.md#resource-quota).
-  ```health_check_config``` | ```HealthCheckConfig``` object | Parameters for controlling a task's health checks. HTTP health check is only used if a health port was assigned with a command line wildcard.
-  ```container``` | ```Container``` object | An optional container to run all processes inside of.
-  ```lifecycle``` | ```LifecycleConfig``` object | An optional task lifecycle configuration that dictates commands to be executed on startup/teardown.  HTTP lifecycle is enabled by default if the "health" port is requested.  See [LifecycleConfig Objects](#lifecycleconfig-objects) for more information.
-  ```tier``` | String | Task tier type. When set to `revocable` requires the task to run with Mesos revocable resources. This is work [in progress](https://issues.apache.org/jira/browse/AURORA-1343) and is currently only supported for the revocable tasks. The ultimate goal is to simplify task configuration by hiding various configuration knobs behind a task tier definition. See AURORA-1343 and AURORA-1443 for more details.
-
-### Services
-
-Jobs with the `service` flag set to True are called Services. The `Service`
-alias can be used as shorthand for `Job` with `service=True`.
-Services are differentiated from non-service Jobs in that tasks
-always restart on completion, whether successful or unsuccessful.
-Jobs without the service bit set only restart up to
-`max_task_failures` times and only if they terminated unsuccessfully
-either due to human error or machine failure.
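-
-For example, the following two sketches define the same long-running service
-(the job key values and `hello_world_task` reference are illustrative):
-
-    job = Job(cluster = 'cluster1', role = 'www-data', environment = 'prod',
-              name = 'hello', task = hello_world_task, service = True)
-
-    job = Service(cluster = 'cluster1', role = 'www-data', environment = 'prod',
-                  name = 'hello', task = hello_world_task)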
-
-### Revocable Jobs
-
-**WARNING**: This feature is currently in alpha status. Do not use it in production clusters!
-
-Mesos [supports a concept of revocable tasks](http://mesos.apache.org/documentation/latest/oversubscription/)
-by oversubscribing machine resources by the amount deemed safe to not affect the existing
-non-revocable tasks. Aurora now supports revocable jobs via the `tier` setting when it is set to the
-`revocable` value.
-
-More implementation details in this [ticket](https://issues.apache.org/jira/browse/AURORA-1343).
-
-The scheduler must be [configured](deploying-aurora-scheduler.md#configuring-resource-oversubscription)
-to receive revocable offers from Mesos and accept revocable jobs. If not configured properly,
-revocable tasks will never get assigned to hosts and will stay in PENDING.
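-
-A sketch of a job opting into revocable resources (all other values are
-illustrative):
-
-    job = Service(cluster = 'cluster1', role = 'www-data', environment = 'prod',
-                  name = 'hello', task = hello_world_task, tier = 'revocable')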
-
-### UpdateConfig Objects
-
-Parameters for controlling the rate and policy of rolling updates.
-
-| object                       | type     | description
-| ---------------------------- | :------: | ------------
-| ```batch_size```             | Integer  | Maximum number of shards to be updated in one iteration (Default: 1)
-| ```watch_secs```             | Integer  | Minimum number of seconds a shard must remain in ```RUNNING``` state before considered a success (Default: 45)
-| ```max_per_shard_failures``` | Integer  | Maximum number of restarts per shard during update. Increments total failure count when this limit is exceeded. (Default: 0)
-| ```max_total_failures```     | Integer  | Maximum number of shard failures to be tolerated in total during an update. Cannot be greater than or equal to the total number of tasks in a job. (Default: 0)
-| ```rollback_on_failure```    | boolean  | When False, prevents auto rollback of a failed update (Default: True)
-| ```wait_for_batch_completion```| boolean | When True, all threads from a given batch will be blocked from picking up new instances until the entire batch is updated. This essentially simulates the legacy sequential updater algorithm. (Default: False)
-| ```pulse_interval_secs```    | Integer  |  Indicates a [coordinated update](client-commands.md#user-content-coordinated-job-updates). If no pulses are received within the provided interval the update will be blocked. Beta-updater only. Will fail on submission when used with client updater. (Default: None)
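-
-A sketch of an `UpdateConfig` that updates three shards at a time and tolerates
-a small number of failures (the values are illustrative, not recommendations):
-
-    update_config = UpdateConfig(
-      batch_size = 3,
-      watch_secs = 60,
-      max_per_shard_failures = 1,
-      max_total_failures = 2)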
-
-### HealthCheckConfig Objects
-
-*Note: ```endpoint```, ```expected_response``` and ```expected_response_code``` are deprecated from ```HealthCheckConfig``` and must be defined in ```HttpHealthChecker```.*
-
-Parameters for controlling a task's health checks via HTTP or a shell command.
-
-| param                          | type      | description
-| -------                        | :-------: | --------
-| ```health_checker```           | HealthCheckerConfig | Configure what kind of health check to use.
-| ```initial_interval_secs```    | Integer   | Initial delay for performing a health check. (Default: 15)
-| ```interval_secs```            | Integer   | Interval on which to check the task's health. (Default: 10)
-| ```max_consecutive_failures``` | Integer   | Maximum number of consecutive failures that will be tolerated before considering a task unhealthy (Default: 0)
-| ```timeout_secs```             | Integer   | Health check timeout. (Default: 1)
-
-### HealthCheckerConfig Objects
-| param                          | type                | description
-| -------                        | :-------:           | --------
-| ```http```                     | HttpHealthChecker  | Configure health check to use HTTP. (Default)
-| ```shell```                    | ShellHealthChecker | Configure health check via a shell command.
-
-
-### HttpHealthChecker Objects
-| param                          | type      | description
-| -------                        | :-------: | --------
-| ```endpoint```                 | String    | HTTP endpoint to check (Default: /health)
-| ```expected_response```        | String    | If not empty, fail the HTTP health check if the response differs. Case insensitive. (Default: ok)
-| ```expected_response_code```   | Integer   | If not zero, fail the HTTP health check if the response code differs. (Default: 0)
-
-### ShellHealthChecker Objects
-| param                          | type      | description
-| -------                        | :-------: | --------
-| ```shell_command```            | String    | An alternative to HTTP health checking. Specifies a shell command that will be executed. Any non-zero exit status will be interpreted as a health check failure.
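-
-For example, a sketch of a health check configuration that replaces the default
-HTTP check with a shell command (the script name is illustrative):
-
-    health_check_config = HealthCheckConfig(
-      health_checker = HealthCheckerConfig(
-        shell = ShellHealthChecker(shell_command = 'my_health_check.sh')),
-      interval_secs = 10,
-      max_consecutive_failures = 3)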
-
-
-### Announcer Objects
-
-If the `announce` field in the Job configuration is set, each task will be
-registered in the ServerSet `/aurora/role/environment/jobname` in the
-zookeeper ensemble configured by the executor (which can be optionally overridden by specifying the
-zk_path parameter).  If no Announcer object is specified,
-no announcement will take place.  For more information about ServerSets, see the [User Guide](user-guide.md).
-
-By default, the hostname in the registered endpoints will be the `--hostname` parameter
-that is passed to the mesos slave. To override the hostname value, the executor can be started
-with `--announcer-hostname=<overridden_value>`. If you decide to use `--announcer-hostname` and the
-overridden value needs to change for every executor, then the executor has to be started inside a wrapper; see [Executor Wrapper](#executor-wrapper).
-
-For example, if you want the hostname in the endpoint to be an IP address instead of the hostname,
-the `--hostname` parameter to the mesos slave can be set to the machine IP or the executor can
-be started with `--announcer-hostname=<host_ip>` while wrapping the executor inside a script.
-
-| object                         | type      | description
-| -------                        | :-------: | --------
-| ```primary_port```             | String    | Which named port to register as the primary endpoint in the ServerSet (Default: `http`)
-| ```portmap```                  | dict      | A mapping of additional endpoints to be announced in the ServerSet (Default: `{ 'aurora': '{{primary_port}}' }`)
-| ```zk_path```                  | String    | Zookeeper serverset path override (executor must be started with the --announcer-allow-custom-serverset-path parameter)
-
-### Port aliasing with the Announcer `portmap`
-
-The primary endpoint registered in the ServerSet is the one allocated to the port
-specified by the `primary_port` in the `Announcer` object, by default
-the `http` port.  This port can be referenced from anywhere within a configuration
-as `{{thermos.ports[http]}}`.
-
-Without the port map, each named port would be allocated a unique port number.
-The `portmap` allows two different named ports to be aliased together.  The default
-`portmap` aliases the `aurora` port (i.e. `{{thermos.ports[aurora]}}`) to
-the `http` port.  Even though the two ports can be referenced independently,
-only one port is allocated by Mesos.  Any port referenced in a `Process` object
-but which is not in the portmap will be allocated dynamically by Mesos and announced as well.
-
-It is possible to use the portmap to alias names to static port numbers, e.g.
-`{'http': 80, 'https': 443, 'aurora': 'http'}`.  In this case, referencing
-`{{thermos.ports[aurora]}}` would look up `{{thermos.ports[http]}}` then
-find a static port 80.  No port would be requested of or allocated by Mesos.
-
-Static ports should be used cautiously as Aurora does nothing to prevent two
-tasks with the same static port allocations from being co-scheduled.
-External constraints such as slave attributes should be used to enforce such
-guarantees should they be needed.
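-
-For example, a sketch of an `Announcer` that registers the `https` named port as
-the primary endpoint and aliases `aurora` to it (the port names are
-illustrative):
-
-    announce = Announcer(
-      primary_port = 'https',
-      portmap = {'aurora': 'https'})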
-
-### Container Objects
-
-*Note: The only container type currently supported is "docker".  Docker support is currently EXPERIMENTAL.*
-*Note: In order to correctly execute processes inside a job, the Docker container must have python 2.7 installed.*
-
-*Note: For a private docker registry, mesos mandates the docker credential file to be named `.dockercfg`, even though docker may create a credential file with a different name on various platforms. Also, the `.dockercfg` file needs to be copied into the sandbox using the `-thermos_executor_resources` flag, specified while starting Aurora.*
-
-Describes the container the job's processes will run inside.
-
-  param          | type           | description
-  -----          | :----:         | -----------
-  ```docker```   | Docker         | A docker container to use.
-
-### Docker Object
-
-  param            | type            | description
-  -----            | :----:          | -----------
-  ```image```      | String          | The name of the docker image to execute.  If the image does not exist locally it will be pulled with ```docker pull```.
-  ```parameters``` | List(Parameter) | Additional parameters to pass to the docker containerizer.
-
-### Docker Parameter Object
-
-Docker CLI parameters. This needs to be enabled by the scheduler `allow_docker_parameters` option.
-See [Docker Command Line Reference](https://docs.docker.com/reference/commandline/run/) for valid parameters.
-
-  param            | type            | description
-  -----            | :----:          | -----------
-  ```name```       | String          | The name of the docker parameter. E.g. volume
-  ```value```      | String          | The value of the parameter. E.g. /usr/local/bin:/usr/bin:rw
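-
-A sketch of a container definition that passes an extra volume parameter to the
-docker containerizer (the image name and paths are illustrative, and
-`allow_docker_parameters` must be enabled on the scheduler):
-
-    container = Container(docker = Docker(
-      image = 'python:2.7',
-      parameters = [Parameter(name = 'volume', value = '/etc/config:/etc/config:ro')]))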
-
-### LifecycleConfig Objects
-
-*Note: The only lifecycle configuration supported is the HTTP lifecycle via the HttpLifecycleConfig.*
-
-  param          | type                | description
-  -----          | :----:              | -----------
-  ```http```     | HttpLifecycleConfig | Configure the lifecycle manager to send lifecycle commands to the task via HTTP.
-
-### HttpLifecycleConfig Objects
-
-  param          | type            | description
-  -----          | :----:          | -----------
-  ```port```     | String          | The named port to send POST commands (Default: health)
-  ```graceful_shutdown_endpoint``` | String | Endpoint to hit to indicate that a task should gracefully shutdown. (Default: /quitquitquit)
-  ```shutdown_endpoint``` | String | Endpoint to hit to give a task its final warning before being killed. (Default: /abortabortabort)
-
-#### graceful_shutdown_endpoint
-
-If the Job is listening on the port as specified by the HttpLifecycleConfig
-(default: `health`), an HTTP POST request will be sent over localhost to this
-endpoint to request that the task gracefully shut itself down.  This is a
-courtesy call before the `shutdown_endpoint` is invoked a fixed amount of
-time later.
-
-#### shutdown_endpoint
-
-If the Job is listening on the port as specified by the HttpLifecycleConfig
-(default: `health`), an HTTP POST request will be sent over localhost to this
-endpoint as a final warning before the task is shut down.  If the task
-does not shut down on its own after this, it will be forcefully killed.
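-
-For example, a sketch of a lifecycle configuration that overrides the default
-endpoints (the paths are illustrative):
-
-    lifecycle = LifecycleConfig(http = HttpLifecycleConfig(
-      graceful_shutdown_endpoint = '/quit',
-      shutdown_endpoint = '/abort'))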
-
-
-Specifying Scheduling Constraints
-=================================
-
-In the `Job` object there is a map `constraints` from String to String
-allowing the user to tailor the schedulability of tasks within the job.
-
-Each slave in the cluster is assigned a set of string-valued
-key/value pairs called attributes. For example, consider the host
-`cluster1-aaa-03-sr2` and its following attributes (given in key:value
-format): `host:cluster1-aaa-03-sr2` and `rack:aaa`.
-
-The constraint map's key value is the attribute name in which we
-constrain Tasks within our Job. The value is how we constrain them.
-There are two types of constraints: *limit constraints* and *value
-constraints*.
-
-| constraint    | description
-| ------------- | --------------
-| Limit         | A string that specifies a limit for a constraint. Starts with <code>'limit:</code> followed by an Integer and closing single quote, such as ```'limit:1'```.
-| Value         | A string that specifies a value for a constraint. To include a list of values, separate the values using commas. To negate the values of a constraint, start with a ```!```.
-
-You can also control machine diversity using constraints. The below
-constraint ensures that no more than two instances of your job may run
-on a single host. Think of this as a "group by" limit.
-
-    constraints = {
-      'host': 'limit:2',
-    }
-
-Likewise, you can use constraints to control rack diversity, e.g. at
-most one task per rack:
-
-    constraints = {
-      'rack': 'limit:1',
-    }
-
-Use these constraints sparingly as they can dramatically reduce Tasks' schedulability.
-
-
-Executor Wrapper
-================
-
-If you need to do computation before starting the thermos executor (for example, setting a different
-`--announcer-hostname` parameter for every executor), then the thermos executor should be invoked
- inside a wrapper script. In such a case, the aurora scheduler should be started with
- `-thermos_executor_path` pointing to the wrapper script and `-thermos_executor_resources`
- set to a comma separated string of all the resources that should be copied into
- the sandbox (including the original thermos executor).
-
-For example, to wrap the executor inside a simple wrapper, the scheduler would be started with
-`-thermos_executor_path=/path/to/wrapper.sh -thermos_executor_resources=/usr/share/aurora/bin/thermos_executor.pex`.
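-
-A minimal sketch of such a wrapper script, assuming the original executor was
-copied into the sandbox via `-thermos_executor_resources` and that the IP lookup
-below suits your environment:
-
-    #!/bin/bash
-    # Resolve a per-host value and hand control to the real executor,
-    # forwarding any arguments supplied by the scheduler.
-    HOST_IP=$(hostname -i)
-    exec ./thermos_executor.pex --announcer-hostname="${HOST_IP}" "$@"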
-
-Template Namespaces
-===================
-
-Currently, a few Pystachio namespaces have special semantics. Using them
-in your configuration allows you to tailor application behavior
-through environment introspection or interact in special ways with the
-Aurora client or Aurora-provided services.
-
-### mesos Namespace
-
-The `mesos` namespace contains variables which relate to the `mesos` slave
-which launched the task. The `instance` variable can be used
-to distinguish between Task replicas.
-
-| variable name     | type       | description
-| --------------- | :--------: | -------------
-| ```instance```    | Integer    | The instance number of the created task. A job with 5 replicas has instance numbers 0, 1, 2, 3, and 4.
-| ```hostname``` | String | The instance hostname that the task was launched on.
-
-Please note, there is no uniqueness guarantee for `instance` in the presence of
-network partitions. If that is required, it should be baked in at the application
-level using a distributed coordination service such as Zookeeper.
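-
-For example, a sketch of a process that embeds its instance number in its
-command line, which is useful when each replica must claim a distinct partition
-(the worker binary is illustrative):
-
-    shard_aware = Process(
-      name = 'worker',
-      cmdline = 'my_worker --partition={{mesos.instance}}')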
-
-### thermos Namespace
-
-The `thermos` namespace contains variables that work directly on the
-Thermos platform in addition to Aurora. This namespace is fully
-compatible with Tasks invoked via the `thermos` CLI.
-
-| variable      | type                     | description                        |
-| :----------:  | ---------                | ------------                       |
-| ```ports```   | map of string to Integer | A map of names to port numbers     |
-| ```task_id``` | string                   | The task ID assigned to this task. |
-
-The `thermos.ports` namespace is automatically populated by Aurora when
-invoking tasks on Mesos. When running the `thermos` command directly,
-these ports must be explicitly mapped with the `-P` option.
-
-For example, if '{{`thermos.ports[http]`}}' is specified in a `Process`
-configuration, it is automatically extracted and auto-populated by
-Aurora, but must be specified with, for example, `thermos -P http:12345`
-to map `http` to port 12345 when running via the CLI.
-
-Basic Examples
-==============
-
-These are provided to give a basic understanding of simple Aurora jobs.
-
-### hello_world.aurora
-
-Put the following in a file named `hello_world.aurora`, substituting your own values
-for values such as `cluster`s.
-
-    import os
-    hello_world_process = Process(name = 'hello_world', cmdline = 'echo hello world')
-
-    hello_world_task = Task(
-      resources = Resources(cpu = 0.1, ram = 16 * MB, disk = 16 * MB),
-      processes = [hello_world_process])
-
-    hello_world_job = Job(
-      cluster = 'cluster1',
-      role = os.getenv('USER'),
-      task = hello_world_task)
-
-    jobs = [hello_world_job]
-
-Then issue the following commands to create and kill the job, using your own values for the job key.
-
-    aurora job create cluster1/$USER/test/hello_world hello_world.aurora
-
-    aurora job kill cluster1/$USER/test/hello_world
-
-### Environment Tailoring
-
-#### hello_world_productionized.aurora
-
-Put the following in a file named `hello_world_productionized.aurora`, substituting your own values
-for values such as `cluster`s.
-
-    include('hello_world.aurora')
-
-    production_resources = Resources(cpu = 1.0, ram = 512 * MB, disk = 2 * GB)
-    staging_resources = Resources(cpu = 0.1, ram = 32 * MB, disk = 512 * MB)
-    hello_world_template = hello_world_job(
-        name = "hello_world-{{cluster}}",
-        task = hello_world_task(resources=production_resources))
-
-    jobs = [
-      # production jobs
-      hello_world_template(cluster = 'cluster1', instances = 25),
-      hello_world_template(cluster = 'cluster2', instances = 15),
-
-      # staging jobs
-      hello_world_template(
-        cluster = 'local',
-        instances = 1,
-        task = hello_world_task(resources=staging_resources)),
-    ]
-
-Then issue the following commands to create and kill the job, using your own values for the job key
-
-    aurora job create cluster1/$USER/test/hello_world-cluster1 hello_world_productionized.aurora
-
-    aurora job kill cluster1/$USER/test/hello_world-cluster1


[3/7] aurora git commit: Reorganize Documentation

Posted by se...@apache.org.
http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/presentations.md
----------------------------------------------------------------------
diff --git a/docs/presentations.md b/docs/presentations.md
deleted file mode 100644
index 84756a2..0000000
--- a/docs/presentations.md
+++ /dev/null
@@ -1,80 +0,0 @@
-# Apache Aurora Presentations
-Video and slides from presentations and panel discussions about Apache Aurora.
-
-_(Listed in date descending order)_
-
-<table>
-
-	<tr>
-		<td><img src="images/presentations/10_08_2015_mesos_aurora_on_a_small_scale_thumb.png" alt="Mesos and Aurora on a Small Scale Thumbnail" /></td>
-		<td><strong><a href="https://www.youtube.com/watch?v=q5iIqhaCJ_o">Mesos &amp; Aurora on a Small Scale (Video)</a></strong>
-		<p>Presented by Florian Pfeiffer</p>
-		<p>October 8, 2015 at <a href="http://events.linuxfoundation.org/events/archive/2015/mesoscon-europe">#MesosCon Europe 2015</a></p></td>
-	</tr>
-	<tr>
-		<td><img src="images/presentations/10_08_2015_sla_aware_maintenance_for_operators_thumb.png" alt="SLA Aware Maintenance for Operators Thumbnail" /></td>
-		<td><strong><a href="https://www.youtube.com/watch?v=tZ0-SISvCis">SLA Aware Maintenance for Operators (Video)</a></strong>
-		<p>Presented by Joe Smith</p>
-		<p>October 8, 2015 at <a href="http://events.linuxfoundation.org/events/archive/2015/mesoscon-europe">#MesosCon Europe 2015</a></p></td>
-	</tr>
-	<tr>
-		<td><img src="images/presentations/09_20_2015_shipping_code_with_aurora_thumb.png" alt="Shipping Code with Aurora Thumbnail" /></td>
-		<td><strong><a href="https://www.youtube.com/watch?v=y1hi7K1lPkk">Shipping Code with Aurora (Video)</a></strong>
-		<p>Presented by Bill Farner</p>
-		<p>August 20, 2015 at <a href="http://events.linuxfoundation.org/events/archive/2015/mesoscon">#MesosCon 2015</a></p></td>
-	</tr>
-	<tr>
-		<td><img src="images/presentations/09_20_2015_twitter_production_scale_thumb.png" alt="Twitter Production Scale Thumbnail" /></td>
-		<td><strong><a href="https://www.youtube.com/watch?v=nNrh-gdu9m4">Twitter’s Production Scale: Mesos and Aurora Operations (Video)</a></strong>
-		<p>Presented by Joe Smith</p>
-		<p>August 20, 2015 at <a href="http://events.linuxfoundation.org/events/archive/2015/mesoscon">#MesosCon 2015</a></p></td>
-	</tr>
-	<tr>
-		<td><img src="images/presentations/04_30_2015_monolith_to_microservices_thumb.png" alt="From Monolith to Microservices with Aurora Video Thumbnail" /></td>
-		<td><strong><a href="https://www.youtube.com/watch?v=yXkOgnyK4Hw">From Monolith to Microservices w/ Aurora (Video)</a></strong>
-		<p>Presented by Thanos Baskous, Tony Dong, Dobromir Montauk</p>
-		<p>April 30, 2015 at <a href="http://www.meetup.com/Bay-Area-Apache-Aurora-Users-Group/events/221219480/">Bay Area Apache Aurora Users Group</a></p></td>
-	</tr>
-	<tr>
-		<td><img src="images/presentations/03_07_2015_aurora_mesos_in_practice_at_twitter_thumb.png" alt="Aurora + Mesos in Practice at Twitter Thumbnail" /></td>
-		<td><strong><a href="https://www.youtube.com/watch?v=1XYJGX_qZVU">Aurora + Mesos in Practice at Twitter (Video)</a></strong>
-		<p>Presented by Bill Farner</p>
-		<p>March 07, 2015 at <a href="http://www.bigeng.io/aurora-mesos-in-practice-at-twitter">Bigcommerce TechTalk</a></p></td>
-	</tr>
-	<tr>
-		<td><img src="images/presentations/02_28_2015_apache_aurora_thumb.png" alt="Apache Auroraの始めかた Slideshow Thumbnail" /></td>
-		<td><strong><a href="http://www.slideshare.net/zembutsu/apache-aurora-introduction-and-tutorial-osc15tk">Apache Auroraの始めかた (Slides)</a></strong>
-		<p>Presented by Masahito Zembutsu</p>
-		<p>February 28, 2015 at <a href="http://www.ospn.jp/osc2015-spring/">Open Source Conference 2015 Tokyo Spring</a></p></td>
-	</tr>
-	<tr>
-		<td><img src="images/presentations/02_19_2015_aurora_adopters_panel_thumb.png" alt="Apache Aurora Adopters Panel Video Thumbnail" /></td>
-		<td><strong><a href="https://www.youtube.com/watch?v=2Jsj0zFdRlg">Apache Aurora Adopters Panel (Video)</a></strong>
-		<p>Panelists Ben Staffin, Josh Adams, Bill Farner, Berk Demir</p>
-		<p>February 19, 2015 at <a href="http://www.meetup.com/Bay-Area-Mesos-User-Group/events/220279080/">Bay Area Mesos Users Group</a></p></td>
-	</tr>
-	<tr>
-		<td><img src="images/presentations/02_19_2015_aurora_at_twitter_thumb.png" alt="Operating Apache Aurora and Mesos at Twitter Video Thumbnail" /></td>
-		<td><strong><a href="https://www.youtube.com/watch?v=E4lxX6epM_U">Operating Apache Aurora and Mesos at Twitter (Video)</a></strong>
-		<p>Presented by Joe Smith</p>
-		<p>February 19, 2015 at <a href="http://www.meetup.com/Bay-Area-Mesos-User-Group/events/220279080/">Bay Area Mesos Users Group</a></p></td>
-	</tr>
-	<tr>
-		<td><img src="images/presentations/02_19_2015_aurora_at_tellapart_thumb.png" alt="Apache Aurora and Mesos at TellApart" /></td>
-		<td><strong><a href="https://www.youtube.com/watch?v=ZZXtXLvTXAE">Apache Aurora and Mesos at TellApart (Video)</a></strong>
-		<p>Presented by Steve Niemitz</p>
-		<p>February 19, 2015 at <a href="http://www.meetup.com/Bay-Area-Mesos-User-Group/events/220279080/">Bay Area Mesos Users Group</a></p></td>
-	</tr>
-	<tr>
-		<td><img src="images/presentations/08_21_2014_past_present_future_thumb.png" alt="Past, Present, and Future of the Aurora Scheduler Video Thumbnail" /></td>
-		<td><strong><a href="https://www.youtube.com/watch?v=Dsc5CPhKs4o">Past, Present, and Future of the Aurora Scheduler (Video)</a></strong>
-		<p>Presented by Bill Farner</p>
-		<p>August 21, 2014 at <a href="http://events.linuxfoundation.org/events/archive/2014/mesoscon">#MesosCon 2014</a></p></td>
-	</tr>
-	<tr>
-		<td><img src="images/presentations/03_25_2014_introduction_to_aurora_thumb.png" alt="Introduction to Apache Aurora Video Thumbnail" /></td>
-		<td><strong><a href="https://www.youtube.com/watch?v=asd_h6VzaJc">Introduction to Apache Aurora (Video)</a></strong>
-		<p>Presented by Bill Farner</p>
-		<p>March 25, 2014 at <a href="https://www.eventbrite.com/e/aurora-and-mesosframeworksmeetup-tickets-10850994617">Aurora and Mesos Frameworks Meetup</a></p></td>
-	</tr>
-</table>

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/reference/client-cluster-configuration.md
----------------------------------------------------------------------
diff --git a/docs/reference/client-cluster-configuration.md b/docs/reference/client-cluster-configuration.md
new file mode 100644
index 0000000..ee02ca1
--- /dev/null
+++ b/docs/reference/client-cluster-configuration.md
@@ -0,0 +1,93 @@
+# Client Cluster Configuration
+
+A cluster configuration file is used by the Aurora client to describe the Aurora clusters with
+which it can communicate. Ultimately this allows client users to reference clusters with short names
+like us-east and eu.
+
+A cluster configuration is formatted as JSON.  The simplest cluster configuration is one that
+communicates with a single (non-leader-elected) scheduler.  For example:
+
+    [{
+      "name": "example",
+      "scheduler_uri": "http://localhost:55555",
+    }]
+
+
+A configuration for a leader-elected scheduler would contain something like:
+
+    [{
+      "name": "example",
+      "zk": "192.168.33.7",
+      "scheduler_zk_path": "/aurora/scheduler"
+    }]
+
+
+The following properties may be set:
+
+  **Property**             | **Type** | **Description**
+  :------------------------| :------- | :--------------
+   **name**                | String   | Cluster name (Required)
+   **slave_root**          | String   | Path to mesos slave work dir (Required)
+   **slave_run_directory** | String   | Name of mesos slave run dir (Required)
+   **zk**                  | String   | Hostname of ZooKeeper instance used to resolve Aurora schedulers.
+   **zk_port**             | Integer  | Port of ZooKeeper instance used to locate Aurora schedulers (Default: 2181)
+   **scheduler_zk_path**   | String   | ZooKeeper path under which scheduler instances are registered.
+   **scheduler_uri**       | String   | URI of Aurora scheduler instance.
+   **proxy_url**           | String   | Used by the client to format URLs for display.
+   **auth_mechanism**      | String   | The authentication mechanism to use when communicating with the scheduler. (Default: UNAUTHENTICATED)
+
+
+## Details
+
+### `name`
+
+The name of the Aurora cluster represented by this entry. This name will be the `cluster` portion of
+any job keys identifying jobs running within the cluster.
+
+### `slave_root`
+
+The path on the mesos slaves where executing tasks can be found. It is used in combination with the
+`slave_run_directory` property by `aurora task run` and `aurora task ssh` to change into the sandbox
+directory after connecting to the host. This value should match the value passed to `mesos-slave`
+as `-work_dir`.
+
+### `slave_run_directory`
+
+The name of the directory where the task run can be found. This is used in combination with the
+`slave_root` property by `aurora task run` and `aurora task ssh` to change into the sandbox
+directory after connecting to the host. This should almost always be set to `latest`.
+
+### `zk`
+
+The hostname of the ZooKeeper instance used to resolve the Aurora scheduler. Aurora uses ZooKeeper
+to elect a leader. The client will connect to this ZooKeeper instance to determine the current
+leader. This host should match the host passed to the scheduler as `-zk_endpoints`.
+
+### `zk_port`
+
+The port on which the ZooKeeper instance is running. If not set this will default to the standard
+ZooKeeper port of 2181. This port should match the port in the host passed to the scheduler as
+`-zk_endpoints`.
+
+### `scheduler_zk_path`
+
+The path on the ZooKeeper instance under which the Aurora serverset is registered. This value should
+match the value passed to the scheduler as `-serverset_path`.
+
+### `scheduler_uri`
+
+The URI of the scheduler. This would be used in place of the ZooKeeper related configuration above
+in circumstances where direct communication with a single scheduler is needed (e.g. testing
+environments). It is strongly advised to **never** use this property for production deploys.
+
+### `proxy_url`
+
+If `proxy_url` is set, its value will be used as the base URL instead of the hostname of the
+leading scheduler. In that scenario the value for `proxy_url` would be, for example, the
+URL of your VIP in a load balancer or a round-robin DNS name.
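+
+For example, a sketch of a cluster entry fronted by a load balancer (the
+hostnames are illustrative):
+
+    [{
+      "name": "example",
+      "zk": "zk.example.com",
+      "scheduler_zk_path": "/aurora/scheduler",
+      "proxy_url": "https://aurora.example.com"
+    }]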
+
+### `auth_mechanism`
+
+The identifier of an authentication mechanism that the client should use when communicating with the
+scheduler. Support for values other than `UNAUTHENTICATED` requires a matching scheduler-side
+[security configuration](../operations/security.md).

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/reference/client-commands.md
----------------------------------------------------------------------
diff --git a/docs/reference/client-commands.md b/docs/reference/client-commands.md
new file mode 100644
index 0000000..84a8bd4
--- /dev/null
+++ b/docs/reference/client-commands.md
@@ -0,0 +1,326 @@
+Aurora Client Commands
+======================
+
+- [Introduction](#introduction)
+- [Cluster Configuration](#cluster-configuration)
+- [Job Keys](#job-keys)
+- [Modifying Aurora Client Commands](#modifying-aurora-client-commands)
+- [Regular Jobs](#regular-jobs)
+    - [Creating and Running a Job](#creating-and-running-a-job)
+    - [Running a Command On a Running Job](#running-a-command-on-a-running-job)
+    - [Killing a Job](#killing-a-job)
+    - [Adding Instances](#adding-instances)
+    - [Updating a Job](#updating-a-job)
+        - [Coordinated job updates](#user-content-coordinated-job-updates)
+    - [Renaming a Job](#renaming-a-job)
+    - [Restarting Jobs](#restarting-jobs)
+- [Cron Jobs](#cron-jobs)
+- [Comparing Jobs](#comparing-jobs)
+- [Viewing/Examining Jobs](#viewingexamining-jobs)
+    - [Listing Jobs](#listing-jobs)
+    - [Inspecting a Job](#inspecting-a-job)
+    - [Versions](#versions)
+    - [Checking Your Quota](#checking-your-quota)
+    - [Finding a Job on Web UI](#finding-a-job-on-web-ui)
+    - [Getting Job Status](#getting-job-status)
+    - [Opening the Web UI](#opening-the-web-ui)
+    - [SSHing to a Specific Task Machine](#sshing-to-a-specific-task-machine)
+    - [Templating Command Arguments](#templating-command-arguments)
+
+Introduction
+------------
+
+Once you have written an `.aurora` configuration file that describes
+your Job and its parameters and functionality, you interact with Aurora
+using Aurora Client commands. This document describes all of these commands
+and how and when to use them. All Aurora Client commands start with
+`aurora`, followed by the name of the specific command and its
+arguments.
+
+*Job keys* are a very common argument to Aurora commands, as well as the
+gateway to useful information about a Job. Before using Aurora, you
+should read the next section which describes them in detail. The section
+after that briefly describes how you can modify the behavior of certain
+Aurora Client commands, linking to a detailed document about how to do
+that.
+
+This is followed by the Regular Jobs section, which describes the basic
+Client commands for creating, running, and manipulating Aurora Jobs.
+After that are sections on Comparing Jobs and Viewing/Examining Jobs. In
+other words, various commands for getting information and metadata about
+Aurora Jobs.
+
+Cluster Configuration
+---------------------
+
+The client must be able to find a configuration file that specifies available clusters. This file
+declares shorthand names for clusters, which are in turn referenced by job configuration files
+and client commands.
+
+The client will load at most two configuration files, making both of their defined clusters
+available. The first is intended to be a system-installed cluster, using the path specified in
+the environment variable `AURORA_CONFIG_ROOT`, defaulting to `/etc/aurora/clusters.json` if the
+environment variable is not set. The second is a user-installed file, located at
+`~/.aurora/clusters.json`.
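+
+For example, to point the client at a system configuration installed in a non-default location
+(path is illustrative; the client is assumed to look for `clusters.json` under this directory):
+
+    export AURORA_CONFIG_ROOT=/opt/aurora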
+
+For more details on cluster configuration see the
+[Client Cluster Configuration](client-cluster-configuration.md) documentation.
+
+Job Keys
+--------
+
+A job key is a unique system-wide identifier for an Aurora-managed
+Job, for example `cluster1/web-team/test/experiment204`. It is a 4-tuple
+consisting of, in order, *cluster*, *role*, *environment*, and
+*jobname*, separated by /s. Cluster is the name of an Aurora
+cluster. Role is the Unix service account under which the Job
+runs. Environment is a namespace component like `devel`, `test`,
+`prod`, or `stagingN`. Jobname is the Job's name.
+
+The combination of all four values uniquely specifies the Job. If any
+one value differs from that of another job key, the two job keys
+refer to different Jobs. For example, the job keys
+`cluster1/tyg/prod/workhorse`, `cluster1/tyg/prod/workcamel`,
+`cluster2/tyg/prod/workhorse`, `cluster2/foo/prod/workhorse`, and
+`cluster1/tyg/test/workhorse` all refer to different Jobs.
+
+Role names are user accounts existing on the slave machines. If you don't know what accounts
+are available, contact your sysadmin.
+
+Environment names are namespaces; you can count on `prod`, `devel` and `test` existing.
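+
+A job key is the first argument to most client commands. For example, using the job key from
+above:
+
+    aurora job status cluster1/web-team/test/experiment204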
+
+Modifying Aurora Client Commands
+--------------------------------
+
+For certain Aurora Client commands, you can define hook methods that run
+either before or after an action that takes place during the command's
+execution, as well as based on whether the action finished successfully or failed
+during execution. Basically, a hook is code that lets you extend the
+command's actions. The hook executes on the client side, specifically on
+the machine executing Aurora commands.
+
+Hooks can be associated with these Aurora Client commands:
+
+  - `job create`
+  - `job kill`
+  - `job restart`
+
+The process for writing and activating them is complex enough
+that we explain it in a devoted document, [Hooks for Aurora Client API](client-hooks.md).
+
+Regular Jobs
+------------
+
+This section covers Aurora commands related to running, killing,
+renaming, updating, and restarting a basic Aurora Job.
+
+### Creating and Running a Job
+
+    aurora job create <job key> <configuration file>
+
+Creates and then runs a Job with the specified job key based on a `.aurora` configuration file.
+The configuration file may also contain and activate hook definitions.
+
+### Running a Command On a Running Job
+
+    aurora task run CLUSTER/ROLE/ENV/NAME[/INSTANCES] <cmd>
+
+Runs a shell command on all machines currently hosting shards of a
+single Job.
+
+`run` supports the same command line wildcards used to populate a Job's
+commands; i.e. anything in the `{{mesos.*}}` and `{{thermos.*}}`
+namespaces.
+
+### Killing a Job
+
+    aurora job killall CLUSTER/ROLE/ENV/NAME
+
+Kills all Tasks associated with the specified Job, blocking until all
+are terminated. Defaults to killing all instances in the Job.
+
+The `<configuration file>` argument for `kill` is optional. Use it only
+if it contains hook definitions and activations that affect the
+kill command.
+
+### Adding Instances
+
+    aurora job add CLUSTER/ROLE/ENV/NAME/INSTANCE <count>
+
+Adds `<count>` instances to the existing job. The configuration of the new instances is derived from
+an active job instance pointed to by the `/INSTANCE` part of the job specification. This command is
+a simpler way to scale out an existing job when an instance with desired task configuration
+already exists. Use `aurora update start` to add instances with a new (updated) configuration.
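+
+For example, the following (job key and count are illustrative) adds five instances whose
+configuration is cloned from instance 1 of the job:
+
+    aurora job add cluster1/www-data/prod/hello/1 5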
+
+### Updating a Job
+
+You can manage job updates using the `aurora update` command.  Please see
+[the Job Update documentation](../features/job-updates.md) for more details.
+
+
+### Renaming a Job
+
+Renaming is a tricky operation as downstream clients must be informed of
+the new name. A conservative approach
+to renaming suitable for production services is:
+
+1.  Modify the Aurora configuration file to change the role,
+    environment, and/or name as appropriate to the standardized naming
+    scheme.
+2.  Check that only these naming components have changed
+    with `aurora diff`.
+
+        aurora job diff CLUSTER/ROLE/ENV/NAME <job_configuration>
+
+3.  Create the (identical) job at the new key. You may need to request a
+    temporary quota increase.
+
+        aurora job create CLUSTER/ROLE/ENV/NEW_NAME <job_configuration>
+
+4.  Migrate all clients over to the new job key. Update all links and
+    dashboards. Ensure that both job keys run identical versions of the
+    code while in this state.
+5.  After verifying that all clients have successfully moved over, kill
+    the old job.
+
+        aurora job killall CLUSTER/ROLE/ENV/NAME
+
+6.  If you received a temporary quota increase, be sure to let the
+    powers that be know you no longer need the additional capacity.
+
+### Restarting Jobs
+
+`restart` restarts all shards of the Job identified by the given job key:
+
+    aurora job restart CLUSTER/ROLE/ENV/NAME[/INSTANCES]
+
+Restarts are controlled on the client side, so aborting
+the `job restart` command halts the restart operation.
+
+**Note**: `job restart` only applies its command line arguments; it neither
+uses nor is affected by `update.config`. Restarting
+does ***not*** involve a configuration change. To update the
+configuration, use the `aurora update` command as described above.
+
+The `--config` argument for restart is optional. Use it only
+if it contains hook definitions and activations that affect the
+`job restart` command.
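+
+For example, to restart only a subset of instances, append an instance range to the job key
+(job key and range are illustrative):
+
+    aurora job restart cluster1/www-data/prod/hello/0-4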
+
+Cron Jobs
+---------
+
+You can manage cron jobs using the `aurora cron` command.  Please see
+[the Cron Jobs Feature](../features/cron-jobs.md) for more details.
+
+Comparing Jobs
+--------------
+
+    aurora job diff CLUSTER/ROLE/ENV/NAME <job configuration>
+
+Compares a job configuration against a running job. By default the diff
+is determined using `diff`, though you may choose an alternate
+diff program by specifying the `DIFF_VIEWER` environment variable.
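+
+For example, assuming `vimdiff` (or any other diff-like program) is installed, you could view the
+comparison side by side with:
+
+    DIFF_VIEWER=vimdiff aurora job diff cluster1/www-data/prod/hello hello.aurora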
+
+Viewing/Examining Jobs
+----------------------
+
+Above we discussed creating, killing, and updating Jobs. Here we discuss
+how to view and examine Jobs.
+
+### Listing Jobs
+
+    aurora config list <job configuration>
+
+Lists all Jobs registered with the Aurora scheduler in the named cluster for the named role.
+
+### Inspecting a Job
+
+    aurora job inspect CLUSTER/ROLE/ENV/NAME <job configuration>
+
+`inspect` verifies that its specified job can be parsed from a
+configuration file, and displays the parsed configuration.
+
+### Checking Your Quota
+
+    aurora quota get CLUSTER/ROLE
+
+Prints the production quota allocated to the role in the given
+cluster. Only non-[dedicated](../features/constraints.md#dedicated-attribute)
+[production](configuration.md#job-objects) jobs consume quota.
+
+### Finding a Job on Web UI
+
+When you create a job, part of the output response contains a URL that goes
+to the job's scheduler UI page. For example:
+
+    vagrant@precise64:~$ aurora job create devcluster/www-data/prod/hello /vagrant/examples/jobs/hello_world.aurora
+    INFO] Creating job hello
+    INFO] Response from scheduler: OK (message: 1 new tasks pending for job www-data/prod/hello)
+    INFO] Job url: http://precise64:8081/scheduler/www-data/prod/hello
+
+You can go to the scheduler UI page for this job via `http://precise64:8081/scheduler/www-data/prod/hello`.
+You can go to the overall scheduler UI page by truncating that URL at `scheduler`: `http://precise64:8081/scheduler`.
+
+Once you click through to a role page, you see Jobs arranged
+separately by pending jobs, active jobs and finished jobs.
+Jobs are arranged by role, typically a service account for
+production jobs and user accounts for test or development jobs.
+
+### Getting Job Status
+
+    aurora job status <job_key>
+
+Returns the status of recent tasks associated with the Job specified by
+`job_key` in the given cluster. Typically this includes
+a mix of active tasks (running or assigned) and inactive tasks
+(successful, failed, and lost.)
+
+### Opening the Web UI
+
+Use the Job's web UI scheduler URL or the `aurora status` command to find out on which
+machines individual tasks are scheduled. You can open the web UI via the
+`open` command line command if invoked from your machine:
+
+    aurora job open [<cluster>[/<role>[/<env>/<job_name>]]]
+
+If only the cluster is specified, it goes directly to that cluster's
+scheduler main page. If the role is specified, it goes to the top-level
+role page. If the full job key is specified, it goes directly to the job
+page where you can inspect individual tasks.
+
+### SSHing to a Specific Task Machine
+
+    aurora task ssh <job_key> <shard number>
+
+You can have the Aurora client ssh directly to the machine that has been
+assigned a particular Job/shard number. This may be useful for quickly
+diagnosing issues such as performance issues or abnormal behavior on a
+particular machine.
+
+### Templating Command Arguments
+
+    aurora task run [-e] [-t THREADS] <job_key> -- <<command-line>>
+
+Given a job specification, run the supplied command on all hosts and
+return the output. You may use the standard Mustache templating rules:
+
+- `{{thermos.ports[name]}}` substitutes the specific named port of the
+  task assigned to this machine
+- `{{mesos.instance}}` substitutes the shard id of the job's task
+  assigned to this machine
+- `{{thermos.task_id}}` substitutes the task id of the job's task
+  assigned to this machine
+
+For example, the following type of pattern can be a powerful diagnostic
+tool:
+
+    aurora task run -t5 cluster1/tyg/devel/seizure -- \
+      'curl -s -m1 localhost:{{thermos.ports[http]}}/vars | grep uptime'
+
+By default, the command runs in the Task's sandbox. The `-e` option can
+run the command in the executor's sandbox. This is mostly useful for
+Aurora administrators.
+
+You can parallelize the runs by using the `-t` option.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/reference/client-hooks.md
----------------------------------------------------------------------
diff --git a/docs/reference/client-hooks.md b/docs/reference/client-hooks.md
new file mode 100644
index 0000000..ee64193
--- /dev/null
+++ b/docs/reference/client-hooks.md
@@ -0,0 +1,228 @@
+# Hooks for Aurora Client API
+
+You can execute hook methods around Aurora API Client methods when they are called by the Aurora Command Line commands.
+
+Explaining how hooks work is a bit tricky because of some indirection about what they apply to. Basically, a hook is code that executes when a particular Aurora Client API method runs, letting you extend the method's actions. The hook executes on the client side, specifically on the machine executing Aurora commands.
+
+The catch is that hooks are associated with Aurora Client API methods, which users don't directly call. Instead, users call Aurora Command Line commands, which call Client API methods during their execution. Since which hooks run depend on which Client API methods get called, you will need to know which Command Line commands call which API methods. Later on, there is a table showing the various associations.
+
+**Terminology Note**: From now on, "method(s)" refer to Client API methods, and "command(s)" refer to Command Line commands.
+
+- [Hook Types](#hook-types)
+- [Execution Order](#execution-order)
+- [Hookable Methods](#hookable-methods)
+- [Activating and Using Hooks](#activating-and-using-hooks)
+- [.aurora Config File Settings](#aurora-config-file-settings)
+- [Command Line](#command-line)
+- [Hooks Protocol](#hooks-protocol)
+  - [pre_ Methods](#pre_-methods)
+  - [err_ Methods](#err_-methods)
+  - [post_ Methods](#post_-methods)
+- [Generic Hooks](#generic-hooks)
+- [Hooks Process Checklist](#hooks-process-checklist)
+
+
+## Hook Types
+
+Hooks have three basic types, differing by when they run with respect to their associated method.
+
+`pre_<method_name>`: When its associated method is called, the `pre_` hook executes first, then the called method. If the `pre_` hook fails, the method never runs. Later code that expected the method to succeed may be affected by this, and the Aurora client may terminate as a result.
+
+Note that a `pre_` hook can error-trap internally so it does not
+return `False`. Designers/contributors of new `pre_` hooks should
+consider whether or not to error-trap them. You can error trap at the
+highest level very generally and always pass the `pre_` hook by
+returning `True`. For example:
+
+    def pre_create(...):
+      do_something()  # if do_something fails with an exception, the create_job is not attempted!
+      return True
+
+    # However...
+    def pre_create(...):
+      try:
+        do_something()  # may cause exception
+      except Exception:  # generic error trap will catch it
+        pass  # and ignore the exception
+      return True  # create_job will run in any case!
+
+`post_<method_name>`: A `post_` hook executes after its associated method successfully finishes running. If it fails, the already executed method is unaffected. A `post_` hook's error is trapped, and any later operations are unaffected.
+
+`err_<method_name>`: Executes only when its associated method returns a status other than OK or throws an exception. If an `err_` hook fails, the already executed method is unaffected. An `err_` hook's error is trapped, and any later operations are unaffected.
+
+## Execution Order
+
+A command with `pre_`, `post_`, and `err_` hooks defined and activated for its called method executes in the following order when the method successfully executes:
+
+1. Command called
+2. Command code executes
+3. Method Called
+4. `pre_` method hook runs
+5. Method runs and successfully finishes
+6. `post_` method hook runs
+7. Command code executes
+8. Command execution ends
+
+The following is what happens when, for the same command and hooks, the method associated with the command suffers an error and does not successfully finish executing:
+
+1. Command called
+2. Command code executes
+3. Method Called
+4. `pre_` method hook runs
+5. Method runs and fails
+6. `err_` method hook runs
+7. Command Code executes (if `err_` method does not end the command execution)
+8. Command execution ends
+
+Note that the `post_` and `err_` hooks for the same method can never both run for a single execution of that method.
+
+## Hookable Methods
+
+You can associate `pre_`, `post_`, and `err_` hooks with the following methods. Since you do not directly interact with the methods, but rather the Aurora Command Line commands that call them, for each method we also list the command(s) that can call the method. Note that a different method or methods may be called by a command depending on how the command's other code executes. Similarly, multiple commands can call the same method. We also list the methods' argument signatures, which are used by their associated hooks. <a name="Chart"></a>
+
+  Aurora Client API Method | Client API Method Argument Signature | Aurora Command Line Command
+  -------------------------| ------------------------------------- | ---------------------------
+  `create_job` | `self`, `config` | `job create`, `runtask`
+  `restart` | `self`, `job_key`, `shards`, `update_config`, `health_check_interval_seconds` | `job restart`
+  `kill_job` | `self`, `job_key`, `shards=None` | `job kill`
+  `start_cronjob` | `self`, `job_key` | `cron start`
+  `start_job_update` | `self`, `config`, `instances=None` | `update start`
+
+Some specific examples:
+
+* `pre_create_job` executes when a `create_job` method is called, and before the `create_job` method itself executes.
+
+* `post_cancel_update` executes after a `cancel_update` method has successfully finished running.
+
+* `err_kill_job` executes when the `kill_job` method is called, but doesn't successfully finish running.
+
+## Activating and Using Hooks
+
+By default, hooks are inactive. If you do not want to use hooks, you do not need to make any changes to your code. If you do want to use hooks, you will need to alter your `.aurora` config file to activate them both for the configuration as a whole as well as for individual `Job`s. And, of course, you will need to define in your config file what happens when a particular hook executes.
+
+## .aurora Config File Settings
+
+You can define a top-level `hooks` variable in any `.aurora` config file. `hooks` is a list of all objects that define hooks used by `Job`s defined in that config file. If you do not want to define any hooks for a configuration, `hooks` is optional.
+
+    hooks = [Object_with_defined_hooks1, Object_with_defined_hooks2]
+
+Be careful when assembling a config file using `include` on multiple smaller config files. If there are multiple files that assign a value to `hooks`, only the last assignment made will stick. For example, if `x.aurora` has `hooks = [a, b, c]` and `y.aurora` has `hooks = [d, e, f]` and `z.aurora` has, in this order, `include x.aurora` and `include y.aurora`, the `hooks` value will be `[d, e, f]`.
+
+Also, for any `Job` that you want to use hooks with, its `Job` definition in the `.aurora` config file must set an `enable_hooks` flag to `True` (it defaults to `False`). By default, hooks are disabled and you must enable them for `Job`s of your choice.
+
+To summarize, to use hooks for a particular job, you must both activate hooks for your config file as a whole, and for that job. Activating hooks only for individual jobs won't work, nor will only activating hooks for your config file as a whole. You must also specify the hooks' defining object in the `hooks` variable.
+
+Recall that `.aurora` config files are written in Pystachio. So the following turns on hooks for production jobs at cluster1 and cluster2, but leaves them off for similar jobs with a defined user role. Of course, you also need to list the objects that define the hooks in your config file's `hooks` variable.
+
+    import getpass
+
+    jobs = [
+      Job(enable_hooks = True, cluster = c, environment = 'prod') for c in ('cluster1', 'cluster2')
+    ]
+    # Hooks disabled for these jobs
+    jobs.extend(
+      Job(cluster = c, environment = 'prod', role = getpass.getuser()) for c in ('cluster1', 'cluster2'))
+
+## Command Line
+
+All Aurora Command Line commands now accept an `.aurora` config file as an optional parameter (some, of course, accept it as a required parameter). Whenever a command has a `.aurora` file parameter, any hooks specified and activated in the `.aurora` file can be used. For example:
+
+    aurora job restart cluster1/role/env/app myapp.aurora
+
+The command activates any hooks specified and activated in `myapp.aurora`. For the `restart` command, that is the only thing the `myapp.aurora` parameter does. So, if the command was the following, since there is no `.aurora` config file to specify any hooks, no hooks on the `restart` command can run.
+
+    aurora job restart cluster1/role/env/app
+
+## Hooks Protocol
+
+Any object defined in the `.aurora` config file can define hook methods. You should define your hook methods within a class, and then use the class name as a value in the `hooks` list in your config file.
+
+Note that you can define other methods in the class that its hook methods can call; all the logic of a hook does not have to be in its definition.
+
+The following example defines a class containing a `pre_kill_job` hook definition that calls another method defined in the class.
+
+    # Defines a method pre_kill_job
+    class KillConfirmer(object):
+      def confirm(self, msg):
+        return raw_input(msg).lower() == 'yes'
+
+      def pre_kill_job(self, job_key, shards=None):
+        shards = ('shards %s' % shards) if shards is not None else 'all shards'
+        return self.confirm('Are you sure you want to kill %s (%s)? (yes/no): '
+                            % (job_key, shards))
+
+### pre_ Methods
+
+`pre_` methods have the signature:
+
+    pre_<API method name>(self, <associated method's signature>)
+
+`pre_` methods have the same signature as their associated method, with the addition of `self` as the first parameter. See the [chart](#Chart) above for the mapping of parameters to methods. When writing `pre_` methods, you can use the `*` and `**` syntax to designate that all unspecified parameters are passed in a list to the `*`ed variable and all named parameters with values are passed as name/value pairs to the `**`ed variable.
+
+If this method returns False, the API command call aborts.
+
+### err_ Methods
+
+`err_` methods have the signature:
+
+    err_<API method name>(self, exc, <associated method's signature>)
+
+`err_` methods have the same signature as their associated method, with the addition of a first parameter `self` and a second parameter `exc`. `exc` is either a result with responseCode other than `ResponseCode.OK` or an `Exception`. See the [chart](#Chart) above for the mapping of parameters to methods. When writing `err_` methods, you can use the `*` and `**` syntax to designate that all unspecified parameters are passed in a list to the `*`ed variable and all named parameters with values are passed as name/value pairs to the `**`ed variable.
+
+`err_` method return codes are ignored.
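+
+A minimal sketch of an `err_` hook (class name and message are illustrative), following the
+`kill_job` signature from the chart above:
+
+    class KillAuditor(object):
+      def err_kill_job(self, exc, job_key, shards=None):
+        # Report the failure; the return value is ignored.
+        print('kill_job failed for %s: %s' % (job_key, exc))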
+
+### post_ Methods
+
+`post_` methods have the signature:
+
+    post_<API method name>(self, result, <associated method signature>)
+
+`post_` method parameters are `self`, then `result`, followed by the same parameter signature as their associated method. `result` is the result of the associated method call. See the [chart](#chart) above for the mapping of parameters to methods. When writing `post_` methods, you can use the `*` and `**` syntax to designate that all unspecified arguments are passed in a list to the `*`ed parameter and all unspecified named arguments with values are passed as name/value pairs to the `**`ed parameter.
+
+`post_` method return codes are ignored.
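+
+Similarly, a sketch of a `post_` hook (illustrative), following the `create_job` signature from
+the chart above:
+
+    class CreateAuditor(object):
+      def post_create_job(self, result, config):
+        # Inspect the result of the successful create_job call; the return value is ignored.
+        print('create_job completed with result: %s' % result)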
+
+## Generic Hooks
+
+There are seven Aurora API Methods which any of the three hook types can attach to. Thus, there are 21 possible hook/method combinations for a single `.aurora` config file. Say that you define `pre_` and `post_` hooks for the `restart` method. That leaves 19 undefined hook/method combinations; `err_restart` and the 3 `pre_`, `post_`, and `err_` hooks for each of the other 6 hookable methods. You can define what happens when any of these otherwise undefined 19 hooks execute via a generic hook, whose signature is:
+
+    generic_hook(self, hook_config, event, method_name, result_or_err, *args, **kw)
+
+where:
+
+* `hook_config` is a named tuple of `config` (the Pystachio `config` object) and `job_key`.
+
+* `event` is one of `pre`, `err`, or `post`, indicating which type of hook the generic hook is standing in for. For example, assume no specific hooks were defined for the `restart` API method. If `generic_hook` is defined and activated, and `restart` is called, `generic_hook` will effectively run as `pre_restart`, `post_restart`, and `err_restart`. You can use a selection statement on this value so that `generic_hook` will act differently based on whether it is standing in for a `pre_`, `post_`, or `err_` hook.
+
+* `method_name` is the Client API method name whose execution is causing this execution of the `generic_hook`.
+
+* `*args`, `**kw` are the API method arguments and keyword arguments respectively.
+* `result_or_err` is a tri-state parameter taking one of these three values:
+  1. `None` for `pre_` hooks
+  2. `result` for `post_` hooks
+  3. `exc` for `err_` hooks
+
+Example:
+
+    # Overrides the standard do-nothing generic_hook by adding a log writing operation.
+    from twitter.common import log
+
+    class Logger(object):
+      '''Adds to the log every time a hookable API method is called'''
+      def generic_hook(self, hook_config, event, method_name, result_or_err, *args, **kw):
+        log.info('%s: %s_%s of %s'
+                 % (self.__class__.__name__, event, method_name, hook_config.job_key))
+
+## Hooks Process Checklist
+
+1. In your `.aurora` config file, add a `hooks` variable. Note that you may want to define a `.aurora` file only for hook definitions and then include this file in multiple other config files in which you want to use the same hooks.
+
+        hooks = []
+
+2. In the `hooks` variable, list all objects that define hooks used by `Job`s defined in this config:
+
+        hooks = [Object_hook_definer1, Object_hook_definer2]
+
+3. For each job that uses hooks in this config file, add `enable_hooks = True` to the `Job` definition. Note that this is necessary even if you only want to use the generic hook.
+
+4. Write your `pre_`, `post_`, and `err_` hook definitions as part of an object definition in your `.aurora` config file.
+
+5. If desired, write your `generic_hook` definition as part of an object definition in your `.aurora` config file. Remember, the object must be listed as a member of `hooks`.
+
+6. If your Aurora command line command does not otherwise take an `.aurora` config file argument, add the appropriate `.aurora` file as an argument in order to define and activate the configuration's hooks.
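+
+Putting the checklist together, a minimal sketch of a config file fragment (object and job
+definitions are illustrative):
+
+    class Logger(object):
+      '''Stands in for any hookable API method that has no specific hook defined.'''
+      def generic_hook(self, hook_config, event, method_name, result_or_err, *args, **kw):
+        pass
+
+    # Activate the hook object for this configuration as a whole...
+    hooks = [Logger]
+
+    # ...and for each Job that should use it.
+    hello_world_job = Job(
+      name = 'hello_world',
+      cluster = 'cluster1',
+      role = 'www-data',
+      environment = 'devel',
+      enable_hooks = True,
+      task = SimpleTask(name = 'hello_world', command = 'echo hello world'))
+
+    jobs = [hello_world_job]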

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/reference/configuration-best-practices.md
----------------------------------------------------------------------
diff --git a/docs/reference/configuration-best-practices.md b/docs/reference/configuration-best-practices.md
new file mode 100644
index 0000000..71e4959
--- /dev/null
+++ b/docs/reference/configuration-best-practices.md
@@ -0,0 +1,187 @@
+Aurora Configuration Best Practices
+===================================
+
+Use As Few .aurora Files As Possible
+------------------------------------
+
+When creating your `.aurora` configuration, try to keep all versions of
+a particular job within the same `.aurora` file. For example, if you
+have separate jobs for `cluster1`, `cluster1` staging, `cluster1`
+testing, and `cluster2`, keep them as close together as possible.
+
+Constructs shared across multiple jobs owned by your team (e.g.
+team-level defaults or structural templates) can be split into separate
+`.aurora` files and included via the `include` directive.
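+
+For example (file name is illustrative), a job configuration can pull shared templates in with:
+
+    include('team_defaults.aurora')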
+
+
+Avoid Boilerplate
+------------------
+
+If you see repetition or find yourself copy and pasting any parts of
+your configuration, it's likely an opportunity for templating. Take the
+example below:
+
+`redundant.aurora` contains:
+
+    download = Process(
+      name = 'download',
+      cmdline = 'wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tar.bz2',
+      max_failures = 5,
+      min_duration = 1)
+
+    unpack = Process(
+      name = 'unpack',
+      cmdline = 'rm -rf Python-2.7.3 && tar xjf Python-2.7.3.tar.bz2',
+      max_failures = 5,
+      min_duration = 1)
+
+    build = Process(
+      name = 'build',
+      cmdline = 'pushd Python-2.7.3 && ./configure && make && popd',
+      max_failures = 1)
+
+    email = Process(
+      name = 'email',
+      cmdline = 'echo Success | mail feynman@tmc.com',
+      max_failures = 5,
+      min_duration = 1)
+
+    build_python = Task(
+      name = 'build_python',
+      processes = [download, unpack, build, email],
+      constraints = [Constraint(order = ['download', 'unpack', 'build', 'email'])])
+
+As you'll notice, there's a lot of repetition in the `Process`
+definitions. For example, almost every process sets a `max_failures`
+limit to 5 and a `min_duration` to 1. This is an opportunity for factoring
+into a common process template.
+
+Furthermore, the Python version is repeated everywhere. This can be
+bound via structural templating as described in the [Advanced Binding](configuration-templating.md#AdvancedBinding)
+section.
+
+`less_redundant.aurora` contains:
+
+    class Python(Struct):
+      version = Required(String)
+      base = Default(String, 'Python-{{version}}')
+      package = Default(String, '{{base}}.tar.bz2')
+
+    ReliableProcess = Process(
+      max_failures = 5,
+      min_duration = 1)
+
+    download = ReliableProcess(
+      name = 'download',
+      cmdline = 'wget http://www.python.org/ftp/python/{{python.version}}/{{python.package}}')
+
+    unpack = ReliableProcess(
+      name = 'unpack',
+      cmdline = 'rm -rf {{python.base}} && tar xjf {{python.package}}')
+
+    build = ReliableProcess(
+      name = 'build',
+      cmdline = 'pushd {{python.base}} && ./configure && make && popd',
+      max_failures = 1)
+
+    email = ReliableProcess(
+      name = 'email',
+      cmdline = 'echo Success | mail {{role}}@foocorp.com')
+
+    build_python = SequentialTask(
+      name = 'build_python',
+      processes = [download, unpack, build, email]).bind(python = Python(version = "2.7.3"))
+
+
+Thermos Uses bash, But Thermos Is Not bash
+-------------------------------------------
+
+#### Bad
+
+Many tiny Processes make for harder-to-manage configurations.
+
+    copy = Process(
+      name = 'copy',
+      cmdline = 'rcp user@my_machine:my_application .'
+    )
+
+    unpack = Process(
+      name = 'unpack',
+      cmdline = 'unzip app.zip'
+    )
+
+    remove = Process(
+      name = 'remove',
+      cmdline = 'rm -f app.zip'
+    )
+
+    run = Process(
+      name = 'app',
+      cmdline = 'java -jar app.jar'
+    )
+
+    run_task = Task(
+      processes = [copy, unpack, remove, run],
+      constraints = order(copy, unpack, remove, run)
+    )
+
+#### Good
+
+Each `cmdline` runs in a bash subshell, so you have the full power of
+bash. Chaining commands with `&&` or `||` is almost always the right
+thing to do.
+
+Also for Tasks that are simply a list of processes that run one after
+another, consider using the `SequentialTask` helper which applies a
+linear ordering constraint for you.
+
+    stage = Process(
+      name = 'stage',
+      cmdline = 'rcp user@my_machine:my_application . && unzip app.zip && rm -f app.zip')
+
+    run = Process(name = 'app', cmdline = 'java -jar app.jar')
+
+    run_task = SequentialTask(processes = [stage, run])
+
+
+Rarely Use Functions In Your Configurations
+-------------------------------------------
+
+90% of the time you define a function in a `.aurora` file, you're
+probably Doing It Wrong(TM).
+
+#### Bad
+
+    def get_my_task(name, user, cpu, ram, disk):
+      return Task(
+        name = name,
+        user = user,
+        processes = [STAGE_PROCESS, RUN_PROCESS],
+        constraints = order(STAGE_PROCESS, RUN_PROCESS),
+        resources = Resources(cpu = cpu, ram = ram, disk = disk)
+      )
+
+    task_one = get_my_task('task_one', 'feynman', 1.0, 32*MB, 1*GB)
+    task_two = get_my_task('task_two', 'feynman', 2.0, 64*MB, 1*GB)
+
+#### Good
+
+This one is more idiomatic. Forced keyword arguments prevent accidents,
+e.g. constructing a task with "32*MB" when you mean 32MB of ram and not
+disk. Less proliferation of task-construction techniques means
+easier-to-read, quicker-to-understand, and a more composable
+configuration.
+
+    TASK_TEMPLATE = SequentialTask(
+      user = 'wickman',
+      processes = [STAGE_PROCESS, RUN_PROCESS],
+    )
+
+    task_one = TASK_TEMPLATE(
+      name = 'task_one',
+      resources = Resources(cpu = 1.0, ram = 32*MB, disk = 1*GB)
+    )
+
+    task_two = TASK_TEMPLATE(
+      name = 'task_two',
+      resources = Resources(cpu = 2.0, ram = 64*MB, disk = 1*GB)
+    )

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/reference/configuration-templating.md
----------------------------------------------------------------------
diff --git a/docs/reference/configuration-templating.md b/docs/reference/configuration-templating.md
new file mode 100644
index 0000000..c54bbbf
--- /dev/null
+++ b/docs/reference/configuration-templating.md
@@ -0,0 +1,306 @@
+Aurora Configuration Templating
+===============================
+
+The `.aurora` file format is just Python. However, `Job`, `Task`,
+`Process`, and other classes are defined by a templating library called
+*Pystachio*, a powerful tool for configuration specification and reuse.
+
+[Aurora Configuration Reference](configuration.md)
+has a full reference of all Aurora/Thermos defined Pystachio objects.
+
+When writing your `.aurora` file, you may use any Pystachio datatypes, as
+well as any objects shown in the *Aurora+Thermos Configuration
+Reference* without `import` statements - the Aurora config loader
+injects them automatically. Other than that the `.aurora` format
+works like any other Python script.
+
+
+Templating 1: Binding in Pystachio
+----------------------------------
+
+Pystachio uses the visually distinctive {{}} to indicate template
+variables. These are often called "mustache variables" after the
+similarly appearing variables in the Mustache templating system and
+because the curly braces resemble mustaches.
+
+If you are familiar with the Mustache system, templates in Pystachio
+have significant differences. They have no nesting, joining, or
+inheritance semantics. On the other hand, templates are evaluated
+iteratively, so this affords some level of indirection.
+
+Let's start with the simplest template: text with one
+variable, in this case `name`:
+
+    Hello {{name}}
+
+If we evaluate this as is, we'd get back:
+
+    Hello
+
+If a template variable doesn't have a value, when evaluated it's
+replaced with nothing. If we add a binding to give it a value:
+
+    { "name" : "Tom" }
+
+We'd get back:
+
+    Hello Tom
+
+Every Pystachio object has an associated `.bind` method that can bind
+values to {{}} variables. Bindings are not immediately evaluated.
+Instead, they are evaluated only when the interpolated value of the
+object is necessary, e.g. for performing equality or serializing a
+message over the wire.
+
+Objects with and without mustache templated variables behave
+differently:
+
+    >>> Float(1.5)
+    Float(1.5)
+
+    >>> Float('{{x}}.5')
+    Float({{x}}.5)
+
+    >>> Float('{{x}}.5').bind(x = 1)
+    Float(1.5)
+
+    >>> Float('{{x}}.5').bind(x = 1) == Float(1.5)
+    True
+
+    >>> contextual_object = String('{{metavar{{number}}}}').bind(
+    ... metavar1 = "first", metavar2 = "second")
+
+    >>> contextual_object
+    String({{metavar{{number}}}})
+
+    >>> contextual_object.bind(number = 1)
+    String(first)
+
+    >>> contextual_object.bind(number = 2)
+    String(second)
+
+You usually bind simple key to value pairs, but you can also bind three
+other objects: lists, dictionaries, and structurals. These will be
+described in detail later.
+
+
+### Structurals in Pystachio / Aurora
+
+Most Aurora/Thermos users don't ever (knowingly) interact with `String`,
+`Float`, or `Integer` Pystachio objects directly. Instead they interact
+with derived structural (`Struct`) objects that are collections of
+fundamental and structural objects. The structural object components are
+called *attributes*. Aurora's most used structural objects are `Job`,
+`Task`, and `Process`:
+
+    class Process(Struct):
+      cmdline = Required(String)
+      name = Required(String)
+      max_failures = Default(Integer, 1)
+      daemon = Default(Boolean, False)
+      ephemeral = Default(Boolean, False)
+      min_duration = Default(Integer, 5)
+      final = Default(Boolean, False)
+
+Construct default objects by following the object's type with (). If you
+want an attribute to have a value different from its default, include
+the attribute name and value inside the parentheses.
+
+    >>> Process()
+    Process(daemon=False, max_failures=1, ephemeral=False,
+      min_duration=5, final=False)
+
+Attribute values can be template variables, which then receive specific
+values when creating the object.
+
+    >>> Process(cmdline = 'echo {{message}}')
+    Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5,
+            cmdline=echo {{message}}, final=False)
+
+    >>> Process(cmdline = 'echo {{message}}').bind(message = 'hello world')
+    Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5,
+            cmdline=echo hello world, final=False)
+
+A powerful binding property is that all of an object's children inherit its
+bindings:
+
+    >>> List(Process)([
+    ... Process(name = '{{prefix}}_one'),
+    ... Process(name = '{{prefix}}_two')
+    ... ]).bind(prefix = 'hello')
+    ProcessList(
+      Process(daemon=False, name=hello_one, max_failures=1, ephemeral=False, min_duration=5, final=False),
+      Process(daemon=False, name=hello_two, max_failures=1, ephemeral=False, min_duration=5, final=False)
+      )
+
+Remember that an Aurora Job contains Tasks which contain Processes. A
+Job level binding is inherited by its Tasks and all their Processes.
+Similarly a Task level binding is available to that Task and its
+Processes but is *not* visible at the Job level (inheritance is a
+one-way street.)
+
+#### Mustaches Within Structurals
+
+When you define a `Struct` schema, one powerful, but confusing, feature
+is that all of that structure's attributes are Mustache variables within
+the enclosing scope *once they have been populated*.
+
+For example, when `Process` is defined above, all its attributes, such as
+`{{name}}`, `{{cmdline}}`, and `{{max_failures}}`, are immediately
+defined as Mustache variables, implicitly bound into the `Process`, and
+are inherited by all child objects once they are defined.
+
+Thus, you can do the following:
+
+    >>> Process(name = "installer", cmdline = "echo {{name}} is running")
+    Process(daemon=False, name=installer, max_failures=1, ephemeral=False, min_duration=5,
+            cmdline=echo installer is running, final=False)
+
+WARNING: This binding only takes place in one direction. For example,
+the following does NOT work and does not set the `Process` `name`
+attribute's value.
+
+    >>> Process().bind(name = "installer")
+    Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5, final=False)
+
+The following is also not possible and results in an infinite loop that
+attempts to resolve `Process.name`.
+
+    >>> Process(name = '{{name}}').bind(name = 'installer')
+
+Do not confuse Structural attributes with bound Mustache variables.
+Attributes are implicitly converted to Mustache variables but not vice
+versa.
+
+### Templating 2: Structurals Are Factories
+
+#### A Second Way of Templating
+
+A second templating method is as powerful as the one described above and
+is often confused with it. This method is due to the automatic conversion of
+Struct attributes to Mustache variables as described above.
+
+Suppose you create a Process object:
+
+    >>> p = Process(name = "process_one", cmdline = "echo hello world")
+
+    >>> p
+    Process(daemon=False, name=process_one, max_failures=1, ephemeral=False, min_duration=5,
+            cmdline=echo hello world, final=False)
+
+This `Process` object, "`p`", can be used wherever a `Process` object is
+needed. It can also be reused by changing the value(s) of its
+attribute(s). Here we change its `name` attribute from `process_one` to
+`process_two`.
+
+    >>> p(name = "process_two")
+    Process(daemon=False, name=process_two, max_failures=1, ephemeral=False, min_duration=5,
+            cmdline=echo hello world, final=False)
+
+Template creation is a common use for this technique:
+
+    >>> Daemon = Process(daemon = True)
+    >>> logrotate = Daemon(name = 'logrotate', cmdline = './logrotate conf/logrotate.conf')
+    >>> mysql = Daemon(name = 'mysql', cmdline = 'bin/mysqld --safe-mode')
+
+### Advanced Binding
+
+As described above, `.bind()` binds simple strings or numbers to
+Mustache variables. In addition to Structural types formed by combining
+atomic types, Pystachio has two container types, `List` and `Map`, which
+can also be bound via `.bind()`.
+
+#### Bind Syntax
+
+The `bind()` function can take Python dictionaries or `kwargs`
+interchangeably (when "`kwargs`" is in a function definition, `kwargs`
+receives a Python dictionary containing all keyword arguments after the
+formal parameter list).
+
+    >>> String('{{foo}}').bind(foo = 'bar') == String('{{foo}}').bind({'foo': 'bar'})
+    True
+
+Bindings done "closer" to the object in question take precedence:
+
+    >>> p = Process(name = '{{context}}_process')
+    >>> t = Task().bind(context = 'global')
+    >>> t(processes = [p, p.bind(context = 'local')])
+    Task(processes=ProcessList(
+      Process(daemon=False, name=global_process, max_failures=1, ephemeral=False, final=False,
+              min_duration=5),
+      Process(daemon=False, name=local_process, max_failures=1, ephemeral=False, final=False,
+              min_duration=5)
+    ))
+
+#### Binding Complex Objects
+
+##### Lists
+
+    >>> fibonacci = List(Integer)([1, 1, 2, 3, 5, 8, 13])
+    >>> String('{{fib[4]}}').bind(fib = fibonacci)
+    String(5)
+
+##### Maps
+
+    >>> first_names = Map(String, String)({'Kent': 'Clark', 'Wayne': 'Bruce', 'Prince': 'Diana'})
+    >>> String('{{first[Kent]}}').bind(first = first_names)
+    String(Clark)
+
+##### Structurals
+
+    >>> String('{{p.cmdline}}').bind(p = Process(cmdline = "echo hello world"))
+    String(echo hello world)
+
+### Structural Binding
+
+Use structural templates when binding more than two or three individual
+values at the Job or Task level. For fewer than two or three, standard
+key to string binding is sufficient.
+
+Structural binding is a very powerful pattern and is most useful in
+Aurora/Thermos for doing Structural configuration. For example, you can
+define a job profile. The following profile uses `HDFS`, the Hadoop
+Distributed File System, to designate a file's location. `HDFS` does
+not come with Aurora, so you'll need to either install it separately
+or change the way the dataset is designated.
+
+    class Profile(Struct):
+      version = Required(String)
+      environment = Required(String)
+      dataset = Default(String, 'hdfs://home/aurora/data/{{environment}}')
+
+    PRODUCTION = Profile(version = 'live', environment = 'prod')
+    DEVEL = Profile(version = 'latest',
+                    environment = 'devel',
+                    dataset = 'hdfs://home/aurora/data/test')
+    TEST = Profile(version = 'latest', environment = 'test')
+
+    JOB_TEMPLATE = Job(
+      name = 'application',
+      role = 'myteam',
+      cluster = 'cluster1',
+      environment = '{{profile.environment}}',
+      task = SequentialTask(
+        name = 'task',
+        resources = Resources(cpu = 2, ram = 4*GB, disk = 8*GB),
+        processes = [
+          Process(name = 'main',
+                  cmdline = 'java -jar application.jar -hdfsPath {{profile.dataset}}')
+        ]
+       )
+     )
+
+    jobs = [
+      JOB_TEMPLATE(instances = 100).bind(profile = PRODUCTION),
+      JOB_TEMPLATE.bind(profile = DEVEL),
+      JOB_TEMPLATE.bind(profile = TEST),
+     ]
+
+In this case, a custom structural "Profile" is created to self-document
+the configuration to some degree. This also allows some schema
+"type-checking", and for default self-substitution, e.g. in
+`Profile.dataset` above.
+
+So rather than a `.bind()` with a half-dozen substituted variables, you
+can bind a single object that has sensible defaults stored in a single
+place.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/reference/configuration-tutorial.md
----------------------------------------------------------------------
diff --git a/docs/reference/configuration-tutorial.md b/docs/reference/configuration-tutorial.md
new file mode 100644
index 0000000..4390cd6
--- /dev/null
+++ b/docs/reference/configuration-tutorial.md
@@ -0,0 +1,511 @@
+Aurora Configuration Tutorial
+=============================
+
+How to write Aurora configuration files, including feature descriptions
+and best practices. When writing a configuration file, make use of
+`aurora job inspect`. It takes the same job key and configuration file
+arguments as `aurora job create` or `aurora update start`. It first ensures the
+configuration parses, then outputs it in human-readable form.
+
+You should read this after going through the general [Aurora Tutorial](../getting-started/tutorial.md).
+
+- [The Basics](#user-content-the-basics)
+	- [Use Bottom-To-Top Object Ordering](#user-content-use-bottom-to-top-object-ordering)
+- [An Example Configuration File](#user-content-an-example-configuration-file)
+- [Defining Process Objects](#user-content-defining-process-objects)
+- [Getting Your Code Into The Sandbox](#user-content-getting-your-code-into-the-sandbox)
+- [Defining Task Objects](#user-content-defining-task-objects)
+	- [SequentialTask: Running Processes in Parallel or Sequentially](#user-content-sequentialtask-running-processes-in-parallel-or-sequentially)
+	- [SimpleTask](#user-content-simpletask)
+	- [Combining tasks](#user-content-combining-tasks)
+- [Defining Job Objects](#user-content-defining-job-objects)
+- [The jobs List](#user-content-the-jobs-list)
+- [Basic Examples](#basic-examples)
+
+
+The Basics
+----------
+
+To run a job on Aurora, you must specify a configuration file that tells
+Aurora what it needs to know to schedule the job, what Mesos needs to
+run the tasks the job is made up of, and what Thermos needs to run the
+processes that make up the tasks. This file must have
+a `.aurora` suffix.
+
+A configuration file defines a collection of objects, along with parameter
+values for their attributes. An Aurora configuration file contains the
+following three types of objects:
+
+- Job
+- Task
+- Process
+
+A configuration also specifies a list of `Job` objects assigned
+to the variable `jobs`.
+
+- jobs (list of defined Jobs to run)
+
+The `.aurora` file format is just Python. However, `Job`, `Task`,
+`Process`, and other classes are defined by a type-checked dictionary
+templating library called *Pystachio*, a powerful tool for
+configuration specification and reuse. Pystachio objects are tailored
+via {{}} surrounded templates.
+
+When writing your `.aurora` file, you may use any Pystachio datatypes, as
+well as any objects shown in the [*Aurora+Thermos Configuration
+Reference*](configuration.md), without `import` statements - the
+Aurora config loader injects them automatically. Other than that, an `.aurora`
+file works like any other Python script.
+
+[*Aurora Configuration Reference*](configuration.md)
+has a full reference of all Aurora/Thermos defined Pystachio objects.
+
+### Use Bottom-To-Top Object Ordering
+
+A well-structured configuration starts with structural templates (if
+any). Structural templates encapsulate in their attributes all the
+differences between Jobs in the configuration that are not directly
+manipulated at the `Job` level, but typically at the `Process` or `Task`
+level. For example, if certain processes are invoked with slightly
+different settings or input.
+
+After structural templates, define, in order, `Process`es, `Task`s, and
+`Job`s.
+
+Structural template names should be *UpperCamelCased* and their
+instantiations are typically *UPPER\_SNAKE\_CASED*. `Process`, `Task`,
+and `Job` names are typically *lower\_snake\_cased*. Indentation is typically 2
+spaces.
+
+An Example Configuration File
+-----------------------------
+
+The following is a typical configuration file. Don't worry if there are
+parts you don't understand yet, but you may want to refer back to this
+as you read about its individual parts. Note that names surrounded by
+curly braces {{}} are template variables, which the system replaces with
+bound values for the variables.
+
+    # --- templates here ---
+	class Profile(Struct):
+	  package_version = Default(String, 'live')
+	  java_binary = Default(String, '/usr/lib/jvm/java-1.7.0-openjdk/bin/java')
+	  extra_jvm_options = Default(String, '')
+	  parent_environment = Default(String, 'prod')
+	  parent_serverset = Default(String,
+                                 '/foocorp/service/bird/{{parent_environment}}/bird')
+
+	# --- processes here ---
+	main = Process(
+	  name = 'application',
+	  cmdline = '{{profile.java_binary}} -server -Xmx1792m '
+	            '{{profile.extra_jvm_options}} '
+	            '-jar application.jar '
+	            '-upstreamService {{profile.parent_serverset}}'
+	)
+
+	# --- tasks ---
+	base_task = SequentialTask(
+	  name = 'application',
+	  processes = [
+	    Process(
+	      name = 'fetch',
+	      cmdline = 'curl -O https://packages.foocorp.com/{{profile.package_version}}/application.jar'),
+	  ]
+	)
+
+        # not always necessary but often useful to have separate task
+        # resource classes
+        staging_task = base_task(resources =
+                         Resources(cpu = 1.0,
+                                   ram = 2048*MB,
+                                   disk = 1*GB))
+	production_task = base_task(resources =
+                            Resources(cpu = 4.0,
+                                      ram = 2560*MB,
+                                      disk = 10*GB))
+
+	# --- job template ---
+	job_template = Job(
+	  name = 'application',
+	  role = 'myteam',
+	  contact = 'myteam-team@foocorp.com',
+	  instances = 20,
+	  service = True,
+	  task = production_task
+	)
+
+	# -- profile instantiations (if any) ---
+	PRODUCTION = Profile()
+	STAGING = Profile(
+	  extra_jvm_options = '-Xloggc:gc.log',
+	  parent_environment = 'staging'
+	)
+
+	# -- job instantiations --
+	jobs = [
+          job_template(cluster = 'cluster1', environment = 'prod')
+	               .bind(profile = PRODUCTION),
+
+          job_template(cluster = 'cluster2', environment = 'prod')
+	                .bind(profile = PRODUCTION),
+
+          job_template(cluster = 'cluster1',
+                        environment = 'staging',
+			service = False,
+			task = staging_task,
+			instances = 2)
+			.bind(profile = STAGING),
+	]
+
+## Defining Process Objects
+
+Processes are handled by the Thermos system. A process is a single
+executable step run as a part of an Aurora task, which consists of a
+bash-executable statement.
+
+The key (and required) `Process` attributes are:
+
+-   `name`: Any string which is a valid Unix filename (no slashes,
+    NULLs, or leading periods). The `name` value must be unique relative
+    to other Processes in a `Task`.
+-   `cmdline`: A command line run in a bash subshell, so you can use
+    bash scripts. Nothing is supplied for command-line arguments,
+    so `$*` is unspecified.
+
+Many tiny processes make managing configurations more difficult. For
+example, the following is a bad way to define processes.
+
+    copy = Process(
+      name = 'copy',
+      cmdline = 'curl -O https://packages.foocorp.com/app.zip'
+    )
+    unpack = Process(
+      name = 'unpack',
+      cmdline = 'unzip app.zip'
+    )
+    remove = Process(
+      name = 'remove',
+      cmdline = 'rm -f app.zip'
+    )
+    run = Process(
+      name = 'app',
+      cmdline = 'java -jar app.jar'
+    )
+    run_task = Task(
+      processes = [copy, unpack, remove, run],
+      constraints = order(copy, unpack, remove, run)
+    )
+
+Since `cmdline` runs in a bash subshell, you can chain commands
+with `&&` or `||`.
+
+When defining a `Task` that is just a list of Processes run in a
+particular order, use `SequentialTask`, as described in the [*Defining*
+`Task` *Objects*](#Task) section. The following simplifies and combines the
+above multiple `Process` definitions into just two.
+
+    stage = Process(
+      name = 'stage',
+      cmdline = 'curl -O https://packages.foocorp.com/app.zip && '
+                'unzip app.zip && rm -f app.zip')
+
+    run = Process(name = 'app', cmdline = 'java -jar app.jar')
+
+    run_task = SequentialTask(processes = [stage, run])
+
+`Process` also has optional attributes to customize its behaviour. Details can be found in the [Aurora Configuration Reference](configuration.md#process-objects).
+
+
+## Getting Your Code Into The Sandbox
+
+When using Aurora, you need to get your executable code into its "sandbox", specifically
+the Task sandbox where the code executes for the Processes that make up that Task.
+
+Each Task has a sandbox created when the Task starts and garbage
+collected when it finishes. All of a Task's processes run in its
+sandbox, so processes can share state by using a shared current
+working directory.
+
+Typically, you save this code somewhere. You then need to define a Process
+in your `.aurora` configuration file that fetches the code from that somewhere
+to where the slave can see it. For a public cloud, that can be anywhere public on
+the Internet, such as S3. For private cloud internal storage, you need to put it
+on an accessible HDFS cluster or similar storage.
+
+The template for this Process is:
+
+    <name> = Process(
+      name = '<name>',
+      cmdline = '<command to copy and extract code archive into current working directory>'
+    )
+
+Note: Be sure the extracted code archive has an executable.
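+
+A minimal sketch of such a fetch process (URL and archive name are illustrative):
+
+    fetch_app = Process(
+      name = 'fetch_app',
+      cmdline = 'curl -O https://packages.example.com/myapp/app.tar.gz && tar xzf app.tar.gz')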
+
+## Defining Task Objects
+
+Tasks are handled by Mesos. A task is a collection of processes that
+runs in a shared sandbox. It's the fundamental unit Aurora uses to
+schedule the datacenter; essentially what Aurora does is find places
+in the cluster to run tasks.
+
+The key (and required) parts of a Task are:
+
+-   `name`: A string giving the Task's name. By default, if a Task is
+    not given a name, it inherits the first name in its Process list.
+
+-   `processes`: An unordered list of Process objects bound to the Task.
+    The value of the optional `constraints` attribute affects the
+    contents as a whole. Currently, the only constraint, `order`, determines if
+    the processes run in parallel or sequentially.
+
+-   `resources`: A `Resources` object defining the Task's resource
+    footprint. A `Resources` object has three attributes:
+    -   `cpu`: A Float, the fractional number of cores the Task requires.
+    -   `ram`: An Integer, RAM bytes the Task requires.
+    -   `disk`: An Integer, disk bytes the Task requires.
+
+A basic Task definition looks like:
+
+    Task(
+        name="hello_world",
+        processes=[Process(name = "hello_world", cmdline = "echo hello world")],
+        resources=Resources(cpu = 1.0,
+                            ram = 1*GB,
+                            disk = 1*GB))
+
+A Task has optional attributes to customize its behaviour. Details can be found in the [Aurora Configuration Reference](configuration.md#task-object)
+
+
+### SequentialTask: Running Processes in Parallel or Sequentially
+
+By default, a Task with several Processes runs them in parallel. There
+are two ways to run Processes sequentially:
+
+-   Include an `order` constraint in the Task definition's `constraints`
+    attribute whose arguments specify the processes' run order:
+
+        Task( ... processes=[process1, process2, process3],
+              constraints = order(process1, process2, process3), ...)
+
+-   Use `SequentialTask` instead of `Task`; it automatically runs
+    processes in the order specified in the `processes` attribute. No
+    `constraint` parameter is needed:
+
+        SequentialTask( ... processes=[process1, process2, process3] ...)
+
+### SimpleTask
+
+For quickly creating simple tasks, use the `SimpleTask` helper. It
+creates a basic task from a provided name and command line using a
+default set of resources. For example, in a `.aurora` configuration
+file:
+
+    SimpleTask(name="hello_world", command="echo hello world")
+
+is equivalent to
+
+    Task(name="hello_world",
+         processes=[Process(name = "hello_world", cmdline = "echo hello world")],
+         resources=Resources(cpu = 1.0,
+                             ram = 1*GB,
+                             disk = 1*GB))
+
+The simplest idiomatic Job configuration thus becomes:
+
+    import os
+    hello_world_job = Job(
+      task=SimpleTask(name="hello_world", command="echo hello world"),
+      role=os.getenv('USER'),
+      cluster="cluster1")
+
+When written to `hello_world.aurora`, you invoke it with a simple
+`aurora job create cluster1/$USER/test/hello_world hello_world.aurora`.
+
+### Combining tasks
+
+`Tasks.concat` (synonym: `concat_tasks`) and
+`Tasks.combine` (synonym: `combine_tasks`) merge multiple Task definitions
+into a single Task. It may be easier to define complex Jobs
+as smaller constituent Tasks. But since a Job only includes a single
+Task, the subtasks must be combined before using them in a Job.
+Smaller Tasks can also be reused between Jobs, instead of having to
+repeat their definition for multiple Jobs.
+
+With both methods, the merged Task takes the first Task's name. The
+difference between the two is the result Task's process ordering.
+
+-   `Tasks.combine` runs its subtasks' processes in no particular order.
+    The new Task's resource consumption is the sum of all its subtasks'
+    consumption.
+
+-   `Tasks.concat` runs its subtasks in the order supplied, with each
+    subtask's processes run serially between tasks. It is analogous to
+    the `order` constraint helper, except at the Task level instead of
+    the Process level. The new Task's resource consumption is the
+    maximum value specified by any subtask for each Resource attribute
+    (cpu, ram and disk).
+
+For example, given the following:
+
+    setup_task = Task(
+      ...
+      processes=[download_interpreter, update_zookeeper],
+      # It is important to note that {{Tasks.concat}} has
+      # no effect on the ordering of the processes within a task;
+      # hence the necessity of the {{order}} statement below
+      # (otherwise, the order in which {{download_interpreter}}
+      # and {{update_zookeeper}} run will be non-deterministic)
+      constraints=order(download_interpreter, update_zookeeper),
+      ...
+    )
+
+    run_task = SequentialTask(
+      ...
+      processes=[download_application, start_application],
+      ...
+    )
+
+    combined_task = Tasks.concat(setup_task, run_task)
+
+The `Tasks.concat` command merges the two Tasks into a single Task and
+ensures all processes in `setup_task` run before the processes
+in `run_task`. Conceptually, the task is reduced to:
+
+    task = Task(
+      ...
+      processes=[download_interpreter, update_zookeeper,
+                 download_application, start_application],
+      constraints=order(download_interpreter, update_zookeeper,
+                        download_application, start_application),
+      ...
+    )
+
+In the case of `Tasks.combine`, the two schedules run in parallel:
+
+    task = Task(
+      ...
+      processes=[download_interpreter, update_zookeeper,
+                 download_application, start_application],
+      constraints=order(download_interpreter, update_zookeeper) +
+                        order(download_application, start_application),
+      ...
+    )
+
+In the latter case, each of the two sequences may operate in parallel.
+Of course, this may not be the intended behavior (for example, if
+the `start_application` Process implicitly relies
+upon `download_interpreter`). Make sure you understand the difference
+between using one or the other.
+
+## Defining Job Objects
+
+A job is a group of identical tasks that Aurora can run in a Mesos cluster.
+
+A `Job` object is defined by the values of several attributes, some
+required and some optional. The required attributes are:
+
+-   `task`: Task object to bind to this job. Note that a Job can
+    only take a single Task.
+
+-   `role`: Job's role account; in other words, the user account to run
+    the job as on a Mesos cluster machine. A common value is
+    `os.getenv('USER')`, which uses a Python call to get the user who
+    submits the job request. The other common value is the service
+    account that runs the job, e.g. `www-data`.
+
+-   `environment`: Job's environment, typical values
+    are `devel`, `test`, or `prod`.
+
+-   `cluster`: Aurora cluster to schedule the job in, defined in
+    `/etc/aurora/clusters.json` or `~/.clusters.json`. You can specify
+    jobs where the only difference is the `cluster`, then at run time
+    only run the Job whose job key includes your desired cluster's name.
+
+You will also usually see a `name` parameter. By default, `name` inherits its
+value from the Job's associated Task object, but you can override this
+default. Combining these parameters, a Job definition might look like:
+
+    foo_job = Job( name = 'foo', cluster = 'cluster1',
+              role = os.getenv('USER'), environment = 'prod',
+              task = foo_task)
+
+In addition to the required attributes, there are several optional
+attributes. Details can be found in the [Aurora Configuration Reference](configuration.md#job-objects).
+
+
+## The jobs List
+
+At the end of your `.aurora` file, you need to specify a list of the
+file's defined Jobs. For example, the following exports the jobs `job1`,
+`job2`, and `job3`.
+
+    jobs = [job1, job2, job3]
+
+This allows the aurora client to invoke commands on those jobs, such as
+starting, updating, or killing them.
+
+
+
+Basic Examples
+==============
+
+These are provided to give a basic understanding of simple Aurora jobs.
+
+### hello_world.aurora
+
+Put the following in a file named `hello_world.aurora`, substituting your own values
+for values such as `cluster`s.
+
+    import os
+    hello_world_process = Process(name = 'hello_world', cmdline = 'echo hello world')
+
+    hello_world_task = Task(
+      resources = Resources(cpu = 0.1, ram = 16 * MB, disk = 16 * MB),
+      processes = [hello_world_process])
+
+    hello_world_job = Job(
+      cluster = 'cluster1',
+      role = os.getenv('USER'),
+      task = hello_world_task)
+
+    jobs = [hello_world_job]
+
+Then issue the following commands to create and kill the job, using your own values for the job key.
+
+    aurora job create cluster1/$USER/test/hello_world hello_world.aurora
+
+    aurora job kill cluster1/$USER/test/hello_world
+
+### Environment Tailoring
+
+Put the following in a file named `hello_world_productionized.aurora`, substituting your own values
+for values such as `cluster`s.
+
+    include('hello_world.aurora')
+
+    production_resources = Resources(cpu = 1.0, ram = 512 * MB, disk = 2 * GB)
+    staging_resources = Resources(cpu = 0.1, ram = 32 * MB, disk = 512 * MB)
+    hello_world_template = hello_world_job(
+        name = "hello_world-{{cluster}}",
+        task = hello_world_task(resources=production_resources))
+
+    jobs = [
+      # production jobs
+      hello_world_template(cluster = 'cluster1', instances = 25),
+      hello_world_template(cluster = 'cluster2', instances = 15),
+
+      # staging jobs
+      hello_world_template(
+        cluster = 'local',
+        instances = 1,
+        task = hello_world_task(resources=staging_resources)),
+    ]
+
+Then issue the following commands to create and kill the job, using your own values for the job key.
+
+    aurora job create cluster1/$USER/test/hello_world-cluster1 hello_world_productionized.aurora
+
+    aurora job kill cluster1/$USER/test/hello_world-cluster1


[4/7] aurora git commit: Reorganize Documentation

Posted by se...@apache.org.
http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/hooks.md
----------------------------------------------------------------------
diff --git a/docs/hooks.md b/docs/hooks.md
deleted file mode 100644
index 28307ab..0000000
--- a/docs/hooks.md
+++ /dev/null
@@ -1,244 +0,0 @@
-# Hooks for Aurora Client API
-
-- [Introduction](#introduction)
-- [Hook Types](#hook-types)
-- [Execution Order](#execution-order)
-- [Hookable Methods](#hookable-methods)
-- [Activating and Using Hooks](#activating-and-using-hooks)
-- [.aurora Config File Settings](#aurora-config-file-settings)
-- [Command Line](#command-line)
-- [Hooks Protocol](#hooks-protocol)
-  - [pre_ Methods](#pre_-methods)
-  - [err_ Methods](#err_-methods)
-  - [post_ Methods](#post_-methods)
-- [Generic Hooks](#generic-hooks)
-- [Hooks Process Checklist](#hooks-process-checklist)
-
-## Introduction
-
-You can execute hook methods around Aurora API Client methods when they are called by the Aurora Command Line commands.
-
-Explaining how hooks work is a bit tricky because of some indirection about what they apply to. Basically, a hook is code that executes when a particular Aurora Client API method runs, letting you extend the method's actions. The hook executes on the client side, specifically on the machine executing Aurora commands.
-
-The catch is that hooks are associated with Aurora Client API methods, which users don't directly call. Instead, users call Aurora Command Line commands, which call Client API methods during their execution. Since which hooks run depends on which Client API methods get called, you will need to know which Command Line commands call which API methods. Later on, there is a table showing the various associations.
-
-**Terminology Note**: From now on, "method(s)" refer to Client API methods, and "command(s)" refer to Command Line commands.
-
-## Hook Types
-
-Hooks have three basic types, differing by when they run with respect to their associated method.
-
-`pre_<method_name>`: When its associated method is called, the `pre_` hook executes first, then the called method. If the `pre_` hook fails, the method never runs. Later code that expected the method to succeed may be affected by this, which can result in the Aurora client terminating.
-
-Note that a `pre_` hook can error-trap internally so it does not
-return `False`. Designers/contributors of new `pre_` hooks should
-consider whether or not to error-trap them. You can error trap at the
-highest level very generally and always pass the `pre_` hook by
-returning `True`. For example:
-
-```python
-def pre_create(...):
-  do_something()  # if do_something fails with an exception, the create_job is not attempted!
-  return True
-
-# However...
-def pre_create(...):
-  try:
-    do_something()  # may cause exception
-  except Exception:  # generic error trap will catch it
-    pass  # and ignore the exception
-  return True  # create_job will run in any case!
-```
-
-`post_<method_name>`: A `post_` hook executes after its associated method successfully finishes running. If it fails, the already executed method is unaffected. A `post_` hook's error is trapped, and any later operations are unaffected.
-
-`err_<method_name>`: Executes only when its associated method returns a status other than OK or throws an exception. If an `err_` hook fails, the already executed method is unaffected. An `err_` hook's error is trapped, and any later operations are unaffected.
-
-## Execution Order
-
-A command with `pre_`, `post_`, and `err_` hooks defined and activated for its called method executes in the following order when the method successfully executes:
-
-1. Command called
-2. Command code executes
-3. Method Called
-4. `pre_` method hook runs
-5. Method runs and successfully finishes
-6. `post_` method hook runs
-7. Command code executes
-8. Command execution ends
-
-The following is what happens when, for the same command and hooks, the method associated with the command suffers an error and does not successfully finish executing:
-
-1. Command called
-2. Command code executes
-3. Method Called
-4. `pre_` method hook runs
-5. Method runs and fails
-6. `err_` method hook runs
-7. Command Code executes (if `err_` method does not end the command execution)
-8. Command execution ends
-
-Note that the `post_` and `err_` hooks for the same method can never both run for a single execution of that method.
-
-## Hookable Methods
-
-You can associate `pre_`, `post_`, and `err_` hooks with the following methods. Since you do not directly interact with the methods, but rather the Aurora Command Line commands that call them, for each method we also list the command(s) that can call the method. Note that a different method or methods may be called by a command depending on how the command's other code executes. Similarly, multiple commands can call the same method. We also list the methods' argument signatures, which are used by their associated hooks. <a name="Chart"></a>
-
-  Aurora Client API Method | Client API Method Argument Signature | Aurora Command Line Command
-  -------------------------| ------------------------------------- | ---------------------------
-  ```create_job``` | ```self```, ```config``` | ```job create```, ```runtask```
-  ```restart``` | ```self```, ```job_key```, ```shards```, ```update_config```, ```health_check_interval_seconds``` | ```job restart```
-  ```kill_job``` | ```self```, ```job_key```, ```shards=None``` |  ```job kill```
-  ```start_cronjob``` | ```self```, ```job_key``` | ```cron start```
-  ```start_job_update``` | ```self```, ```config```, ```instances=None``` | ```update start```
-
-Some specific examples:
-
-* `pre_create_job` executes when a `create_job` method is called, and before the `create_job` method itself executes.
-
-* `post_cancel_update` executes after a `cancel_update` method has successfully finished running.
-
-* `err_kill_job` executes when the `kill_job` method is called, but doesn't successfully finish running.
-
-## Activating and Using Hooks
-
-By default, hooks are inactive. If you do not want to use hooks, you do not need to make any changes to your code. If you do want to use hooks, you will need to alter your `.aurora` config file to activate them both for the configuration as a whole as well as for individual `Job`s. And, of course, you will need to define in your config file what happens when a particular hook executes.
-
-## .aurora Config File Settings
-
-You can define a top-level `hooks` variable in any `.aurora` config file. `hooks` is a list of all objects that define hooks used by `Job`s defined in that config file. If you do not want to define any hooks for a configuration, `hooks` is optional.
-
-    hooks = [Object_with_defined_hooks1, Object_with_defined_hooks2]
-
-Be careful when assembling a config file using `include` on multiple smaller config files. If there are multiple files that assign a value to `hooks`, only the last assignment made will stick. For example, if `x.aurora` has `hooks = [a, b, c]` and `y.aurora` has `hooks = [d, e, f]` and `z.aurora` has, in this order, `include x.aurora` and `include y.aurora`, the `hooks` value will be `[d, e, f]`.
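-
-If you want the hooks from several included files to apply, reassign `hooks` yourself after the includes. A sketch, reusing the `x.aurora`/`y.aurora` example above (where `a` through `f` stand for the hook-defining objects from those files):
-
-```python
-include('x.aurora')
-include('y.aurora')
-hooks = [a, b, c, d, e, f]
-```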
-
-Also, for any `Job` that you want to use hooks with, its `Job` definition in the `.aurora` config file must set an `enable_hooks` flag to `True` (it defaults to `False`). By default, hooks are disabled and you must enable them for `Job`s of your choice.
-
-To summarize, to use hooks for a particular job, you must both activate hooks for your config file as a whole, and for that job. Activating hooks only for individual jobs won't work, nor will only activating hooks for your config file as a whole. You must also specify the hooks' defining object in the `hooks` variable.
-
-Recall that `.aurora` config files are written in Pystachio. So the following turns on hooks for production jobs at cluster1 and cluster2, but leaves them off for similar jobs with a defined user role. Of course, you also need to list the objects that define the hooks in your config file's `hooks` variable.
-
-```python
-jobs = [
-        Job(enable_hooks = True, cluster = c, env = 'prod') for c in ('cluster1', 'cluster2')
-       ]
-jobs.extend(
-   Job(cluster = c, env = 'prod', role = getpass.getuser()) for c in ('cluster1', 'cluster2'))
-   # Hooks disabled for these jobs
-```
-
-## Command Line
-
-All Aurora Command Line commands now accept an `.aurora` config file as an optional parameter (some, of course, accept it as a required parameter). Whenever a command has a `.aurora` file parameter, any hooks specified and activated in the `.aurora` file can be used. For example:
-
-    aurora job restart cluster1/role/env/app myapp.aurora
-
-The command activates any hooks specified and activated in `myapp.aurora`. For the `restart` command, that is the only thing the `myapp.aurora` parameter does. So, if the command was the following, since there is no `.aurora` config file to specify any hooks, no hooks on the `restart` command can run.
-
-    aurora job restart cluster1/role/env/app
-
-## Hooks Protocol
-
-Any object defined in the `.aurora` config file can define hook methods. You should define your hook methods within a class, and then use the class name as a value in the `hooks` list in your config file.
-
-Note that you can define other methods in the class that its hook methods can call; all the logic of a hook does not have to be in its definition.
-
-The following example defines a class containing a `pre_kill_job` hook definition that calls another method defined in the class.
-
-```python
-# Defines a method pre_kill_job
-class KillConfirmer(object):
-  def confirm(self, msg):
-    return raw_input(msg).lower() == 'yes'
-
-  def pre_kill_job(self, job_key, shards=None):
-    shards = ('shards %s' % shards) if shards is not None else 'all shards'
-    return self.confirm('Are you sure you want to kill %s (%s)? (yes/no): '
-                        % (job_key, shards))
-```
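-
-To activate this hook, list its class in the config file's `hooks` variable, as described in [.aurora Config File Settings](#aurora-config-file-settings):
-
-```python
-hooks = [KillConfirmer]
-```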
-
-### pre_ Methods
-
-`pre_` methods have the signature:
-
-    pre_<API method name>(self, <associated method's signature>)
-
-`pre_` methods have the same signature as their associated method, with the addition of `self` as the first parameter. See the [chart](#Chart) above for the mapping of parameters to methods. When writing `pre_` methods, you can use the `*` and `**` syntax to designate that all unspecified parameters are passed in a list to the `*`ed variable and all named parameters with values are passed as name/value pairs to the `**`ed variable.
-
-If this method returns False, the API command call aborts.
-
-### err_ Methods
-
-`err_` methods have the signature:
-
-    err_<API method name>(self, exc, <associated method's signature>)
-
-`err_` methods have the same signature as their associated method, with the addition of a first parameter `self` and a second parameter `exc`. `exc` is either a result with responseCode other than `ResponseCode.OK` or an `Exception`. See the [chart](#Chart) above for the mapping of parameters to methods. When writing `err_` methods, you can use the `*` and `**` syntax to designate that all unspecified parameters are passed in a list to the `*`ed variable and all named parameters with values are passed as name/value pairs to the `**`ed variable.
-
-`err_` method return codes are ignored.
-
-### post_ Methods
-
-`post_` methods have the signature:
-
-    post_<API method name>(self, result, <associated method signature>)
-
-`post_` method parameters are `self`, then `result`, followed by the same parameter signature as their associated method. `result` is the result of the associated method call. See the [chart](#chart) above for the mapping of parameters to methods. When writing `post_` methods, you can use the `*` and `**` syntax to designate that all unspecified arguments are passed in a list to the `*`ed parameter and all unspecified named arguments with values are passed as name/value pairs to the `**`ed parameter.
-
-`post_` method return codes are ignored.
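-
-As a minimal sketch (the class and its logging behavior are illustrative only), a `post_` hook for `create_job` would look like:
-
-```python
-class CreateLogger(object):
-  def post_create_job(self, result, config):
-    print('create_job completed with result: %s' % (result,))
-    return True  # ignored: post_ return codes have no effect
-```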
-
-## Generic Hooks
-
-There are seven Aurora API Methods which any of the three hook types can attach to. Thus, there are 21 possible hook/method combinations for a single `.aurora` config file. Say that you define `pre_` and `post_` hooks for the `restart` method. That leaves 19 undefined hook/method combinations; `err_restart` and the 3 `pre_`, `post_`, and `err_` hooks for each of the other 6 hookable methods. You can define what happens when any of these otherwise undefined 19 hooks execute via a generic hook, whose signature is:
-
-```python
-generic_hook(self, hook_config, event, method_name, result_or_err, *args, **kw)
-```
-
-where:
-
-* `hook_config` is a named tuple of `config` (the Pystachio `config` object) and `job_key`.
-
-* `event` is one of `pre`, `err`, or `post`, indicating which type of hook the generic hook is standing in for. For example, assume no specific hooks were defined for the `restart` API command. If `generic_hook` is defined and activated, and `restart` is called, `generic_hook` will effectively run as `pre_restart`, `post_restart`, and `err_restart`. You can use a selection statement on this value so that `generic_hook` will act differently based on whether it is standing in for a `pre_`, `post_`, or `err_` hook.
-
-* `method_name` is the Client API method name whose execution is causing this execution of the `generic_hook`.
-
-* `*args`, `**kw` are the API method arguments and keyword arguments respectively.
-* `result_or_err` is a tri-state parameter taking one of these three values:
-  1. `None` for `pre_` hooks
-  2. `result` for `post_` hooks
-  3. `exc` for `err_` hooks
-
-Example:
-
-```python
-# Overrides the standard do-nothing generic_hook by adding a log writing operation.
-from twitter.common import log
-
-class Logger(object):
-  '''Adds to the log every time a hookable API method is called'''
-  def generic_hook(self, hook_config, event, method_name, result_or_err, *args, **kw):
-    log.info('%s: %s_%s of %s'
-             % (self.__class__.__name__, event, method_name, hook_config.job_key))
-```
-
-## Hooks Process Checklist
-
-1. In your `.aurora` config file, add a `hooks` variable. Note that you may want to define a `.aurora` file only for hook definitions and then include this file in multiple other config files that you want to use the same hooks.
-
-```python
-hooks = []
-```
-
-2. In the `hooks` variable, list all objects that define hooks used by `Job`s defined in this config:
-
-```python
-hooks = [Object_hook_definer1, Object_hook_definer2]
-```
-
-3. For each job that uses hooks in this config file, add `enable_hooks = True` to the `Job` definition. Note that this is necessary even if you only want to use the generic hook.
-
-4. Write your `pre_`, `post_`, and `err_` hook definitions as part of an object definition in your `.aurora` config file.
-
-5. If desired, write your `generic_hook` definition as part of an object definition in your `.aurora` config file. Remember, the object must be listed as a member of `hooks`.
-
-6. If your Aurora command line command does not otherwise take an `.aurora` config file argument, add the appropriate `.aurora` file as an argument in order to define and activate the configuration's hooks.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/installing.md
----------------------------------------------------------------------
diff --git a/docs/installing.md b/docs/installing.md
deleted file mode 100644
index 5cc153a..0000000
--- a/docs/installing.md
+++ /dev/null
@@ -1,335 +0,0 @@
-# Installing Aurora
-
-- [Components](#components)
-    - [Machine profiles](#machine-profiles)
-      - [Coordinator](#coordinator)
-      - [Worker](#worker)
-      - [Client](#client)
-- [Getting Aurora](#getting-aurora)
-    - [Building your own binary packages](#building-your-own-binary-packages)
-    - [RPMs](#rpms)
-- [Installing the scheduler](#installing-the-scheduler)
-    - [Ubuntu Trusty](#ubuntu-trusty)
-    - [CentOS 7](#centos-7)
-    - [Finalizing](#finalizing)
-    - [Configuration](#configuration)
-- [Installing worker components](#installing-worker-components)
-    - [Ubuntu Trusty](#ubuntu-trusty-1)
-    - [CentOS 7](#centos-7-1)
-    - [Configuration](#configuration-1)
-- [Installing the client](#installing-the-client)
-    - [Ubuntu Trusty](#ubuntu-trusty-2)
-    - [CentOS 7](#centos-7-2)
-    - [Configuration](#configuration-2)
-- [See also](#see-also)
-- [Installing Mesos](#installing-mesos)
-    - [Mesos on Ubuntu Trusty](#mesos-on-ubuntu-trusty)
-    - [Mesos on CentOS 7](#mesos-on-centos-7)
-
-## Components
-Before installing Aurora, it's important to have an understanding of the components that make up
-a functioning Aurora cluster.
-
-![Aurora Components](images/components.png)
-
-* **Aurora scheduler**  
-  The scheduler will be your primary interface to the work you run in your cluster.  You will
-  instruct it to run jobs, and it will manage them in Mesos for you.  You will also frequently use
-  the scheduler's web interface as a heads-up display for what's running in your cluster.
-
-* **Aurora client**  
-  The client (`aurora` command) is a command line tool that exposes primitives that you can use to
-  interact with the scheduler.
-
-  Aurora also provides an admin client (`aurora_admin` command) that contains commands built for
-  cluster administrators.  You can use this tool to do things like manage user quotas and manage
-  graceful maintenance on machines in the cluster.
-
-* **Aurora executor**  
-  The executor (a.k.a. Thermos executor) is responsible for carrying out the workloads described in
-  the Aurora DSL (`.aurora` files).  The executor is what actually executes user processes.  It will
-  also perform health checking of tasks and register tasks in ZooKeeper for the purposes of dynamic
-  service discovery.  You can find lots more detail on the executor and Thermos in the
-  [user guide](user-guide.md).
-
-* **Aurora observer**  
-  The observer provides browser-based access to the status of individual tasks executing on worker
-  machines.  It gives insight into the processes executing, and facilitates browsing of task sandbox
-  directories.
-
-* **ZooKeeper**  
-  [ZooKeeper](http://zookeeper.apache.org) is a distributed consensus system.  In an Aurora cluster
-  it is used for reliable election of the leading Aurora scheduler and Mesos master.
-
-* **Mesos master**  
-  The master is responsible for tracking worker machines and performing accounting of their
-  resources.  The scheduler interfaces with the master to control the cluster.
-
-* **Mesos agent**  
-  The agent receives work assigned by the scheduler and executes it.  It interfaces with Linux
-  isolation systems like cgroups, namespaces and Docker to manage the resource consumption of tasks.
-  When a user task is launched, the agent will launch the executor (in the context of a Linux cgroup
-  or Docker container depending upon the environment), which will in turn fork user processes.
-
-## Machine profiles
-Given that many of these components communicate over the network, there are numerous ways you could
-assemble them to create an Aurora cluster.  The simplest way is to think in terms of three machine
-profiles:
-
-### Coordinator
-**Components**: ZooKeeper, Aurora scheduler, Mesos master
-
-A small number of machines (typically 3 or 5) responsible for cluster orchestration.  In most cases
-it is fine to co-locate these components in anything but very large clusters (> 1000 machines).
-Beyond that point, operators will likely want to manage these services on separate machines.
-
-In practice, 5 coordinators have been shown to reliably manage clusters with tens of thousands of
-machines.
-
-
-### Worker
-**Components**: Aurora executor, Aurora observer, Mesos agent
-
-The bulk of the cluster, where services will actually run.
-
-### Client
-**Components**: Aurora client, Aurora admin client
-
-Any machines that users submit jobs from.
-
-## Getting Aurora
-Source and binary distributions can be found on our
-[downloads](https://aurora.apache.org/downloads/) page.  Installing from binary packages is
-recommended for most.
-
-### Building your own binary packages
-Our package build toolchain makes it easy to build your own packages if you would like.  See the
-[instructions](https://github.com/apache/aurora-packaging) to learn how.
-
-## Installing the scheduler
-### Ubuntu Trusty
-
-1. Install Mesos  
-   Skip down to [install mesos](#mesos-on-ubuntu-trusty), then run:
-
-        sudo start mesos-master
-
-2. Install ZooKeeper
-
-        sudo apt-get install -y zookeeperd
-
-3. Install the Aurora scheduler
-
-        sudo add-apt-repository -y ppa:openjdk-r/ppa
-        sudo apt-get update
-        sudo apt-get install -y openjdk-8-jre-headless wget
-
-        sudo update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
-
-        wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-scheduler_0.12.0_amd64.deb
-        sudo dpkg -i aurora-scheduler_0.12.0_amd64.deb
-
-### CentOS 7
-
-1. Install Mesos  
-   Skip down to [install mesos](#mesos-on-centos-7), then run:
-
-        sudo systemctl start mesos-master
-
-2. Install ZooKeeper
-
-        sudo rpm -Uvh https://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
-        sudo yum install -y java-1.8.0-openjdk-headless zookeeper-server
-
-        sudo service zookeeper-server init
-        sudo systemctl start zookeeper-server
-
-3. Install the Aurora scheduler
-
-        sudo yum install -y wget
-
-        wget -c https://apache.bintray.com/aurora/centos-7/aurora-scheduler-0.12.0-1.el7.centos.aurora.x86_64.rpm
-        sudo yum install -y aurora-scheduler-0.12.0-1.el7.centos.aurora.x86_64.rpm
-
-### Finalizing
-By default, the scheduler will start in an uninitialized mode.  This is because external
-coordination is necessary to be certain operator error does not result in a quorum of schedulers
-starting up and believing their databases are empty when in fact they should be re-joining a
-cluster.
-
-Because of this, a fresh install of the scheduler will need intervention to start up.  First,
-stop the scheduler service.  
-Ubuntu: `sudo stop aurora-scheduler`  
-CentOS: `sudo systemctl stop aurora`
-
-Now initialize the database:
-
-    sudo -u aurora mkdir -p /var/lib/aurora/scheduler/db
-    sudo -u aurora mesos-log initialize --path=/var/lib/aurora/scheduler/db
-
-Now you can start the scheduler back up.  
-Ubuntu: `sudo start aurora-scheduler`  
-CentOS: `sudo systemctl start aurora`
-
-### Configuration
-For more detail on this topic, see the dedicated page on
-[deploying the scheduler](deploying-aurora-scheduler.md)
-
-
-## Installing worker components
-### Ubuntu Trusty
-
-1. Install Mesos  
-   Skip down to [install mesos](#mesos-on-ubuntu-trusty), then run:
-
-        start mesos-slave
-
-2. Install Aurora executor and observer
-
-        sudo apt-get install -y python2.7 wget
-
-        # NOTE: This appears to be a missing dependency of the mesos deb package and is needed
-        # for the python mesos native bindings.
-        sudo apt-get -y install libcurl4-nss-dev
-
-        wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-executor_0.12.0_amd64.deb
-        sudo dpkg -i aurora-executor_0.12.0_amd64.deb
-
-### CentOS 7
-
-1. Install Mesos  
-   Skip down to [install mesos](#mesos-on-centos-7), then run:
-
-        sudo systemctl start mesos-slave
-
-2. Install Aurora executor and observer
-
-        sudo yum install -y python2 wget
-
-        wget -c https://apache.bintray.com/aurora/centos-7/aurora-executor-0.12.0-1.el7.centos.aurora.x86_64.rpm
-        sudo yum install -y aurora-executor-0.12.0-1.el7.centos.aurora.x86_64.rpm
-
-### Configuration
-The executor typically does not require configuration.  Command line arguments can
-be passed to the executor using a command line argument on the scheduler.
-
-The observer needs to be configured to look at the correct mesos directory in order to find task
-sandboxes. You should first find the Mesos working directory by looking for the Mesos slave
-`--work_dir` flag. You should see something like:
-
-        ps -eocmd | grep "mesos-slave" | grep -v grep | tr ' ' '\n' | grep "\--work_dir"
-        --work_dir=/var/lib/mesos
-
-If the flag is not set, you can view the default value like so:
-
-        mesos-slave --help
-        Usage: mesos-slave [options]
-
-          ...
-          --work_dir=VALUE      Directory path to place framework work directories
-                                (default: /tmp/mesos)
-          ...
-
-The value you find for `--work_dir`, `/var/lib/mesos` in this example, should match the Aurora
-observer value for `--mesos-root`.  You can look for that setting in a similar way on a worker
-node by grepping for `thermos_observer` and `--mesos-root`.  If the flag is not set, you can view
-the default value like so:
-
-        thermos_observer -h
-        Options:
-          ...
-          --mesos-root=MESOS_ROOT
-                                The mesos root directory to search for Thermos
-                                executor sandboxes [default: /var/lib/mesos]
-          ...
-
-In this case the default is `/var/lib/mesos` and we have a match. If there is no match, you can
-either adjust the mesos-slave start script(s) and restart the slave(s) or else adjust the
-Aurora observer start scripts and restart the observers.  To adjust the Aurora observer:
-
-#### Ubuntu Trusty
-
-    sudo sh -c 'echo "MESOS_ROOT=/tmp/mesos" >> /etc/default/thermos'
-
-NB: In Aurora releases up through 0.12.0, you'll also need to edit /etc/init/thermos.conf like so:
-
-    diff -C 1 /etc/init/thermos.conf.orig /etc/init/thermos.conf
-    *** /etc/init/thermos.conf.orig 2016-03-22 22:34:46.286199718 +0000
-    --- /etc/init/thermos.conf  2016-03-22 17:09:49.357689038 +0000
-    ***************
-    *** 24,25 ****
-    --- 24,26 ----
-          --port=${OBSERVER_PORT:-1338} \
-    +     --mesos-root=${MESOS_ROOT:-/var/lib/mesos} \
-          --log_to_disk=NONE \
-
-#### CentOS 7
-
-Make an edit to add the `--mesos-root` flag resulting in something like:
-
-    grep -A5 OBSERVER_ARGS /etc/sysconfig/thermos-observer
-    OBSERVER_ARGS=(
-      --port=1338
-      --mesos-root=/tmp/mesos
-      --log_to_disk=NONE
-      --log_to_stderr=google:INFO
-    )
-
-## Installing the client
-### Ubuntu Trusty
-
-    sudo apt-get install -y python2.7 wget
-
-    wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-tools_0.12.0_amd64.deb
-    sudo dpkg -i aurora-tools_0.12.0_amd64.deb
-
-### CentOS 7
-
-    sudo yum install -y python2 wget
-
-    wget -c https://apache.bintray.com/aurora/centos-7/aurora-tools-0.12.0-1.el7.centos.aurora.x86_64.rpm
-    sudo yum install -y aurora-tools-0.12.0-1.el7.centos.aurora.x86_64.rpm
-
-### Mac OS X
-
-    brew upgrade
-    brew install aurora-cli
-
-### Configuration
-Client configuration lives in a json file that describes the clusters available and how to reach
-them.  By default this file is at `/etc/aurora/clusters.json`.
-
-Jobs may be submitted to the scheduler using the client, and are described with
-[job configurations](configuration-reference.md) expressed in `.aurora` files.  Typically you will
-maintain a single job configuration file to describe one or more deployment environments (e.g.
-dev, test, prod) for a production job.
-
-## See also
-We have other docs that you will find useful once you have your cluster up and running:
-
-- [Monitor](monitoring.md) your cluster
-- Enable scheduler [security](security.md)
-- View job SLA [statistics](sla.md)
-- Understand the internals of the scheduler's [storage](storage.md)
-
-## Installing Mesos
-Mesos uses a single package for the Mesos master and slave.  As a result, the package dependencies
-are identical for both.
-
-### Mesos on Ubuntu Trusty
-
-    sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
-    DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
-    CODENAME=$(lsb_release -cs)
-
-    echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" | \
-      sudo tee /etc/apt/sources.list.d/mesosphere.list
-    sudo apt-get -y update
-
-    # Use `apt-cache showpkg mesos | grep [version]` to find the exact version.
-    sudo apt-get -y install mesos=0.25.0-0.2.70.ubuntu1404
-
-### Mesos on CentOS 7
-
-    sudo rpm -Uvh https://repos.mesosphere.io/el/7/noarch/RPMS/mesosphere-el-repo-7-1.noarch.rpm
-    sudo yum -y install mesos-0.25.0

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/monitoring.md
----------------------------------------------------------------------
diff --git a/docs/monitoring.md b/docs/monitoring.md
deleted file mode 100644
index 3cb2a79..0000000
--- a/docs/monitoring.md
+++ /dev/null
@@ -1,181 +0,0 @@
-# Monitoring your Aurora cluster
-
-Before you start running important services in your Aurora cluster, it's important to set up
-monitoring and alerting of Aurora itself.  Most of your monitoring can be against the scheduler,
-since it will give you a global view of what's going on.
-
-## Reading stats
-The scheduler exposes a *lot* of instrumentation data via its HTTP interface. You can get a quick
-peek at the first few of these in our vagrant image:
-
-    $ vagrant ssh -c 'curl -s localhost:8081/vars | head'
-    async_tasks_completed 1004
-    attribute_store_fetch_all_events 15
-    attribute_store_fetch_all_events_per_sec 0.0
-    attribute_store_fetch_all_nanos_per_event 0.0
-    attribute_store_fetch_all_nanos_total 3048285
-    attribute_store_fetch_all_nanos_total_per_sec 0.0
-    attribute_store_fetch_one_events 3391
-    attribute_store_fetch_one_events_per_sec 0.0
-    attribute_store_fetch_one_nanos_per_event 0.0
-    attribute_store_fetch_one_nanos_total 454690753
-
-These values are served as `Content-Type: text/plain`, with each line containing a space-separated metric
-name and value. Values may be integers, doubles, or strings (note: strings are static, others
-may be dynamic).
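-
-If you want to consume the plain-text format programmatically, here is a minimal sketch in Python 2
-(matching the Python 2.7 used elsewhere in these docs; the host/port assume the vagrant setup above):
-
-    import urllib2
-
-    def read_vars(url='http://localhost:8081/vars'):
-        """Fetch /vars and return a dict of metric name -> raw string value."""
-        stats = {}
-        for line in urllib2.urlopen(url).read().splitlines():
-            name, _, value = line.partition(' ')
-            stats[name] = value
-        return stats
-
-    print read_vars().get('jvm_uptime_secs')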
-
-If your monitoring infrastructure prefers JSON, the scheduler exports that as well:
-
-    $ vagrant ssh -c 'curl -s localhost:8081/vars.json | python -mjson.tool | head'
-    {
-        "async_tasks_completed": 1009,
-        "attribute_store_fetch_all_events": 15,
-        "attribute_store_fetch_all_events_per_sec": 0.0,
-        "attribute_store_fetch_all_nanos_per_event": 0.0,
-        "attribute_store_fetch_all_nanos_total": 3048285,
-        "attribute_store_fetch_all_nanos_total_per_sec": 0.0,
-        "attribute_store_fetch_one_events": 3409,
-        "attribute_store_fetch_one_events_per_sec": 0.0,
-        "attribute_store_fetch_one_nanos_per_event": 0.0,
-
-This will be the same data as above, served with `Content-Type: application/json`.
-
-## Viewing live stat samples on the scheduler
-The scheduler uses the Twitter commons stats library, which keeps an internal time-series database
-of exported variables - nearly everything in `/vars` is available for instant graphing.  This is
-useful for debugging, but is not a replacement for an external monitoring system.
-
-You can view these graphs on a scheduler at `/graphview`.  It supports some composition and
-aggregation of values, which can be invaluable when triaging a problem.  For example, if you have
-the scheduler running in vagrant, check out these links:
-[simple graph](http://192.168.33.7:8081/graphview?query=jvm_uptime_secs)
-[complex composition](http://192.168.33.7:8081/graphview?query=rate\(scheduler_log_native_append_nanos_total\)%2Frate\(scheduler_log_native_append_events\)%2F1e6)
-
-### Counters and gauges
-Among numeric stats, there are two fundamental types of stats exported: _counters_ and _gauges_.
-Counters are guaranteed to be monotonically-increasing for the lifetime of a process, while gauges
-may decrease in value.  Aurora uses counters to represent things like the number of times an event
-has occurred, and gauges to capture things like the current length of a queue.  Counters are a
-natural fit for accurate composition into [rate ratios](http://en.wikipedia.org/wiki/Rate_ratio)
-(useful for sample-resistant latency calculation), while gauges are not.
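-
-As a sketch of what such a composition computes (the helpers below are illustrative, not part of
-Aurora), a windowed rate ratio can be derived from two successive samples of each counter:
-
-    def rate(curr, prev, interval_secs):
-        """Windowed rate of a monotonically-increasing counter."""
-        return (curr - prev) / float(interval_secs)
-
-    # e.g. replicated-log append latency in ms over the window:
-    #   rate(nanos_total) / rate(events) / 1e6
-    def append_latency_ms(nanos, prev_nanos, events, prev_events, interval_secs):
-        events_rate = rate(events, prev_events, interval_secs)
-        if events_rate == 0:
-            return 0.0
-        return rate(nanos, prev_nanos, interval_secs) / events_rate / 1e6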
-
-# Alerting
-
-## Quickstart
-If you are looking for just bare-minimum alerting to get something in place quickly, set up alerting
-on `framework_registered` and `task_store_LOST`. These will give you a decent picture of overall
-health.
-
-## A note on thresholds
-One of the most difficult things in monitoring is choosing alert thresholds. With many of these
-stats, there is no value we can offer as a threshold that will be guaranteed to work for you. It
-will depend on the size of your cluster, number of jobs, churn of tasks in the cluster, etc. We
-recommend you start with a strict value after viewing a small amount of collected data, and then
-adjust thresholds as you see fit. Feel free to ask us if you would like to validate that your alerts
-and thresholds make sense.
-
-## Important stats
-
-### `jvm_uptime_secs`
-Type: integer counter
-
-The number of seconds the JVM process has been running. Comes from
-[RuntimeMXBean#getUptime()](http://docs.oracle.com/javase/7/docs/api/java/lang/management/RuntimeMXBean.html#getUptime\(\))
-
-Detecting resets (decreasing values) on this stat will tell you that the scheduler is failing to
-stay alive.
-
-Look at the scheduler logs to identify the reason the scheduler is exiting.
-
-### `system_load_avg`
-Type: double gauge
-
-The current load average of the system for the last minute. Comes from
-[OperatingSystemMXBean#getSystemLoadAverage()](http://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSystemMXBean.html?is-external=true#getSystemLoadAverage\(\)).
-
-A high sustained value suggests that the scheduler machine may be over-utilized.
-
-Use standard unix tools like `top` and `ps` to track down the offending process(es).
-
-### `process_cpu_cores_utilized`
-Type: double gauge
-
-The current number of CPU cores in use by the JVM process. This should not exceed the number of
-logical CPU cores on the machine. Derived from
-[OperatingSystemMXBean#getProcessCpuTime()](http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html)
-
-A high sustained value indicates that the scheduler is overworked. Due to current internal design
-limitations, if this value is sustained at `1`, there is a good chance the scheduler is under water.
-
-There are two main inputs that tend to drive this figure: task scheduling attempts and status
-updates from Mesos.  You may see activity in the scheduler logs to give an indication of where
-time is being spent.  Beyond that, it really takes good familiarity with the code to effectively
-triage this.  We suggest engaging with an Aurora developer.
-
-### `task_store_LOST`
-Type: integer gauge
-
-The number of tasks stored in the scheduler that are in the `LOST` state, and have been rescheduled.
-
-If this value is increasing at a high rate, it is a sign of trouble.
-
-There are many sources of `LOST` tasks in Mesos: the scheduler, master, slave, and executor can all
-trigger this.  The first step is to look in the scheduler logs for `LOST` to identify where the
-state changes are originating.
-
-### `scheduler_resource_offers`
-Type: integer counter
-
-The number of resource offers that the scheduler has received.
-
-For a healthy scheduler, this value must be increasing over time.
-
-Assuming the scheduler is up and otherwise healthy, you will want to check if the master thinks it
-is sending offers. You should also look at the master's web interface to see if it has a large
-number of outstanding offers that it is waiting to be returned.
-
-### `framework_registered`
-Type: binary integer counter
-
-Will be `1` for the leading scheduler that is registered with the Mesos master, `0` for passive
-schedulers.
-
-A sustained period without a `1` (or where `sum() != 1`) warrants investigation.
-
-If there is no leading scheduler, look in the scheduler and master logs for why.  If there are
-multiple schedulers claiming leadership, this suggests a split brain and warrants filing a critical
-bug.
-
-### `rate(scheduler_log_native_append_nanos_total)/rate(scheduler_log_native_append_events)`
-Type: rate ratio of integer counters
-
-This composes two counters to compute a windowed figure for the latency of replicated log writes.
-
-A hike in this value suggests disk bandwidth contention.
-
-Look in scheduler logs for any reported oddness with saving to the replicated log. Also use
-standard tools like `vmstat` and `iotop` to identify whether the disk has become slow or
-over-utilized. We suggest using a dedicated disk for the replicated log to mitigate this.
-
-### `timed_out_tasks`
-Type: integer counter
-
-Tracks the number of times the scheduler has given up while waiting
-(for `-transient_task_state_timeout`) to hear back about a task that is in a transient state
-(e.g. `ASSIGNED`, `KILLING`), and has moved the task to `LOST` before rescheduling it.
-
-This value is currently known to increase occasionally when the scheduler fails over
-([AURORA-740](https://issues.apache.org/jira/browse/AURORA-740)). However, any large spike in this
-value warrants investigation.
-
-The scheduler will log when it times out a task. You should trace the task ID of the timed out
-task into the master, slave, and/or executors to determine where the message was dropped.
-
-### `http_500_responses_events`
-Type: integer counter
-
-The total number of HTTP 500 status responses sent by the scheduler. Includes API and asset serving.
-
-An increase warrants investigation.
-
-Look in scheduler logs to identify why the scheduler returned a 500, there should be a stack trace.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/operations/backup-restore.md
----------------------------------------------------------------------
diff --git a/docs/operations/backup-restore.md b/docs/operations/backup-restore.md
new file mode 100644
index 0000000..da467c3
--- /dev/null
+++ b/docs/operations/backup-restore.md
@@ -0,0 +1,91 @@
+# Recovering from a Scheduler Backup
+
+**Be sure to read the entire page before attempting to restore from a backup, as it may have
+unintended consequences.**
+
+# Summary
+
+The restoration procedure replaces the existing (possibly corrupted) Mesos replicated log with an
+earlier, backed up, version and requires all schedulers to be taken down temporarily while
+restoring. Once completed, the scheduler state resets to what it was when the backup was created.
+This means any jobs/tasks created or updated after the backup are unknown to the scheduler and will
+be killed shortly after the cluster restarts. All other tasks continue operating as normal.
+
+Usually, it is a bad idea to restore a backup that is not extremely recent (i.e. older than a few
+hours). This is because the scheduler will expect the cluster to look exactly as the backup does,
+so any tasks that have been rescheduled since the backup was taken will be killed.
+
+The instructions below have been verified in a [Vagrant environment](../getting-started/vagrant.md) and, with minor
+syntax/path changes, should be applicable to any Aurora cluster.
+
+# Preparation
+
+Follow these steps to prepare the cluster for restoring from a backup:
+
+* Stop all scheduler instances
+
+* Consider blocking external traffic on the port defined in `-http_port` for all schedulers to
+prevent users from interacting with the schedulers during the restoration process. This will help
+troubleshooting by reducing scheduler log noise and prevents users from making changes that will
+be erased after the backup snapshot is restored.
+
+* Configure `aurora_admin` access to run all commands listed in
+  [Restore from backup](#restore-from-backup) section locally on the leading scheduler:
+  * Make sure the [clusters.json](../reference/client-cluster-configuration.md) file is configured to
+    access the scheduler directly: set the `scheduler_uri` setting and remove `zk`. Since the leader
+    can get re-elected during the restore steps, consider doing this on all scheduler replicas.
+  * Depending on your particular security approach, you will need to either turn off scheduler
+    authorization by removing the scheduler's `-http_authentication_mechanism` flag, or make sure
+    direct scheduler access is properly authorized. E.g.: in the case of Kerberos, you will need to
+    make an `/etc/hosts` change to map your local IP to the scheduler hostname configured in the keytabs:
+
+        <local_ip> <scheduler_domain_in_keytabs>
+
+* The next steps put the scheduler into a partially disabled state in which it can still accept
+storage recovery requests but cannot schedule tasks or change task states. This is accomplished by
+updating the following scheduler configuration options:
+  * Set `-mesos_master_address` to a non-existent zk address. This prevents the scheduler from
+    registering with Mesos. E.g.: `-mesos_master_address=zk://localhost:1111/mesos/master`
+  * Set `-max_registration_delay` to a sufficiently long interval to prevent a registration timeout
+    and, as a result, scheduler suicide. E.g.: `-max_registration_delay=360mins`
+  * Make sure the `-reconciliation_initial_delay` option is set high enough (e.g.: `365days`) to
+    prevent accidental task GC. This is important because the scheduler will attempt to reconcile the
+    cluster state and will kill all tasks when restarted with an empty Mesos replicated log.
+
+* Restart all schedulers
+
+# Cleanup and re-initialize Mesos replicated log
+
+Get rid of the corrupted files and re-initialize Mesos replicated log:
+
+* Stop schedulers
+* Delete all files under `-native_log_file_path` on all schedulers
+* Initialize Mesos replica's log file: `sudo mesos-log initialize --path=<-native_log_file_path>`
+* Start schedulers
+
+# Restore from backup
+
+At this point the scheduler is ready to rehydrate from the backup:
+
+* Identify the leading scheduler by:
+  * examining the `scheduler_lifecycle_LEADER_AWAITING_REGISTRATION` metric at the scheduler
+    `/vars` endpoint. The leader will have 1; all other replicas 0.
+  * examining scheduler logs
+  * or examining Zookeeper registration under the path defined by `-zk_endpoints`
+    and `-serverset_path`
+
+* Locate the desired backup file, copy it to the leading scheduler's `-backup_dir` folder and stage
+recovery by running the following command on the leader:
+`aurora_admin scheduler_stage_recovery --bypass-leader-redirect <cluster> scheduler-backup-<yyyy-MM-dd-HH-mm>`
+
+* At this point, the recovery snapshot is staged and available for manual verification/modification
+via `aurora_admin scheduler_print_recovery_tasks --bypass-leader-redirect` and
+`scheduler_delete_recovery_tasks --bypass-leader-redirect` commands.
+See `aurora_admin help <command>` for usage details.
+
+* Commit recovery. This instructs the scheduler to overwrite the existing Mesos replicated log with
+the provided backup snapshot and initiate a mandatory failover:
+`aurora_admin scheduler_commit_recovery --bypass-leader-redirect <cluster>`
+
+# Cleanup
+Undo any modification done during [Preparation](#preparation) sequence.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/operations/configuration.md
----------------------------------------------------------------------
diff --git a/docs/operations/configuration.md b/docs/operations/configuration.md
new file mode 100644
index 0000000..f9e8844
--- /dev/null
+++ b/docs/operations/configuration.md
@@ -0,0 +1,182 @@
+# Scheduler Configuration
+
+The Aurora scheduler can take a variety of configuration options through command-line arguments.
+Examples are available under `examples/scheduler/`. For a list of available Aurora flags and their
+documentation, see [Scheduler Configuration Reference](../reference/scheduler-configuration.md).
+
+
+## A Note on Configuration
+Like Mesos, Aurora uses command-line flags for runtime configuration. As such, the Aurora
+"configuration file" is typically a `scheduler.sh` shell script of the following form:
+
+    #!/bin/bash
+    AURORA_HOME=/usr/local/aurora-scheduler
+
+    # Flags controlling the JVM.
+    JAVA_OPTS=(
+      -Xmx2g
+      -Xms2g
+      # GC tuning, etc.
+    )
+
+    # Flags controlling the scheduler.
+    AURORA_FLAGS=(
+      # Port for client RPCs and the web UI
+      -http_port=8081
+      # Log configuration, etc.
+    )
+
+    # Environment variables controlling libmesos
+    export JAVA_HOME=...
+    export GLOG_v=1
+    # Port used to communicate with the Mesos master and for the replicated log
+    export LIBPROCESS_PORT=8083
+
+    JAVA_OPTS="${JAVA_OPTS[*]}" exec "$AURORA_HOME/bin/aurora-scheduler" "${AURORA_FLAGS[@]}"
+
+That way Aurora's current flags are visible in `ps` and in the `/vars` admin endpoint.
+
+
+## Replicated Log Configuration
+
+Aurora schedulers use ZooKeeper to discover log replicas and elect a leader. Only one scheduler is
+leader at a given time - the other schedulers follow log writes and prepare to take over as leader
+but do not communicate with the Mesos master. Either 3 or 5 schedulers are recommended in a
+production deployment depending on failure tolerance and they must have persistent storage.
+
+Below is a summary of scheduler storage configuration flags that either don't have default values
+or require attention before deploying in a production environment.
+
+### `-native_log_quorum_size`
+Defines the Mesos replicated log quorum size. In a cluster with `N` schedulers, the flag
+`-native_log_quorum_size` should be set to `floor(N/2) + 1`. So in a cluster with 1 scheduler
+it should be set to `1`, in a cluster with 3 it should be set to `2`, and in a cluster of 5 it
+should be set to `3`.
+
+  Number of schedulers (N) | ```-native_log_quorum_size``` setting (```floor(N/2) + 1```)
+  ------------------------ | -------------------------------------------------------------
+  1                        | 1
+  3                        | 2
+  5                        | 3
+  7                        | 4
+
+*Incorrectly setting this flag will cause data corruption to occur!*
+
+### `-native_log_file_path`
+Location of the Mesos replicated log files. Consider allocating a dedicated disk (preferably SSD)
+for Mesos replicated log files to ensure optimal storage performance.
+
+### `-native_log_zk_group_path`
+ZooKeeper path used for Mesos replicated log quorum discovery.
+
+See [code](../../src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java) for
+other available Mesos replicated log configuration options and default values.
+
+### Changing the Quorum Size
+Special care needs to be taken when changing the size of the Aurora scheduler quorum.
+Since Aurora uses a Mesos replicated log, similar steps need to be followed as when
+[changing the mesos quorum size](http://mesos.apache.org/documentation/latest/operational-guide).
+
+As a preparation, increase `-native_log_quorum_size` on each existing scheduler and restart them.
+When updating from 3 to 5 schedulers, the quorum size would grow from 2 to 3.
+
+When starting the new schedulers, set `-native_log_quorum_size` to the new value. Failing to
+first increase the quorum size on running schedulers can in some cases result in corruption
+or truncation of the replicated log used by Aurora. In that case, see the documentation on
+[recovering from backup](backup-restore.md).
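+
+As a sketch of that procedure, growing a cluster from 3 to 5 schedulers might look like the
+following; only the quorum flag is shown, and the order of operations is what matters:
+
+    # Step 1: on each of the 3 existing schedulers, raise the quorum size and restart.
+    -native_log_quorum_size=3
+
+    # Step 2: only then start the 2 new schedulers, also with the new quorum size.
+    -native_log_quorum_size=3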
+
+
+## Backup Configuration
+
+Configuration options for the Aurora scheduler backup manager.
+
+### `-backup_interval`
+The interval on which the scheduler writes local storage backups.  The default is every hour.
+
+### `-backup_dir`
+Directory to write backups to.
+
+### `-max_saved_backups`
+Maximum number of backups to retain before deleting the oldest backup(s).
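+
+As an illustration, these flags might appear in the `AURORA_FLAGS` array of the `scheduler.sh`
+shown above. The values below are examples, not recommendations:
+
+    AURORA_FLAGS=(
+      # ...
+      -backup_dir=/var/lib/aurora/backups
+      -backup_interval=1hrs
+      -max_saved_backups=48
+      # ...
+    )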
+
+
+## Process Logs
+
+### Log destination
+By default, Thermos will write process stdout/stderr to log files in the sandbox. Process object configuration
+allows specifying alternate log file destinations like streamed stdout/stderr or suppression of all log output.
+Default behavior can be configured for the entire cluster with the following flag (through the `-thermos_executor_flags`
+argument to the Aurora scheduler):
+
+    --runner-logger-destination=both
+
+The `both` option sends logs to files in the sandbox and also streams them to the parent stdout/stderr.
+
+See [Configuration Reference](../reference/configuration.md#logger) for all destination options.
+
+### Log rotation
+By default, Thermos will not rotate the stdout/stderr logs from child processes and they will grow
+without bound. An individual user may change this behavior via configuration on the Process object,
+but it may also be desirable to change the default configuration for the entire cluster.
+In order to enable rotation by default, the following flags can be applied to Thermos (through the
+`-thermos_executor_flags` argument to the Aurora scheduler):
+
+    --runner-logger-mode=rotate
+    --runner-rotate-log-size-mb=100
+    --runner-rotate-log-backups=10
+
+In the above example, each instance of the Thermos runner will rotate stderr/stdout logs once they
+reach 100 MiB in size and keep a maximum of 10 backups. If a user has provided a custom setting for
+their process, it will override these default settings.
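+
+As a sketch, an operator could pass the destination and rotation settings above through the
+scheduler flag in one quoted string (the values are illustrative):
+
+    -thermos_executor_flags="--runner-logger-destination=both --runner-logger-mode=rotate --runner-rotate-log-size-mb=100 --runner-rotate-log-backups=10"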
+
+
+
+## Thermos Executor Wrapper
+
+If you need to do computation before starting the Thermos executor (for example, setting a different
+`--announcer-hostname` parameter for every executor), then the Thermos executor should be invoked
+inside a wrapper script. In such a case, the Aurora scheduler should be started with
+`-thermos_executor_path` pointing to the wrapper script and `-thermos_executor_resources`
+set to a comma-separated string of all the resources that should be copied into
+the sandbox (including the original Thermos executor).
+
+For example, to wrap the executor in a simple wrapper script, the scheduler would be started with
+`-thermos_executor_path=/path/to/wrapper.sh -thermos_executor_resources=/usr/share/aurora/bin/thermos_executor.pex`
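+
+A hypothetical `wrapper.sh` along these lines might look as follows; the executor file name and
+the use of `--announcer-hostname` are illustrative assumptions, not a prescribed layout:
+
+    #!/bin/bash
+    # Runs inside the task sandbox, where -thermos_executor_resources placed the executor pex.
+    # Compute a per-host value before handing off to the real executor.
+    HOSTNAME_OVERRIDE=$(hostname -f)
+    exec ./thermos_executor.pex --announcer-hostname="$HOSTNAME_OVERRIDE" "$@"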
+
+
+
+### Docker containers
+In order for Aurora to launch jobs using docker containers, a few extra configuration options
+must be set.  The [docker containerizer](http://mesos.apache.org/documentation/latest/docker-containerizer/)
+must be enabled on the mesos slaves by launching them with the `--containerizers=docker,mesos` option.
+
+By default, Aurora will configure Mesos to copy the file specified in `-thermos_executor_path`
+into the container's sandbox.  If using a wrapper script to launch the thermos executor,
+specify the path to the wrapper in that argument. In addition, the path to the executor pex itself
+must be included in the `-thermos_executor_resources` option. Doing so will ensure that both the
+wrapper script and executor are correctly copied into the sandbox. Finally, ensure the wrapper
+script does not access resources outside of the sandbox, as when the script is run from within a
+docker container those resources will not exist.
+
+A scheduler flag, `-global_container_mounts`, allows mounting paths from the host (i.e., the slave)
+into all containers on that host. The format is a comma separated list of host_path:container_path[:mode]
+tuples. For example `-global_container_mounts=/opt/secret_keys_dir:/mnt/secret_keys_dir:ro` mounts
+`/opt/secret_keys_dir` from the slaves into all launched containers. Valid modes are `ro` and `rw`.
+
+If you would like to supply your own parameters to `docker run` when launching jobs in docker
+containers, you may use the following flags:
+
+    -allow_docker_parameters
+    -default_docker_parameters
+
+`-allow_docker_parameters` controls whether or not users may pass their own configuration parameters
+through the job configuration files. If set to `false` (the default), the scheduler will reject
+jobs with custom parameters. *NOTE*: this setting should be used with caution as it allows any job
+owner to specify any parameters they wish, including those that may introduce security concerns
+(`privileged=true`, for example).
+
+`-default_docker_parameters` allows a cluster operator to specify a universal set of parameters that
+should be used for every container that does not have parameters explicitly configured at the job
+level. The argument accepts a multimap format:
+
+    -default_docker_parameters="read-only=true,tmpfs=/tmp,tmpfs=/run"

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/operations/installation.md
----------------------------------------------------------------------
diff --git a/docs/operations/installation.md b/docs/operations/installation.md
new file mode 100644
index 0000000..739c322
--- /dev/null
+++ b/docs/operations/installation.md
@@ -0,0 +1,324 @@
+# Installing Aurora
+
+Source and binary distributions can be found on our
+[downloads](https://aurora.apache.org/downloads/) page.  Installing from binary packages is
+recommended for most users.
+
+- [Installing the scheduler](#installing-the-scheduler)
+- [Installing worker components](#installing-worker-components)
+- [Installing the client](#installing-the-client)
+- [Installing Mesos](#installing-mesos)
+- [Troubleshooting](#troubleshooting)
+
+If our binary packages don't suit you, our package build toolchain makes it easy to build your
+own packages. See the [instructions](https://github.com/apache/aurora-packaging) to learn how.
+
+
+## Machine profiles
+
+Given that many of these components communicate over the network, there are numerous ways you could
+assemble them to create an Aurora cluster.  The simplest way is to think in terms of three machine
+profiles:
+
+### Coordinator
+**Components**: ZooKeeper, Aurora scheduler, Mesos master
+
+A small number of machines (typically 3 or 5) responsible for cluster orchestration.  In most cases
+it is fine to co-locate these components, except in very large clusters (> 1000 machines).
+Beyond that point, operators will likely want to manage these services on separate machines.
+
+In practice, 5 coordinators have been shown to reliably manage clusters with tens of thousands of
+machines.
+
+### Worker
+**Components**: Aurora executor, Aurora observer, Mesos agent
+
+The bulk of the cluster, where services will actually run.
+
+### Client
+**Components**: Aurora client, Aurora admin client
+
+Any machines that users submit jobs from.
+
+
+## Installing the scheduler
+### Ubuntu Trusty
+
+1. Install Mesos
+   Skip down to [install mesos](#mesos-on-ubuntu-trusty), then run:
+
+        sudo start mesos-master
+
+2. Install ZooKeeper
+
+        sudo apt-get install -y zookeeperd
+
+3. Install the Aurora scheduler
+
+        sudo add-apt-repository -y ppa:openjdk-r/ppa
+        sudo apt-get update
+        sudo apt-get install -y openjdk-8-jre-headless wget
+
+        sudo update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
+
+        wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-scheduler_0.12.0_amd64.deb
+        sudo dpkg -i aurora-scheduler_0.12.0_amd64.deb
+
+### CentOS 7
+
+1. Install Mesos
+   Skip down to [install mesos](#mesos-on-centos-7), then run:
+
+        sudo systemctl start mesos-master
+
+2. Install ZooKeeper
+
+        sudo rpm -Uvh https://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
+        sudo yum install -y java-1.8.0-openjdk-headless zookeeper-server
+
+        sudo service zookeeper-server init
+        sudo systemctl start zookeeper-server
+
+3. Install the Aurora scheduler
+
+        sudo yum install -y wget
+
+        wget -c https://apache.bintray.com/aurora/centos-7/aurora-scheduler-0.12.0-1.el7.centos.aurora.x86_64.rpm
+        sudo yum install -y aurora-scheduler-0.12.0-1.el7.centos.aurora.x86_64.rpm
+
+### Finalizing
+By default, the scheduler will start in an uninitialized mode.  This is because external
+coordination is necessary to be certain operator error does not result in a quorum of schedulers
+starting up and believing their databases are empty when in fact they should be re-joining a
+cluster.
+
+Because of this, a fresh install of the scheduler will need intervention to start up.  First,
+stop the scheduler service:
+
+- Ubuntu: `sudo stop aurora-scheduler`
+- CentOS: `sudo systemctl stop aurora`
+
+Now initialize the database:
+
+    sudo -u aurora mkdir -p /var/lib/aurora/scheduler/db
+    sudo -u aurora mesos-log initialize --path=/var/lib/aurora/scheduler/db
+
+Now you can start the scheduler back up:
+
+- Ubuntu: `sudo start aurora-scheduler`
+- CentOS: `sudo systemctl start aurora`
+
+
+## Installing worker components
+### Ubuntu Trusty
+
+1. Install Mesos
+   Skip down to [install mesos](#mesos-on-ubuntu-trusty), then run:
+
+        sudo start mesos-slave
+
+2. Install Aurora executor and observer
+
+        sudo apt-get install -y python2.7 wget
+
+        # NOTE: This appears to be a missing dependency of the mesos deb package and is needed
+        # for the python mesos native bindings.
+        sudo apt-get -y install libcurl4-nss-dev
+
+        wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-executor_0.12.0_amd64.deb
+        sudo dpkg -i aurora-executor_0.12.0_amd64.deb
+
+### CentOS 7
+
+1. Install Mesos
+   Skip down to [install mesos](#mesos-on-centos-7), then run:
+
+        sudo systemctl start mesos-slave
+
+2. Install Aurora executor and observer
+
+        sudo yum install -y python2 wget
+
+        wget -c https://apache.bintray.com/aurora/centos-7/aurora-executor-0.12.0-1.el7.centos.aurora.x86_64.rpm
+        sudo yum install -y aurora-executor-0.12.0-1.el7.centos.aurora.x86_64.rpm
+
+### Configuration
+The executor typically does not require configuration.  Command line arguments can
+be passed to the executor via the `-thermos_executor_flags` argument on the scheduler.
+
+The observer needs to be configured to look at the correct Mesos directory in order to find task
+sandboxes. First, find the Mesos working directory by looking for the Mesos slave `--work_dir`
+flag. You should see something like:
+
+        ps -eocmd | grep "mesos-slave" | grep -v grep | tr ' ' '\n' | grep "\--work_dir"
+        --work_dir=/var/lib/mesos
+
+If the flag is not set, you can view the default value like so:
+
+        mesos-slave --help
+        Usage: mesos-slave [options]
+
+          ...
+          --work_dir=VALUE      Directory path to place framework work directories
+                                (default: /tmp/mesos)
+          ...
+
+The value you find for `--work_dir`, `/var/lib/mesos` in this example, should match the Aurora
+observer value for `--mesos-root`.  You can look for that setting in a similar way on a worker
+node by grepping for `thermos_observer` and `--mesos-root`.  If the flag is not set, you can view
+the default value like so:
+
+        thermos_observer -h
+        Options:
+          ...
+          --mesos-root=MESOS_ROOT
+                                The mesos root directory to search for Thermos
+                                executor sandboxes [default: /var/lib/mesos]
+          ...
+
+In this case the default is `/var/lib/mesos` and we have a match. If there is no match, you can
+either adjust the mesos-slave start script(s) and restart the slave(s) or else adjust the
+Aurora observer start scripts and restart the observers.  To adjust the Aurora observer:
+
+#### Ubuntu Trusty
+
+    sudo sh -c 'echo "MESOS_ROOT=/tmp/mesos" >> /etc/default/thermos'
+
+NB: In Aurora releases up through 0.12.0, you'll also need to edit /etc/init/thermos.conf like so:
+
+    diff -C 1 /etc/init/thermos.conf.orig /etc/init/thermos.conf
+    *** /etc/init/thermos.conf.orig 2016-03-22 22:34:46.286199718 +0000
+    --- /etc/init/thermos.conf  2016-03-22 17:09:49.357689038 +0000
+    ***************
+    *** 24,25 ****
+    --- 24,26 ----
+          --port=${OBSERVER_PORT:-1338} \
+    +     --mesos-root=${MESOS_ROOT:-/var/lib/mesos} \
+          --log_to_disk=NONE \
+
+#### CentOS 7
+
+Make an edit to add the `--mesos-root` flag resulting in something like:
+
+    grep -A5 OBSERVER_ARGS /etc/sysconfig/thermos-observer
+    OBSERVER_ARGS=(
+      --port=1338
+      --mesos-root=/tmp/mesos
+      --log_to_disk=NONE
+      --log_to_stderr=google:INFO
+    )
+
+## Installing the client
+### Ubuntu Trusty
+
+    sudo apt-get install -y python2.7 wget
+
+    wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-tools_0.12.0_amd64.deb
+    sudo dpkg -i aurora-tools_0.12.0_amd64.deb
+
+### CentOS 7
+
+    sudo yum install -y python2 wget
+
+    wget -c https://apache.bintray.com/aurora/centos-7/aurora-tools-0.12.0-1.el7.centos.aurora.x86_64.rpm
+    sudo yum install -y aurora-tools-0.12.0-1.el7.centos.aurora.x86_64.rpm
+
+### Mac OS X
+
+    brew upgrade
+    brew install aurora-cli
+
+### Configuration
+Client configuration lives in a JSON file that describes the clusters available and how to reach
+them.  By default this file is at `/etc/aurora/clusters.json`.
+
+Jobs may be submitted to the scheduler using the client, and are described with
+[job configurations](../reference/configuration.md) expressed in `.aurora` files.  Typically you will
+maintain a single job configuration file to describe one or more deployment environments (e.g.
+dev, test, prod) for a production job.
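+
+A minimal sketch of such a `clusters.json` is shown below; the cluster name, ZooKeeper host, and
+paths are placeholders to adapt for your environment:
+
+    [{
+      "name": "example",
+      "zk": "zk1.example.com",
+      "scheduler_zk_path": "/aurora/scheduler",
+      "auth_mechanism": "UNAUTHENTICATED",
+      "slave_root": "/var/lib/mesos",
+      "slave_run_directory": "latest"
+    }]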
+
+
+## Installing Mesos
+Mesos uses a single package for the Mesos master and slave.  As a result, the package dependencies
+are identical for both.
+
+### Mesos on Ubuntu Trusty
+
+    sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
+    DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
+    CODENAME=$(lsb_release -cs)
+
+    echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" | \
+      sudo tee /etc/apt/sources.list.d/mesosphere.list
+    sudo apt-get -y update
+
+    # Use `apt-cache showpkg mesos | grep [version]` to find the exact version.
+    sudo apt-get -y install mesos=0.25.0-0.2.70.ubuntu1404
+
+### Mesos on CentOS 7
+
+    sudo rpm -Uvh https://repos.mesosphere.io/el/7/noarch/RPMS/mesosphere-el-repo-7-1.noarch.rpm
+    sudo yum -y install mesos-0.25.0
+
+
+
+## Troubleshooting
+So you've started your first cluster and are running into some issues? We've collected some common
+stumbling blocks and solutions here to help get you moving.
+
+### Replicated log not initialized
+
+#### Symptoms
+- Scheduler RPCs and web interface claim `Storage is not READY`
+- Scheduler log repeatedly prints messages like
+
+  ```
+  I1016 16:12:27.234133 26081 replica.cpp:638] Replica in EMPTY status
+  received a broadcasted recover request
+  I1016 16:12:27.234256 26084 recover.cpp:188] Received a recover response
+  from a replica in EMPTY status
+  ```
+
+#### Solution
+When you create a new cluster, you need to inform a quorum of schedulers that they are safe to
+consider their database to be empty by [initializing](#initializing-the-replicated-log) the
+replicated log. This is done to prevent the scheduler from modifying the cluster state in the event
+of multiple simultaneous disk failures or, more likely, misconfiguration of the replicated log path.
+
+
+### Scheduler not registered
+
+#### Symptoms
+Scheduler log contains
+
+    Framework has not been registered within the tolerated delay.
+
+#### Solution
+Double-check that the scheduler is configured correctly to reach the Mesos master. If you are
+registering the master in ZooKeeper, make sure the command line argument to the master:
+
+    --zk=zk://$ZK_HOST:2181/mesos/master
+
+is the same as the one on the scheduler:
+
+    -mesos_master_address=zk://$ZK_HOST:2181/mesos/master
+
+
+### Scheduler not running
+
+#### Symptoms
+The scheduler process commits suicide regularly. This happens under error conditions, but
+also on purpose at regular intervals.
+
+#### Solution
+Aurora is meant to be run under supervision. You have to configure a supervisor like
+[Monit](http://mmonit.com/monit/) or [supervisord](http://supervisord.org/) to run the scheduler
+and restart it whenever it fails or exits on purpose.
+
+Aurora supports an active health checking protocol on its admin HTTP interface - if a `GET /health`
+times out or returns anything other than `200 OK` the scheduler process is unhealthy and should be
+restarted.
+
+For example, monit can be configured with
+
+    if failed port 8081 send "GET /health HTTP/1.0\r\n" expect "OK\n" with timeout 2 seconds for 10 cycles then restart
+
+assuming you set `-http_port=8081`.
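+
+Similarly, a minimal supervisord program entry might look like the sketch below; the path to the
+`scheduler.sh` wrapper is an assumption about your layout:
+
+    [program:aurora-scheduler]
+    command=/usr/local/aurora-scheduler/scheduler.sh
+    autorestart=true
+    stopsignal=TERM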

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/operations/monitoring.md
----------------------------------------------------------------------
diff --git a/docs/operations/monitoring.md b/docs/operations/monitoring.md
new file mode 100644
index 0000000..3cb2a79
--- /dev/null
+++ b/docs/operations/monitoring.md
@@ -0,0 +1,181 @@
+# Monitoring your Aurora cluster
+
+Before you start running important services in your Aurora cluster, it's important to set up
+monitoring and alerting of Aurora itself.  Most of your monitoring can be against the scheduler,
+since it will give you a global view of what's going on.
+
+## Reading stats
+The scheduler exposes a *lot* of instrumentation data via its HTTP interface. You can get a quick
+peek at the first few of these in our vagrant image:
+
+    $ vagrant ssh -c 'curl -s localhost:8081/vars | head'
+    async_tasks_completed 1004
+    attribute_store_fetch_all_events 15
+    attribute_store_fetch_all_events_per_sec 0.0
+    attribute_store_fetch_all_nanos_per_event 0.0
+    attribute_store_fetch_all_nanos_total 3048285
+    attribute_store_fetch_all_nanos_total_per_sec 0.0
+    attribute_store_fetch_one_events 3391
+    attribute_store_fetch_one_events_per_sec 0.0
+    attribute_store_fetch_one_nanos_per_event 0.0
+    attribute_store_fetch_one_nanos_total 454690753
+
+These values are served as `Content-Type: text/plain`, with each line containing a space-separated metric
+name and value. Values may be integers, doubles, or strings (note: strings are static, others
+may be dynamic).
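+
+Because the format is one space-separated name/value pair per line, individual stats are easy to
+pull out with standard tools. For example (the stat name and port here are just examples):
+
+    $ curl -s localhost:8081/vars | awk '$1 == "jvm_uptime_secs" {print $2}'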
+
+If your monitoring infrastructure prefers JSON, the scheduler exports that as well:
+
+    $ vagrant ssh -c 'curl -s localhost:8081/vars.json | python -mjson.tool | head'
+    {
+        "async_tasks_completed": 1009,
+        "attribute_store_fetch_all_events": 15,
+        "attribute_store_fetch_all_events_per_sec": 0.0,
+        "attribute_store_fetch_all_nanos_per_event": 0.0,
+        "attribute_store_fetch_all_nanos_total": 3048285,
+        "attribute_store_fetch_all_nanos_total_per_sec": 0.0,
+        "attribute_store_fetch_one_events": 3409,
+        "attribute_store_fetch_one_events_per_sec": 0.0,
+        "attribute_store_fetch_one_nanos_per_event": 0.0,
+
+This will be the same data as above, served with `Content-Type: application/json`.
+
+## Viewing live stat samples on the scheduler
+The scheduler uses the Twitter commons stats library, which keeps an internal time-series database
+of exported variables - nearly everything in `/vars` is available for instant graphing.  This is
+useful for debugging, but is not a replacement for an external monitoring system.
+
+You can view these graphs on a scheduler at `/graphview`.  It supports some composition and
+aggregation of values, which can be invaluable when triaging a problem.  For example, if you have
+the scheduler running in vagrant, check out these links:
+[simple graph](http://192.168.33.7:8081/graphview?query=jvm_uptime_secs)
+[complex composition](http://192.168.33.7:8081/graphview?query=rate\(scheduler_log_native_append_nanos_total\)%2Frate\(scheduler_log_native_append_events\)%2F1e6)
+
+### Counters and gauges
+Among numeric stats, there are two fundamental types of stats exported: _counters_ and _gauges_.
+Counters are guaranteed to be monotonically-increasing for the lifetime of a process, while gauges
+may decrease in value.  Aurora uses counters to represent things like the number of times an event
+has occurred, and gauges to capture things like the current length of a queue.  Counters are a
+natural fit for accurate composition into [rate ratios](http://en.wikipedia.org/wiki/Rate_ratio)
+(useful for sample-resistant latency calculation), while gauges are not.
+
+# Alerting
+
+## Quickstart
+If you are looking for just bare-minimum alerting to get something in place quickly, set up alerting
+on `framework_registered` and `task_store_LOST`. These will give you a decent picture of overall
+health.
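+
+As a bare-bones sketch of such a check (the host, port, and `task_store_LOST` threshold are
+assumptions you would tune for your cluster):
+
+    #!/bin/bash
+    vars=$(curl -sf localhost:8081/vars) || { echo "CRITICAL: scheduler /vars unreachable"; exit 2; }
+    registered=$(echo "$vars" | awk '$1 == "framework_registered" {print $2}')
+    lost=$(echo "$vars" | awk '$1 == "task_store_LOST" {print $2}')
+    [[ "$registered" == "1" ]] || echo "CRITICAL: scheduler is not registered with the Mesos master"
+    [[ "${lost:-0}" -lt 100 ]] || echo "WARNING: task_store_LOST is high: $lost"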
+
+## A note on thresholds
+One of the most difficult things in monitoring is choosing alert thresholds. With many of these
+stats, there is no value we can offer as a threshold that will be guaranteed to work for you. It
+will depend on the size of your cluster, number of jobs, churn of tasks in the cluster, etc. We
+recommend you start with a strict value after viewing a small amount of collected data, and then
+adjust thresholds as you see fit. Feel free to ask us if you would like to validate that your alerts
+and thresholds make sense.
+
+## Important stats
+
+### `jvm_uptime_secs`
+Type: integer counter
+
+The number of seconds the JVM process has been running. Comes from
+[RuntimeMXBean#getUptime()](http://docs.oracle.com/javase/7/docs/api/java/lang/management/RuntimeMXBean.html#getUptime\(\))
+
+Detecting resets (decreasing values) on this stat will tell you that the scheduler is failing to
+stay alive.
+
+Look at the scheduler logs to identify the reason the scheduler is exiting.
+
+### `system_load_avg`
+Type: double gauge
+
+The current load average of the system for the last minute. Comes from
+[OperatingSystemMXBean#getSystemLoadAverage()](http://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSystemMXBean.html?is-external=true#getSystemLoadAverage\(\)).
+
+A high sustained value suggests that the scheduler machine may be over-utilized.
+
+Use standard unix tools like `top` and `ps` to track down the offending process(es).
+
+### `process_cpu_cores_utilized`
+Type: double gauge
+
+The current number of CPU cores in use by the JVM process. This should not exceed the number of
+logical CPU cores on the machine. Derived from
+[OperatingSystemMXBean#getProcessCpuTime()](http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html)
+
+A high sustained value indicates that the scheduler is overworked. Due to current internal design
+limitations, if this value is sustained at `1`, there is a good chance the scheduler is under water.
+
+There are two main inputs that tend to drive this figure: task scheduling attempts and status
+updates from Mesos.  You may see activity in the scheduler logs to give an indication of where
+time is being spent.  Beyond that, it really takes good familiarity with the code to effectively
+triage this.  We suggest engaging with an Aurora developer.
+
+### `task_store_LOST`
+Type: integer gauge
+
+The number of tasks stored in the scheduler that are in the `LOST` state, and have been rescheduled.
+
+If this value is increasing at a high rate, it is a sign of trouble.
+
+There are many sources of `LOST` tasks in Mesos: the scheduler, master, slave, and executor can all
+trigger this.  The first step is to look in the scheduler logs for `LOST` to identify where the
+state changes are originating.
+
+### `scheduler_resource_offers`
+Type: integer counter
+
+The number of resource offers that the scheduler has received.
+
+For a healthy scheduler, this value must be increasing over time.
+
+Assuming the scheduler is up and otherwise healthy, you will want to check if the master thinks it
+is sending offers. You should also look at the master's web interface to see if it has a large
+number of outstanding offers that it is waiting to have returned.
+
+### `framework_registered`
+Type: binary integer counter
+
+Will be `1` for the leading scheduler that is registered with the Mesos master, and `0` for passive
+schedulers.
+
+A sustained period without a `1` (or where `sum() != 1`) warrants investigation.
+
+If there is no leading scheduler, look in the scheduler and master logs for why.  If there are
+multiple schedulers claiming leadership, this suggests a split brain and warrants filing a critical
+bug.
+
+### `rate(scheduler_log_native_append_nanos_total)/rate(scheduler_log_native_append_events)`
+Type: rate ratio of integer counters
+
+This composes two counters to compute a windowed figure for the latency of replicated log writes.
+
+A hike in this value suggests disk bandwidth contention.
+
+Look in scheduler logs for any reported oddness with saving to the replicated log. Also use
+standard tools like `vmstat` and `iotop` to identify whether the disk has become slow or
+over-utilized. We suggest using a dedicated disk for the replicated log to mitigate this.
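+
+There is no single exported stat for this ratio; as a rough sketch, it can be approximated by hand
+from two `/vars` samples taken some interval apart (60 seconds here, result in milliseconds per
+append; the host and port are examples):
+
+    sample() {
+      curl -s localhost:8081/vars | awk '
+        $1 == "scheduler_log_native_append_nanos_total" { n = $2 }
+        $1 == "scheduler_log_native_append_events"      { e = $2 }
+        END { print n, e }'
+    }
+    read n1 e1 <<< "$(sample)"; sleep 60; read n2 e2 <<< "$(sample)"
+    echo "scale=3; ($n2 - $n1) / ($e2 - $e1) / 1000000" | bc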
+
+### `timed_out_tasks`
+Type: integer counter
+
+Tracks the number of times the scheduler has given up while waiting
+(for `-transient_task_state_timeout`) to hear back about a task that is in a transient state
+(e.g. `ASSIGNED`, `KILLING`), and has moved to `LOST` before rescheduling.
+
+This value is currently known to increase occasionally when the scheduler fails over
+([AURORA-740](https://issues.apache.org/jira/browse/AURORA-740)). However, any large spike in this
+value warrants investigation.
+
+The scheduler will log when it times out a task. You should trace the task ID of the timed out
+task into the master, slave, and/or executors to determine where the message was dropped.
+
+### `http_500_responses_events`
+Type: integer counter
+
+The total number of HTTP 500 status responses sent by the scheduler. Includes API and asset serving.
+
+An increase warrants investigation.
+
+Look in scheduler logs to identify why the scheduler returned a 500, there should be a stack trace.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/operations/security.md
----------------------------------------------------------------------
diff --git a/docs/operations/security.md b/docs/operations/security.md
new file mode 100644
index 0000000..1a3d9b7
--- /dev/null
+++ b/docs/operations/security.md
@@ -0,0 +1,282 @@
+Securing your Aurora Cluster
+============================
+
+Aurora integrates with [Apache Shiro](http://shiro.apache.org/) to provide security
+controls for its API. In addition to providing some useful features out of the box, Shiro
+also allows Aurora cluster administrators to adapt the security system to their organization’s
+existing infrastructure.
+
+- [Enabling Security](#enabling-security)
+- [Authentication](#authentication)
+	- [HTTP Basic Authentication](#http-basic-authentication)
+		- [Server Configuration](#server-configuration)
+		- [Client Configuration](#client-configuration)
+	- [HTTP SPNEGO Authentication (Kerberos)](#http-spnego-authentication-kerberos)
+		- [Server Configuration](#server-configuration-1)
+		- [Client Configuration](#client-configuration-1)
+- [Authorization](#authorization)
+	- [Using an INI file to define security controls](#using-an-ini-file-to-define-security-controls)
+		- [Caveats](#caveats)
+- [Implementing a Custom Realm](#implementing-a-custom-realm)
+	- [Packaging a realm module](#packaging-a-realm-module)
+- [Known Issues](#known-issues)
+
+# Enabling Security
+
+There are two major components of security:
+[authentication and authorization](http://en.wikipedia.org/wiki/Authentication#Authorization).  A
+cluster administrator may choose the approach used for each, and may also implement custom
+mechanisms for either.  Later sections describe the options available.
+
+# Authentication
+
+The scheduler must be configured with instructions for how to process authentication
+credentials at a minimum.  There are currently two built-in authentication schemes -
+[HTTP Basic Authentication](http://en.wikipedia.org/wiki/Basic_access_authentication), and
+[SPNEGO](http://en.wikipedia.org/wiki/SPNEGO) (Kerberos).
+
+## HTTP Basic Authentication
+
+Basic Authentication is a very quick way to add *some* security.  It is supported
+by all major browsers and HTTP client libraries with minimal work.  However,
+before relying on Basic Authentication you should be aware of the [security
+considerations](http://tools.ietf.org/html/rfc2617#section-4).
+
+### Server Configuration
+
+At a minimum you need to set 4 command-line flags on the scheduler:
+
+```
+-http_authentication_mechanism=BASIC
+-shiro_realm_modules=INI_AUTHNZ
+-shiro_ini_path=path/to/security.ini
+```
+
+And create a security.ini file like so:
+
+```
+[users]
+sally = apple, admin
+
+[roles]
+admin = *
+```
+
+The details of the security.ini file are explained below. Note that this file contains plaintext,
+unhashed passwords.
+
+### Client Configuration
+
+To configure the client for HTTP Basic authentication, add an entry to ~/.netrc with your credentials
+
+```
+% cat ~/.netrc
+# ...
+
+machine aurora.example.com
+login sally
+password apple
+
+# ...
+```
+
+No changes are required to `clusters.json`.
+
+## HTTP SPNEGO Authentication (Kerberos)
+
+### Server Configuration
+At a minimum you need to set 6 command-line flags on the scheduler:
+
+```
+-http_authentication_mechanism=NEGOTIATE
+-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ
+-kerberos_server_principal=HTTP/aurora.example.com@EXAMPLE.COM
+-kerberos_server_keytab=path/to/aurora.example.com.keytab
+-shiro_ini_path=path/to/security.ini
+```
+
+And create a security.ini file like so:
+
+```
+% cat path/to/security.ini
+[users]
+sally = _, admin
+
+[roles]
+admin = *
+```
+
+What's going on here? First, Aurora must be configured to request Kerberos credentials when presented with an
+unauthenticated request. This is achieved by setting
+
+```
+-http_authentication_mechanism=NEGOTIATE
+```
+
+Next, a Realm module must be configured to **authenticate** the current request using the Kerberos
+credentials that were requested. Aurora ships with a realm module that can do this
+
+```
+-shiro_realm_modules=KERBEROS5_AUTHN[,...]
+```
+
+The Kerberos5Realm requires a keytab file and a server principal name. The principal name will usually
+be in the form `HTTP/aurora.example.com@EXAMPLE.COM`.
+
+```
+-kerberos_server_principal=HTTP/aurora.example.com@EXAMPLE.COM
+-kerberos_server_keytab=path/to/aurora.example.com.keytab
+```
+
+The Kerberos5 realm module is authentication-only. For scheduler security to work you must also
+enable a realm module that provides an Authorizer implementation. For example, to do this using the
+IniShiroRealmModule:
+
+```
+-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ
+```
+
+You can then configure authorization using a security.ini file as described below
+(the password field is ignored). You must configure the realm module with the path to this file:
+
+```
+-shiro_ini_path=path/to/security.ini
+```
+
+### Client Configuration
+To use Kerberos on the client-side you must build Kerberos-enabled client binaries. Do this with
+
+```
+./pants binary src/main/python/apache/aurora/kerberos:kaurora
+./pants binary src/main/python/apache/aurora/kerberos:kaurora_admin
+```
+
+You must also configure each cluster where you've enabled Kerberos on the scheduler
+to use Kerberos authentication. Do this by setting `auth_mechanism` to `KERBEROS`
+in `clusters.json`.
+
+```
+% cat ~/.aurora/clusters.json
+{
+    "devcluser": {
+        "auth_mechanism": "KERBEROS",
+        ...
+    },
+    ...
+}
+```
+
+# Authorization
+Given a means to authenticate the entity a client claims they are, we need to define what privileges they have.
+
+## Using an INI file to define security controls
+
+The simplest security configuration for Aurora is an INI file on the scheduler.  For small
+clusters, or clusters where the users and access controls change relatively infrequently, this is
+likely the preferred approach.  However you may want to avoid this approach if access permissions
+are rapidly changing, or if your access control information already exists in another system.
+
+You can enable INI-based configuration with following scheduler command line arguments:
+
+```
+-http_authentication_mechanism=BASIC
+-shiro_ini_path=path/to/security.ini
+```
+
+*note* As the argument name reveals, this is using Shiro’s
+[IniRealm](http://shiro.apache.org/configuration.html#Configuration-INIConfiguration) behind
+the scenes.
+
+The INI file will contain two sections - users and roles.  Here’s an example for what might
+be in security.ini:
+
+```
+[users]
+sally = apple, admin
+jim = 123456, accounting
+becky = letmein, webapp
+larry = 654321,accounting
+steve = password
+
+[roles]
+admin = *
+accounting = thrift.AuroraAdmin:setQuota
+webapp = thrift.AuroraSchedulerManager:*:webapp
+```
+
+The users section defines user credentials and the role(s) they are members of.  These lines
+are of the format `<user> = <password>[, <role>...]`.  As you probably noticed, the passwords are
+in plaintext and as a result read access to this file should be restricted.
+
+In this configuration, each user has different privileges for actions in the cluster because
+of the roles they are a part of:
+
+* admin is granted all privileges
+* accounting may adjust the amount of resource quota for any role
+* webapp represents a collection of jobs that make up a service, and its members may create and modify any jobs owned by it
+
+### Caveats
+You might find documentation on the Internet suggesting there are additional sections in `shiro.ini`,
+like `[main]` and `[urls]`. These are not supported by Aurora as it uses a different mechanism to configure
+those parts of Shiro. Think of Aurora's `security.ini` as a subset with only `[users]` and `[roles]` sections.
+
+## Implementing Delegated Authorization
+
+It is possible to leverage Shiro's `runAs` feature by implementing a custom Servlet Filter that provides
+the capability and passing its fully qualified class name to the command line argument
+`-shiro_after_auth_filter`. The filter is registered in the same filter chain as the Shiro auth filters
+and is placed after the Shiro auth filters in the filter chain. This ensures that the Filter is invoked
+after the Shiro filters have had a chance to authenticate the request.
+
+# Implementing a Custom Realm
+
+Since Aurora’s security is backed by [Apache Shiro](https://shiro.apache.org), you can implement a
+custom [Realm](http://shiro.apache.org/realm.html) to define organization-specific security behavior.
+
+In addition to using Shiro's standard APIs to implement a Realm you can link against Aurora to
+access the type-safe Permissions Aurora uses. See the Javadoc for `org.apache.aurora.scheduler.spi`
+for more information.
+
+## Packaging a realm module
+Package your custom Realm(s) with a Guice module that exposes a `Set<Realm>` multibinding.
+
+```java
+package com.example;
+
+import com.google.inject.AbstractModule;
+import com.google.inject.multibindings.Multibinder;
+import org.apache.shiro.realm.Realm;
+
+public class MyRealmModule extends AbstractModule {
+  @Override
+  public void configure() {
+    Realm myRealm = new MyRealm();
+
+    Multibinder.newSetBinder(binder(), Realm.class).addBinding().toInstance(myRealm);
+  }
+
+  static class MyRealm implements Realm {
+    // Realm implementation.
+  }
+}
+```
+
+To use your module in the scheduler, include it as a realm module based on its fully-qualified
+class name:
+
+```
+-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ,com.example.MyRealmModule
+```
+
+# Known Issues
+
+While the APIs and SPIs we ship with are stable as of 0.8.0, we are aware of several incremental
+improvements. Please follow, vote, or send patches.
+
+Relevant tickets:
+* [AURORA-343](https://issues.apache.org/jira/browse/AURORA-343): HTTPS support
+* [AURORA-1248](https://issues.apache.org/jira/browse/AURORA-1248): Client retries 4xx errors
+* [AURORA-1279](https://issues.apache.org/jira/browse/AURORA-1279): Remove kerberos-specific build targets
+* [AURORA-1291](https://issues.apache.org/jira/browse/AURORA-1291): Consider defining a JSON format in place of INI
+* [AURORA-1179](https://issues.apache.org/jira/browse/AURORA-1179): Supported hashed passwords in security.ini
+* [AURORA-1295](https://issues.apache.org/jira/browse/AURORA-1295): Support security for the ReadOnlyScheduler service

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/operations/storage.md
----------------------------------------------------------------------
diff --git a/docs/operations/storage.md b/docs/operations/storage.md
new file mode 100644
index 0000000..c30922f
--- /dev/null
+++ b/docs/operations/storage.md
@@ -0,0 +1,97 @@
+# Aurora Scheduler Storage
+
+- [Overview](#overview)
+- [Replicated Log Configuration](configuration.md#replicated-log-configuration)
+- [Backup Configuration](configuration.md#backup-configuration)
+- [Storage Semantics](#storage-semantics)
+  - [Reads, writes, modifications](#reads-writes-modifications)
+    - [Read lifecycle](#read-lifecycle)
+    - [Write lifecycle](#write-lifecycle)
+  - [Atomicity, consistency and isolation](#atomicity-consistency-and-isolation)
+  - [Population on restart](#population-on-restart)
+
+
+## Overview
+
+The Aurora scheduler maintains data that needs to be persisted to survive failovers and restarts.
+For example:
+
+* Task configurations and scheduled task instances
+* Job update configurations and update progress
+* Production resource quotas
+* Mesos resource offer host attributes
+
+Aurora solves its persistence needs by leveraging the Mesos implementation of a Paxos replicated
+log [[1]](https://ramcloud.stanford.edu/~ongaro/userstudy/paxos.pdf)
+[[2]](http://en.wikipedia.org/wiki/State_machine_replication) with a key-value
+[LevelDB](https://github.com/google/leveldb) storage as persistence media.
+
+Conceptually, it can be represented by the following major components:
+
+* Volatile storage: in-memory cache of all available data. Implemented via in-memory
+[H2 Database](http://www.h2database.com/html/main.html) and accessed via
+[MyBatis](http://mybatis.github.io/mybatis-3/).
+* Log manager: interface between Aurora storage and Mesos replicated log. The default schema format
+is [thrift](https://github.com/apache/thrift). Data is stored in serialized binary form.
+* Snapshot manager: all data is periodically persisted to the Mesos replicated log in a single
+snapshot. This helps establish periodic recovery checkpoints and speeds up volatile storage
+recovery on restart.
+* Backup manager: as a precaution, snapshots are periodically written out into backup files.
+This solves a [disaster recovery problem](backup-restore.md)
+in case of a complete loss or corruption of Mesos log files.
+
+![Storage hierarchy](../images/storage_hierarchy.png)
+
+
+## Storage Semantics
+
+Implementation details of the Aurora storage system. Understanding those can sometimes be useful
+when investigating performance issues.
+
+### Reads, writes, modifications
+
+All services in Aurora access data via a set of predefined store interfaces (aka stores) logically
+grouped by the type of data they serve. Every interface defines a specific set of operations allowed
+on the data, thus abstracting out storage access and the actual persistence implementation. The
+latter is especially important given the general immutability of persisted data. With the Mesos
+replicated log as the underlying persistence solution, data can be read and written easily but not
+modified. All modifications are simulated by saving new versions of modified objects. This feature
+and general performance considerations justify the existence of the volatile in-memory store.
+
+#### Read lifecycle
+
+There are two types of reads available in Aurora: consistent and weakly-consistent. The difference
+is explained [below](#atomicity-consistency-and-isolation).
+
+All reads are served from the volatile storage, which makes reads generally cheap operations
+from a performance standpoint. The majority of the volatile stores are backed by the
+in-memory H2 database. This allows for rich schema definitions, queries and relationships that
+key-value storage is unable to match.
+
+#### Write lifecycle
+
+Writes are more involved operations since, in addition to updating the volatile store, data has to
+be appended to the replicated log. Data is not available for reads until fully acknowledged by both
+the replicated log and volatile storage.
+
+### Atomicity, consistency and isolation
+
+Aurora uses [write-ahead logging](http://en.wikipedia.org/wiki/Write-ahead_logging) to ensure
+consistency between replicated and volatile storage. In Aurora, data is first written into the
+replicated log and only then updated in the volatile store.
+
+Aurora storage uses read-write locks to serialize data mutations and provide a consistent view of
+the available data. The `Storage` interface exposes 3 major types of operations:
+
+* `consistentRead` - access is guarded by a reader's lock and provides a consistent view on read
+* `weaklyConsistentRead` - access is lock-less. Delivers the best contention performance but may
+result in stale reads
+* `write` - access is fully serialized by a writer's lock. Operation success requires both the
+volatile and replicated writes to succeed.
+
+The consistency of the volatile store is enforced via H2 transactional isolation.
+
+### Population on restart
+
+Any time a scheduler restarts, it restores its volatile state from the most recent position recorded
+in the replicated log by restoring the snapshot and replaying individual log entries on top to fully
+recover the state up to the last write.


[5/7] aurora git commit: Reorganize Documentation

Posted by se...@apache.org.
http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/development/design/command-hooks.md
----------------------------------------------------------------------
diff --git a/docs/development/design/command-hooks.md b/docs/development/design/command-hooks.md
new file mode 100644
index 0000000..3f3f70f
--- /dev/null
+++ b/docs/development/design/command-hooks.md
@@ -0,0 +1,102 @@
+# Command Hooks for the Aurora Client
+
+## Introduction/Motivation
+
+We've got hooks in the client that surround API calls. These are
+pretty awkward, because they don't correlate with user actions. For
+example, suppose we wanted a policy that said users weren't allowed to
+kill all instances of a production job at once.
+
+Right now, all that we could hook would be the "killJob" api call. But
+kill (at least in newer versions of the client) normally runs in
+batches. If a user called killall, what we would see on the API level
+is a series of "killJob" calls, each of which specified a batch of
+instances. We woudn't be able to distinguish between really killing
+all instances of a job (which is forbidden under this policy), and
+carefully killing in batches (which is permitted.) In each case, the
+hook would just see a series of API calls, and couldn't find out what
+the actual command being executed was!
+
+For most policy enforcement, what we really want to be able to do is
+look at and vet the commands that a user is performing, not the API
+calls that the client uses to implement those commands.
+
+So I propose that we add a new kind of hooks, which surround noun/verb
+commands. A hook will register itself to handle a collection of (noun,
+verb) pairs. Whenever any of those noun/verb commands are invoked, the
+hooks methods will be called around the execution of the verb. A
+pre-hook will have the ability to reject a command, preventing the
+verb from being executed.
+
+## Registering Hooks
+
+These hooks will be registered via configuration plugins. A configuration plugin
+can register hooks using an API. Hooks registered this way are, effectively,
+hardwired into the client executable.
+
+The order of execution of hooks is unspecified: they may be called in
+any order. There is no way to guarantee that one hook will execute
+before some other hook.
+
+
+### Global Hooks
+
+Commands registered by the python call are called _global_ hooks,
+because they will run for all configurations, whether or not they
+specify any hooks in the configuration file.
+
+In the implementation, hooks are registered in the module
+`apache.aurora.client.cli.command_hooks`, using the class
+`GlobalCommandHookRegistry`. A global hook can be registered by calling
+`GlobalCommandHookRegistry.register_command_hook` in a configuration plugin.
+
+### The API
+
+    class CommandHook(object):
+      @property
+      def name(self):
+        """Returns a name for the hook."""
+
+      def get_nouns(self):
+        """Return the nouns that have verbs that should invoke this hook."""
+
+      def get_verbs(self, noun):
+        """Return the verbs for a particular noun that should invoke this hook."""
+
+      @abstractmethod
+      def pre_command(self, noun, verb, context, commandline):
+        """Execute a hook before invoking a verb.
+        * noun: the noun being invoked.
+        * verb: the verb being invoked.
+        * context: the context object that will be used to invoke the verb.
+          The options object will be initialized before calling the hook
+        * commandline: the original argv collection used to invoke the client.
+        Returns: True if the command should be allowed to proceed; False if the command
+        should be rejected.
+        """
+
+      def post_command(self, noun, verb, context, commandline, result):
+        """Execute a hook after invoking a verb.
+        * noun: the noun being invoked.
+        * verb: the verb being invoked.
+        * context: the context object that will be used to invoke the verb.
+          The options object will be initialized before calling the hook
+        * commandline: the original argv collection used to invoke the client.
+        * result: the result code returned by the verb.
+        Returns: nothing
+        """
+
+    class GlobalCommandHookRegistry(object):
+      @classmethod
+      def register_command_hook(cls, hook):
+        pass
+
+### Skipping Hooks
+
+To skip a hook, a user uses a command-line option, `--skip-hooks`. The option can either
+specify specific hooks to skip, or "all":
+
+* `aurora --skip-hooks=all job create east/bozo/devel/myjob` will create a job
+  without running any hooks.
+* `aurora --skip-hooks=test,iq create east/bozo/devel/myjob` will create a job,
+  and will skip only the hooks named "test" and "iq".

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/development/scheduler.md
----------------------------------------------------------------------
diff --git a/docs/development/scheduler.md b/docs/development/scheduler.md
new file mode 100644
index 0000000..66d0857
--- /dev/null
+++ b/docs/development/scheduler.md
@@ -0,0 +1,118 @@
+Developing the Aurora Scheduler
+===============================
+
+The Aurora scheduler is written in Java code and built with [Gradle](http://gradle.org).
+
+
+Prerequisite
+============
+
+When using Apache Aurora checked out from the source repository or the binary
+distribution, the Gradle wrapper and JavaScript dependencies are provided.
+However, you need to manually install them when using the source release
+downloads:
+
+1. Install Gradle following the instructions on the [Gradle web site](http://gradle.org)
+2. From the root directory of the Apache Aurora project generate the gradle
+wrapper by running:
+
+        gradle wrapper
+
+
+Getting Started
+===============
+
+You will need Java 8 installed and on your `PATH` or unzipped somewhere with `JAVA_HOME` set. Then
+
+    ./gradlew tasks
+
+will bootstrap the build system and show available tasks. This can take a while the first time you
+run it but subsequent runs will be much faster due to cached artifacts.
+
+Running the Tests
+-----------------
+Aurora has a comprehensive unit test suite. To run the tests use
+
+    ./gradlew build
+
+Gradle will only re-run tests when dependencies of them have changed. To force a re-run of all
+tests use
+
+    ./gradlew clean build
+
+Running the build with code quality checks
+------------------------------------------
+To speed up development iteration, the plain gradle commands will not run static analysis tools.
+However, you should run these before posting a review diff, and **always** run this before pushing a
+commit to origin/master.
+
+    ./gradlew build -Pq
+
+Running integration tests
+-------------------------
+To run the same tests that are run in the Apache Aurora continuous integration
+environment:
+
+    ./build-support/jenkins/build.sh
+
+In addition, there is an end-to-end test that runs a suite of aurora commands
+using a virtual cluster:
+
+    ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
+
+Creating a bundle for deployment
+--------------------------------
+Gradle can create a zip file containing Aurora, all of its dependencies, and a launch script with
+
+    ./gradlew distZip
+
+or a tar file containing the same files with
+
+    ./gradlew distTar
+
+The output file will be written to `dist/distributions/aurora-scheduler.zip` or
+`dist/distributions/aurora-scheduler.tar`.
+
+
+
+Developing Aurora Java code
+===========================
+
+Setting up an IDE
+-----------------
+Gradle can generate project files for your IDE. To generate an IntelliJ IDEA project run
+
+    ./gradlew idea
+
+and import the generated `aurora.ipr` file.
+
+Adding or Upgrading a Dependency
+--------------------------------
+New dependencies can be added from Maven central by adding a `compile` dependency to `build.gradle`.
+For example, to add a dependency on `com.example`'s `example-lib` 1.0 add this block:
+
+    compile 'com.example:example-lib:1.0'
+
+NOTE: Anyone thinking about adding a new dependency should first familiarize themselves with the
+Apache Foundation's third-party licensing
+[policy](http://www.apache.org/legal/resolved.html#category-x).
+
+
+
+Developing the Aurora Build System
+==================================
+
+Bootstrapping Gradle
+--------------------
+The following files were autogenerated by `gradle wrapper` using gradle 1.8's
+[Wrapper](http://www.gradle.org/docs/1.8/dsl/org.gradle.api.tasks.wrapper.Wrapper.html) plugin and
+should not be modified directly:
+
+    ./gradlew
+    ./gradlew.bat
+    ./gradle/wrapper/gradle-wrapper.jar
+    ./gradle/wrapper/gradle-wrapper.properties
+
+To upgrade Gradle unpack the new version somewhere, run `/path/to/new/gradle wrapper` in the
+repository root and commit the changed files.
+

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/development/thermos.md
----------------------------------------------------------------------
diff --git a/docs/development/thermos.md b/docs/development/thermos.md
new file mode 100644
index 0000000..d1a7b9a
--- /dev/null
+++ b/docs/development/thermos.md
@@ -0,0 +1,126 @@
+The Python components of Aurora are built using [Pants](https://pantsbuild.github.io).
+
+
+Python Build Conventions
+========================
+The Python code is laid out according to the following conventions:
+
+1. 1 `BUILD` per 3rd level directory. For a list of current top-level packages run:
+
+        % find src/main/python -maxdepth 3 -mindepth 3 -type d |\
+        while read dname; do echo $dname |\
+            sed 's@src/main/python/\(.*\)/\(.*\)/\(.*\).*@\1.\2.\3@'; done
+
+2.  Each `BUILD` file exports 1
+    [`python_library`](https://pantsbuild.github.io/build_dictionary.html#bdict_python_library)
+    that provides a
+    [`setup_py`](https://pantsbuild.github.io/build_dictionary.html#setup_py)
+    containing each
+    [`python_binary`](https://pantsbuild.github.io/build_dictionary.html#python_binary)
+    in the `BUILD` file, named the same as the directory it's in so that it can be referenced
+    without a ':' character. The `sources` field in the `python_library` will almost always be
+    `rglobs('*.py')`.
+
+3.  Other BUILD files may only depend on this single public `python_library`
+    target. Any other target is considered a private implementation detail and
+    should be prefixed with an `_`.
+
+4.  `python_binary` targets are always named the same as the exported console script.
+
+5.  `python_binary` targets must have identical `dependencies` to the `python_library` exported
+    by the package and must use `entry_point`.
+
+    This means a PEX file generated by pants will contain exactly the same files that will be
+    available on the `PYTHONPATH` in the case of `pip install` of the corresponding library
+    target. This will help our migration off of Pants in the future.
+
+Annotated example - apache.thermos.runner
+-----------------------------------------
+
+    % find src/main/python/apache/thermos/runner
+    src/main/python/apache/thermos/runner
+    src/main/python/apache/thermos/runner/__init__.py
+    src/main/python/apache/thermos/runner/thermos_runner.py
+    src/main/python/apache/thermos/runner/BUILD
+    % cat src/main/python/apache/thermos/runner/BUILD
+    # License boilerplate omitted
+    import os
+
+
+    # Private target so that a setup_py can exist without a circular dependency. Only targets within
+    # this file should depend on this.
+    python_library(
+      name = '_runner',
+      # The target covers every python file under this directory and subdirectories.
+      sources = rglobs('*.py'),
+      dependencies = [
+        '3rdparty/python:twitter.common.app',
+        '3rdparty/python:twitter.common.log',
+        # Source dependencies are always referenced without a ':'.
+        'src/main/python/apache/thermos/common',
+        'src/main/python/apache/thermos/config',
+        'src/main/python/apache/thermos/core',
+      ],
+    )
+
+    # Binary target for thermos_runner.pex. Nothing should depend on this - it's only used as an
+    # argument to ./pants binary.
+    python_binary(
+      name = 'thermos_runner',
+      # Use entry_point, not source so the files used here are the same ones tests see.
+      entry_point = 'apache.thermos.bin.thermos_runner',
+      dependencies = [
+        # Notice that we depend only on the single private target from this BUILD file here.
+        ':_runner',
+      ],
+    )
+
+    # The public library that everyone importing the runner symbols uses.
+    # The test targets and any other dependent source code should depend on this.
+    python_library(
+      name = 'runner',
+      dependencies = [
+        # Again, notice that we depend only on the single private target from this BUILD file here.
+        ':_runner',
+      ],
+      # We always provide a setup_py. This will cause any dependee libraries to automatically
+      # reference this library in their requirements.txt rather than copy the source files into their
+      # sdist.
+      provides = setup_py(
+        # Conventionally named and versioned.
+        name = 'apache.thermos.runner',
+        version = open(os.path.join(get_buildroot(), '.auroraversion')).read().strip().upper(),
+      ).with_binaries({
+        # Every binary in this file should also be repeated here.
+        # Always use the dict-form of .with_binaries so that commands with dashes in their names are
+        # supported.
+        # The console script name is always the same as the PEX with .pex stripped.
+        'thermos_runner': ':thermos_runner',
+      }),
+    )
+
+
+
+Thermos Test resources
+======================
+
+The Aurora source repository and distributions contain several
+[binary files](../../src/test/resources/org/apache/thermos/root/checkpoints) to
+qualify the backwards-compatibility of thermos with checkpoint data. Since
+thermos persists state to disk, to be read by the thermos observer), it is important that we have
+tests that prevent regressions affecting the ability to parse previously-written data.
+
+The files included represent persisted checkpoints that exercise different
+features of thermos. The existing files should not be modified unless
+we are accepting backwards incompatibility, such as with a major release.
+
+It is not practical to write source code to generate these files on the fly,
+as source would be vulnerable to drift (e.g. due to refactoring) in ways
+that would undermine the goal of ensuring backwards compatibility.
+
+The most common reason to add a new checkpoint file would be to provide
+coverage for new thermos features that alter the data format. This is
+accomplished by writing and running a
+[job configuration](../reference/configuration.md) that exercises the feature, and
+copying the checkpoint file from the sandbox directory, which by default is
+`/var/run/thermos/checkpoints/<aurora task id>`.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/development/thrift.md
----------------------------------------------------------------------
diff --git a/docs/development/thrift.md b/docs/development/thrift.md
new file mode 100644
index 0000000..7f098c2
--- /dev/null
+++ b/docs/development/thrift.md
@@ -0,0 +1,57 @@
+Thrift
+======
+
+Aurora uses [Apache Thrift](https://thrift.apache.org/) for representing structured data in
+client/server RPC protocol as well as for internal data storage. While Thrift is capable of
+correctly handling additions and renames of the existing members, field removals must be done
+carefully to ensure backwards compatibility and provide a predictable deprecation cycle. This
+document describes general guidelines for making Thrift schema changes to the existing fields in
+[api.thrift](../../api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+
+It is highly recommended to go through the
+[Thrift: The Missing Guide](http://diwakergupta.github.io/thrift-missing-guide/) first to refresh on
+basic Thrift schema concepts.
+
+Checklist
+---------
+Every existing Thrift schema modification is unique in its requirements and must be analyzed
+carefully to identify its scope and expected consequences. The following checklist may help in that
+analysis:
+* Is this a new field/struct? If yes, go ahead
+* Is this a pure field/struct rename without any type/structure change? If yes, go ahead and rename
+* Anything else, read further to make sure your change is properly planned
+
+Deprecation cycle
+-----------------
+Any time a breaking change (e.g.: field replacement or removal) is required, the following cycle
+must be followed:
+
+### vCurrent
+The change is applied in a way that does not prevent a scheduler/client at this version from
+communicating with a scheduler/client at vCurrent-1.
+* Do not remove or rename the old field
+* Add a new field as an eventual replacement of the old one and implement a dual read/write
+anywhere the old field is used. If a thrift struct is mapped in the DB store, make sure both
+columns are marked as `NOT NULL`.
+* Check [storage.thrift](../../api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if the
+affected struct is stored in Aurora scheduler storage. If so, you most likely need to backfill
+existing data to ensure both fields are populated eagerly on startup. See
+[this patch](https://reviews.apache.org/r/43172) as a real-life example of thrift-struct
+backfilling. IMPORTANT: backfilling implementation needs to ensure both fields are populated. This
+is critical to enable graceful scheduler upgrade as well as rollback to the old version if needed.
+* Add a deprecation jira ticket into the vCurrent+1 release candidate
+* Add a TODO for the deprecated field mentioning the jira ticket
+
+### vCurrent+1
+Finalize the change by removing the deprecated fields from the Thrift schema.
+* Drop any dual read/write routines added in the previous version
+* Remove thrift backfilling in scheduler
+* Remove the deprecated Thrift field
+
+Testing
+-------
+It's always advisable to test your changes in the local vagrant environment to build more
+confidence that your change is backwards compatible. It's easy to simulate different
+client/scheduler versions by playing with `aurorabuild` command. See [this document](../getting-started/vagrant.md)
+for more.
+

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/development/ui.md
----------------------------------------------------------------------
diff --git a/docs/development/ui.md b/docs/development/ui.md
new file mode 100644
index 0000000..f41614e
--- /dev/null
+++ b/docs/development/ui.md
@@ -0,0 +1,46 @@
+Developing the Aurora Scheduler UI
+==================================
+
+Installing bower (optional)
+----------------------------
+Third party JS libraries used in Aurora (located at 3rdparty/javascript/bower_components) are
+managed by bower, a JS dependency manager. Bower is only required if you plan to add, remove or
+update JS libraries. Bower can be installed using the following command:
+
+    npm install -g bower
+
+Bower depends on node.js and npm. The easiest way to install node on a mac is via brew:
+
+    brew install node
+
+For more node.js installation options refer to https://github.com/joyent/node/wiki/Installation.
+
+More info on installing and using bower can be found at: http://bower.io/. Once installed, you can
+use the following commands to view and modify the bower repo at
+3rdparty/javascript/bower_components
+
+    bower list
+    bower install <library name>
+    bower remove <library name>
+    bower update <library name>
+    bower help
+
+
+Faster Iteration in Vagrant
+---------------------------
+The scheduler serves UI assets from the classpath. For production deployments this means the assets
+are served from within a jar. However, for faster development iteration, the vagrant image is
+configured to add the `scheduler` subtree of `/vagrant/dist/resources/main` to the head of
+`CLASSPATH`. This path is configured as a shared filesystem to the path on the host system where
+your Aurora repository lives. This means that any updates under `dist/resources/main/scheduler` in
+your checkout will be reflected immediately in the UI served from within the vagrant image.
+
+The one caveat to this is that this path is under `dist` not `src`. This is because the assets must
+be processed by gradle before they can be served. So, unfortunately, you cannot just save your local
+changes and see them reflected in the UI; you must first run `./gradlew processResources`. This is
+less than ideal, but better than having to restart the scheduler after every change. Additionally,
+gradle makes this process somewhat easier with the use of the `--continuous` flag. If you run:
+`./gradlew processResources --continuous` gradle will monitor the filesystem for changes and run the
+task automatically as necessary. This doesn't quite provide hot-reload capabilities, but it does
+allow for <5s from save to changes being visible in the UI with no further action required on the
+part of the developer.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/features/constraints.md
----------------------------------------------------------------------
diff --git a/docs/features/constraints.md b/docs/features/constraints.md
new file mode 100644
index 0000000..ef2f664
--- /dev/null
+++ b/docs/features/constraints.md
@@ -0,0 +1,126 @@
+Scheduling Constraints
+======================
+
+By default, Aurora will pick any random slave with sufficient resources
+in order to schedule a task. This scheduling choice can be further
+restricted with the help of constraints.
+
+
+Mesos Attributes
+----------------
+
+Data centers are often organized with hierarchical failure domains.  Common failure domains
+include hosts, racks, rows, and PDUs.  If you have this information available, it is wise to tag
+the Mesos slave with them as
+[attributes](https://mesos.apache.org/documentation/attributes-resources/).
+
+The Mesos slave `--attributes` command line argument can be used to mark slaves with
+static key/value pairs, so called attributes (not to be confused with `--resources`, which are
+dynamic and accounted).
+
+For example, consider the host `cluster1-aaa-03-sr2` and its following attributes (given in
+key:value format): `host:cluster1-aaa-03-sr2` and `rack:aaa`.
+
+Aurora makes these attributes available for matching with scheduling constraints.
+
+
+Limit Constraints
+-----------------
+
+Limit constraints allow you to control machine diversity. The below
+constraint ensures that no more than two instances of your job may run on a single host.
+Think of this as a "group by" limit.
+
+    Service(
+      name = 'webservice',
+      role = 'www-data',
+      constraints = {
+        'host': 'limit:2',
+      }
+      ...
+    )
+
+
+Likewise, you can use constraints to control rack diversity, e.g. at
+most one task per rack:
+
+    constraints = {
+      'rack': 'limit:1',
+    }
+
+Use these constraints sparingly as they can dramatically reduce Tasks' schedulability.
+Further details are available in the reference documentation on
+[Scheduling Constraints](../reference/configuration.md#specifying-scheduling-constraints).
+
+
+
+Value Constraints
+-----------------
+
+Value constraints can be used to express that a certain attribute with a certain value
+should be present on a Mesos slave. For example, the following job would only be
+scheduled on nodes that claim to have an `SSD` as their disk.
+
+    Service(
+      name = 'webservice',
+      role = 'www-data',
+      constraints = {
+        'disk': 'SSD',
+      }
+      ...
+    )
+
+
+Further details are available in the reference documentation on
+[Scheduling Constraints](../reference/configuration.md#specifying-scheduling-constraints).
+
+
+Running stateful services
+-------------------------
+
+Aurora is best suited to run stateless applications, but it also accommodates for stateful services
+like databases, or services that otherwise need to always run on the same machines.
+
+### Dedicated attribute
+
+Most Mesos attributes are arbitrary and available for custom use.  There is one exception,
+though: the `dedicated` attribute.  Aurora treats this attribute specially: only jobs with a
+matching constraint may run on these machines, and such jobs are scheduled exclusively on them.
+
+
+#### Syntax
+The dedicated attribute has semantic meaning. The format is `$role(/.*)?`. When a job is created,
+the scheduler requires that the `$role` component matches the `role` field in the job
+configuration, and will reject the job creation otherwise.  The remainder of the attribute is
+free-form. We've developed the idiom of formatting this attribute as `$role/$job`, but do not
+enforce this. For example: a job `devcluster/www-data/prod/hello` with a dedicated constraint set as
+`www-data/web.multi` will have its tasks scheduled only on Mesos slaves configured with:
+`--attributes=dedicated:www-data/web.multi`.
+
+A wildcard (`*`) may be used for the role portion of the dedicated attribute, which will allow any
+owner to elect for a job to run on the host(s). For example: tasks from both
+`devcluster/www-data/prod/hello` and `devcluster/vagrant/test/hello` with a dedicated constraint
+formatted as `*/web.multi` will be scheduled only on Mesos slaves configured with
+`--attributes=dedicated:*/web.multi`. This may be useful when assembling a virtual cluster of
+machines sharing the same set of traits or requirements.
+
+##### Example
+Consider the following slave command line:
+
+    mesos-slave --attributes="dedicated:db_team/redis" ...
+
+And this job configuration:
+
+    Service(
+      name = 'redis',
+      role = 'db_team',
+      constraints = {
+        'dedicated': 'db_team/redis'
+      }
+      ...
+    )
+
+The job configuration is indicating that it should only be scheduled on slaves with the attribute
+`dedicated:db_team/redis`.  Additionally, Aurora will prevent any tasks that do _not_ have that
+constraint from running on those slaves.
+

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/features/containers.md
----------------------------------------------------------------------
diff --git a/docs/features/containers.md b/docs/features/containers.md
new file mode 100644
index 0000000..a0fd4ac
--- /dev/null
+++ b/docs/features/containers.md
@@ -0,0 +1,43 @@
+Containers
+==========
+
+
+Docker
+------
+
+Aurora has optional support for launching Docker containers, if correctly [configured by an Operator](../operations/configuration.md#docker-containers).
+
+Example (available in the [Vagrant environment](../getting-started/vagrant.md)):
+
+
+    $ cat /vagrant/examples/jobs/docker/hello_docker.aurora
+    hello_docker = Process(
+      name = 'hello',
+      cmdline = """
+        while true; do
+          echo hello world
+          sleep 10
+        done
+      """)
+
+    hello_world_docker = Task(
+      name = 'hello docker',
+      processes = [hello_docker],
+      resources = Resources(cpu = 1, ram = 1*MB, disk=8*MB)
+    )
+
+    jobs = [
+      Service(
+        cluster = 'devcluster',
+        environment = 'devel',
+        role = 'docker-test',
+        name = 'hello_docker',
+        task = hello_world_docker,
+        container = Container(docker = Docker(image = 'python:2.7'))
+      )
+    ]
+
+
+In order to correctly execute processes inside a job, the docker container must have Python 2.7
+installed. Further details of how to use Docker can be found in the
+[Reference Documentation](../reference/configuration.md#docker-object).

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/features/cron-jobs.md
----------------------------------------------------------------------
diff --git a/docs/features/cron-jobs.md b/docs/features/cron-jobs.md
new file mode 100644
index 0000000..8a864e9
--- /dev/null
+++ b/docs/features/cron-jobs.md
@@ -0,0 +1,124 @@
+# Cron Jobs
+
+Aurora supports execution of scheduled jobs on a Mesos cluster using cron-style syntax.
+
+- [Overview](#overview)
+- [Collision Policies](#collision-policies)
+- [Failure recovery](#failure-recovery)
+- [Interacting with cron jobs via the Aurora CLI](#interacting-with-cron-jobs-via-the-aurora-cli)
+	- [cron schedule](#cron-schedule)
+	- [cron deschedule](#cron-deschedule)
+	- [cron start](#cron-start)
+	- [job killall, job restart, job kill](#job-killall-job-restart-job-kill)
+- [Technical Note About Syntax](#technical-note-about-syntax)
+- [Caveats](#caveats)
+	- [Failovers](#failovers)
+	- [Collision policy is best-effort](#collision-policy-is-best-effort)
+	- [Timezone Configuration](#timezone-configuration)
+
+## Overview
+
+A job is identified as a cron job by the presence of a
+`cron_schedule` attribute containing a cron-style schedule in the
+[`Job`](../reference/configuration.md#job-objects) object. Examples of cron schedules
+include "every 5 minutes" (`*/5 * * * *`), "Fridays at 17:00" (`* 17 * * FRI`), and
+"the 1st and 15th day of the month at 03:00" (`0 3 1,15 *`).
+
+Example (available in the [Vagrant environment](../getting-started/vagrant.md)):
+
+    $ cat /vagrant/examples/jobs/cron_hello_world.aurora
+    # A cron job that runs every 5 minutes.
+    jobs = [
+      Job(
+        cluster = 'devcluster',
+        role = 'www-data',
+        environment = 'test',
+        name = 'cron_hello_world',
+        cron_schedule = '*/5 * * * *',
+        task = SimpleTask(
+          'cron_hello_world',
+          'echo "Hello world from cron, the time is now $(date --rfc-822)"'),
+      ),
+    ]
+
+## Collision Policies
+
+The `cron_collision_policy` field specifies the scheduler's behavior when a new cron job is
+triggered while an older run hasn't finished. The scheduler has two policies available:
+
+* `KILL_EXISTING`: The default policy - on a collision the old instances are killed and instances
+with the current configuration are started.
+* `CANCEL_NEW`: On a collision the new run is cancelled.
+
+Note that the use of `CANCEL_NEW` is likely a code smell - interrupted cron jobs should be able
+to recover their progress on a subsequent invocation, otherwise they risk having their work queue
+grow faster than they can process it.
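+
+For example, a minimal sketch of the job above with the policy set explicitly (field names as
+described in the [Job](../reference/configuration.md#job-objects) object):
+
+    jobs = [
+      Job(
+        cluster = 'devcluster',
+        role = 'www-data',
+        environment = 'test',
+        name = 'cron_hello_world',
+        cron_schedule = '*/5 * * * *',
+        cron_collision_policy = 'CANCEL_NEW',  # skip a new run rather than kill an in-flight one
+        task = SimpleTask(
+          'cron_hello_world',
+          'echo "Hello world from cron, the time is now $(date --rfc-822)"'),
+      ),
+    ]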
+
+## Failure recovery
+
+Unlike services, which Aurora will always re-execute regardless of exit status, instances of
+cron jobs retry according to the `max_task_failures` attribute of the
+[Task](../reference/configuration.md#task-objects) object. To get "run-until-success" semantics,
+set `max_task_failures` to `-1`.
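+
+As a sketch, a "run-until-success" cron task might look like the following (the process name and
+command line are illustrative):
+
+    run_batch = Process(name = 'run_batch', cmdline = './run_batch.sh')
+
+    batch_task = Task(
+      name = 'nightly_batch',
+      processes = [run_batch],
+      resources = Resources(cpu = 1.0, ram = 256*MB, disk = 512*MB),
+      max_task_failures = -1,  # retry failed runs until one succeeds
+    )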
+
+## Interacting with cron jobs via the Aurora CLI
+
+Most interaction with cron jobs takes place using the `cron` subcommand. See `aurora cron -h`
+for up-to-date usage instructions.
+
+### cron schedule
+Schedules a new cron job on the Aurora cluster for later runs or replaces the existing cron template
+with a new one. Only future runs will be affected, any existing active tasks are left intact.
+
+    $ aurora cron schedule devcluster/www-data/test/cron_hello_world /vagrant/examples/jobs/cron_hello_world.aurora
+
+### cron deschedule
+Deschedules a cron job, preventing future runs but allowing current runs to complete.
+
+    $ aurora cron deschedule devcluster/www-data/test/cron_hello_world
+
+### cron start
+Start a cron job immediately, outside of its normal cron schedule.
+
+    $ aurora cron start devcluster/www-data/test/cron_hello_world
+
+### job killall, job restart, job kill
+Cron jobs create instances running on the cluster that you can interact with like normal Aurora
+tasks with `job kill` and `job restart`.
+
+
+## Technical Note About Syntax
+
+`cron_schedule` uses a restricted subset of BSD crontab syntax. While the
+execution engine currently uses Quartz, the schedule parsing is custom, a subset of FreeBSD
+[crontab(5)](http://www.freebsd.org/cgi/man.cgi?crontab(5)) syntax. See
+[the source](https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/cron/CrontabEntry.java#L106-L124)
+for details.
+
+
+## Caveats
+
+### Failovers
+No failover recovery. Aurora does not record the latest minute it fired
+triggers for across failovers. Therefore it's possible to miss triggers
+on failover. Note that this behavior may change in the future.
+
+It's necessary to sync time between schedulers with something like `ntpd`.
+Clock skew could cause double or missed triggers in the case of a failover.
+
+### Collision policy is best-effort
+Aurora aims to always have *at least one copy* of a given instance running at a time - it's
+an AP system, meaning it chooses Availability and Partition Tolerance at the expense of
+Consistency.
+
+If your collision policy is `CANCEL_NEW` and a task has terminated but
+Aurora has not yet noticed this, Aurora will go ahead and create your new
+task.
+
+If your collision policy is `KILL_EXISTING` and a task was marked `LOST`
+but not yet garbage collected, Aurora will go ahead and create your new task without
+attempting to kill the old one (outside the GC interval).
+
+### Timezone Configuration
+Cron timezone is configured independently of JVM timezone with the `-cron_timezone` flag and
+defaults to UTC.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/features/job-updates.md
----------------------------------------------------------------------
diff --git a/docs/features/job-updates.md b/docs/features/job-updates.md
new file mode 100644
index 0000000..792f2ae
--- /dev/null
+++ b/docs/features/job-updates.md
@@ -0,0 +1,111 @@
+Aurora Job Updates
+==================
+
+`Job` configurations can be updated at any point in their lifecycle.
+Usually updates are done incrementally using a process called a *rolling
+upgrade*, in which Tasks are upgraded in small groups, one group at a
+time.  Updates are done using various Aurora Client commands.
+
+
+Rolling Job Updates
+-------------------
+
+There are several sub-commands to manage job updates:
+
+    aurora update start <job key> <configuration file>
+    aurora update info <job key>
+    aurora update pause <job key>
+    aurora update resume <job key>
+    aurora update abort <job key>
+    aurora update list <cluster>
+
+When you `start` a job update, the command will return once it has sent the
+instructions to the scheduler.  At that point, you may view detailed
+progress for the update with the `info` subcommand, in addition to viewing
+graphical progress in the web browser.  You may also get a full listing of
+in-progress updates in a cluster with `list`.
+
+Once an update has been started, you can `pause` to keep the update but halt
+progress.  This can be useful for things like debugging a partially-updated
+job to determine whether you would like to proceed.  You can `resume` to
+proceed.
+
+You may `abort` a job update regardless of the state it is in. This will
+instruct the scheduler to completely abandon the job update and leave the job
+in the current (possibly partially-updated) state.
+
+For a configuration update, the Aurora Client calculates required changes
+by examining the current job config state and the new desired job config.
+It then starts a *rolling batched update process* by going through every batch
+and performing these operations:
+
+- If an instance is present in the scheduler but isn't in the new config,
+  then that instance is killed.
+- If an instance is not present in the scheduler but is present in
+  the new config, then the instance is created.
+- If an instance is present in both the scheduler and the new config, then
+  the client diffs both task configs. If it detects any changes, it
+  performs an instance update by killing the old config instance and adds
+  the new config instance.
+
+The Aurora client continues through the instance list until all tasks are
+updated, in `RUNNING`, and healthy for a configurable amount of time.
+If the client determines the update is not going well (a percentage of health
+checks have failed), it cancels the update.
+
+Update cancellation runs a procedure similar to the update sequence described
+above, but in reverse order. New instance configs are swapped
+with old instance configs and batch updates proceed backwards
+from the point where the update failed. E.g., (0,1,2) (3,4,5) (6,7,
+8-FAIL) results in a rollback in order (8,7,6) (5,4,3) (2,1,0).
+
+For details on how to control a job update, please see the
+[UpdateConfig](../reference/configuration.md#updateconfig-objects) configuration object.
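+
+For illustration, a sketch of an `update_config` tuning batch size and failure tolerance (see the
+[UpdateConfig](../reference/configuration.md#updateconfig-objects) object for the authoritative
+field list and defaults):
+
+    update_config = UpdateConfig(
+      batch_size = 3,               # update three instances at a time
+      watch_secs = 45,              # an instance must stay RUNNING and healthy this long to pass
+      max_per_shard_failures = 1,   # allow one restart per instance before counting it as failed
+      max_total_failures = 0,       # abort the update if any instance permanently fails
+    )
+
+The resulting object is attached to the job via its `update_config` field.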
+
+
+Coordinated Job Updates
+------------------------
+
+Some Aurora services may benefit from having more control over updates by explicitly
+acknowledging ("heartbeating") job update progress. This may be helpful for mission-critical
+service updates where explicit job health monitoring is vital during the entire job update
+lifecycle. Such job updates would rely on an external service (or a custom client) periodically
+pulsing an active coordinated job update via a
+[pulseJobUpdate RPC](../../api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+
+A coordinated update is defined by setting a positive
+[pulse_interval_secs](../reference/configuration.md#updateconfig-objects) value in the job configuration
+file. If no pulses are received within the specified interval, the update will be blocked. A blocked
+update is unable to continue rolling forward (or rolling back) but retains its active status.
+It may only be unblocked by a fresh `pulseJobUpdate` call.
+
+NOTE: A coordinated update starts in `ROLL_FORWARD_AWAITING_PULSE` state and will not make any
+progress until the first pulse arrives. However, a paused update (`ROLL_FORWARD_PAUSED` or
+`ROLL_BACK_PAUSED`) is still considered active and upon resuming will immediately make progress
+provided the pulse interval has not expired.
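+
+As a sketch, a coordinated update differs from a regular one only by the pulse setting in its
+`update_config`:
+
+    update_config = UpdateConfig(
+      batch_size = 1,
+      pulse_interval_secs = 60,  # block the update if no pulseJobUpdate arrives within 60 seconds
+    )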
+
+
+Canary Deployments
+------------------
+
+Canary deployments are a pattern for rolling out updates to a subset of job instances,
+in order to test different code versions alongside the actual production job.
+It is a risk-mitigation strategy for job owners and commonly used in a form where
+job instance 0 runs with a different configuration than the instances 1-N.
+
+For example, consider a job with 4 instances that each
+request 1 core of cpu, 1 GB of RAM, and 1 GB of disk space as specified
+in the configuration file `hello_world.aurora`. If you want to
+update it so it requests 2 GB of RAM instead of 1, you can create a new
+configuration file called `new_hello_world.aurora` and
+issue
+
+    aurora update start <job_key_value>/0-1 new_hello_world.aurora
+
+This results in instances 0 and 1 having 1 cpu, 2 GB of RAM, and 1 GB of disk space,
+while instances 2 and 3 have 1 cpu, 1 GB of RAM, and 1 GB of disk space. If instance 3
+dies and restarts, it restarts with 1 cpu, 1 GB RAM, and 1 GB disk space.
+
+So that means there are two simultaneous task configurations for the same job
+at the same time, just valid for different ranges of instances. While this isn't a recommended
+pattern, it is valid and supported by the Aurora scheduler.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/features/multitenancy.md
----------------------------------------------------------------------
diff --git a/docs/features/multitenancy.md b/docs/features/multitenancy.md
new file mode 100644
index 0000000..62bcd53
--- /dev/null
+++ b/docs/features/multitenancy.md
@@ -0,0 +1,62 @@
+Multitenancy
+============
+
+Aurora is a multi-tenant system that can run jobs of multiple clients/tenants.
+Going beyond the [resource isolation on an individual host](resource-isolation.md), it is
+crucial to prevent those jobs from stepping on each other's toes.
+
+
+Job Namespaces
+--------------
+
+The namespace for jobs in Aurora follows a hierarchical structure. This is meant to make it easier
+to differentiate between different jobs. A job key consists of four parts. The four parts are
+`<cluster>/<role>/<environment>/<jobname>` in that order:
+
+* Cluster refers to the name of a particular Aurora installation.
+* Role names are user accounts.
+* Environment names are namespaces.
+* Jobname is the custom name of your job.
+
+Role names correspond to user accounts. They are used for
+[authentication](../operations/security.md), as the linux user used to run jobs, and for the
+assignment of [quota](#preemption). If you don't know what accounts are available, contact your
+sysadmin.
+
+The environment component in the job key serves as a namespace. The values for
+environment are validated in the client and the scheduler and restricted to `devel`, `test`,
+`production`, and any value matching the regular expression `staging[0-9]*`.
+
+None of the values imply any difference in the scheduling behavior. Conventionally, the
+"environment" is set so as to indicate a certain level of stability in the behavior of the job
+by ensuring that an appropriate level of testing has been performed on the application code. e.g.
+in the case of a typical Job, releases may progress through the following phases in order of
+increasing level of stability: `devel`, `test`, `staging`, `production`.
+
+
+Preemption
+----------
+
+In order to guarantee that important production jobs are always running, Aurora supports
+preemption.
+
+Consider a pending job that is a candidate for scheduling but cannot be scheduled due to resource
+shortage. Active tasks can become victims of preemption if:
+
+ - both candidate and victim are owned by the same role and the
+   [priority](../reference/configuration.md#job-objects) of a victim is lower than the
+   [priority](../reference/configuration.md#job-objects) of the candidate.
+ - OR a victim is non-[production](../reference/configuration.md#job-objects) and the candidate is
+   [production](../reference/configuration.md#job-objects).
+
+In other words, tasks from [production](../reference/configuration.md#job-objects) jobs may preempt
+tasks from any non-production job. However, a production task may only be preempted by tasks from
+production jobs in the same role with higher [priority](../reference/configuration.md#job-objects).
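+
+As a sketch, the fields that drive preemption decisions are set directly on the job (the job name
+shown is illustrative):
+
+    jobs = [
+      Service(
+        cluster = 'devcluster',
+        role = 'www-data',
+        environment = 'production',
+        name = 'important_frontend',  # illustrative
+        production = True,            # may preempt non-production tasks; requires role quota
+        priority = 10,                # may preempt same-role tasks with lower priority
+        task = task,
+      )
+    ]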
+
+Aurora requires resource quotas for [production non-dedicated jobs](../reference/configuration.md#job-objects).
+Quota is enforced at the job role level and, when set, defines a non-preemptible pool of compute resources within
+that role. All job types (service, adhoc or cron) require role resource quota unless a job has
+[dedicated constraint set](constraints.md#dedicated-attribute).
+
+To grant quota to a particular role in production, an operator can use the command
+`aurora_admin set_quota`.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/features/resource-isolation.md
----------------------------------------------------------------------
diff --git a/docs/features/resource-isolation.md b/docs/features/resource-isolation.md
new file mode 100644
index 0000000..b9cdfeb
--- /dev/null
+++ b/docs/features/resource-isolation.md
@@ -0,0 +1,167 @@
+Resources Isolation and Sizing
+==============================
+
+- [Isolation](#isolation)
+- [Sizing](#sizing)
+- [Oversubscription](#oversubscription)
+
+
+Isolation
+---------
+
+Aurora is a multi-tenant system; a single software instance runs on a
+server, serving multiple clients/tenants. To share resources among
+tenants, it implements isolation of:
+
+* CPU
+* memory
+* disk space
+
+CPU is a soft limit, and handled differently from memory and disk space.
+Too low a CPU value results in throttling your application and
+slowing it down. Memory and disk space are both hard limits; when your
+application goes over these values, it's killed.
+
+### CPU Isolation
+
+Mesos uses a quota based CPU scheduler (the *Completely Fair Scheduler*)
+to provide consistent and predictable performance.  This is effectively
+a guarantee of resources -- you receive at least what you requested, but
+also no more than you've requested.
+
+The scheduler gives applications a CPU quota for every 100 ms interval.
+When an application uses its quota for an interval, it is throttled for
+the rest of the 100 ms. Usage resets for each interval and unused
+quota does not carry over.
+
+For example, an application specifying 4.0 CPU has access to 400 ms of
+CPU time every 100 ms. This CPU quota can be used in different ways,
+depending on the application and available resources. Consider the
+scenarios shown in this diagram.
+
+![CPU Availability](../images/CPUavailability.png)
+
+* *Scenario A*: the application can use up to 4 cores continuously for
+every 100 ms interval. It is never throttled and starts processing
+new requests immediately.
+
+* *Scenario B* : the application uses up to 8 cores (depending on
+availability) but is throttled after 50 ms. The CPU quota resets at the
+start of each new 100 ms interval.
+
+* *Scenario C* : is like Scenario A, but there is a garbage collection
+event in the second interval that consumes all CPU quota. The
+application throttles for the remaining 75 ms of that interval and
+cannot service requests until the next interval. In this example, the
+garbage collection finished in one interval but, depending on how much
+garbage needs collecting, it may take more than one interval and further
+delay service of requests.
+
+*Technical Note*: Mesos considers logical cores, also known as
+hyperthreading or SMT cores, as the unit of CPU.
+
+### Memory Isolation
+
+Mesos uses dedicated memory allocation. Your application always has
+access to the amount of memory specified in your configuration. The
+application's memory use is defined as the sum of the resident set size
+(RSS) of all processes in a shard. Each shard is considered
+independently.
+
+In other words, say you specified a memory size of 10GB. Each shard
+would receive 10GB of memory. If an individual shard's memory demands
+exceed 10GB, that shard is killed, but the other shards continue
+working.
+
+*Technical note*: Total memory size is not enforced at allocation time,
+so your application can request more than its allocation without getting
+an ENOMEM. However, it will be killed shortly after.
+
+### Disk Space
+
+Disk space used by your application is defined as the sum of the files'
+disk space in your application's directory, including the `stdout` and
+`stderr` logged from your application. Each shard is considered
+independently. You should use off-node storage for your application's
+data whenever possible.
+
+In other words, say you specified disk space size of 100MB. Each shard
+would receive 100MB of disk space. If an individual shard's disk space
+demands exceed 100MB, that shard is killed, but the other shards
+continue working.
+
+After your application finishes running, its allocated disk space is
+reclaimed. Thus, your job's final action should move any disk content
+that you want to keep, such as logs, to your home file system or other
+less transitory storage. Disk reclamation takes place an undefined
+period after the application finish time; until then, the disk contents
+are still available but you shouldn't count on them being so.
+
+*Technical note* : Disk space is not enforced at write so your
+application can write above its quota without getting an ENOSPC, but it
+will be killed shortly after. This is subject to change.
+
+### Other Resources
+
+Other resources, such as network bandwidth, do not have any performance
+guarantees. For some resources, such as memory bandwidth, there are no
+practical sharing methods so some application combinations collocated on
+the same host may cause contention.
+
+
+Sizing
+-------
+
+### CPU Sizing
+
+To correctly size Aurora-run Mesos tasks, specify a per-shard CPU value
+that lets the task run at its desired performance when at peak load
+distributed across all shards. Include reserve capacity of at least 50%,
+possibly more, depending on how critical your service is (or how
+confident you are about your original estimate :-)), ideally by
+increasing the number of shards to also improve resiliency. When running
+your application, observe its CPU stats over time. If consistently at or
+near your quota during peak load, you should consider increasing either
+per-shard CPU or the number of shards.
+
+### Memory Sizing
+
+Size for your application's peak requirement. Observe the per-instance
+memory statistics over time, as memory requirements can vary over
+different periods. Remember that if your application exceeds its memory
+value, it will be killed, so you should also add a safety margin of
+around 10-20%. If you have the ability to do so, you may also want to
+put alerts on the per-instance memory.
+
+### Disk Space Sizing
+
+Size for your application's peak requirement. Rotate and discard log
+files as needed to stay within your quota. When running a Java process,
+add the maximum size of the Java heap to your disk space requirement, in
+order to account for an out of memory error dumping the heap
+into the application's sandbox space.
+
+
+Oversubscription
+----------------
+
+**WARNING**: This feature is currently in alpha status. Do not use it in production clusters!
+
+Mesos [supports a concept of revocable tasks](http://mesos.apache.org/documentation/latest/oversubscription/)
+by oversubscribing machine resources by the amount deemed safe to not affect the existing
+non-revocable tasks. Aurora supports revocable jobs via the `tier` setting, which must be set to
+the value `revocable`.
+
+The Aurora scheduler must be configured to receive revocable offers from Mesos and accept revocable
+jobs. If not configured properly, revocable tasks will never get assigned to hosts and will stay in
+`PENDING`. Set this scheduler flag to allow receiving revocable Mesos offers:
+
+    -receive_revocable_resources=true
+
+Specify a tier configuration file path (unless you want to use the [default](../../src/main/resources/org/apache/aurora/scheduler/tiers.json)):
+
+    -tier_config=path/to/tiers/config.json
+
+
+See the [Configuration Reference](../reference/configuration.md) for details on how to mark a job
+as being revocable.
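+
+As a sketch, a job is marked revocable by setting its `tier` (see the Configuration Reference
+above for the authoritative syntax; the job shown is illustrative):
+
+    jobs = [
+      Service(
+        cluster = 'devcluster',
+        role = 'www-data',
+        environment = 'devel',
+        name = 'background_indexer',  # illustrative
+        tier = 'revocable',           # run on revocable (oversubscribed) resources
+        task = task,
+      )
+    ]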

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/features/service-discovery.md
----------------------------------------------------------------------
diff --git a/docs/features/service-discovery.md b/docs/features/service-discovery.md
new file mode 100644
index 0000000..858ca2a
--- /dev/null
+++ b/docs/features/service-discovery.md
@@ -0,0 +1,14 @@
+Service Discovery
+=================
+
+It is possible for the Aurora executor to announce tasks into ServerSets for
+the purpose of service discovery.  ServerSets use the Zookeeper [group membership pattern](http://zookeeper.apache.org/doc/trunk/recipes.html#sc_outOfTheBox)
+of which there are several reference implementations:
+
+  - [C++](https://github.com/apache/mesos/blob/master/src/zookeeper/group.cpp)
+  - [Java](https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/zookeeper/ServerSetImpl.java#L221)
+  - [Python](https://github.com/twitter/commons/blob/master/src/python/twitter/common/zookeeper/serverset/serverset.py#L51)
+
+These can also be used natively in Finagle using the [ZookeeperServerSetCluster](https://github.com/twitter/finagle/blob/master/finagle-serversets/src/main/scala/com/twitter/finagle/zookeeper/ZookeeperServerSetCluster.scala).
+
+For more information about how to configure announcing, see the [Configuration Reference](../reference/configuration.md).
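+
+As a sketch, announcing is enabled per job via the `announce` field (see the Configuration
+Reference for the full set of `Announcer` options):
+
+    jobs = [
+      Service(
+        cluster = 'devcluster',
+        role = 'www-data',
+        environment = 'prod',
+        name = 'hello',
+        task = task,
+        announce = Announcer(
+          primary_port = 'http',  # the named port to register in the ServerSet
+        ),
+      )
+    ]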

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/features/services.md
----------------------------------------------------------------------
diff --git a/docs/features/services.md b/docs/features/services.md
new file mode 100644
index 0000000..691fa28
--- /dev/null
+++ b/docs/features/services.md
@@ -0,0 +1,99 @@
+Long-running Services
+=====================
+
+Jobs that are always restarted on completion, whether successful or unsuccessful,
+are called services. This is useful for long-running processes
+such as webservices that should always be running, unless stopped explicitly.
+
+
+Service Specification
+---------------------
+
+A job is identified as a service by the presence of the flag
+`service=True` in the [`Job`](../reference/configuration.md#job-objects) object.
+The `Service` alias can be used as shorthand for `Job` with `service=True`.
+
+Example (available in the [Vagrant environment](../getting-started/vagrant.md)):
+
+    $ cat /vagrant/examples/jobs/hello_world.aurora
+    hello = Process(
+      name = 'hello',
+      cmdline = """
+        while true; do
+          echo hello world
+          sleep 10
+        done
+      """)
+
+    task = SequentialTask(
+      processes = [hello],
+      resources = Resources(cpu = 1.0, ram = 128*MB, disk = 128*MB)
+    )
+
+    jobs = [
+      Service(
+        task = task,
+        cluster = 'devcluster',
+        role = 'www-data',
+        environment = 'prod',
+        name = 'hello'
+      )
+    ]
+
+
+Jobs without the service bit set only restart up to `max_task_failures` times and only if they
+terminated unsuccessfully either due to human error or machine failure (see the
+[`Job`](../reference/configuration.md#job-objects) object for details).
+
+
+Ports
+-----
+
+In order to be useful, most services have to bind to one or more ports. Aurora enables this
+use case via the [`thermos.ports` namespace](../reference/configuration.md#thermos-namespace), which
+allows a process to request arbitrarily named ports:
+
+
+    nginx = Process(
+      name = 'nginx',
+      cmdline = './run_nginx.sh -port {{thermos.ports[http]}}'
+    )
+
+
+When this process is included in a job, the job will be allocated a port, and the command line
+will be replaced with something like:
+
+    ./run_nginx.sh -port 42816
+
+Where 42816 happens to be the allocated port.
+
+For details on how to enable clients to discover this dynamically assigned port, see our
+[Service Discovery](service-discovery.md) documentation.
+
+
+Health Checking
+---------------
+
+Typically, the Thermos executor monitors processes within a task only by liveness of the forked
+process. In addition to that, Aurora has support for rudimentary health checking: either via HTTP
+or via custom shell scripts.
+
+For example, simply by requesting a `health` port, a process can request to be health checked
+via repeated calls to the `/health` endpoint:
+
+    nginx = Process(
+      name = 'nginx',
+      cmdline = './run_nginx.sh -port {{thermos.ports[health]}}'
+    )
+
+Please see the
+[configuration reference](../reference/configuration.md#user-content-healthcheckconfig-objects)
+for configuration options for this feature.
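+
+For illustration, a sketch of tuning the health checker via `HealthCheckConfig` (see the
+configuration reference above for the authoritative field list and defaults):
+
+    health_check_config = HealthCheckConfig(
+      initial_interval_secs = 20,    # grace period before the first /health probe
+      interval_secs = 10,            # probe every 10 seconds afterwards
+      timeout_secs = 1,              # each probe must respond within one second
+      max_consecutive_failures = 3,  # kill the task after three consecutive failures
+    )
+
+The resulting object is attached to the job via its `health_check_config` field.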
+
+You can pause health checking by touching a file inside of your sandbox, named `.healthchecksnooze`.
+As long as that file is present, health checks will be disabled, enabling users to gather core
+dumps or other performance measurements without worrying about Aurora's health check killing
+their process.
+
+WARNING: Remember to remove this when you are done, otherwise your instance will have permanently
+disabled health checks.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/features/sla-metrics.md
----------------------------------------------------------------------
diff --git a/docs/features/sla-metrics.md b/docs/features/sla-metrics.md
new file mode 100644
index 0000000..11a1ced
--- /dev/null
+++ b/docs/features/sla-metrics.md
@@ -0,0 +1,178 @@
+Aurora SLA Measurement
+======================
+
+- [Overview](#overview)
+- [Metric Details](#metric-details)
+  - [Platform Uptime](#platform-uptime)
+  - [Job Uptime](#job-uptime)
+  - [Median Time To Assigned (MTTA)](#median-time-to-assigned-\(mtta\))
+  - [Median Time To Running (MTTR)](#median-time-to-running-\(mttr\))
+- [Limitations](#limitations)
+
+## Overview
+
+The primary goal of the feature is collection and monitoring of Aurora job SLA (Service Level
+Agreements) metrics that define a contractual relationship between the Aurora/Mesos platform
+and hosted services.
+
+The Aurora SLA feature is by default only enabled for service (non-cron)
+production jobs (`"production=True"` in your `.aurora` config). It can be enabled for
+non-production services by an operator via the scheduler command line flag `-sla_non_prod_metrics`.
+
+Counters that track SLA measurements are computed periodically within the scheduler.
+The individual instance metrics are refreshed every minute (configurable via
+`sla_stat_refresh_interval`). The instance counters are subsequently aggregated by
+relevant grouping types before exporting to the scheduler `/vars` endpoint (when using `vagrant`,
+that would be `http://192.168.33.7:8081/vars`).
+
+
+## Metric Details
+
+### Platform Uptime
+
+*Aggregate amount of time a job spends in a non-runnable state due to platform unavailability
+or scheduling delays. This metric tracks Aurora/Mesos uptime performance and reflects on any
+system-caused downtime events (tasks LOST or DRAINED). Any user-initiated task kills/restarts
+will not degrade this metric.*
+
+**Collection scope:**
+
+* Per job - `sla_<job_key>_platform_uptime_percent`
+* Per cluster - `sla_cluster_platform_uptime_percent`
+
+**Units:** percent
+
+A fault in the task environment may cause Aurora and Mesos to have different views of the task state
+or lose track of the task's existence. In such cases, the service task is marked as LOST and
+rescheduled by Aurora. For example, this may happen when the task stays in ASSIGNED or STARTING
+for too long or the Mesos slave becomes unhealthy (or disappears completely). The time between
+task entering LOST and its replacement reaching RUNNING state is counted towards platform downtime.
+
+Another example of a platform downtime event is the administrator-requested task rescheduling. This
+happens during planned Mesos slave maintenance when all slave tasks are marked as DRAINED and
+rescheduled elsewhere.
+
+To accurately calculate Platform Uptime, we must separate platform incurred downtime from user
+actions that put a service instance in a non-operational state. It is simpler to isolate
+user-incurred downtime and treat all other downtime as platform incurred.
+
+Currently, a user can cause a healthy service (task) downtime in only two ways: via `killTasks`
+or `restartShards` RPCs. For both, their affected tasks leave an audit state transition trail
+relevant to uptime calculations. By applying a special "SLA meaning" to exposed task state
+transition records, we can build a deterministic downtime trace for every given service instance.
+
+A task going through a state transition carries one of three possible SLA meanings
+(see [SlaAlgorithm.java](../../src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for
+sla-to-task-state mapping):
+
+* Task is UP: starts a period where the task is considered to be up and running from the Aurora
+  platform standpoint.
+
+* Task is DOWN: starts a period where the task cannot reach the UP state for some
+  non-user-related reason. Counts towards instance downtime.
+
+* Task is REMOVED from SLA: starts a period where the task is not expected to be UP due to
+  user initiated action or failure. We ignore this period for the uptime calculation purposes.
+
+This metric is recalculated over the last sampling period (last minute) to account for
+any UP/DOWN/REMOVED events. It ignores any UP/DOWN events not immediately adjacent to the
+sampling interval as well as adjacent REMOVED events.
+
+### Job Uptime
+
+*Percentage of the job instances considered to be in RUNNING state for the specified duration
+relative to request time. This is a purely application side metric that is considering aggregate
+uptime of all RUNNING instances. Any user- or platform initiated restarts directly affect
+this metric.*
+
+**Collection scope:** We currently expose job uptime values at 5 pre-defined
+percentiles (50th,75th,90th,95th and 99th):
+
+* `sla_<job_key>_job_uptime_50_00_sec`
+* `sla_<job_key>_job_uptime_75_00_sec`
+* `sla_<job_key>_job_uptime_90_00_sec`
+* `sla_<job_key>_job_uptime_95_00_sec`
+* `sla_<job_key>_job_uptime_99_00_sec`
+
+**Units:** seconds
+
+You can also get customized real-time stats from the Aurora client. See `aurora sla -h` for
+more details.
+
+### Median Time To Assigned (MTTA)
+
+*Median time a job spends waiting for its tasks to be assigned to a host. This is a combined
+metric that helps track the dependency of scheduling performance on the requested resources
+(user scope) as well as the internal scheduler bin-packing algorithm efficiency (platform scope).*
+
+**Collection scope:**
+
+* Per job - `sla_<job_key>_mtta_ms`
+* Per cluster - `sla_cluster_mtta_ms`
+* Per instance size (small, medium, large, x-large, xx-large). Sizes are defined in:
+[ResourceAggregates.java](../../src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java)
+  * By CPU:
+    * `sla_cpu_small_mtta_ms`
+    * `sla_cpu_medium_mtta_ms`
+    * `sla_cpu_large_mtta_ms`
+    * `sla_cpu_xlarge_mtta_ms`
+    * `sla_cpu_xxlarge_mtta_ms`
+  * By RAM:
+    * `sla_ram_small_mtta_ms`
+    * `sla_ram_medium_mtta_ms`
+    * `sla_ram_large_mtta_ms`
+    * `sla_ram_xlarge_mtta_ms`
+    * `sla_ram_xxlarge_mtta_ms`
+  * By DISK:
+    * `sla_disk_small_mtta_ms`
+    * `sla_disk_medium_mtta_ms`
+    * `sla_disk_large_mtta_ms`
+    * `sla_disk_xlarge_mtta_ms`
+    * `sla_disk_xxlarge_mtta_ms`
+
+**Units:** milliseconds
+
+MTTA only considers instances that have already reached ASSIGNED state and ignores those
+that are still PENDING. This ensures straggler instances (e.g. with unreasonable resource
+constraints) do not affect metric curves.
+
+### Median Time To Running (MTTR)
+
+*Median time a job waits for its tasks to reach RUNNING state. This is a comprehensive metric
+reflecting the overall time it takes for Aurora/Mesos to start executing user content.*
+
+**Collection scope:**
+
+* Per job - `sla_<job_key>_mttr_ms`
+* Per cluster - `sla_cluster_mttr_ms`
+* Per instance size (small, medium, large, x-large, xx-large). Sizes are defined in:
+[ResourceAggregates.java](../../src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java)
+  * By CPU:
+    * `sla_cpu_small_mttr_ms`
+    * `sla_cpu_medium_mttr_ms`
+    * `sla_cpu_large_mttr_ms`
+    * `sla_cpu_xlarge_mttr_ms`
+    * `sla_cpu_xxlarge_mttr_ms`
+  * By RAM:
+    * `sla_ram_small_mttr_ms`
+    * `sla_ram_medium_mttr_ms`
+    * `sla_ram_large_mttr_ms`
+    * `sla_ram_xlarge_mttr_ms`
+    * `sla_ram_xxlarge_mttr_ms`
+  * By DISK:
+    * `sla_disk_small_mttr_ms`
+    * `sla_disk_medium_mttr_ms`
+    * `sla_disk_large_mttr_ms`
+    * `sla_disk_xlarge_mttr_ms`
+    * `sla_disk_xxlarge_mttr_ms`
+
+**Units:** milliseconds
+
+MTTR only considers instances in RUNNING state. This ensures straggler instances (e.g. with
+unreasonable resource constraints) do not affect metric curves.
+
+## Limitations
+
+* The availability of Aurora SLA metrics is bound by the scheduler availability.
+
+* All metrics are calculated at a pre-defined interval (currently set at 1 minute).
+  Scheduler restarts may result in missed collections.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/getting-started/overview.md
----------------------------------------------------------------------
diff --git a/docs/getting-started/overview.md b/docs/getting-started/overview.md
new file mode 100644
index 0000000..854f8d7
--- /dev/null
+++ b/docs/getting-started/overview.md
@@ -0,0 +1,108 @@
+Aurora System Overview
+======================
+
+Apache Aurora is a service scheduler that runs on top of Apache Mesos, enabling you to run
+long-running services, cron jobs, and ad-hoc jobs that take advantage of Apache Mesos' scalability,
+fault-tolerance, and resource isolation.
+
+
+Components
+----------
+
+It is important to have an understanding of the components that make up
+a functioning Aurora cluster.
+
+![Aurora Components](../images/components.png)
+
+* **Aurora scheduler**
+  The scheduler is your primary interface to the work you run in your cluster.  You will
+  instruct it to run jobs, and it will manage them in Mesos for you.  You will also frequently use
+  the scheduler's read-only web interface as a heads-up display for what's running in your cluster.
+
+* **Aurora client**
+  The client (`aurora` command) is a command line tool that exposes primitives that you can use to
+  interact with the scheduler. The client operates on jobs, which are identified by job keys.
+
+  Aurora also provides an admin client (`aurora_admin` command) that contains commands built for
+  cluster administrators.  You can use this tool to do things like manage user quotas and manage
+  graceful maintenance on machines in the cluster.
+
+* **Aurora executor**
+  The executor (a.k.a. Thermos executor) is responsible for carrying out the workloads described in
+  the Aurora DSL (`.aurora` files).  The executor is what actually executes user processes.  It will
+  also perform health checking of tasks and register tasks in ZooKeeper for the purposes of dynamic
+  service discovery.
+
+* **Aurora observer**
+  The observer provides browser-based access to the status of individual tasks executing on worker
+  machines.  It gives insight into the processes executing, and facilitates browsing of task sandbox
+  directories.
+
+* **ZooKeeper**
+  [ZooKeeper](http://zookeeper.apache.org) is a distributed consensus system.  In an Aurora cluster
+  it is used for reliable election of the leading Aurora scheduler and Mesos master.
+
+* **Mesos master**
+  The master is responsible for tracking worker machines and performing accounting of their
+  resources.  The scheduler interfaces with the master to control the cluster.
+
+* **Mesos agent**
+  The agent receives work assigned by the scheduler and executes it.  It interfaces with Linux
+  isolation systems like cgroups, namespaces and Docker to manage the resource consumption of tasks.
+  When a user task is launched, the agent will launch the executor (in the context of a Linux cgroup
+  or Docker container depending upon the environment), which will in turn fork user processes.
+
+
+Jobs, Tasks and Processes
+--------------------------
+
+Aurora is a Mesos framework used to schedule *jobs* onto Mesos. Mesos
+cares about individual *tasks*, but typical jobs consist of dozens or
+hundreds of task replicas. Aurora provides a layer on top of Mesos with
+its `Job` abstraction. An Aurora `Job` consists of a task template and
+instructions for creating near-identical replicas of that task (modulo
+things like "instance id" or specific port numbers which may differ from
+machine to machine).
+
+How many tasks make up a Job is complicated. On a basic level, a Job consists of
+one task template and instructions for creating near-identical replicas of that task
+(otherwise referred to as "instances" or "shards").
+
+A task can merely be a single *process* corresponding to a single
+command line, such as `python2.7 my_script.py`. However, a task can also
+consist of many separate processes, which all run within a single
+sandbox. For example, running multiple cooperating agents together,
+such as `logrotate`, `installer`, master, or slave processes. This is
+where Thermos comes in. While Aurora provides a `Job` abstraction on
+top of Mesos `Tasks`, Thermos provides a `Process` abstraction
+underneath Mesos `Task`s and serves as part of the Aurora framework's
+executor.
+
+You define `Job`s, `Task`s, and `Process`es in a configuration file.
+Configuration files are written in Python, and make use of the Pystachio
+templating language, along with specific Aurora, Mesos, and Thermos
+commands and methods. They end in a `.aurora` extension.
+
+This can be summarized as:
+
+* Aurora manages jobs made of tasks.
+* Mesos manages tasks made of processes.
+* Thermos manages processes.
+* All of that is defined in `.aurora` configuration files.
+
+![Aurora hierarchy](../images/aurora_hierarchy.png)
+
+Each `Task` has a *sandbox* created when the `Task` starts and garbage
+collected when it finishes. All of a `Task`'s processes run in its
+sandbox, so processes can share state by using a shared current working
+directory.
+
+The sandbox garbage collection policy considers many factors, most
+importantly age and size. It makes a best-effort attempt to keep
+sandboxes around as long as possible post-task in order for service
+owners to inspect data and logs, should the `Task` have completed
+abnormally. But you can't design your applications assuming sandboxes
+will be around forever; plan for this, e.g. by building log saving or other
+checkpointing mechanisms directly into your application or into your
+`Job` description.
+

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/getting-started/tutorial.md
----------------------------------------------------------------------
diff --git a/docs/getting-started/tutorial.md b/docs/getting-started/tutorial.md
new file mode 100644
index 0000000..dd3ac0a
--- /dev/null
+++ b/docs/getting-started/tutorial.md
@@ -0,0 +1,258 @@
+# Aurora Tutorial
+
+This tutorial shows how to use the Aurora scheduler to run (and "`printf-debug`")
+a hello world program on Mesos. This is the recommended document for new Aurora users
+to start getting up to speed on the system.
+
+- [Prerequisite](#setup-install-aurora)
+- [The Script](#the-script)
+- [Aurora Configuration](#aurora-configuration)
+- [Creating the Job](#creating-the-job)
+- [Watching the Job Run](#watching-the-job-run)
+- [Cleanup](#cleanup)
+- [Next Steps](#next-steps)
+
+
+## Prerequisite
+
+This tutorial assumes you are running [Aurora locally using Vagrant](vagrant.md).
+However, in general the instructions are also applicable to any other
+[Aurora installation](../operations/installation.md).
+
+Unless otherwise stated, all commands are to be run from the root of the aurora
+repository clone.
+
+
+## The Script
+
+Our "hello world" application is a simple Python script that loops
+forever, displaying the time every few seconds. Copy the code below and
+put it in a file named `hello_world.py` in the root of your Aurora repository clone
+(Note: this directory is the same as `/vagrant` inside the Vagrant VMs).
+
+The script has an intentional bug, which we will explain later on.
+
+<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh.
+-->
+```python
+import time
+
+def main():
+  SLEEP_DELAY = 10
+  # Python experts - ignore this blatant bug.
+  for i in xrang(100):
+    print("Hello world! The time is now: %s. Sleeping for %d secs" % (
+      time.asctime(), SLEEP_DELAY))
+    time.sleep(SLEEP_DELAY)
+
+if __name__ == "__main__":
+  main()
+```
+
+## Aurora Configuration
+
+Once we have our script/program, we need to create a *configuration
+file* that tells Aurora how to manage and launch our Job. Save the below
+code in the file `hello_world.aurora`.
+
+<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh.
+-->
+```python
+pkg_path = '/vagrant/hello_world.py'
+
+# we use a trick here to make the configuration change with
+# the contents of the file, for simplicity.  in a normal setting, packages would be
+# versioned, and the version number would be changed in the configuration.
+import hashlib
+with open(pkg_path, 'rb') as f:
+  pkg_checksum = hashlib.md5(f.read()).hexdigest()
+
+# copy hello_world.py into the local sandbox
+install = Process(
+  name = 'fetch_package',
+  cmdline = 'cp %s . && echo %s && chmod +x hello_world.py' % (pkg_path, pkg_checksum))
+
+# run the script
+hello_world = Process(
+  name = 'hello_world',
+  cmdline = 'python -u hello_world.py')
+
+# describe the task
+hello_world_task = SequentialTask(
+  processes = [install, hello_world],
+  resources = Resources(cpu = 1, ram = 1*MB, disk=8*MB))
+
+jobs = [
+  Service(cluster = 'devcluster',
+          environment = 'devel',
+          role = 'www-data',
+          name = 'hello_world',
+          task = hello_world_task)
+]
+```
+
+There is a lot going on in that configuration file:
+
+1. From a "big picture" viewpoint, it first defines two
+Processes. Then it defines a Task that runs the two Processes in the
+order specified in the Task definition, as well as specifying what
+computational and memory resources are available for them.  Finally,
+it defines a Job that will schedule the Task on available and suitable
+machines. This Job is the sole member of a list of Jobs; you can
+specify more than one Job in a config file.
+
+2. At the Process level, it specifies how to get your code into the
+local sandbox in which it will run. It then specifies how the code is
+actually run once the second Process starts.
+
+For more about Aurora configuration files, see the [Configuration
+Tutorial](../reference/configuration-tutorial.md) and the [Aurora
+Configuration Reference](../reference/configuration.md) (preferably after
+finishing this tutorial).
+
+
+## Creating the Job
+
+We're ready to launch our job! To do so, we use the Aurora Client to
+issue a Job creation request to the Aurora scheduler.
+
+Many Aurora Client commands take a *job key* argument, which uniquely
+identifies a Job. A job key consists of four parts, each separated by a
+"/". The four parts are  `<cluster>/<role>/<environment>/<jobname>`
+in that order:
+
+* Cluster refers to the name of a particular Aurora installation.
+* Role names are user accounts existing on the slave machines. If you
+don't know what accounts are available, contact your sysadmin.
+* Environment names are namespaces; you can count on `test`, `devel`,
+`staging` and `prod` existing.
+* Jobname is the custom name of your job.
+
+When comparing two job keys, if any of the four parts is different from
+its counterpart in the other key, then the two job keys identify two separate
+jobs. If all four values are identical, the job keys identify the same job.
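+
+For example (using values from this tutorial), the following two job keys
+share the same cluster, role, and job name but differ in environment, so
+they name two different jobs:
+
+    devcluster/www-data/devel/hello_world
+    devcluster/www-data/test/hello_world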
+
+The `clusters.json` [client configuration](../reference/client-cluster-configuration.md)
+for the Aurora scheduler defines the available cluster names.
+For Vagrant, from the top-level of your Aurora repository clone, do:
+
+    $ vagrant ssh
+
+Followed by:
+
+    vagrant@aurora:~$ cat /etc/aurora/clusters.json
+
+You'll see something like the following. The `name` value shown here corresponds to a job key's cluster value.
+
+```javascript
+[{
+  "name": "devcluster",
+  "zk": "192.168.33.7",
+  "scheduler_zk_path": "/aurora/scheduler",
+  "auth_mechanism": "UNAUTHENTICATED",
+  "slave_run_directory": "latest",
+  "slave_root": "/var/lib/mesos"
+}]
+```
+
+The Aurora Client command that actually runs our Job is `aurora job create`. It creates a Job as
+specified by its job key and configuration file arguments and runs it.
+
+    aurora job create <cluster>/<role>/<environment>/<jobname> <config_file>
+
+Or for our example:
+
+    aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
+
+After entering our virtual machine using `vagrant ssh`, running this command returns:
+
+    vagrant@aurora:~$ aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
+     INFO] Creating job hello_world
+     INFO] Checking status of devcluster/www-data/devel/hello_world
+    Job create succeeded: job url=http://aurora.local:8081/scheduler/www-data/devel/hello_world
+
+
+## Watching the Job Run
+
+Now that our job is running, let's see what it's doing. Access the
+scheduler web interface at `http://$scheduler_hostname:$scheduler_port/scheduler`,
+or, when using `vagrant`, at `http://192.168.33.7:8081/scheduler`.
+First we see what Jobs are scheduled:
+
+![Scheduled Jobs](../images/ScheduledJobs.png)
+
+Click on your user name, which in this case was `www-data`, and we see the Jobs associated
+with that role:
+
+![Role Jobs](../images/RoleJobs.png)
+
+If you click on your `hello_world` Job, you'll see:
+
+![hello_world Job](../images/HelloWorldJob.png)
+
+Oops, looks like our first job didn't quite work! The task is temporarily throttled for
+having failed on every attempt of the Aurora scheduler to run it. We have to figure out
+what is going wrong.
+
+On the Completed tasks tab, we see all past attempts of the Aurora scheduler to run our job.
+
+![Completed tasks tab](../images/CompletedTasks.png)
+
+We can navigate to the Task page of a failed run by clicking on the host link.
+
+![Task page](../images/TaskBreakdown.png)
+
+Once there, we see that the `hello_world` process failed. The Task page
+captures the standard error and standard output streams and makes them available.
+Clicking through to `stderr` on the failed `hello_world` process, we see what happened.
+
+![stderr page](../images/stderr.png)
+
+It looks like we made a typo in our Python script. We wanted `xrange`,
+not `xrang`. Edit the `hello_world.py` script to use the correct function
+and save it as `hello_world_v2.py`. Then update the `hello_world.aurora`
+configuration to the newest version.
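+
+One way to do this (an illustrative sketch; the exact file names are up to
+you) is to point `pkg_path` at the new script and have the install process
+copy it into the sandbox under the original name, so the `hello_world`
+process command line does not need to change:
+
+```python
+pkg_path = '/vagrant/hello_world_v2.py'
+
+# Copy the fixed script into the sandbox under the name that the
+# hello_world process expects to run.
+install = Process(
+  name = 'fetch_package',
+  cmdline = 'cp %s hello_world.py && echo %s && chmod +x hello_world.py' % (pkg_path, pkg_checksum))
+```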
+
+In order to try again, we can now instruct the scheduler to update our job:
+
+    vagrant@aurora:~$ aurora update start devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
+     INFO] Starting update for: hello_world
+    Job update has started. View your update progress at http://aurora.local:8081/scheduler/www-data/devel/hello_world/update/8ef38017-e60f-400d-a2f2-b5a8b724e95b
+
+This time, the task comes up.
+
+![Running Job](../images/RunningJob.png)
+
+By again clicking on the host, we inspect the Task page, and see that the
+`hello_world` process is running.
+
+![Running Task page](../images/runningtask.png)
+
+We then inspect the output by clicking on `stdout` and see our process'
+output:
+
+![stdout page](../images/stdout.png)
+
+## Cleanup
+
+Now that we're done, we kill the job using the Aurora client:
+
+    vagrant@aurora:~$ aurora job killall devcluster/www-data/devel/hello_world
+     INFO] Killing tasks for job: devcluster/www-data/devel/hello_world
+     INFO] Instances to be killed: [0]
+    Successfully killed instances [0]
+    Job killall succeeded
+
+The job page now shows the `hello_world` tasks as completed.
+
+![Killed Task page](../images/killedtask.png)
+
+## Next Steps
+
+Now that you've finished this Tutorial, you should read or do the following:
+
+- [The Aurora Configuration Tutorial](../reference/configuration-tutorial.md), which provides more examples
+  and best practices for writing Aurora configurations. You should also look at
+  the [Aurora Configuration Reference](../reference/configuration.md).
+- Explore the Aurora Client - use `aurora -h`, and read the
+  [Aurora Client Commands](../reference/client-commands.md) document.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/getting-started/vagrant.md
----------------------------------------------------------------------
diff --git a/docs/getting-started/vagrant.md b/docs/getting-started/vagrant.md
new file mode 100644
index 0000000..d6af3b5
--- /dev/null
+++ b/docs/getting-started/vagrant.md
@@ -0,0 +1,137 @@
+A local Cluster with Vagrant
+============================
+
+This document shows you how to configure a complete cluster using a virtual machine. This setup
+replicates a real cluster in your development machine as closely as possible. After you complete
+the steps outlined here, you will be ready to create and run your first Aurora job.
+
+The following sections describe these steps in detail:
+
+1. [Overview](#user-content-overview)
+1. [Install VirtualBox and Vagrant](#user-content-install-virtualbox-and-vagrant)
+1. [Clone the Aurora repository](#user-content-clone-the-aurora-repository)
+1. [Start the local cluster](#user-content-start-the-local-cluster)
+1. [Log onto the VM](#user-content-log-onto-the-vm)
+1. [Run your first job](#user-content-run-your-first-job)
+1. [Rebuild components](#user-content-rebuild-components)
+1. [Shut down or delete your local cluster](#user-content-shut-down-or-delete-your-local-cluster)
+1. [Troubleshooting](#user-content-troubleshooting)
+
+
+Overview
+--------
+
+The Aurora distribution includes a set of scripts that enable you to create a local cluster in
+your development machine. These scripts use [Vagrant](https://www.vagrantup.com/) and
+[VirtualBox](https://www.virtualbox.org/) to run and configure a virtual machine. Once the
+virtual machine is running, the scripts install and initialize Aurora and any required components
+to create the local cluster.
+
+
+Install VirtualBox and Vagrant
+------------------------------
+
+First, download and install [VirtualBox](https://www.virtualbox.org/) on your development machine.
+
+Then download and install [Vagrant](https://www.vagrantup.com/). To verify that the installation
+was successful, open a terminal window and type the `vagrant` command. You should see a list of
+common commands for this tool.
+
+
+Clone the Aurora repository
+---------------------------
+
+To obtain the Aurora source distribution, clone its Git repository using the following command:
+
+     git clone git://git.apache.org/aurora.git
+
+
+Start the local cluster
+-----------------------
+
+Now change into the `aurora/` directory, which contains the Aurora source code and
+other scripts and tools:
+
+     cd aurora/
+
+To start the local cluster, type the following command:
+
+     vagrant up
+
+This command uses the configuration scripts in the Aurora distribution to:
+
+* Download a Linux system image.
+* Start a virtual machine (VM) and configure it.
+* Install the required build tools on the VM.
+* Install Aurora's requirements (like [Mesos](http://mesos.apache.org/) and
+[Zookeeper](http://zookeeper.apache.org/)) on the VM.
+* Build and install Aurora from source on the VM.
+* Start Aurora's services on the VM.
+
+This process takes several minutes to complete.
+
+To verify that Aurora is running on the cluster, visit the following URLs:
+
+* Scheduler - http://192.168.33.7:8081
+* Observer - http://192.168.33.7:1338
+* Mesos Master - http://192.168.33.7:5050
+* Mesos Slave - http://192.168.33.7:5051
+
+
+Log onto the VM
+---------------
+
+To SSH into the VM, run the following command in your development machine:
+
+     vagrant ssh
+
+To verify that Aurora is installed in the VM, type the `aurora` command. You should see a list
+of arguments and possible commands.
+
+The `/vagrant` directory on the VM is mapped to the `aurora/` local directory
+from which you started the cluster. You can edit files inside this directory in your development
+machine and access them from the VM under `/vagrant`.
+
+A pre-installed `clusters.json` file refers to your local cluster as `devcluster`, which you
+will use in client commands.
+
+
+Run your first job
+------------------
+
+Now that your cluster is up and running, you are ready to define and run your first job in Aurora.
+For more information, see the [Aurora Tutorial](tutorial.md).
+
+
+Rebuild components
+------------------
+
+If you are changing Aurora code and would like to rebuild a component, you can use the `aurorabuild`
+command on the VM to build and restart a component.  This is considerably faster than destroying
+and rebuilding your VM.
+
+`aurorabuild` accepts a list of components to build and update. To get a list of supported
+components, invoke `aurorabuild` with no arguments. For example, to rebuild and restart the
+client from your development machine:
+
+     vagrant ssh -c 'aurorabuild client'
+
+
+Shut down or delete your local cluster
+--------------------------------------
+
+To shut down your local cluster, run the `vagrant halt` command in your development machine. To
+start it again, run the `vagrant up` command.
+
+Once you are finished with your local cluster, or if you would otherwise like to start from scratch,
+you can use the command `vagrant destroy` to turn off and delete the virtual file system.
+
+
+Troubleshooting
+---------------
+
+Most Vagrant-related problems can be fixed with the following steps:
+
+* Destroying the vagrant environment with `vagrant destroy`
+* Killing any orphaned VMs (see AURORA-499) with `virtualbox` UI or `VBoxManage` command line tool
+* Cleaning the repository of build artifacts and other intermediate output with `git clean -fdx`
+* Bringing up the vagrant environment with `vagrant up`


[6/7] aurora git commit: Reorganize Documentation

Posted by se...@apache.org.
http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/configuration-tutorial.md
----------------------------------------------------------------------
diff --git a/docs/configuration-tutorial.md b/docs/configuration-tutorial.md
deleted file mode 100644
index 97664f3..0000000
--- a/docs/configuration-tutorial.md
+++ /dev/null
@@ -1,954 +0,0 @@
-Aurora Configuration Tutorial
-=============================
-
-How to write Aurora configuration files, including feature descriptions
-and best practices. When writing a configuration file, make use of
-`aurora job inspect`. It takes the same job key and configuration file
-arguments as `aurora job create` or `aurora update start`. It first ensures the
-configuration parses, then outputs it in human-readable form.
-
-You should read this after going through the general [Aurora Tutorial](tutorial.md).
-
-- [Aurora Configuration Tutorial](#user-content-aurora-configuration-tutorial)
-	- [The Basics](#user-content-the-basics)
-		- [Use Bottom-To-Top Object Ordering](#user-content-use-bottom-to-top-object-ordering)
-	- [An Example Configuration File](#user-content-an-example-configuration-file)
-	- [Defining Process Objects](#user-content-defining-process-objects)
-	- [Getting Your Code Into The Sandbox](#user-content-getting-your-code-into-the-sandbox)
-	- [Defining Task Objects](#user-content-defining-task-objects)
-		- [SequentialTask: Running Processes in Parallel or Sequentially](#user-content-sequentialtask-running-processes-in-parallel-or-sequentially)
-		- [SimpleTask](#user-content-simpletask)
-		- [Combining tasks](#user-content-combining-tasks)
-	- [Defining Job Objects](#user-content-defining-job-objects)
-	- [The jobs List](#user-content-the-jobs-list)
-	- [Templating](#user-content-templating)
-		- [Templating 1: Binding in Pystachio](#user-content-templating-1-binding-in-pystachio)
-		- [Structurals in Pystachio / Aurora](#user-content-structurals-in-pystachio--aurora)
-			- [Mustaches Within Structurals](#user-content-mustaches-within-structurals)
-		- [Templating 2: Structurals Are Factories](#user-content-templating-2-structurals-are-factories)
-			- [A Second Way of Templating](#user-content-a-second-way-of-templating)
-		- [Advanced Binding](#user-content-advanced-binding)
-			- [Bind Syntax](#user-content-bind-syntax)
-			- [Binding Complex Objects](#user-content-binding-complex-objects)
-				- [Lists](#user-content-lists)
-				- [Maps](#user-content-maps)
-				- [Structurals](#user-content-structurals)
-		- [Structural Binding](#user-content-structural-binding)
-	- [Configuration File Writing Tips And Best Practices](#user-content-configuration-file-writing-tips-and-best-practices)
-		- [Use As Few .aurora Files As Possible](#user-content-use-as-few-aurora-files-as-possible)
-		- [Avoid Boilerplate](#user-content-avoid-boilerplate)
-		- [Thermos Uses bash, But Thermos Is Not bash](#user-content-thermos-uses-bash-but-thermos-is-not-bash)
-			- [Bad](#user-content-bad)
-			- [Good](#user-content-good)
-		- [Rarely Use Functions In Your Configurations](#user-content-rarely-use-functions-in-your-configurations)
-			- [Bad](#user-content-bad-1)
-			- [Good](#user-content-good-1)
-
-The Basics
-----------
-
-To run a job on Aurora, you must specify a configuration file that tells
-Aurora what it needs to know to schedule the job, what Mesos needs to
-run the tasks the job is made up of, and what Thermos needs to run the
-processes that make up the tasks. This file must have
-a `.aurora` suffix.
-
-A configuration file defines a collection of objects, along with parameter
-values for their attributes. An Aurora configuration file contains the
-following three types of objects:
-
-- Job
-- Task
-- Process
-
-A configuration also specifies a list of `Job` objects assigned
-to the variable `jobs`.
-
-- jobs (list of defined Jobs to run)
-
-The `.aurora` file format is just Python. However, `Job`, `Task`,
-`Process`, and other classes are defined by a type-checked dictionary
-templating library called *Pystachio*, a powerful tool for
-configuration specification and reuse. Pystachio objects are tailored
-via {{}} surrounded templates.
-
-When writing your `.aurora` file, you may use any Pystachio datatypes, as
-well as any objects shown in the [*Aurora+Thermos Configuration
-Reference*](configuration-reference.md), without `import` statements - the
-Aurora config loader injects them automatically. Other than that, an `.aurora`
-file works like any other Python script.
-
-[*Aurora+Thermos Configuration Reference*](configuration-reference.md)
-has a full reference of all Aurora/Thermos defined Pystachio objects.
-
-### Use Bottom-To-Top Object Ordering
-
-A well-structured configuration starts with structural templates (if
-any). Structural templates encapsulate in their attributes all the
-differences between Jobs in the configuration that are not directly
-manipulated at the `Job` level, but typically at the `Process` or `Task`
-level. For example, if certain processes are invoked with slightly
-different settings or input.
-
-After structural templates, define, in order, `Process`es, `Task`s, and
-`Job`s.
-
-Structural template names should be *UpperCamelCased* and their
-instantiations are typically *UPPER\_SNAKE\_CASED*. `Process`, `Task`,
-and `Job` names are typically *lower\_snake\_cased*. Indentation is typically 2
-spaces.
-
-An Example Configuration File
------------------------------
-
-The following is a typical configuration file. Don't worry if there are
-parts you don't understand yet, but you may want to refer back to this
-as you read about its individual parts. Note that names surrounded by
-curly braces {{}} are template variables, which the system replaces with
-bound values for the variables.
-
-    # --- templates here ---
-	class Profile(Struct):
-	  package_version = Default(String, 'live')
-	  java_binary = Default(String, '/usr/lib/jvm/java-1.7.0-openjdk/bin/java')
-	  extra_jvm_options = Default(String, '')
-	  parent_environment = Default(String, 'prod')
-	  parent_serverset = Default(String,
-                                 '/foocorp/service/bird/{{parent_environment}}/bird')
-
-	# --- processes here ---
-	main = Process(
-	  name = 'application',
-	  cmdline = '{{profile.java_binary}} -server -Xmx1792m '
-	            '{{profile.extra_jvm_options}} '
-	            '-jar application.jar '
-	            '-upstreamService {{profile.parent_serverset}}'
-	)
-
-	# --- tasks ---
-	base_task = SequentialTask(
-	  name = 'application',
-	  processes = [
-	    Process(
-	      name = 'fetch',
-	      cmdline = 'curl -O https://packages.foocorp.com/{{profile.package_version}}/application.jar'),
-	  ]
-	)
-
-        # not always necessary but often useful to have separate task
-        # resource classes
-        staging_task = base_task(resources =
-                         Resources(cpu = 1.0,
-                                   ram = 2048*MB,
-                                   disk = 1*GB))
-	production_task = base_task(resources =
-                            Resources(cpu = 4.0,
-                                      ram = 2560*MB,
-                                      disk = 10*GB))
-
-	# --- job template ---
-	job_template = Job(
-	  name = 'application',
-	  role = 'myteam',
-	  contact = 'myteam-team@foocorp.com',
-	  instances = 20,
-	  service = True,
-	  task = production_task
-	)
-
-	# -- profile instantiations (if any) ---
-	PRODUCTION = Profile()
-	STAGING = Profile(
-	  extra_jvm_options = '-Xloggc:gc.log',
-	  parent_environment = 'staging'
-	)
-
-	# -- job instantiations --
-	jobs = [
-          job_template(cluster = 'cluster1', environment = 'prod')
-	               .bind(profile = PRODUCTION),
-
-          job_template(cluster = 'cluster2', environment = 'prod')
-	                .bind(profile = PRODUCTION),
-
-          job_template(cluster = 'cluster1',
-                        environment = 'staging',
-			service = False,
-			task = staging_task,
-			instances = 2)
-			.bind(profile = STAGING),
-	]
-
-## Defining Process Objects
-
-Processes are handled by the Thermos system. A process is a single
-executable step run as a part of an Aurora task, which consists of a
-bash-executable statement.
-
-The key (and required) `Process` attributes are:
-
--   `name`: Any string which is a valid Unix filename (no slashes,
-    NULLs, or leading periods). The `name` value must be unique relative
-    to other Processes in a `Task`.
--   `cmdline`: A command line run in a bash subshell, so you can use
-    bash scripts. Nothing is supplied for command-line arguments,
-    so `$*` is unspecified.
-
-Many tiny processes make managing configurations more difficult. For
-example, the following is a bad way to define processes.
-
-    copy = Process(
-      name = 'copy',
-      cmdline = 'curl -O https://packages.foocorp.com/app.zip'
-    )
-    unpack = Process(
-      name = 'unpack',
-      cmdline = 'unzip app.zip'
-    )
-    remove = Process(
-      name = 'remove',
-      cmdline = 'rm -f app.zip'
-    )
-    run = Process(
-      name = 'app',
-      cmdline = 'java -jar app.jar'
-    )
-    run_task = Task(
-      processes = [copy, unpack, remove, run],
-      constraints = order(copy, unpack, remove, run)
-    )
-
-Since `cmdline` runs in a bash subshell, you can chain commands
-with `&&` or `||`.
-
-When defining a `Task` that is just a list of Processes run in a
-particular order, use `SequentialTask`, as described in the [*Defining*
-`Task` *Objects*](#defining-task-objects) section. The following simplifies and combines the
-above multiple `Process` definitions into just two.
-
-    stage = Process(
-      name = 'stage',
-      cmdline = 'curl -O https://packages.foocorp.com/app.zip && '
-                'unzip app.zip && rm -f app.zip')
-
-    run = Process(name = 'app', cmdline = 'java -jar app.jar')
-
-    run_task = SequentialTask(processes = [stage, run])
-
-`Process` also has optional attributes to customize its behaviour. Details can be found in the [*Aurora+Thermos Configuration Reference*](configuration-reference.md#process-objects).
-
-
-## Getting Your Code Into The Sandbox
-
-When using Aurora, you need to get your executable code into its "sandbox", specifically
-the Task sandbox where the code executes for the Processes that make up that Task.
-
-Each Task has a sandbox created when the Task starts and garbage
-collected when it finishes. All of a Task's processes run in its
-sandbox, so processes can share state by using a shared current
-working directory.
-
-Typically, you save this code somewhere. You then need to define a Process
-in your `.aurora` configuration file that fetches the code from that somewhere
-to where the slave can see it. For a public cloud, that can be anywhere public on
-the Internet, such as S3. For a private cloud internal storage, you need to put in
-on an accessible HDFS cluster or similar storage.
-
-The template for this Process is:
-
-    <name> = Process(
-      name = '<name>',
-      cmdline = '<command to copy and extract code archive into current working directory>'
-    )
-
-Note: Be sure the extracted code archive has an executable.
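-
-For example (the URL and file names here are illustrative), a Process that
-fetches and unpacks an application archive might look like:
-
-    fetch_app = Process(
-      name = 'fetch_app',
-      cmdline = 'curl -O https://packages.example.com/myapp/app.zip && '
-                'unzip app.zip && rm -f app.zip')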
-
-## Defining Task Objects
-
-Tasks are handled by Mesos. A task is a collection of processes that
-runs in a shared sandbox. It's the fundamental unit Aurora uses to
-schedule the datacenter; essentially what Aurora does is find places
-in the cluster to run tasks.
-
-The key (and required) parts of a Task are:
-
--   `name`: A string giving the Task's name. By default, if a Task is
-    not given a name, it inherits the first name in its Process list.
-
--   `processes`: An unordered list of Process objects bound to the Task.
-    The value of the optional `constraints` attribute affects the
-    contents as a whole. Currently, the only constraint, `order`, determines if
-    the processes run in parallel or sequentially.
-
--   `resources`: A `Resource` object defining the Task's resource
-    footprint. A `Resource` object has three attributes:
-    -   `cpu`: A Float, the fractional number of cores the Task requires.
-    -   `ram`: An Integer, RAM bytes the Task requires.
-    -   `disk`: An Integer, disk bytes the Task requires.
-
-A basic Task definition looks like:
-
-    Task(
-        name="hello_world",
-        processes=[Process(name = "hello_world", cmdline = "echo hello world")],
-        resources=Resources(cpu = 1.0,
-                            ram = 1*GB,
-                            disk = 1*GB))
-
-A Task has optional attributes to customize its behaviour. Details can be found in the [*Aurora+Thermos Configuration Reference*](configuration-reference.md#task-object)
-
-
-### SequentialTask: Running Processes in Parallel or Sequentially
-
-By default, a Task with several Processes runs them in parallel. There
-are two ways to run Processes sequentially:
-
--   Include an `order` constraint in the Task definition's `constraints`
-    attribute whose arguments specify the processes' run order:
-
-        Task( ... processes=[process1, process2, process3],
-	          constraints = order(process1, process2, process3), ...)
-
--   Use `SequentialTask` instead of `Task`; it automatically runs
-    processes in the order specified in the `processes` attribute. No
-    `constraint` parameter is needed:
-
-        SequentialTask( ... processes=[process1, process2, process3] ...)
-
-### SimpleTask
-
-For quickly creating simple tasks, use the `SimpleTask` helper. It
-creates a basic task from a provided name and command line using a
-default set of resources. For example, in a `.aurora` configuration
-file:
-
-    SimpleTask(name="hello_world", command="echo hello world")
-
-is equivalent to
-
-    Task(name="hello_world",
-         processes=[Process(name = "hello_world", cmdline = "echo hello world")],
-         resources=Resources(cpu = 1.0,
-                             ram = 1*GB,
-                             disk = 1*GB))
-
-The simplest idiomatic Job configuration thus becomes:
-
-    import os
-    hello_world_job = Job(
-      task=SimpleTask(name="hello_world", command="echo hello world"),
-      role=os.getenv('USER'),
-      cluster="cluster1")
-
-When written to `hello_world.aurora`, you invoke it with a simple
-`aurora job create cluster1/$USER/test/hello_world hello_world.aurora`.
-
-### Combining tasks
-
-`Tasks.concat` (synonym: `concat_tasks`) and
-`Tasks.combine` (synonym: `combine_tasks`) merge multiple Task definitions
-into a single Task. It may be easier to define complex Jobs
-as smaller constituent Tasks. But since a Job only includes a single
-Task, the subtasks must be combined before using them in a Job.
-Smaller Tasks can also be reused between Jobs, instead of having to
-repeat their definition for multiple Jobs.
-
-With both methods, the merged Task takes the first Task's name. The
-difference between the two is the resulting Task's process ordering.
-
--   `Tasks.combine` runs its subtasks' processes in no particular order.
-    The new Task's resource consumption is the sum of all its subtasks'
-    consumption.
-
--   `Tasks.concat` runs its subtasks in the order supplied, with each
-    subtask's processes run serially between tasks. It is analogous to
-    the `order` constraint helper, except at the Task level instead of
-    the Process level. The new Task's resource consumption is the
-    maximum value specified by any subtask for each Resource attribute
-    (cpu, ram and disk).
-
-For example, given the following:
-
-    setup_task = Task(
-      ...
-      processes=[download_interpreter, update_zookeeper],
-      # It is important to note that {{Tasks.concat}} has
-      # no effect on the ordering of the processes within a task;
-      # hence the necessity of the {{order}} statement below
-      # (otherwise, the order in which {{download_interpreter}}
-      # and {{update_zookeeper}} run will be non-deterministic)
-      constraints=order(download_interpreter, update_zookeeper),
-      ...
-    )
-
-    run_task = SequentialTask(
-      ...
-      processes=[download_application, start_application],
-      ...
-    )
-
-    combined_task = Tasks.concat(setup_task, run_task)
-
-The `Tasks.concat` command merges the two Tasks into a single Task and
-ensures all processes in `setup_task` run before the processes
-in `run_task`. Conceptually, the task is reduced to:
-
-    task = Task(
-      ...
-      processes=[download_interpreter, update_zookeeper,
-                 download_application, start_application],
-      constraints=order(download_interpreter, update_zookeeper,
-                        download_application, start_application),
-      ...
-    )
-
-In the case of `Tasks.combine`, the two schedules run in parallel:
-
-    task = Task(
-      ...
-      processes=[download_interpreter, update_zookeeper,
-                 download_application, start_application],
-      constraints=order(download_interpreter, update_zookeeper) +
-                        order(download_application, start_application),
-      ...
-    )
-
-In the latter case, each of the two sequences may operate in parallel.
-Of course, this may not be the intended behavior (for example, if
-the `start_application` Process implicitly relies
-upon `download_interpreter`). Make sure you understand the difference
-between using one or the other.
-
-## Defining Job Objects
-
-A job is a group of identical tasks that Aurora can run in a Mesos cluster.
-
-A `Job` object is defined by the values of several attributes, some
-required and some optional. The required attributes are:
-
--   `task`: Task object to bind to this job. Note that a Job can
-    only take a single Task.
-
--   `role`: Job's role account; in other words, the user account to run
-    the job as on a Mesos cluster machine. A common value is
-    `os.getenv('USER')`; using a Python command to get the user who
-    submits the job request. The other common value is the service
-    account that runs the job, e.g. `www-data`.
-
--   `environment`: Job's environment, typical values
-    are `devel`, `test`, or `prod`.
-
--   `cluster`: Aurora cluster to schedule the job in, defined in
-    `/etc/aurora/clusters.json` or `~/.clusters.json`. You can specify
-    jobs where the only difference is the `cluster`, then at run time
-    only run the Job whose job key includes your desired cluster's name.
-
-You usually see a `name` parameter. By default, `name` inherits its
-value from the Job's associated Task object, but you can override this
-default. For these four parameters, a Job definition might look like:
-
-    foo_job = Job( name = 'foo', cluster = 'cluster1',
-              role = os.getenv('USER'), environment = 'prod',
-              task = foo_task)
-
-In addition to the required attributes, there are several optional
-attributes. Details can be found in the [Aurora+Thermos Configuration Reference](configuration-reference.md#job-objects).
-
-
-## The jobs List
-
-At the end of your `.aurora` file, you need to specify a list of the
-file's defined Jobs. For example, the following exports the jobs `job1`,
-`job2`, and `job3`.
-
-    jobs = [job1, job2, job3]
-
-This allows the aurora client to invoke commands on those jobs, such as
-starting, updating, or killing them.
-
-Templating
-----------
-
-The `.aurora` file format is just Python. However, `Job`, `Task`,
-`Process`, and other classes are defined by a templating library called
-*Pystachio*, a powerful tool for configuration specification and reuse.
-
-[Aurora+Thermos Configuration Reference](configuration-reference.md)
-has a full reference of all Aurora/Thermos defined Pystachio objects.
-
-When writing your `.aurora` file, you may use any Pystachio datatypes, as
-well as any objects shown in the *Aurora+Thermos Configuration
-Reference* without `import` statements - the Aurora config loader
-injects them automatically. Other than that the `.aurora` format
-works like any other Python script.
-
-### Templating 1: Binding in Pystachio
-
-Pystachio uses the visually distinctive {{}} to indicate template
-variables. These are often called "mustache variables" after the
-similarly appearing variables in the Mustache templating system and
-because the curly braces resemble mustaches.
-
-If you are familiar with the Mustache system, templates in Pystachio
-have significant differences. They have no nesting, joining, or
-inheritance semantics. On the other hand, templates are evaluated
-iteratively, which affords some level of indirection.
-
-Let's start with the simplest template: text with one
-variable, in this case `name`:
-
-    Hello {{name}}
-
-If we evaluate this as is, we'd get back:
-
-    Hello
-
-If a template variable doesn't have a value, when evaluated it's
-replaced with nothing. If we add a binding to give it a value:
-
-    { "name" : "Tom" }
-
-We'd get back:
-
-    Hello Tom
-
-Every Pystachio object has an associated `.bind` method that can bind
-values to {{}} variables. Bindings are not immediately evaluated.
-Instead, they are evaluated only when the interpolated value of the
-object is necessary, e.g. for performing equality or serializing a
-message over the wire.
-
-Objects with and without mustache templated variables behave
-differently:
-
-    >>> Float(1.5)
-    Float(1.5)
-
-    >>> Float('{{x}}.5')
-    Float({{x}}.5)
-
-    >>> Float('{{x}}.5').bind(x = 1)
-    Float(1.5)
-
-    >>> Float('{{x}}.5').bind(x = 1) == Float(1.5)
-    True
-
-    >>> contextual_object = String('{{metavar{{number}}}}').bind(
-    ... metavar1 = "first", metavar2 = "second")
-
-    >>> contextual_object
-    String({{metavar{{number}}}})
-
-    >>> contextual_object.bind(number = 1)
-    String(first)
-
-    >>> contextual_object.bind(number = 2)
-    String(second)
-
-You usually bind simple key to value pairs, but you can also bind three
-other objects: lists, dictionaries, and structurals. These will be
-described in detail later.
-
-### Structurals in Pystachio / Aurora
-
-Most Aurora/Thermos users don't ever (knowingly) interact with `String`,
-`Float`, or `Integer` Pystachio objects directly. Instead they interact
-with derived structural (`Struct`) objects that are collections of
-fundamental and structural objects. The structural object components are
-called *attributes*. Aurora's most used structural objects are `Job`,
-`Task`, and `Process`:
-
-    class Process(Struct):
-      cmdline = Required(String)
-      name = Required(String)
-      max_failures = Default(Integer, 1)
-      daemon = Default(Boolean, False)
-      ephemeral = Default(Boolean, False)
-      min_duration = Default(Integer, 5)
-      final = Default(Boolean, False)
-
-Construct default objects by following the object's type with (). If you
-want an attribute to have a value different from its default, include
-the attribute name and value inside the parentheses.
-
-    >>> Process()
-    Process(daemon=False, max_failures=1, ephemeral=False,
-      min_duration=5, final=False)
-
-Attribute values can be template variables, which then receive specific
-values when creating the object.
-
-    >>> Process(cmdline = 'echo {{message}}')
-    Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5,
-            cmdline=echo {{message}}, final=False)
-
-    >>> Process(cmdline = 'echo {{message}}').bind(message = 'hello world')
-    Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5,
-            cmdline=echo hello world, final=False)
-
-A powerful binding property is that all of an object's children inherit its
-bindings:
-
-    >>> List(Process)([
-    ... Process(name = '{{prefix}}_one'),
-    ... Process(name = '{{prefix}}_two')
-    ... ]).bind(prefix = 'hello')
-    ProcessList(
-      Process(daemon=False, name=hello_one, max_failures=1, ephemeral=False, min_duration=5, final=False),
-      Process(daemon=False, name=hello_two, max_failures=1, ephemeral=False, min_duration=5, final=False)
-      )
-
-Remember that an Aurora Job contains Tasks which contain Processes. A
-Job level binding is inherited by its Tasks and all their Processes.
-Similarly a Task level binding is available to that Task and its
-Processes but is *not* visible at the Job level (inheritance is a
-one-way street).
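-
-For example (an illustrative sketch; the names are placeholders), a binding
-made at the `Job` level is visible to the `Process`es inside its `Task`:
-
-    greet = Process(name = 'greet', cmdline = 'echo {{greeting}}')
-
-    greeting_job = Job(
-      name = 'greeting',
-      role = 'myteam',
-      cluster = 'cluster1',
-      environment = 'devel',
-      task = SequentialTask(
-        processes = [greet],
-        resources = Resources(cpu = 1.0, ram = 1*GB, disk = 1*GB))
-    ).bind(greeting = 'hello world')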
-
-#### Mustaches Within Structurals
-
-When you define a `Struct` schema, one powerful, but confusing, feature
-is that all of that structure's attributes are Mustache variables within
-the enclosing scope *once they have been populated*.
-
-For example, when `Process` is defined above, all its attributes, such as
-{{`name`}}, {{`cmdline`}}, and {{`max_failures`}}, are immediately
-defined as Mustache variables, implicitly bound into the `Process`, and
-inherited by any child objects once they are defined.
-
-Thus, you can do the following:
-
-    >>> Process(name = "installer", cmdline = "echo {{name}} is running")
-    Process(daemon=False, name=installer, max_failures=1, ephemeral=False, min_duration=5,
-            cmdline=echo installer is running, final=False)
-
-WARNING: This binding only takes place in one direction. For example,
-the following does NOT work and does not set the `Process` `name`
-attribute's value.
-
-    >>> Process().bind(name = "installer")
-    Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5, final=False)
-
-The following is also not possible and results in an infinite loop that
-attempts to resolve `Process.name`.
-
-    >>> Process(name = '{{name}}').bind(name = 'installer')
-
-Do not confuse Structural attributes with bound Mustache variables.
-Attributes are implicitly converted to Mustache variables but not vice
-versa.
-
-### Templating 2: Structurals Are Factories
-
-#### A Second Way of Templating
-
-A second templating method is both as powerful as the aforementioned and
-often confused with it. This method is due to automatic conversion of
-Struct attributes to Mustache variables as described above.
-
-Suppose you create a Process object:
-
-    >>> p = Process(name = "process_one", cmdline = "echo hello world")
-
-    >>> p
-    Process(daemon=False, name=process_one, max_failures=1, ephemeral=False, min_duration=5,
-            cmdline=echo hello world, final=False)
-
-This `Process` object, "`p`", can be used wherever a `Process` object is
-needed. It can also be reused by changing the value(s) of its
-attribute(s). Here we change its `name` attribute from `process_one` to
-`process_two`.
-
-    >>> p(name = "process_two")
-    Process(daemon=False, name=process_two, max_failures=1, ephemeral=False, min_duration=5,
-            cmdline=echo hello world, final=False)
-
-Template creation is a common use for this technique:
-
-    >>> Daemon = Process(daemon = True)
-    >>> logrotate = Daemon(name = 'logrotate', cmdline = './logrotate conf/logrotate.conf')
-    >>> mysql = Daemon(name = 'mysql', cmdline = 'bin/mysqld --safe-mode')
-
-### Advanced Binding
-
-As described above, `.bind()` binds simple strings or numbers to
-Mustache variables. In addition to Structural types formed by combining
-atomic types, Pystachio has two container types, `List` and `Map`, which
-can also be bound via `.bind()`.
-
-#### Bind Syntax
-
-The `bind()` function can take Python dictionaries or `kwargs`
-interchangeably (when "`kwargs`" is in a function definition, `kwargs`
-receives a Python dictionary containing all keyword arguments after the
-formal parameter list).
-
-    >>> String('{{foo}}').bind(foo = 'bar') == String('{{foo}}').bind({'foo': 'bar'})
-    True
-
-Bindings done "closer" to the object in question take precedence:
-
-    >>> p = Process(name = '{{context}}_process')
-    >>> t = Task().bind(context = 'global')
-    >>> t(processes = [p, p.bind(context = 'local')])
-    Task(processes=ProcessList(
-      Process(daemon=False, name=global_process, max_failures=1, ephemeral=False, final=False,
-              min_duration=5),
-      Process(daemon=False, name=local_process, max_failures=1, ephemeral=False, final=False,
-              min_duration=5)
-    ))
-
-#### Binding Complex Objects
-
-##### Lists
-
-    >>> fibonacci = List(Integer)([1, 1, 2, 3, 5, 8, 13])
-    >>> String('{{fib[4]}}').bind(fib = fibonacci)
-    String(5)
-
-##### Maps
-
-    >>> first_names = Map(String, String)({'Kent': 'Clark', 'Wayne': 'Bruce', 'Prince': 'Diana'})
-    >>> String('{{first[Kent]}}').bind(first = first_names)
-    String(Clark)
-
-##### Structurals
-
-    >>> String('{{p.cmdline}}').bind(p = Process(cmdline = "echo hello world"))
-    String(echo hello world)
-
-### Structural Binding
-
-Use structural templates when binding more than two or three individual
-values at the Job or Task level. For fewer values, standard
-key-to-string binding is sufficient.
-
-Structural binding is a very powerful pattern and is most useful in
-Aurora/Thermos for doing Structural configuration. For example, you can
-define a job profile. The following profile uses `HDFS`, the Hadoop
-Distributed File System, to designate a file's location. `HDFS` does
-not come with Aurora, so you'll need to either install it separately
-or change the way the dataset is designated.
-
-    class Profile(Struct):
-      version = Required(String)
-      environment = Required(String)
-      dataset = Default(String, 'hdfs://home/aurora/data/{{environment}}')
-
-    PRODUCTION = Profile(version = 'live', environment = 'prod')
-    DEVEL = Profile(version = 'latest',
-                    environment = 'devel',
-                    dataset = 'hdfs://home/aurora/data/test')
-    TEST = Profile(version = 'latest', environment = 'test')
-
-    JOB_TEMPLATE = Job(
-      name = 'application',
-      role = 'myteam',
-      cluster = 'cluster1',
-      environment = '{{profile.environment}}',
-      task = SequentialTask(
-        name = 'task',
-        resources = Resources(cpu = 2, ram = 4*GB, disk = 8*GB),
-        processes = [
-	  Process(name = 'main', cmdline = 'java -jar application.jar -hdfsPath {{profile.dataset}}')
-        ]
-       )
-     )
-
-    jobs = [
-      JOB_TEMPLATE(instances = 100).bind(profile = PRODUCTION),
-      JOB_TEMPLATE.bind(profile = DEVEL),
-      JOB_TEMPLATE.bind(profile = TEST),
-     ]
-
-In this case, a custom structural "Profile" is created to self-document
-the configuration to some degree. This also allows some schema
-"type-checking", and for default self-substitution, e.g. in
-`Profile.dataset` above.
-
-So rather than a `.bind()` with a half-dozen substituted variables, you
-can bind a single object that has sensible defaults stored in a single
-place.
-
-Configuration File Writing Tips And Best Practices
---------------------------------------------------
-
-### Use As Few .aurora Files As Possible
-
-When creating your `.aurora` configuration, try to keep all versions of
-a particular job within the same `.aurora` file. For example, if you
-have separate jobs for `cluster1`, `cluster1` staging, `cluster1`
-testing, and `cluster2`, keep them as close together as possible.
-
-Constructs shared across multiple jobs owned by your team (e.g.
-team-level defaults or structural templates) can be split into separate
-`.aurora` files and included via the `include` directive.
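-
-For example (the file name below is illustrative):
-
-    include('path/to/team_defaults.aurora')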
-
-### Avoid Boilerplate
-
-If you see repetition or find yourself copy and pasting any parts of
-your configuration, it's likely an opportunity for templating. Take the
-example below:
-
-`redundant.aurora` contains:
-
-    download = Process(
-      name = 'download',
-      cmdline = 'wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tar.bz2',
-      max_failures = 5,
-      min_duration = 1)
-
-    unpack = Process(
-      name = 'unpack',
-      cmdline = 'rm -rf Python-2.7.3 && tar xjf Python-2.7.3.tar.bz2',
-      max_failures = 5,
-      min_duration = 1)
-
-    build = Process(
-      name = 'build',
-      cmdline = 'pushd Python-2.7.3 && ./configure && make && popd',
-      max_failures = 1)
-
-    email = Process(
-      name = 'email',
-      cmdline = 'echo Success | mail feynman@tmc.com',
-      max_failures = 5,
-      min_duration = 1)
-
-    build_python = Task(
-      name = 'build_python',
-      processes = [download, unpack, build, email],
-      constraints = [Constraint(order = ['download', 'unpack', 'build', 'email'])])
-
-As you'll notice, there's a lot of repetition in the `Process`
-definitions. For example, almost every process sets a `max_failures`
-limit to 5 and a `min_duration` to 1. This is an opportunity for factoring
-into a common process template.
-
-Furthermore, the Python version is repeated everywhere. This can be
-bound via structural templating as described in the [Advanced Binding](#advanced-binding)
-section.
-
-`less_redundant.aurora` contains:
-
-    class Python(Struct):
-      version = Required(String)
-      base = Default(String, 'Python-{{version}}')
-      package = Default(String, '{{base}}.tar.bz2')
-
-    ReliableProcess = Process(
-      max_failures = 5,
-      min_duration = 1)
-
-    download = ReliableProcess(
-      name = 'download',
-      cmdline = 'wget http://www.python.org/ftp/python/{{python.version}}/{{python.package}}')
-
-    unpack = ReliableProcess(
-      name = 'unpack',
-      cmdline = 'rm -rf {{python.base}} && tar xjf {{python.package}}')
-
-    build = ReliableProcess(
-      name = 'build',
-      cmdline = 'pushd {{python.base}} && ./configure && make && popd',
-      max_failures = 1)
-
-    email = ReliableProcess(
-      name = 'email',
-      cmdline = 'echo Success | mail {{role}}@foocorp.com')
-
-    build_python = SequentialTask(
-      name = 'build_python',
-      processes = [download, unpack, build, email]).bind(python = Python(version = "2.7.3"))
-
-### Thermos Uses bash, But Thermos Is Not bash
-
-#### Bad
-
-Many tiny Processes make for harder-to-manage configurations.
-
-    copy = Process(
-      name = 'copy',
-      cmdline = 'rcp user@my_machine:my_application .'
-    )
-
-    unpack = Process(
-      name = 'unpack',
-      cmdline = 'unzip app.zip'
-    )
-
-    remove = Process(
-      name = 'remove',
-      cmdline = 'rm -f app.zip'
-    )
-
-    run = Process(
-      name = 'app',
-      cmdline = 'java -jar app.jar'
-    )
-
-    run_task = Task(
-      processes = [copy, unpack, remove, run],
-      constraints = order(copy, unpack, remove, run)
-    )
-
-#### Good
-
-Each `cmdline` runs in a bash subshell, so you have the full power of
-bash. Chaining commands with `&&` or `||` is almost always the right
-thing to do.
-
-Also for Tasks that are simply a list of processes that run one after
-another, consider using the `SequentialTask` helper which applies a
-linear ordering constraint for you.
-
-    stage = Process(
-      name = 'stage',
-      cmdline = 'rcp user@my_machine:my_application . && unzip app.zip && rm -f app.zip')
-
-    run = Process(name = 'app', cmdline = 'java -jar app.jar')
-
-    run_task = SequentialTask(processes = [stage, run])
-
-### Rarely Use Functions In Your Configurations
-
-90% of the time you define a function in a `.aurora` file, you're
-probably Doing It Wrong(TM).
-
-#### Bad
-
-    def get_my_task(name, user, cpu, ram, disk):
-      return Task(
-        name = name,
-        user = user,
-        processes = [STAGE_PROCESS, RUN_PROCESS],
-        constraints = order(STAGE_PROCESS, RUN_PROCESS),
-        resources = Resources(cpu = cpu, ram = ram, disk = disk)
-      )
-
-    task_one = get_my_task('task_one', 'feynman', 1.0, 32*MB, 1*GB)
-    task_two = get_my_task('task_two', 'feynman', 2.0, 64*MB, 1*GB)
-
-#### Good
-
-This one is more idiomatic. Forcing keyword arguments prevents accidents,
-e.g. constructing a task with "32*MB" when you mean 32MB of RAM and not
-disk. Less proliferation of task-construction techniques makes for an
-easier-to-read, quicker-to-understand, and more composable
-configuration.
-
-    TASK_TEMPLATE = SequentialTask(
-      user = 'wickman',
-      processes = [STAGE_PROCESS, RUN_PROCESS],
-    )
-
-    task_one = TASK_TEMPLATE(
-      name = 'task_one',
-      resources = Resources(cpu = 1.0, ram = 32*MB, disk = 1*GB)
-    )
-
-    task_two = TASK_TEMPLATE(
-      name = 'task_two',
-      resources = Resources(cpu = 2.0, ram = 64*MB, disk = 1*GB)
-    )

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/cron-jobs.md
----------------------------------------------------------------------
diff --git a/docs/cron-jobs.md b/docs/cron-jobs.md
deleted file mode 100644
index 0f98425..0000000
--- a/docs/cron-jobs.md
+++ /dev/null
@@ -1,131 +0,0 @@
-# Cron Jobs
-
-Aurora supports execution of scheduled jobs on a Mesos cluster using cron-style syntax.
-
-- [Overview](#overview)
-- [Collision Policies](#collision-policies)
-	- [KILL_EXISTING](#kill_existing)
-	- [CANCEL_NEW](#cancel_new)
-- [Failure recovery](#failure-recovery)
-- [Interacting with cron jobs via the Aurora CLI](#interacting-with-cron-jobs-via-the-aurora-cli)
-	- [cron schedule](#cron-schedule)
-	- [cron deschedule](#cron-deschedule)
-	- [cron start](#cron-start)
-	- [job killall, job restart, job kill](#job-killall-job-restart-job-kill)
-- [Technical Note About Syntax](#technical-note-about-syntax)
-- [Caveats](#caveats)
-	- [Failovers](#failovers)
-	- [Collision policy is best-effort](#collision-policy-is-best-effort)
-	- [Timezone Configuration](#timezone-configuration)
-
-## Overview
-
-A job is identified as a cron job by the presence of a
-`cron_schedule` attribute containing a cron-style schedule in the
-[`Job`](configuration-reference.md#job-objects) object. Examples of cron schedules
-include "every 5 minutes" (`*/5 * * * *`), "Fridays at 17:00" (`0 17 * * FRI`), and
-"the 1st and 15th day of the month at 03:00" (`0 3 1,15 * *`).
-
-Example (available in the [Vagrant environment](vagrant.md)):
-
-    $ cat /vagrant/examples/job/cron_hello_world.aurora
-    # cron_hello_world.aurora
-    # A cron job that runs every 5 minutes.
-    jobs = [
-      Job(
-        cluster = 'devcluster',
-        role = 'www-data',
-        environment = 'test',
-        name = 'cron_hello_world',
-        cron_schedule = '*/5 * * * *',
-        task = SimpleTask(
-          'cron_hello_world',
-          'echo "Hello world from cron, the time is now $(date --rfc-822)"'),
-      ),
-    ]
-
-## Collision Policies
-
-The `cron_collision_policy` field specifies the scheduler's behavior when a new cron job is
-triggered while an older run hasn't finished. The scheduler has two policies available,
-[KILL_EXISTING](#kill_existing) and [CANCEL_NEW](#cancel_new).
-
-### KILL_EXISTING
-
-The default policy - on a collision the old instances are killed and instances with the current
-configuration are started.
-
-### CANCEL_NEW
-
-On a collision the new run is cancelled.
-
-Note that the use of this policy is likely a code smell - interrupted cron jobs should be able
-to recover their progress on a subsequent invocation, otherwise they risk having their work queue
-grow faster than they can process it.
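-
-The policy is set via the `cron_collision_policy` field of the `Job`. For
-example (a sketch based on the cron job shown above):
-
-    jobs = [
-      Job(
-        cluster = 'devcluster',
-        role = 'www-data',
-        environment = 'test',
-        name = 'cron_hello_world',
-        cron_schedule = '*/5 * * * *',
-        cron_collision_policy = 'CANCEL_NEW',
-        task = SimpleTask(
-          'cron_hello_world',
-          'echo "Hello world from cron, the time is now $(date --rfc-822)"'),
-      ),
-    ]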
-
-## Failure recovery
-
-Unlike with services, which Aurora will always re-execute regardless of exit status, instances of
-cron jobs retry according to the `max_task_failures` attribute of the
-[Task](configuration-reference.md#task-objects) object. To get "run-until-success" semantics,
-set `max_task_failures` to `-1`.
-
-## Interacting with cron jobs via the Aurora CLI
-
-Most interaction with cron jobs takes place using the `cron` subcommand. See `aurora cron -h`
-for up-to-date usage instructions.
-
-### cron schedule
-Schedules a new cron job on the Aurora cluster for later runs or replaces the existing cron template
-with a new one. Only future runs will be affected; any existing active tasks are left intact.
-
-    $ aurora cron schedule devcluster/www-data/test/cron_hello_world /vagrant/examples/jobs/cron_hello_world.aurora
-
-### cron deschedule
-Deschedules a cron job, preventing future runs but allowing current runs to complete.
-
-    $ aurora cron deschedule devcluster/www-data/test/cron_hello_world
-
-### cron start
-Start a cron job immediately, outside of its normal cron schedule.
-
-    $ aurora cron start devcluster/www-data/test/cron_hello_world
-
-### job killall, job restart, job kill
-Cron jobs create instances running on the cluster that you can interact with like normal Aurora
-tasks with `job kill` and `job restart`.
-
-## Technical Note About Syntax
-
-`cron_schedule` uses a restricted subset of BSD crontab syntax. While the
-execution engine currently uses Quartz, the schedule parsing is custom, a subset of FreeBSD
-[crontab(5)](http://www.freebsd.org/cgi/man.cgi?crontab(5)) syntax. See
-[the source](https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/cron/CrontabEntry.java#L106-L124)
-for details.
-
-## Caveats
-
-### Failovers
-No failover recovery. Aurora does not record the latest minute it fired
-triggers for across failovers. Therefore it's possible to miss triggers
-on failover. Note that this behavior may change in the future.
-
-It's necessary to sync time between schedulers with something like `ntpd`.
-Clock skew could cause double or missed triggers in the case of a failover.
-
-### Collision policy is best-effort
-Aurora aims to always have *at least one copy* of a given instance running at a time - it's
-an AP system, meaning it chooses Availability and Partition Tolerance at the expense of
-Consistency.
-
-If your collision policy was `CANCEL_NEW` and a task has terminated but
-Aurora has not noticed this, Aurora will go ahead and create your new
-task.
-
-If your collision policy was `KILL_EXISTING` and a task was marked `LOST`
-but not yet GCed, Aurora will go ahead and create your new task without
-attempting to kill the old one (outside the GC interval).
-
-### Timezone Configuration
-The cron timezone is configured independently of the JVM timezone with the `-cron_timezone` flag and
-defaults to UTC.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/deploying-aurora-scheduler.md
----------------------------------------------------------------------
diff --git a/docs/deploying-aurora-scheduler.md b/docs/deploying-aurora-scheduler.md
deleted file mode 100644
index 03bfdba..0000000
--- a/docs/deploying-aurora-scheduler.md
+++ /dev/null
@@ -1,379 +0,0 @@
-# Deploying the Aurora Scheduler
-
-When setting up your cluster, you will install the scheduler on a small number (usually 3 or 5) of
-machines.  This guide helps you get the scheduler set up and troubleshoot some common hurdles.
-
-- [Installing Aurora](#installing-aurora)
-  - [Creating the Distribution .zip File (Optional)](#creating-the-distribution-zip-file-optional)
-  - [Installing Aurora](#installing-aurora-1)
-- [Configuring Aurora](#configuring-aurora)
-  - [A Note on Configuration](#a-note-on-configuration)
-  - [Replicated Log Configuration](#replicated-log-configuration)
-  - [Initializing the Replicated Log](#initializing-the-replicated-log)
-  - [Storage Performance Considerations](#storage-performance-considerations)
-  - [Network considerations](#network-considerations)
-  - [Considerations for running jobs in docker](#considerations-for-running-jobs-in-docker)
-  - [Security Considerations](#security-considerations)
-  - [Configuring Resource Oversubscription](#configuring-resource-oversubscription)
-  - [Process Logs](#process-logs)
-- [Running Aurora](#running-aurora)
-  - [Maintaining an Aurora Installation](#maintaining-an-aurora-installation)
-  - [Monitoring](#monitoring)
-  - [Running stateful services](#running-stateful-services)
-    - [Dedicated attribute](#dedicated-attribute)
-      - [Syntax](#syntax)
-      - [Example](#example)
-- [Best practices](#best-practices)
-  - [Diversity](#diversity)
-- [Common problems](#common-problems)
-  - [Replicated log not initialized](#replicated-log-not-initialized)
-    - [Symptoms](#symptoms)
-    - [Solution](#solution)
-  - [Scheduler not registered](#scheduler-not-registered)
-    - [Symptoms](#symptoms-1)
-    - [Solution](#solution-1)
-- [Changing Scheduler Quorum Size](#changing-scheduler-quorum-size)
-    - [Preparation](#preparation)
-    - [Adding New Schedulers](#adding-new-schedulers)
-
-## Installing Aurora
-The Aurora scheduler is a standalone Java server. As part of the build process it creates a bundle
-of all its dependencies, with the notable exceptions of the JVM and libmesos. Each target server
-should have a JVM (Java 8 or higher) and libmesos (0.25.0) installed.
-
-### Creating the Distribution .zip File (Optional)
-To create a distribution for installation you will need build tools installed. On Ubuntu this can be
-done with `sudo apt-get install build-essential default-jdk`.
-
-    git clone http://git-wip-us.apache.org/repos/asf/aurora.git
-    cd aurora
-    ./gradlew distZip
-
-Copy the generated `dist/distributions/aurora-scheduler-*.zip` to each node that will run a scheduler.
-
-### Installing Aurora
-Extract the aurora-scheduler zip file. The example configurations assume it is extracted to
-`/usr/local/aurora-scheduler`.
-
-    sudo unzip dist/distributions/aurora-scheduler-*.zip -d /usr/local
-    sudo ln -nfs "$(ls -dt /usr/local/aurora-scheduler-* | head -1)" /usr/local/aurora-scheduler
-
-## Configuring Aurora
-
-### A Note on Configuration
-Like Mesos, Aurora uses command-line flags for runtime configuration. As such, the Aurora
-"configuration file" is typically a `scheduler.sh` shell script of the following form:
-
-    #!/bin/bash
-    AURORA_HOME=/usr/local/aurora-scheduler
-
-    # Flags controlling the JVM.
-    JAVA_OPTS=(
-      -Xmx2g
-      -Xms2g
-      # GC tuning, etc.
-    )
-
-    # Flags controlling the scheduler.
-    AURORA_FLAGS=(
-      -http_port=8081
-      # Log configuration, etc.
-    )
-
-    # Environment variables controlling libmesos
-    export JAVA_HOME=...
-    export GLOG_v=1
-    export LIBPROCESS_PORT=8083
-
-    JAVA_OPTS="${JAVA_OPTS[*]}" exec "$AURORA_HOME/bin/aurora-scheduler" "${AURORA_FLAGS[@]}"
-
-That way Aurora's current flags are visible in `ps` and in the `/vars` admin endpoint.
-
-Examples are available under `examples/scheduler/`. For a list of available Aurora flags and their
-documentation, see [this document](scheduler-configuration.md).
-
-### Replicated Log Configuration
-All Aurora state is persisted to a replicated log. This includes all jobs Aurora is running
-including where in the cluster they are being run and the configuration for running them, as
-well as other information such as metadata needed to reconnect to the Mesos master, resource
-quotas, and any other locks in place.
-
-Aurora schedulers use ZooKeeper to discover log replicas and elect a leader. Only one scheduler is
-leader at a given time - the other schedulers follow log writes and prepare to take over as leader
-but do not communicate with the Mesos master. Either 3 or 5 schedulers are recommended in a
-production deployment depending on failure tolerance and they must have persistent storage.
-
-In a cluster with `N` schedulers, the flag `-native_log_quorum_size` should be set to
-`floor(N/2) + 1`. So in a cluster with 1 scheduler it should be set to `1`, in a cluster with 3 it
-should be set to `2`, and in a cluster of 5 it should be set to `3`.
-
-  Number of schedulers (N) | ```-native_log_quorum_size``` setting (```floor(N/2) + 1```)
-  ------------------------ | -------------------------------------------------------------
-  1                        | 1
-  3                        | 2
-  5                        | 3
-  7                        | 4
-
-*Incorrectly setting this flag will cause data corruption to occur!*
-
-See [this document](storage-config.md#scheduler-storage-configuration-flags) for more replicated
-log and storage configuration options.
-
-### Initializing the Replicated Log
-Before you start Aurora you will also need to initialize the log on a majority of the schedulers.
-
-    mesos-log initialize --path="/path/to/native/log"
-
-The `--path` flag should match the `--native_log_file_path` flag to the scheduler.
-Failing to do this will result in the following message when you try to start the scheduler:
-
-    Replica in EMPTY status received a broadcasted recover request
-
-### Storage Performance Considerations
-
-See [this document](scheduler-storage.md) for scheduler storage performance considerations.
-
-### Network considerations
-The Aurora scheduler listens on 2 ports - an HTTP port used for client RPCs and a web UI,
-and a libprocess (HTTP+Protobuf) port used to communicate with the Mesos master and for the log
-replication protocol. These can be left unconfigured (the scheduler publishes all selected ports
-to ZooKeeper) or explicitly set in the startup script as follows:
-
-    # ...
-    AURORA_FLAGS=(
-      # ...
-      -http_port=8081
-      # ...
-    )
-    # ...
-    export LIBPROCESS_PORT=8083
-    # ...
-
-### Considerations for running jobs in docker containers
-In order for Aurora to launch jobs using docker containers, a few extra configuration options
-must be set.  The [docker containerizer](http://mesos.apache.org/documentation/latest/docker-containerizer/)
-must be enabled on the mesos slaves by launching them with the `--containerizers=docker,mesos` option.
-
-By default, Aurora will configure Mesos to copy the file specified in `-thermos_executor_path`
-into the container's sandbox.  If using a wrapper script to launch the thermos executor,
-specify the path to the wrapper in that argument. In addition, the path to the executor pex itself
-must be included in the `-thermos_executor_resources` option. Doing so will ensure that both the
-wrapper script and executor are correctly copied into the sandbox. Finally, ensure the wrapper
-script does not access resources outside of the sandbox, as when the script is run from within a
-docker container those resources will not exist.
-
-In order to correctly execute processes inside a job, the docker container must have python 2.7
-installed.
-
-A scheduler flag, `-global_container_mounts`, allows mounting paths from the host (i.e., the slave)
-into all containers on that host. The format is a comma-separated list of `host_path:container_path[:mode]`
-tuples. For example `-global_container_mounts=/opt/secret_keys_dir:/mnt/secret_keys_dir:ro` mounts
-`/opt/secret_keys_dir` from the slaves into all launched containers. Valid modes are `ro` and `rw`.
-
-If you would like to supply your own parameters to `docker run` when launching jobs in docker
-containers, you may use the following flags:
-
-    -allow_docker_parameters
-    -default_docker_parameters
-
-`-allow_docker_parameters` controls whether or not users may pass their own configuration parameters
-through the job configuration files. If set to `false` (the default), the scheduler will reject
-jobs with custom parameters. *NOTE*: this setting should be used with caution as it allows any job
-owner to specify any parameters they wish, including those that may introduce security concerns
-(`privileged=true`, for example).
-
-`-default_docker_parameters` allows a cluster operator to specify a universal set of parameters that
-should be used for every container that does not have parameters explicitly configured at the job
-level. The argument accepts a multimap format:
-
-    -default_docker_parameters="read-only=true,tmpfs=/tmp,tmpfs=/run"
-
-### Process Logs
-
-#### Log destination
-By default, Thermos will write process stdout/stderr to log files in the sandbox. Process object configuration
-allows specifying alternate log file destinations like streamed stdout/stderr or suppression of all log output.
-Default behavior can be configured for the entire cluster with the following flag (through the `-thermos_executor_flags`
-argument to the Aurora scheduler):
-
-    --runner-logger-destination=both
-
-The `both` setting sends logs to files and also streams them to the parent stdout/stderr outputs.
-
-See [this document](configuration-reference.md#logger) for all destination options.
-
-#### Log rotation
-By default, Thermos will not rotate the stdout/stderr logs from child processes and they will grow
-without bound. An individual user may change this behavior via configuration on the Process object,
-but it may also be desirable to change the default configuration for the entire cluster.
-In order to enable rotation by default, the following flags can be applied to Thermos (through the
-`-thermos_executor_flags` argument to the Aurora scheduler):
-
-    --runner-logger-mode=rotate
-    --runner-rotate-log-size-mb=100
-    --runner-rotate-log-backups=10
-
-In the above example, each instance of the Thermos runner will rotate stderr/stdout logs once they
-reach 100 MiB in size and keep a maximum of 10 backups. If a user has provided a custom setting for
-their process, it will override these default settings.
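-
-Since these are executor flags rather than scheduler flags, they are passed to the scheduler as a
-single string. A minimal sketch (the exact quoting may vary with your startup script):
-
-    -thermos_executor_flags="--runner-logger-mode=rotate --runner-rotate-log-size-mb=100 --runner-rotate-log-backups=10"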
-
-## Running Aurora
-Configure a supervisor like [Monit](http://mmonit.com/monit/) or
-[supervisord](http://supervisord.org/) to run the created `scheduler.sh` file and restart it
-whenever it fails. Aurora expects to be restarted by an external process when it fails. Aurora
-supports an active health checking protocol on its admin HTTP interface - if a `GET /health` times
-out or returns anything other than `200 OK` the scheduler process is unhealthy and should be
-restarted.
-
-For example, monit can be configured with
-
-    if failed port 8081 send "GET /health HTTP/1.0\r\n" expect "OK\n" with timeout 2 seconds for 10 cycles then restart
-
-assuming you set `-http_port=8081`.
-
-## Security Considerations
-
-See [security.md](security.md).
-
-## Configuring Resource Oversubscription
-
-**WARNING**: This feature is currently in alpha status. Do not use it in production clusters!
-See [this document](configuration-reference.md#revocable-jobs) for more feature details.
-
-Set these scheduler flag to allow receiving revocable Mesos offers:
-
-    -receive_revocable_resources=true
-
-Specify a tier configuration file path:
-
-    -tier_config=path/to/tiers/config.json
-
-Default [tier configuration file](../src/main/resources/org/apache/aurora/scheduler/tiers.json).
-
-### Maintaining an Aurora Installation
-
-### Monitoring
-Please see our dedicated [monitoring guide](monitoring.md) for in-depth discussion on monitoring.
-
-### Running stateful services
-Aurora is best suited to run stateless applications, but it also accommodates stateful services
-like databases, or services that otherwise need to always run on the same machines.
-
-#### Dedicated attribute
-The Mesos slave has the `--attributes` command line argument which can be used to mark a slave with
-static attributes (not to be confused with `--resources`, which are dynamic and accounted).
-
-Aurora makes these attributes available for matching with scheduling
-[constraints](configuration-reference.md#specifying-scheduling-constraints).  Most of these
-constraints are arbitrary and available for custom use.  There is one exception, though: the
-`dedicated` attribute.  Aurora treats this attribute specially: only jobs that match it may run
-on those machines, and matching jobs will be scheduled only on those machines.
-
-See the [section](resources.md#resource-quota) about resource quotas to learn how quotas apply to
-dedicated jobs.
-
-##### Syntax
-The dedicated attribute has semantic meaning. The format is `$role(/.*)?`. When a job is created,
-the scheduler requires that the `$role` component matches the `role` field in the job
-configuration, and will reject the job creation otherwise.  The remainder of the attribute is
-free-form. We've developed the idiom of formatting this attribute as `$role/$job`, but do not
-enforce this. For example: a job `devcluster/www-data/prod/hello` with a dedicated constraint set as
-`www-data/web.multi` will have its tasks scheduled only on Mesos slaves configured with:
-`--attributes=dedicated:www-data/web.multi`.
-
-A wildcard (`*`) may be used for the role portion of the dedicated attribute, which will allow any
-owner to elect for a job to run on the host(s). For example: tasks from both
-`devcluster/www-data/prod/hello` and `devcluster/vagrant/test/hello` with a dedicated constraint
-formatted as `*/web.multi` will be scheduled only on Mesos slaves configured with
-`--attributes=dedicated:*/web.multi`. This may be useful when assembling a virtual cluster of
-machines sharing the same set of traits or requirements.
-
-##### Example
-Consider the following slave command line:
-
-    mesos-slave --attributes="dedicated:db_team/redis" ...
-
-And this job configuration:
-
-    Service(
-      name = 'redis',
-      role = 'db_team',
-      constraints = {
-        'dedicated': 'db_team/redis'
-      },
-      ...
-    )
-
-The job configuration is indicating that it should only be scheduled on slaves with the attribute
-`dedicated:db_team/redis`.  Additionally, Aurora will prevent any tasks that do _not_ have that
-constraint from running on those slaves.
-
-## Best practices
-### Diversity
-Data centers are often organized with hierarchical failure domains.  Common failure domains
-include hosts, racks, rows, and PDUs.  If you have this information available, it is wise to tag
-the mesos-slave with them as
-[attributes](https://mesos.apache.org/documentation/attributes-resources/).
-
-When it comes time to schedule jobs, Aurora will automatically spread them across the failure
-domains as specified in the
-[job configuration](configuration-reference.md#specifying-scheduling-constraints).
-
-Note: in virtualized environments like EC2, the only attribute that usually makes sense for this
-purpose is `host`.
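-
-For example, in a datacenter where rack information is available, a slave might be tagged with its
-rack, and a job can then cap the number of its instances per rack with a `limit` constraint (the
-attribute name and values here are illustrative):
-
-    mesos-slave --attributes="rack:r13" ...
-
-    # In the job configuration:
-    constraints = {
-      'rack': 'limit:2',
-    }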
-
-## Common problems
-So you've started your first cluster and are running into some issues? We've collected some common
-stumbling blocks and solutions here to help get you moving.
-
-### Replicated log not initialized
-
-#### Symptoms
-- Scheduler RPCs and web interface claim `Storage is not READY`
-- Scheduler log repeatedly prints messages like
-
-  ```
-  I1016 16:12:27.234133 26081 replica.cpp:638] Replica in EMPTY status
-  received a broadcasted recover request
-  I1016 16:12:27.234256 26084 recover.cpp:188] Received a recover response
-  from a replica in EMPTY status
-  ```
-
-#### Solution
-When you create a new cluster, you need to inform a quorum of schedulers that they are safe to
-consider their database to be empty by [initializing](#initializing-the-replicated-log) the
-replicated log. This is done to prevent the scheduler from modifying the cluster state in the event
-of multiple simultaneous disk failures or, more likely, misconfiguration of the replicated log path.
-
-### Scheduler not registered
-
-#### Symptoms
-Scheduler log contains
-
-    Framework has not been registered within the tolerated delay.
-
-#### Solution
-Double-check that the scheduler is configured correctly to reach the master. If you are registering
-the master in ZooKeeper, make sure the command-line argument to the master:
-
-    --zk=zk://$ZK_HOST:2181/mesos/master
-
-is the same as the one on the scheduler:
-
-    -mesos_master_address=zk://$ZK_HOST:2181/mesos/master
-
-## Changing Scheduler Quorum Size
-Special care needs to be taken when changing the size of the Aurora scheduler quorum.
-Since Aurora uses a Mesos replicated log, similar steps need to be followed as when
-[changing the mesos quorum size](http://mesos.apache.org/documentation/latest/operational-guide).
-
-### Preparation
-Increase [-native_log_quorum_size](storage-config.md#-native_log_quorum_size) on each
-existing scheduler and restart them. When updating from 3 to 5 schedulers, the quorum size
-would grow from 2 to 3.
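-
-For example, when growing from 3 to 5 schedulers, each existing scheduler would be restarted with:
-
-    -native_log_quorum_size=3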
-
-### Adding New Schedulers
-Start the new schedulers with `-native_log_quorum_size` set to the new value. Failing to
-first increase the quorum size on running schedulers can in some cases result in corruption
-or truncation of the replicated log used by Aurora. In that case, see the documentation on
-[recovering from backup](storage-config.md#recovering-from-a-scheduler-backup).

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/design-documents.md
----------------------------------------------------------------------
diff --git a/docs/design-documents.md b/docs/design-documents.md
deleted file mode 100644
index 4d14caa..0000000
--- a/docs/design-documents.md
+++ /dev/null
@@ -1,19 +0,0 @@
-# Design Documents
-
-Since its inception as an Apache project, larger feature additions to the
-Aurora code base have been discussed in the form of design documents. Design documents
-are living documents until a consensus has been reached to implement a feature
-in the proposed form.
-
-Current and past documents:
-
-* [Command Hooks for the Aurora Client](design/command-hooks.md)
-* [Health Checks for Updates](https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit)
-* [JobUpdateDiff thrift API](https://docs.google.com/document/d/1Fc_YhhV7fc4D9Xv6gJzpfooxbK4YWZcvzw6Bd3qVTL8/edit)
-* [REST API RFC](https://docs.google.com/document/d/11_lAsYIRlD5ETRzF2eSd3oa8LXAHYFD8rSetspYXaf4/edit)
-* [Revocable Mesos offers in Aurora](https://docs.google.com/document/d/1r1WCHgmPJp5wbrqSZLsgtxPNj3sULfHrSFmxp2GyPTo/edit)
-* [Supporting the Mesos Universal Containerizer](https://docs.google.com/document/d/111T09NBF2zjjl7HE95xglsDpRdKoZqhCRM5hHmOfTLA/edit?usp=sharing)
-* [Tier Management In Apache Aurora](https://docs.google.com/document/d/1erszT-HsWf1zCIfhbqHlsotHxWUvDyI2xUwNQQQxLgs/edit?usp=sharing)
-* [Ubiquitous Jobs](https://docs.google.com/document/d/12hr6GnUZU3mc7xsWRzMi3nQILGB-3vyUxvbG-6YmvdE/edit)
-
-Design documents can be found in the Aurora issue tracker via the query [`project = AURORA AND text ~ "docs.google.com" ORDER BY created`](https://issues.apache.org/jira/browse/AURORA-1528?jql=project%20%3D%20AURORA%20AND%20text%20~%20%22docs.google.com%22%20ORDER%20BY%20created).

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/design/command-hooks.md
----------------------------------------------------------------------
diff --git a/docs/design/command-hooks.md b/docs/design/command-hooks.md
deleted file mode 100644
index 3f3f70f..0000000
--- a/docs/design/command-hooks.md
+++ /dev/null
@@ -1,102 +0,0 @@
-# Command Hooks for the Aurora Client
-
-## Introduction/Motivation
-
-We've got hooks in the client that surround API calls. These are
-pretty awkward, because they don't correlate with user actions. For
-example, suppose we wanted a policy that said users weren't allowed to
-kill all instances of a production job at once.
-
-Right now, all that we could hook would be the "killJob" API call. But
-kill (at least in newer versions of the client) normally runs in
-batches. If a user called killall, what we would see on the API level
-is a series of "killJob" calls, each of which specified a batch of
-instances. We wouldn't be able to distinguish between really killing
-all instances of a job (which is forbidden under this policy), and
-carefully killing in batches (which is permitted.) In each case, the
-hook would just see a series of API calls, and couldn't find out what
-the actual command being executed was!
-
-For most policy enforcement, what we really want to be able to do is
-look at and vet the commands that a user is performing, not the API
-calls that the client uses to implement those commands.
-
-So I propose that we add a new kind of hooks, which surround noun/verb
-commands. A hook will register itself to handle a collection of (noun,
-verb) pairs. Whenever any of those noun/verb commands are invoked, the
-hook's methods will be called around the execution of the verb. A
-pre-hook will have the ability to reject a command, preventing the
-verb from being executed.
-
-## Registering Hooks
-
-These hooks will be registered via configuration plugins. A configuration plugin
-can register hooks using an API. Hooks registered this way are, effectively,
-hardwired into the client executable.
-
-The order of execution of hooks is unspecified: they may be called in
-any order. There is no way to guarantee that one hook will execute
-before some other hook.
-
-
-### Global Hooks
-
-Commands registered by the python call are called _global_ hooks,
-because they will run for all configurations, whether or not they
-specify any hooks in the configuration file.
-
-In the implementation, hooks are registered in the module
-`apache.aurora.client.cli.command_hooks`, using the class
-`GlobalCommandHookRegistry`. A global hook can be registered by calling
-`GlobalCommandHookRegistry.register_command_hook` in a configuration plugin.
-
-### The API
-
-    class CommandHook(object):
-      @property
-      def name(self):
-        """Returns a name for the hook."""
-
-      def get_nouns(self):
-        """Return the nouns that have verbs that should invoke this hook."""
-
-      def get_verbs(self, noun):
-        """Return the verbs for a particular noun that should invoke his hook."""
-
-      @abstractmethod
-      def pre_command(self, noun, verb, context, commandline):
-        """Execute a hook before invoking a verb.
-        * noun: the noun being invoked.
-        * verb: the verb being invoked.
-        * context: the context object that will be used to invoke the verb.
-          The options object will be initialized before calling the hook
-        * commandline: the original argv collection used to invoke the client.
-        Returns: True if the command should be allowed to proceed; False if the command
-        should be rejected.
-        """
-
-      def post_command(self, noun, verb, context, commandline, result):
-        """Execute a hook after invoking a verb.
-        * noun: the noun being invoked.
-        * verb: the verb being invoked.
-        * context: the context object that will be used to invoke the verb.
-          The options object will be initialized before calling the hook
-        * commandline: the original argv collection used to invoke the client.
-        * result: the result code returned by the verb.
-        Returns: nothing
-        """
-
-    class GlobalCommandHookRegistry(object):
-      @classmethod
-      def register_command_hook(cls, hook):
-        pass
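-
-As a sketch of how this API might be used, here is a hypothetical global hook that rejects any
-`job killall` against a production environment (the hook name, the job key format, and the exact
-registration call site are illustrative assumptions):
-
-    class ProdKillallHook(CommandHook):
-      @property
-      def name(self):
-        return "prod_killall"
-
-      def get_nouns(self):
-        return ["job"]
-
-      def get_verbs(self, noun):
-        return ["killall"] if noun == "job" else []
-
-      def pre_command(self, noun, verb, context, commandline):
-        # Reject the command if any argument names a /prod/ job key.
-        return not any("/prod/" in arg for arg in commandline)
-
-      def post_command(self, noun, verb, context, commandline, result):
-        pass
-
-    # Registered from a configuration plugin:
-    GlobalCommandHookRegistry.register_command_hook(ProdKillallHook())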
-
-### Skipping Hooks
-
-To skip a hook, a user uses the command-line option `--skip-hooks`. The option can specify either
-a list of specific hooks to skip, or "all":
-
-* `aurora --skip-hooks=all job create east/bozo/devel/myjob` will create a job
-  without running any hooks.
-* `aurora --skip-hooks=test,iq job create east/bozo/devel/myjob` will create a job,
-  and will skip only the hooks named "test" and "iq".

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/developing-aurora-client.md
----------------------------------------------------------------------
diff --git a/docs/developing-aurora-client.md b/docs/developing-aurora-client.md
deleted file mode 100644
index 27f1c97..0000000
--- a/docs/developing-aurora-client.md
+++ /dev/null
@@ -1,93 +0,0 @@
-Getting Started
-===============
-
-The client is written in Python, and uses the
-[Pants](http://pantsbuild.github.io/python-readme.html) build tool.
-
-Client Configuration
-====================
-
-The client uses a configuration file that specifies available clusters. More information about the
-contents of this file can be found in the
-[Client Cluster Configuration](client-cluster-configuration.md) documentation. Information about
-how the client locates this file can be found in the
-[Client Commands](client-commands.md#cluster-configuration) documentation.
-
-Building and Testing the Client
-===============================
-
-Building and testing the client code are both done using Pants. The relevant targets to know about
-are:
-
-   * Build a client executable: `./pants binary src/main/python/apache/aurora/client:aurora`
-   * Test client code: `./pants test src/test/python/apache/aurora/client/cli:all`
-
-If you want to build a source distribution of the client, you need to run `./build-support/release/make-python-sdists`.
-
-Running/Debugging the Client
-============================
-
-For manually testing client changes against a cluster, we use [Vagrant](https://www.vagrantup.com/).
-To start a virtual cluster, you need to install Vagrant, and then run `vagrant up` for the root of
-the aurora workspace. This will create a vagrant host named "devcluster", with a mesos master, a set
-of mesos slaves, and an aurora scheduler.
-
-If you have a change you would like to test in your local cluster, you'll rebuild the client:
-
-    vagrant ssh -c 'aurorabuild client'
-
-Once this completes, the `aurora` command will reflect your changes.
-
-Running/Debugging the Client in PyCharm
-=======================================
-
-It's possible to use PyCharm to run and debug both the client and client tests in an IDE. In order
-to do this, first run:
-
-    build-support/python/make-pycharm-virtualenv
-
-This script will configure a virtualenv with all of our Python requirements. Once the script
-completes it will emit instructions for configuring PyCharm:
-
-    Your PyCharm environment is now set up.  You can open the project root
-    directory with PyCharm.
-
-    Once the project is loaded:
-      - open project settings
-      - click 'Project Interpreter'
-      - click the cog in the upper-right corner
-      - click 'Add Local'
-      - select 'build-support/python/pycharm.venv/bin/python'
-      - click 'OK'
-
-### Running/Debugging Tests
-
-After following these instructions, you should now be able to run/debug tests directly from the IDE
-by right-clicking on a test (or test class) and choosing to run or debug:
-
-[![Debug Client Test](images/debug-client-test.png)](images/debug-client-test.png)
-
-If you've set a breakpoint, you can see the run will now stop and let you debug:
-
-[![Debugging Client Test](images/debugging-client-test.png)](images/debugging-client-test.png)
-
-### Running/Debugging the Client
-
-Actually running and debugging the client is unfortunately a bit more complex. You'll need to create
-a Run configuration:
-
-* Go to Run → Edit Configurations
-* Click the + icon to add a new configuration.
-* Choose python and name the configuration 'client'.
-* Set the script path to `/your/path/to/aurora/src/main/python/apache/aurora/client/cli/client.py`
-* Set the script parameters to the command you want to run (e.g. `job status <job key>`)
-* Expand the Environment section and click the ellipsis to add a new environment variable
-* Click the + at the bottom to add a new variable named AURORA_CONFIG_ROOT whose value is the
-  path where your cluster configuration can be found. For example, to talk to the scheduler
-  running in the vagrant image, it would be set to `/your/path/to/aurora/examples/vagrant` (this
-  is the directory where our example clusters.json is found).
-* You should now be able to run and debug this configuration!
-
-Making thrift schema changes
-============================
-See [this document](thrift-deprecation.md) for any thrift related changes.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/developing-aurora-scheduler.md
----------------------------------------------------------------------
diff --git a/docs/developing-aurora-scheduler.md b/docs/developing-aurora-scheduler.md
deleted file mode 100644
index a703871..0000000
--- a/docs/developing-aurora-scheduler.md
+++ /dev/null
@@ -1,163 +0,0 @@
-Java code in the aurora repo is built with [Gradle](http://gradle.org).
-
-
-Prerequisite
-============
-
-When using Apache Aurora checked out from the source repository or the binary
-distribution, the Gradle wrapper and JavaScript dependencies are provided.
-However, you need to manually install them when using the source release
-downloads:
-
-1. Install Gradle following the instructions on the [Gradle web site](http://gradle.org)
-2. From the root directory of the Apache Aurora project generate the gradle
-wrapper by running:
-
-    gradle wrapper
-
-
-Getting Started
-===============
-
-You will need Java 8 installed and on your `PATH` or unzipped somewhere with `JAVA_HOME` set. Then
-
-    ./gradlew tasks
-
-will bootstrap the build system and show available tasks. This can take a while the first time you
-run it but subsequent runs will be much faster due to cached artifacts.
-
-Running the Tests
------------------
-Aurora has a comprehensive unit test suite. To run the tests use
-
-    ./gradlew build
-
-Gradle will only re-run tests when dependencies of them have changed. To force a re-run of all
-tests use
-
-    ./gradlew clean build
-
-Running the build with code quality checks
-------------------------------------------
-To speed up development iteration, the plain gradle commands will not run static analysis tools.
-However, you should run these checks before posting a review diff, and **always** run them before
-pushing a commit to origin/master.
-
-    ./gradlew build -Pq
-
-Running integration tests
--------------------------
-To run the same tests that are run in the Apache Aurora continuous integration
-environment:
-
-    ./build-support/jenkins/build.sh
-
-
-In addition, there is an end-to-end test that runs a suite of aurora commands
-using a virtual cluster:
-
-    ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
-
-
-
-Creating a bundle for deployment
---------------------------------
-Gradle can create a zip file containing Aurora, all of its dependencies, and a launch script with
-
-    ./gradlew distZip
-
-or a tar file containing the same files with
-
-    ./gradlew distTar
-
-The output file will be written to `dist/distributions/aurora-scheduler.zip` or
-`dist/distributions/aurora-scheduler.tar`.
-
-Developing Aurora Java code
-===========================
-
-Setting up an IDE
------------------
-Gradle can generate project files for your IDE. To generate an IntelliJ IDEA project run
-
-    ./gradlew idea
-
-and import the generated `aurora.ipr` file.
-
-Adding or Upgrading a Dependency
---------------------------------
-New dependencies can be added from Maven central by adding a `compile` dependency to `build.gradle`.
-For example, to add a dependency on `com.example`'s `example-lib` 1.0 add this block:
-
-    compile 'com.example:example-lib:1.0'
-
-NOTE: Anyone thinking about adding a new dependency should first familiarize themselves with the
-Apache Foundation's third-party licensing
-[policy](http://www.apache.org/legal/resolved.html#category-x).
-
-Developing Aurora UI
-======================
-
-Installing bower (optional)
-----------------------------
-Third party JS libraries used in Aurora (located at 3rdparty/javascript/bower_components) are
-managed by bower, a JS dependency manager. Bower is only required if you plan to add, remove or
-update JS libraries. Bower can be installed using the following command:
-
-    npm install -g bower
-
-Bower depends on node.js and npm. The easiest way to install node on a Mac is via Homebrew:
-
-    brew install node
-
-For more node.js installation options refer to https://github.com/joyent/node/wiki/Installation.
-
-More info on installing and using bower can be found at: http://bower.io/. Once installed, you can
-use the following commands to view and modify the bower repo at
-3rdparty/javascript/bower_components
-
-    bower list
-    bower install <library name>
-    bower remove <library name>
-    bower update <library name>
-    bower help
-
-Faster Iteration in Vagrant
----------------------------
-The scheduler serves UI assets from the classpath. For production deployments this means the assets
-are served from within a jar. However, for faster development iteration, the vagrant image is
-configured to add the `scheduler` subtree of `/vagrant/dist/resources/main` to the head of
-`CLASSPATH`. This path is configured as a shared filesystem to the path on the host system where
-your Aurora repository lives. This means that any updates under `dist/resources/main/scheduler` in
-your checkout will be reflected immediately in the UI served from within the vagrant image.
-
-The one caveat is that this path is under `dist`, not `src`. This is because the assets must
-be processed by gradle before they can be served. So, unfortunately, you cannot just save your local
-changes and see them reflected in the UI, you must first run `./gradlew processResources`. This is
-less than ideal, but better than having to restart the scheduler after every change. Additionally,
-gradle makes this process somewhat easier with the use of the `--continuous` flag. If you run:
-`./gradlew processResources --continuous` gradle will monitor the filesystem for changes and run the
-task automatically as necessary. This doesn't quite provide hot-reload capabilities, but it does
-allow for <5s from save to changes being visible in the UI with no further action required on the
-part of the developer.
-
-Developing the Aurora Build System
-==================================
-
-Bootstrapping Gradle
---------------------
-The following files were autogenerated by `gradle wrapper` using gradle 1.8's
-[Wrapper](http://www.gradle.org/docs/1.8/dsl/org.gradle.api.tasks.wrapper.Wrapper.html) plugin and
-should not be modified directly:
-
-    ./gradlew
-    ./gradlew.bat
-    ./gradle/wrapper/gradle-wrapper.jar
-    ./gradle/wrapper/gradle-wrapper.properties
-
-To upgrade Gradle, unpack the new version somewhere, run `/path/to/new/gradle wrapper` in the
-repository root, and commit the changed files.
-
-Making thrift schema changes
-============================
-See [this document](thrift-deprecation.md) for any thrift related changes.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/development/client.md
----------------------------------------------------------------------
diff --git a/docs/development/client.md b/docs/development/client.md
new file mode 100644
index 0000000..a5fee37
--- /dev/null
+++ b/docs/development/client.md
@@ -0,0 +1,81 @@
+Developing the Aurora Client
+============================
+
+The client is written in Python, and uses the
+[Pants](http://pantsbuild.github.io/python-readme.html) build tool.
+
+
+Building and Testing
+--------------------
+
+Building and testing the client code are both done using Pants. The relevant targets to know about
+are:
+
+   * Build a client executable: `./pants binary src/main/python/apache/aurora/client:aurora`
+   * Test client code: `./pants test src/test/python/apache/aurora/client/cli:all`
+
+If you want to build a source distribution of the client, you need to run `./build-support/release/make-python-sdists`.
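+
+Assuming the default Pants output layout, the resulting client executable can then be run directly,
+for example:
+
+    ./dist/aurora.pex --help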
+
+
+Running/Debugging
+------------------
+
+For manually testing client changes against a cluster, we use [Vagrant](https://www.vagrantup.com/).
+To start a virtual cluster, you need to install Vagrant, and then run `vagrant up` for the root of
+the aurora workspace. This will create a vagrant host named "devcluster", with a mesos master, a set
+of mesos slaves, and an aurora scheduler.
+
+If you have a change you would like to test in your local cluster, you'll rebuild the client:
+
+    vagrant ssh -c 'aurorabuild client'
+
+Once this completes, the `aurora` command will reflect your changes.
+
+
+Running/Debugging in PyCharm
+-----------------------------
+
+It's possible to use PyCharm to run and debug both the client and client tests in an IDE. In order
+to do this, first run:
+
+    build-support/python/make-pycharm-virtualenv
+
+This script will configure a virtualenv with all of our Python requirements. Once the script
+completes it will emit instructions for configuring PyCharm:
+
+    Your PyCharm environment is now set up.  You can open the project root
+    directory with PyCharm.
+
+    Once the project is loaded:
+      - open project settings
+      - click 'Project Interpreter'
+      - click the cog in the upper-right corner
+      - click 'Add Local'
+      - select 'build-support/python/pycharm.venv/bin/python'
+      - click 'OK'
+
+### Running/Debugging Tests
+After following these instructions, you should now be able to run/debug tests directly from the IDE
+by right-clicking on a test (or test class) and choosing to run or debug:
+
+[![Debug Client Test](../images/debug-client-test.png)](../images/debug-client-test.png)
+
+If you've set a breakpoint, you can see the run will now stop and let you debug:
+
+[![Debugging Client Test](../images/debugging-client-test.png)](../images/debugging-client-test.png)
+
+### Running/Debugging the Client
+Actually running and debugging the client is unfortunately a bit more complex. You'll need to create
+a Run configuration:
+
+* Go to Run → Edit Configurations
+* Click the + icon to add a new configuration.
+* Choose python and name the configuration 'client'.
+* Set the script path to `/your/path/to/aurora/src/main/python/apache/aurora/client/cli/client.py`
+* Set the script parameters to the command you want to run (e.g. `job status <job key>`)
+* Expand the Environment section and click the ellipsis to add a new environment variable
+* Click the + at the bottom to add a new variable named AURORA_CONFIG_ROOT whose value is the
+  path where your cluster configuration can be found. For example, to talk to the scheduler
+  running in the vagrant image, it would be set to `/your/path/to/aurora/examples/vagrant` (this
+  is the directory where our example clusters.json is found).
+* You should now be able to run and debug this configuration!

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/development/committers-guide.md
----------------------------------------------------------------------
diff --git a/docs/development/committers-guide.md b/docs/development/committers-guide.md
new file mode 100644
index 0000000..70f67a6
--- /dev/null
+++ b/docs/development/committers-guide.md
@@ -0,0 +1,86 @@
+Committer's Guide
+=================
+
+Information for official Apache Aurora committers.
+
+Setting up your email account
+-----------------------------
+Once your Apache ID has been set up, you can configure your account, add ssh keys, and set up an
+email forwarding address at
+
+  http://id.apache.org
+
+Additional instructions for setting up your new committer email can be found at
+
+  http://www.apache.org/dev/user-email.html
+
+The recommended setup is to configure all services (mailing lists, JIRA, ReviewBoard) to send
+emails to your @apache.org email address.
+
+
+Creating a gpg key for releases
+-------------------------------
+In order to create a release candidate you will need a gpg key published to an external key server
+and that key will need to be added to our KEYS file as well.
+
+1. Create a key:
+
+               gpg --gen-key
+
+2. Add your gpg key to the Apache Aurora KEYS file:
+
+               git clone https://git-wip-us.apache.org/repos/asf/aurora.git
+               (gpg --list-sigs <KEY ID> && gpg --armor --export <KEY ID>) >> KEYS
+               git add KEYS && git commit -m "Adding gpg key for <APACHE ID>"
+               ./rbt post -o -g
+
+3. Publish the key to an external key server:
+
+               gpg --keyserver pgp.mit.edu --send-keys <KEY ID>
+
+4. Publish the updated KEYS file to the Apache Aurora svn dist locations listed below:
+
+               https://dist.apache.org/repos/dist/dev/aurora/KEYS
+               https://dist.apache.org/repos/dist/release/aurora/KEYS
+
+5. Add your key to git config for use with the release scripts:
+
+               git config --global user.signingkey <KEY ID>
+
+
+Creating a release
+------------------
+The following will guide you through the steps to create a release candidate, vote, and finally an
+official Apache Aurora release. Before starting, your gpg key should be in the KEYS file and you
+must have access to commit to the dist.a.o repositories.
+
+1. Ensure that all issues resolved for this release candidate are tagged with the correct Fix
+Version in Jira, the changelog script will use this to generate the CHANGELOG in step #2.
+
+2. Create a release candidate. This will automatically update the CHANGELOG and commit it, create a
+branch and update the current version within the trunk. To create a minor version update and publish
+it run
+
+               ./build-support/release/release-candidate -l m -p
+
+3. Update, if necessary, the draft email created from the `release-candidate` script in step #2 and
+send the [VOTE] email to the dev@ mailing list. You can verify the release signature and checksums
+by running
+
+               ./build-support/release/verify-release-candidate
+
+4. Wait for the vote to complete. If the vote fails, close it by replying to the initial [VOTE]
+email sent in step #3, editing the subject to [RESULT][VOTE] ... and noting the failure reason
+(example [here](http://markmail.org/message/d4d6xtvj7vgwi76f)). Then address any issues, go back to
+step #1, and run again; this time use the -r flag to increment the release candidate
+version. This will automatically clean up the release candidate rc0 branch and source distribution.
+
+               ./build-support/release/release-candidate -l m -r 1 -p
+
+5. Once the vote has successfully passed create the release
+
+               ./build-support/release/release
+
+6. Update the draft email created from the `release` script in step #5 to include the Apache IDs of
+all binding votes and send the [RESULT][VOTE] email to the dev@ mailing list.
+

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/development/design-documents.md
----------------------------------------------------------------------
diff --git a/docs/development/design-documents.md b/docs/development/design-documents.md
new file mode 100644
index 0000000..b01cfd7
--- /dev/null
+++ b/docs/development/design-documents.md
@@ -0,0 +1,20 @@
+Design Documents
+================
+
+Since its inception as an Apache project, larger feature additions to the
+Aurora code base have been discussed in the form of design documents. Design documents
+are living documents until a consensus has been reached to implement a feature
+in the proposed form.
+
+Current and past documents:
+
+* [Command Hooks for the Aurora Client](design/command-hooks.md)
+* [Health Checks for Updates](https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit)
+* [JobUpdateDiff thrift API](https://docs.google.com/document/d/1Fc_YhhV7fc4D9Xv6gJzpfooxbK4YWZcvzw6Bd3qVTL8/edit)
+* [REST API RFC](https://docs.google.com/document/d/11_lAsYIRlD5ETRzF2eSd3oa8LXAHYFD8rSetspYXaf4/edit)
+* [Revocable Mesos offers in Aurora](https://docs.google.com/document/d/1r1WCHgmPJp5wbrqSZLsgtxPNj3sULfHrSFmxp2GyPTo/edit)
+* [Supporting the Mesos Universal Containerizer](https://docs.google.com/document/d/111T09NBF2zjjl7HE95xglsDpRdKoZqhCRM5hHmOfTLA/edit?usp=sharing)
+* [Tier Management In Apache Aurora](https://docs.google.com/document/d/1erszT-HsWf1zCIfhbqHlsotHxWUvDyI2xUwNQQQxLgs/edit?usp=sharing)
+* [Ubiquitous Jobs](https://docs.google.com/document/d/12hr6GnUZU3mc7xsWRzMi3nQILGB-3vyUxvbG-6YmvdE/edit)
+
+Design documents can be found in the Aurora issue tracker via the query [`project = AURORA AND text ~ "docs.google.com" ORDER BY created`](https://issues.apache.org/jira/browse/AURORA-1528?jql=project%20%3D%20AURORA%20AND%20text%20~%20%22docs.google.com%22%20ORDER%20BY%20created).


[2/7] aurora git commit: Reorganize Documentation

Posted by se...@apache.org.
http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/reference/configuration.md
----------------------------------------------------------------------
diff --git a/docs/reference/configuration.md b/docs/reference/configuration.md
new file mode 100644
index 0000000..fa09fd4
--- /dev/null
+++ b/docs/reference/configuration.md
@@ -0,0 +1,573 @@
+Aurora Configuration Reference
+==============================
+
+Don't know where to start? The Aurora configuration schema is very
+powerful, and configurations can become quite complex for advanced use
+cases.
+
+For examples of simple configurations to get something up and running
+quickly, check out the [Tutorial](../getting-started/tutorial.md). When you feel comfortable with the basics, move
+on to the [Configuration Tutorial](configuration-tutorial.md) for more in-depth coverage of
+configuration design.
+
+- [Process Schema](#process-schema)
+    - [Process Objects](#process-objects)
+- [Task Schema](#task-schema)
+    - [Task Object](#task-object)
+    - [Constraint Object](#constraint-object)
+    - [Resource Object](#resource-object)
+- [Job Schema](#job-schema)
+    - [Job Objects](#job-objects)
+    - [UpdateConfig Objects](#updateconfig-objects)
+    - [HealthCheckConfig Objects](#healthcheckconfig-objects)
+    - [Announcer Objects](#announcer-objects)
+    - [Container Objects](#container)
+    - [LifecycleConfig Objects](#lifecycleconfig-objects)
+- [Specifying Scheduling Constraints](#specifying-scheduling-constraints)
+- [Template Namespaces](#template-namespaces)
+    - [mesos Namespace](#mesos-namespace)
+    - [thermos Namespace](#thermos-namespace)
+
+
+Process Schema
+==============
+
+Process objects consist of required `name` and `cmdline` attributes. You can customize Process
+behavior with its optional attributes. Remember, Processes are handled by Thermos.
+
+### Process Objects
+
+  **Attribute Name**  | **Type**    | **Description**
+  ------------------- | :---------: | ---------------------------------
+   **name**           | String      | Process name (Required)
+   **cmdline**        | String      | Command line (Required)
+   **max_failures**   | Integer     | Maximum process failures (Default: 1)
+   **daemon**         | Boolean     | When True, this is a daemon process. (Default: False)
+   **ephemeral**      | Boolean     | When True, this is an ephemeral process. (Default: False)
+   **min_duration**   | Integer     | Minimum duration between process restarts in seconds. (Default: 15)
+   **final**          | Boolean     | When True, this process is a finalizing one that should run last. (Default: False)
+   **logger**         | Logger      | Struct defining the log behavior for the process. (Default: Empty)
+
+#### name
+
+The name is any valid UNIX filename string (specifically no
+slashes, NULLs or leading periods). Within a Task object, each Process name
+must be unique.
+
+#### cmdline
+
+The command line run by the process. The command line is invoked in a bash
+subshell, so it can involve full-blown bash scripts. However, nothing is
+supplied for command-line arguments, so `$*` is unspecified.
+
+#### max_failures
+
+The maximum number of failures (non-zero exit statuses) this process can
+have before being marked permanently failed and not retried. If a
+process permanently fails, Thermos looks at the failure limit of the task
+containing the process (usually 1) to determine if the task has
+failed as well.
+
+Setting `max_failures` to 0 makes the process retry
+indefinitely until it achieves a successful (zero) exit status.
+It retries at most once every `min_duration` seconds to prevent
+an effective denial of service attack on the coordinating Thermos scheduler.
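+
+For example, a minimal sketch of a process that retries until it succeeds, waiting at least 30
+seconds between attempts (the name and command are illustrative):
+
+        fetch = Process(
+          name = 'fetch_data',
+          cmdline = './import_data.sh',
+          max_failures = 0,
+          min_duration = 30)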
+
+#### daemon
+
+By default, Thermos processes are non-daemon. If `daemon` is set to True, a
+successful (zero) exit status does not prevent future process runs.
+Instead, the process reinvokes after `min_duration` seconds.
+However, the maximum failure limit still applies. A combination of
+`daemon=True` and `max_failures=0` causes a process to retry
+indefinitely regardless of exit status. This should be avoided
+for very short-lived processes because of the accumulation of
+checkpointed state for each process run. When running in Mesos
+specifically, `max_failures` is capped at 100.
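+
+For example, a sketch of a daemon process that is re-run every five minutes for as long as the
+task remains active (the command is illustrative):
+
+        heartbeat = Process(
+          name = 'heartbeat',
+          cmdline = 'curl -fsS https://monitor.example.com/ping',
+          daemon = True,
+          min_duration = 300)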
+
+#### ephemeral
+
+By default, Thermos processes are non-ephemeral. If `ephemeral` is set to
+True, the process' status is not used to determine if its containing task
+has completed. For example, consider a task with a non-ephemeral
+webserver process and an ephemeral logsaver process
+that periodically checkpoints its log files to a centralized data store.
+The task is considered finished once the webserver process has
+completed, regardless of the logsaver's current status.
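+
+A sketch of that pattern (the process names and commands are illustrative):
+
+        webserver = Process(name = 'webserver', cmdline = './run_server.sh')
+        logsaver = Process(
+          name = 'logsaver',
+          cmdline = './checkpoint_logs.sh',
+          ephemeral = True,
+          daemon = True)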
+
+#### min_duration
+
+Processes may succeed or fail multiple times during a single task's
+duration. Each of these is called a *process run*. `min_duration` is
+the minimum number of seconds the scheduler waits before running the
+same process.
+
+#### final
+
+Processes can be grouped into two classes: ordinary processes and
+finalizing processes. By default, Thermos processes are ordinary. They
+run as long as the task is considered healthy (i.e., no failure
+limits have been reached.) But once all regular Thermos processes
+finish or the task reaches a certain failure threshold, it
+moves into a "finalization" stage and runs all finalizing
+processes. These are typically processes necessary for cleaning up the
+task, such as log checkpointers, or perhaps e-mail notifications that
+the task completed.
+
+Finalizing processes may not depend upon ordinary processes or
+vice-versa; however, finalizing processes may depend upon other
+finalizing processes and otherwise run as typical processes do.
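+
+A sketch of a finalizing process that uploads logs once the task is finished (the command is
+illustrative):
+
+        log_uploader = Process(
+          name = 'log_uploader',
+          cmdline = './upload_logs.sh',
+          final = True)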
+
+#### logger
+
+The default behavior of Thermos is to store stderr/stdout logs in files that grow without bound.
+If you have a large log volume, you may want to configure Thermos to automatically rotate logs
+once they grow to a certain size, which can prevent your job from using more than its allocated
+disk space.
+
+A Logger union consists of a destination enum, a mode enum, and a rotation policy.
+Use `destination` to set where the process logs should be sent; the default
+option is `file`. It is also possible to specify `console` to send logs
+to stdout/stderr, `none` to suppress all log output, or `both` to send logs to files and to
+console output. When `none` or `console` is used, the rotation attributes are ignored.
+Rotation policies only apply to loggers whose mode is `rotate`. The acceptable values
+for the LoggerMode enum are `standard` and `rotate`. The rotation policy applies to both
+stderr and stdout.
+
+By default, all processes use the `standard` LoggerMode.
+
+  **Attribute Name**  | **Type**          | **Description**
+  ------------------- | :---------------: | ---------------------------------
+   **destination**    | LoggerDestination | Destination of logs. (Default: `file`)
+   **mode**           | LoggerMode        | Mode of the logger. (Default: `standard`)
+   **rotate**         | RotatePolicy      | An optional rotation policy.
+
+A RotatePolicy describes log rotation behavior for when `mode` is set to `rotate`. It is ignored
+otherwise.
+
+  **Attribute Name**  | **Type**     | **Description**
+  ------------------- | :----------: | ---------------------------------
+   **log_size**       | Integer      | Maximum size (in bytes) of an individual log file. (Default: 100 MiB)
+   **backups**        | Integer      | The maximum number of backups to retain. (Default: 5)
+
+An example process configuration is as follows:
+
+        process = Process(
+          name='process',
+          logger=Logger(
+            destination=LoggerDestination('both'),
+            mode=LoggerMode('rotate'),
+            rotate=RotatePolicy(log_size=5*MB, backups=5)
+          )
+        )
+
+Task Schema
+===========
+
+Tasks fundamentally consist of a `name` and a list of Process objects stored as the
+value of the `processes` attribute. Processes can be further constrained with
+`constraints`. By default, `name`'s value inherits from the first Process in the
+`processes` list, so for simple `Task` objects with one Process, `name`
+can be omitted. In Mesos, `resources` is also required.
+
+### Task Object
+
+   **param**               | **type**                         | **description**
+   ---------               | :---------:                      | ---------------
+   ```name```              | String                           | Task name (Default: ```processes0.name```)
+   ```processes```         | List of ```Process``` objects    | List of ```Process``` objects bound to this task. (Required)
+   ```constraints```       | List of ```Constraint``` objects | List of ```Constraint``` objects constraining processes.
+   ```resources```         | ```Resource``` object            | Resource footprint. (Required)
+   ```max_failures```      | Integer                          | Maximum process failures before being considered failed (Default: 1)
+   ```max_concurrency```   | Integer                          | Maximum number of concurrent processes (Default: 0, unlimited concurrency.)
+   ```finalization_wait``` | Integer                          | Amount of time allocated for finalizing processes, in seconds. (Default: 30)
+
+#### name
+`name` is a string denoting the name of this task. It defaults to the name of the first Process in
+the list of Processes associated with the `processes` attribute.
+
+#### processes
+
+`processes` is an unordered list of `Process` objects. To constrain the order
+in which they run, use `constraints`.
+
+##### constraints
+
+A list of `Constraint` objects. Currently it supports only one type,
+the `order` constraint. `order` is a list of process names
+that should run in the order given. For example,
+
+        process = Process(cmdline = "echo hello {{name}}")
+        task = Task(name = "echoes",
+                    processes = [process(name = "jim"), process(name = "bob")],
+                    constraints = [Constraint(order = ["jim", "bob"])])
+
+Constraints can be supplied ad-hoc and in duplicate. Not all
+Processes need be constrained, however Tasks with cycles are
+rejected by the Thermos scheduler.
+
+Use the `order` function as shorthand to generate `Constraint` lists.
+The following:
+
+        order(process1, process2)
+
+is shorthand for
+
+        [Constraint(order = [process1.name(), process2.name()])]
+
+The `order` function accepts Process name strings `('foo', 'bar')` or the processes
+themselves, e.g. `foo=Process(name='foo', ...)`, `bar=Process(name='bar', ...)`,
+`constraints=order(foo, bar)`.
+
+#### resources
+
+Takes a `Resource` object, which specifies the amounts of CPU, memory, and disk space resources
+to allocate to the Task.
+
+#### max_failures
+
+`max_failures` is the number of failed processes needed for the `Task` to be
+marked as failed.
+
+For example, assume a Task has two Processes and a `max_failures` value of `2`:
+
+        template = Process(max_failures=10)
+        task = Task(
+          name = "fail",
+          processes = [
+             template(name = "failing", cmdline = "exit 1"),
+             template(name = "succeeding", cmdline = "exit 0")
+          ],
+          max_failures=2)
+
+The `failing` Process could fail 10 times before being marked as permanently
+failed, and the `succeeding` Process could succeed on the first run. The task would still
+succeed despite allowing for only two failed processes: there would be 10 failed process *runs*
+but only 1 permanently failed *process*, and both processes would have to fail permanently
+for the Task to fail.
+
+#### max_concurrency
+
+For Tasks with a number of expensive but otherwise independent
+processes, you may want to limit the amount of concurrency
+the Thermos scheduler provides rather than artificially constraining
+it via `order` constraints. For example, a test framework may
+generate a task with 100 test run processes, but wants to run it on
+a machine with only 4 cores. You can limit the amount of parallelism to
+4 by setting `max_concurrency=4` in your task configuration.
+
+For example, the following task spawns 180 Processes ("mappers")
+to compute individual elements of a 180 degree sine table, all dependent
+upon one final Process ("reducer") to tabulate the results:
+
+    def make_mapper(id):
+      return Process(
+        name = "mapper%03d" % id,
+        cmdline = "echo 'scale=50;s(%d*4*a(1)/180)' | bc -l > temp.sine_table.%03d" % (id, id))
+
+    def make_reducer():
+      return Process(
+        name = "reducer",
+        cmdline = "cat temp.* | nl > sine_table.txt && rm -f temp.*")
+
+    processes = map(make_mapper, range(180))
+
+    task = Task(
+      name = "mapreduce",
+      processes = processes + [make_reducer()],
+      constraints = [Constraint(order = [mapper.name(), 'reducer']) for mapper
+                     in processes],
+      max_concurrency = 8)
+
+#### finalization_wait
+
+Process execution is organized into three active stages: `ACTIVE`,
+`CLEANING`, and `FINALIZING`. The `ACTIVE` stage is when ordinary processes run.
+This stage lasts as long as Processes are running and the Task is healthy.
+The moment either all Processes have finished successfully or the Task has reached a
+maximum Process failure limit, it goes into the `CLEANING` stage and sends
+SIGTERMs to all currently running Processes and their process trees.
+Once all Processes have terminated, the Task goes into the `FINALIZING` stage
+and invokes the schedule of all Processes with the "final" attribute set to True.
+
+This whole process, from the end of the `ACTIVE` stage to the end of `FINALIZING`,
+must happen within `finalization_wait` seconds. If it does not
+finish during that time, all remaining Processes are sent SIGKILLs
+(or, if they depend upon uncompleted Processes, are
+never invoked).
+
+When running on Aurora, the `finalization_wait` is capped at 60 seconds.
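+
+As a sketch (the command lines are hypothetical), a cleanup Process with the `final`
+attribute set runs during the `FINALIZING` stage; it, together with the SIGTERM
+handling of the ordinary Processes, must fit within the `finalization_wait` budget:
+
+    # Placeholder commands for illustration only.
+    cleanup = Process(name = 'cleanup', cmdline = 'rm -rf /tmp/scratch', final = True)
+    task = Task(
+      name = 'with_cleanup',
+      processes = [Process(name = 'work', cmdline = './run.sh'), cleanup],
+      finalization_wait = 120)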
+
+### Constraint Object
+
+Constraint objects currently support only a single ordering constraint, `order`,
+which specifies that its processes run sequentially in the order given. By
+default, all processes run in parallel when bound to a `Task` without
+ordering constraints.
+
+   param | type           | description
+   ----- | :----:         | -----------
+   order | List of String | List of processes by name (String) that should be run serially.
+
+### Resource Object
+
+Specifies the amount of CPU, RAM, and disk resources the task needs. See the
+[Resource Isolation document](../features/resource-isolation.md) for suggested values and to understand how
+resources are allocated.
+
+  param      | type    | description
+  -----      | :----:  | -----------
+  ```cpu```  | Float   | Fractional number of cores required by the task.
+  ```ram```  | Integer | Bytes of RAM required by the task.
+  ```disk``` | Integer | Bytes of disk required by the task.
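+
+As a minimal sketch, assuming the `Resources(...)` constructor and the `MB`/`GB`
+unit helpers shown in the Aurora tutorial (the amounts below are examples only):
+
+    # Request 1 core, 1 GB of RAM and 4 GB of disk for each task instance.
+    task = Task(
+      name = 'sized',
+      processes = [Process(name = 'main', cmdline = './run.sh')],   # placeholder process
+      resources = Resources(cpu = 1.0, ram = 1*GB, disk = 4*GB))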
+
+
+Job Schema
+==========
+
+### Job Objects
+
+   name | type | description
+   ------ | :-------: | -------
+  ```task``` | Task | The Task object to bind to this job. Required.
+  ```name``` | String | Job name. (Default: inherited from the task attribute's name)
+  ```role``` | String | Job role account. Required.
+  ```cluster``` | String | Cluster in which this job is scheduled. Required.
+   ```environment``` | String | Job environment, default ```devel```. Must be one of ```prod```, ```devel```, ```test``` or ```staging<number>```.
+  ```contact``` | String | Best email address to reach the owner of the job. For production jobs, this is usually a team mailing list.
+  ```instances```| Integer | Number of instances (sometimes referred to as replicas or shards) of the task to create. (Default: 1)
+   ```cron_schedule``` | String | Cron schedule in cron format. May only be used with non-service jobs. See [Cron Jobs](cron-jobs.md) for more information. Default: None (not a cron job.)
+  ```cron_collision_policy``` | String | Policy to use when a cron job is triggered while a previous run is still active. ```KILL_EXISTING```: kill the previous run and schedule the new run. ```CANCEL_NEW```: let the previous run continue and cancel the new run. (Default: KILL_EXISTING)
+  ```update_config``` | ```UpdateConfig``` object | Parameters for controlling the rate and policy of rolling updates.
+  ```constraints``` | dict | Scheduling constraints for the tasks. See the section on the [constraint specification language](#specifying-scheduling-constraints).
+  ```service``` | Boolean | If True, restart tasks regardless of success or failure. (Default: False)
+  ```max_task_failures``` | Integer | Maximum number of failures after which the task is considered to have failed. (Default: 1) Set to -1 to allow for infinite failures.
+  ```priority``` | Integer | Preemption priority to give the task (Default 0). Tasks with higher priorities may preempt tasks at lower priorities.
+  ```production``` | Boolean |  Whether or not this is a production task that may [preempt](resources.md#task-preemption) other tasks (Default: False). Production job role must have the appropriate [quota](resources.md#resource-quota).
+  ```health_check_config``` | ```HealthCheckConfig``` object | Parameters for controlling a task's health checks. HTTP health check is only used if a  health port was assigned with a command line wildcard.
+  ```container``` | ```Container``` object | An optional container to run all processes inside of.
+  ```lifecycle``` | ```LifecycleConfig``` object | An optional task lifecycle configuration that dictates commands to be executed on startup/teardown.  HTTP lifecycle is enabled by default if the "health" port is requested.  See [LifecycleConfig Objects](#lifecycleconfig-objects) for more information.
+  ```tier``` | String | Task tier type. When set to `revocable` requires the task to run with Mesos revocable resources. This is work [in progress](https://issues.apache.org/jira/browse/AURORA-1343) and is currently only supported for the revocable tasks. The ultimate goal is to simplify task configuration by hiding various configuration knobs behind a task tier definition. See AURORA-1343 and AURORA-1443 for more details.
+
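+
+A minimal sketch of a `Job`, assuming a `task` defined as in the previous sections
+(the cluster, role and contact values are placeholders):
+
+    jobs = [Job(
+      cluster = 'example-cluster',        # placeholder cluster name
+      role = 'www-data',                  # placeholder role
+      environment = 'prod',
+      name = 'hello_world',
+      contact = 'team@example.com',       # placeholder contact address
+      instances = 2,
+      service = True,
+      task = task)]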
+
+### UpdateConfig Objects
+
+Parameters for controlling the rate and policy of rolling updates.
+
+| object                       | type     | description
+| ---------------------------- | :------: | ------------
+| ```batch_size```             | Integer  | Maximum number of shards to be updated in one iteration (Default: 1)
+| ```watch_secs```             | Integer  | Minimum number of seconds a shard must remain in ```RUNNING``` state before considered a success (Default: 45)
+| ```max_per_shard_failures``` | Integer  | Maximum number of restarts per shard during update. Increments total failure count when this limit is exceeded. (Default: 0)
+| ```max_total_failures```     | Integer  | Maximum number of shard failures to be tolerated in total during an update. Cannot be greater than or equal to the total number of tasks in a job. (Default: 0)
+| ```rollback_on_failure```    | boolean  | When False, prevents auto rollback of a failed update (Default: True)
+| ```wait_for_batch_completion```| boolean | When True, all threads from a given batch will be blocked from picking up new instances until the entire batch is updated. This essentially simulates the legacy sequential updater algorithm. (Default: False)
+| ```pulse_interval_secs```    | Integer  |  Indicates a [coordinated update](client-commands.md#user-content-coordinated-job-updates). If no pulses are received within the provided interval the update will be blocked. Beta-updater only. Will fail on submission when used with client updater. (Default: None)
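+
+For example, a sketch of an `UpdateConfig` that would be bound to a Job through its
+`update_config` field (the values are illustrative only):
+
+    update_config = UpdateConfig(
+      batch_size = 5,       # update 5 shards per iteration
+      watch_secs = 60,      # a shard must stay RUNNING for 60s to count as updated
+      max_per_shard_failures = 2,
+      max_total_failures = 0)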
+
+### HealthCheckConfig Objects
+
+*Note: ```endpoint```, ```expected_response``` and ```expected_response_code``` are deprecated from ```HealthCheckConfig``` and must be defined in ```HttpHealthChecker```.*
+
+Parameters for controlling a task's health checks via HTTP or a shell command.
+
+| param                          | type      | description
+| -------                        | :-------: | --------
+| ```health_checker```           | HealthCheckerConfig | Configure what kind of health check to use.
+| ```initial_interval_secs```    | Integer   | Initial delay for performing a health check. (Default: 15)
+| ```interval_secs```            | Integer   | Interval on which to check the task's health. (Default: 10)
+| ```max_consecutive_failures``` | Integer   | Maximum number of consecutive failures that will be tolerated before considering a task unhealthy (Default: 0)
+| ```timeout_secs```             | Integer   | Health check timeout. (Default: 1)
+
+### HealthCheckerConfig Objects
+| param                          | type                | description
+| -------                        | :-------:           | --------
+| ```http```                     | HttpHealthChecker  | Configure health check to use HTTP. (Default)
+| ```shell```                    | ShellHealthChecker | Configure health check via a shell command.
+
+### HttpHealthChecker Objects
+| param                          | type      | description
+| -------                        | :-------: | --------
+| ```endpoint```                 | String    | HTTP endpoint to check (Default: /health)
+| ```expected_response```        | String    | If not empty, fail the HTTP health check if the response differs. Case insensitive. (Default: ok)
+| ```expected_response_code```   | Integer   | If not zero, fail the HTTP health check if the response code differs. (Default: 0)
+
+### ShellHealthChecker Objects
+| param                          | type      | description
+| -------                        | :-------: | --------
+| ```shell_command```            | String    | An alternative to HTTP health checking. Specifies a shell command that will be executed. Any non-zero exit status will be interpreted as a health check failure.
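+
+A sketch of a shell-based health check wired together per the tables above (the
+command and thresholds are examples only):
+
+    health_check_config = HealthCheckConfig(
+      health_checker = HealthCheckerConfig(
+        # './check_service.sh' is a placeholder; any non-zero exit is a failed check.
+        shell = ShellHealthChecker(shell_command = './check_service.sh')),
+      initial_interval_secs = 30,
+      interval_secs = 15,
+      max_consecutive_failures = 3,
+      timeout_secs = 5)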
+
+
+### Announcer Objects
+
+If the `announce` field in the Job configuration is set, each task will be
+registered in the ServerSet `/aurora/role/environment/jobname` in the
+zookeeper ensemble configured by the executor (which can optionally be overridden by specifying
+the `zk_path` parameter).  If no Announcer object is specified,
+no announcement will take place.  For more information about ServerSets, see the [Service Discovery](../features/service-discovery.md)
+documentation.
+
+By default, the hostname in the registered endpoints will be the `--hostname` parameter
+that is passed to the mesos slave. To override the hostname value, the executor can be started
+with `--announcer-hostname=<overridden_value>`. If you decide to use `--announcer-hostname` and the
+overridden value needs to change for every executor, then the executor has to be started inside a wrapper; see [Executor Wrapper](../operations/configuration.md#thermos-executor-wrapper).
+
+For example, if you want the hostname in the endpoint to be an IP address instead of the hostname,
+the `--hostname` parameter to the mesos slave can be set to the machine IP or the executor can
+be started with `--announcer-hostname=<host_ip>` while wrapping the executor inside a script.
+
+| object                         | type      | description
+| -------                        | :-------: | --------
+| ```primary_port```             | String    | Which named port to register as the primary endpoint in the ServerSet (Default: `http`)
+| ```portmap```                  | dict      | A mapping of additional endpoints to be announced in the ServerSet (Default: `{ 'aurora': '{{primary_port}}' }`)
+| ```zk_path```                  | String    | Zookeeper serverset path override (executor must be started with the `--announcer-allow-custom-serverset-path` parameter)
+
+#### Port aliasing with the Announcer `portmap`
+
+The primary endpoint registered in the ServerSet is the one allocated to the port
+specified by the `primary_port` in the `Announcer` object, by default
+the `http` port.  This port can be referenced from anywhere within a configuration
+as `{{thermos.ports[http]}}`.
+
+Without the port map, each named port would be allocated a unique port number.
+The `portmap` allows two different named ports to be aliased together.  The default
+`portmap` aliases the `aurora` port (i.e. `{{thermos.ports[aurora]}}`) to
+the `http` port.  Even though the two ports can be referenced independently,
+only one port is allocated by Mesos.  Any port referenced in a `Process` object
+but which is not in the portmap will be allocated dynamically by Mesos and announced as well.
+
+It is possible to use the portmap to alias names to static port numbers, e.g.
+`{'http': 80, 'https': 443, 'aurora': 'http'}`.  In this case, referencing
+`{{thermos.ports[aurora]}}` would look up `{{thermos.ports[http]}}` then
+find a static port 80.  No port would be requested of or allocated by Mesos.
+
+Static ports should be used cautiously as Aurora does nothing to prevent two
+tasks with the same static port allocations from being co-scheduled.
+External constraints such as slave attributes should be used to enforce such
+guarantees should they be needed.
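+
+For example, a sketch of an `Announcer` that keeps `http` as the primary endpoint and
+adds a static `https` port alongside the default `aurora` alias (the static port choice
+is illustrative only):
+
+    announce = Announcer(
+      primary_port = 'http',
+      # 'aurora' aliases the dynamically allocated http port; 'https' is a static port.
+      portmap = {'aurora': 'http', 'https': 443})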
+
+### Container Objects
+
+*Note: The only container type currently supported is "docker".  Docker support is currently EXPERIMENTAL.*
+*Note: In order to correctly execute processes inside a job, the Docker container must have Python 2.7 installed.*
+
+*Note: For a private docker registry, Mesos mandates that the docker credential file be named `.dockercfg`, even though docker may create a credential file with a different name on various platforms. Also, the `.dockercfg` file needs to be copied into the sandbox using the `-thermos_executor_resources` flag, specified while starting Aurora.*
+
+Describes the container the job's processes will run inside.
+
+  param          | type           | description
+  -----          | :----:         | -----------
+  ```docker```   | Docker         | A docker container to use.
+
+### Docker Object
+
+  param            | type            | description
+  -----            | :----:          | -----------
+  ```image```      | String          | The name of the docker image to execute.  If the image does not exist locally it will be pulled with ```docker pull```.
+  ```parameters``` | List(Parameter) | Additional parameters to pass to the docker containerizer.
+
+### Docker Parameter Object
+
+Docker CLI parameters. This needs to be enabled by the scheduler `allow_docker_parameters` option.
+See [Docker Command Line Reference](https://docs.docker.com/reference/commandline/run/) for valid parameters.
+
+  param            | type            | description
+  -----            | :----:          | -----------
+  ```name```       | String          | The name of the docker parameter. E.g. volume
+  ```value```      | String          | The value of the parameter. E.g. /usr/local/bin:/usr/bin:rw
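+
+A sketch of a Docker container specification, assuming the parameter struct is exposed
+as `Parameter` (the image name and volume mapping are placeholders):
+
+    container = Container(
+      docker = Docker(
+        image = 'example-registry/my-service:latest',   # placeholder image
+        parameters = [Parameter(name = 'volume', value = '/mnt/config:/config:ro')]))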
+
+### LifecycleConfig Objects
+
+*Note: The only lifecycle configuration supported is the HTTP lifecycle via the HttpLifecycleConfig.*
+
+  param          | type                | description
+  -----          | :----:              | -----------
+  ```http```     | HttpLifecycleConfig | Configure the lifecycle manager to send lifecycle commands to the task via HTTP.
+
+### HttpLifecycleConfig Objects
+
+  param          | type            | description
+  -----          | :----:          | -----------
+  ```port```     | String          | The named port to send POST commands (Default: health)
+  ```graceful_shutdown_endpoint``` | String | Endpoint to hit to indicate that a task should gracefully shutdown. (Default: /quitquitquit)
+  ```shutdown_endpoint``` | String | Endpoint to hit to give a task its final warning before being killed. (Default: /abortabortabort)
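+
+As a sketch, the following restates the defaults explicitly; in practice you would
+only set the fields you need to change:
+
+    lifecycle = LifecycleConfig(
+      http = HttpLifecycleConfig(
+        port = 'health',
+        graceful_shutdown_endpoint = '/quitquitquit',
+        shutdown_endpoint = '/abortabortabort'))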
+
+#### graceful_shutdown_endpoint
+
+If the Job is listening on the port as specified by the HttpLifecycleConfig
+(default: `health`), a HTTP POST request will be sent over localhost to this
+endpoint to request that the task gracefully shut itself down.  This is a
+courtesy call before the `shutdown_endpoint` is invoked a fixed amount of
+time later.
+
+#### shutdown_endpoint
+
+If the Job is listening on the port as specified by the HttpLifecycleConfig
+(default: `health`), a HTTP POST request will be sent over localhost to this
+endpoint to give the task a final warning before it is shut down.  If the task
+does not shut down on its own after this, it will be forcefully killed.
+
+
+Specifying Scheduling Constraints
+=================================
+
+In the `Job` object there is a map `constraints` from String to String
+allowing the user to tailor the schedulability of tasks within the job.
+
+The constraint map's key is the attribute name on which we
+constrain Tasks within our Job, and the value describes how we
+constrain them. There are two types of constraints: *limit constraints*
+and *value constraints*.
+
+| constraint    | description
+| ------------- | --------------
+| Limit         | A string that specifies a limit for a constraint. Starts with <code>'limit:</code> followed by an Integer and closing single quote, such as ```'limit:1'```.
+| Value         | A string that specifies a value for a constraint. To include a list of values, separate the values using commas. To negate the values of a constraint, start with a ```!```.
+
+Further details can be found in the [Scheduling Constraints](../features/constraints) feature
+description.
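+
+For example, a sketch of the map passed to a Job's `constraints` field (the attribute
+names `host` and `rack` are examples of slave attributes your cluster may define):
+
+    constraints = {
+      'host': 'limit:1',   # limit constraint: at most one instance per host
+      'rack': '!rack-a',   # value constraint, negated: avoid hosts with rack=rack-a
+    }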
+
+
+Template Namespaces
+===================
+
+Currently, a few Pystachio namespaces have special semantics. Using them
+in your configuration allows you to tailor application behavior
+through environment introspection or to interact in special ways with the
+Aurora client or Aurora-provided services.
+
+### mesos Namespace
+
+The `mesos` namespace contains variables which relate to the `mesos` slave
+which launched the task. The `instance` variable can be used
+to distinguish between Task replicas.
+
+| variable name     | type       | description
+| --------------- | :--------: | -------------
+| ```instance```    | Integer    | The instance number of the created task. A job with 5 replicas has instance numbers 0, 1, 2, 3, and 4.
+| ```hostname``` | String | The instance hostname that the task was launched on.
+
+Please note, there is no uniqueness guarantee for `instance` in the presence of
+network partitions. If that is required, it should be baked in at the application
+level using a distributed coordination service such as Zookeeper.
+
+### thermos Namespace
+
+The `thermos` namespace contains variables that work directly on the
+Thermos platform in addition to Aurora. This namespace is fully
+compatible with Tasks invoked via the `thermos` CLI.
+
+| variable      | type                     | description                        |
+| :----------:  | ---------                | ------------                       |
+| ```ports```   | map of string to Integer | A map of names to port numbers     |
+| ```task_id``` | string                   | The task ID assigned to this task. |
+
+The `thermos.ports` namespace is automatically populated by Aurora when
+invoking tasks on Mesos. When running the `thermos` command directly,
+these ports must be explicitly mapped with the `-P` option.
+
+For example, if `{{thermos.ports[http]}}` is specified in a `Process`
+configuration, it is automatically extracted and populated by
+Aurora, but must be specified with, for example, `thermos -P http:12345`
+to map `http` to port 12345 when running via the CLI.
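+
+For example, a sketch of a `Process` that uses both namespaces (the server binary and
+its flags are hypothetical):
+
+    web = Process(
+      name = 'web',
+      # './server' is a placeholder binary; the port and shard id come from Aurora.
+      cmdline = './server --port={{thermos.ports[http]}} --shard={{mesos.instance}}')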
+

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/reference/scheduler-configuration.md
----------------------------------------------------------------------
diff --git a/docs/reference/scheduler-configuration.md b/docs/reference/scheduler-configuration.md
new file mode 100644
index 0000000..0b1e3c7
--- /dev/null
+++ b/docs/reference/scheduler-configuration.md
@@ -0,0 +1,318 @@
+# Scheduler Configuration
+
+The Aurora scheduler can take a variety of configuration options through command-line arguments.
+A list of the available options can be seen by running `aurora-scheduler -help`.
+
+Please refer to the [Operator Configuration Guide](../operations/configuration.md) for details on how
+to properly set the most important options.
+
+```
+$ aurora-scheduler -help
+-------------------------------------------------------------------------
+-h or -help to print this help message
+
+Required flags:
+-backup_dir [not null]
+    Directory to store backups under. Will be created if it does not exist.
+    (org.apache.aurora.scheduler.storage.backup.BackupModule.backup_dir)
+-cluster_name [not null]
+    Name to identify the cluster being served.
+    (org.apache.aurora.scheduler.app.SchedulerMain.cluster_name)
+-framework_authentication_file
+    Properties file which contains framework credentials to authenticate with Mesosmaster. Must contain the properties 'aurora_authentication_principal' and 'aurora_authentication_secret'.
+    (org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.framework_authentication_file)
+-mesos_master_address [not null]
+    Address for the mesos master, can be a socket address or zookeeper path.
+    (org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.mesos_master_address)
+-mesos_role
+    The Mesos role this framework will register as. The default is to left this empty, and the framework will register without any role and only receive unreserved resources in offer.
+    (org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.mesos_role)
+-serverset_path [not null, must be non-empty]
+    ZooKeeper ServerSet path to register at.
+    (org.apache.aurora.scheduler.app.SchedulerMain.serverset_path)
+-shiro_after_auth_filter
+    Fully qualified class name of the servlet filter to be applied after the shiro auth filters are applied.
+    (org.apache.aurora.scheduler.http.api.security.HttpSecurityModule.shiro_after_auth_filter)
+-thermos_executor_path
+    Path to the thermos executor entry point.
+    (org.apache.aurora.scheduler.configuration.executor.ExecutorModule.thermos_executor_path)
+-tier_config [file must be readable]
+    Configuration file defining supported task tiers, task traits and behaviors.
+    (org.apache.aurora.scheduler.SchedulerModule.tier_config)
+-zk_digest_credentials
+    user:password to use when authenticating with ZooKeeper.
+    (org.apache.aurora.scheduler.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_digest_credentials)
+-zk_endpoints [must have at least 1 item]
+    Endpoint specification for the ZooKeeper servers.
+    (org.apache.aurora.scheduler.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_endpoints)
+
+Optional flags:
+-allow_docker_parameters=false
+    Allow to pass docker container parameters in the job.
+    (org.apache.aurora.scheduler.app.AppModule.allow_docker_parameters)
+-allowed_container_types=[MESOS]
+    Container types that are allowed to be used by jobs.
+    (org.apache.aurora.scheduler.app.AppModule.allowed_container_types)
+-async_slot_stat_update_interval=(1, mins)
+    Interval on which to try to update open slot stats.
+    (org.apache.aurora.scheduler.stats.AsyncStatsModule.async_slot_stat_update_interval)
+-async_task_stat_update_interval=(1, hrs)
+    Interval on which to try to update resource consumption stats.
+    (org.apache.aurora.scheduler.stats.AsyncStatsModule.async_task_stat_update_interval)
+-async_worker_threads=8
+    The number of worker threads to process async task operations with.
+    (org.apache.aurora.scheduler.async.AsyncModule.async_worker_threads)
+-backup_interval=(1, hrs)
+    Minimum interval on which to write a storage backup.
+    (org.apache.aurora.scheduler.storage.backup.BackupModule.backup_interval)
+-cron_scheduler_num_threads=100
+    Number of threads to use for the cron scheduler thread pool.
+    (org.apache.aurora.scheduler.cron.quartz.CronModule.cron_scheduler_num_threads)
+-cron_start_initial_backoff=(1, secs)
+    Initial backoff delay while waiting for a previous cron run to be killed.
+    (org.apache.aurora.scheduler.cron.quartz.CronModule.cron_start_initial_backoff)
+-cron_start_max_backoff=(1, mins)
+    Max backoff delay while waiting for a previous cron run to be killed.
+    (org.apache.aurora.scheduler.cron.quartz.CronModule.cron_start_max_backoff)
+-cron_timezone=GMT
+    TimeZone to use for cron predictions.
+    (org.apache.aurora.scheduler.cron.quartz.CronModule.cron_timezone)
+-custom_executor_config [file must exist, file must be readable]
+    Path to custom executor settings configuration file.
+    (org.apache.aurora.scheduler.configuration.executor.ExecutorModule.custom_executor_config)
+-db_lock_timeout=(1, mins)
+    H2 table lock timeout
+    (org.apache.aurora.scheduler.storage.db.DbModule.db_lock_timeout)
+-db_row_gc_interval=(2, hrs)
+    Interval on which to scan the database for unused row references.
+    (org.apache.aurora.scheduler.storage.db.DbModule.db_row_gc_interval)
+-default_docker_parameters={}
+    Default docker parameters for any job that does not explicitly declare parameters.
+    (org.apache.aurora.scheduler.app.AppModule.default_docker_parameters)
+-dlog_max_entry_size=(512, KB)
+    Specifies the maximum entry size to append to the log. Larger entries will be split across entry Frames.
+    (org.apache.aurora.scheduler.storage.log.LogStorageModule.dlog_max_entry_size)
+-dlog_shutdown_grace_period=(2, secs)
+    Specifies the maximum time to wait for scheduled checkpoint and snapshot actions to complete before forcibly shutting down.
+    (org.apache.aurora.scheduler.storage.log.LogStorageModule.dlog_shutdown_grace_period)
+-dlog_snapshot_interval=(1, hrs)
+    Specifies the frequency at which snapshots of local storage are taken and written to the log.
+    (org.apache.aurora.scheduler.storage.log.LogStorageModule.dlog_snapshot_interval)
+-enable_cors_for
+    List of domains for which CORS support should be enabled.
+    (org.apache.aurora.scheduler.http.api.ApiModule.enable_cors_for)
+-enable_h2_console=false
+    Enable H2 DB management console.
+    (org.apache.aurora.scheduler.http.H2ConsoleModule.enable_h2_console)
+-enable_preemptor=true
+    Enable the preemptor and preemption
+    (org.apache.aurora.scheduler.preemptor.PreemptorModule.enable_preemptor)
+-executor_user=root
+    User to start the executor. Defaults to "root". Set this to an unprivileged user if the mesos master was started with "--no-root_submissions". If set to anything other than "root", the executor will ignore the "role" setting for jobs since it can't use setuid() anymore. This means that all your jobs will run under the specified user and the user has to exist on the mesos slaves.
+    (org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.executor_user)
+-first_schedule_delay=(1, ms)
+    Initial amount of time to wait before first attempting to schedule a PENDING task.
+    (org.apache.aurora.scheduler.scheduling.SchedulingModule.first_schedule_delay)
+-flapping_task_threshold=(5, mins)
+    A task that repeatedly runs for less than this time is considered to be flapping.
+    (org.apache.aurora.scheduler.scheduling.SchedulingModule.flapping_task_threshold)
+-framework_announce_principal=false
+    When 'framework_authentication_file' flag is set, the FrameworkInfo registered with the mesos master will also contain the principal. This is necessary if you intend to use mesos authorization via mesos ACLs. The default will change in a future release.
+    (org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.framework_announce_principal)
+-framework_failover_timeout=(21, days)
+    Time after which a framework is considered deleted.  SHOULD BE VERY HIGH.
+    (org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.framework_failover_timeout)
+-global_container_mounts=[]
+    A comma seperated list of mount points (in host:container form) to mount into all (non-mesos) containers.
+    (org.apache.aurora.scheduler.configuration.executor.ExecutorModule.global_container_mounts)
+-history_max_per_job_threshold=100
+    Maximum number of terminated tasks to retain in a job history.
+    (org.apache.aurora.scheduler.pruning.PruningModule.history_max_per_job_threshold)
+-history_min_retention_threshold=(1, hrs)
+    Minimum guaranteed time for task history retention before any pruning is attempted.
+    (org.apache.aurora.scheduler.pruning.PruningModule.history_min_retention_threshold)
+-history_prune_threshold=(2, days)
+    Time after which the scheduler will prune terminated task history.
+    (org.apache.aurora.scheduler.pruning.PruningModule.history_prune_threshold)
+-hostname
+    The hostname to advertise in ZooKeeper instead of the locally-resolved hostname.
+    (org.apache.aurora.scheduler.http.JettyServerModule.hostname)
+-http_authentication_mechanism=NONE
+    HTTP Authentication mechanism to use.
+    (org.apache.aurora.scheduler.http.api.security.HttpSecurityModule.http_authentication_mechanism)
+-http_port=0
+    The port to start an HTTP server on.  Default value will choose a random port.
+    (org.apache.aurora.scheduler.http.JettyServerModule.http_port)
+-initial_flapping_task_delay=(30, secs)
+    Initial amount of time to wait before attempting to schedule a flapping task.
+    (org.apache.aurora.scheduler.scheduling.SchedulingModule.initial_flapping_task_delay)
+-initial_schedule_penalty=(1, secs)
+    Initial amount of time to wait before attempting to schedule a task that has failed to schedule.
+    (org.apache.aurora.scheduler.scheduling.SchedulingModule.initial_schedule_penalty)
+-initial_task_kill_retry_interval=(5, secs)
+    When killing a task, retry after this delay if mesos has not responded, backing off up to transient_task_state_timeout
+    (org.apache.aurora.scheduler.reconciliation.ReconciliationModule.initial_task_kill_retry_interval)
+-job_update_history_per_job_threshold=10
+    Maximum number of completed job updates to retain in a job update history.
+    (org.apache.aurora.scheduler.pruning.PruningModule.job_update_history_per_job_threshold)
+-job_update_history_pruning_interval=(15, mins)
+    Job update history pruning interval.
+    (org.apache.aurora.scheduler.pruning.PruningModule.job_update_history_pruning_interval)
+-job_update_history_pruning_threshold=(30, days)
+    Time after which the scheduler will prune completed job update history.
+    (org.apache.aurora.scheduler.pruning.PruningModule.job_update_history_pruning_threshold)
+-kerberos_debug=false
+    Produce additional Kerberos debugging output.
+    (org.apache.aurora.scheduler.http.api.security.Kerberos5ShiroRealmModule.kerberos_debug)
+-kerberos_server_keytab
+    Path to the server keytab.
+    (org.apache.aurora.scheduler.http.api.security.Kerberos5ShiroRealmModule.kerberos_server_keytab)
+-kerberos_server_principal
+    Kerberos server principal to use, usually of the form HTTP/aurora.example.com@EXAMPLE.COM
+    (org.apache.aurora.scheduler.http.api.security.Kerberos5ShiroRealmModule.kerberos_server_principal)
+-max_flapping_task_delay=(5, mins)
+    Maximum delay between attempts to schedule a flapping task.
+    (org.apache.aurora.scheduler.scheduling.SchedulingModule.max_flapping_task_delay)
+-max_leading_duration=(1, days)
+    After leading for this duration, the scheduler should commit suicide.
+    (org.apache.aurora.scheduler.SchedulerModule.max_leading_duration)
+-max_registration_delay=(1, mins)
+    Max allowable delay to allow the driver to register before aborting
+    (org.apache.aurora.scheduler.SchedulerModule.max_registration_delay)
+-max_reschedule_task_delay_on_startup=(30, secs)
+    Upper bound of random delay for pending task rescheduling on scheduler startup.
+    (org.apache.aurora.scheduler.scheduling.SchedulingModule.max_reschedule_task_delay_on_startup)
+-max_saved_backups=48
+    Maximum number of backups to retain before deleting the oldest backups.
+    (org.apache.aurora.scheduler.storage.backup.BackupModule.max_saved_backups)
+-max_schedule_attempts_per_sec=40.0
+    Maximum number of scheduling attempts to make per second.
+    (org.apache.aurora.scheduler.scheduling.SchedulingModule.max_schedule_attempts_per_sec)
+-max_schedule_penalty=(1, mins)
+    Maximum delay between attempts to schedule a PENDING tasks.
+    (org.apache.aurora.scheduler.scheduling.SchedulingModule.max_schedule_penalty)
+-max_status_update_batch_size=1000 [must be > 0]
+    The maximum number of status updates that can be processed in a batch.
+    (org.apache.aurora.scheduler.SchedulerModule.max_status_update_batch_size)
+-max_tasks_per_job=4000 [must be > 0]
+    Maximum number of allowed tasks in a single job.
+    (org.apache.aurora.scheduler.app.AppModule.max_tasks_per_job)
+-max_update_instance_failures=20000 [must be > 0]
+    Upper limit on the number of failures allowed during a job update. This helps cap potentially unbounded entries into storage.
+    (org.apache.aurora.scheduler.app.AppModule.max_update_instance_failures)
+-min_offer_hold_time=(5, mins)
+    Minimum amount of time to hold a resource offer before declining.
+    (org.apache.aurora.scheduler.offers.OffersModule.min_offer_hold_time)
+-native_log_election_retries=20
+    The maximum number of attempts to obtain a new log writer.
+    (org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_election_retries)
+-native_log_election_timeout=(15, secs)
+    The timeout for a single attempt to obtain a new log writer.
+    (org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_election_timeout)
+-native_log_file_path
+    Path to a file to store the native log data in.  If the parent directory doesnot exist it will be created.
+    (org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_file_path)
+-native_log_quorum_size=1
+    The size of the quorum required for all log mutations.
+    (org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_quorum_size)
+-native_log_read_timeout=(5, secs)
+    The timeout for doing log reads.
+    (org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_read_timeout)
+-native_log_write_timeout=(3, secs)
+    The timeout for doing log appends and truncations.
+    (org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_write_timeout)
+-native_log_zk_group_path
+    A zookeeper node for use by the native log to track the master coordinator.
+    (org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_zk_group_path)
+-offer_hold_jitter_window=(1, mins)
+    Maximum amount of random jitter to add to the offer hold time window.
+    (org.apache.aurora.scheduler.offers.OffersModule.offer_hold_jitter_window)
+-offer_reservation_duration=(3, mins)
+    Time to reserve a slave's offers while trying to satisfy a task preempting another.
+    (org.apache.aurora.scheduler.scheduling.SchedulingModule.offer_reservation_duration)
+-preemption_delay=(3, mins)
+    Time interval after which a pending task becomes eligible to preempt other tasks
+    (org.apache.aurora.scheduler.preemptor.PreemptorModule.preemption_delay)
+-preemption_slot_hold_time=(5, mins)
+    Time to hold a preemption slot found before it is discarded.
+    (org.apache.aurora.scheduler.preemptor.PreemptorModule.preemption_slot_hold_time)
+-preemption_slot_search_interval=(1, mins)
+    Time interval between pending task preemption slot searches.
+    (org.apache.aurora.scheduler.preemptor.PreemptorModule.preemption_slot_search_interval)
+-receive_revocable_resources=false
+    Allows receiving revocable resource offers from Mesos.
+    (org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.receive_revocable_resources)
+-reconciliation_explicit_interval=(60, mins)
+    Interval on which scheduler will ask Mesos for status updates of all non-terminal tasks known to scheduler.
+    (org.apache.aurora.scheduler.reconciliation.ReconciliationModule.reconciliation_explicit_interval)
+-reconciliation_implicit_interval=(60, mins)
+    Interval on which scheduler will ask Mesos for status updates of all non-terminal tasks known to Mesos.
+    (org.apache.aurora.scheduler.reconciliation.ReconciliationModule.reconciliation_implicit_interval)
+-reconciliation_initial_delay=(1, mins)
+    Initial amount of time to delay task reconciliation after scheduler start up.
+    (org.apache.aurora.scheduler.reconciliation.ReconciliationModule.reconciliation_initial_delay)
+-reconciliation_schedule_spread=(30, mins)
+    Difference between explicit and implicit reconciliation intervals intended to create a non-overlapping task reconciliation schedule.
+    (org.apache.aurora.scheduler.reconciliation.ReconciliationModule.reconciliation_schedule_spread)
+-shiro_ini_path
+    Path to shiro.ini for authentication and authorization configuration.
+    (org.apache.aurora.scheduler.http.api.security.IniShiroRealmModule.shiro_ini_path)
+-shiro_realm_modules=[org.apache.aurora.scheduler.app.MoreModules$1@30c15d8b]
+    Guice modules for configuring Shiro Realms.
+    (org.apache.aurora.scheduler.http.api.security.HttpSecurityModule.shiro_realm_modules)
+-sla_non_prod_metrics=[]
+    Metric categories collected for non production tasks.
+    (org.apache.aurora.scheduler.sla.SlaModule.sla_non_prod_metrics)
+-sla_prod_metrics=[JOB_UPTIMES, PLATFORM_UPTIME, MEDIANS]
+    Metric categories collected for production tasks.
+    (org.apache.aurora.scheduler.sla.SlaModule.sla_prod_metrics)
+-sla_stat_refresh_interval=(1, mins)
+    The SLA stat refresh interval.
+    (org.apache.aurora.scheduler.sla.SlaModule.sla_stat_refresh_interval)
+-slow_query_log_threshold=(25, ms)
+    Log all queries that take at least this long to execute.
+    (org.apache.aurora.scheduler.storage.mem.InMemStoresModule.slow_query_log_threshold)
+-slow_query_log_threshold=(25, ms)
+    Log all queries that take at least this long to execute.
+    (org.apache.aurora.scheduler.storage.db.DbModule.slow_query_log_threshold)
+-stat_retention_period=(1, hrs)
+    Time for a stat to be retained in memory before expiring.
+    (org.apache.aurora.scheduler.stats.StatsModule.stat_retention_period)
+-stat_sampling_interval=(1, secs)
+    Statistic value sampling interval.
+    (org.apache.aurora.scheduler.stats.StatsModule.stat_sampling_interval)
+-thermos_executor_cpu=0.25
+    The number of CPU cores to allocate for each instance of the executor.
+    (org.apache.aurora.scheduler.configuration.executor.ExecutorModule.thermos_executor_cpu)
+-thermos_executor_flags
+    Extra arguments to be passed to the thermos executor
+    (org.apache.aurora.scheduler.configuration.executor.ExecutorModule.thermos_executor_flags)
+-thermos_executor_ram=(128, MB)
+    The amount of RAM to allocate for each instance of the executor.
+    (org.apache.aurora.scheduler.configuration.executor.ExecutorModule.thermos_executor_ram)
+-thermos_executor_resources=[]
+    A comma seperated list of additional resources to copy into the sandbox.Note: if thermos_executor_path is not the thermos_executor.pex file itself, this must include it.
+    (org.apache.aurora.scheduler.configuration.executor.ExecutorModule.thermos_executor_resources)
+-thermos_observer_root=/var/run/thermos
+    Path to the thermos observer root (by default /var/run/thermos.)
+    (org.apache.aurora.scheduler.configuration.executor.ExecutorModule.thermos_observer_root)
+-transient_task_state_timeout=(5, mins)
+    The amount of time after which to treat a task stuck in a transient state as LOST.
+    (org.apache.aurora.scheduler.reconciliation.ReconciliationModule.transient_task_state_timeout)
+-use_beta_db_task_store=false
+    Whether to use the experimental database-backed task store.
+    (org.apache.aurora.scheduler.storage.db.DbModule.use_beta_db_task_store)
+-viz_job_url_prefix=
+    URL prefix for job container stats.
+    (org.apache.aurora.scheduler.app.SchedulerMain.viz_job_url_prefix)
+-zk_chroot_path
+    chroot path to use for the ZooKeeper connections
+    (org.apache.aurora.scheduler.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_chroot_path)
+-zk_in_proc=false
+    Launches an embedded zookeeper server for local testing causing -zk_endpoints to be ignored if specified.
+    (org.apache.aurora.scheduler.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_in_proc)
+-zk_session_timeout=(4, secs)
+    The ZooKeeper session timeout.
+    (org.apache.aurora.scheduler.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_session_timeout)
+-------------------------------------------------------------------------
+```

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/reference/task-lifecycle.md
----------------------------------------------------------------------
diff --git a/docs/reference/task-lifecycle.md b/docs/reference/task-lifecycle.md
new file mode 100644
index 0000000..1477364
--- /dev/null
+++ b/docs/reference/task-lifecycle.md
@@ -0,0 +1,146 @@
+# Task Lifecycle
+
+When Aurora reads a configuration file and finds a `Job` definition, it:
+
+1.  Evaluates the `Job` definition.
+2.  Splits the `Job` into its constituent `Task`s.
+3.  Sends those `Task`s to the scheduler.
+4.  The scheduler puts the `Task`s into `PENDING` state, starting each
+    `Task`'s life cycle.
+
+
+![Life of a task](../images/lifeofatask.png)
+
+Please note, a couple of task states described below are missing from
+this state diagram.
+
+
+## PENDING to RUNNING states
+
+When a `Task` is in the `PENDING` state, the scheduler constantly
+searches for machines satisfying that `Task`'s resource request
+requirements (RAM, disk space, CPU time) while maintaining configuration
+constraints such as "a `Task` must run on machines  dedicated  to a
+particular role" or attribute limit constraints such as "at most 2
+`Task`s from the same `Job` may run on each rack". When the scheduler
+finds a suitable match, it assigns the `Task` to a machine and puts the
+`Task` into the `ASSIGNED` state.
+
+From the `ASSIGNED` state, the scheduler sends an RPC to the slave
+machine containing `Task` configuration, which the slave uses to spawn
+an executor responsible for the `Task`'s lifecycle. When the scheduler
+receives an acknowledgment that the machine has accepted the `Task`,
+the `Task` goes into `STARTING` state.
+
+`STARTING` state initializes a `Task` sandbox. When the sandbox is fully
+initialized, Thermos begins to invoke `Process`es. Also, the slave
+machine sends an update to the scheduler that the `Task` is
+in `RUNNING` state.
+
+
+
+## RUNNING to terminal states
+
+There are various ways that an active `Task` can transition into a terminal
+state. By definition, it can never leave this state. However, depending on
+nature of the termination and the originating `Job` definition
+(e.g. `service`, `max_task_failures`), a replacement `Task` might be
+scheduled.
+
+### Natural Termination: FINISHED, FAILED
+
+A `RUNNING` `Task` can terminate without direct user interaction. For
+example, it may be a finite computation that finishes, even something as
+simple as `echo hello world.`, or it could be an exceptional condition in
+a long-lived service. If the `Task` is successful (its underlying
+processes have succeeded with exit status `0` or finished without
+reaching failure limits) it moves into `FINISHED` state. If it finished
+after reaching a set of failure limits, it goes into `FAILED` state.
+
+A terminated `Task` which is subject to rescheduling will be temporarily
+`THROTTLED` if it is considered to be flapping. A task is flapping if its
+previous invocation was terminated after less than 5 minutes (scheduler
+default). The time penalty a task has to remain in the `THROTTLED` state
+before it is eligible for rescheduling increases with each consecutive
+failure.
+
+### Forceful Termination: KILLING, RESTARTING
+
+You can terminate a `Task` by issuing an `aurora job kill` command, which
+moves it into `KILLING` state. The scheduler then sends the slave a
+request to terminate the `Task`. If the scheduler receives a successful
+response, it moves the Task into `KILLED` state and never restarts it.
+
+If a `Task` is forced into the `RESTARTING` state via the `aurora job restart`
+command, the scheduler kills the underlying task but in parallel schedules
+an identical replacement for it.
+
+In any case, the responsible executor on the slave follows an escalation
+sequence when killing a running task:
+
+  1. If a `HttpLifecycleConfig` is not present, skip to (4).
+  2. Send a POST to the `graceful_shutdown_endpoint` and wait 5 seconds.
+  3. Send a POST to the `shutdown_endpoint` and wait 5 seconds.
+  4. Send SIGTERM (`kill`) and wait at most `finalization_wait` seconds.
+  5. Send SIGKILL (`kill -9`).
+
+If the executor notices that all `Process`es in a `Task` have aborted
+during this sequence, it will not proceed with subsequent steps.
+Note that graceful shutdown is best-effort, and due to the many
+inevitable realities of distributed systems, it may not be performed.
+
+### Unexpected Termination: LOST
+
+If a `Task` stays in a transient task state for too long (such as `ASSIGNED`
+or `STARTING`), the scheduler forces it into `LOST` state, creating a new
+`Task` in its place that's sent into `PENDING` state.
+
+In addition, if the Mesos core tells the scheduler that a slave has
+become unhealthy (or outright disappeared), the `Task`s assigned to that
+slave go into `LOST` state and new `Task`s are created in their place.
+From `PENDING` state, there is no guarantee a `Task` will be reassigned
+to the same machine unless job constraints explicitly force it there.
+
+### Giving Priority to Production Tasks: PREEMPTING
+
+Sometimes a Task needs to be interrupted, such as when a non-production
+Task's resources are needed by a higher priority production Task. This
+type of interruption is called a *preemption*. When this happens in
+Aurora, the non-production Task is killed and moved into
+the `PREEMPTING` state when both of the following are true:
+
+- The task being killed is a non-production task.
+- The other task is a `PENDING` production task that hasn't been
+  scheduled due to a lack of resources.
+
+The scheduler UI shows the non-production task was preempted in favor of
+the production task. At some point, tasks in `PREEMPTING` move to `KILLED`.
+
+Note that non-production tasks consuming many resources are likely to be
+preempted in favor of production tasks.
+
+### Making Room for Maintenance: DRAINING
+
+Cluster operators can put a slave into maintenance mode. This will transition
+all `Task`s running on that slave into `DRAINING` and eventually to `KILLED`.
+Drained `Task`s will be restarted on other slaves for which no maintenance
+has been announced yet.
+
+
+
+## State Reconciliation
+
+Due to the many inevitable realities of distributed systems, there might
+be a mismatch of perceived and actual cluster state (e.g. a machine returns
+from a `netsplit` but the scheduler has already marked all its `Task`s as
+`LOST` and rescheduled them).
+
+Aurora regularly runs a state reconciliation process in order to detect
+and correct such issues (e.g. by killing the errant `RUNNING` tasks).
+By default, the proper detection of all failure scenarios and inconsistencies
+may take up to an hour.
+
+To emphasize this point: there is no uniqueness guarantee for a single
+instance of a job in the presence of network partitions. If the `Task`
+requires that, it should be baked in at the application level using a
+distributed coordination service such as Zookeeper.

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/resources.md
----------------------------------------------------------------------
diff --git a/docs/resources.md b/docs/resources.md
deleted file mode 100644
index 27a2678..0000000
--- a/docs/resources.md
+++ /dev/null
@@ -1,164 +0,0 @@
-Resources and Sizing
-=============================
-
-- [Introduction](#introduction)
-- [CPU Isolation](#cpu-isolation)
-- [CPU Sizing](#cpu-sizing)
-- [Memory Isolation](#memory-isolation)
-- [Memory Sizing](#memory-sizing)
-- [Disk Space](#disk-space)
-- [Disk Space Sizing](#disk-space-sizing)
-- [Other Resources](#other-resources)
-- [Resource Quota](#resource-quota)
-- [Task Preemption](#task-preemption)
-
-## Introduction
-
-Aurora is a multi-tenant system; a single software instance runs on a
-server, serving multiple clients/tenants. To share resources among
-tenants, it implements isolation of:
-
-* CPU
-* memory
-* disk space
-
-CPU is a soft limit, and handled differently from memory and disk space.
-Too low a CPU value results in throttling your application and
-slowing it down. Memory and disk space are both hard limits; when your
-application goes over these values, it's killed.
-
-Let's look at each resource type in more detail:
-
-## CPU Isolation
-
-Mesos uses a quota based CPU scheduler (the *Completely Fair Scheduler*)
-to provide consistent and predictable performance.  This is effectively
-a guarantee of resources -- you receive at least what you requested, but
-also no more than you've requested.
-
-The scheduler gives applications a CPU quota for every 100 ms interval.
-When an application uses its quota for an interval, it is throttled for
-the rest of the 100 ms. Usage resets for each interval and unused
-quota does not carry over.
-
-For example, an application specifying 4.0 CPU has access to 400 ms of
-CPU time every 100 ms. This CPU quota can be used in different ways,
-depending on the application and available resources. Consider the
-scenarios shown in this diagram.
-
-![CPU Availability](images/CPUavailability.png)
-
-* *Scenario A*: the application can use up to 4 cores continuously for
-every 100 ms interval. It is never throttled and starts processing
-new requests immediately.
-
-* *Scenario B* : the application uses up to 8 cores (depending on
-availability) but is throttled after 50 ms. The CPU quota resets at the
-start of each new 100 ms interval.
-
-* *Scenario C* : is like Scenario A, but there is a garbage collection
-event in the second interval that consumes all CPU quota. The
-application throttles for the remaining 75 ms of that interval and
-cannot service requests until the next interval. In this example, the
-garbage collection finished in one interval but, depending on how much
-garbage needs collecting, it may take more than one interval and further
-delay service of requests.
-
-*Technical Note*: Mesos considers logical cores, also known as
-hyperthreading or SMT cores, as the unit of CPU.
-
-## CPU Sizing
-
-To correctly size Aurora-run Mesos tasks, specify a per-shard CPU value
-that lets the task run at its desired performance when at peak load
-distributed across all shards. Include reserve capacity of at least 50%,
-possibly more, depending on how critical your service is (or how
-confident you are about your original estimate : -)), ideally by
-increasing the number of shards to also improve resiliency. When running
-your application, observe its CPU stats over time. If consistently at or
-near your quota during peak load, you should consider increasing either
-per-shard CPU or the number of shards.
-
-## Memory Isolation
-
-Mesos uses dedicated memory allocation. Your application always has
-access to the amount of memory specified in your configuration. The
-application's memory use is defined as the sum of the resident set size
-(RSS) of all processes in a shard. Each shard is considered
-independently.
-
-In other words, say you specified a memory size of 10GB. Each shard
-would receive 10GB of memory. If an individual shard's memory demands
-exceed 10GB, that shard is killed, but the other shards continue
-working.
-
-*Technical note*: Total memory size is not enforced at allocation time,
-so your application can request more than its allocation without getting
-an ENOMEM. However, it will be killed shortly after.
-
-## Memory Sizing
-
-Size for your application's peak requirement. Observe the per-instance
-memory statistics over time, as memory requirements can vary over
-different periods. Remember that if your application exceeds its memory
-value, it will be killed, so you should also add a safety margin of
-around 10-20%. If you have the ability to do so, you may also want to
-put alerts on the per-instance memory.
-
-## Disk Space
-
-Disk space used by your application is defined as the sum of the files'
-disk space in your application's directory, including the `stdout` and
-`stderr` logged from your application. Each shard is considered
-independently. You should use off-node storage for your application's
-data whenever possible.
-
-In other words, say you specified disk space size of 100MB. Each shard
-would receive 100MB of disk space. If an individual shard's disk space
-demands exceed 100MB, that shard is killed, but the other shards
-continue working.
-
-After your application finishes running, its allocated disk space is
-reclaimed. Thus, your job's final action should move any disk content
-that you want to keep, such as logs, to your home file system or other
-less transitory storage. Disk reclamation takes place an undefined
-period after the application finish time; until then, the disk contents
-are still available but you shouldn't count on them being so.
-
-*Technical note* : Disk space is not enforced at write so your
-application can write above its quota without getting an ENOSPC, but it
-will be killed shortly after. This is subject to change.
-
-## Disk Space Sizing
-
-Size for your application's peak requirement. Rotate and discard log
-files as needed to stay within your quota. When running a Java process,
-add the maximum size of the Java heap to your disk space requirement, in
-order to account for an out of memory error dumping the heap
-into the application's sandbox space.
-
-## Other Resources
-
-Other resources, such as network bandwidth, do not have any performance
-guarantees. For some resources, such as memory bandwidth, there are no
-practical sharing methods so some application combinations collocated on
-the same host may cause contention.
-
-## Resource Quota
-
-Aurora requires resource quotas for
-[production non-dedicated jobs](configuration-reference.md#job-objects). Quota is enforced at
-the job role level and when set, defines a non-preemptible pool of compute resources within
-that role.
-
-To grant quota to a particular role in production use `aurora_admin set_quota` command.
-
-NOTE: all job types (service, adhoc or cron) require role resource quota unless a job has
-[dedicated constraint set](deploying-aurora-scheduler.md#dedicated-attribute).
-
-## Task preemption
-
-Under a particular resource shortage pressure, tasks from
-[production](configuration-reference.md#job-objects) jobs may preempt tasks from any non-production
-job. A production task may only be preempted by tasks from production jobs in the same role with
-higher [priority](configuration-reference.md#job-objects).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/scheduler-configuration.md
----------------------------------------------------------------------
diff --git a/docs/scheduler-configuration.md b/docs/scheduler-configuration.md
deleted file mode 100644
index 7e3d801..0000000
--- a/docs/scheduler-configuration.md
+++ /dev/null
@@ -1,318 +0,0 @@
-# Scheduler Configuration
-
-The Aurora scheduler can take a variety of configuration options through command-line arguments.
-A list of the available options can be seen by running `aurora-scheduler -help`.
-
-Please refer to [Deploying the Aurora Scheduler](deploying-aurora-scheduler.md) for details on how
-to properly set the most important options.
-
-```
-$ aurora-scheduler -help
--------------------------------------------------------------------------
--h or -help to print this help message
-
-Required flags:
--backup_dir [not null]
-	Directory to store backups under. Will be created if it does not exist.
-	(org.apache.aurora.scheduler.storage.backup.BackupModule.backup_dir)
--cluster_name [not null]
-	Name to identify the cluster being served.
-	(org.apache.aurora.scheduler.app.SchedulerMain.cluster_name)
--framework_authentication_file
-	Properties file which contains framework credentials to authenticate with Mesosmaster. Must contain the properties 'aurora_authentication_principal' and 'aurora_authentication_secret'.
-	(org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.framework_authentication_file)
--mesos_master_address [not null]
-	Address for the mesos master, can be a socket address or zookeeper path.
-	(org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.mesos_master_address)
--mesos_role
-	The Mesos role this framework will register as. The default is to left this empty, and the framework will register without any role and only receive unreserved resources in offer.
-	(org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.mesos_role)
--serverset_path [not null, must be non-empty]
-	ZooKeeper ServerSet path to register at.
-	(org.apache.aurora.scheduler.app.SchedulerMain.serverset_path)
--shiro_after_auth_filter
-	Fully qualified class name of the servlet filter to be applied after the shiro auth filters are applied.
-	(org.apache.aurora.scheduler.http.api.security.HttpSecurityModule.shiro_after_auth_filter)
--thermos_executor_path
-	Path to the thermos executor entry point.
-	(org.apache.aurora.scheduler.configuration.executor.ExecutorModule.thermos_executor_path)
--tier_config [file must be readable]
-	Configuration file defining supported task tiers, task traits and behaviors.
-	(org.apache.aurora.scheduler.SchedulerModule.tier_config)
--zk_digest_credentials
-	user:password to use when authenticating with ZooKeeper.
-	(org.apache.aurora.scheduler.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_digest_credentials)
--zk_endpoints [must have at least 1 item]
-	Endpoint specification for the ZooKeeper servers.
-	(org.apache.aurora.scheduler.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_endpoints)
-
-Optional flags:
--allow_docker_parameters=false
-	Allow passing docker container parameters in the job.
-	(org.apache.aurora.scheduler.app.AppModule.allow_docker_parameters)
--allowed_container_types=[MESOS]
-	Container types that are allowed to be used by jobs.
-	(org.apache.aurora.scheduler.app.AppModule.allowed_container_types)
--async_slot_stat_update_interval=(1, mins)
-	Interval on which to try to update open slot stats.
-	(org.apache.aurora.scheduler.stats.AsyncStatsModule.async_slot_stat_update_interval)
--async_task_stat_update_interval=(1, hrs)
-	Interval on which to try to update resource consumption stats.
-	(org.apache.aurora.scheduler.stats.AsyncStatsModule.async_task_stat_update_interval)
--async_worker_threads=8
-	The number of worker threads to process async task operations with.
-	(org.apache.aurora.scheduler.async.AsyncModule.async_worker_threads)
--backup_interval=(1, hrs)
-	Minimum interval on which to write a storage backup.
-	(org.apache.aurora.scheduler.storage.backup.BackupModule.backup_interval)
--cron_scheduler_num_threads=100
-	Number of threads to use for the cron scheduler thread pool.
-	(org.apache.aurora.scheduler.cron.quartz.CronModule.cron_scheduler_num_threads)
--cron_start_initial_backoff=(1, secs)
-	Initial backoff delay while waiting for a previous cron run to be killed.
-	(org.apache.aurora.scheduler.cron.quartz.CronModule.cron_start_initial_backoff)
--cron_start_max_backoff=(1, mins)
-	Max backoff delay while waiting for a previous cron run to be killed.
-	(org.apache.aurora.scheduler.cron.quartz.CronModule.cron_start_max_backoff)
--cron_timezone=GMT
-	TimeZone to use for cron predictions.
-	(org.apache.aurora.scheduler.cron.quartz.CronModule.cron_timezone)
--custom_executor_config [file must exist, file must be readable]
-	Path to custom executor settings configuration file.
-	(org.apache.aurora.scheduler.configuration.executor.ExecutorModule.custom_executor_config)
--db_lock_timeout=(1, mins)
-	H2 table lock timeout
-	(org.apache.aurora.scheduler.storage.db.DbModule.db_lock_timeout)
--db_row_gc_interval=(2, hrs)
-	Interval on which to scan the database for unused row references.
-	(org.apache.aurora.scheduler.storage.db.DbModule.db_row_gc_interval)
--default_docker_parameters={}
-	Default docker parameters for any job that does not explicitly declare parameters.
-	(org.apache.aurora.scheduler.app.AppModule.default_docker_parameters)
--dlog_max_entry_size=(512, KB)
-	Specifies the maximum entry size to append to the log. Larger entries will be split across entry Frames.
-	(org.apache.aurora.scheduler.storage.log.LogStorageModule.dlog_max_entry_size)
--dlog_shutdown_grace_period=(2, secs)
-	Specifies the maximum time to wait for scheduled checkpoint and snapshot actions to complete before forcibly shutting down.
-	(org.apache.aurora.scheduler.storage.log.LogStorageModule.dlog_shutdown_grace_period)
--dlog_snapshot_interval=(1, hrs)
-	Specifies the frequency at which snapshots of local storage are taken and written to the log.
-	(org.apache.aurora.scheduler.storage.log.LogStorageModule.dlog_snapshot_interval)
--enable_cors_for
-	List of domains for which CORS support should be enabled.
-	(org.apache.aurora.scheduler.http.api.ApiModule.enable_cors_for)
--enable_h2_console=false
-	Enable H2 DB management console.
-	(org.apache.aurora.scheduler.http.H2ConsoleModule.enable_h2_console)
--enable_preemptor=true
-	Enable the preemptor and preemption
-	(org.apache.aurora.scheduler.preemptor.PreemptorModule.enable_preemptor)
--executor_user=root
-	User to start the executor. Defaults to "root". Set this to an unprivileged user if the mesos master was started with "--no-root_submissions". If set to anything other than "root", the executor will ignore the "role" setting for jobs since it can't use setuid() anymore. This means that all your jobs will run under the specified user and the user has to exist on the mesos slaves.
-	(org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.executor_user)
--first_schedule_delay=(1, ms)
-	Initial amount of time to wait before first attempting to schedule a PENDING task.
-	(org.apache.aurora.scheduler.scheduling.SchedulingModule.first_schedule_delay)
--flapping_task_threshold=(5, mins)
-	A task that repeatedly runs for less than this time is considered to be flapping.
-	(org.apache.aurora.scheduler.scheduling.SchedulingModule.flapping_task_threshold)
--framework_announce_principal=false
-	When 'framework_authentication_file' flag is set, the FrameworkInfo registered with the mesos master will also contain the principal. This is necessary if you intend to use mesos authorization via mesos ACLs. The default will change in a future release.
-	(org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.framework_announce_principal)
--framework_failover_timeout=(21, days)
-	Time after which a framework is considered deleted.  SHOULD BE VERY HIGH.
-	(org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.framework_failover_timeout)
--global_container_mounts=[]
-	A comma-separated list of mount points (in host:container form) to mount into all (non-mesos) containers.
-	(org.apache.aurora.scheduler.configuration.executor.ExecutorModule.global_container_mounts)
--history_max_per_job_threshold=100
-	Maximum number of terminated tasks to retain in a job history.
-	(org.apache.aurora.scheduler.pruning.PruningModule.history_max_per_job_threshold)
--history_min_retention_threshold=(1, hrs)
-	Minimum guaranteed time for task history retention before any pruning is attempted.
-	(org.apache.aurora.scheduler.pruning.PruningModule.history_min_retention_threshold)
--history_prune_threshold=(2, days)
-	Time after which the scheduler will prune terminated task history.
-	(org.apache.aurora.scheduler.pruning.PruningModule.history_prune_threshold)
--hostname
-	The hostname to advertise in ZooKeeper instead of the locally-resolved hostname.
-	(org.apache.aurora.scheduler.http.JettyServerModule.hostname)
--http_authentication_mechanism=NONE
-	HTTP Authentication mechanism to use.
-	(org.apache.aurora.scheduler.http.api.security.HttpSecurityModule.http_authentication_mechanism)
--http_port=0
-	The port to start an HTTP server on.  Default value will choose a random port.
-	(org.apache.aurora.scheduler.http.JettyServerModule.http_port)
--initial_flapping_task_delay=(30, secs)
-	Initial amount of time to wait before attempting to schedule a flapping task.
-	(org.apache.aurora.scheduler.scheduling.SchedulingModule.initial_flapping_task_delay)
--initial_schedule_penalty=(1, secs)
-	Initial amount of time to wait before attempting to schedule a task that has failed to schedule.
-	(org.apache.aurora.scheduler.scheduling.SchedulingModule.initial_schedule_penalty)
--initial_task_kill_retry_interval=(5, secs)
-	When killing a task, retry after this delay if mesos has not responded, backing off up to transient_task_state_timeout
-	(org.apache.aurora.scheduler.reconciliation.ReconciliationModule.initial_task_kill_retry_interval)
--job_update_history_per_job_threshold=10
-	Maximum number of completed job updates to retain in a job update history.
-	(org.apache.aurora.scheduler.pruning.PruningModule.job_update_history_per_job_threshold)
--job_update_history_pruning_interval=(15, mins)
-	Job update history pruning interval.
-	(org.apache.aurora.scheduler.pruning.PruningModule.job_update_history_pruning_interval)
--job_update_history_pruning_threshold=(30, days)
-	Time after which the scheduler will prune completed job update history.
-	(org.apache.aurora.scheduler.pruning.PruningModule.job_update_history_pruning_threshold)
--kerberos_debug=false
-	Produce additional Kerberos debugging output.
-	(org.apache.aurora.scheduler.http.api.security.Kerberos5ShiroRealmModule.kerberos_debug)
--kerberos_server_keytab
-	Path to the server keytab.
-	(org.apache.aurora.scheduler.http.api.security.Kerberos5ShiroRealmModule.kerberos_server_keytab)
--kerberos_server_principal
-	Kerberos server principal to use, usually of the form HTTP/aurora.example.com@EXAMPLE.COM
-	(org.apache.aurora.scheduler.http.api.security.Kerberos5ShiroRealmModule.kerberos_server_principal)
--max_flapping_task_delay=(5, mins)
-	Maximum delay between attempts to schedule a flapping task.
-	(org.apache.aurora.scheduler.scheduling.SchedulingModule.max_flapping_task_delay)
--max_leading_duration=(1, days)
-	After leading for this duration, the scheduler should commit suicide.
-	(org.apache.aurora.scheduler.SchedulerModule.max_leading_duration)
--max_registration_delay=(1, mins)
-	Max allowable delay to allow the driver to register before aborting
-	(org.apache.aurora.scheduler.SchedulerModule.max_registration_delay)
--max_reschedule_task_delay_on_startup=(30, secs)
-	Upper bound of random delay for pending task rescheduling on scheduler startup.
-	(org.apache.aurora.scheduler.scheduling.SchedulingModule.max_reschedule_task_delay_on_startup)
--max_saved_backups=48
-	Maximum number of backups to retain before deleting the oldest backups.
-	(org.apache.aurora.scheduler.storage.backup.BackupModule.max_saved_backups)
--max_schedule_attempts_per_sec=40.0
-	Maximum number of scheduling attempts to make per second.
-	(org.apache.aurora.scheduler.scheduling.SchedulingModule.max_schedule_attempts_per_sec)
--max_schedule_penalty=(1, mins)
-	Maximum delay between attempts to schedule a PENDING task.
-	(org.apache.aurora.scheduler.scheduling.SchedulingModule.max_schedule_penalty)
--max_status_update_batch_size=1000 [must be > 0]
-	The maximum number of status updates that can be processed in a batch.
-	(org.apache.aurora.scheduler.SchedulerModule.max_status_update_batch_size)
--max_tasks_per_job=4000 [must be > 0]
-	Maximum number of allowed tasks in a single job.
-	(org.apache.aurora.scheduler.app.AppModule.max_tasks_per_job)
--max_update_instance_failures=20000 [must be > 0]
-	Upper limit on the number of failures allowed during a job update. This helps cap potentially unbounded entries into storage.
-	(org.apache.aurora.scheduler.app.AppModule.max_update_instance_failures)
--min_offer_hold_time=(5, mins)
-	Minimum amount of time to hold a resource offer before declining.
-	(org.apache.aurora.scheduler.offers.OffersModule.min_offer_hold_time)
--native_log_election_retries=20
-	The maximum number of attempts to obtain a new log writer.
-	(org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_election_retries)
--native_log_election_timeout=(15, secs)
-	The timeout for a single attempt to obtain a new log writer.
-	(org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_election_timeout)
--native_log_file_path
-	Path to a file to store the native log data in.  If the parent directory does not exist it will be created.
-	(org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_file_path)
--native_log_quorum_size=1
-	The size of the quorum required for all log mutations.
-	(org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_quorum_size)
--native_log_read_timeout=(5, secs)
-	The timeout for doing log reads.
-	(org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_read_timeout)
--native_log_write_timeout=(3, secs)
-	The timeout for doing log appends and truncations.
-	(org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_write_timeout)
--native_log_zk_group_path
-	A zookeeper node for use by the native log to track the master coordinator.
-	(org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.native_log_zk_group_path)
--offer_hold_jitter_window=(1, mins)
-	Maximum amount of random jitter to add to the offer hold time window.
-	(org.apache.aurora.scheduler.offers.OffersModule.offer_hold_jitter_window)
--offer_reservation_duration=(3, mins)
-	Time to reserve a slave's offers while trying to satisfy a task preempting another.
-	(org.apache.aurora.scheduler.scheduling.SchedulingModule.offer_reservation_duration)
--preemption_delay=(3, mins)
-	Time interval after which a pending task becomes eligible to preempt other tasks
-	(org.apache.aurora.scheduler.preemptor.PreemptorModule.preemption_delay)
--preemption_slot_hold_time=(5, mins)
-	Time to hold a preemption slot found before it is discarded.
-	(org.apache.aurora.scheduler.preemptor.PreemptorModule.preemption_slot_hold_time)
--preemption_slot_search_interval=(1, mins)
-	Time interval between pending task preemption slot searches.
-	(org.apache.aurora.scheduler.preemptor.PreemptorModule.preemption_slot_search_interval)
--receive_revocable_resources=false
-	Allows receiving revocable resource offers from Mesos.
-	(org.apache.aurora.scheduler.mesos.CommandLineDriverSettingsModule.receive_revocable_resources)
--reconciliation_explicit_interval=(60, mins)
-	Interval on which scheduler will ask Mesos for status updates of all non-terminal tasks known to scheduler.
-	(org.apache.aurora.scheduler.reconciliation.ReconciliationModule.reconciliation_explicit_interval)
--reconciliation_implicit_interval=(60, mins)
-	Interval on which scheduler will ask Mesos for status updates of all non-terminal tasks known to Mesos.
-	(org.apache.aurora.scheduler.reconciliation.ReconciliationModule.reconciliation_implicit_interval)
--reconciliation_initial_delay=(1, mins)
-	Initial amount of time to delay task reconciliation after scheduler start up.
-	(org.apache.aurora.scheduler.reconciliation.ReconciliationModule.reconciliation_initial_delay)
--reconciliation_schedule_spread=(30, mins)
-	Difference between explicit and implicit reconciliation intervals intended to create a non-overlapping task reconciliation schedule.
-	(org.apache.aurora.scheduler.reconciliation.ReconciliationModule.reconciliation_schedule_spread)
--shiro_ini_path
-	Path to shiro.ini for authentication and authorization configuration.
-	(org.apache.aurora.scheduler.http.api.security.IniShiroRealmModule.shiro_ini_path)
--shiro_realm_modules=[org.apache.aurora.scheduler.app.MoreModules$1@30c15d8b]
-	Guice modules for configuring Shiro Realms.
-	(org.apache.aurora.scheduler.http.api.security.HttpSecurityModule.shiro_realm_modules)
--sla_non_prod_metrics=[]
-	Metric categories collected for non production tasks.
-	(org.apache.aurora.scheduler.sla.SlaModule.sla_non_prod_metrics)
--sla_prod_metrics=[JOB_UPTIMES, PLATFORM_UPTIME, MEDIANS]
-	Metric categories collected for production tasks.
-	(org.apache.aurora.scheduler.sla.SlaModule.sla_prod_metrics)
--sla_stat_refresh_interval=(1, mins)
-	The SLA stat refresh interval.
-	(org.apache.aurora.scheduler.sla.SlaModule.sla_stat_refresh_interval)
--slow_query_log_threshold=(25, ms)
-	Log all queries that take at least this long to execute.
-	(org.apache.aurora.scheduler.storage.mem.InMemStoresModule.slow_query_log_threshold)
--slow_query_log_threshold=(25, ms)
-	Log all queries that take at least this long to execute.
-	(org.apache.aurora.scheduler.storage.db.DbModule.slow_query_log_threshold)
--stat_retention_period=(1, hrs)
-	Time for a stat to be retained in memory before expiring.
-	(org.apache.aurora.scheduler.stats.StatsModule.stat_retention_period)
--stat_sampling_interval=(1, secs)
-	Statistic value sampling interval.
-	(org.apache.aurora.scheduler.stats.StatsModule.stat_sampling_interval)
--thermos_executor_cpu=0.25
-	The number of CPU cores to allocate for each instance of the executor.
-	(org.apache.aurora.scheduler.configuration.executor.ExecutorModule.thermos_executor_cpu)
--thermos_executor_flags
-	Extra arguments to be passed to the thermos executor
-	(org.apache.aurora.scheduler.configuration.executor.ExecutorModule.thermos_executor_flags)
--thermos_executor_ram=(128, MB)
-	The amount of RAM to allocate for each instance of the executor.
-	(org.apache.aurora.scheduler.configuration.executor.ExecutorModule.thermos_executor_ram)
--thermos_executor_resources=[]
-	A comma-separated list of additional resources to copy into the sandbox. Note: if thermos_executor_path is not the thermos_executor.pex file itself, this must include it.
-	(org.apache.aurora.scheduler.configuration.executor.ExecutorModule.thermos_executor_resources)
--thermos_observer_root=/var/run/thermos
-	Path to the thermos observer root (by default /var/run/thermos.)
-	(org.apache.aurora.scheduler.configuration.executor.ExecutorModule.thermos_observer_root)
--transient_task_state_timeout=(5, mins)
-	The amount of time after which to treat a task stuck in a transient state as LOST.
-	(org.apache.aurora.scheduler.reconciliation.ReconciliationModule.transient_task_state_timeout)
--use_beta_db_task_store=false
-	Whether to use the experimental database-backed task store.
-	(org.apache.aurora.scheduler.storage.db.DbModule.use_beta_db_task_store)
--viz_job_url_prefix=
-	URL prefix for job container stats.
-	(org.apache.aurora.scheduler.app.SchedulerMain.viz_job_url_prefix)
--zk_chroot_path
-	chroot path to use for the ZooKeeper connections
-	(org.apache.aurora.scheduler.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_chroot_path)
--zk_in_proc=false
-	Launches an embedded zookeeper server for local testing, causing -zk_endpoints to be ignored if specified.
-	(org.apache.aurora.scheduler.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_in_proc)
--zk_session_timeout=(4, secs)
-	The ZooKeeper session timeout.
-	(org.apache.aurora.scheduler.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_session_timeout)
--------------------------------------------------------------------------
-```
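-
-For orientation, a minimal invocation supplying only the flags annotated as required above might
-look like the following. Every value is a placeholder, and a real deployment will typically need
-additional flags (replicated log settings in particular); see
-[Deploying the Aurora Scheduler](deploying-aurora-scheduler.md) for working settings.
-
-```
-# Placeholder paths, hostnames, and names throughout.
-aurora-scheduler \
-  -cluster_name=example \
-  -backup_dir=/var/lib/aurora/backups \
-  -mesos_master_address=zk://zk1.example.com:2181/mesos \
-  -serverset_path=/aurora/scheduler \
-  -zk_endpoints=zk1.example.com:2181 \
-  -thermos_executor_path=/usr/share/aurora/bin/thermos_executor.pex
-```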