Posted to commits@aurora.apache.org by re...@apache.org on 2018/09/11 05:28:12 UTC

svn commit: r1840515 [11/15] - in /aurora/site: publish/blog/aurora-0-21-0-released/ publish/documentation/0.21.0/ publish/documentation/0.21.0/additional-resources/ publish/documentation/0.21.0/additional-resources/presentations/ publish/documentation...

Added: aurora/site/source/documentation/0.21.0/development/committers-guide.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/development/committers-guide.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/development/committers-guide.md (added)
+++ aurora/site/source/documentation/0.21.0/development/committers-guide.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,105 @@
+Committer's Guide
+=================
+
+Information for official Apache Aurora committers.
+
+Setting up your email account
+-----------------------------
+Once your Apache ID has been set up, you can configure your account, add ssh keys, and set up an
+email forwarding address at
+
+    http://id.apache.org
+
+Additional instructions for setting up your new committer email can be found at
+
+    http://www.apache.org/dev/user-email.html
+
+The recommended setup is to configure all services (mailing lists, JIRA, ReviewBoard) to send
+emails to your @apache.org email address.
+
+
+Creating a gpg key for releases
+-------------------------------
+In order to create a release candidate you will need a gpg key published to an external key server
+and that key will need to be added to our KEYS file as well.
+
+1. Create a key:
+
+               gpg --gen-key
+
+2. Add your gpg key to the Apache Aurora KEYS file:
+
+               git clone https://gitbox.apache.org/repos/asf/aurora
+               (gpg --list-sigs <KEY ID> && gpg --armor --export <KEY ID>) >> KEYS
+               git add KEYS && git commit -m "Adding gpg key for <APACHE ID>"
+               ./rbt post -o -g
+
+3. Publish the key to an external key server:
+
+               gpg --keyserver pgp.mit.edu --send-keys <KEY ID>
+
+4. Apply the changes to the KEYS file to the Apache Aurora svn dist locations listed below:
+
+               https://dist.apache.org/repos/dist/dev/aurora/KEYS
+               https://dist.apache.org/repos/dist/release/aurora/KEYS
+
+5. Add your key to git config for use with the release scripts:
+
+               git config --global user.signingkey <KEY ID>
+
+
+Creating a release
+------------------
+The following will guide you through the steps to create a release candidate, vote, and finally an
+official Apache Aurora release. Before starting, your gpg key should be in the KEYS file and you
+must have access to commit to the dist.apache.org repositories.
+
+1. Ensure that all issues resolved for this release candidate are tagged with the correct Fix
+Version in JIRA; the changelog script will use this to generate the CHANGELOG in step #2.
+To assign the fix version:
+
+    * Look up the [previous release date](https://issues.apache.org/jira/browse/aurora/?selectedTab=com.atlassian.jira.jira-projects-plugin:versions-panel).
+    * Query all issues resolved after that release date: `project = AURORA AND status in (resolved, Closed) and fixVersion is empty and resolutiondate >= "YYYY/MM/DD"`
+    * In the upper right corner of the query result, select Tools > Bulk Edit.
+    * Select all issues > edit issue > set 'Change Fix Version/s' to the release version.
+    * Make sure to uncheck 'Send mail for this update' at the bottom.
+
+2. Prepare RELEASE-NOTES.md for the release. This just boils down to removing the "(Not yet
+released)" suffix from the impending release.
+
+2. Create a release candidate. This will automatically update the CHANGELOG and commit it, create a
+branch and update the current version within the trunk. To create a minor version update and publish
+it run
+
+               ./build-support/release/release-candidate -l m -p
+
+3. Update, if necessary, the draft email created from the `release-candidate` script in step #2 and
+send the [VOTE] email to the dev@ mailing list. You can verify the release signature and checksums
+by running
+
+               ./build-support/release/verify-release-candidate
+
+4. Wait for the vote to complete. If the vote fails, close the vote by replying to the initial [VOTE]
+email sent in step #3, editing the subject to [RESULT][VOTE] ... and noting the failure reason
+(example [here](http://markmail.org/message/d4d6xtvj7vgwi76f)). You'll also need to manually revert
+the commits generated by the release candidate script that incremented the snapshot version and
+updated the changelog. Once that is done, address any issues, go back to step #1, and run the
+script again, this time with the -r flag to increment the release candidate version. This will
+automatically clean up the release candidate rc0 branch and source distribution.
+
+               ./build-support/release/release-candidate -l m -r 1 -p
+
+5. Once the vote has successfully passed, create the release.
+
+**IMPORTANT: make sure to use the correct release at this final step (e.g. `-r 1` if the rc1
+candidate has been voted for). Once the release tag is pushed it will be very hard to undo due to a
+remote git pre-receive hook explicitly forbidding release tag manipulations.**
+
+               ./build-support/release/release
+
+6. Update the draft email created from the `release` script in step #5 to include the Apache IDs for
+all binding votes and send the [RESULT][VOTE] email to the dev@ mailing list.
+
+7. Update the [Aurora Website](http://aurora.apache.org/) by following the
+[instructions](https://svn.apache.org/repos/asf/aurora/site/README.md) on the ASF Aurora SVN repo.
+Remember to add a blog post under source/blog and regenerate the site before committing.
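
The JQL query from step 1 of "Creating a release" in committers-guide.md above can be generated rather than typed by hand. The following is a minimal sketch, not part of the Aurora release scripts: the helper name `fix_version_query` is hypothetical, and the date in the usage line is an arbitrary placeholder, not an actual Aurora release date.

```python
from datetime import date


def fix_version_query(previous_release):
    """Build the step 1 JQL for issues resolved since previous_release."""
    # JIRA expects the date as YYYY/MM/DD inside double quotes.
    day = previous_release.strftime("%Y/%m/%d")
    return (
        "project = AURORA AND status in (resolved, Closed) "
        "and fixVersion is empty "
        f'and resolutiondate >= "{day}"'
    )


# Arbitrary placeholder date, used only to show the output format.
print(fix_version_query(date(2018, 1, 31)))
```

The resulting string can be pasted into the JIRA issue search before running the bulk edit described in step 1.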

Added: aurora/site/source/documentation/0.21.0/development/db-migration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/development/db-migration.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/development/db-migration.md (added)
+++ aurora/site/source/documentation/0.21.0/development/db-migration.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,34 @@
+DB Migrations
+=============
+
+Changes to the DB schema should be made in the form of migrations. This ensures that all changes
+are applied correctly after a DB dump from a previous version is restored.
+
+DB migrations are managed through a system built on top of
+[MyBatis Migrations](http://www.mybatis.org/migrations/). The migrations are run automatically when
+a snapshot is restored; no manual interaction is required by cluster operators.
+
+Upgrades
+--------
+When adding or altering tables or changing data, in addition to making the change in
+[schema.sql](../../src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql), a new
+migration class should be created under the org.apache.aurora.scheduler.storage.db.migration
+package. The class should implement the [MigrationScript](https://github.com/mybatis/migrations/blob/master/src/main/java/org/apache/ibatis/migration/MigrationScript.java)
+interface (see [V001_TestMigration](https://github.com/apache/aurora/blob/rel/0.21.0/src/test/java/org/apache/aurora/scheduler/storage/db/testmigration/V001_TestMigration.java)
+as an example). The upgrade and downgrade scripts are defined in this class. When restoring a
+snapshot the list of migrations on the classpath is compared to the list of applied changes in the
+DB. Any changes that have not yet been applied are executed and their downgrade script is stored
+alongside the changelog entry in the database to facilitate downgrades in the event of a rollback.
+
+Downgrades
+----------
+If, while running migrations, a rollback is detected, i.e. a change exists in the DB changelog that
+does not exist on the classpath, the downgrade script associated with each affected change is
+applied.
+
+Baselines
+---------
+After enough time has passed (at least 1 official release), it should be safe to baseline migrations
+if desired. This can be accomplished by ensuring the changes from migrations have been applied to
+[schema.sql](../../src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql) and then
+removing the corresponding migration classes and adding a migration to remove the changelog entries.
\ No newline at end of file
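
The comparison described under "Upgrades" and "Downgrades" in db-migration.md above can be sketched as a set difference between the migrations on the classpath and the changelog stored in the DB. This is an illustrative Python model only: Aurora's real implementation is Java code built on MyBatis Migrations, the function `plan_migrations` and its data shapes are hypothetical, and the ordering (upgrades ascending, downgrades descending) is an assumption the document does not state.

```python
def plan_migrations(on_classpath, applied):
    """Compare migration IDs on the classpath with the DB changelog.

    on_classpath: iterable of migration IDs shipped with this build.
    applied: dict mapping each applied migration ID to the downgrade
             script that was stored alongside its changelog entry.
    Returns (to_upgrade, to_downgrade).
    """
    available = set(on_classpath)
    recorded = set(applied)
    # Not yet applied: run the upgrade script (assumed oldest first).
    to_upgrade = sorted(available - recorded)
    # Applied but unknown to this build: a rollback was detected, so the
    # stored downgrade script is used (assumed newest first).
    to_downgrade = [(mid, applied[mid])
                    for mid in sorted(recorded - available, reverse=True)]
    return to_upgrade, to_downgrade


# Upgrade: the build ships migration 3, which the snapshot has not seen.
print(plan_migrations([1, 2, 3], {1: "DROP TABLE a;", 2: "DROP TABLE b;"}))
# Rollback: the snapshot contains migration 3, but this build does not.
print(plan_migrations([1, 2], {1: "DROP TABLE a;", 2: "DROP TABLE b;",
                               3: "ALTER TABLE b DROP COLUMN c;"}))
```

Storing the downgrade script in the changelog is what makes the second case workable: the older build no longer has the migration class, so the script has to come from the database.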

Added: aurora/site/source/documentation/0.21.0/development/design-documents.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/development/design-documents.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/development/design-documents.md (added)
+++ aurora/site/source/documentation/0.21.0/development/design-documents.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,23 @@
+Design Documents
+================
+
+Since its inception as an Apache project, larger feature additions to the
+Aurora code base are discussed in the form of design documents. Design documents
+are living documents until a consensus has been reached to implement a feature
+in the proposed form.
+
+Current and past documents:
+
+* [Command Hooks for the Aurora Client](../design/command-hooks/)
+* [Dynamic Reservations](https://docs.google.com/document/d/19gV8Po6DIHO14tOC7Qouk8RnboY8UCfRTninwn_5-7c/edit)
+* [GPU Resources in Aurora](https://docs.google.com/document/d/1J9SIswRMpVKQpnlvJAMAJtKfPP7ZARFknuyXl-2aZ-M/edit)
+* [Health Checks for Updates](https://docs.google.com/document/d/1KOO0LC046k75TqQqJ4c0FQcVGbxvrn71E10wAjMorVY/edit)
+* [JobUpdateDiff thrift API](https://docs.google.com/document/d/1Fc_YhhV7fc4D9Xv6gJzpfooxbK4YWZcvzw6Bd3qVTL8/edit)
+* [REST API RFC](https://docs.google.com/document/d/11_lAsYIRlD5ETRzF2eSd3oa8LXAHYFD8rSetspYXaf4/edit)
+* [Revocable Mesos offers in Aurora](https://docs.google.com/document/d/1r1WCHgmPJp5wbrqSZLsgtxPNj3sULfHrSFmxp2GyPTo/edit)
+* [Supporting the Mesos Universal Containerizer](https://docs.google.com/document/d/111T09NBF2zjjl7HE95xglsDpRdKoZqhCRM5hHmOfTLA/edit?usp=sharing)
+* [Tier Management In Apache Aurora](https://docs.google.com/document/d/1erszT-HsWf1zCIfhbqHlsotHxWUvDyI2xUwNQQQxLgs/edit?usp=sharing)
+* [Ubiquitous Jobs](https://docs.google.com/document/d/12hr6GnUZU3mc7xsWRzMi3nQILGB-3vyUxvbG-6YmvdE/edit)
+* [Pluggable Scheduling](https://docs.google.com/document/d/1fVHLt9AF-YbOCVCDMQmi5DATVusn-tqY8DldKbjVEm0/edit)
+
+Design documents can be found in the Aurora issue tracker via the query [`project = AURORA AND text ~ "docs.google.com" ORDER BY created`](https://issues.apache.org/jira/browse/AURORA-1528?jql=project%20%3D%20AURORA%20AND%20text%20~%20%22docs.google.com%22%20ORDER%20BY%20created).

Added: aurora/site/source/documentation/0.21.0/development/design/command-hooks.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/development/design/command-hooks.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/development/design/command-hooks.md (added)
+++ aurora/site/source/documentation/0.21.0/development/design/command-hooks.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,102 @@
+# Command Hooks for the Aurora Client
+
+## Introduction/Motivation
+
+We've got hooks in the client that surround API calls. These are
+pretty awkward, because they don't correlate with user actions. For
+example, suppose we wanted a policy that said users weren't allowed to
+kill all instances of a production job at once.
+
+Right now, all that we could hook would be the "killJob" api call. But
+kill (at least in newer versions of the client) normally runs in
+batches. If a user called killall, what we would see on the API level
+is a series of "killJob" calls, each of which specified a batch of
+instances. We wouldn't be able to distinguish between really killing
+all instances of a job (which is forbidden under this policy), and
+carefully killing in batches (which is permitted.) In each case, the
+hook would just see a series of API calls, and couldn't find out what
+the actual command being executed was!
+
+For most policy enforcement, what we really want to be able to do is
+look at and vet the commands that a user is performing, not the API
+calls that the client uses to implement those commands.
+
+So I propose that we add a new kind of hook, which surrounds noun/verb
+commands. A hook will register itself to handle a collection of (noun,
+verb) pairs. Whenever any of those noun/verb commands are invoked, the
+hook's methods will be called around the execution of the verb. A
+pre-hook will have the ability to reject a command, preventing the
+verb from being executed.
+
+## Registering Hooks
+
+These hooks will be registered via configuration plugins. A configuration plugin
+can register hooks using an API. Hooks registered this way are, effectively,
+hardwired into the client executable.
+
+The order of execution of hooks is unspecified: they may be called in
+any order. There is no way to guarantee that one hook will execute
+before some other hook.
+
+
+### Global Hooks
+
+Hooks registered by the Python call are called _global_ hooks,
+because they will run for all configurations, whether or not they
+specify any hooks in the configuration file.
+
+In the implementation, hooks are registered in the module
+`apache.aurora.client.cli.command_hooks`, using the class
+`GlobalCommandHookRegistry`. A global hook can be registered by calling
+`GlobalCommandHookRegistry.register_command_hook` in a configuration plugin.
+
+### The API
+
+    class CommandHook(object):
+      @property
+      def name(self):
+        """Returns a name for the hook."""
+
+      def get_nouns(self):
+        """Return the nouns that have verbs that should invoke this hook."""
+
+      def get_verbs(self, noun):
+        """Return the verbs for a particular noun that should invoke this hook."""
+
+      @abstractmethod
+      def pre_command(self, noun, verb, context, commandline):
+        """Execute a hook before invoking a verb.
+        * noun: the noun being invoked.
+        * verb: the verb being invoked.
+        * context: the context object that will be used to invoke the verb.
+          The options object will be initialized before calling the hook
+        * commandline: the original argv collection used to invoke the client.
+        Returns: True if the command should be allowed to proceed; False if the command
+        should be rejected.
+        """
+
+      def post_command(self, noun, verb, context, commandline, result):
+        """Execute a hook after invoking a verb.
+        * noun: the noun being invoked.
+        * verb: the verb being invoked.
+        * context: the context object that will be used to invoke the verb.
+          The options object will be initialized before calling the hook
+        * commandline: the original argv collection used to invoke the client.
+        * result: the result code returned by the verb.
+        Returns: nothing
+        """
+
+    class GlobalCommandHookRegistry(object):
+      @classmethod
+      def register_command_hook(cls, hook):
+        pass
+
+### Skipping Hooks
+
+To skip a hook, a user uses a command-line option, `--skip-hooks`. The option can either
+specify specific hooks to skip, or "all":
+
+* `aurora --skip-hooks=all job create east/bozo/devel/myjob` will create a job
+  without running any hooks.
+* `aurora --skip-hooks=test,iq job create east/bozo/devel/myjob` will create a job,
+  and will skip only the hooks named "test" and "iq".
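
To make the proposed API in command-hooks.md above concrete, here is a minimal sketch of a hook that enforces the policy from the introduction: rejecting `job killall` for production jobs. The `CommandHook` base class below is a stand-in written for this example so that it runs on its own; it is not imported from the Aurora client. The class name `ProductionKillallGuard`, the hook name, and the assumption that the job key is the last command-line argument with `prod` as its environment component are all hypothetical.

```python
from abc import ABC, abstractmethod


class CommandHook(ABC):
    """Stand-in for the proposed base class (not the Aurora module)."""

    @property
    @abstractmethod
    def name(self):
        """Returns a name for the hook."""

    def get_nouns(self):
        return []

    def get_verbs(self, noun):
        return []

    @abstractmethod
    def pre_command(self, noun, verb, context, commandline):
        """Return True to allow the command, False to reject it."""

    def post_command(self, noun, verb, context, commandline, result):
        pass


class ProductionKillallGuard(CommandHook):
    """Hypothetical policy hook: forbid `job killall` on prod jobs."""

    @property
    def name(self):
        return "prod_killall_guard"

    def get_nouns(self):
        return ["job"]

    def get_verbs(self, noun):
        return ["killall"] if noun == "job" else []

    def pre_command(self, noun, verb, context, commandline):
        # Assumption: the job key (cluster/role/env/name) is the last argument.
        parts = commandline[-1].split("/")
        is_prod = len(parts) == 4 and parts[2] == "prod"
        return not is_prod


hook = ProductionKillallGuard()
devel_allowed = hook.pre_command(
    "job", "killall", None, ["aurora", "job", "killall", "east/bozo/devel/myjob"])
prod_allowed = hook.pre_command(
    "job", "killall", None, ["aurora", "job", "killall", "east/bozo/prod/myjob"])
print(devel_allowed, prod_allowed)
```

Because the hook is keyed on the (noun, verb) pair rather than on the underlying API call, a batched `job kill` never reaches `pre_command` here, which is the distinction the introduction asks for.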

Added: aurora/site/source/documentation/0.21.0/development/scheduler.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/development/scheduler.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/development/scheduler.md (added)
+++ aurora/site/source/documentation/0.21.0/development/scheduler.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,118 @@
+Developing the Aurora Scheduler
+===============================
+
+The Aurora scheduler is written in Java and built with [Gradle](http://gradle.org).
+
+
+Prerequisite
+============
+
+When using Apache Aurora checked out from the source repository or the binary
+distribution, the Gradle wrapper and JavaScript dependencies are provided.
+However, you need to manually install them when using the source release
+downloads:
+
+1. Install Gradle following the instructions on the [Gradle web site](http://gradle.org)
+2. From the root directory of the Apache Aurora project generate the Gradle
+wrapper by running:
+
+    gradle wrapper
+
+
+Getting Started
+===============
+
+You will need Java 8 installed and on your `PATH` or unzipped somewhere with `JAVA_HOME` set. Then
+
+    ./gradlew tasks
+
+will bootstrap the build system and show available tasks. This can take a while the first time you
+run it but subsequent runs will be much faster due to cached artifacts.
+
+Running the Tests
+-----------------
+Aurora has a comprehensive unit test suite. To run the tests use
+
+    ./gradlew build
+
+Gradle will only re-run tests when their dependencies have changed. To force a re-run of all
+tests use
+
+    ./gradlew clean build
+
+Running the build with code quality checks
+------------------------------------------
+To speed up development iteration, the plain gradle commands will not run static analysis tools.
+However, you should run them before posting a review diff, and **always** run them before pushing a
+commit to origin/master.
+
+    ./gradlew build -Pq
+
+Running integration tests
+-------------------------
+To run the same tests that are run in the Apache Aurora continuous integration
+environment:
+
+    ./build-support/jenkins/build.sh
+
+In addition, there is an end-to-end test that runs a suite of aurora commands
+using a virtual cluster:
+
+    ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
+
+Creating a bundle for deployment
+--------------------------------
+Gradle can create a zip file containing Aurora, all of its dependencies, and a launch script with
+
+    ./gradlew distZip
+
+or a tar file containing the same files with
+
+    ./gradlew distTar
+
+The output file will be written to `dist/distributions/aurora-scheduler.zip` or
+`dist/distributions/aurora-scheduler.tar`.
+
+
+
+Developing Aurora Java code
+===========================
+
+Setting up an IDE
+-----------------
+Gradle can generate project files for your IDE. To generate an IntelliJ IDEA project run
+
+    ./gradlew idea
+
+and import the generated `aurora.ipr` file.
+
+Adding or Upgrading a Dependency
+--------------------------------
+New dependencies can be added from Maven central by adding a `compile` dependency to `build.gradle`.
+For example, to add a dependency on `com.example`'s `example-lib` 1.0 add this block:
+
+    compile 'com.example:example-lib:1.0'
+
+NOTE: Anyone thinking about adding a new dependency should first familiarize themselves with the
+Apache Software Foundation's third-party licensing
+[policy](http://www.apache.org/legal/resolved.html#category-x).
+
+
+
+Developing the Aurora Build System
+==================================
+
+Bootstrapping Gradle
+--------------------
+The following files were autogenerated by `gradle wrapper` using gradle's
+[Wrapper](http://www.gradle.org/docs/current/dsl/org.gradle.api.tasks.wrapper.Wrapper.html) plugin and
+should not be modified directly:
+
+    ./gradlew
+    ./gradlew.bat
+    ./gradle/wrapper/gradle-wrapper.jar
+    ./gradle/wrapper/gradle-wrapper.properties
+
+To upgrade Gradle, unpack the new version somewhere, run `/path/to/new/gradle wrapper` in the
+repository root and commit the changed files.
+

Added: aurora/site/source/documentation/0.21.0/development/thermos.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/development/thermos.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/development/thermos.md (added)
+++ aurora/site/source/documentation/0.21.0/development/thermos.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,126 @@
+The Python components of Aurora are built using [Pants](https://pantsbuild.github.io).
+
+
+Python Build Conventions
+========================
+The Python code is laid out according to the following conventions:
+
+1. 1 `BUILD` per 3rd level directory. For a list of current top-level packages run:
+
+        % find src/main/python -maxdepth 3 -mindepth 3 -type d |\
+        while read dname; do echo $dname |\
+            sed 's@src/main/python/\(.*\)/\(.*\)/\(.*\).*@\1.\2.\3@'; done
+
+2.  Each `BUILD` file exports 1
+    [`python_library`](https://pantsbuild.github.io/build_dictionary.html#bdict_python_library)
+    that provides a
+    [`setup_py`](https://pantsbuild.github.io/build_dictionary.html#setup_py)
+    containing each
+    [`python_binary`](https://pantsbuild.github.io/build_dictionary.html#python_binary)
+    in the `BUILD` file, named the same as the directory it's in so that it can be referenced
+    without a ':' character. The `sources` field in the `python_library` will almost always be
+    `rglobs('*.py')`.
+
+3.  Other BUILD files may only depend on this single public `python_library`
+    target. Any other target is considered a private implementation detail and
+    should be prefixed with an `_`.
+
+4.  `python_binary` targets are always named the same as the exported console script.
+
+5.  `python_binary` targets must have identical `dependencies` to the `python_library` exported
+    by the package and must use `entry_point`.
+
+    This means a PEX file generated by pants will contain exactly the same files that will be
+    available on the `PYTHONPATH` in the case of `pip install` of the corresponding library
+    target. This will help our migration off of Pants in the future.
+
+Annotated example - apache.thermos.runner
+-----------------------------------------
+
+    % find src/main/python/apache/thermos/runner
+    src/main/python/apache/thermos/runner
+    src/main/python/apache/thermos/runner/__init__.py
+    src/main/python/apache/thermos/runner/thermos_runner.py
+    src/main/python/apache/thermos/runner/BUILD
+    % cat src/main/python/apache/thermos/runner/BUILD
+    # License boilerplate omitted
+    import os
+
+
+    # Private target so that a setup_py can exist without a circular dependency. Only targets within
+    # this file should depend on this.
+    python_library(
+      name = '_runner',
+      # The target covers every python file under this directory and subdirectories.
+      sources = rglobs('*.py'),
+      dependencies = [
+        '3rdparty/python:twitter.common.app',
+        '3rdparty/python:twitter.common.log',
+        # Source dependencies are always referenced without a ':'.
+        'src/main/python/apache/thermos/common',
+        'src/main/python/apache/thermos/config',
+        'src/main/python/apache/thermos/core',
+      ],
+    )
+
+    # Binary target for thermos_runner.pex. Nothing should depend on this - it's only used as an
+    # argument to ./pants binary.
+    python_binary(
+      name = 'thermos_runner',
+      # Use entry_point, not source so the files used here are the same ones tests see.
+      entry_point = 'apache.thermos.bin.thermos_runner',
+      dependencies = [
+        # Notice that we depend only on the single private target from this BUILD file here.
+        ':_runner',
+      ],
+    )
+
+    # The public library that everyone importing the runner symbols uses.
+    # The test targets and any other dependent source code should depend on this.
+    python_library(
+      name = 'runner',
+      dependencies = [
+        # Again, notice that we depend only on the single private target from this BUILD file here.
+        ':_runner',
+      ],
+      # We always provide a setup_py. This will cause any dependee libraries to automatically
+      # reference this library in their requirements.txt rather than copy the source files into their
+      # sdist.
+      provides = setup_py(
+        # Conventionally named and versioned.
+        name = 'apache.thermos.runner',
+        version = open(os.path.join(get_buildroot(), '.auroraversion')).read().strip().upper(),
+      ).with_binaries({
+        # Every binary in this file should also be repeated here.
+        # Always use the dict-form of .with_binaries so that commands with dashes in their names are
+        # supported.
+        # The console script name is always the same as the PEX with .pex stripped.
+        'thermos_runner': ':thermos_runner',
+      }),
+    )
+
+
+
+Thermos Test resources
+======================
+
+The Aurora source repository and distributions contain several
+[binary files](../../src/test/resources/org/apache/thermos/root/checkpoints) to
+qualify the backwards-compatibility of thermos with checkpoint data. Since
+thermos persists state to disk (to be read by the thermos observer), it is important that we have
+tests that prevent regressions affecting the ability to parse previously-written data.
+
+The files included represent persisted checkpoints that exercise different
+features of thermos. The existing files should not be modified unless
+we are accepting backwards incompatibility, such as with a major release.
+
+It is not practical to write source code to generate these files on the fly,
+as source would be vulnerable to drift (e.g. due to refactoring) in ways
+that would undermine the goal of ensuring backwards compatibility.
+
+The most common reason to add a new checkpoint file would be to provide
+coverage for new thermos features that alter the data format. This is
+accomplished by writing and running a
+[job configuration](../../reference/configuration/) that exercises the feature, and
+copying the checkpoint file from the sandbox directory; by default this is
+`/var/run/thermos/checkpoints/<aurora task id>`.
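
The `find`/`sed` pipeline in convention 1 of thermos.md above lists every third-level directory under `src/main/python` as a dotted package name. A rough Python equivalent is sketched below; the helper name `third_level_packages` is hypothetical, and the demonstration runs against a throwaway directory tree rather than a real Aurora checkout.

```python
import os
import tempfile


def third_level_packages(root):
    """List directories exactly three levels below root as dotted names."""
    packages = []
    for dirpath, dirnames, _ in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth == 3:
            packages.append(rel.replace(os.sep, "."))
            dirnames[:] = []  # do not descend below the third level
    return sorted(packages)


# Demonstrate on a temporary tree that mimics the source layout.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "apache", "thermos", "runner", "sub"))
    os.makedirs(os.path.join(root, "apache", "aurora", "client"))
    demo = third_level_packages(root)
print(demo)
```

In a checkout, calling `third_level_packages("src/main/python")` would be the analogue of the shell command; each name returned should correspond to one `BUILD` file under the conventions above.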

Added: aurora/site/source/documentation/0.21.0/development/thrift.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/development/thrift.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/development/thrift.md (added)
+++ aurora/site/source/documentation/0.21.0/development/thrift.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,54 @@
+Thrift
+======
+
+Aurora uses [Apache Thrift](https://thrift.apache.org/) for representing structured data in the
+client/server RPC protocol as well as for internal data storage. While Thrift is capable of
+correctly handling additions and renames of the existing members, field removals must be done
+carefully to ensure backwards compatibility and provide a predictable deprecation cycle. This
+document describes general guidelines for making Thrift schema changes to the existing fields in
+[api.thrift](https://github.com/apache/aurora/blob/rel/0.21.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+
+It is highly recommended to go through the
+[Thrift: The Missing Guide](http://diwakergupta.github.io/thrift-missing-guide/) first to refresh on
+basic Thrift schema concepts.
+
+Checklist
+---------
+Every existing Thrift schema modification is unique in its requirements and must be analyzed
+carefully to identify its scope and expected consequences. The following checklist may help in that
+analysis:
+* Is this a new field/struct? If yes, go ahead
+* Is this a pure field/struct rename without any type/structure change? If yes, go ahead and rename
+* Anything else, read further to make sure your change is properly planned
+
+Deprecation cycle
+-----------------
+Any time a breaking change (e.g.: field replacement or removal) is required, the following cycle
+must be followed:
+
+### vCurrent
+The change is applied in a way that does not prevent a scheduler/client at this version from
+communicating with a scheduler/client from vCurrent-1.
+* Do not remove or rename the old field
+* Add a new field as an eventual replacement of the old one and implement a dual read/write
+anywhere the old field is used. If a thrift struct is mapped in the DB store make sure both columns
+are marked as `NOT NULL`
+* Check [storage.thrift](https://github.com/apache/aurora/blob/rel/0.21.0/api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if
+the affected struct is stored in Aurora scheduler storage. If so, it's almost certainly also
+necessary to perform a [DB migration](../db-migration/).
+* Add a deprecation jira ticket into the vCurrent+1 release candidate
+* Add a TODO for the deprecated field mentioning the jira ticket
+
+### vCurrent+1
+Finalize the change by removing the deprecated fields from the Thrift schema.
+* Drop any dual read/write routines added in the previous version
+* Remove thrift backfilling in scheduler
+* Remove the deprecated Thrift field
+
+Testing
+-------
+It's always advisable to test your changes in the local vagrant environment to build more
+confidence that your change is backwards compatible. It's easy to simulate different
+client/scheduler versions by playing with the `aurorabuild` command. See [this document](../../getting-started/vagrant/)
+for more.
+
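
The "dual read/write" step of the vCurrent phase in thrift.md above can be sketched as follows. This is a plain-Python model under stated assumptions, not generated Thrift code: `ExampleStruct`, `old_field`, and `new_field` are hypothetical names that do not appear in api.thrift, and the fallback order (new field first) is one reasonable choice rather than a documented Aurora rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleStruct:
    """Hypothetical struct standing in for a Thrift-generated type."""
    old_field: Optional[str] = None  # deprecated in vCurrent, removed in vCurrent+1
    new_field: Optional[str] = None  # eventual replacement for old_field


def write_value(struct, value):
    # Dual write: populate both fields so that a vCurrent-1 peer, which
    # only knows old_field, still sees the data.
    struct.old_field = value
    struct.new_field = value
    return struct


def read_value(struct):
    # Dual read: prefer the new field and fall back to the old one for
    # data written by a vCurrent-1 peer that never set new_field.
    if struct.new_field is not None:
        return struct.new_field
    return struct.old_field


from_old_peer = ExampleStruct(old_field="written-by-vCurrent-1")
print(read_value(from_old_peer))
print(write_value(ExampleStruct(), "written-by-vCurrent"))
```

In vCurrent+1 both helpers collapse to plain access on `new_field`, which corresponds to dropping the dual read/write routines and removing the deprecated field.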

Added: aurora/site/source/documentation/0.21.0/development/ui.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/development/ui.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/development/ui.md (added)
+++ aurora/site/source/documentation/0.21.0/development/ui.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,46 @@
+Developing the Aurora Scheduler UI
+==================================
+
+Installing bower (optional)
+----------------------------
+Third party JS libraries used in Aurora (located at 3rdparty/javascript/bower_components) are
+managed by bower, a JS dependency manager. Bower is only required if you plan to add, remove or
+update JS libraries. Bower can be installed using the following command:
+
+    npm install -g bower
+
+Bower depends on node.js and npm. The easiest way to install node on a Mac is via brew:
+
+    brew install node
+
+For more node.js installation options refer to https://github.com/joyent/node/wiki/Installation.
+
+More info on installing and using bower can be found at: http://bower.io/. Once installed, you can
+use the following commands to view and modify the bower repo at
+3rdparty/javascript/bower_components
+
+    bower list
+    bower install <library name>
+    bower remove <library name>
+    bower update <library name>
+    bower help
+
+
+Faster Iteration in Vagrant
+---------------------------
+The scheduler serves UI assets from the classpath. For production deployments this means the assets
+are served from within a jar. However, for faster development iteration, the vagrant image is
+configured to add the `scheduler` subtree of `/vagrant/dist/resources/main` to the head of
+`CLASSPATH`. This path is configured as a shared filesystem to the path on the host system where
+your Aurora repository lives. This means that any updates under `dist/resources/main/scheduler` in
+your checkout will be reflected immediately in the UI served from within the vagrant image.
+
+The one caveat to this is that this path is under `dist` not `src`. This is because the assets must
+be processed by gradle before they can be served. So, unfortunately, you cannot just save your local
+changes and see them reflected in the UI, you must first run `./gradlew processResources`. This is
+less than ideal, but better than having to restart the scheduler after every change. Additionally,
+gradle makes this process somewhat easier with the use of the `--continuous` flag. If you run:
+`./gradlew processResources --continuous` gradle will monitor the filesystem for changes and run the
+task automatically as necessary. This doesn't quite provide hot-reload capabilities, but it does
+allow for <5s from save to changes being visible in the UI with no further action required on the
+part of the developer.

Added: aurora/site/source/documentation/0.21.0/features/constraints.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/features/constraints.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/features/constraints.md (added)
+++ aurora/site/source/documentation/0.21.0/features/constraints.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,126 @@
+Scheduling Constraints
+======================
+
+By default, Aurora will pick any random agent with sufficient resources
+in order to schedule a task. This scheduling choice can be further
+restricted with the help of constraints.
+
+
+Mesos Attributes
+----------------
+
+Data centers are often organized with hierarchical failure domains.  Common failure domains
+include hosts, racks, rows, and PDUs.  If you have this information available, it is wise to tag
+the Mesos agent with them as
+[attributes](https://mesos.apache.org/documentation/attributes-resources/).
+
+The Mesos agent `--attributes` command line argument can be used to mark agents with
+static key/value pairs, so called attributes (not to be confused with `--resources`, which are
+dynamic and accounted).
+
+For example, consider the host `cluster1-aaa-03-sr2` and its following attributes (given in
+key:value format): `host:cluster1-aaa-03-sr2` and `rack:aaa`.
+
+Aurora makes these attributes available for matching with scheduling constraints.
+
+
+Limit Constraints
+-----------------
+
+Limit constraints let you control machine diversity. The constraint below
+ensures that no more than two instances of your job may run on a single host.
+Think of this as a "group by" limit.
+
+    Service(
+      name = 'webservice',
+      role = 'www-data',
+      constraints = {
+        'host': 'limit:2',
+      }
+      ...
+    )
+
+
+Likewise, you can use constraints to control rack diversity, e.g. at
+most one task per rack:
+
+    constraints = {
+      'rack': 'limit:1',
+    }
+
+Use these constraints sparingly as they can dramatically reduce Tasks' schedulability.
+Further details are available in the reference documentation on
+[Scheduling Constraints](../../reference/configuration/#specifying-scheduling-constraints).
+
+
+
+Value Constraints
+-----------------
+
+Value constraints can be used to express that a certain attribute with a certain value
+should be present on a Mesos agent. For example, the following job would only be
+scheduled on nodes that claim to have an `SSD` as their disk.
+
+    Service(
+      name = 'webservice',
+      role = 'www-data',
+      constraints = {
+        'disk': 'SSD',
+      }
+      ...
+    )
+
+
+Further details are available in the reference documentation on
+[Scheduling Constraints](../../reference/configuration/#specifying-scheduling-constraints).
+
+
+Running stateful services
+-------------------------
+
+Aurora is best suited to run stateless applications, but it also accommodates stateful services
+like databases, or services that otherwise need to always run on the same machines.
+
+### Dedicated attribute
+
+Most Mesos attributes are arbitrary and available for custom use.  There is one exception,
+though: the `dedicated` attribute.  Aurora treats it specially: only matching jobs may run on
+these machines, and matching jobs are scheduled only on these machines.
+
+
+#### Syntax
+The dedicated attribute has semantic meaning. The format is `$role(/.*)?`. When a job is created,
+the scheduler requires that the `$role` component matches the `role` field in the job
+configuration, and will reject the job creation otherwise.  The remainder of the attribute is
+free-form. We've developed the idiom of formatting this attribute as `$role/$job`, but do not
+enforce this. For example: a job `devcluster/www-data/prod/hello` with a dedicated constraint set as
+`www-data/web.multi` will have its tasks scheduled only on Mesos agents configured with:
+`--attributes=dedicated:www-data/web.multi`.
+
+A wildcard (`*`) may be used for the role portion of the dedicated attribute, which will allow any
+owner to elect for a job to run on the host(s). For example: tasks from both
+`devcluster/www-data/prod/hello` and `devcluster/vagrant/test/hello` with a dedicated constraint
+formatted as `*/web.multi` will be scheduled only on Mesos agents configured with
+`--attributes=dedicated:*/web.multi`. This may be useful when assembling a virtual cluster of
+machines sharing the same set of traits or requirements.
+
+##### Example
+Consider the following agent command line:
+
+    mesos-slave --attributes="dedicated:db_team/redis" ...
+
+And this job configuration:
+
+    Service(
+      name = 'redis',
+      role = 'db_team',
+      constraints = {
+        'dedicated': 'db_team/redis'
+      }
+      ...
+    )
+
+The job configuration is indicating that it should only be scheduled on agents with the attribute
+`dedicated:db_team/redis`.  Additionally, Aurora will prevent any tasks that do _not_ have that
+constraint from running on those agents.
+
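As a rough illustration of the `dedicated` rules described in constraints.md above, the following Python sketch mimics the two checks: the role part of `$role(/.*)?` must equal the job's role (or be `*`), and dedicated agents only accept jobs carrying the same value. The function names are hypothetical and this is not the scheduler's actual code; it also assumes an agent carries at most one `dedicated` value.

```python
def role_may_use_dedicated(job_role, dedicated_value):
    """Return True if a job owned by `job_role` may request this dedicated value."""
    # The format is `$role(/.*)?`: everything before the first '/' is the role part.
    role_part = dedicated_value.split("/", 1)[0]
    # A wildcard role lets any owner elect to run on the dedicated hosts.
    return role_part == "*" or role_part == job_role


def agent_accepts(job_dedicated, agent_dedicated):
    """Return True if the job may run on the agent.

    Both arguments are the dedicated value, or None when it is not set.
    Dedicated agents reject jobs without the matching constraint, and jobs
    with the constraint are only placed on matching agents.
    """
    return job_dedicated == agent_dedicated


if __name__ == "__main__":
    print(role_may_use_dedicated("db_team", "db_team/redis"))
    print(agent_accepts(None, "db_team/redis"))
```

With the example from the text, role `db_team` passes the first check for `db_team/redis`, while a job without the constraint is kept off the agent started with `--attributes="dedicated:db_team/redis"`.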

Added: aurora/site/source/documentation/0.21.0/features/containers.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/features/containers.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/features/containers.md (added)
+++ aurora/site/source/documentation/0.21.0/features/containers.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,130 @@
+Containers
+==========
+
+Aurora supports several containerizers, notably the Mesos containerizer and the Docker
+containerizer. The Mesos containerizer uses native OS features directly to provide isolation between
+containers, while the Docker containerizer delegates container management to the Docker engine.
+
+The support for launching container images via both containerizers has to be
+[enabled by a cluster operator](../../operations/configuration/#containers).
+
+Mesos Containerizer
+-------------------
+
+The Mesos containerizer is the native Mesos containerizer solution. It allows tasks to be
+run with an array of [pluggable isolators](../resource-isolation/) and can launch tasks using
+[Docker](https://github.com/docker/docker/blob/master/image/spec/v1.md) images,
+[AppC](https://github.com/appc/spec/blob/master/SPEC.md) images, or directly on the agent host
+filesystem.
+
+The following example (available in our [Vagrant environment](../../getting-started/vagrant/))
+launches a hello world example within a `debian/jessie` Docker image:
+
+    $ cat /vagrant/examples/jobs/hello_docker_image.aurora
+    hello_loop = Process(
+      name = 'hello',
+      cmdline = """
+        while true; do
+          echo hello world
+          sleep 10
+        done
+      """)
+
+    task = Task(
+      processes = [hello_loop],
+      resources = Resources(cpu=1, ram=1*MB, disk=8*MB)
+    )
+
+    jobs = [
+      Service(
+        cluster = 'devcluster',
+        environment = 'devel',
+        role = 'www-data',
+        name = 'hello_docker_image',
+        task = task,
+        container = Mesos(image=DockerImage(name='debian', tag='jessie'))
+      )
+    ]
+
+Docker and Appc images are designated using an appropriate `image` property of the `Mesos`
+configuration object. If either `container` or `image` is left unspecified, the host filesystem
+will be used. Further details of how to specify images can be found in the
+[Reference Documentation](../../reference/configuration/#mesos-object).
+
+By default, Aurora launches processes as the Linux user named after the job's role (e.g. `www-data`
+in the example above). This user has to exist on the host filesystem. If it does not exist within
+the container image, it will be created automatically. Otherwise, this user and its primary group
+have to exist in the image with matching uid/gid.
+
+For more information on the Mesos containerizer filesystem, namespace, and isolator features, visit
+[Mesos Containerizer](http://mesos.apache.org/documentation/latest/mesos-containerizer/) and
+[Mesos Container Images](http://mesos.apache.org/documentation/latest/container-image/).
+
+
+Docker Containerizer
+--------------------
+
+The Docker containerizer launches container images using the Docker engine. It may often provide
+more advanced features than the native Mesos containerizer, but has to be installed separately to
+Mesos on each agent host.
+
+Starting with the 0.17.0 release, `image` can be specified with a `{{docker.image[name][tag]}}` binder so that
+the tag can be resolved to a concrete image digest. This ensures that the job always uses the same image
+across restarts, even if the version identified by the tag has been updated, guaranteeing that only job
+updates can mutate configuration.
+
+Example (available in the [Vagrant environment](../../getting-started/vagrant/)):
+
+    $ cat /vagrant/examples/jobs/hello_docker_engine.aurora
+    hello_loop = Process(
+      name = 'hello',
+      cmdline = """
+        while true; do
+          echo hello world
+          sleep 10
+        done
+      """)
+
+    task = Task(
+      processes = [hello_loop],
+      resources = Resources(cpu=1, ram=1*MB, disk=8*MB)
+    )
+
+    jobs = [
+      Service(
+        cluster = 'devcluster',
+        environment = 'devel',
+        role = 'www-data',
+        name = 'hello_docker',
+        task = task,
+        container = Docker(image = 'python:2.7')
+      ), Service(
+        cluster = 'devcluster',
+        environment = 'devel',
+        role = 'www-data',
+        name = 'hello_docker_engine_binding',
+        task = task,
+        container = Docker(image = '{{docker.image[library/python][2.7]}}')
+      )
+    ]
+
+Note that this feature requires a v2 Docker registry. If you use a private Docker registry, its URL
+must be specified in the `clusters.json` configuration file under the key `docker_registry`.
+If not specified, `docker_registry` defaults to `https://registry-1.docker.io` (Docker Hub).
+
+Example:
+    # clusters.json
+    [{
+      "name": "devcluster",
+      ...
+      "docker_registry": "https://registry.example.com"
+    }]
+
+Details of how to use Docker via the Docker engine can be found in the
+[Reference Documentation](../../reference/configuration/#docker-object). Please note that in order to
+correctly execute processes inside a job, the Docker container must have Python 2.7 and potentially
+further Mesos dependencies installed. This limitation does not hold for Docker containers used via
+the Mesos containerizer.
+
+For more information on launching Docker containers through the Docker containerizer, visit
+[Docker Containerizer](http://mesos.apache.org/documentation/latest/docker-containerizer/)

Added: aurora/site/source/documentation/0.21.0/features/cron-jobs.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/features/cron-jobs.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/features/cron-jobs.md (added)
+++ aurora/site/source/documentation/0.21.0/features/cron-jobs.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,124 @@
+# Cron Jobs
+
+Aurora supports execution of scheduled jobs on a Mesos cluster using cron-style syntax.
+
+- [Overview](#overview)
+- [Collision Policies](#collision-policies)
+- [Failure recovery](#failure-recovery)
+- [Interacting with cron jobs via the Aurora CLI](#interacting-with-cron-jobs-via-the-aurora-cli)
+	- [cron schedule](#cron-schedule)
+	- [cron deschedule](#cron-deschedule)
+	- [cron start](#cron-start)
+	- [job killall, job restart, job kill](#job-killall-job-restart-job-kill)
+- [Technical Note About Syntax](#technical-note-about-syntax)
+- [Caveats](#caveats)
+	- [Failovers](#failovers)
+	- [Collision policy is best-effort](#collision-policy-is-best-effort)
+	- [Timezone Configuration](#timezone-configuration)
+
+## Overview
+
+A job is identified as a cron job by the presence of a
+`cron_schedule` attribute containing a cron-style schedule in the
+[`Job`](../../reference/configuration/#job-objects) object. Examples of cron schedules
+include "every 5 minutes" (`*/5 * * * *`), "Fridays at 17:00" (`0 17 * * FRI`), and
+"the 1st and 15th day of the month at 03:00" (`0 3 1,15 * *`).
+
+Example (available in the [Vagrant environment](../../getting-started/vagrant/)):
+
+    $ cat /vagrant/examples/jobs/cron_hello_world.aurora
+    # A cron job that runs every 5 minutes.
+    jobs = [
+      Job(
+        cluster = 'devcluster',
+        role = 'www-data',
+        environment = 'test',
+        name = 'cron_hello_world',
+        cron_schedule = '*/5 * * * *',
+        task = SimpleTask(
+          'cron_hello_world',
+          'echo "Hello world from cron, the time is now $(date --rfc-822)"'),
+      ),
+    ]
+
+## Collision Policies
+
+The `cron_collision_policy` field specifies the scheduler's behavior when a new cron job is
+triggered while an older run hasn't finished. The scheduler has two policies available:
+
+* `KILL_EXISTING`: The default policy - on a collision the old instances are killed and new instances with the current
+configuration are started.
+* `CANCEL_NEW`: On a collision the new run is cancelled.
+
+Note that the use of `CANCEL_NEW` is likely a code smell - interrupted cron jobs should be able
+to recover their progress on a subsequent invocation, otherwise they risk having their work queue
+grow faster than they can process it.
+
+## Failure recovery
+
+Unlike with services, which Aurora will always re-execute regardless of exit status, instances of
+cron jobs retry according to the `max_task_failures` attribute of the
+[Task](../../reference/configuration/#task-object) object. To get "run-until-success" semantics,
+set `max_task_failures` to `-1`.
+
+## Interacting with cron jobs via the Aurora CLI
+
+Most interaction with cron jobs takes place using the `cron` subcommand. See `aurora cron -h`
+for up-to-date usage instructions.
+
+### cron schedule
+Schedules a new cron job on the Aurora cluster for later runs or replaces the existing cron template
+with a new one. Only future runs will be affected; any existing active tasks are left intact.
+
+    $ aurora cron schedule devcluster/www-data/test/cron_hello_world /vagrant/examples/jobs/cron_hello_world.aurora
+
+### cron deschedule
+Deschedules a cron job, preventing future runs but allowing current runs to complete.
+
+    $ aurora cron deschedule devcluster/www-data/test/cron_hello_world
+
+### cron start
+Start a cron job immediately, outside of its normal cron schedule.
+
+    $ aurora cron start devcluster/www-data/test/cron_hello_world
+
+### job killall, job restart, job kill
+Cron jobs create instances running on the cluster that you can interact with like normal Aurora
+tasks with `job kill` and `job restart`.
+
+
+## Technical Note About Syntax
+
+`cron_schedule` uses a restricted subset of BSD crontab syntax. While the
+execution engine currently uses Quartz, the schedule parsing is custom, a subset of FreeBSD
+[crontab(5)](http://www.freebsd.org/cgi/man.cgi?crontab(5)) syntax. See
+[the source](https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/cron/CrontabEntry.java#L106-L124)
+for details.
+
+
+## Caveats
+
+### Failovers
+No failover recovery. Aurora does not record the latest minute it fired
+triggers for across failovers. Therefore it's possible to miss triggers
+on failover. Note that this behavior may change in the future.
+
+It's necessary to sync time between schedulers with something like `ntpd`.
+Clock skew could cause double or missed triggers in the case of a failover.
+
+### Collision policy is best-effort
+Aurora aims to always have *at least one copy* of a given instance running at a time - it's
+an AP system, meaning it chooses Availability and Partition Tolerance at the expense of
+Consistency.
+
+If your collision policy was `CANCEL_NEW` and a task has terminated but
+Aurora has not noticed this, Aurora will go ahead and create your new
+task.
+
+If your collision policy was `KILL_EXISTING` and a task was marked `LOST`
+but not yet GCed, Aurora will go ahead and create your new task without
+attempting to kill the old one (outside the GC interval).
+
+### Timezone Configuration
+Cron timezone is configured independently of JVM timezone with the `-cron_timezone` flag and
+defaults to UTC.
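
The two collision policies described in cron-jobs.md above can be summarised as a small decision function. This is a hedged Python sketch, not Aurora's implementation: `resolve_collision` and the returned keys are hypothetical names, and it ignores the best-effort caveats (stale task state, `LOST` tasks) noted in the text.

```python
def resolve_collision(policy, old_run_active):
    """Decide what happens when a cron trigger fires.

    Returns a dict with two booleans: whether the old run is killed and
    whether a new run is started.
    """
    if not old_run_active:
        # No collision: the new run simply starts.
        return {"kill_old": False, "start_new": True}
    if policy == "KILL_EXISTING":
        # Default policy: replace the old instances with new ones.
        return {"kill_old": True, "start_new": True}
    if policy == "CANCEL_NEW":
        # Let the old run continue and drop the new trigger.
        return {"kill_old": False, "start_new": False}
    raise ValueError("unknown cron_collision_policy: %r" % (policy,))


if __name__ == "__main__":
    print(resolve_collision("KILL_EXISTING", old_run_active=True))
    print(resolve_collision("CANCEL_NEW", old_run_active=True))
```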

Added: aurora/site/source/documentation/0.21.0/features/custom-executors.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/features/custom-executors.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/features/custom-executors.md (added)
+++ aurora/site/source/documentation/0.21.0/features/custom-executors.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,166 @@
+Custom Executors
+================
+
+If the need arises to use a Mesos executor other than the Thermos executor, the scheduler can be
+configured to utilize a custom executor by specifying the `-custom_executor_config` flag.
+The flag must be set to the path of a valid executor configuration file.
+
+The configuration file must be a valid **JSON array** and contain, at minimum,
+one executor configuration including the name, command and resources fields and
+must be pointed to by the `-custom_executor_config` flag when the scheduler is
+started.
+
+### Array Entry
+
+Property                 | Description
+-----------------------  | ---------------------------------
+executor (required)      | Description of executor.
+task_prefix (required)   | Prefix given to tasks launched with this executor's configuration.
+volume_mounts (optional) | Volumes to be mounted in container running executor.
+
+#### executor
+
+Property                 | Description
+-----------------------  | ---------------------------------
+name (required)          | Name of the executor.
+command (required)       | How to run the executor.
+resources (required)     | Overhead to use for each executor instance.
+
+#### command
+
+Property                 | Description
+-----------------------  | ---------------------------------
+value (required)         | The command to execute.
+arguments (optional)     | A list of arguments to pass to the command.
+uris (optional)          | List of resources to download into the task sandbox.
+shell (optional)         | Run executor via shell.
+
+A note on the command property (from [mesos.proto](https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto)):
+
+```
+1) If 'shell == true', the command will be launched via shell
+   (i.e., /bin/sh -c 'value'). The 'value' specified will be
+   treated as the shell command. The 'arguments' will be ignored.
+2) If 'shell == false', the command will be launched by passing
+   arguments to an executable. The 'value' specified will be
+   treated as the filename of the executable. The 'arguments'
+   will be treated as the arguments to the executable. This is
+   similar to how POSIX exec families launch processes (i.e.,
+   execlp(value, arguments(0), arguments(1), ...)).
+```
+
+##### uris (list)
+* Follows the [Mesos Fetcher schema](http://mesos.apache.org/documentation/latest/fetcher/)
+
+Property                 | Description
+-----------------------  | ---------------------------------
+value (required)         | Path to the resource needed in the sandbox.
+executable (optional)    | Change resource to be executable via chmod.
+extract (optional)       | Extract files from packed or compressed archives into the sandbox.
+cache (optional)         | Use caching mechanism provided by Mesos for resources.
+
+#### resources (list)
+
+Property             | Description
+-------------------  | ---------------------------------
+name (required)      | Name of the resource: cpus or mem.
+type (required)      | Type of resource. Should always be SCALAR.
+scalar (required)    | Value in float for cpus or int for mem (in MBs)
+
+### volume_mounts (list)
+
+Property                     | Description
+---------------------------  | ---------------------------------
+host_path (required)         | Host path to mount inside the container.
+container_path (required)    | Path inside the container where `host_path` will be mounted.
+mode (required)              | Mode in which to mount the volume, Read-Write (RW) or Read-Only (RO).
+
+A sample configuration is as follows:
+
+```json
+[
+    {
+      "executor": {
+        "name": "myExecutor",
+        "command": {
+          "value": "myExecutor.a",
+          "shell": "false",
+          "arguments": [
+            "localhost:2181",
+            "-verbose",
+            "-config myConfiguration.config"
+          ],
+          "uris": [
+            {
+              "value": "/dist/myExecutor.a",
+              "executable": true,
+              "extract": false,
+              "cache": true
+            },
+            {
+              "value": "/home/user/myConfiguration.config",
+              "executable": false,
+              "extract": false,
+              "cache": false
+            }
+          ]
+        },
+        "resources": [
+          {
+            "name": "cpus",
+            "type": "SCALAR",
+            "scalar": {
+              "value": 1.00
+            }
+          },
+          {
+            "name": "mem",
+            "type": "SCALAR",
+            "scalar": {
+              "value": 512
+            }
+          }
+        ]
+      },
+      "volume_mounts": [
+        {
+          "mode": "RO",
+          "container_path": "/path/on/container",
+          "host_path": "/path/to/host/directory"
+        },
+        {
+          "mode": "RW",
+          "container_path": "/container",
+          "host_path": "/host"
+        }
+      ],
+      "task_prefix": "my-executor-"
+    }
+]
+```
+
+It should be noted that if you do not use Thermos or a Thermos based executor, links in the scheduler's
+Web UI for tasks will not work (at least for the time being).
+Some information about launched tasks can still be accessed via the Mesos Web UI or via the Aurora Client.
+
+### Using a custom executor
+
+To launch tasks using a custom executor,
+an [ExecutorConfig](../../reference/configuration/#executorconfig-objects) object must be added to
+the Job or Service object. The `name` parameter of ExecutorConfig must match the name of an executor
+defined in the JSON object provided to the scheduler at startup time.
+
+For example, if we desire to launch tasks using `myExecutor` (defined above), we may do so in
+the following manner:
+
+```
+jobs = [Service(
+  task = task,
+  cluster = 'devcluster',
+  role = 'www-data',
+  environment = 'prod',
+  name = 'hello',
+  executor_config = ExecutorConfig(name='myExecutor'))]
+```
+
+This will create a Service Job which will launch tasks using myExecutor instead of Thermos.
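
Before pointing `-custom_executor_config` at a file, it can help to check the required fields listed in custom-executors.md above. The Python sketch below does only that shape check. `check_executor_config` is a hypothetical helper, not part of Aurora, and the scheduler's own validation may be stricter or differ in detail.

```python
import json

# Fields the tables above mark as required inside "executor".
REQUIRED_EXECUTOR_FIELDS = ("name", "command", "resources")


def check_executor_config(text):
    """Return a list of problems; an empty list means the basic shape looks right."""
    config = json.loads(text)
    # The file must be a JSON array with at least one executor configuration.
    if not isinstance(config, list) or not config:
        return ["top level must be a non-empty JSON array"]
    problems = []
    for i, entry in enumerate(config):
        if not isinstance(entry, dict):
            problems.append("entry %d: must be a JSON object" % i)
            continue
        if "task_prefix" not in entry:
            problems.append("entry %d: missing task_prefix" % i)
        executor = entry.get("executor")
        if not isinstance(executor, dict):
            problems.append("entry %d: missing executor" % i)
            continue
        for field in REQUIRED_EXECUTOR_FIELDS:
            if field not in executor:
                problems.append("entry %d: executor is missing %s" % (i, field))
        command = executor.get("command")
        # "value" is the only required property of the command object.
        if isinstance(command, dict) and "value" not in command:
            problems.append("entry %d: command is missing value" % i)
    return problems


if __name__ == "__main__":
    sample = '[{"task_prefix": "my-executor-", "executor": {"name": "myExecutor"}}]'
    for problem in check_executor_config(sample):
        print(problem)
```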

Added: aurora/site/source/documentation/0.21.0/features/job-updates.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/features/job-updates.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/features/job-updates.md (added)
+++ aurora/site/source/documentation/0.21.0/features/job-updates.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,123 @@
+Aurora Job Updates
+==================
+
+`Job` configurations can be updated at any point in their lifecycle.
+Usually updates are done incrementally using a process called a *rolling
+upgrade*, in which Tasks are upgraded in small groups, one group at a
+time.  Updates are done using various Aurora Client commands.
+
+
+Rolling Job Updates
+-------------------
+
+There are several sub-commands to manage job updates:
+
+    aurora update start <job key> <configuration file>
+    aurora update info <job key>
+    aurora update pause <job key>
+    aurora update resume <job key>
+    aurora update abort <job key>
+    aurora update list <cluster>
+
+When you `start` a job update, the command will return once it has sent the
+instructions to the scheduler.  At that point, you may view detailed
+progress for the update with the `info` subcommand, in addition to viewing
+graphical progress in the web browser.  You may also get a full listing of
+in-progress updates in a cluster with `list`.
+
+Once an update has been started, you can `pause` to keep the update but halt
+progress.  This can be useful for things like debugging a partially-updated
+job to determine whether you would like to proceed.  You can `resume` to
+proceed.
+
+You may `abort` a job update regardless of the state it is in. This will
+instruct the scheduler to completely abandon the job update and leave the job
+in the current (possibly partially-updated) state.
+
+For a configuration update, the Aurora Scheduler calculates required changes
+by examining the current job config state and the new desired job config.
+It then starts a *rolling batched update process* by going through every batch
+and performing these operations, in order:
+
+- If an instance is not present in the scheduler but is present in
+  the new config, then the instance is created.
+- If an instance is present in both the scheduler and the new config, then
+  the scheduler diffs both task configs. If it detects any changes, it
+  performs an instance update by killing the old config instance and adding
+  the new config instance.
+- If an instance is present in the scheduler but isn't in the new config,
+  then that instance is killed.
+
+The Aurora Scheduler continues through the instance list until all tasks are
+updated and in `RUNNING`. If the scheduler determines the update is not going
+well (based on the criteria specified in the UpdateConfig), it cancels the update.
+
+Update cancellation runs a procedure similar to the update sequence described
+above, but in reverse order. New instance configs are swapped
+with old instance configs and batch updates proceed backwards
+from the point where the update failed. For example, (0,1,2) (3,4,5) (6,7,
+8-FAIL) results in a rollback in order (8,7,6) (5,4,3) (2,1,0).
+
+For details on how to control a job update, please see the
+[UpdateConfig](../../reference/configuration/#updateconfig-objects) configuration object.
+
+
+Coordinated Job Updates
+------------------------
+
+Some Aurora services may benefit from having more control over updates by explicitly
+acknowledging ("heartbeating") job update progress. This may be helpful for mission-critical
+service updates where explicit job health monitoring is vital during the entire job update
+lifecycle. Such job updates would rely on an external service (or a custom client) periodically
+pulsing an active coordinated job update via a
+[pulseJobUpdate RPC](https://github.com/apache/aurora/blob/rel/0.21.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+
+A coordinated update is defined by setting a positive
+[pulse_interval_secs](../../reference/configuration/#updateconfig-objects) value in the job configuration
+file. If no pulses are received within the specified interval, the update will be blocked. A blocked
+update is unable to continue rolling forward (or rolling back) but retains its active status.
+It may only be unblocked by a fresh `pulseJobUpdate` call.
+
+NOTE: A coordinated update starts in `ROLL_FORWARD_AWAITING_PULSE` state and will not make any
+progress until the first pulse arrives. However, a paused update (`ROLL_FORWARD_PAUSED` or
+`ROLL_BACK_PAUSED`) is still considered active and upon resuming will immediately make progress
+provided the pulse interval has not expired.
+
+
+SLA-Aware Updates
+-----------------
+
+Updates can take advantage of [Custom SLA Requirements](../../features/sla-requirements/) and
+specify the `sla_aware=True` option within
+[UpdateConfig](../../reference/configuration/#updateconfig-objects) to only update instances if
+the action will maintain the task's SLA requirements. This feature allows updates to avoid killing
+too many instances in the face of unexpected failures outside of the update range.
+
+See [Using the `sla_aware` option](../../reference/configuration/#using-the-sla-aware-option)
+for more information on how to use this feature.
+
+
+Canary Deployments
+------------------
+
+Canary deployments are a pattern for rolling out updates to a subset of job instances,
+in order to test different code versions alongside the actual production job.
+It is a risk-mitigation strategy for job owners and commonly used in a form where
+job instance 0 runs with a different configuration than the instances 1-N.
+
+For example, consider a job with 4 instances that each
+request 1 core of cpu, 1 GB of RAM, and 1 GB of disk space as specified
+in the configuration file `hello_world.aurora`. If you want to
+update it so it requests 2 GB of RAM instead of 1, you can create a new
+configuration file to do that called `new_hello_world.aurora` and
+issue
+
+    aurora update start <job_key_value>/0-1 new_hello_world.aurora
+
+This results in instances 0 and 1 having 1 cpu, 2 GB of RAM, and 1 GB of disk space,
+while instances 2 and 3 have 1 cpu, 1 GB of RAM, and 1 GB of disk space. If instance 3
+dies and restarts, it restarts with 1 cpu, 1 GB RAM, and 1 GB disk space.
+
+So that means there are two simultaneous task configurations for the same job,
+each valid for a different range of instances. While this isn't a recommended
+pattern, it is valid and supported by the Aurora scheduler.
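
The rollback ordering described in job-updates.md above, (0,1,2) (3,4,5) (6,7,8-FAIL) rolling back as (8,7,6) (5,4,3) (2,1,0), can be reproduced with a short Python sketch. The function names are hypothetical, and the sketch assumes every instance of the failing batch counts as touched, which the text implies but does not state outright.

```python
def batches(instance_ids, batch_size):
    """Split the instance list into consecutive update batches."""
    return [
        tuple(instance_ids[i:i + batch_size])
        for i in range(0, len(instance_ids), batch_size)
    ]


def rollback_order(instance_ids, batch_size, failed_instance):
    """Return the rollback batches after `failed_instance` fails to update."""
    touched = []
    for batch in batches(instance_ids, batch_size):
        touched.append(batch)
        if failed_instance in batch:
            # The update stops at the batch containing the failure.
            break
    # Roll back from the point of failure: last batch first, and each
    # batch in reverse instance order.
    return [tuple(reversed(batch)) for batch in reversed(touched)]


if __name__ == "__main__":
    print(rollback_order(list(range(9)), 3, failed_instance=8))
```

Batches that were never started (those after the failing one) do not appear in the result, since they still run the old configuration.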

Added: aurora/site/source/documentation/0.21.0/features/mesos-fetcher.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/features/mesos-fetcher.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/features/mesos-fetcher.md (added)
+++ aurora/site/source/documentation/0.21.0/features/mesos-fetcher.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,46 @@
+Mesos Fetcher
+=============
+
+Mesos has support for downloading resources into the sandbox through the
+use of the [Mesos Fetcher](http://mesos.apache.org/documentation/latest/fetcher/).
+
+Aurora supports passing URIs to the Mesos Fetcher dynamically by including
+a list of URIs in job submissions.
+
+How to use
+----------
+The scheduler flag `-enable_mesos_fetcher` must be set to true.
+
+Currently only the scheduler side of this feature has been implemented,
+so a modification to the existing client or a custom Thrift client is required
+to make use of this feature.
+
+If using a custom Thrift client, the list of URIs must be included in TaskConfig
+as the `mesosFetcherUris` field.
+
+Each Mesos Fetcher URI has the following data members:
+
+|Property | Description|
+|---------|------|
+|value (required)  |Path to the resource needed in the sandbox.|
+|extract (optional)|Extract files from packed or compressed archives into the sandbox.|
+|cache (optional) | Use caching mechanism provided by Mesos for resources.|
+
+Note that this structure is very similar to the one provided for downloading
+resources needed for a [custom executor](../../operations/configuration/).
+
+This is because both features use the Mesos Fetcher to retrieve resources into
+the sandbox. The difference is that the custom executor feature uses a static set
+of URIs configured on the server side, whereas the Mesos Fetcher feature uses a
+dynamic set of URIs supplied at the time of job submission.
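+
+As a rough illustration, the three data members in the table above can be modeled in
+Python as follows. This is only a sketch: the `FetcherUri` class and the `to_payload`
+helper are hypothetical names, not part of any Aurora client, and the generated Thrift
+bindings of a real client will look different.
+

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FetcherUri:
    """Hypothetical model of one Mesos Fetcher URI entry (not an Aurora API)."""

    value: str                      # required: path to the resource
    extract: Optional[bool] = None  # optional: unpack archives into the sandbox
    cache: Optional[bool] = None    # optional: use the Mesos caching mechanism

    def __post_init__(self) -> None:
        # 'value' is the only required member according to the table.
        if not self.value:
            raise ValueError("value is required")


def to_payload(uris: List[FetcherUri]) -> List[dict]:
    """Convert entries to plain dicts, leaving out optional members that are unset."""
    return [
        {key: val for key, val in asdict(uri).items() if val is not None}
        for uri in uris
    ]
```

+
+A list built this way would then have to be mapped onto the `mesosFetcherUris` field
+by whatever Thrift client is in use.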
+
+Security Implications
+---------------------
+There are security implications that must be taken into account when enabling this feature.
+**Enabling this feature may potentially enable any job submitting user to perform a privilege escalation.**
+
+Until a more thorough solution is created, one step that has been taken to mitigate this issue
+is to statically mark every user-submitted URI as non-executable. This is in contrast to the set of URIs
+configured for the custom executor feature, which may mark any URI as executable.
+
+If the need arises to mark a downloaded URI as executable, please consider using the custom executor feature.
\ No newline at end of file

Added: aurora/site/source/documentation/0.21.0/features/multitenancy.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/features/multitenancy.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/features/multitenancy.md (added)
+++ aurora/site/source/documentation/0.21.0/features/multitenancy.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,82 @@
+Multitenancy
+============
+
+Aurora is a multi-tenant system that can run jobs of multiple clients/tenants.
+Going beyond the [resource isolation on an individual host](../resource-isolation/), it is
+crucial to prevent those jobs from stepping on each other's toes.
+
+
+Job Namespaces
+--------------
+
+The namespace for jobs in Aurora follows a hierarchical structure. This is meant to make it easier
+to differentiate between different jobs. A job key consists of four parts. The four parts are
+`<cluster>/<role>/<environment>/<jobname>` in that order:
+
+* Cluster refers to the name of a particular Aurora installation.
+* Role names are user accounts.
+* Environment names are namespaces.
+* Jobname is the custom name of your job.
+
+Role names correspond to user accounts. They are used for
+[authentication](../../operations/security/), as the linux user used to run jobs, and for the
+assignment of [quota](#preemption). If you don't know what accounts are available, contact your
+sysadmin.
+
+The environment component of the job key serves as a namespace. The scheduler validates
+environment values, by default allowing `devel`, `test`, `production`, and any value matching
+the regular expression `staging[0-9]*`. This validation can be changed to allow any arbitrary
+regular expression by setting the scheduler option `allowed_job_environments`.
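+
+The job key structure and the default environment check can be sketched in Python as
+shown here. The `JobKey` type, the `parse_job_key` function and the exact regular
+expression are assumptions derived from the description above; they are not taken from
+the scheduler source.
+

```python
import re
from typing import NamedTuple, Pattern

# Assumption: a regex equivalent to the default values listed in the text.
DEFAULT_ENVIRONMENTS = re.compile(r"^(devel|test|production|staging[0-9]*)$")


class JobKey(NamedTuple):
    cluster: str
    role: str
    environment: str
    name: str


def parse_job_key(key: str, allowed: Pattern[str] = DEFAULT_ENVIRONMENTS) -> JobKey:
    """Split '<cluster>/<role>/<environment>/<jobname>' and validate the environment."""
    parts = key.split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(
            f"expected <cluster>/<role>/<environment>/<jobname>, got {key!r}"
        )
    job_key = JobKey(*parts)
    if not allowed.match(job_key.environment):
        raise ValueError(f"environment {job_key.environment!r} is not allowed")
    return job_key
```

+
+For example, `parse_job_key("devcluster/www-data/staging2/hello")` is accepted by this
+sketch, while an environment such as `qa` is rejected unless a different pattern is
+passed in.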
+
+None of the values imply any difference in scheduling behavior. By convention, the
+environment indicates the level of stability of the job, reflecting how much testing
+has been performed on the application code. For example, releases of a typical job
+may progress through the following phases in order of increasing stability:
+`devel`, `test`, `staging`, `production`.
+
+
+Configuration Tiers
+-------------------
+
+A tier is a predefined bundle of task configuration options. Aurora schedules tasks and assigns them
+resources based on their tier assignment. The default scheduler tier configuration allows for
+three tiers:
+
+ - `revocable`: The `revocable` tier requires the task to run with [revocable](../resource-isolation/#oversubscription)
+ resources.
+ - `preemptible`: Setting the task’s tier to `preemptible` allows for the possibility of that task
+ being [preempted](#preemption) by other tasks when the cluster is running low on resources.
+ - `preferred`: The `preferred` tier prevents the task from using [revocable](../resource-isolation/#oversubscription)
+ resources and from being [preempted](#preemption).
+
+Since it is possible that a cluster is configured with a custom tier configuration, users should
+consult their cluster administrator to be informed of the tiers supported by the cluster. Attempts
+to schedule jobs with an unsupported tier will be rejected by the scheduler.
+
+
+Preemption
+----------
+
+In order to guarantee that important production jobs are always running, Aurora supports
+preemption.
+
+Consider a pending job that is a candidate for scheduling but cannot be scheduled because of a
+resource shortage. Active tasks can become victims of preemption if:
+
+ - both candidate and victim are owned by the same role and the
+   [priority](../../reference/configuration/#job-objects) of a victim is lower than the
+   [priority](../../reference/configuration/#job-objects) of the candidate.
+ - OR a victim is a `preemptible` or `revocable` [tier](#configuration-tiers) task and the candidate
+   is a `preferred` [tier](#configuration-tiers) task.
+
+In other words, tasks from `preferred` [tier](../../reference/configuration/#job-objects) jobs may
+preempt tasks from any `preemptible` or `revocable` job. However, a `preferred` task may only be
+preempted by tasks from `preferred` jobs in the same role with higher [priority](../../reference/configuration/#job-objects).
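+
+The two bulleted conditions above can be written down as a small Python predicate. This
+is a literal transcription of those two conditions only; the `TaskInfo` type and the
+`may_preempt` function are hypothetical, and the real scheduler may apply further checks.
+

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInfo:
    """Hypothetical summary of the task attributes relevant to preemption."""

    role: str
    tier: str      # 'revocable', 'preemptible' or 'preferred'
    priority: int


def may_preempt(candidate: TaskInfo, victim: TaskInfo) -> bool:
    """Return True if either of the two documented conditions holds."""
    # Condition 1: same role and the victim has a lower priority.
    same_role_higher_priority = (
        candidate.role == victim.role and victim.priority < candidate.priority
    )
    # Condition 2: a preferred candidate against a preemptible or revocable victim.
    preferred_over_lower_tier = (
        candidate.tier == "preferred"
        and victim.tier in ("preemptible", "revocable")
    )
    return same_role_higher_priority or preferred_over_lower_tier
```

+
+With this sketch, a `preferred` task in role `www-data` may preempt a `revocable` task in
+any role, while two `preferred` tasks in different roles never preempt each other.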
+
+Aurora requires resource quotas for [production non-dedicated jobs](../../reference/configuration/#job-objects).
+Quota is enforced at the job role level and when set, defines a non-preemptible pool of compute resources within
+that role. All job types (service, adhoc or cron) require role resource quota unless the job has a
+[dedicated constraint set](../constraints/#dedicated-attribute).
+
+To grant quota to a particular role in production, an operator can use the command
+`aurora_admin set_quota`.

Added: aurora/site/source/documentation/0.21.0/features/resource-isolation.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/features/resource-isolation.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/features/resource-isolation.md (added)
+++ aurora/site/source/documentation/0.21.0/features/resource-isolation.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,181 @@
+Resource Isolation and Sizing
+=============================
+
+This document assumes Aurora and Mesos have been configured
+using our [recommended resource isolation settings](../../operations/configuration/#resource-isolation).
+
+- [Isolation](#isolation)
+- [Sizing](#sizing)
+- [Oversubscription](#oversubscription)
+
+
+Isolation
+---------
+
+Aurora is a multi-tenant system; a single software instance runs on a
+server, serving multiple clients/tenants. To share resources among
+tenants, it leverages Mesos for isolation of:
+
+* CPU
+* GPU
+* memory
+* disk space
+* ports
+
+CPU is a soft limit, and handled differently from memory and disk space.
+Too low a CPU value results in throttling your application and
+slowing it down. Memory and disk space are both hard limits; when your
+application goes over these values, it's killed.
+
+### CPU Isolation
+
+Mesos can be configured to use a quota-based CPU scheduler (the
+*Completely Fair Scheduler*) to provide consistent and predictable performance.
+This is effectively a guarantee of resources -- you receive at least what
+you requested, but also no more than you've requested.
+
+The scheduler gives applications a CPU quota for every 100 ms interval.
+When an application uses its quota for an interval, it is throttled for
+the rest of the 100 ms. Usage resets for each interval and unused
+quota does not carry over.
+
+For example, an application specifying 4.0 CPU has access to 400 ms of
+CPU time every 100 ms. This CPU quota can be used in different ways,
+depending on the application and available resources. Consider the
+scenarios shown in this diagram.
+
+![CPU Availability](../images/CPUavailability.png)
+
+* *Scenario A*: the application can use up to 4 cores continuously for
+every 100 ms interval. It is never throttled and starts processing
+new requests immediately.
+
+* *Scenario B*: the application uses up to 8 cores (depending on
+availability) but is throttled after 50 ms. The CPU quota resets at the
+start of each new 100 ms interval.
+
+* *Scenario C*: like Scenario A, but there is a garbage collection
+event in the second interval that consumes all CPU quota. The
+application is throttled for the remaining 75 ms of that interval and
+cannot service requests until the next interval. In this example, the
+garbage collection finished in one interval but, depending on how much
+garbage needs collecting, it may take more than one interval and further
+delay service of requests.
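+
+The arithmetic behind these scenarios can be checked with a few lines of Python. The
+function names are made up for illustration, and the model assumes the application keeps
+a fixed number of cores fully busy until its quota for the interval runs out.
+

```python
def cpu_quota_ms(cpu_request: float, period_ms: float = 100.0) -> float:
    """CPU time available per interval, e.g. 4.0 CPU -> 400 ms per 100 ms."""
    return cpu_request * period_ms


def throttled_ms(cpu_request: float, cores_in_use: float,
                 period_ms: float = 100.0) -> float:
    """Time per interval spent throttled when `cores_in_use` cores are kept busy."""
    if cores_in_use <= 0:
        raise ValueError("cores_in_use must be positive")
    # Wall-clock time until the quota is used up, capped at the interval length.
    runtime_ms = min(period_ms, cpu_quota_ms(cpu_request, period_ms) / cores_in_use)
    return period_ms - runtime_ms


print(throttled_ms(4.0, 4))   # Scenario A: 0.0, never throttled
print(throttled_ms(4.0, 8))   # Scenario B: 50.0, throttled after 50 ms
print(throttled_ms(4.0, 16))  # 75.0, consistent with the 75 ms in Scenario C
```

+
+Under this model, the 75 ms of throttling in Scenario C corresponds to the quota being
+consumed within the first 25 ms of the interval.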
+
+*Technical Note*: Mesos considers logical cores, also known as
+hyperthreading or SMT cores, as the unit of CPU.
+
+### Memory Isolation
+
+Mesos uses dedicated memory allocation. Your application always has
+access to the amount of memory specified in your configuration. The
+application's memory use is defined as the sum of the resident set size
+(RSS) of all processes in a shard. Each shard is considered
+independently.
+
+In other words, say you specified a memory size of 10GB. Each shard
+would receive 10GB of memory. If an individual shard's memory demands
+exceed 10GB, that shard is killed, but the other shards continue
+working.
+
+*Technical note*: Total memory size is not enforced at allocation time,
+so your application can request more than its allocation without getting
+an ENOMEM. However, it will be killed shortly after.
+
+### Disk Space
+
+Disk space used by your application is defined as the sum of the files'
+disk space in your application's directory, including the `stdout` and
+`stderr` logged from your application. Each shard is considered
+independently. You should use off-node storage for your application's
+data whenever possible.
+
+In other words, say you specified disk space size of 100MB. Each shard
+would receive 100MB of disk space. If an individual shard's disk space
+demands exceed 100MB, that shard is killed, but the other shards
+continue working.
+
+After your application finishes running, its allocated disk space is
+reclaimed. Thus, your job's final action should move any disk content
+that you want to keep, such as logs, to your home file system or other
+less transitory storage. Disk reclamation takes place an undefined
+period after the application finish time; until then, the disk contents
+are still available but you shouldn't count on them being so.
+
+*Technical note*: Disk space is not enforced at write time, so your
+application can write above its quota without getting an ENOSPC, but it
+will be killed shortly after. This is subject to change.
+
+### GPU Isolation
+
+GPU isolation is supported for Nvidia devices starting with Mesos 1.0.
+Access to the allocated units is exclusive, with no sharing between tasks
+allowed (e.g. no fractional GPU allocation). For more details, see the
+[Mesos design document](https://docs.google.com/document/d/10GJ1A80x4nIEo8kfdeo9B11PIbS1xJrrB4Z373Ifkpo/edit#heading=h.w84lz7p4eexl)
+and the [Mesos agent configuration](http://mesos.apache.org/documentation/latest/configuration/).
+
+### Other Resources
+
+Other resources, such as network bandwidth, do not have any performance
+guarantees. For some resources, such as memory bandwidth, there are no
+practical sharing methods so some application combinations collocated on
+the same host may cause contention.
+
+
+Sizing
+-------
+
+### CPU Sizing
+
+To correctly size Aurora-run Mesos tasks, specify a per-shard CPU value
+that lets the task run at its desired performance when at peak load
+distributed across all shards. Include reserve capacity of at least 50%,
+possibly more, depending on how critical your service is (or how
+confident you are about your original estimate :-)), ideally by
+increasing the number of shards to also improve resiliency. When running
+your application, observe its CPU stats over time. If it is consistently at or
+near its quota during peak load, you should consider increasing either
+per-shard CPU or the number of shards.
+
+### Memory Sizing
+
+Size for your application's peak requirement. Observe the per-instance
+memory statistics over time, as memory requirements can vary over
+different periods. Remember that if your application exceeds its memory
+value, it will be killed, so you should also add a safety margin of
+around 10-20%. If you have the ability to do so, you may also want to
+put alerts on the per-instance memory.
+
+### Disk Space Sizing
+
+Size for your application's peak requirement. Rotate and discard log
+files as needed to stay within your quota. When running a Java process,
+add the maximum size of the Java heap to your disk space requirement, in
+order to account for an out of memory error dumping the heap
+into the application's sandbox space.
+
+### GPU Sizing
+
+GPU sizing is highly dependent on your application requirements and is limited only
+by the number of physical GPU units available on a target box.
+
+
+Oversubscription
+----------------
+
+Mesos supports [oversubscription of machine resources](http://mesos.apache.org/documentation/latest/oversubscription/)
+via the concept of revocable tasks. In contrast to non-revocable tasks, revocable tasks are best-effort.
+Mesos reserves the right to throttle or even kill them if they might affect existing high-priority
+user-facing services.
+
+As of today, the only revocable resources supported by Aurora are CPU and RAM. A job can
+opt in to using them by specifying the `revocable` [Configuration Tier](../../features/multitenancy/#configuration-tiers).
+A revocable job will only be scheduled using revocable resources, even if there are plenty of
+non-revocable resources available.
+
+The Aurora scheduler must be [configured to receive revocable offers](../../operations/configuration/#resource-isolation)
+from Mesos and accept revocable jobs. If it is not configured properly, revocable tasks will never get
+assigned to hosts and will stay in `PENDING`.
+
+For details on how to mark a job as being revocable, see the
+[Configuration Reference](../../reference/configuration/).

Added: aurora/site/source/documentation/0.21.0/features/service-discovery.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/features/service-discovery.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/features/service-discovery.md (added)
+++ aurora/site/source/documentation/0.21.0/features/service-discovery.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,43 @@
+Service Discovery
+=================
+
+It is possible for the Aurora executor to announce tasks into ServerSets for
+the purpose of service discovery.  ServerSets use the Zookeeper [group membership pattern](http://zookeeper.apache.org/doc/trunk/recipes.html#sc_outOfTheBox)
+of which there are several reference implementations:
+
+  - [C++](https://github.com/apache/mesos/blob/master/src/zookeeper/group.cpp)
+  - [Java](https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/zookeeper/ServerSetImpl.java#L221)
+  - [Python](https://github.com/twitter/commons/blob/master/src/python/twitter/common/zookeeper/serverset/serverset.py#L51)
+
+These can also be used natively in Finagle using the [ZookeeperServerSetCluster](https://github.com/twitter/finagle/blob/master/finagle-serversets/src/main/scala/com/twitter/finagle/zookeeper/ZookeeperServerSetCluster.scala).
+
+For more information about how to configure announcing, see the [Configuration Reference](../../reference/configuration/).
+
+Using Mesos DiscoveryInfo
+-------------------------
+Aurora has experimental support for populating DiscoveryInfo in Mesos. This can be used to build
+a custom service discovery system that does not use ZooKeeper. Please see the `Service Discovery` section of the
+[Mesos Framework Development guide](http://mesos.apache.org/documentation/latest/app-framework-development-guide/) for
+an explanation of the protobuf message in Mesos.
+
+To use this feature, enable the `-populate_discovery_info` flag on the scheduler. All jobs started by the scheduler
+afterwards will have their portmap populated to Mesos and be discoverable via the `/state` endpoint of the Mesos master and agent.
+
+### Using Mesos DNS
+One example is [Mesos-DNS](https://github.com/mesosphere/mesos-dns), which is able to generate multiple DNS
+records. With the current implementation, the example job with key `devcluster/vagrant/test/http-example` generates at
+least the following:
+
+1. An A record for `http_example.test.vagrant.aurora.mesos` (which only includes IP address);
+2. A [SRV record](https://en.wikipedia.org/wiki/SRV_record) for
+ `_http_example.test.vagrant._tcp.aurora.mesos`, which includes IP address and every port. This should only
+  be used if the service has one port.
+3. A SRV record `_{port-name}._http_example.test.vagrant._tcp.aurora.mesos` for each port name
+  defined. This should be used when the service has multiple ports. For this to work properly,
+  `-populate_discovery_info` must be added to the scheduler's configuration.
+
+Things to note:
+
+1. The domain part (".mesos" in the above example) can be configured in [Mesos DNS](http://mesosphere.github.io/mesos-dns/docs/configuration-parameters.html);
+2. Right now, the portmap and port aliases in the announcer object are not reflected in DiscoveryInfo and are therefore not visible in
+   Mesos DNS records either. This is because they are only resolved in Thermos executors.

Added: aurora/site/source/documentation/0.21.0/features/services.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.21.0/features/services.md?rev=1840515&view=auto
==============================================================================
--- aurora/site/source/documentation/0.21.0/features/services.md (added)
+++ aurora/site/source/documentation/0.21.0/features/services.md Tue Sep 11 05:28:10 2018
@@ -0,0 +1,116 @@
+Long-running Services
+=====================
+
+Jobs that always restart on completion, whether successful or unsuccessful,
+are called services. This is useful for long-running processes
+such as web services that should always be running, unless stopped explicitly.
+
+
+Service Specification
+---------------------
+
+A job is identified as a service by the presence of the flag
+`service=True` in the [`Job`](../../reference/configuration/#job-objects) object.
+The `Service` alias can be used as shorthand for `Job` with `service=True`.
+
+Example (available in the [Vagrant environment](../../getting-started/vagrant/)):
+
+    $ cat /vagrant/examples/jobs/hello_world.aurora
+    hello = Process(
+      name = 'hello',
+      cmdline = """
+        while true; do
+          echo hello world
+          sleep 10
+        done
+      """)
+
+    task = SequentialTask(
+      processes = [hello],
+      resources = Resources(cpu = 1.0, ram = 128*MB, disk = 128*MB)
+    )
+
+    jobs = [
+      Service(
+        task = task,
+        cluster = 'devcluster',
+        role = 'www-data',
+        environment = 'prod',
+        name = 'hello'
+      )
+    ]
+
+
+Jobs without the service bit set only restart up to `max_task_failures` times and only if they
+terminated unsuccessfully either due to human error or machine failure (see the
+[`Job`](../../reference/configuration/#job-objects) object for details).
+
+
+Ports
+-----
+
+In order to be useful, most services have to bind to one or more ports. Aurora enables this
+use case via the [`thermos.ports` namespace](../../reference/configuration/#thermos-namespace), which
+allows requesting arbitrarily named ports:
+
+
+    nginx = Process(
+      name = 'nginx',
+      cmdline = './run_nginx.sh -port {{thermos.ports[http]}}'
+    )
+
+
+When this process is included in a job, the job will be allocated a port, and the command line
+will be replaced with something like:
+
+    ./run_nginx.sh -port 42816
+
+Where 42816 happens to be the allocated port.
+
+For details on how to enable clients to discover this dynamically assigned port, see our
+[Service Discovery](../service-discovery/) documentation.
+
+
+Health Checking
+---------------
+
+Typically, the Thermos executor monitors processes within a task only by liveness of the forked
+process. In addition to that, Aurora has support for rudimentary health checking, either via HTTP
+or via custom shell scripts.
+
+For example, simply by requesting a `health` port, a process can request to be health checked
+via repeated calls to the `/health` endpoint:
+
+    nginx = Process(
+      name = 'nginx',
+      cmdline = './run_nginx.sh -port {{thermos.ports[health]}}'
+    )
+
+Please see the
+[configuration reference](../../reference/configuration/#healthcheckconfig-objects)
+for configuration options for this feature.
+
+Starting with the 0.17.0 release, job updates rely only on task health-checks by introducing
+a `min_consecutive_successes` parameter on the HealthCheckConfig object. This parameter represents
+the number of successful health checks needed before a task is moved into the `RUNNING` state. Tasks
+that do not have enough successful health checks within the first `n` attempts are moved to the
+`FAILED` state, where `n = ceil(initial_interval_secs/interval_secs) + max_consecutive_failures +
+min_consecutive_successes`. In order to accommodate variability during task warm up, `initial_interval_secs`
+will act as a grace period. Any health-check failures during the first `m` attempts are ignored and
+do not count towards `max_consecutive_failures`, where `m = ceil(initial_interval_secs/interval_secs)`.
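+
+The two formulas above can be evaluated with a short Python helper. The function name and
+the sample values are chosen purely for illustration and do not reflect Aurora's defaults.
+

```python
import math
from typing import Tuple


def health_check_windows(initial_interval_secs: float,
                         interval_secs: float,
                         max_consecutive_failures: int,
                         min_consecutive_successes: int) -> Tuple[int, int]:
    """Return (m, n) as defined in the text above.

    m: number of initial attempts whose failures are ignored (grace period).
    n: number of attempts within which enough successes must be seen.
    """
    if interval_secs <= 0:
        raise ValueError("interval_secs must be positive")
    m = math.ceil(initial_interval_secs / interval_secs)
    n = m + max_consecutive_failures + min_consecutive_successes
    return m, n


# Illustrative values only: 15 s grace period, checks every 10 s.
m, n = health_check_windows(15, 10, max_consecutive_failures=0,
                            min_consecutive_successes=1)
print(m, n)  # 2 3
```

+
+With these sample values, failures in the first 2 attempts are ignored, and the task must
+reach 1 successful check within the first 3 attempts to avoid being moved to `FAILED`.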
+
+As [job updates](../job-updates/) are based only on health-checks, it is not necessary to set
+`watch_secs` to the worst-case update time; it can instead be set to 0. The scheduler considers a
+task that is in the `RUNNING` state to be healthy and proceeds to updating the next batch of instances.
+For details on how to control health checks, please see the
+[HealthCheckConfig](../../reference/configuration/#healthcheckconfig-objects) configuration object.
+Existing jobs that do not configure a health-check can fall-back to using `watch_secs` to
+monitor a task before considering it healthy.
+
+You can pause health checking by touching a file inside of your sandbox, named `.healthchecksnooze`.
+As long as that file is present, health checks will be disabled, enabling users to gather core
+dumps or other performance measurements without worrying about Aurora's health check killing
+their process.
+
+WARNING: Remember to remove this file when you are done, otherwise your instance will have permanently
+disabled health checks.
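+
+One way to avoid forgetting the snooze file is to create and remove it from a single
+helper. The following Python sketch assumes it runs inside the task with the sandbox as
+its working directory (hence the default of `.`); the `snoozed_health_checks` name is
+hypothetical and `missing_ok` requires Python 3.8 or later.
+

```python
import contextlib
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def snoozed_health_checks(sandbox_dir: str = ".") -> Iterator[Path]:
    """Create .healthchecksnooze for the duration of the block, then remove it."""
    snooze_file = Path(sandbox_dir) / ".healthchecksnooze"
    snooze_file.touch()
    try:
        yield snooze_file
    finally:
        # Always clean up, even if the block raises, so that health checks
        # are not left disabled permanently.
        snooze_file.unlink(missing_ok=True)
```

+
+Diagnostic work such as collecting a core dump would go inside a
+`with snoozed_health_checks():` block; the file is removed when the block exits.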