Posted to commits@aurora.apache.org by se...@apache.org on 2017/04/25 21:27:00 UTC
aurora git commit: Extend operator documentation

Repository: aurora
Updated Branches:
  refs/heads/master 6cb2d4f69 -> c85bffdd6


Extend operator documentation

Included changes:

* new cluster upgrade instructions
* docs for several best practices collected on the mailing list
* extracted and extended troubleshooting guide for new cluster operators
* several minor formatting fixes

Reviewed at https://reviews.apache.org/r/58651/


Project: http://git-wip-us.apache.org/repos/asf/aurora/repo
Commit: http://git-wip-us.apache.org/repos/asf/aurora/commit/c85bffdd
Tree: http://git-wip-us.apache.org/repos/asf/aurora/tree/c85bffdd
Diff: http://git-wip-us.apache.org/repos/asf/aurora/diff/c85bffdd

Branch: refs/heads/master
Commit: c85bffdd6f68312261697eee868d57069adda434
Parents: 6cb2d4f
Author: Stephan Erb <se...@apache.org>
Authored: Tue Apr 25 23:26:43 2017 +0200
Committer: Stephan Erb <se...@apache.org>
Committed: Tue Apr 25 23:26:43 2017 +0200

----------------------------------------------------------------------
 docs/README.md                        |   3 +
 docs/features/custom-executors.md     |  15 ++--
 docs/features/webhooks.md             |   2 +-
 docs/operations/backup-restore.md     |  10 +--
 docs/operations/configuration.md      |  63 +++++++++++++++--
 docs/operations/installation.md       |  70 ++-----------------
 docs/operations/storage.md            |   7 +-
 docs/operations/troubleshooting.md    | 106 +++++++++++++++++++++++++++++
 docs/operations/upgrades.md           |  41 +++++++++++
 docs/reference/scheduler-endpoints.md |  10 +--
 10 files changed, 237 insertions(+), 90 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/aurora/blob/c85bffdd/docs/README.md
----------------------------------------------------------------------
diff --git a/docs/README.md b/docs/README.md
index dfd3a23..166bf1c 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -35,6 +35,8 @@ For those that wish to manage and fine-tune an Aurora cluster.
 
  * [Installation](operations/installation.md)
  * [Configuration](operations/configuration.md)
+ * [Upgrades](operations/upgrades.md)
+ * [Troubleshooting](operations/troubleshooting.md)
  * [Monitoring](operations/monitoring.md)
  * [Security](operations/security.md)
  * [Storage](operations/storage.md)
@@ -55,6 +57,7 @@ The complete reference of commands, configuration options, and scheduler interna
     - [Client Cluster Configuration](reference/client-cluster-configuration.md)
  * [Scheduler Configuration](reference/scheduler-configuration.md)
  * [Observer Configuration](reference/observer-configuration.md)
+ * [Endpoints](reference/scheduler-endpoints.md)
 
 ## Additional Resources
  * [Tools integrating with Aurora](additional-resources/tools.md)

http://git-wip-us.apache.org/repos/asf/aurora/blob/c85bffdd/docs/features/custom-executors.md
----------------------------------------------------------------------
diff --git a/docs/features/custom-executors.md b/docs/features/custom-executors.md
index 40fc118..1357c1e 100644
--- a/docs/features/custom-executors.md
+++ b/docs/features/custom-executors.md
@@ -36,6 +36,7 @@ uris (optional)          | List of resources to download into the task sandbox.
 shell (optional)         | Run executor via shell.
 
 A note on the command property (from [mesos.proto](https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto)):
+
 ```
 1) If 'shell == true', the command will be launched via shell
    (i.e., /bin/sh -c 'value'). The 'value' specified will be
@@ -68,14 +69,15 @@ scalar (required)    | Value in float for cpus or int for mem (in MBs)
 
 ### volume_mounts (list)
 
-Property                  | Description
-------------------------  | ---------------------------------
-host_path (required)      | Host path to mount inside the container.
-container_path (required) | Path inside the container where `host_path` will be mounted.
-mode (required)           | Mode in which to mount the volume, Read-Write (RW) or Read-Only (RO).
+Property                     | Description
+---------------------------  | ---------------------------------
+host_path (required)         | Host path to mount inside the container.
+container_path (required)    | Path inside the container where `host_path` will be mounted.
+mode (required)              | Mode in which to mount the volume, Read-Write (RW) or Read-Only (RO).
 
 A sample configuration is as follows:
-```
+
+```json
 [
     {
       "executor": {
@@ -135,7 +137,6 @@ A sample configuration is as follows:
       "task_prefix": "my-executor-"
     }
 ]
-
 ```
 
 It should be noted that if you do not use Thermos or a Thermos based executor, links in the scheduler's

http://git-wip-us.apache.org/repos/asf/aurora/blob/c85bffdd/docs/features/webhooks.md
----------------------------------------------------------------------
diff --git a/docs/features/webhooks.md b/docs/features/webhooks.md
index 075aeec..a060975 100644
--- a/docs/features/webhooks.md
+++ b/docs/features/webhooks.md
@@ -19,6 +19,7 @@ Below is a sample configuration:
 ```
 
 And an example of a response that you will get back:
+
 ```json
 {
     "task":
@@ -77,4 +78,3 @@ And an example of a response that you will get back:
         },
         "oldState":{}}
 ```
-

http://git-wip-us.apache.org/repos/asf/aurora/blob/c85bffdd/docs/operations/backup-restore.md
----------------------------------------------------------------------
diff --git a/docs/operations/backup-restore.md b/docs/operations/backup-restore.md
index da467c3..15e6dd2 100644
--- a/docs/operations/backup-restore.md
+++ b/docs/operations/backup-restore.md
@@ -3,7 +3,7 @@
 **Be sure to read the entire page before attempting to restore from a backup, as it may have
 unintended consequences.**
 
-# Summary
+## Summary
 
 The restoration procedure replaces the existing (possibly corrupted) Mesos replicated log with an
 earlier, backed up, version and requires all schedulers to be taken down temporarily while
@@ -18,7 +18,7 @@ so any tasks that have been rescheduled since the backup was taken will be kille
 Instructions below have been verified in [Vagrant environment](../getting-started/vagrant.md) and with minor
 syntax/path changes should be applicable to any Aurora cluster.
 
-# Preparation
+## Preparation
 
 Follow these steps to prepare the cluster for restoring from a backup:
 
@@ -54,7 +54,7 @@ accomplished by updating the following scheduler configuration options:
 
 * Restart all schedulers
 
-# Cleanup and re-initialize Mesos replicated log
+## Cleanup and re-initialize Mesos replicated log
 
 Get rid of the corrupted files and re-initialize Mesos replicated log:
 
@@ -63,7 +63,7 @@ Get rid of the corrupted files and re-initialize Mesos replicated log:
 * Initialize Mesos replica's log file: `sudo mesos-log initialize --path=<-native_log_file_path>`
 * Start schedulers
 
-# Restore from backup
+## Restore from backup
 
 At this point the scheduler is ready to rehydrate from the backup:
 
@@ -87,5 +87,5 @@ See `aurora_admin help <command>` for usage details.
 the provided backup snapshot and initiate a mandatory failover
 `aurora_admin scheduler_commit_recovery --bypass-leader-redirect  <cluster>`
 
-# Cleanup
+## Cleanup
 Undo any modification done during [Preparation](#preparation) sequence.

http://git-wip-us.apache.org/repos/asf/aurora/blob/c85bffdd/docs/operations/configuration.md
----------------------------------------------------------------------
diff --git a/docs/operations/configuration.md b/docs/operations/configuration.md
index 203f3be..f0581ea 100644
--- a/docs/operations/configuration.md
+++ b/docs/operations/configuration.md
@@ -29,7 +29,6 @@ Like Mesos, Aurora uses command-line flags for runtime configuration. As such th
     # Environment variables controlling libmesos
     export JAVA_HOME=...
     export GLOG_v=1
-    # Port and public ip used to communicate with the Mesos master and for the replicated log
     export LIBPROCESS_PORT=8083
     export LIBPROCESS_IP=192.168.33.7
 
@@ -38,6 +37,36 @@ Like Mesos, Aurora uses command-line flags for runtime configuration. As such th
 That way Aurora's current flags are visible in `ps` and in the `/vars` admin endpoint.
 
 
+## JVM Configuration
+
+JVM settings are dependent on your environment and cluster size. They might require
+custom tuning. As a starting point, we recommend:
+
+* Ensure the initial (`-Xms`) and maximum (`-Xmx`) heap sizes are identical to prevent heap resizing
+  at runtime.
+* Either `-XX:+UseConcMarkSweepGC` or `-XX:+UseG1GC -XX:+UseStringDeduplication` is
+  a sane default for the garbage collector.
+* `-Djava.net.preferIPv4Stack=true` makes sense in most cases as well.
+
+
+## Network Configuration
+
+By default, Aurora binds to all interfaces and auto-discovers its hostname. To reduce ambiguity
+it helps to hardcode them though:
+
+    -http_port=8081
+    -ip=192.168.33.7
+    -hostname="aurora1.us-east1.example.org"
+
+Two environment variables control the ip and port for the communication with the Mesos master
+and for the replicated log used by Aurora:
+
+    export LIBPROCESS_PORT=8083
+    export LIBPROCESS_IP=192.168.33.7
+
+It is important that those can be reached from all Mesos master and Aurora scheduler instances.
+
+
 ## Replicated Log Configuration
 
 Aurora schedulers use ZooKeeper to discover log replicas and elect a leader. Only one scheduler is
@@ -64,8 +93,13 @@ should be set to `3`.
 *Incorrectly setting this flag will cause data corruption to occur!*
 
 ### `-native_log_file_path`
-Location of the Mesos replicated log files. Consider allocating a dedicated disk (preferably SSD)
-for Mesos replicated log files to ensure optimal storage performance.
+Location of the Mesos replicated log files. For optimal and consistent performance, consider
+allocating a dedicated disk (preferably SSD) for the replicated log. Ensure that this disk is not
+used by anything else (e.g. no process logging) and in particular that it is a real disk
+and not just a partition.
+
+Even when a dedicated disk is used, switching the Linux kernel I/O scheduler from `CFQ` to `deadline`
+can further help with storage performance in Aurora ([see this ticket for details](https://issues.apache.org/jira/browse/AURORA-1211)).
 
 ### `-native_log_zk_group_path`
 ZooKeeper path used for Mesos replicated log quorum discovery.
@@ -91,8 +125,10 @@ or truncating of the replicated log used by Aurora. In that case, see the docume
 
 Configuration options for the Aurora scheduler backup manager.
 
-* `-backup_interval`: The interval on which the scheduler writes local storage backups.  The default is every hour.
-* `-backup_dir`: Directory to write backups to.
+* `-backup_interval`: The interval on which the scheduler writes local storage backups.
+   The default is every hour.
+* `-backup_dir`: Directory to write backups to. As stated above, this should not be co-located on the
+   same disk as the replicated log.
 * `-max_saved_backups`: Maximum number of backups to retain before deleting the oldest backup(s).
 
 
@@ -137,6 +173,23 @@ tier configuration, you will also have to specify a file path:
     -tier_config=path/to/tiers/config.json
 
 
+## Multi-Framework Setup
+
+Aurora holds onto Mesos offers in order to provide efficient scheduling and
+[preemption](../features/multitenancy.md#preemption). This is problematic in multi-framework
+environments as Aurora might starve other frameworks.
+
+With a downside of increased scheduling latency, Aurora can be configured to be more cooperative:
+
+* Lowering `-min_offer_hold_time` (e.g. to `1mins`) can ensure unused offers are returned to
+  Mesos more frequently.
+* Increasing `-offer_filter_duration` (e.g. to `30secs`) will instruct Mesos
+  not to re-offer rejected resources for the given duration.
+
+Setting a [minimum amount of resources](http://mesos.apache.org/documentation/latest/quota/) for
+each Mesos role can furthermore help to ensure no framework is starved entirely.
+
+
 ## Containers
 
 Both the Mesos and Docker containerizers require configuration of the Mesos agent.
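The JVM recommendation in the configuration changes above (identical `-Xms` and `-Xmx`) can be checked mechanically before a scheduler is started. The following is a minimal sketch, not part of Aurora: the helper name `heap_sizes_match` is hypothetical, and it assumes the JVM flags are available as a list of strings.

```python
import re


def heap_sizes_match(jvm_flags):
    """Return True if -Xms and -Xmx are both present and carry the same value.

    Hypothetical helper for illustration only. Values are compared
    textually, so "2g" and "2048m" are treated as different.
    """
    sizes = {}
    for flag in jvm_flags:
        match = re.fullmatch(r"-(Xms|Xmx)(\d+[kKmMgG]?)", flag)
        if match:
            sizes[match.group(1)] = match.group(2).lower()
    return "Xms" in sizes and sizes.get("Xms") == sizes.get("Xmx")


if __name__ == "__main__":
    flags = ["-Xms2g", "-Xmx2g", "-XX:+UseG1GC", "-XX:+UseStringDeduplication"]
    print(heap_sizes_match(flags))
```

The comparison is deliberately strict: writing both flags with the same unit keeps the configuration easy to review.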

http://git-wip-us.apache.org/repos/asf/aurora/blob/c85bffdd/docs/operations/installation.md
----------------------------------------------------------------------
diff --git a/docs/operations/installation.md b/docs/operations/installation.md
index f9b04d4..82f5d18 100644
--- a/docs/operations/installation.md
+++ b/docs/operations/installation.md
@@ -26,6 +26,8 @@ profiles:
 A small number of machines (typically 3 or 5) responsible for cluster orchestration.  In most cases
 it is fine to co-locate these components in anything but very large clusters (> 1000 machines).
 Beyond that point, operators will likely want to manage these services on separate machines.
+In particular, you will want to use separate ZooKeeper ensembles for leader election and
+service discovery. Otherwise a service discovery error or outage can take down the entire cluster.
 
 In practice, 5 coordinators have been shown to reliably manage clusters with tens of thousands of
 machines.
@@ -140,7 +142,7 @@ CentOS: `sudo systemctl start aurora`
         wget -c https://apache.bintray.com/aurora/centos-7/aurora-executor-0.17.0-1.el7.centos.aurora.x86_64.rpm
         sudo yum install -y aurora-executor-0.17.0-1.el7.centos.aurora.x86_64.rpm
 
-### Configuration
+### Worker Configuration
 The executor typically does not require configuration.  Command line arguments can
 be passed to the executor using a command line argument on the scheduler.
 
@@ -194,6 +196,7 @@ Make an edit to add the `--mesos-root` flag resulting in something like:
       --log_to_stderr=google:INFO
     )
 
+
 ## Installing the client
 ### Ubuntu Trusty
 
@@ -214,7 +217,7 @@ Make an edit to add the `--mesos-root` flag resulting in something like:
     brew upgrade
     brew install aurora-cli
 
-### Configuration
+### Client Configuration
 Client configuration lives in a json file that describes the clusters available and how to reach
 them.  By default this file is at `/etc/aurora/clusters.json`.
 
@@ -247,66 +250,7 @@ are identical for both.
     sudo yum -y install mesos-1.1.0
 
 
-
 ## Troubleshooting
-So you've started your first cluster and are running into some issues? We've collected some common
-stumbling blocks and solutions here to help get you moving.
-
-### Replicated log not initialized
-
-#### Symptoms
-- Scheduler RPCs and web interface claim `Storage is not READY`
-- Scheduler log repeatedly prints messages like
-
-  ```
-  I1016 16:12:27.234133 26081 replica.cpp:638] Replica in EMPTY status
-  received a broadcasted recover request
-  I1016 16:12:27.234256 26084 recover.cpp:188] Received a recover response
-  from a replica in EMPTY status
-  ```
-
-#### Solution
-When you create a new cluster, you need to inform a quorum of schedulers that they are safe to
-consider their database to be empty by [initializing](#finalizing) the
-replicated log. This is done to prevent the scheduler from modifying the cluster state in the event
-of multiple simultaneous disk failures or, more likely, misconfiguration of the replicated log path.
-
-
-### Scheduler not registered
-
-#### Symptoms
-Scheduler log contains
-
-    Framework has not been registered within the tolerated delay.
-
-#### Solution
-Double-check that the scheduler is configured correctly to reach the Mesos master. If you are registering
-the master in ZooKeeper, make sure command line argument to the master:
 
-    --zk=zk://$ZK_HOST:2181/mesos/master
-
-is the same as the one on the scheduler:
-
-    -mesos_master_address=zk://$ZK_HOST:2181/mesos/master
-
-
-### Scheduler not running
-
-### Symptom
-The scheduler process commits suicide regularly. This happens under error conditions, but
-also on purpose in regular intervals.
-
-## Solution
-Aurora is meant to be run under supervision. You have to configure a supervisor like
-[Monit](http://mmonit.com/monit/) or [supervisord](http://supervisord.org/) to run the scheduler
-and restart it whenever it fails or exists on purpose.
-
-Aurora supports an active health checking protocol on its admin HTTP interface - if a `GET /health`
-times out or returns anything other than `200 OK` the scheduler process is unhealthy and should be
-restarted.
-
-For example, monit can be configured with
-
-    if failed port 8081 send "GET /health HTTP/1.0\r\n" expect "OK\n" with timeout 2 seconds for 10 cycles then restart
-
-assuming you set `-http_port=8081`.
+So you've started your first cluster and are running into some issues? We've collected some common
+stumbling blocks and solutions in our [Troubleshooting guide](troubleshooting.md) to help get you moving.

http://git-wip-us.apache.org/repos/asf/aurora/blob/c85bffdd/docs/operations/storage.md
----------------------------------------------------------------------
diff --git a/docs/operations/storage.md b/docs/operations/storage.md
index c30922f..8db6f6f 100644
--- a/docs/operations/storage.md
+++ b/docs/operations/storage.md
@@ -1,8 +1,6 @@
 # Aurora Scheduler Storage
 
 - [Overview](#overview)
-- [Replicated Log Configuration](#replicated-log-configuration)
-- [Backup Configuration](#replicated-log-configuration)
 - [Storage Semantics](#storage-semantics)
   - [Reads, writes, modifications](#reads-writes-modifications)
     - [Read lifecycle](#read-lifecycle)
@@ -21,8 +19,9 @@ For example:
 * Production resource quotas
 * Mesos resource offer host attributes
 
-Aurora solves its persistence needs by leveraging the Mesos implementation of a Paxos replicated
-log [[1]](https://ramcloud.stanford.edu/~ongaro/userstudy/paxos.pdf)
+Aurora solves its persistence needs by leveraging the
+[Mesos implementation of a Paxos replicated log](http://mesos.apache.org/documentation/latest/replicated-log-internals/)
+[[1]](https://ramcloud.stanford.edu/~ongaro/userstudy/paxos.pdf)
 [[2]](http://en.wikipedia.org/wiki/State_machine_replication) with a key-value
 [LevelDB](https://github.com/google/leveldb) storage as persistence media.
 

http://git-wip-us.apache.org/repos/asf/aurora/blob/c85bffdd/docs/operations/troubleshooting.md
----------------------------------------------------------------------
diff --git a/docs/operations/troubleshooting.md b/docs/operations/troubleshooting.md
new file mode 100644
index 0000000..3a6d23b
--- /dev/null
+++ b/docs/operations/troubleshooting.md
@@ -0,0 +1,106 @@
+# Troubleshooting
+
+So you've started your first cluster and are running into some issues? We've collected some common
+stumbling blocks and solutions here to help get you moving.
+
+## Replicated log not initialized
+
+### Symptoms
+- Scheduler RPCs and web interface claim `Storage is not READY`
+- Scheduler log repeatedly prints messages like
+
+  ```
+  I1016 16:12:27.234133 26081 replica.cpp:638] Replica in EMPTY status
+  received a broadcasted recover request
+  I1016 16:12:27.234256 26084 recover.cpp:188] Received a recover response
+  from a replica in EMPTY status
+  ```
+
+### Solution
+When you create a new cluster, you need to inform a quorum of schedulers that they are safe to
+consider their database to be empty by [initializing](installation.md#finalizing) the
+replicated log. This is done to prevent the scheduler from modifying the cluster state in the event
+of multiple simultaneous disk failures or, more likely, misconfiguration of the replicated log path.
+
+
+## No distinct leader elected
+
+### Symptoms
+Either no scheduler or multiple schedulers believe they are leading.
+
+### Solution
+Verify the [network configuration](configuration.md#network-configuration) of the Aurora
+scheduler is correct:
+
+* The `LIBPROCESS_IP:LIBPROCESS_PORT` endpoints must be reachable from all coordinator nodes running
+  a scheduler or a Mesos master.
+* Hostname lookups have to resolve to public ips rather than local ones that cannot be reached
+  from another node.
+
+In addition, double-check the [quorum settings](configuration.md#replicated-log-configuration) of the
+replicated log.
+
+
+## Scheduler not registered
+
+### Symptoms
+Scheduler log contains
+
+    Framework has not been registered within the tolerated delay.
+
+### Solution
+Double-check that the scheduler is configured correctly to reach the Mesos master. If you are registering
+the master in ZooKeeper, make sure the command line argument to the master:
+
+    --zk=zk://$ZK_HOST:2181/mesos/master
+
+is the same as the one on the scheduler:
+
+    -mesos_master_address=zk://$ZK_HOST:2181/mesos/master
+
+
+## Scheduler not running
+
+### Symptoms
+The scheduler process terminates itself regularly. This happens under error conditions, but
+also on purpose at regular intervals.
+
+### Solution
+Aurora is meant to be run under supervision. You have to configure a supervisor like
+[Monit](http://mmonit.com/monit/), [supervisord](http://supervisord.org/), or systemd to run the
+scheduler and restart it whenever it fails or exits on purpose.
+
+Aurora supports an active health checking protocol on its admin HTTP interface - if a `GET /health`
+times out or returns anything other than `200 OK` the scheduler process is unhealthy and should be
+restarted.
+
+For example, monit can be configured with
+
+    if failed port 8081 send "GET /health HTTP/1.0\r\n" expect "OK\n" with timeout 2 seconds for 10 cycles then restart
+
+assuming you set `-http_port=8081`.
+
+
+## Executor crashing or hanging
+
+### Symptoms
+Launched task instances never transition to `STARTING` or `RUNNING` but immediately transition
+to `FAILED` or `LOST`.
+
+### Solution
+The executor might be failing due to unknown internal errors such as a missing native dependency
+of the Mesos executor library. Open the Mesos UI and navigate to the failing
+task in question. Inspect the various log files in order to learn about what is going on.
+
+
+## Observer does not discover tasks
+
+### Symptoms
+The observer UI does not list any tasks. When navigating from the scheduler UI to the state of
+a particular task instance the observer returns `Error: 404 Not Found`.
+
+### Solution
+The observer is refreshing its internal state every couple of seconds. If waiting a few seconds
+does not resolve the issue, check that the `--mesos-root` setting of the observer and the
+`--work_dir` option of the Mesos agent are in sync. For details, see our
+[Install instructions](installation.md#worker-configuration).
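The "Scheduler not registered" check in the troubleshooting guide above comes down to comparing two flag values. A minimal sketch of that comparison follows; the helper names `flag_value` and `master_address_matches` are hypothetical, and the sketch assumes both command lines are available as lists of `name=value` arguments.

```python
def flag_value(args, name):
    """Return the value of the first `name=value` argument, or None if absent."""
    prefix = name + "="
    for arg in args:
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None


def master_address_matches(master_args, scheduler_args):
    """Check that the master's --zk value equals the scheduler's
    -mesos_master_address value (both flags are named in the guide)."""
    master_zk = flag_value(master_args, "--zk")
    scheduler_zk = flag_value(scheduler_args, "-mesos_master_address")
    return master_zk is not None and master_zk == scheduler_zk


if __name__ == "__main__":
    # zk.example.org is a placeholder host for illustration.
    master = ["--zk=zk://zk.example.org:2181/mesos/master"]
    scheduler = ["-mesos_master_address=zk://zk.example.org:2181/mesos/master"]
    print(master_address_matches(master, scheduler))
```

The check is purely textual, so two addresses that differ only in ZooKeeper host order would be reported as a mismatch and need a manual look.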

http://git-wip-us.apache.org/repos/asf/aurora/blob/c85bffdd/docs/operations/upgrades.md
----------------------------------------------------------------------
diff --git a/docs/operations/upgrades.md b/docs/operations/upgrades.md
new file mode 100644
index 0000000..1d6a73d
--- /dev/null
+++ b/docs/operations/upgrades.md
@@ -0,0 +1,41 @@
+# Upgrading Aurora
+
+Aurora can be updated from one version to the next without any downtime or restarts of running
+jobs. The same holds true for Mesos.
+
+Generally speaking, Mesos and Aurora strive for a +1/-1 version compatibility, i.e. all components
+are meant to be forward and backwards compatible for at least one version. This implies it
+does not really matter in which order updates are carried out.
+
+Exceptions to this rule are documented in the [Aurora release-notes](../../RELEASE-NOTES.md)
+and the [Mesos upgrade instructions](https://mesos.apache.org/documentation/latest/upgrades/).
+
+
+## Instructions
+
+To upgrade Aurora, follow these steps:
+
+1. Update the first scheduler instance by updating its software and restarting its process.
+2. Wait until the scheduler is up and its [Replicated Log](configuration.md#replicated-log-configuration)
+   has caught up with the other schedulers in the cluster. The log has caught up if `log/recovered` has
+   the value `1`. You can check the metric via `curl LIBPROCESS_IP:LIBPROCESS_PORT/metrics/snapshot`,
+   where ip and port refer to the [libmesos configuration](configuration.md#network-configuration)
+   settings of the scheduler instance.
+3. Proceed with the next scheduler until all instances are updated.
+4. Update the Aurora executor deployed to the compute nodes of your cluster. Jobs will continue
+   running with the old version of the executor, and will only be launched by the new one once
+   they are restarted eventually due to natural cluster churn.
+5. Distribute the new Aurora client to your users.
+
+
+## Best Practices
+
+Even though not absolutely mandatory, we advise adhering to the following rules:
+
+* Never skip any major or minor releases when updating. If you have to catch up on several releases, you
+  have to deploy all intermediary versions. Skipping bugfix releases is acceptable though.
+* Verify all updates on a test cluster before touching your production deployments.
+* To minimize the number of failovers during updates, update the currently leading scheduler
+  instance last.
+* Update the Aurora executor on a subset of compute nodes as a canary before deploying the change to
+  the whole fleet.
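Step 2 of the upgrade instructions above waits for `log/recovered` to reach `1`. A minimal sketch of that check follows, assuming the `/metrics/snapshot` response is a flat JSON object keyed by metric name. That layout is an assumption and is not stated in the text above, and the helper name `log_recovered` is hypothetical.

```python
import json


def log_recovered(snapshot_json):
    """Return True if the metrics snapshot reports `log/recovered` as 1.

    Assumes a flat JSON object such as {"log/recovered": 1.0, ...};
    a missing key is treated as "not recovered".
    """
    metrics = json.loads(snapshot_json)
    return metrics.get("log/recovered") == 1


if __name__ == "__main__":
    # Sample body for illustration, e.g. saved from
    # curl LIBPROCESS_IP:LIBPROCESS_PORT/metrics/snapshot
    sample = '{"log/recovered": 1.0}'
    print(log_recovered(sample))
```

An upgrade script could poll this check and move on to the next scheduler only once it returns true.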

http://git-wip-us.apache.org/repos/asf/aurora/blob/c85bffdd/docs/reference/scheduler-endpoints.md
----------------------------------------------------------------------
diff --git a/docs/reference/scheduler-endpoints.md b/docs/reference/scheduler-endpoints.md
index d302e90..ddae76b 100644
--- a/docs/reference/scheduler-endpoints.md
+++ b/docs/reference/scheduler-endpoints.md
@@ -1,7 +1,7 @@
 # HTTP endpoints
 
 There are a number of HTTP endpoints that the Aurora scheduler exposes. These allow various
-operational tasks to be performed on the scheduler. Below is the list of all such endpoints
+operational tasks to be performed on the scheduler. Below is an (incomplete) list of such endpoints
 and a brief explanation of what they do.
 
 ## Leader health
@@ -12,8 +12,8 @@ HAProxy or AWS ELB.
 When a HTTP GET request is issued on this endpoint, it responds as follows:
 
 - If the instance that received the GET request is the leading scheduler, a HTTP status code of
-  200 (OK) is returned.
+  `200 OK` is returned.
 - If the instance that received the GET request is not the leading scheduler but a leader does
-  exist, a HTTP status code of 503 (SERVICE_UNAVAILABLE) is returned.
-- If no leader currently exists or the leader is unknown, a HTTP status code of 502
-  (BAD_GATEWAY) is returned.
\ No newline at end of file
+  exist, a HTTP status code of `503 SERVICE_UNAVAILABLE` is returned.
+- If no leader currently exists or the leader is unknown, a HTTP status code of `502 BAD_GATEWAY`
+  is returned.
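The three leader health responses described above map directly onto a small lookup. A minimal sketch follows; the function name `leader_state` and its return labels are hypothetical and chosen for illustration only.

```python
def leader_state(status_code):
    """Translate a leader health HTTP status code into a short label.

    The three codes come from the endpoint description above; any other
    code is reported as "unknown".
    """
    if status_code == 200:
        return "leader"      # this instance is the leading scheduler
    if status_code == 503:
        return "not-leader"  # a leader exists, but it is another instance
    if status_code == 502:
        return "no-leader"   # no leader exists or the leader is unknown
    return "unknown"


if __name__ == "__main__":
    for code in (200, 503, 502):
        print(code, leader_state(code))
```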