Posted to commits@mesos.apache.org by nn...@apache.org on 2015/07/18 00:35:25 UTC
mesos git commit: Added oversubscription user doc.
Repository: mesos
Updated Branches:
refs/heads/master 6b842c27b -> eba5a7339
Added oversubscription user doc.
Review: https://reviews.apache.org/r/36488
Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/eba5a733
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/eba5a733
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/eba5a733
Branch: refs/heads/master
Commit: eba5a73393769a4980b65782d95b6af0cf93f7c0
Parents: 6b842c2
Author: Niklas Nielsen <ni...@qni.dk>
Authored: Fri Jul 17 15:30:35 2015 -0700
Committer: Niklas Q. Nielsen <ni...@qni.dk>
Committed: Fri Jul 17 15:30:35 2015 -0700
----------------------------------------------------------------------
docs/images/oversubscription-overview.jpg | Bin 0 -> 67771 bytes
docs/oversubscription.md | 305 +++++++++++++++++++++++++
2 files changed, 305 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/mesos/blob/eba5a733/docs/images/oversubscription-overview.jpg
----------------------------------------------------------------------
diff --git a/docs/images/oversubscription-overview.jpg b/docs/images/oversubscription-overview.jpg
new file mode 100644
index 0000000..2d31097
Binary files /dev/null and b/docs/images/oversubscription-overview.jpg differ
http://git-wip-us.apache.org/repos/asf/mesos/blob/eba5a733/docs/oversubscription.md
----------------------------------------------------------------------
diff --git a/docs/oversubscription.md b/docs/oversubscription.md
new file mode 100644
index 0000000..f17d4d4
--- /dev/null
+++ b/docs/oversubscription.md
@@ -0,0 +1,305 @@
+---
+layout: documentation
+---
+
+# Oversubscription
+
+High-priority user-facing services are typically provisioned on large clusters
+for peak load and unexpected load spikes. Hence, most of the time, the
+provisioned resources remain underutilized. Oversubscription takes advantage of
+temporarily unused resources to execute best-effort tasks such as background
+analytics, video/image processing, chip simulations, and other low-priority
+jobs.
+
+## How does it work?
+
+Oversubscription was introduced in Mesos 0.23.0 and adds two new slave
+components: a Resource Estimator and a Quality of Service (QoS) Controller,
+and extends the existing resource allocator, resource monitor, and
+Mesos slave. The new components and their interactions are illustrated below.
+
+![Oversubscription overview](images/oversubscription-overview.jpg)
+
+### Resource estimation
+
+ - (1) The first step is to identify the amount of oversubscribed resources.
+ The resource estimator taps into the resource monitor and periodically gets
+usage statistics via `ResourceStatistics` messages. The resource estimator
+applies logic based on the collected resource statistics to determine the
+amount of oversubscribed resources. This can be a series of control algorithms
+based on measured resource usage slack (allocated but unused resources) and
+allocation slack.
+
+ - (2) The slave keeps polling estimates from the resource estimator and tracks
+ the latest estimate.
+
+ - (3) The slave will send the total amount of oversubscribed resources to the
+ master when the latest estimate is different from the previous estimate.
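+
+As a rough illustration of step (1), the sketch below computes usage slack
+(allocated but unused resources) for each resource. It is not the Mesos
+implementation: `ScalarResources`, `estimateUsageSlack`, and the
+`safetyFactor` parameter are hypothetical names, and a real estimator would
+work with the `Resources` and `ResourceUsage` types instead.
+
+```cpp
+#include <algorithm>
+#include <map>
+#include <string>
+
+// Hypothetical, simplified stand-in for scalar resources such as
+// {"cpus": 4.0}. This is not the Mesos `Resources` type.
+using ScalarResources = std::map<std::string, double>;
+
+// Estimates usage slack: resources that are allocated but currently unused.
+// `safetyFactor` is an assumption of this sketch; it scales the slack down
+// so that only a fraction of it is reported as oversubscribable.
+ScalarResources estimateUsageSlack(
+    const ScalarResources& allocated,
+    const ScalarResources& used,
+    double safetyFactor)
+{
+  ScalarResources slack;
+
+  for (const auto& entry : allocated) {
+    const auto it = used.find(entry.first);
+    const double inUse = (it != used.end()) ? it->second : 0.0;
+
+    // Never report negative slack if usage exceeds the allocation.
+    const double unused = std::max(0.0, entry.second - inUse);
+
+    if (unused > 0.0) {
+      slack[entry.first] = unused * safetyFactor;
+    }
+  }
+
+  return slack;
+}
+```
+
+A production estimator would likely restrict the result to compressible
+resources and smooth the estimate over time rather than react to a single
+sample.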
+
+### Resource tracking & scheduling algorithm
+
+ - (4) The allocator keeps track of the oversubscribed resources separately
+ from regular resources and annotates those resources as `revocable`. It is up
+to the resource estimator to determine which types of resources can be
+oversubscribed. It is recommended only to oversubscribe _compressible_
+resources such as CPU shares, bandwidth, etc.
+
+### Frameworks
+
+ - (5) Frameworks can choose to launch tasks on revocable resources by using
+ the regular `launchTasks()` API. To safeguard frameworks that are not
+designed to deal with preemption, only frameworks registering with the
+`REVOCABLE_RESOURCES` capability set in their framework info will receive
+offers with revocable resources. Furthermore, revocable resources cannot be
+dynamically reserved and persistent volumes should not be created on revocable
+disk resources.
+
+### Task launch
+
+ - The revocable task is launched as usual when the runTask request is received
+ on the slave. The resources will still be marked as revocable and isolators
+can take appropriate actions, if certain resources need to be set up differently
+for revocable and regular tasks.
+
+> NOTE: If any resource used by a task or executor is
+revocable, the whole container is treated as a revocable container and can
+therefore be killed or throttled by the QoS Controller.
+
+### Interference detection
+
+ - (6) When the revocable task is running, it is important to constantly
+ monitor the original task running on those resources and guarantee
+performance based on an SLA. In order to react to detected interference, the
+QoS controller needs to be able to kill or throttle running revocable tasks.
+
+## Enabling frameworks to use oversubscribed resources
+
+Frameworks planning to use oversubscribed resources need to register with the
+`REVOCABLE_RESOURCES` capability set:
+
+~~~{.cpp}
+FrameworkInfo framework;
+framework.set_name("Revocable framework");
+
+framework.add_capabilities()->set_type(
+ FrameworkInfo::Capability::REVOCABLE_RESOURCES);
+~~~
+
+From that point on, the framework will start to receive revocable resources in
+offers.
+
+> NOTE: There is no guarantee that the Mesos cluster has oversubscription
+enabled. If not, no revocable resources will be offered. See below for
+instructions on how to configure Mesos for oversubscription.
+
+### Launching tasks using revocable resources
+
+Launching tasks using revocable resources is done through the existing
+`launchTasks` API. Revocable resources will have the `revocable` field set. See
+below for an example offer with regular and revocable resources.
+
+~~~{.json}
+{
+  "id": "20150618-112946-201330860-5050-2210-0000",
+  "framework_id": "20141119-101031-201330860-5050-3757-0000",
+  "slave_id": "20150618-112946-201330860-5050-2210-S1",
+  "hostname": "foobar",
+  "resources": [
+    {
+      "name": "cpus",
+      "type": "SCALAR",
+      "scalar": {
+        "value": 2.0
+      },
+      "role": "*"
+    },
+    {
+      "name": "mem",
+      "type": "SCALAR",
+      "scalar": {
+        "value": 512.0
+      },
+      "role": "*"
+    },
+    {
+      "name": "cpus",
+      "type": "SCALAR",
+      "scalar": {
+        "value": 0.45
+      },
+      "role": "*",
+      "revocable": {}
+    }
+  ]
+}
+~~~
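+
+A framework typically needs to tell the two kinds of resources apart before
+deciding which tasks to place on them. The sketch below is a minimal
+illustration using a hypothetical `OfferedResource` struct that mirrors one
+entry of the `resources` list above; it does not use the Mesos protobuf
+types, and `revocableAmount` is a name invented for this example.
+
+```cpp
+#include <string>
+#include <vector>
+
+// Hypothetical, simplified mirror of one entry in the offer's "resources"
+// list. This is not the Mesos `Resource` protobuf message.
+struct OfferedResource
+{
+  std::string name;
+  double value;
+  bool revocable; // True when the "revocable" field is set on the entry.
+};
+
+// Sums the revocable amount of the named scalar resource in an offer.
+double revocableAmount(
+    const std::vector<OfferedResource>& resources,
+    const std::string& name)
+{
+  double total = 0.0;
+
+  for (const OfferedResource& resource : resources) {
+    if (resource.revocable && resource.name == name) {
+      total += resource.value;
+    }
+  }
+
+  return total;
+}
+```
+
+For the example offer above, this would report 0.45 revocable cpus and no
+revocable memory, so only best-effort tasks that tolerate being killed or
+throttled should be placed on those 0.45 cpus.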
+
+## Writing a custom resource estimator
+
+The resource estimator estimates and predicts the total resources used on the
+slave and informs the master about resources that can be oversubscribed. By
+default, Mesos comes with a `noop` and a `fixed` resource estimator. The `noop`
+estimator only provides an empty estimate to the slave and stalls, effectively
+disabling oversubscription. The `fixed` estimator doesn't use the actual
+measured slack, but oversubscribes the node with a fixed amount of resources
+(defined via a command line flag).
+
+The interface is defined below:
+
+~~~{.cpp}
+class ResourceEstimator
+{
+public:
+  // Initializes this resource estimator. This method needs to be
+  // called before any other member method is called. It registers
+  // a callback in the resource estimator. The callback allows the
+  // resource estimator to fetch the current resource usage for each
+  // executor on slave.
+  virtual Try<Nothing> initialize(
+      const lambda::function<process::Future<ResourceUsage>()>& usage) = 0;
+
+  // Returns the current estimation about the *maximum* amount of
+  // resources that can be oversubscribed on the slave. A new
+  // estimation will invalidate all the previously returned
+  // estimations. The slave will be calling this method periodically
+  // to forward it to the master. As a result, the estimator should
+  // respond with an estimate every time this method is called.
+  virtual process::Future<Resources> oversubscribable() = 0;
+};
+~~~
+
+## Writing a custom QoS controller
+
+The interface for implementing custom QoS Controllers is defined below:
+
+~~~{.cpp}
+class QoSController
+{
+public:
+  // Initializes this QoS Controller. This method needs to be
+  // called before any other member method is called. It registers
+  // a callback in the QoS Controller. The callback allows the
+  // QoS Controller to fetch the current resource usage for each
+  // executor on slave.
+  virtual Try<Nothing> initialize(
+      const lambda::function<process::Future<ResourceUsage>()>& usage) = 0;
+
+  // A QoS Controller informs the slave about corrections to carry
+  // out, by returning futures to QoSCorrection objects. For more
+  // information, please refer to mesos.proto.
+  virtual process::Future<std::list<QoSCorrection>> corrections() = 0;
+};
+~~~
+
+> NOTE: The QoS Controller must not block `corrections()`. Back the QoS
+> Controller with its own libprocess actor instead.
+
+The QoS Controller informs the slave that particular corrective actions need to
+be made. Each corrective action contains information about the executor or
+task and the type of action to perform.
+
+~~~{.proto}
+message QoSCorrection {
+  enum Type {
+    KILL = 1; // Terminate an executor.
+  }
+
+  message Kill {
+    optional FrameworkID framework_id = 1;
+    optional ExecutorID executor_id = 2;
+  }
+
+  required Type type = 1;
+  optional Kill kill = 2;
+}
+~~~
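+
+To make the shape of a correction policy concrete, here is a deliberately
+naive sketch: when interference is detected, it proposes killing every
+revocable executor. The `ExecutorUsage` and `KillCorrection` structs and the
+`planCorrections` function are hypothetical stand-ins for this example, not
+the Mesos `ResourceUsage` and `QoSCorrection` messages, and how interference
+is detected is left out entirely.
+
+```cpp
+#include <string>
+#include <vector>
+
+// Hypothetical per-executor view of what is running on the slave.
+struct ExecutorUsage
+{
+  std::string frameworkId;
+  std::string executorId;
+  bool revocable; // True if the container uses any revocable resource.
+};
+
+// Hypothetical stand-in for a KILL correction (framework and executor IDs).
+struct KillCorrection
+{
+  std::string frameworkId;
+  std::string executorId;
+};
+
+// Naive policy: on interference, kill all revocable executors and leave
+// regular executors untouched. Without interference, correct nothing.
+std::vector<KillCorrection> planCorrections(
+    const std::vector<ExecutorUsage>& executors,
+    bool interferenceDetected)
+{
+  std::vector<KillCorrection> corrections;
+
+  if (!interferenceDetected) {
+    return corrections;
+  }
+
+  for (const ExecutorUsage& executor : executors) {
+    if (executor.revocable) {
+      corrections.push_back({executor.frameworkId, executor.executorId});
+    }
+  }
+
+  return corrections;
+}
+```
+
+A more careful controller would probably kill as few revocable executors as
+needed to restore the SLA instead of all of them at once.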
+
+## Configuring Mesos for oversubscription
+
+Five new flags have been added to the slave:
+
+<table class="table table-striped">
+ <thead>
+ <tr>
+ <th width="30%">
+ Flag
+ </th>
+ <th>
+ Explanation
+ </th>
+ </tr>
+ </thead>
+
+ <tr>
+ <td>
+ --oversubscribed_resources_interval=VALUE
+ </td>
+ <td>
+ The slave periodically updates the master with the current estimation
+about the total amount of oversubscribed resources that are allocated and
+available. The interval between updates is controlled by this flag. (default:
+15secs)
+ </td>
+ </tr>
+
+ <tr>
+ <td>
+ --qos_controller=VALUE
+ </td>
+ <td>
+ The name of the QoS Controller to use for oversubscription.
+ </td>
+ </tr>
+
+ <tr>
+ <td>
+ --qos_correction_interval_min=VALUE
+ </td>
+ <td>
+ The slave polls and carries out QoS corrections from the QoS Controller
+based on its observed performance of running tasks. The smallest interval
+between these corrections is controlled by this flag. (default: 0ns)
+ </td>
+ </tr>
+
+ <tr>
+ <td>
+ --resource_estimator=VALUE
+ </td>
+ <td>
+ The name of the resource estimator to use for oversubscription.
+ </td>
+ </tr>
+
+ <tr>
+ <td>
+ --resource_monitoring_interval=VALUE
+ </td>
+ <td>
+ Periodic time interval for monitoring executor resource usage (e.g.,
+10secs, 1min, etc.) (default: 1secs)
+ </td>
+ </tr>
+
+</table>
+
+The `fixed` resource estimator is enabled as follows:
+
+```
+--resource_estimator="org_apache_mesos_FixedResourceEstimator"
+
+--modules='{
+  "libraries": {
+    "file": "/usr/local/lib64/libfixed_resource_estimator.so",
+    "modules": {
+      "name": "org_apache_mesos_FixedResourceEstimator",
+      "parameters": {
+        "key": "resources",
+        "value": "cpus:14"
+      }
+    }
+  }
+}'
+```
+
+In the example above, a fixed amount of 14 cpus will be offered as revocable
+resources.
+
+To select a custom resource estimator and QoS controller, please refer to the
+[modules documentation](modules.md).