Posted to commits@mesos.apache.org by ch...@apache.org on 2018/06/01 01:32:34 UTC

[08/12] mesos git commit: Added documentation for resource provider and CSI plugin metrics.

Added documentation for resource provider and CSI plugin metrics.

Review: https://reviews.apache.org/r/67303


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/db075fc6
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/db075fc6
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/db075fc6

Branch: refs/heads/master
Commit: db075fc67aceb8f75bbc204aae042a30b65c57e3
Parents: 318aca9
Author: Chun-Hung Hsiao <ch...@mesosphere.io>
Authored: Thu May 24 18:01:26 2018 -0700
Committer: Chun-Hung Hsiao <ch...@mesosphere.io>
Committed: Thu May 31 18:29:56 2018 -0700

----------------------------------------------------------------------
 docs/home.md       |   2 +-
 docs/monitoring.md | 184 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 185 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/db075fc6/docs/home.md
----------------------------------------------------------------------
diff --git a/docs/home.md b/docs/home.md
index 5471c70..adefc4d 100644
--- a/docs/home.md
+++ b/docs/home.md
@@ -23,7 +23,7 @@ layout: documentation
 * [Maintenance](maintenance.md) for performing maintenance on a Mesos cluster.
 * [Upgrades](upgrades.md) for upgrading a Mesos cluster.
 * [Logging](logging.md)
-* [Monitoring](monitoring.md)
+* [Monitoring / Metrics](monitoring.md)
 * [Operational Guide](operational-guide.md)
 * [Fetcher Cache Configuration](fetcher.md)
 * [Fault Domains](fault-domains.md)

http://git-wip-us.apache.org/repos/asf/mesos/blob/db075fc6/docs/monitoring.md
----------------------------------------------------------------------
diff --git a/docs/monitoring.md b/docs/monitoring.md
index d9dc793..2985f68 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -1764,3 +1764,187 @@ the master it is registered with.
   <td>Counter</td>
 </tr>
 </table>
+
+#### Resource Providers
+
+The following metrics provide information about ongoing and completed
+[operations](operations.md) that apply to resources provided by a
+[resource provider](resource-provider.md) with the given _type_ and _name_. In
+the following metrics, the _operation_ placeholder refers to the name of a
+particular operation type, which is described in the list of
+[supported operation types](#supported-operation-types).
+
+<table class="table table-striped">
+<thead>
+<tr><th>Metric</th><th>Description</th><th>Type</th></tr>
+</thead>
+<tr>
+  <td>
+  <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/operations/<i>&lt;operation&gt;</i>/pending</code>
+  </td>
+  <td>Number of ongoing <i>operation</i>s</td>
+  <td>Gauge</td>
+</tr>
+<tr>
+  <td>
+  <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/operations/<i>&lt;operation&gt;</i>/finished</code>
+  </td>
+  <td>Number of finished <i>operation</i>s</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td>
+  <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/operations/<i>&lt;operation&gt;</i>/failed</code>
+  </td>
+  <td>Number of failed <i>operation</i>s</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td>
+  <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/operations/<i>&lt;operation&gt;</i>/dropped</code>
+  </td>
+  <td>Number of dropped <i>operation</i>s</td>
+  <td>Counter</td>
+</tr>
+</table>
+
+##### Supported Operation Types
+
+Supported operation types vary among resource providers. The following table
+lists every operation type and the resource provider types that support it.
+The Name column gives the value that replaces the _operation_ placeholder in
+the above metrics.
+
+<table class="table table-striped">
+<thead>
+<tr><th>Type</th><th>Name</th><th>Supported Resource Provider Types</th></tr>
+</thead>
+<tr>
+  <td><code><a href="reservation.md">RESERVE</a></code></td>
+  <td><code>reserve</code></td>
+  <td>All</td>
+</tr>
+<tr>
+  <td><code><a href="reservation.md">UNRESERVE</a></code></td>
+  <td><code>unreserve</code></td>
+  <td>All</td>
+</tr>
+<tr>
+  <td><code><a href="persistent-volume.md#-offer-operation-create-">CREATE</a></code></td>
+  <td><code>create</code></td>
+  <td><code>org.apache.mesos.rp.local.storage</code></td>
+</tr>
+<tr>
+  <td><code><a href="persistent-volume.md#-offer-operation-destroy-">DESTROY</a></code></td>
+  <td><code>destroy</code></td>
+  <td><code>org.apache.mesos.rp.local.storage</code></td>
+</tr>
+<tr>
+  <td><code><a href="csi.md#-create_volume-operation">CREATE_VOLUME</a></code></td>
+  <td><code>create_volume</code></td>
+  <td><code>org.apache.mesos.rp.local.storage</code></td>
+</tr>
+<tr>
+  <td><code><a href="csi.md#-destroy_volume-operation">DESTROY_VOLUME</a></code></td>
+  <td><code>destroy_volume</code></td>
+  <td><code>org.apache.mesos.rp.local.storage</code></td>
+</tr>
+<tr>
+  <td><code><a href="csi.md#-create_block-operation">CREATE_BLOCK</a></code></td>
+  <td><code>create_block</code></td>
+  <td><code>org.apache.mesos.rp.local.storage</code></td>
+</tr>
+<tr>
+  <td><code><a href="csi.md#-destroy_block-operation">DESTROY_BLOCK</a></code></td>
+  <td><code>destroy_block</code></td>
+  <td><code>org.apache.mesos.rp.local.storage</code></td>
+</tr>
+</table>
+
+For example, cluster operators can monitor the number of successful
+`CREATE_VOLUME` operations that are applied to the resource provider with type
+`org.apache.mesos.rp.local.storage` and name `lvm` through the
+`resource_providers/org.apache.mesos.rp.local.storage.lvm/operations/create_volume/finished`
+metric.
+
+#### CSI Plugins
+
+Storage resource providers in Mesos are backed by
+[CSI plugins](csi.md#standalone-containers-for-csi-plugins) running in
+[standalone containers](standalone-container.md). To help monitor the health
+of the CSI plugins for a storage resource provider with the given _type_ and
+_name_, the following metrics provide information about plugin terminations and
+about ongoing and completed CSI calls made to the plugin. In these metrics, the
+_rpc_ placeholder refers to the name of a particular CSI call, taken from
+the list of [supported CSI calls](#supported-csi-calls).
+
+<table class="table table-striped">
+<thead>
+<tr><th>Metric</th><th>Description</th><th>Type</th></tr>
+</thead>
+<tr>
+  <td>
+  <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/csi_plugin/container_terminations</code>
+  </td>
+  <td>Number of terminated CSI plugin containers</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td>
+  <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/csi_plugin/rpcs/<i>&lt;rpc&gt;</i>/pending</code>
+  </td>
+  <td>Number of ongoing <i>rpc</i> calls</td>
+  <td>Gauge</td>
+</tr>
+<tr>
+  <td>
+  <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/csi_plugin/rpcs/<i>&lt;rpc&gt;</i>/successes</code>
+  </td>
+  <td>Number of successful <i>rpc</i> calls</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td>
+  <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/csi_plugin/rpcs/<i>&lt;rpc&gt;</i>/errors</code>
+  </td>
+  <td>Number of erroneous <i>rpc</i> calls</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td>
+  <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/csi_plugin/rpcs/<i>&lt;rpc&gt;</i>/cancelled</code>
+  </td>
+  <td>Number of cancelled <i>rpc</i> calls</td>
+  <td>Counter</td>
+</tr>
+</table>
+
+##### Supported CSI Calls
+
+The following is a comprehensive list of CSI calls that are used in storage
+resource providers. These names are used to replace the _rpc_ placeholder in the
+above metrics.
+
+* [`csi.v0.Identity.GetPluginInfo`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#getplugininfo)
+* [`csi.v0.Identity.GetPluginCapabilities`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#getplugincapabilities)
+* [`csi.v0.Identity.Probe`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#probe)
+* [`csi.v0.Controller.CreateVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#createvolume)
+* [`csi.v0.Controller.DeleteVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#deletevolume)
+* [`csi.v0.Controller.ControllerPublishVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#controllerpublishvolume)
+* [`csi.v0.Controller.ControllerUnpublishVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#controllerunpublishvolume)
+* [`csi.v0.Controller.ValidateVolumeCapabilities`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#validatevolumecapabilities)
+* [`csi.v0.Controller.ListVolumes`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#listvolumes)
+* [`csi.v0.Controller.GetCapacity`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#getcapacity)
+* [`csi.v0.Controller.ControllerGetCapabilities`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#controllergetcapabilities)
+* [`csi.v0.Node.NodeStageVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#nodestagevolume)
+* [`csi.v0.Node.NodeUnstageVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#nodeunstagevolume)
+* [`csi.v0.Node.NodePublishVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#nodepublishvolume)
+* [`csi.v0.Node.NodeUnpublishVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#nodeunpublishvolume)
+* [`csi.v0.Node.NodeGetId`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#nodegetid)
+* [`csi.v0.Node.NodeGetCapabilities`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#nodegetcapabilities)
+
+For example, cluster operators can monitor the number of successful
+`csi.v0.Controller.CreateVolume` calls that are made by the resource provider
+with type `org.apache.mesos.rp.local.storage` and name `lvm` through the
+`resource_providers/org.apache.mesos.rp.local.storage.lvm/csi_plugin/rpcs/csi.v0.Controller.CreateVolume/successes`
+metric.
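
The two examples in the added documentation can be sketched in code. The following is a minimal, unofficial sketch and not part of this commit: it builds the metric keys from the documented patterns and looks them up in a metrics snapshot. The helper names (`operation_metric`, `rpc_metric`, `fetch_snapshot`, `read_metric`) are hypothetical, and it assumes the metrics are served as a flat JSON object by the `/metrics/snapshot` endpoint of the agent hosting the resource provider.

```python
import json
import urllib.request


def operation_metric(rp_type, rp_name, operation, state):
    """Build an operation metric key.

    `state` is one of: pending, finished, failed, dropped.
    """
    return f"resource_providers/{rp_type}.{rp_name}/operations/{operation}/{state}"


def rpc_metric(rp_type, rp_name, rpc, state):
    """Build a CSI call metric key.

    `state` is one of: pending, successes, errors, cancelled.
    """
    return f"resource_providers/{rp_type}.{rp_name}/csi_plugin/rpcs/{rpc}/{state}"


def fetch_snapshot(base_url):
    """Fetch the metrics snapshot as a dict.

    Assumption: `base_url` points at the endpoint that serves these metrics,
    e.g. an agent address such as "http://agent.example.com:5051" (hypothetical).
    """
    with urllib.request.urlopen(f"{base_url}/metrics/snapshot") as response:
        return json.load(response)


def read_metric(snapshot, key):
    """Return the metric value, or None if the key is absent from the snapshot."""
    return snapshot.get(key)


# Keys for the two examples given in the documentation.
RP_TYPE = "org.apache.mesos.rp.local.storage"
RP_NAME = "lvm"
finished_key = operation_metric(RP_TYPE, RP_NAME, "create_volume", "finished")
successes_key = rpc_metric(
    RP_TYPE, RP_NAME, "csi.v0.Controller.CreateVolume", "successes")
```

With a reachable endpoint, `read_metric(fetch_snapshot(base_url), finished_key)` would return the current counter value. Returning `None` for a missing key lets the caller distinguish an absent metric from one whose value is zero.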