Posted to commits@flink.apache.org by ch...@apache.org on 2017/11/06 13:42:49 UTC

flink git commit: [FLINK-7843][metrics][docs] Add time unit and metric type to system metrics reference

Repository: flink
Updated Branches:
  refs/heads/master 2d56b11e7 -> d66e95c61


[FLINK-7843][metrics][docs] Add time unit and metric type to system metrics reference

This closes #4869.


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/d66e95c6
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/d66e95c6
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/d66e95c6

Branch: refs/heads/master
Commit: d66e95c610900103d3e9e14227eeac2f6d2fe607
Parents: 2d56b11
Author: yew1eb <ye...@gmail.com>
Authored: Fri Oct 20 15:55:21 2017 +0800
Committer: zentol <ch...@apache.org>
Committed: Mon Nov 6 14:41:57 2017 +0100

----------------------------------------------------------------------
 docs/monitoring/metrics.md | 187 ++++++++++++++++++++++++++++------------
 1 file changed, 131 insertions(+), 56 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/d66e95c6/docs/monitoring/metrics.md
----------------------------------------------------------------------
diff --git a/docs/monitoring/metrics.md b/docs/monitoring/metrics.md
index 0bc17a7..bb15cec 100644
--- a/docs/monitoring/metrics.md
+++ b/docs/monitoring/metrics.md
@@ -512,7 +512,7 @@ metrics.reporter.slf4j.interval: 60 SECONDS
 By default Flink gathers several metrics that provide deep insights on the current state.
 This section is a reference of all these metrics.
 
-The tables below generally feature 4 columns:
+The tables below generally feature 5 columns:
 
 * The "Scope" column describes which scope format is used to generate the system scope.
   For example, if the cell contains "Operator" then the scope format for "metrics.scope.operator" is used.
@@ -525,6 +525,8 @@ The tables below generally feature 4 columns:
 
 * The "Description" column provides information as to what a given metric is measuring.
 
+* The "Type" column describes which metric type is used for the measurement.
+
 Note that all dots in the infix/metric name columns are still subject to the "metrics.delimiter" setting.
 
 Thus, in order to infer the metric identifier:
@@ -537,10 +539,11 @@ Thus, in order to infer the metric identifier:
 <table class="table table-bordered">
   <thead>
     <tr>
-      <th class="text-left" style="width: 20%">Scope</th>
-      <th class="text-left" style="width: 25%">Infix</th>
-      <th class="text-left" style="width: 23%">Metrics</th>
+      <th class="text-left" style="width: 18%">Scope</th>
+      <th class="text-left" style="width: 22%">Infix</th>
+      <th class="text-left" style="width: 20%">Metrics</th>
       <th class="text-left" style="width: 32%">Description</th>
+      <th class="text-left" style="width: 8%">Type</th>
     </tr>
   </thead>
   <tbody>
@@ -549,10 +552,12 @@ Thus, in order to infer the metric identifier:
       <td rowspan="2">Status.JVM.CPU</td>
       <td>Load</td>
       <td>The recent CPU usage of the JVM.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>Time</td>
       <td>The CPU time used by the JVM.</td>
+      <td>Gauge</td>
     </tr>
   </tbody>
 </table>
@@ -561,10 +566,11 @@ Thus, in order to infer the metric identifier:
 <table class="table table-bordered">                               
   <thead>                                                          
     <tr>                                                           
-      <th class="text-left" style="width: 20%">Scope</th>
-      <th class="text-left" style="width: 25%">Infix</th>          
-      <th class="text-left" style="width: 23%">Metrics</th>                           
-      <th class="text-left" style="width: 32%">Description</th>                       
+      <th class="text-left" style="width: 18%">Scope</th>
+      <th class="text-left" style="width: 22%">Infix</th>          
+      <th class="text-left" style="width: 20%">Metrics</th>                           
+      <th class="text-left" style="width: 32%">Description</th>
+      <th class="text-left" style="width: 8%">Type</th>                       
     </tr>                                                          
   </thead>                                                         
   <tbody>                                                          
@@ -572,51 +578,63 @@ Thus, in order to infer the metric identifier:
       <th rowspan="12"><strong>Job-/TaskManager</strong></th>
       <td rowspan="12">Status.JVM.Memory</td>
       <td>Heap.Used</td>
-      <td>The amount of heap memory currently used.</td>
+      <td>The amount of heap memory currently used (in bytes).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>Heap.Committed</td>
-      <td>The amount of heap memory guaranteed to be available to the JVM.</td>
+      <td>The amount of heap memory guaranteed to be available to the JVM (in bytes).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>Heap.Max</td>
-      <td>The maximum amount of heap memory that can be used for memory management.</td>
+      <td>The maximum amount of heap memory that can be used for memory management (in bytes).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>NonHeap.Used</td>
-      <td>The amount of non-heap memory currently used.</td>
+      <td>The amount of non-heap memory currently used (in bytes).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>NonHeap.Committed</td>
-      <td>The amount of non-heap memory guaranteed to be available to the JVM.</td>
+      <td>The amount of non-heap memory guaranteed to be available to the JVM (in bytes).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>NonHeap.Max</td>
-      <td>The maximum amount of non-heap memory that can be used for memory management.</td>
+      <td>The maximum amount of non-heap memory that can be used for memory management (in bytes).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>Direct.Count</td>
       <td>The number of buffers in the direct buffer pool.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>Direct.MemoryUsed</td>
-      <td>The amount of memory used by the JVM for the direct buffer pool.</td>
+      <td>The amount of memory used by the JVM for the direct buffer pool (in bytes).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>Direct.TotalCapacity</td>
-      <td>The total capacity of all buffers in the direct buffer pool.</td>
+      <td>The total capacity of all buffers in the direct buffer pool (in bytes).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>Mapped.Count</td>
       <td>The number of buffers in the mapped buffer pool.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>Mapped.MemoryUsed</td>
-      <td>The amount of memory used by the JVM for the mapped buffer pool.</td>
+      <td>The amount of memory used by the JVM for the mapped buffer pool (in bytes).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>Mapped.TotalCapacity</td>
-      <td>The number of buffers in the mapped buffer pool.</td>
+      <td>The total capacity of all buffers in the mapped buffer pool (in bytes).</td>
+      <td>Gauge</td>
     </tr>                                                         
   </tbody>                                                         
 </table>
@@ -625,10 +643,11 @@ Thus, in order to infer the metric identifier:
 <table class="table table-bordered">
   <thead>
     <tr>
-      <th class="text-left" style="width: 20%">Scope</th>
-      <th class="text-left" style="width: 25%">Infix</th>
-      <th class="text-left" style="width: 23%">Metrics</th>
+      <th class="text-left" style="width: 18%">Scope</th>
+      <th class="text-left" style="width: 22%">Infix</th>
+      <th class="text-left" style="width: 20%">Metrics</th>
       <th class="text-left" style="width: 32%">Description</th>
+      <th class="text-left" style="width: 8%">Type</th>
     </tr>
   </thead>
   <tbody>
@@ -637,6 +656,7 @@ Thus, in order to infer the metric identifier:
       <td rowspan="1">Status.JVM.ClassLoader</td>
       <td>Threads.Count</td>
       <td>The total number of live threads.</td>
+      <td>Gauge</td>
     </tr>
   </tbody>
 </table>
@@ -645,10 +665,11 @@ Thus, in order to infer the metric identifier:
 <table class="table table-bordered">
   <thead>
     <tr>
-      <th class="text-left" style="width: 20%">Scope</th>
-      <th class="text-left" style="width: 25%">Infix</th>
-      <th class="text-left" style="width: 23%">Metrics</th>
+      <th class="text-left" style="width: 18%">Scope</th>
+      <th class="text-left" style="width: 22%">Infix</th>
+      <th class="text-left" style="width: 20%">Metrics</th>
       <th class="text-left" style="width: 32%">Description</th>
+      <th class="text-left" style="width: 8%">Type</th>
     </tr>
   </thead>
   <tbody>
@@ -657,10 +678,12 @@ Thus, in order to infer the metric identifier:
       <td rowspan="2">Status.JVM.GarbageCollector</td>
       <td>&lt;GarbageCollector&gt;.Count</td>
       <td>The total number of collections that have occurred.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>&lt;GarbageCollector&gt;.Time</td>
       <td>The total time spent performing garbage collection.</td>
+      <td>Gauge</td>
     </tr>
   </tbody>
 </table>
@@ -669,10 +692,11 @@ Thus, in order to infer the metric identifier:
 <table class="table table-bordered">
   <thead>
     <tr>
-      <th class="text-left" style="width: 20%">Scope</th>
-      <th class="text-left" style="width: 25%">Infix</th>
-      <th class="text-left" style="width: 23%">Metrics</th>
+      <th class="text-left" style="width: 18%">Scope</th>
+      <th class="text-left" style="width: 22%">Infix</th>
+      <th class="text-left" style="width: 20%">Metrics</th>
       <th class="text-left" style="width: 32%">Description</th>
+      <th class="text-left" style="width: 8%">Type</th>
     </tr>
   </thead>
   <tbody>
@@ -681,10 +705,12 @@ Thus, in order to infer the metric identifier:
       <td rowspan="2">Status.JVM.ClassLoader</td>
       <td>ClassesLoaded</td>
       <td>The total number of classes loaded since the start of the JVM.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>ClassesUnloaded</td>
       <td>The total number of classes unloaded since the start of the JVM.</td>
+      <td>Gauge</td>
     </tr>
   </tbody>
 </table>
@@ -693,10 +719,11 @@ Thus, in order to infer the metric identifier:
 <table class="table table-bordered">
   <thead>
     <tr>
-      <th class="text-left" style="width: 20%">Scope</th>
-      <th class="text-left" style="width: 25%">Infix</th>
-      <th class="text-left" style="width: 25%">Metrics</th>
+      <th class="text-left" style="width: 18%">Scope</th>
+      <th class="text-left" style="width: 22%">Infix</th>
+      <th class="text-left" style="width: 22%">Metrics</th>
       <th class="text-left" style="width: 30%">Description</th>
+      <th class="text-left" style="width: 8%">Type</th>
     </tr>
   </thead>
   <tbody>
@@ -705,46 +732,56 @@ Thus, in order to infer the metric identifier:
       <td rowspan="2">Status.Network</td>
       <td>AvailableMemorySegments</td>
       <td>The number of unused memory segments.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>TotalMemorySegments</td>
       <td>The number of allocated memory segments.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <th rowspan="8">Task</th>
       <td rowspan="4">buffers</td>
       <td>inputQueueLength</td>
       <td>The number of queued input buffers.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>outputQueueLength</td>
       <td>The number of queued output buffers.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>inPoolUsage</td>
       <td>An estimate of the input buffers usage.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>outPoolUsage</td>
       <td>An estimate of the output buffers usage.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td rowspan="4">Network.&lt;Input|Output&gt;.&lt;gate&gt;<br />
         <strong>(only available if <tt>taskmanager.net.detailed-metrics</tt> config option is set)</strong></td>
       <td>totalQueueLen</td>
       <td>Total number of queued buffers in all input/output channels.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>minQueueLen</td>
       <td>Minimum number of queued buffers in all input/output channels.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>maxQueueLen</td>
       <td>Maximum number of queued buffers in all input/output channels.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>avgQueueLen</td>
       <td>Average number of queued buffers in all input/output channels.</td>
+      <td>Gauge</td>
     </tr>
   </tbody>
 </table>
@@ -753,9 +790,10 @@ Thus, in order to infer the metric identifier:
 <table class="table table-bordered">
   <thead>
     <tr>
-      <th class="text-left" style="width: 20%">Scope</th>
-      <th class="text-left" style="width: 30%">Metrics</th>
-      <th class="text-left" style="width: 50%">Description</th>
+      <th class="text-left" style="width: 18%">Scope</th>
+      <th class="text-left" style="width: 26%">Metrics</th>
+      <th class="text-left" style="width: 48%">Description</th>
+      <th class="text-left" style="width: 8%">Type</th>
     </tr>
   </thead>
   <tbody>
@@ -763,18 +801,22 @@ Thus, in order to infer the metric identifier:
       <th rowspan="4"><strong>JobManager</strong></th>
       <td>numRegisteredTaskManagers</td>
       <td>The number of registered taskmanagers.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>numRunningJobs</td>
       <td>The number of running jobs.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>taskSlotsAvailable</td>
       <td>The number of available task slots.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>taskSlotsTotal</td>
       <td>The total number of task slots.</td>
+      <td>Gauge</td>
     </tr>
   </tbody>
 </table>
@@ -783,34 +825,39 @@ Thus, in order to infer the metric identifier:
 <table class="table table-bordered">
   <thead>
     <tr>
-      <th class="text-left" style="width: 20%">Scope</th>
-      <th class="text-left" style="width: 30%">Metrics</th>
-      <th class="text-left" style="width: 50%">Description</th>
+      <th class="text-left" style="width: 18%">Scope</th>
+      <th class="text-left" style="width: 26%">Metrics</th>
+      <th class="text-left" style="width: 48%">Description</th>
+      <th class="text-left" style="width: 8%">Type</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th rowspan="4"><strong>Job (only available on JobManager)</strong></th>
       <td>restartingTime</td>
-      <td>The time it took to restart the job, or how long the current restart has been in progress.</td>
+      <td>The time it took to restart the job, or how long the current restart has been in progress (in milliseconds).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>uptime</td>
       <td>
         The time that the job has been running without interruption.
-        <p>Returns -1 for completed jobs.</p>
+        <p>Measured in milliseconds. Returns -1 for completed jobs.</p>
       </td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>downtime</td>
       <td>
         For jobs currently in a failing/recovering situation, the time elapsed during this outage.
-        <p>Returns 0 for running jobs and -1 for completed jobs.</p>
+        <p>Measured in milliseconds. Returns 0 for running jobs and -1 for completed jobs.</p>
       </td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>fullRestarts</td>
-      <td>The total number of full restarts since this job was submitted.</td>
+      <td>The total number of full restarts since this job was submitted.</td>
+      <td>Gauge</td>
     </tr>
   </tbody>
 </table>
@@ -819,53 +866,64 @@ Thus, in order to infer the metric identifier:
 <table class="table table-bordered">
   <thead>
     <tr>
-      <th class="text-left" style="width: 20%">Scope</th>
-      <th class="text-left" style="width: 30%">Metrics</th>
-      <th class="text-left" style="width: 50%">Description</th>
+      <th class="text-left" style="width: 18%">Scope</th>
+      <th class="text-left" style="width: 26%">Metrics</th>
+      <th class="text-left" style="width: 48%">Description</th>
+      <th class="text-left" style="width: 8%">Type</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th rowspan="9"><strong>Job (only available on JobManager)</strong></th>
       <td>lastCheckpointDuration</td>
-      <td>The time it took to complete the last checkpoint.</td>
+      <td>The time it took to complete the last checkpoint (in milliseconds).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>lastCheckpointSize</td>
-      <td>The total size of the last checkpoint.</td>
+      <td>The total size of the last checkpoint (in bytes).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>lastCheckpointExternalPath</td>
       <td>The path where the last external checkpoint was stored.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>lastCheckpointRestoreTimestamp</td>
-      <td>Timestamp when the last checkpoint was restored at the coordinator.</td>
+      <td>Timestamp when the last checkpoint was restored at the coordinator (in milliseconds).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>lastCheckpointAlignmentBuffered</td>
-      <td>The number of buffered bytes during alignment over all subtasks for the last checkpoint.</td>
+      <td>The amount of data buffered during alignment over all subtasks for the last checkpoint (in bytes).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>numberOfInProgressCheckpoints</td>
       <td>The number of in progress checkpoints.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>numberOfCompletedCheckpoints</td>
       <td>The number of successfully completed checkpoints.</td>
+      <td>Gauge</td>
     </tr>            
     <tr>
       <td>numberOfFailedCheckpoints</td>
       <td>The number of failed checkpoints.</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>totalNumberOfCheckpoints</td>
       <td>The number of total checkpoints (in progress, completed, failed).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <th rowspan="1">Task</th>
       <td>checkpointAlignmentTime</td>
-      <td>The time in nanoseconds that the last barrier alignment took to complete, or how long the current alignment has taken so far.</td>
+      <td>The time that the last barrier alignment took to complete, or how long the current alignment has taken so far (in nanoseconds).</td>
+      <td>Gauge</td>
     </tr>
   </tbody>
 </table>
@@ -874,57 +932,69 @@ Thus, in order to infer the metric identifier:
 <table class="table table-bordered">
   <thead>
     <tr>
-      <th class="text-left" style="width: 20%">Scope</th>
-      <th class="text-left" style="width: 30%">Metrics</th>
-      <th class="text-left" style="width: 50%">Description</th>
+      <th class="text-left" style="width: 18%">Scope</th>
+      <th class="text-left" style="width: 26%">Metrics</th>
+      <th class="text-left" style="width: 48%">Description</th>
+      <th class="text-left" style="width: 8%">Type</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th rowspan="7"><strong>Task</strong></th>
       <td>currentLowWatermark</td>
-      <td>The lowest watermark this task has received.</td>
+      <td>The lowest watermark this task has received (in milliseconds).</td>
+      <td>Gauge</td>
     </tr>
     <tr>
       <td>numBytesInLocal</td>
       <td>The total number of bytes this task has read from a local source.</td>
+      <td>Counter</td>
     </tr>
     <tr>
       <td>numBytesInLocalPerSecond</td>
       <td>The number of bytes this task reads from a local source per second.</td>
+      <td>Meter</td>
     </tr>
     <tr>
       <td>numBytesInRemote</td>
       <td>The total number of bytes this task has read from a remote source.</td>
+      <td>Counter</td>
     </tr>
     <tr>
       <td>numBytesInRemotePerSecond</td>
       <td>The number of bytes this task reads from a remote source per second.</td>
+      <td>Meter</td>
     </tr>
     <tr>
       <td>numBytesOut</td>
       <td>The total number of bytes this task has emitted.</td>
+      <td>Counter</td>
     </tr>
     <tr>
       <td>numBytesOutPerSecond</td>
       <td>The number of bytes this task emits per second.</td>
+      <td>Meter</td>
     </tr>
     <tr>
       <th rowspan="5"><strong>Task/Operator</strong></th>
       <td>numRecordsIn</td>
       <td>The total number of records this operator/task has received.</td>
+      <td>Counter</td>
     </tr>
     <tr>
       <td>numRecordsInPerSecond</td>
       <td>The number of records this operator/task receives per second.</td>
+      <td>Meter</td>
     </tr>
     <tr>
       <td>numRecordsOut</td>
       <td>The total number of records this operator/task has emitted.</td>
+      <td>Counter</td>
     </tr>
     <tr>
       <td>numRecordsOutPerSecond</td>
       <td>The number of records this operator/task sends per second.</td>
+      <td>Meter</td>
     </tr>
     <tr>
       <td>numLateRecordsDropped</td>
@@ -933,11 +1003,13 @@ Thus, in order to infer the metric identifier:
     <tr>
       <th rowspan="2"><strong>Operator</strong></th>
       <td>latency</td>
-      <td>The latency distributions from all incoming sources.</td>
+      <td>The latency distributions from all incoming sources (in milliseconds).</td>
+      <td>Histogram</td>
     </tr>
     <tr>
       <td>numSplitsProcessed</td>
       <td>The total number of InputSplits this data source has processed (if the operator is a data source).</td>
+      <td>Gauge</td>
     </tr>
   </tbody>
 </table>
@@ -948,9 +1020,10 @@ Thus, in order to infer the metric identifier:
 <table class="table table-bordered">
   <thead>
     <tr>
-      <th class="text-left" style="width: 20%">Scope</th>
-      <th class="text-left" style="width: 30%">Metrics</th>
-      <th class="text-left" style="width: 50%">Description</th>
+      <th class="text-left" style="width: 18%">Scope</th>
+      <th class="text-left" style="width: 26%">Metrics</th>
+      <th class="text-left" style="width: 48%">Description</th>
+      <th class="text-left" style="width: 8%">Type</th>
     </tr>
   </thead>
   <tbody>
@@ -958,11 +1031,13 @@ Thus, in order to infer the metric identifier:
       <th rowspan="1">Operator</th>
       <td>commitsSucceeded</td>
       <td>Kafka offset commit success count if Kafka commit is turned on and checkpointing is enabled.</td>
+      <td>Counter</td>
     </tr>
     <tr>
        <th rowspan="1">Operator</th>
        <td>commitsFailed</td>
        <td>Kafka offset commit failure count if Kafka commit is turned on and checkpointing is enabled.</td>
+       <td>Counter</td>
     </tr>
   </tbody>
 </table>
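
Note on reading the tables above: the documentation text in this diff says that the dots in the infix and metric name columns are subject to the "metrics.delimiter" setting, and that the metric identifier is inferred from the scope, infix, and metric name. The list of steps itself is not part of this diff, so the following is only a minimal sketch of one plausible reading: system scope, then the optional infix, then the metric name, all joined by the delimiter. The function name `metric_identifier` and the example scope strings are hypothetical and are not part of Flink.

```python
def metric_identifier(system_scope, infix, metric, delimiter="."):
    """Join an already formatted system scope, an optional infix and a
    metric name into one identifier (illustrative helper, not a Flink API)."""
    parts = [system_scope]
    if infix:
        # Dots in the infix column are replaced by the configured delimiter.
        parts.append(infix.replace(".", delimiter))
    # Dots in the metric name column are replaced as well.
    parts.append(metric.replace(".", delimiter))
    return delimiter.join(parts)


# Hypothetical TaskManager scope, default delimiter.
print(metric_identifier("host-1.taskmanager.tm-1", "Status.JVM.Memory", "Heap.Used"))

# Same metric, assuming metrics.delimiter is set to "_".
print(metric_identifier("host-1_taskmanager_tm-1", "Status.JVM.Memory", "Heap.Used", "_"))
```

Tables without an "Infix" column, such as the JobManager cluster metrics, correspond to passing `None` for the infix.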