Posted to dev@storm.apache.org by govind-menon <gi...@git.apache.org> on 2018/09/25 14:39:37 UTC

[GitHub] storm pull request #2845: STORM-3234: Replace old metrics docs with better d...

Github user govind-menon commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2845#discussion_r220219222
  
    --- Diff: docs/ClusterMetrics.md ---
    @@ -0,0 +1,256 @@
    +---
    +title: Cluster Metrics
    +layout: documentation
    +documentation: true
    +---
    +
    +#Cluster Metrics
    +
    +There are lots of metrics to help you monitor a running cluster.  Many of these metrics are still a work in progress and so is the metrics system itself so any of them may change, even between minor version releases.  We will try to keep them as stable as possible, but they should all be considered somewhat unstable. Some of the metrics may also be for experimental features, or features that are not complete yet, so please read the description of the metric before using it for monitoring or alerting.
    +
    +Also be aware that depending on the metrics system you use, the names are likely to be translated into a different format that is compatible with the system.  Typically this means that the ':' separating character will be replaced with a '.' character.
    +
    +Most metrics should have the units that they are reported in as a part of the description.  For Timers often this is configured by the reporter that is uploading them to your system.  Pay attention because even if the metric name has a time unit in it, it may be false.
    +
    +Also most metrics, except for gauges and counters, are a collection of numbers, and not a single value.  Often these result in multiple metrics being uploaded to a reporting system, such as percentiles for a histogram, or rates for a meter.  It is dependent on the configured metrics reporter how this happens, or how the name here corresponds to the metric in your reporting system.
    +
    +## Cluster Metrics (From Nimbus)
    +
    +These are metrics that come from the active nimbus instance and report the state of the cluster as a whole, as seen by nimbus.
    +
    +| Metric Name | Type | Description |
    +|-------------|------|-------------|
    +| cluster:num-nimbus-leaders | gauge | Number of nimbuses marked as a leader. This should really only ever be 1 in a health cluster, or 0 for a short period of time while a failover happens. |
    --- End diff --
    
    Nit: healthy

