Posted to commits@airflow.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/01/19 12:02:00 UTC

[jira] [Commented] (AIRFLOW-3177) Change scheduler_heartbeat metric from gauge to counter

    [ https://issues.apache.org/jira/browse/AIRFLOW-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16747058#comment-16747058 ] 

ASF subversion and git services commented on AIRFLOW-3177:
----------------------------------------------------------

Commit 4740da13d7432c41bc091bd9e271322b29933eef in airflow's branch refs/heads/v1-10-test from Greg Neiheisel
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=4740da1 ]

[AIRFLOW-3177] Change scheduler_heartbeat from gauge to counter (#4027)

This updates the scheduler_heartbeat metric from a gauge to a counter to
better support the statsd_exporter for usage with Prometheus. A counter
allows users to track the rate of the heartbeat, and integrates with the
exporter better. A crashing or down scheduler will no longer emit the
metric, but the statsd_exporter will continue to show a 1 for the metric
value. This fixes that issue because a counter will continually change,
and the lack of change indicates an issue with the scheduler.

Add statsd change notice in UPDATING.md
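
For illustration only, the behavior described in the commit message can be modeled with a toy exporter. This is a sketch under stated assumptions, not the statsd_exporter's implementation or Airflow's code: ToyExporter and scrape_series are hypothetical names, and the only real-world detail relied on is the statsd line format "<name>:<value>|<type>", where "g" marks a gauge and "c" marks a counter.

```python
# Hypothetical model of a metrics exporter; NOT the real statsd_exporter.
class ToyExporter:
    def __init__(self):
        self.values = {}

    def receive(self, line):
        """Parse one statsd line of the form '<name>:<value>|<type>'."""
        name, rest = line.split(":", 1)
        value, metric_type = rest.split("|", 1)
        if metric_type == "g":
            # Gauge: remember only the last value seen.
            self.values[name] = float(value)
        elif metric_type == "c":
            # Counter: accumulate every increment.
            self.values[name] = self.values.get(name, 0.0) + float(value)
        else:
            raise ValueError("unsupported metric type: " + metric_type)


def scrape_series(metric_type, beats, scrapes):
    """Send one heartbeat per scrape interval for the first `beats`
    intervals, then go silent (scheduler down). Return the value a
    scraper would read after each interval."""
    exporter = ToyExporter()
    series = []
    for i in range(scrapes):
        if i < beats:
            exporter.receive("scheduler_heartbeat:1|" + metric_type)
        series.append(exporter.values.get("scheduler_heartbeat"))
    return series


gauge_series = scrape_series("g", beats=3, scrapes=6)
counter_series = scrape_series("c", beats=3, scrapes=6)
print(gauge_series)    # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(counter_series)  # [1.0, 2.0, 3.0, 3.0, 3.0, 3.0]
```

In this model the gauge series looks the same whether or not the scheduler is alive, while the counter series goes flat once the heartbeats stop.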


> Change scheduler_heartbeat metric from gauge to counter
> -------------------------------------------------------
>
>                 Key: AIRFLOW-3177
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3177
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 2.0.0
>            Reporter: Greg Neiheisel
>            Assignee: Greg Neiheisel
>            Priority: Minor
>             Fix For: 1.10.1
>
>
> Currently, the scheduler_heartbeat metric exposed with the statsd integration is a gauge. I'm proposing to change the gauge to a counter for better integration with Prometheus via the statsd_exporter (https://github.com/prometheus/statsd_exporter).
> Rather than pointing Airflow at an actual statsd server, you can point it at this exporter, which will accumulate the metrics and expose them to be scraped by Prometheus at /metrics. The problem is that once the gauge is set during the scheduler's first loop, it is always exposed to Prometheus as 1. The scheduler can crash or be turned off, and the statsd_exporter will keep reporting 1 until the exporter itself is restarted and rebuilds its internal state.
> By turning this metric into a counter, we can detect an issue with the scheduler by graphing and alerting on its rate. If the counter's rate of change drops below the expected rate (determined by the scheduler_heartbeat_secs setting), we can fire an alert.
> This should be helpful for adoption in Kubernetes environments where Prometheus is pretty much the standard.
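
The rate-based alert described above could be sketched roughly as follows. This is an illustration, not Prometheus or Airflow code: heartbeat_rate and should_alert are hypothetical helpers, the window and tolerance values are arbitrary, and it assumes the counter increases by 1 per heartbeat and never resets (a real Prometheus rate query also accounts for counter resets, which this sketch ignores).

```python
def heartbeat_rate(samples, window_secs):
    """Average per-second increase of the counter over the trailing window.

    samples: list of (timestamp_secs, counter_value), ascending by time.
    """
    if len(samples) < 2:
        return 0.0
    latest_ts = samples[-1][0]
    in_window = [s for s in samples if s[0] >= latest_ts - window_secs]
    if len(in_window) < 2:
        return 0.0
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    if t1 == t0:
        return 0.0
    return (v1 - v0) / (t1 - t0)


def should_alert(samples, heartbeat_secs, window_secs=60, tolerance=0.5):
    """Alert when the observed rate falls below a fraction of the rate
    implied by the heartbeat interval (assumed: one increment per beat)."""
    expected = 1.0 / heartbeat_secs
    return heartbeat_rate(samples, window_secs) < expected * tolerance


# Scraped every 15 s with an assumed scheduler_heartbeat_secs of 5.
healthy = [(0, 0), (15, 3), (30, 6), (45, 9), (60, 12)]
stalled = [(0, 12), (15, 12), (30, 12), (45, 12), (60, 12)]

print(should_alert(healthy, heartbeat_secs=5))  # False
print(should_alert(stalled, heartbeat_secs=5))  # True
```

The tolerance factor leaves headroom for scheduler loops that run slower than the configured interval, so that only a sustained drop triggers the alert.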



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)