You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by jelmerk <gi...@git.apache.org> on 2018/06/27 07:49:14 UTC
[GitHub] flink pull request #6211: [FLINK-9665] PrometheusReporter does not properly ...
GitHub user jelmerk opened a pull request:
https://github.com/apache/flink/pull/6211
[FLINK-9665] PrometheusReporter does not properly unregister metrics
## What is the purpose of the change
Prometheus metrics from canceled jobs are not always correctly cleaned up. This patch aims to remedy this
## Brief change log
- always unregister childs from collectors in notifyOfRemovedMetric so that no metrics stay behind when a job gets canceled but the collector does not get unregistered
## Verifying this change
I added a metricIsRemovedWhenCollectorIsNotUnregisteredYet test to PrometheusReporterTest
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature?: no
- If yes, how is the feature documented?: not applicable
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jelmerk/flink FLINK_9665
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/6211.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6211
----
commit 1dc5eb1e3ee8a1a15ecb95e5084db37fe7298cf4
Author: Jelmer Kuperus <jk...@...>
Date: 2018-06-26T23:05:11Z
[FLINK-9665] PrometheusReporter does not properly unregister metrics
----
---
[GitHub] flink pull request #6211: [FLINK-9665] PrometheusReporter does not properly ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/flink/pull/6211
---
[GitHub] flink issue #6211: [FLINK-9665] PrometheusReporter does not properly unregis...
Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:
https://github.com/apache/flink/pull/6211
@lamber-ken If you are referring to the added test, it covers the case of 2 jobs (`TaskManager>>Job<<MetricGroup`) in the same registry, not TaskManagers.
There will usually not be multiple TaskManagers registered in a single registry, but this issue applies in general to any subset of metrics with the same logical scope.
---
[GitHub] flink pull request #6211: [FLINK-9665] PrometheusReporter does not properly ...
Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:
https://github.com/apache/flink/pull/6211#discussion_r198460116
--- Diff: flink-metrics/flink-metrics-prometheus/src/test/java/org/apache/flink/metrics/prometheus/PrometheusReporterTest.java ---
@@ -159,6 +162,27 @@ public void histogramIsReportedAsPrometheusSummary() throws UnirestException {
}
}
+ @Test
+ public void metricIsRemovedWhenCollectorIsNotUnregisteredYet() throws UnirestException {
+ TaskManagerMetricGroup tmMetricGroup = new TaskManagerMetricGroup(registry, HOST_NAME, TASK_MANAGER);
+
+ String metricName = "numRecordsOut";
+
+ Counter metric1 = new SimpleCounter();
+ FrontMetricGroup<TaskManagerJobMetricGroup> metricGroup1 = new FrontMetricGroup<>(1, new TaskManagerJobMetricGroup(registry, tmMetricGroup, JobID.generate(), "job_1"));
+ reporter.notifyOfAddedMetric(metric1, metricName, metricGroup1);
+
+ Counter metric2 = new SimpleCounter();
+ FrontMetricGroup<TaskManagerJobMetricGroup> metricGroup2 = new FrontMetricGroup<>(2, new TaskManagerJobMetricGroup(registry, tmMetricGroup, JobID.generate(), "job_2"));
--- End diff --
nit: the first argument to `FrontMetricGroup` should be identical since you're passing it to the same reporter. Actually it shouldn't be necessary to create a `FrontMetricGroup` at all, as it is only need to introduce reporter specific configuration properties.
---
[GitHub] flink pull request #6211: [FLINK-9665] PrometheusReporter does not properly ...
Posted by jelmerk <gi...@git.apache.org>.
Github user jelmerk commented on a diff in the pull request:
https://github.com/apache/flink/pull/6211#discussion_r198484545
--- Diff: flink-metrics/flink-metrics-prometheus/src/test/java/org/apache/flink/metrics/prometheus/PrometheusReporterTest.java ---
@@ -159,6 +162,27 @@ public void histogramIsReportedAsPrometheusSummary() throws UnirestException {
}
}
+ @Test
+ public void metricIsRemovedWhenCollectorIsNotUnregisteredYet() throws UnirestException {
+ TaskManagerMetricGroup tmMetricGroup = new TaskManagerMetricGroup(registry, HOST_NAME, TASK_MANAGER);
+
+ String metricName = "numRecordsOut";
+
+ Counter metric1 = new SimpleCounter();
+ FrontMetricGroup<TaskManagerJobMetricGroup> metricGroup1 = new FrontMetricGroup<>(1, new TaskManagerJobMetricGroup(registry, tmMetricGroup, JobID.generate(), "job_1"));
+ reporter.notifyOfAddedMetric(metric1, metricName, metricGroup1);
+
+ Counter metric2 = new SimpleCounter();
+ FrontMetricGroup<TaskManagerJobMetricGroup> metricGroup2 = new FrontMetricGroup<>(2, new TaskManagerJobMetricGroup(registry, tmMetricGroup, JobID.generate(), "job_2"));
--- End diff --
PrometheusReporter does an explicit cast to FrontMetricGroup
https://github.com/jelmerk/flink/blob/642b21647d78653ebe6a3e9fad8f629599e10367/flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusReporter.java#L242
I'll fix the first argument
---
[GitHub] flink issue #6211: [FLINK-9665] PrometheusReporter does not properly unregis...
Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:
https://github.com/apache/flink/pull/6211
I've merged the PR, @jelmerk could you close it? Thanks!
---
[GitHub] flink issue #6211: [FLINK-9665] PrometheusReporter does not properly unregis...
Posted by lamber-ken <gi...@git.apache.org>.
Github user lamber-ken commented on the issue:
https://github.com/apache/flink/pull/6211
Hi, @jelmerk, I have a question that when will there be two taskmanager registered in the same registry?
---