You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by jelmerk <gi...@git.apache.org> on 2018/06/27 07:49:14 UTC

[GitHub] flink pull request #6211: [FLINK-9665] PrometheusReporter does not properly ...

GitHub user jelmerk opened a pull request:

    https://github.com/apache/flink/pull/6211

    [FLINK-9665] PrometheusReporter does not properly unregister metrics

    ## What is the purpose of the change
    
    Prometheus metrics from canceled jobs are not always correctly cleaned up. This patch aims to remedy this
    
    ## Brief change log
    
    - always unregister childs from collectors in notifyOfRemovedMetric so that no metrics stay behind when a job gets canceled but the collector does not get unregistered
    
    ## Verifying this change
    
    I added a metricIsRemovedWhenCollectorIsNotUnregisteredYet test to PrometheusReporterTest
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): no
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
      - The serializers: no
      - The runtime per-record code paths (performance sensitive): no
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
      - The S3 file system connector: no
    
    ## Documentation
    
      - Does this pull request introduce a new feature?: no
      - If yes, how is the feature documented?: not applicable


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jelmerk/flink FLINK_9665

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6211.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6211
    
----
commit 1dc5eb1e3ee8a1a15ecb95e5084db37fe7298cf4
Author: Jelmer Kuperus <jk...@...>
Date:   2018-06-26T23:05:11Z

    [FLINK-9665] PrometheusReporter does not properly unregister metrics

----


---

[GitHub] flink pull request #6211: [FLINK-9665] PrometheusReporter does not properly ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/6211


---

[GitHub] flink issue #6211: [FLINK-9665] PrometheusReporter does not properly unregis...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/6211
  
    @lamber-ken If you are referring to the added test, it covers the case of 2 jobs (`TaskManager>>Job<<MetricGroup`) in the same registry, not TaskManagers.
    
    There will usually not be multiple TaskManagers registered in a single registry, but this issue applies in general to any subset of metrics with the same logical scope.


---

[GitHub] flink pull request #6211: [FLINK-9665] PrometheusReporter does not properly ...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6211#discussion_r198460116
  
    --- Diff: flink-metrics/flink-metrics-prometheus/src/test/java/org/apache/flink/metrics/prometheus/PrometheusReporterTest.java ---
    @@ -159,6 +162,27 @@ public void histogramIsReportedAsPrometheusSummary() throws UnirestException {
     		}
     	}
     
    +	@Test
    +	public void metricIsRemovedWhenCollectorIsNotUnregisteredYet() throws UnirestException {
    +		TaskManagerMetricGroup tmMetricGroup = new TaskManagerMetricGroup(registry, HOST_NAME, TASK_MANAGER);
    +
    +		String metricName = "numRecordsOut";
    +
    +		Counter metric1 = new SimpleCounter();
    +		FrontMetricGroup<TaskManagerJobMetricGroup> metricGroup1 = new FrontMetricGroup<>(1, new TaskManagerJobMetricGroup(registry, tmMetricGroup, JobID.generate(), "job_1"));
    +		reporter.notifyOfAddedMetric(metric1, metricName, metricGroup1);
    +
    +		Counter metric2 = new SimpleCounter();
    +		FrontMetricGroup<TaskManagerJobMetricGroup> metricGroup2 = new FrontMetricGroup<>(2, new TaskManagerJobMetricGroup(registry, tmMetricGroup, JobID.generate(), "job_2"));
    --- End diff --
    
    nit: the first argument to `FrontMetricGroup` should be identical since you're passing it to the same reporter. Actually it shouldn't be necessary to create a `FrontMetricGroup` at all, as it is only need to introduce reporter specific configuration properties.


---

[GitHub] flink pull request #6211: [FLINK-9665] PrometheusReporter does not properly ...

Posted by jelmerk <gi...@git.apache.org>.
Github user jelmerk commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6211#discussion_r198484545
  
    --- Diff: flink-metrics/flink-metrics-prometheus/src/test/java/org/apache/flink/metrics/prometheus/PrometheusReporterTest.java ---
    @@ -159,6 +162,27 @@ public void histogramIsReportedAsPrometheusSummary() throws UnirestException {
     		}
     	}
     
    +	@Test
    +	public void metricIsRemovedWhenCollectorIsNotUnregisteredYet() throws UnirestException {
    +		TaskManagerMetricGroup tmMetricGroup = new TaskManagerMetricGroup(registry, HOST_NAME, TASK_MANAGER);
    +
    +		String metricName = "numRecordsOut";
    +
    +		Counter metric1 = new SimpleCounter();
    +		FrontMetricGroup<TaskManagerJobMetricGroup> metricGroup1 = new FrontMetricGroup<>(1, new TaskManagerJobMetricGroup(registry, tmMetricGroup, JobID.generate(), "job_1"));
    +		reporter.notifyOfAddedMetric(metric1, metricName, metricGroup1);
    +
    +		Counter metric2 = new SimpleCounter();
    +		FrontMetricGroup<TaskManagerJobMetricGroup> metricGroup2 = new FrontMetricGroup<>(2, new TaskManagerJobMetricGroup(registry, tmMetricGroup, JobID.generate(), "job_2"));
    --- End diff --
    
    PrometheusReporter does an explicit cast to FrontMetricGroup
    https://github.com/jelmerk/flink/blob/642b21647d78653ebe6a3e9fad8f629599e10367/flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusReporter.java#L242
    
    I'll fix the first argument


---

[GitHub] flink issue #6211: [FLINK-9665] PrometheusReporter does not properly unregis...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/6211
  
    I've merged the PR, @jelmerk could you close it? Thanks!


---

[GitHub] flink issue #6211: [FLINK-9665] PrometheusReporter does not properly unregis...

Posted by lamber-ken <gi...@git.apache.org>.
Github user lamber-ken commented on the issue:

    https://github.com/apache/flink/pull/6211
  
    Hi, @jelmerk, I have a question that when will there be two taskmanager registered in the same registry?


---