Posted to dev@flink.apache.org by "Alok Singh (Jira)" <ji...@apache.org> on 2021/10/12 12:36:00 UTC

[jira] [Created] (FLINK-24514) Incorrect Flink Metrics on Job-Internal-Restart:

Alok Singh created FLINK-24514:
----------------------------------

             Summary: Incorrect Flink Metrics on Job-Internal-Restart:
                 Key: FLINK-24514
                 URL: https://issues.apache.org/jira/browse/FLINK-24514
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Metrics
    Affects Versions: 1.12.1
            Reporter: Alok Singh
         Attachments: Screenshot 2021-10-12 at 4.46.49 PM.png, Screenshot 2021-10-12 at 4.47.17 PM.png, Screenshot 2021-10-12 at 4.47.29 PM.png, Screenshot 2021-10-12 at 4.47.41 PM.png

We have been seeing metrics report values that are several times higher than expected after the Flink job restarts internally. These restarts are triggered by internal exceptions; for example, during deployment the job did not get its Task Managers in time and then restarted on its own.

Metrics implementation:
 # The metrics are implemented using Meter.
 # Accumulators.scala defines each metric name as a Value. CustomMetrics.scala holds a Map that uses these names as keys and MeterView instances as values.
 # Each MeterView is created from an instance of the AtomicLongCounter.scala class, which extends the Counter interface and overrides its methods. (The code files are attached for reference.)
 # The metrics are registered inside FilterReportsForSummaryAnalysis.scala.
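
For readers without access to the attached files, the setup described above might look roughly like the following. This is a minimal sketch, not the reporter's actual code: the metric names, the 60-second time span, the choice of RichFilterFunction[String] as the base class, and the filter predicate are all assumptions made for illustration. Only the file/class names and the use of Counter, MeterView and a Map come from the description. It assumes the Flink 1.12 metrics API is on the classpath.

```scala
import java.util.concurrent.atomic.AtomicLong

import org.apache.flink.api.common.functions.RichFilterFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.metrics.{Counter, MeterView}

// Accumulators.scala (assumed shape): metric names defined as Enumeration Values.
object Accumulators extends Enumeration {
  val ReportsSeen: Value = Value("reportsSeen")   // hypothetical metric name
  val ReportsKept: Value = Value("reportsKept")   // hypothetical metric name
}

// AtomicLongCounter.scala (assumed shape): a Flink Counter backed by an AtomicLong.
class AtomicLongCounter extends Counter {
  private val value = new AtomicLong(0L)
  override def inc(): Unit = value.incrementAndGet()
  override def inc(n: Long): Unit = value.addAndGet(n)
  override def dec(): Unit = value.decrementAndGet()
  override def dec(n: Long): Unit = value.addAndGet(-n)
  override def getCount(): Long = value.get()
}

// CustomMetrics.scala (assumed shape): one MeterView per metric name, kept in a Map.
class CustomMetrics {
  val meters: Map[Accumulators.Value, MeterView] =
    Accumulators.values.toSeq
      .map(name => name -> new MeterView(new AtomicLongCounter, 60)) // 60 s span is an assumption
      .toMap
}

// FilterReportsForSummaryAnalysis.scala (assumed shape): registers the meters in open().
class FilterReportsForSummaryAnalysis extends RichFilterFunction[String] {
  @transient private var metrics: CustomMetrics = _

  override def open(parameters: Configuration): Unit = {
    metrics = new CustomMetrics
    val group = getRuntimeContext.getMetricGroup
    // Register every meter under its enumeration name.
    metrics.meters.foreach { case (name, meter) => group.meter(name.toString, meter) }
  }

  override def filter(report: String): Boolean = {
    metrics.meters(Accumulators.ReportsSeen).markEvent()
    val keep = report.nonEmpty // placeholder predicate, not from the report
    if (keep) metrics.meters(Accumulators.ReportsKept).markEvent()
    keep
  }
}
```

In this sketch the meters are created inside open(), so each task deployment builds its own CustomMetrics instance. Whether the attached code does the same, or keeps the Map in a shared object, is not stated in the description.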

Some points to remember:
 # Not every internal job restart causes incorrect metrics.
 # When an internal job restart has caused incorrect metrics, manually restarting the job (killing it and starting it again, with or without a savepoint) makes the metrics show correct values again, provided that no further exception occurs that could trigger another internal restart.
 # We are using the Flink Delay Restart Strategy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)