Posted to dev@flink.apache.org by "Zhenzhong Xu (JIRA)" <ji...@apache.org> on 2017/10/20 18:13:00 UTC

[jira] [Created] (FLINK-7894) Improve metrics around fine-grained recovery and associated checkpointing behaviors

Zhenzhong Xu created FLINK-7894:
-----------------------------------

             Summary: Improve metrics around fine-grained recovery and associated checkpointing behaviors
                 Key: FLINK-7894
                 URL: https://issues.apache.org/jira/browse/FLINK-7894
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 1.3.2, 1.4.0
            Reporter: Zhenzhong Xu


Currently, the only metric around fine-grained recovery is "task_failures". It is a very high-level metric; it would be nice to have the following improvements:

* Allow slicing and dicing by which tasks were restarted.
* Recovery duration.
* Recovery-associated checkpoint behaviors: cancellations, failures, etc.
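
To make the three requests above concrete, here is a rough sketch of the data such metrics might capture. It is not Flink code and does not use Flink's metrics API: the class RecoveryMetrics, its method names, and the task names "map-1" and "sink-3" are all hypothetical, chosen only for illustration.

```python
import time
from collections import Counter


class RecoveryMetrics:
    """Hypothetical in-memory recorder; NOT part of Flink's metrics API."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.task_restarts = Counter()        # task name -> restart count
        self.checkpoint_outcomes = Counter()  # e.g. "cancelled", "failed" -> count
        self.recovery_durations = []          # seconds, one entry per finished recovery
        self._started_at = None               # set while a recovery is in progress

    def recovery_started(self, restarted_tasks):
        # Record which tasks were restarted, so the data can be sliced per task.
        self._started_at = self._clock()
        self.task_restarts.update(restarted_tasks)

    def checkpoint_outcome(self, outcome):
        # Only count outcomes observed while a recovery is in progress.
        if self._started_at is not None:
            self.checkpoint_outcomes[outcome] += 1

    def recovery_finished(self):
        # Return the recovery duration in seconds, or None if none was running.
        if self._started_at is None:
            return None
        duration = self._clock() - self._started_at
        self.recovery_durations.append(duration)
        self._started_at = None
        return duration


# Usage with a fake clock so the example is deterministic.
ticks = iter([10.0, 12.5])
metrics = RecoveryMetrics(clock=lambda: next(ticks))
metrics.recovery_started(["map-1", "sink-3"])
metrics.checkpoint_outcome("cancelled")
duration = metrics.recovery_finished()
```

In an actual implementation these values would presumably be exposed through Flink's metric groups rather than held in plain Python containers; the sketch only shows one possible shape for the per-task, duration, and checkpoint-outcome data.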



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)