Posted to issues@flink.apache.org by "Kevin Liu (Jira)" <ji...@apache.org> on 2020/08/25 06:12:00 UTC

[jira] [Issue Comment Deleted] (FLINK-19009) wrong way to calculate the "downtime" metric

     [ https://issues.apache.org/jira/browse/FLINK-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Liu updated FLINK-19009:
------------------------------
    Comment: was deleted

(was: I think I can fix this bug, but first we need to reach a consensus on the definition of 'a failing/recovering situation' in the Flink docs. There are 10 types of JobStatus, and 'FAILING' is clearly one of them, but what about the others, for example 'RECONCILING'? ([~jark] What do you think, or do you know someone familiar with this part?))

> wrong way to calculate the "downtime" metric
> --------------------------------------------
>
>                 Key: FLINK-19009
>                 URL: https://issues.apache.org/jira/browse/FLINK-19009
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Runtime / Metrics
>    Affects Versions: 1.7.2, 1.8.0
>            Reporter: Zhinan Cheng
>            Priority: Trivial
>             Fix For: 1.12.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Currently the way the Flink system metric "downtime" is calculated is not consistent with the description in the doc: the downtime is actually the current timestamp minus the timestamp when the job started.
>
> But the Flink doc (https://flink.apache.org/gettinghelp.html) clearly describes this time as the current timestamp minus the timestamp when the job failed.
>
> I believe we should update the code for this metric to match the Flink doc. An easy way to fix this is to subtract the latest failure timestamp from the current timestamp.
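For illustration, the difference between the two calculations can be sketched as follows. This is a minimal, hypothetical sketch of the metric's semantics, not Flink's actual implementation; the class and method names are made up for this example:

```java
// Sketch of the "downtime" metric semantics discussed in FLINK-19009.
// Hypothetical class, not Flink's actual code. Timestamps are in millis.
public class DowntimeSketch {
    private final long jobStartTimestamp;   // when the job first entered RUNNING
    private long lastFailureTimestamp = -1; // when the job last left RUNNING; -1 while running

    public DowntimeSketch(long jobStartTimestamp) {
        this.jobStartTimestamp = jobStartTimestamp;
    }

    public void markFailed(long now)  { this.lastFailureTimestamp = now; }
    public void markRestored()        { this.lastFailureTimestamp = -1; }

    // Behavior the issue reports: measured from job start, so the value
    // grows with the total job lifetime, not the current outage.
    public long downtimeAsReported(long now) {
        return lastFailureTimestamp >= 0 ? now - jobStartTimestamp : 0;
    }

    // Behavior the doc describes: measured from the moment the job failed.
    public long downtimeAsDocumented(long now) {
        return lastFailureTimestamp >= 0 ? now - lastFailureTimestamp : 0;
    }
}
```

For a job started at t=0 that fails at t=100 and is observed at t=150, the documented variant yields 50 while the reported buggy variant yields 150, which is the discrepancy described above.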



--
This message was sent by Atlassian Jira
(v8.3.4#803005)