Posted to dev@oozie.apache.org by "Robert Kanter (JIRA)" <ji...@apache.org> on 2014/05/21 00:55:38 UTC

[jira] [Updated] (OOZIE-1828) Introduce counters JobStatus terminal states metrics

     [ https://issues.apache.org/jira/browse/OOZIE-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-1828:
---------------------------------

    Attachment: OOZIE-1828.patch

The patch simply adds a new "killed" counter.  Note that there is a difference between the existing "jobs.kill" counter and the new "jobs.killed" counter: "kill" counts jobs killed by the {{oozie job -kill}} CLI command (or the equivalent REST call), while "killed" counts jobs killed by a kill node in a workflow.

No tests, but I verified that it works as expected.
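
To illustrate the distinction between the two counters, here is a minimal sketch. It is not taken from the patch and does not use Oozie's instrumentation classes; the class name KillCounters and its methods are hypothetical, and only the counter names "jobs.kill" and "jobs.killed" come from the description above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; not Oozie's instrumentation API.
class KillCounters {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    private void increment(String name) {
        // Create the counter on first use, then bump it.
        counters.computeIfAbsent(name, k -> new LongAdder()).increment();
    }

    // A user asked for the job to be killed (CLI command or REST call).
    void onKillCommand() {
        increment("jobs.kill");
    }

    // A workflow ended because it reached a kill node.
    void onKillNode() {
        increment("jobs.killed");
    }

    // Returns 0 for a counter that has never been incremented.
    long get(String name) {
        LongAdder adder = counters.get(name);
        return adder == null ? 0L : adder.sum();
    }
}
```

The two counters are incremented from different code paths, so they can diverge freely: a burst of user-issued kills raises only "jobs.kill", while workflows failing into their kill node raise only "jobs.killed".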

> Introduce counters JobStatus terminal states metrics
> ----------------------------------------------------
>
>                 Key: OOZIE-1828
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1828
>             Project: Oozie
>          Issue Type: Bug
>          Components: monitoring
>    Affects Versions: 4.0.0
>            Reporter: Gilad Wolff
>            Assignee: Robert Kanter
>         Attachments: OOZIE-1828.patch
>
>
> Currently the Oozie server exposes job status metrics from the 'variables' group. These include metrics for jobs in terminal states: 'SUCCEEDED', 'FAILED', 'KILLED'. The way Oozie computes these metrics is by querying the database for all jobs in each and every state. This means that when a purge happens, the values of these "apparent" counters are going to change (if anything was purged). This makes these counters not very useful.
> It would be better if real counters for jobs in terminal states could be exposed from the Oozie server. One way to do this would be to initialize in-memory counters and a timestamp, then count the jobs that finished between the timestamp and 'now' (and keep updating the timestamp to avoid it falling out of the retention period). This means that each Oozie server may have its own counters, but that is okay: the count itself is not very important; what is important is the rate of change.
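
The in-memory approach proposed in the issue description can be sketched roughly as follows. This is only an illustration under stated assumptions, not what the attached patch does: the class TerminalStateCounters, the FinishedJob type, and the update method are hypothetical names, and the list of finished jobs stands in for whatever query the server would actually run.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposal; none of these types exist in Oozie.
class TerminalStateCounters {
    enum Status { SUCCEEDED, FAILED, KILLED }

    // Stand-in for a job record that has reached a terminal state.
    static final class FinishedJob {
        final Status status;
        final long endTimeMillis;

        FinishedJob(Status status, long endTimeMillis) {
            this.status = status;
            this.endTimeMillis = endTimeMillis;
        }
    }

    private final Map<Status, Long> counts = new EnumMap<>(Status.class);
    private long lastSeenMillis;

    TerminalStateCounters(long startMillis) {
        this.lastSeenMillis = startMillis;
        for (Status s : Status.values()) {
            counts.put(s, 0L);
        }
    }

    // Count jobs that finished in (lastSeenMillis, nowMillis], then advance
    // the timestamp so that it never falls out of the retention period.
    synchronized void update(List<FinishedJob> jobs, long nowMillis) {
        for (FinishedJob job : jobs) {
            if (job.endTimeMillis > lastSeenMillis && job.endTimeMillis <= nowMillis) {
                counts.merge(job.status, 1L, Long::sum);
            }
        }
        lastSeenMillis = nowMillis;
    }

    synchronized long get(Status status) {
        return counts.get(status);
    }
}
```

Because the counts live in memory and only ever increase, a database purge does not change them. Each server starts from zero, which matches the point above that the rate of change matters more than the absolute value.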



--
This message was sent by Atlassian JIRA
(v6.2#6252)