Posted to issues@aurora.apache.org by "David Robinson (JIRA)" <ji...@apache.org> on 2014/05/30 01:22:02 UTC

[jira] [Created] (AURORA-493) expose accurate metrics of state transitions

David Robinson created AURORA-493:
-------------------------------------

             Summary: expose accurate metrics of state transitions
                 Key: AURORA-493
                 URL: https://issues.apache.org/jira/browse/AURORA-493
             Project: Aurora
          Issue Type: Task
          Components: Scheduler
            Reporter: David Robinson
            Priority: Minor


The task store metrics (task_store_*) exposed via http://localhost:8081/vars aren't accurate enough to be used for alerting. At first glance, the task_store_* metrics look like they could be used to alert on, among other things, an increase in LOST tasks (task_store_LOST). However, the numbers actually decrease as tasks are pruned. When a task becomes lost, task_store_LOST is incremented, but it is decremented again when the lost task is pruned. If both the increment and the decrement occur within an alerting system's polling interval, the lost task(s) will never be observed.
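The failure mode can be sketched as follows. This is a hypothetical, simplified model (the class and method names are illustrative, not Aurora's code): a gauge that reports how many tasks are currently stored in each state, sampled once before and once after a task becomes LOST and is pruned.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of a task_store_* gauge: it reports how many
// tasks are currently stored in each state, so pruning lowers it again.
class GaugeSampling {
  enum State { RUNNING, LOST }

  // Current number of stored tasks per state (what the gauge exposes).
  private final Map<State, Integer> current = new EnumMap<>(State.class);

  GaugeSampling() {
    for (State s : State.values()) {
      current.put(s, 0);
    }
  }

  void add(State s) {
    current.merge(s, 1, Integer::sum);
  }

  void transition(State from, State to) {
    current.merge(from, -1, Integer::sum);
    current.merge(to, 1, Integer::sum);
  }

  // Pruning removes the task from the store, so the gauge drops back down.
  void prune(State s) {
    current.merge(s, -1, Integer::sum);
  }

  int gauge(State s) {
    return current.get(s);
  }

  // Two polls bracket a task going LOST and being pruned; returns the change seen.
  static int lostDeltaAcrossPrune() {
    GaugeSampling store = new GaugeSampling();
    store.add(State.RUNNING);
    int firstPoll = store.gauge(State.LOST);
    store.transition(State.RUNNING, State.LOST); // LOST gauge goes 0 -> 1
    store.prune(State.LOST);                      // ...and back to 0 before the next poll
    int secondPoll = store.gauge(State.LOST);
    return secondPoll - firstPoll;
  }

  public static void main(String[] args) {
    System.out.println("LOST gauge delta between polls: " + lostDeltaAcrossPrune());
  }
}
```

Running it prints a delta of 0, even though a task was lost between the two polls.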

Consider adding counters of task state transitions that are not touched when tasks are pruned. They should show the total number of tasks that have transitioned through, or terminated in, each state.
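A minimal sketch of the proposed counters, assuming a hook that fires on every task state change. The class, its methods, and the task_transitions_* stat names are hypothetical, not part of the Aurora scheduler. Each counter only ever grows and pruning leaves it untouched, so an alerting system can compare two samples and see every transition in between.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-state transition counters that only ever increase.
// Stat names such as task_transitions_LOST are illustrative only.
class TransitionCounters {
  enum State { PENDING, RUNNING, FINISHED, FAILED, KILLED, LOST }

  // Number of times any task has entered each state; never decremented.
  private final Map<State, AtomicLong> transitionsInto = new EnumMap<>(State.class);

  TransitionCounters() {
    for (State s : State.values()) {
      transitionsInto.put(s, new AtomicLong());
    }
  }

  // Called on every state change: counts tasks passing through or ending in 'to'.
  void onTransition(State to) {
    transitionsInto.get(to).incrementAndGet();
  }

  // Pruning removes tasks from the store but deliberately leaves the counters alone.
  void onPrune(State s) {
    // Intentionally does not modify transitionsInto.
  }

  long count(State s) {
    return transitionsInto.get(s).get();
  }

  // Exported name for a counter (hypothetical naming scheme).
  static String statName(State s) {
    return "task_transitions_" + s.name();
  }

  // An alert fires if the counter grew between two polls.
  static boolean shouldAlert(long previousSample, long currentSample) {
    return currentSample > previousSample;
  }

  public static void main(String[] args) {
    TransitionCounters counters = new TransitionCounters();
    long before = counters.count(State.LOST);  // first poll
    counters.onTransition(State.RUNNING);
    counters.onTransition(State.LOST);         // task becomes LOST
    counters.onPrune(State.LOST);              // ...and is pruned before the next poll
    long after = counters.count(State.LOST);   // second poll
    System.out.println(statName(State.LOST) + " " + before + " -> " + after
        + ", alert=" + shouldAlert(before, after));
  }
}
```

Run as-is, it prints `task_transitions_LOST 0 -> 1, alert=true`: the lost task stays visible even though it was pruned before the second sample. Because the counters are monotonic, an alerting system only needs the difference between consecutive samples, regardless of how long its polling interval is.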



--
This message was sent by Atlassian JIRA
(v6.2#6252)