Posted to issues@aurora.apache.org by "Bill Farner (JIRA)" <ji...@apache.org> on 2014/06/24 02:14:24 UTC

[jira] [Commented] (AURORA-548) scheduler should always show tasks_lost_rack_XXX metrics

    [ https://issues.apache.org/jira/browse/AURORA-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041509#comment-14041509 ] 

Bill Farner commented on AURORA-548:
------------------------------------

The scheduler doesn't necessarily know about all the racks in the cluster, and may only know about certain racks intermittently.  I'm afraid the best approach might be to not assume all racks are present.
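A consumer of /vars could apply that approach by treating an absent tasks_lost_rack_XXX key as zero. The following is a minimal sketch, not scheduler code: it assumes /vars returns plain "name value" lines like the sample metrics quoted below, and that the alerting side keeps its own list of racks it cares about. The names parse_vars, lost_tasks_by_rack and known_racks are hypothetical.

```python
PREFIX = "tasks_lost_rack_"


def parse_vars(text):
    """Parse 'name value' lines into a dict, skipping anything non-numeric.

    Assumption: the /vars payload is plain text, one metric per line.
    """
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, value = parts
        try:
            metrics[name] = float(value)
        except ValueError:
            continue
    return metrics


def lost_tasks_by_rack(metrics, known_racks=()):
    """Return lost-task counts per rack, defaulting missing racks to 0.

    known_racks is a hypothetical list maintained by the alerting side;
    the scheduler itself may not know about every rack at all times.
    """
    counts = {rack: 0 for rack in known_racks}
    for name, value in metrics.items():
        if name.startswith(PREFIX):
            counts[name[len(PREFIX):]] = int(value)
    return counts


sample = """tasks_lost_rack_aab 3
tasks_lost_rack_aae 4
tasks_lost_rack_aah 2
tasks_lost_rack_aai 3"""

counts = lost_tasks_by_rack(parse_vars(sample), known_racks=["aaa", "aab", "aac"])
```

With this, a rack that the scheduler has never reported simply reads as 0, and a rack that appears only in /vars is still picked up.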

> scheduler should always show tasks_lost_rack_XXX metrics
> --------------------------------------------------------
>
>                 Key: AURORA-548
>                 URL: https://issues.apache.org/jira/browse/AURORA-548
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: David Robinson
>
> The scheduler's /vars endpoint only exposes a tasks_lost_rack_XXX metric once tasks in that rack have been lost (that is, once the tasks_lost_rack_XXX key has a non-zero value). If no tasks in a rack have been lost, no metric for the rack is exposed. This makes the metrics difficult to use for alerting: it's impossible to tell whether a rack does not exist or exists but has had no lost tasks. Each rack should have an entry in /vars regardless of whether any of its tasks have been lost.
> Sample metrics:
> tasks_lost_rack_aab 3
> tasks_lost_rack_aae 4
> tasks_lost_rack_aah 2
> tasks_lost_rack_aai 3
> Expected metrics:
> tasks_lost_rack_aaa 0
> tasks_lost_rack_aab 3
> tasks_lost_rack_aac 0
> tasks_lost_rack_aad 0
> tasks_lost_rack_aae 4
> tasks_lost_rack_aaf 0
> tasks_lost_rack_aag 0
> tasks_lost_rack_aah 2
> tasks_lost_rack_aai 3


