Posted to issues@aurora.apache.org by "Maxim Khutornenko (JIRA)" <ji...@apache.org> on 2015/02/26 22:46:04 UTC

[jira] [Commented] (AURORA-1148) Display All Scheduling Veto Reasons for PENDING tasks

    [ https://issues.apache.org/jira/browse/AURORA-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339220#comment-14339220 ] 

Maxim Khutornenko commented on AURORA-1148:
-------------------------------------------

Scheduling veto reasons are tracked by {{NearestFit.java}} [1] and reported via the {{getPendingReason}} RPC call. The current logic attempts to determine the closest task/offer fit by evaluating the number of vetoes and their overall score, with the less severe veto(es) winning. Given that a task group is matched against every available offer in the cluster, it is very likely that ALL veto types are issued at least once for a given task group during a single scheduling loop run. Displaying all vetoes in the UI would therefore make little sense, as only the closest fit points to the real problem.
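
To illustrate the closest-fit idea, here is a minimal sketch. It is not the actual {{NearestFit.java}} code: the {{NearestFitSketch}} class, its {{Veto}} type, and the exact ordering (fewer vetoes first, then lower total score) are assumptions made for the example.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names and ordering rules are assumptions, not Aurora's API.
final class NearestFitSketch {
  /** Hypothetical veto: a reason plus a severity score (higher means more severe). */
  static final class Veto {
    final String reason;
    final int score;

    Veto(String reason, int score) {
      this.reason = reason;
      this.score = score;
    }

    @Override
    public String toString() {
      return reason;
    }
  }

  /** Sums the severity scores of all vetoes issued for one offer. */
  static int totalScore(Collection<Veto> vetoes) {
    int sum = 0;
    for (Veto v : vetoes) {
      sum += v.score;
    }
    return sum;
  }

  /** Assumed ordering: fewer vetoes wins; ties go to the lower total score. */
  static boolean isCloserFit(Collection<Veto> candidate, Collection<Veto> current) {
    if (candidate.size() != current.size()) {
      return candidate.size() < current.size();
    }
    return totalScore(candidate) < totalScore(current);
  }

  /** Returns the veto set of the closest-fitting offer, or an empty list if there are none. */
  static List<Veto> nearestFit(List<List<Veto>> vetoesPerOffer) {
    List<Veto> best = null;
    for (List<Veto> vetoes : vetoesPerOffer) {
      if (best == null || isCloserFit(vetoes, best)) {
        best = vetoes;
      }
    }
    return best == null ? Collections.<Veto>emptyList() : best;
  }
}
```

With these assumptions, an offer vetoed only for RAM (low score) beats an offer vetoed only for a dedicated constraint (high score), and both beat an offer vetoed for RAM and CPU together.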

One option we may consider is displaying aggregated veto stats, where the number of occurrences of each veto is reported along with the nearest fit. E.g., something like:
{noformat}
Nearest fit rejection reason: Insufficient: RAM
Other rejection reasons: 
   Limit not satisfied: rack (10 hosts), 
   Constraint not satisfied: dedicated (8 hosts),
   Insufficient: CPU (6 hosts), 
   Host maintenance (3 hosts)
{noformat}

While aggregating by veto occurrence is not ideal and may actually be misleading, it's probably the best we can do given our current implementation.
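
A rough sketch of how such aggregation might look follows. Everything here is hypothetical: the {{VetoStatsSketch}} class, the host-to-reasons input map, and the choice to sort by descending host count and to omit the nearest fit reason from the "other" list are assumptions for illustration, not existing scheduler code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; not part of the Aurora scheduler.
final class VetoStatsSketch {
  /** Counts, for each veto reason, how many hosts issued it at least once. */
  static Map<String, Integer> countHostsByReason(Map<String, Set<String>> reasonsByHost) {
    Map<String, Integer> counts = new TreeMap<String, Integer>();
    for (Set<String> reasons : reasonsByHost.values()) {
      for (String reason : reasons) {
        counts.merge(reason, 1, Integer::sum);
      }
    }
    return counts;
  }

  /** Renders the nearest fit reason followed by the remaining reasons, most frequent first. */
  static String render(String nearestFitReason, Map<String, Integer> counts) {
    List<Map.Entry<String, Integer>> entries =
        new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
    // Sort by descending host count, then alphabetically by reason for stable output.
    entries.sort((a, b) -> {
      int byCount = Integer.compare(b.getValue(), a.getValue());
      return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    });

    StringBuilder sb = new StringBuilder();
    sb.append("Nearest fit rejection reason: ").append(nearestFitReason).append('\n');
    sb.append("Other rejection reasons:\n");
    for (Map.Entry<String, Integer> e : entries) {
      if (e.getKey().equals(nearestFitReason)) {
        continue; // Assumption: the nearest fit reason is shown only on the first line.
      }
      sb.append("   ").append(e.getKey())
          .append(" (").append(e.getValue())
          .append(e.getValue() == 1 ? " host)" : " hosts)").append('\n');
    }
    return sb.toString();
  }
}
```

Counting hosts rather than individual veto events keeps a single noisy host from inflating a reason, though the counts still say nothing about which reason is closest to being resolved.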

[1] - https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/metadata/NearestFit.java

> Display All Scheduling Veto Reasons for PENDING tasks
> -----------------------------------------------------
>
>                 Key: AURORA-1148
>                 URL: https://issues.apache.org/jira/browse/AURORA-1148
>             Project: Aurora
>          Issue Type: Story
>          Components: Reliability, Scheduler
>            Reporter: Joe Smith
>
> Recently I was triaging an instance that would not schedule, and although I was validating many possibilities, I did get sidetracked when I saw that 'Insufficient RAM' was listed as the reason.
> In fact, there was also insufficient Disk, CPU, and hosts, as this was a task which had a dedicated constraint.
> In order to give more information, we should ensure we're exposing all Vetoes to help narrow down whether just one resource is preventing scheduling, or many (all resources may point to a lack of hosts overall, for instance).
> I did notice both AURORA-377 and AURORA-911, but both seem unrelated to this.


