Posted to issues@aurora.apache.org by "Bill Farner (JIRA)" <ji...@apache.org> on 2014/04/08 01:30:18 UTC

[jira] [Reopened] (AURORA-302) TaskGroups may abandon tasks

     [ https://issues.apache.org/jira/browse/AURORA-302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Farner reopened AURORA-302:
--------------------------------

    Assignee: Bill Farner

> TaskGroups may abandon tasks
> ----------------------------
>
>                 Key: AURORA-302
>                 URL: https://issues.apache.org/jira/browse/AURORA-302
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Bill Farner
>            Assignee: Bill Farner
>
> I've yet to figure out exactly how this happens, but I've witnessed it twice in succession in Vagrant (though I was unable to reproduce it while trying to debug it), and once in production.
> TaskGroups appears to have a bug that causes it to keep a group in the {{groups}} data structure with no corresponding async task in {{executor}}.  The design of TaskGroups is such that each task group must almost always be represented in both (almost always, because the executor entry will be absent briefly while trying to schedule a task).
> The one I observed in production looked like this (in /pendingtasks):
> {noformat}
> {
> penaltyMs: 30000,
> name: "role/env/job",
> taskIds: [ ]
> },
> {noformat}
> When I saw it in Vagrant:
> {noformat}
> {
> penaltyMs: 1,
> name: "role/env/job",
> taskIds: [ ]
> },
> {noformat}
> Additionally, {{schedule_queue_size}} in Vagrant was consistently zero when I observed this, further supporting the hypothesis that the group was not being evaluated.
> TaskGroups is intended to invalidate empty groups, so the mere presence of one suggests that it has been dropped.
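
To make the suspected condition concrete, here is a minimal sketch of a check for groups that are tracked but have no scheduled evaluation. It is not Aurora's actual TaskGroups code: the class AbandonedGroupCheck, the method findAbandoned, the map of group name to task IDs, and the set of scheduled keys are all hypothetical stand-ins for {{groups}} and the work queued in {{executor}}.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the invariant described above.
// This is NOT Aurora's TaskGroups implementation.
final class AbandonedGroupCheck {
  private AbandonedGroupCheck() {}

  // Returns the keys of groups that are tracked in `groups` but have no
  // corresponding entry in `scheduledKeys` (assumed stand-in for the executor).
  static List<String> findAbandoned(
      Map<String, List<String>> groups, Set<String> scheduledKeys) {
    List<String> abandoned = new ArrayList<>();
    for (String key : groups.keySet()) {
      if (!scheduledKeys.contains(key)) {
        abandoned.add(key);
      }
    }
    return abandoned;
  }

  public static void main(String[] args) {
    // Group name -> task IDs, mirroring the shape of the /pendingtasks output.
    Map<String, List<String>> groups = new LinkedHashMap<>();
    groups.put("role/env/job", List.of());            // empty, yet still tracked
    groups.put("role/env/other", List.of("task-1"));

    // Only one group has an evaluation queued.
    Set<String> scheduled = Set.of("role/env/other");

    System.out.println(findAbandoned(groups, scheduled)); // prints [role/env/job]
  }
}
```

Because a group is expected to be missing from the executor briefly while a task is being scheduled, a single hit from a check like this would not be conclusive; only a group that stays flagged across repeated checks would match the behavior described in this issue.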



--
This message was sent by Atlassian JIRA
(v6.2#6252)