Posted to issues@aurora.apache.org by "Bill Farner (JIRA)" <ji...@apache.org> on 2014/04/02 03:34:16 UTC

[jira] [Created] (AURORA-302) TaskGroups may abandon tasks

Bill Farner created AURORA-302:
----------------------------------

             Summary: TaskGroups may abandon tasks
                 Key: AURORA-302
                 URL: https://issues.apache.org/jira/browse/AURORA-302
             Project: Aurora
          Issue Type: Bug
          Components: Scheduler
            Reporter: Bill Farner


I've yet to figure out exactly how this happens, but I've witnessed it twice in succession in vagrant (though I was unable to repro it while trying to debug), and once in production.

TaskGroups appears to have a bug that causes it to keep a group in the {{groups}} data structure with no corresponding async task in {{executor}}.  By design, each task group must almost always be represented in both (almost always, because the executor entry is briefly absent while a task is being scheduled).
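
To make the expected invariant concrete, here is a minimal Python sketch of a check for it. This is not the scheduler's code: the function name {{find_abandoned_groups}}, the dict and set standing in for {{groups}} and the pending work in {{executor}}, and the sample keys and task ID are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def find_abandoned_groups(groups: Dict[str, List[str]],
                          scheduled: Set[str]) -> List[str]:
    """Return keys of tracked groups that have no evaluation scheduled.

    `groups` is a hypothetical stand-in for the {{groups}} structure
    (group key -> task IDs); `scheduled` is a hypothetical stand-in for
    the set of group keys with an async task queued in {{executor}}.
    """
    return sorted(key for key in groups if key not in scheduled)


# Hypothetical state: both groups are tracked, but only one has an
# evaluation queued, so the other would never be looked at again.
groups = {"role/env/job": [], "role/env/other": ["task-1"]}
scheduled = {"role/env/other"}
abandoned = find_abandoned_groups(groups, scheduled)
```

A check like this would need to tolerate the brief window described above, for example by only reporting a group that stays unscheduled across several consecutive samples.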

The one I observed in production looked like this (in /pendingtasks):
{noformat}
{
penaltyMs: 30000,
name: "role/env/job",
taskIds: [ ]
},
{noformat}

When I saw it in vagrant:
{noformat}
{
penaltyMs: 1,
name: "role/env/job",
taskIds: [ ]
},
{noformat}

Additionally, the {{schedule_queue_size}} in vagrant was consistently zero whenever I observed this, which further supports the hypothesis that the group was not being evaluated.

TaskGroups is intended to invalidate empty groups, so the mere presence of one suggests that the group has been dropped from evaluation.
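
Since an empty group should not linger, the symptom can be spotted mechanically. The sketch below assumes the /pendingtasks listing has already been parsed into Python dicts with the {{name}}, {{penaltyMs}} and {{taskIds}} fields shown above (the raw output shown is not strict JSON, so the parsing step is left out). The helper name {{empty_groups}} and the second sample entry are hypothetical.

```python
from typing import Any, Dict, List


def empty_groups(pending: List[Dict[str, Any]]) -> List[str]:
    """Return the names of pending groups that list no task IDs."""
    return [group["name"] for group in pending if not group.get("taskIds")]


# The first entry mirrors the production observation; the second is a
# made-up healthy group added for contrast.
pending = [
    {"penaltyMs": 30000, "name": "role/env/job", "taskIds": []},
    {"penaltyMs": 1, "name": "role/env/other", "taskIds": ["task-1"]},
]
stuck = empty_groups(pending)
```

A group that shows up in this list on repeated samples, while {{schedule_queue_size}} stays at zero, would match what is described here.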



--
This message was sent by Atlassian JIRA
(v6.2#6252)