You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@aurora.apache.org by "Chris Lambert (JIRA)" <ji...@apache.org> on 2014/12/01 22:45:12 UTC

[jira] [Updated] (AURORA-909) Make task scheduling more efficient

     [ https://issues.apache.org/jira/browse/AURORA-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Lambert updated AURORA-909:
---------------------------------
    Sprint: Twitter Aurora Q4 Sprint 5

> Make task scheduling more efficient
> -----------------------------------
>
>                 Key: AURORA-909
>                 URL: https://issues.apache.org/jira/browse/AURORA-909
>             Project: Aurora
>          Issue Type: Story
>          Components: Scheduler
>            Reporter: Bill Farner
>            Assignee: Maxim Khutornenko
>
> We're making a decent effort at reducing the _cost_ of task scheduling operations, abut have not yet invested in reducing the working set in a way that causes task scheduling to scale better.  Each scheduling attempt for each task is an O(n) operation, where n is the number of offers.
> I would like to explore optimizations where we try to reduce the amount of redundant work performed in task scheduling.  Say, for example, we're trying to schedule a task that needs 2 CPUs, and we only have offers with 1 CPU.  Each scheduling round will re-assess every offer, despite the fact that the offers have not changed shape, and will always be a mismatch (hereafter termed _static_ mismatches).  Instead, we should try to skip over offers that are a static mismatch.  We could do this at the {{TaskGroup}} level, since every element in a task group is by definition statically equivalent.  This means that jobs with a large number of instances could be scheduled very efficiently, since the first task scheduling round could identify static mismatches, reducing the working set in the next round.
> This is to contrast with _dynamic_ mismatches, where a change in the tasks on a machine or other settings could make a previously-ineligible offer become a match.  The current sources of dynamic mismatches are limit constraints, host maintenance modes, and dedicated attributes.
> I propose we proceed in several steps, re-evaluating after each:
> 1. instrument the scheduler to better estimate the improvements
> 2. avoid future (offer, task group) evaluations when static mismatches are found
> 3. avoid future (offer, task group) evaluations when dynamic mismatches are found



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)