Posted to issues@aurora.apache.org by "Maxim Khutornenko (JIRA)" <ji...@apache.org> on 2014/12/05 02:29:12 UTC

[jira] [Comment Edited] (AURORA-909) Make task scheduling more efficient

    [ https://issues.apache.org/jira/browse/AURORA-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226561#comment-14226561 ] 

Maxim Khutornenko edited comment on AURORA-909 at 12/5/14 1:28 AM:
-------------------------------------------------------------------

Testbed for Step 2:
https://reviews.apache.org/r/28710
https://reviews.apache.org/r/28731/


was (Author: maximk):
Testbed for Step 2: https://reviews.apache.org/r/28474/

> Make task scheduling more efficient
> -----------------------------------
>
>                 Key: AURORA-909
>                 URL: https://issues.apache.org/jira/browse/AURORA-909
>             Project: Aurora
>          Issue Type: Story
>          Components: Scheduler
>            Reporter: Bill Farner
>            Assignee: Maxim Khutornenko
>
> We're making a decent effort at reducing the _cost_ of task scheduling operations, but have not yet invested in reducing the working set in a way that causes task scheduling to scale better.  Each scheduling attempt for each task is an O(n) operation, where n is the number of offers.
> I would like to explore optimizations where we try to reduce the amount of redundant work performed in task scheduling.  Say, for example, we're trying to schedule a task that needs 2 CPUs, and we only have offers with 1 CPU.  Each scheduling round will re-assess every offer, despite the fact that the offers have not changed shape, and will always be a mismatch (hereafter termed _static_ mismatches).  Instead, we should try to skip over offers that are a static mismatch.  We could do this at the {{TaskGroup}} level, since every element in a task group is by definition statically equivalent.  This means that jobs with a large number of instances could be scheduled very efficiently, since the first task scheduling round could identify static mismatches, reducing the working set in the next round.
> This is to contrast with _dynamic_ mismatches, where a change in the tasks on a machine or other settings could make a previously-ineligible offer become a match.  The current sources of dynamic mismatches are limit constraints, host maintenance modes, and dedicated attributes.
> I propose we proceed in several steps, re-evaluating after each:
> 1. instrument the scheduler to better estimate the improvements
> 2. avoid future (offer, task group) evaluations when static mismatches are found
> 3. avoid future (offer, task group) evaluations when dynamic mismatches are found
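
A minimal sketch of Step 2, assuming a simplified model in which CPU count is the only resource considered and an offer's shape never changes. The names Offer, TaskGroup, static_mismatches and schedule_round are hypothetical, chosen for illustration; they are not taken from the Aurora code base. The idea shown is that a static mismatch found in one round is remembered per task group, so later rounds skip that offer instead of re-assessing it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Offer:
    # Hypothetical stand-in for a resource offer; only CPUs are modeled.
    offer_id: str
    cpus: float


@dataclass
class TaskGroup:
    # Hypothetical stand-in for a task group: all tasks in it have the
    # same static requirements, so mismatches can be cached per group.
    key: str
    cpus_per_task: float
    pending: int
    static_mismatches: set = field(default_factory=set)


def schedule_round(group, offers):
    """Try to place one task from the group.

    Returns (matched offer or None, number of offers actually evaluated).
    """
    evaluated = 0
    for offer in offers:
        # Skip offers already known to be a static mismatch for this group.
        if offer.offer_id in group.static_mismatches:
            continue
        evaluated += 1
        if offer.cpus < group.cpus_per_task:
            # The offer can never fit while its shape is unchanged: remember it.
            group.static_mismatches.add(offer.offer_id)
            continue
        group.pending -= 1
        return offer, evaluated
    return None, evaluated


# A task needing 2 CPUs against offers that each have 1 CPU.
offers = [Offer("o1", 1.0), Offer("o2", 1.0), Offer("o3", 1.0)]
group = TaskGroup("job-a", cpus_per_task=2.0, pending=5)

first = schedule_round(group, offers)    # evaluates all three offers
second = schedule_round(group, offers)   # evaluates none of them

# A new, larger offer arrives; only that offer is evaluated.
third = schedule_round(group, offers + [Offer("o4", 4.0)])
```

In this sketch the first round pays the full O(n) cost and every later round for the same group only looks at offers it has not yet ruled out. A real implementation would also need to drop cached entries when an offer is rescinded or returns with a different shape, and it would not apply to dynamic mismatches, which can stop being mismatches without the offer changing.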



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)