Posted to issues@aurora.apache.org by "Bill Farner (JIRA)" <ji...@apache.org> on 2015/02/05 01:10:34 UTC

[jira] [Commented] (AURORA-909) Differentiate between dynamic and static vetoes

    [ https://issues.apache.org/jira/browse/AURORA-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306272#comment-14306272 ] 

Bill Farner commented on AURORA-909:
------------------------------------

Sorry, I neglected to acknowledge this.  One thing that is disrupted by skipping offers is that we currently compute a "pending reason", so we try to find the nearest miss based on some (fairly subjective) criteria.  That said, sub-linear complexity would be really nice.  Would you mind peeling off a separate ticket with this idea?  It should be complementary to the caching here, but we need to think harder about the downstream effects of skipping.

> Differentiate between dynamic and static vetoes
> -----------------------------------------------
>
>                 Key: AURORA-909
>                 URL: https://issues.apache.org/jira/browse/AURORA-909
>             Project: Aurora
>          Issue Type: Story
>          Components: Scheduler
>            Reporter: Bill Farner
>            Assignee: Maxim Khutornenko
>
> We're making a decent effort at reducing the _cost_ of task scheduling operations, but have not yet invested in reducing the working set in a way that causes task scheduling to scale better.  Each scheduling attempt for each task is an O(n) operation, where n is the number of offers.
> I would like to explore optimizations where we try to reduce the amount of redundant work performed in task scheduling.  Say, for example, we're trying to schedule a task that needs 2 CPUs, and we only have offers with 1 CPU.  Each scheduling round will re-assess every offer, despite the fact that the offers have not changed shape, and will always be a mismatch (hereafter termed _static_ mismatches).  Instead, we should try to skip over offers that are a static mismatch.  We could do this at the {{TaskGroup}} level, since every element in a task group is by definition statically equivalent.  This means that jobs with a large number of instances could be scheduled very efficiently, since the first task scheduling round could identify static mismatches, reducing the working set in the next round.
> This is to contrast with _dynamic_ mismatches, where a change in the tasks on a machine or other settings could make a previously-ineligible offer become a match.  The current sources of dynamic mismatches are limit constraints, host maintenance modes, and dedicated attributes.
> I propose we proceed in several steps, re-evaluating after each:
> 1. instrument the scheduler to better estimate the improvements
> 2. avoid future (offer, task group) evaluations when static mismatches are found
> 3. avoid future (offer, task group) evaluations when dynamic mismatches are found
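
As an illustration of step 2 in the description above, here is a minimal sketch of remembering static mismatches per task group so that later rounds skip them.  It is not Aurora's implementation: the Offer and TaskGroup classes, their fields, and schedule_round are hypothetical names made up for this example, CPU count stands in for every static check, and a maintenance flag stands in for every dynamic one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Offer:
    # Hypothetical, simplified offer: only CPUs and a maintenance flag.
    offer_id: str
    cpus: float
    host_in_maintenance: bool = False


@dataclass
class TaskGroup:
    # Hypothetical stand-in for a task group: every task has the same shape,
    # so a static mismatch found for one task holds for all of them.
    cpus_needed: float
    static_mismatches: set = field(default_factory=set)


def schedule_round(group, offers):
    """Try to place one task from the group.

    Returns (matching offer id or None, number of offers actually evaluated).
    """
    evaluated = 0
    for offer in offers:
        if offer.offer_id in group.static_mismatches:
            continue  # known static mismatch: skip without re-evaluating
        evaluated += 1
        if offer.cpus < group.cpus_needed:
            # Static mismatch: the offer's shape cannot satisfy this group.
            group.static_mismatches.add(offer.offer_id)
            continue
        if offer.host_in_maintenance:
            # Dynamic mismatch: may become a match later, so do not cache it.
            continue
        return offer.offer_id, evaluated
    return None, evaluated


offers = [Offer("a", 1), Offer("b", 1), Offer("c", 4, host_in_maintenance=True)]
group = TaskGroup(cpus_needed=2)
first = schedule_round(group, offers)   # evaluates all three offers
second = schedule_round(group, offers)  # only re-evaluates offer "c"
```

In this sketch the second round touches only the offer with a dynamic mismatch.  A real cache would also have to drop an entry whenever the corresponding offer is rescinded or replaced, and, as the comment above points out, skipping offers means a nearest-miss "pending reason" computation no longer sees every offer.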



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)