Posted to dev@aurora.apache.org by Renan DelValle Rueda <re...@gmail.com> on 2017/10/02 20:06:58 UTC

Extending Aurora Pluggable Scheduling Proposal

Hello fellow Aurorans,

I'd like to share a proposal doc that lays out a roadmap for
bringing new scheduling features to Aurora.

David McLaughlin did a fantastic job of getting the ball rolling with the
pluggable scheduling patches he contributed (1), and I'd like to expand upon
that work.

The overarching idea of this proposal is that everyone has different
scheduling needs, and it would be great to enhance Aurora so that operators
can meet organization-specific scheduling needs without imposing them on the
rest of the community.

The features outlined in this proposal are based on principles from
Fenzo (2), which has enjoyed great success powering Mantis (3) and Titus (4)
at Netflix.

Finally, since this proposal is about scheduling enhancements, I also
thought it would be pertinent to include a discussion of a feature that
attempts to avoid hosting tasks on misbehaving agents. This is because
some of the scheduling policies introduced by this proposal can amplify the
negative effect a bad node has on performance (i.e., we keep choosing the
"bad" node to schedule on, and the task keeps failing through no fault of
its own).

Would love to hear some feedback on these ideas and/or opinions on what the
next steps should be if we were to embark on this journey.

https://docs.google.com/document/d/11ArMA53chtK-Zb_KPMV7l_bCvTrUb005XlqzGQ2fTP4/edit#

Thanks!
-Renan

1. https://lists.apache.org/thread.html/50caf01283144ee9dacd24d3fb481a2ca6120ceaa1289fd5b48620a4@%3Cdev.aurora.apache.org%3E

2. https://github.com/Netflix/Fenzo

3. https://medium.com/netflix-techblog/stream-processing-with-mantis-78af913f51a6

4. https://medium.com/netflix-techblog/the-evolution-of-container-usage-at-netflix-3abfc096781b