Posted to dev@aurora.apache.org by Benjamin Mahler <be...@gmail.com> on 2014/04/30 22:39:20 UTC

Fwd: protecting mesos from fat fingers

+aurora dev list

Aurora has many built-in mechanisms to prevent these scenarios; you may
want to reach out to the Aurora developers for insight.

We have discussed some API rate limiting within Mesos, but this is near the
limit of policy that Mesos could enforce, as we don't understand the
semantics of the tasks being launched. Rate limiting within Mesos also
doesn't solve the problem of a flapping task within Marathon.
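
For illustration only, a framework-level throttle on relaunching a flapping
task might look like the minimal Python sketch below. This is not Aurora's or
Marathon's implementation; the class name RelaunchBackoff, its methods, and
its default delays are hypothetical, chosen just to show exponential backoff
keyed by task.

```python
class RelaunchBackoff:
    """Hypothetical per-task throttle: the delay doubles after each failure."""

    def __init__(self, initial_delay=1.0, max_delay=300.0):
        self.initial_delay = initial_delay  # seconds to wait after first failure
        self.max_delay = max_delay          # upper bound on the wait, in seconds
        self.failures = {}                  # task id -> consecutive failure count

    def record_failure(self, task_id):
        """Note one more consecutive failed launch of task_id."""
        self.failures[task_id] = self.failures.get(task_id, 0) + 1

    def record_success(self, task_id):
        """Reset the throttle once the task has started cleanly."""
        self.failures.pop(task_id, None)

    def delay_for(self, task_id):
        """Seconds to wait before the next launch attempt of task_id."""
        count = self.failures.get(task_id, 0)
        if count == 0:
            return 0.0
        return min(self.initial_delay * 2 ** (count - 1), self.max_delay)
```

With these assumed defaults, a task that fails on every launch would be
retried after 1, 2, 4, ... seconds, levelling off at one attempt every five
minutes, which slows the rate at which failed launches can pile up on a slave.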

---------- Forwarded message ----------
From: Dick Davies <di...@hellooperator.net>
Date: Wed, Apr 30, 2014 at 11:30 AM
Subject: protecting mesos from fat fingers
To: user@mesos.apache.org


Managed to take out a mesos slave today with a typo while launching
a marathon app, and wondered if there are throttles/limits that can be
applied to repeated launches to limit the risk of such mistakes in the
future.

I started a thread on the marathon list
(https://groups.google.com/forum/?hl=en#!topic/marathon-framework/4iWLqTYTvgM)

[ TL;DR: marathon throws an app that will never deploy correctly at slaves
until the disk fills with debris and the slave dies ]

but I suppose this could be something available in mesos itself.

I can't find a lot of advice about operational aspects of Mesos admin;
could others here provide some good advice about their experience in
preventing failed task deploys from causing trouble on their clusters?

Thanks!