Posted to issues@mesos.apache.org by "Anindya Sinha (JIRA)" <ji...@apache.org> on 2017/02/26 02:05:44 UTC
[jira] [Created] (MESOS-7181) Stale frameworks seen on Mesos, but not known to schedulers
Anindya Sinha created MESOS-7181:
------------------------------------
Summary: Stale frameworks seen on Mesos, but not known to schedulers
Key: MESOS-7181
URL: https://issues.apache.org/jira/browse/MESOS-7181
Project: Mesos
Issue Type: Bug
Components: general
Reporter: Anindya Sinha
Assignee: Anindya Sinha
We run a scheduler that launches multiple frameworks through the scheduler driver. Occasionally, we observe a framework on Mesos that is not known to the scheduler. Since no entity acts on the offers sent to it, this framework ends up hogging all the offers, leading to starvation in the cluster.
This particular scenario is as follows:
1) The scheduler calls driver.start(), which sends the 1st SUBSCRIBE to the master.
2) The scheduler driver resends the SUBSCRIBE as part of its exponential backoff, since the framework has not yet registered.
3) The framework is registered based on the 1st SUBSCRIBE, but the scheduler immediately calls driver.stop(), which sends a TEARDOWN to the master.
4) The master processes the TEARDOWN, which removes the framework.
5) The master now processes the 2nd SUBSCRIBE (after authorization) and tries to add this framework. This succeeds, and a new framework ID is generated, since the original framework is no longer registered after the TEARDOWN. By this point, however, the scheduler driver has already terminated because the scheduler called driver.stop(). So the master continues to send offers to this 2nd framework, which holds on to them until the offer timeout.
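The message ordering in the steps above can be illustrated with a small toy model. This is only a sketch and not Mesos code: ToyMaster, simulate(), the "driver-A" sender name, and the "framework-N" ID format are all hypothetical, and the model assumes the master treats a SUBSCRIBE from a sender with no registered framework as a brand-new registration.

```python
from collections import deque
from itertools import count


class ToyMaster:
    """Hypothetical stand-in for the master; it only tracks registrations."""

    def __init__(self):
        self._next_id = count(1)
        self.frameworks = {}  # framework ID -> sender that subscribed

    def handle(self, kind, sender):
        if kind == "SUBSCRIBE":
            if sender in self.frameworks.values():
                return None  # retry for an already registered framework
            # Assumption: with no registered framework for this sender,
            # the SUBSCRIBE is treated as new and gets a fresh ID.
            framework_id = f"framework-{next(self._next_id)}"
            self.frameworks[framework_id] = sender
            return framework_id
        if kind == "TEARDOWN":
            self.frameworks = {
                fid: s for fid, s in self.frameworks.items() if s != sender
            }
            return None
        raise ValueError(f"unknown message kind: {kind}")


def simulate():
    """Replay the reported ordering and return the stale framework IDs."""
    master = ToyMaster()
    driver_running = True
    known_to_scheduler = set()

    # Order in which the master processes the messages in this scenario.
    inbox = deque([
        ("SUBSCRIBE", "driver-A"),  # 1st SUBSCRIBE from driver.start()
        ("TEARDOWN", "driver-A"),   # sent by driver.stop()
        ("SUBSCRIBE", "driver-A"),  # backoff retry, processed last
    ])

    while inbox:
        kind, sender = inbox.popleft()
        framework_id = master.handle(kind, sender)
        if kind == "TEARDOWN":
            # The driver has terminated; the scheduler forgets the framework.
            driver_running = False
            known_to_scheduler.clear()
        elif framework_id is not None and driver_running:
            known_to_scheduler.add(framework_id)

    # Frameworks the master still holds that the scheduler never learned of.
    return set(master.frameworks) - known_to_scheduler
```

Calling simulate() in this model returns a set with a single stale ID, the framework created by the delayed 2nd SUBSCRIBE, which corresponds to the framework that keeps receiving offers in step 5.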
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)