Posted to issues@mesos.apache.org by "Neil Conway (JIRA)" <ji...@apache.org> on 2017/06/15 21:02:00 UTC

[jira] [Created] (MESOS-7681) Add safeguard for new agents with new features + old master

Neil Conway created MESOS-7681:
----------------------------------

             Summary: Add safeguard for new agents with new features + old master
                 Key: MESOS-7681
                 URL: https://issues.apache.org/jira/browse/MESOS-7681
             Project: Mesos
          Issue Type: Improvement
            Reporter: Neil Conway


Consider this scenario:

* Mesos cluster with 3 masters and 1 agent.
* 2 of the masters (including the leader) are upgraded to Mesos 1.4; the remaining master stays at Mesos 1.3 (e.g., due to operator error).
* The agent is upgraded to Mesos 1.4.
* A framework creates a reservation refinement on the agent.
* The leading master fails; the Mesos 1.3 master is elected as the new leader.

In this scenario, the agent will send its resources to the master in the new (post-refinement) format, but the Mesos 1.3 master does not understand those new fields. The result is an inconsistency between the agent's resources and the master's view of the agent's resources. This could lead to various problems; in effect, the reservation the framework previously made has been "forgotten" during master failover. Similarly, if a framework attempts to unreserve the resources, the operation will be expressed in terms of the master's version of the resources, and the agent will reject it.
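
The failure mode can be illustrated with a toy model. This is a minimal sketch under loudly stated assumptions: the dict-based resource records, the field names, and the `old_master_view` / `reserved_role` helpers are all hypothetical simplifications, not the real Mesos protobufs. The model assumes the refined resource carries its reservations only as a stack, and that an old master keeps only the fields it knows and treats a resource with no role as unreserved (`"*"`).

```python
# Hypothetical, simplified resource records; NOT the real Mesos protobufs.
# Assumption: the old (1.3-style) master only knows these fields.
OLD_MASTER_KNOWN_FIELDS = {"name", "value", "role"}


def agent_resource_refined():
    """A resource in the (modeled) post-refinement format: the reservation
    is a stack of roles, and there is no legacy top-level "role" field."""
    return {
        "name": "cpus",
        "value": 4.0,
        "reservations": [
            {"role": "eng"},          # original reservation
            {"role": "eng/backend"},  # refinement to a child role
        ],
    }


def old_master_view(resource):
    """Model of an old master: drop every field it does not recognize."""
    known = {k: v for k, v in resource.items() if k in OLD_MASTER_KNOWN_FIELDS}
    # With no role present, the resource looks unreserved ("*").
    known.setdefault("role", "*")
    return known


def reserved_role(resource):
    """Role the resource is effectively reserved to in this toy model."""
    if resource.get("reservations"):
        return resource["reservations"][-1]["role"]
    return resource.get("role", "*")


agent = agent_resource_refined()
master = old_master_view(agent)
print(reserved_role(agent))   # eng/backend
print(reserved_role(master))  # *
```

In this model the agent still believes the CPUs are reserved to `eng/backend`, while the old master sees them as unreserved, which is the "forgotten" reservation described above.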

To fix this, it seems we need an explicit negotiation between the agent and the master as part of registration/re-registration. The agent would examine its resources and say which capabilities it _requires_ of the master; if the master does not support those capabilities, the agent cannot safely register. We could implement this either via master capabilities (the agent computes the master capabilities it requires and declines to register if the master isn't new enough), or via agent capabilities (the agent tells the master the capabilities it is "actively using"; the master refuses to register any agent that is using a capability the master doesn't recognize/support). Probably the former is safer/cleaner.
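
A minimal sketch of the first option (agent-side check against master capabilities) might look like the following. Everything here is hypothetical and for illustration only: the `Resource` class, the `RESERVATION_REFINEMENT` capability string, and the `required_master_capabilities` / `may_register` helpers are not taken from the Mesos codebase. The sketch assumes a reservation stack with more than one entry means the reservation has been refined.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

# Hypothetical capability name used only for this sketch.
RESERVATION_REFINEMENT = "RESERVATION_REFINEMENT"


@dataclass
class Resource:
    name: str
    # Stack of reserved roles; empty means unreserved (hypothetical model).
    reservations: List[str] = field(default_factory=list)


def required_master_capabilities(resources: List[Resource]) -> FrozenSet[str]:
    """Compute which master capabilities the agent's resources depend on."""
    required = set()
    for r in resources:
        # More than one entry on the stack means the reservation was refined.
        if len(r.reservations) > 1:
            required.add(RESERVATION_REFINEMENT)
    return frozenset(required)


def may_register(
    agent_resources: List[Resource], master_capabilities: FrozenSet[str]
) -> Tuple[bool, FrozenSet[str]]:
    """Agent-side decision: register only if the master has every
    capability the agent requires; also report what is missing."""
    missing = required_master_capabilities(agent_resources) - master_capabilities
    return (not missing, missing)
```

Usage, with an old master that advertises no capabilities and a new one that advertises the hypothetical refinement capability:

```python
resources = [Resource("cpus", ["eng", "eng/backend"]), Resource("mem", ["eng"])]
ok, missing = may_register(resources, frozenset())
# ok is False; missing == frozenset({"RESERVATION_REFINEMENT"})
ok, missing = may_register(resources, frozenset({RESERVATION_REFINEMENT}))
# ok is True
```

One reason the agent-side check may be preferable in the scenario above: the check runs in the newer component (the 1.4 agent), whereas a master-side check would have to exist in the 1.3 master, which cannot enforce logic it predates.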



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)