Posted to issues@mesos.apache.org by "Benjamin Bannier (JIRA)" <ji...@apache.org> on 2017/10/06 15:46:00 UTC

[jira] [Created] (MESOS-8058) Agent and master can race when updating agent state

Benjamin Bannier created MESOS-8058:
---------------------------------------

             Summary: Agent and master can race when updating agent state
                 Key: MESOS-8058
                 URL: https://issues.apache.org/jira/browse/MESOS-8058
             Project: Mesos
          Issue Type: Bug
          Components: agent
    Affects Versions: 1.5.0
            Reporter: Benjamin Bannier
            Assignee: Benjamin Bannier
            Priority: Critical


In {{2af9a5b07dc80151154264e974d03f56a1c25838}} we introduced the use of {{UpdateSlaveMessage}} for the agent to inform the master about its current total resources. Currently this message is sent only on agent registration and reregistration.

This can race with operations applied in the master and communicated via {{CheckpointResourcesMessage}}.

Example:

1. Agent ({{cpus:4(\*)}}) registers.
2. Master is triggered to apply an operation to the agent's resources, e.g., a reservation: {{cpus:4(\*) -> cpus:4(A)}}. The master applies the operation to its current view of the agent's resources and sends the agent a {{CheckpointResourcesMessage}} so the agent can persist the result.
3. The agent sends the master an {{UpdateSlaveMessage}} reporting, e.g., {{cpus:4(\*)}}, since it has not yet received the {{CheckpointResourcesMessage}}.
4. The master processes the {{UpdateSlaveMessage}} and updates its view of the agent's resources to be {{cpus:4(\*)}}.
5. The agent processes the {{CheckpointResourcesMessage}} and updates its view of its resources to be {{cpus:4(A)}}.
6. The agent and the master now have inconsistent views of the agent's resources: the master sees {{cpus:4(\*)}} while the agent sees {{cpus:4(A)}}.
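
The interleaving in the steps above can be reproduced with a small simulation. This is only an illustrative sketch, not Mesos code: the {{Master}} and {{Agent}} classes, their method names, and the use of plain strings for resources are all hypothetical. Only the two message names are taken from this report.

```python
from collections import deque


class Master:
    """Hypothetical stand-in for the master's bookkeeping of one agent."""

    def __init__(self):
        self.agent_resources = None  # master's view of the agent's total resources

    def handle_register(self, resources):
        self.agent_resources = resources

    def apply_reservation(self, reserved, agent_inbox):
        # Apply the operation to the master's current view, then ask the
        # agent to checkpoint the result. The message stays queued until
        # the agent gets around to processing it.
        self.agent_resources = reserved
        agent_inbox.append(("CheckpointResourcesMessage", reserved))

    def handle_update_slave(self, total):
        # The master overwrites its view with whatever the agent reported.
        self.agent_resources = total


class Agent:
    """Hypothetical stand-in for the agent's view of its own resources."""

    def __init__(self, resources):
        self.resources = resources
        self.inbox = deque()

    def send_update_slave(self, master_inbox):
        # Reports the agent's current view, which may already be stale.
        master_inbox.append(("UpdateSlaveMessage", self.resources))

    def process_one(self):
        kind, payload = self.inbox.popleft()
        if kind == "CheckpointResourcesMessage":
            self.resources = payload


def simulate_race():
    master = Master()
    agent = Agent("cpus:4(*)")
    master_inbox = deque()

    master.handle_register(agent.resources)             # step 1
    master.apply_reservation("cpus:4(A)", agent.inbox)  # step 2
    agent.send_update_slave(master_inbox)               # step 3
    _, total = master_inbox.popleft()
    master.handle_update_slave(total)                   # step 4
    agent.process_one()                                 # step 5
    return master.agent_resources, agent.resources      # step 6


if __name__ == "__main__":
    print(simulate_race())
```

In this sketch, swapping steps 3 and 5 (so the agent processes the checkpoint before reporting its total) leaves both sides at {{cpus:4(A)}}, which shows that the inconsistency comes purely from message ordering.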


