Posted to issues@mesos.apache.org by "Benjamin Bannier (JIRA)" <ji...@apache.org> on 2017/10/09 16:37:00 UTC

[jira] [Comment Edited] (MESOS-8058) Agent and master can race when updating agent state

    [ https://issues.apache.org/jira/browse/MESOS-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196870#comment-16196870 ] 

Benjamin Bannier edited comment on MESOS-8058 at 10/9/17 4:36 PM:
------------------------------------------------------------------

Reviews:
https://reviews.apache.org/r/62843/
https://reviews.apache.org/r/62834/
https://reviews.apache.org/r/62847/


was (Author: bbannier):
Reviews:
https://reviews.apache.org/r/62843/
https://reviews.apache.org/r/62834/

> Agent and master can race when updating agent state
> ---------------------------------------------------
>
>                 Key: MESOS-8058
>                 URL: https://issues.apache.org/jira/browse/MESOS-8058
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.5.0
>            Reporter: Benjamin Bannier
>            Assignee: Benjamin Bannier
>            Priority: Critical
>              Labels: mesosphere
>
> In {{2af9a5b07dc80151154264e974d03f56a1c25838}} we introduced the use of {{UpdateSlaveMessage}} for the agent to inform the master about its current total resources. Currently we trigger this message only on agent registration and reregistration.
> This can race with operations applied in the master and communicated via {{CheckpointResourcesMessage}}.
> Example:
> 1. Agent ({{cpus:4(\*)}}) registers.
> 2. Master is triggered to apply an operation to the agent's resources, e.g., a reservation: {{cpus:4(\*) -> cpus:4(A)}}. The master applies the operation to its current view of the agent's resources and sends the agent a {{CheckpointResourcesMessage}} so the agent can persist the result.
> 3. The agent sends the master an {{UpdateSlaveMessage}} reporting, e.g., {{cpus:4(\*)}}, since it hasn't received the {{CheckpointResourcesMessage}} yet.
> 4. The master processes the {{UpdateSlaveMessage}} and updates its view of the agent's resources to be {{cpus:4(\*)}}.
> 5. The agent processes the {{CheckpointResourcesMessage}} and updates its view of its resources to be {{cpus:4(A)}}.
> 6. The agent and the master have an inconsistent view of the agent's resources.
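
The interleaving in steps 1-6 can be replayed with a minimal sketch. This is a toy model, not Mesos code: the {{Master}} and {{Agent}} classes, their methods, and {{simulate_race}} are hypothetical names, resources are plain strings, and the two deques only stand in for messages in flight between the processes.

```python
from collections import deque


class Master:
    """Toy master that tracks its view of a single agent's resources."""

    def __init__(self):
        self.agent_resources = None

    def on_register(self, resources):
        self.agent_resources = resources

    def apply_operation(self, new_resources, to_agent):
        # Apply the operation to the master's own view first, then ask
        # the agent to persist the result.
        self.agent_resources = new_resources
        to_agent.append(("CheckpointResourcesMessage", new_resources))

    def on_update_slave(self, total):
        # Overwrites the master's view with whatever the agent reported.
        self.agent_resources = total


class Agent:
    """Toy agent that holds its own view of its resources."""

    def __init__(self, resources):
        self.resources = resources

    def send_update(self, to_master):
        to_master.append(("UpdateSlaveMessage", self.resources))

    def on_checkpoint(self, resources):
        self.resources = resources


def simulate_race():
    # Hypothetical in-flight message queues, one per direction.
    to_agent, to_master = deque(), deque()

    agent = Agent("cpus:4(*)")
    master = Master()

    master.on_register(agent.resources)            # step 1
    master.apply_operation("cpus:4(A)", to_agent)  # step 2
    agent.send_update(to_master)                   # step 3: checkpoint not seen yet

    _, total = to_master.popleft()
    master.on_update_slave(total)                  # step 4

    _, checkpointed = to_agent.popleft()
    agent.on_checkpoint(checkpointed)              # step 5

    return master.agent_resources, agent.resources  # step 6


master_view, agent_view = simulate_race()
```

In this sketch the two views diverge only because the agent's update is sent before it has processed the checkpoint message; if the agent handles the checkpoint first, both sides end up with the reserved resources.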



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)