Posted to issues@mesos.apache.org by "Benjamin Bannier (JIRA)" <ji...@apache.org> on 2018/01/10 21:37:00 UTC
[jira] [Created] (MESOS-8430) Race between operation status updates and agent update
Benjamin Bannier created MESOS-8430:
---------------------------------------
Summary: Race between operation status updates and agent update
Key: MESOS-8430
URL: https://issues.apache.org/jira/browse/MESOS-8430
Project: Mesos
Issue Type: Task
Components: agent
Reporter: Benjamin Bannier
There is currently a possible race between operation status updates triggered by a status update manager in the agent and updates to the agent's resources.
Consider a master failover where an agent has a resource provider with a non-terminal operation. Now let the operation succeed and become terminal in the agent, but have the master fail over before it processes the update. After the failover, the new master would learn about the resource provider's resources via an `UpdateSlaveMessage`. At the same time, a status update manager in the agent could inform the master about the unacknowledged, successful operation. If the operation status update arrives at the master before the `UpdateSlaveMessage`, the operation status update handler could attempt to apply the operation to resources the master does not know about yet. This would likely trigger a `CHECK` failure in a `contains` check.
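The failure depends only on the order in which the new master handles the two messages. Below is a minimal Python sketch of that ordering under loudly stated assumptions: `Master`, `CheckFailure`, `check`, the handler names, the provider ID `rp-1` and the resource label `disk:raw` are all hypothetical and are not Mesos APIs. A raised exception stands in for the `CHECK` that would abort the master, and resource bookkeeping is reduced to Python sets.

```python
# Hypothetical, heavily simplified model of the MESOS-8430 race.
# None of these names are Mesos APIs; they only mimic message ordering.


class CheckFailure(Exception):
    """Stands in for a failed CHECK, which would abort the master process."""


def check(condition, message):
    # Mimics a CHECK macro: fail (here: raise) if the condition is false.
    if not condition:
        raise CheckFailure(message)


class Master:
    """Hypothetical stand-in for the newly elected master after failover."""

    def __init__(self):
        # Resource provider resources learned so far, keyed by provider ID.
        self.provider_resources = {}

    def handle_update_slave(self, provider_id, resources):
        # Models processing an UpdateSlaveMessage: learn the provider's resources.
        self.provider_resources[provider_id] = set(resources)

    def handle_operation_status_update(self, provider_id, consumed):
        # Models applying a terminal operation. Like the contains check in
        # the issue, it insists the master already knows the consumed resources.
        known = self.provider_resources.get(provider_id, set())
        check(set(consumed) <= known, "operation resources unknown to master")
        # Remove the consumed resources from what the master tracks.
        known -= set(consumed)


def run(order):
    """Replay the two messages the new master receives, in the given order."""
    master = Master()
    messages = {
        "UpdateSlaveMessage": lambda: master.handle_update_slave(
            "rp-1", {"disk:raw"}),
        "OperationStatusUpdate": lambda: master.handle_operation_status_update(
            "rp-1", {"disk:raw"}),
    }
    try:
        for name in order:
            messages[name]()
    except CheckFailure as failure:
        return f"CHECK failed: {failure}"
    return "ok"


print(run(["UpdateSlaveMessage", "OperationStatusUpdate"]))
# ok
print(run(["OperationStatusUpdate", "UpdateSlaveMessage"]))
# CHECK failed: operation resources unknown to master
```

Only the second ordering fails. Nothing in the scenario above guarantees that the agent's `UpdateSlaveMessage` is handled before the status update for the unacknowledged operation, so both orderings are possible after a failover.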
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)