Posted to issues@mesos.apache.org by "Greg Mann (JIRA)" <ji...@apache.org> on 2019/06/04 22:07:00 UTC

[jira] [Created] (MESOS-9818) Implement agent-side handling of automatic draining

Greg Mann created MESOS-9818:
--------------------------------

             Summary: Implement agent-side handling of automatic draining
                 Key: MESOS-9818
                 URL: https://issues.apache.org/jira/browse/MESOS-9818
             Project: Mesos
          Issue Type: Task
          Components: agent
            Reporter: Greg Mann


The agent needs to be updated to handle automatic draining. This includes the following:

The agent will have a new handler for the ‘DrainSlaveMessage’:
* ‘Slave::drain()’: checkpoint the drain info
* ‘Slave::_drain()’: send KILL events for all tasks, with a kill policy whose grace period is the minimum of the task’s kill grace period and max_grace_period
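
The grace period rule for ‘Slave::_drain()’ can be sketched as follows. This is only an illustration: ‘Seconds’ and ‘drainGracePeriod()’ are hypothetical names, not Mesos types or functions, and the sketch assumes both values are always set.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical stand-in for a duration type; not a Mesos type.
using Seconds = std::chrono::seconds;

// Grace period used when the agent kills a task as part of a drain:
// the smaller of the task's own kill grace period and the drain's
// max_grace_period. Assumption: both values are present.
Seconds drainGracePeriod(Seconds taskGracePeriod, Seconds maxGracePeriod)
{
  return std::min(taskGracePeriod, maxGracePeriod);
}
```

With this rule a task can never be given longer to shut down than the drain allows, but a task that asks for less time keeps its shorter period.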

The agent’s ‘statusUpdate()’ handler will be updated:
* TASK_KILLED updates will have their state rewritten to TASK_GONE_BY_OPERATOR when the agent is draining and is being decommissioned
* The AGENT_DRAINING reason will be inserted into all TASK_KILLING, TASK_KILLED, and TASK_GONE_BY_OPERATOR updates when the agent is draining
* The modified status updates will be checkpointed (instead of the original ones)
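
A minimal sketch of the status update rewriting described above. The enums, the ‘StatusUpdate’ struct, and the ‘draining’ / ‘decommissioning’ flags are simplified, hypothetical stand-ins for the real protobuf messages and agent state.

```cpp
// Simplified stand-ins for the task states and reasons named in this ticket.
enum class TaskState {
  TASK_RUNNING,
  TASK_KILLING,
  TASK_KILLED,
  TASK_GONE_BY_OPERATOR,
  TASK_FINISHED
};

enum class Reason { NONE, AGENT_DRAINING };

// Hypothetical, reduced view of a status update.
struct StatusUpdate
{
  TaskState state;
  Reason reason;
};

// Returns the update that would be checkpointed and forwarded.
// `draining` and `decommissioning` are assumed to come from the
// checkpointed drain info.
StatusUpdate rewriteForDrain(
    StatusUpdate update, bool draining, bool decommissioning)
{
  if (!draining) {
    return update; // Not draining: the update passes through unchanged.
  }

  // Killed tasks on an agent that is being decommissioned are reported
  // as gone by operator.
  if (update.state == TaskState::TASK_KILLED && decommissioning) {
    update.state = TaskState::TASK_GONE_BY_OPERATOR;
  }

  // Kill-related updates carry the draining reason.
  if (update.state == TaskState::TASK_KILLING ||
      update.state == TaskState::TASK_KILLED ||
      update.state == TaskState::TASK_GONE_BY_OPERATOR) {
    update.reason = Reason::AGENT_DRAINING;
  }

  return update;
}
```

Because the function returns the modified update, the caller can checkpoint that result rather than the original, as the last bullet requires.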

The agent’s recovery code will be updated to ensure that draining is being performed correctly after failover:
* If the agent is currently draining, it will loop through all tasks and send KILL events for any task whose latest state is neither terminal nor TASK_KILLING.
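
The recovery check could look roughly like this. ‘Task’, ‘isTerminal()’, and ‘tasksToKillAfterRecovery()’ are hypothetical names, and the set of states is reduced to a handful for illustration.

```cpp
#include <string>
#include <vector>

// Reduced, illustrative set of task states.
enum class TaskState {
  TASK_RUNNING,
  TASK_KILLING,
  TASK_KILLED,
  TASK_FINISHED,
  TASK_FAILED
};

// Hypothetical view of a recovered task.
struct Task
{
  std::string id;
  TaskState latestState;
};

// Terminal states within this reduced set.
bool isTerminal(TaskState state)
{
  return state == TaskState::TASK_KILLED ||
         state == TaskState::TASK_FINISHED ||
         state == TaskState::TASK_FAILED;
}

// Returns the IDs of the tasks that still need a KILL event after the
// agent has recovered. Nothing is returned if the agent is not draining.
std::vector<std::string> tasksToKillAfterRecovery(
    bool draining, const std::vector<Task>& tasks)
{
  std::vector<std::string> result;
  if (!draining) {
    return result;
  }

  for (const Task& task : tasks) {
    // Skip tasks that are already done or already being killed.
    if (isTerminal(task.latestState) ||
        task.latestState == TaskState::TASK_KILLING) {
      continue;
    }
    result.push_back(task.id);
  }

  return result;
}
```

Skipping tasks in TASK_KILLING avoids sending a second KILL to tasks whose kill was already in progress before the failover.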

The agent’s reregistration code will be updated to include the drain info in the ‘ReregisterSlaveMessage’.

The agent’s v0 ‘/state’ endpoint handler will be updated to include the drain info.

The agent’s ‘_statusUpdateAcknowledgement()’ and ‘operationStatusAcknowledgement()’ handlers will be updated to check whether any active tasks or operations remain on the agent. If none remain and the agent is currently draining, it will wipe the drain info from disk and transition into the normal, non-draining state.
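
The completion check that both acknowledgement handlers would share might be sketched as below. ‘AgentState’ and ‘maybeFinishDrain()’ are hypothetical, and the ‘drainInfoCheckpointed’ flag merely stands in for the drain info file on disk.

```cpp
// Hypothetical, reduced view of the agent's state.
struct AgentState
{
  bool draining;
  int activeTasks;
  int activeOperations;
  bool drainInfoCheckpointed; // Stand-in for the drain info on disk.
};

// Called after an acknowledgement has been processed. Returns true if
// the drain finished and the agent returned to its normal state.
bool maybeFinishDrain(AgentState& agent)
{
  if (!agent.draining) {
    return false;
  }

  // Work is still outstanding, so the drain is not complete yet.
  if (agent.activeTasks > 0 || agent.activeOperations > 0) {
    return false;
  }

  agent.drainInfoCheckpointed = false; // Wipe the drain info.
  agent.draining = false;              // Back to the non-draining state.
  return true;
}
```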




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)