Posted to issues@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2016/11/02 02:46:58 UTC

[jira] [Commented] (MESOS-6522) Ability to set global maximum executor runtime for an agent

    [ https://issues.apache.org/jira/browse/MESOS-6522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627454#comment-15627454 ] 

Benjamin Mahler commented on MESOS-6522:
----------------------------------------

The current thinking w.r.t. maintenance is that we ask for resources back from the schedulers (via inverse offers). Schedulers know the deadline and should co-operate with these requests. Once the maintenance window begins, the operator could force the draining of the agent, but this will potentially cause SLA violations, or data loss if the maintenance is destructive. Because of this, we'd like the operator to make this call. It is also worth noting that some maintenance attempts are expected not to succeed, since they would have led to SLA violations for the frameworks, or data loss in the case of destructive maintenance. In these cases the operator can follow up on the "stragglers" with a more suitable maintenance plan.

A maximum executor lifetime is interesting in that it forces churn in the cluster, but it would make it very difficult to implement certain classes of workloads (e.g. data storage) and I suspect it would frustrate framework developers since they have no control over it. In general we try to give control to the frameworks, since only they understand the workload.

> Ability to set global maximum executor runtime for an agent
> -----------------------------------------------------------
>
>                 Key: MESOS-6522
>                 URL: https://issues.apache.org/jira/browse/MESOS-6522
>             Project: Mesos
>          Issue Type: Improvement
>          Components: slave
>            Reporter: Will Rouesnel
>            Priority: Minor
>
> With the developing concept of agent maintenance mode, it would be nice to have some blunt-force ability to reason about the behavior of agents on uncooperative frameworks.
> Ideally there would be a new parameter --executor_maximum_lifetime which would specify a maximum duration for which *any* executor on an agent can run before being terminated.
> Even when using persistent schedulers such as Marathon, the ability to enforce reasonable guarantees about when an agent's tasks definitely must end can help contribute to keeping the cluster turning over and prevent nodes from becoming "special" or jammed up with jobs which will not end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)