
Posted to issues@mesos.apache.org by "James DeFelice (JIRA)" <ji...@apache.org> on 2015/02/18 12:56:11 UTC

[jira] [Commented] (MESOS-1571) Signal escalation timeout is not configurable

    [ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325794#comment-14325794 ] 

James DeFelice commented on MESOS-1571:
---------------------------------------

In the kubernetes-mesos framework, the executor's Shutdown() implementation currently force-stops the containers it manages (which, to my understanding, sends SIGKILL). These are Docker containers, which are normally given 10s to shut down gracefully before Docker sends a SIGKILL. That 10s timeout is not compatible with the default value (3s) of the Mesos slave flag `executor_shutdown_grace_period`. However, if I raise that value to 20s to give the executor more time to shut things down gracefully, the executor has no way to reason about it, because it has no idea how much time it actually has.

As a workaround I've considered looking up the slave PID from the environment and querying its state.json for the startup flags, and trying to make a decision based on that. That approach seems somewhat hackish and I'd much rather do something nicer.
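
For illustration, the parsing half of that workaround might look roughly like the Go sketch below. It only covers extracting the flag from a state document that has already been fetched. The JSON shape (a string map under "flags"), the flag name `executor_shutdown_grace_period`, and the duration format ("20secs") are all assumptions made for the sketch, and the helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// Assumed name of the slave flag that holds the executor shutdown grace period.
const graceFlag = "executor_shutdown_grace_period"

// slaveState models only the part of the state document this sketch needs.
// Assumption: the slave's startup flags appear as a string map under "flags".
type slaveState struct {
	Flags map[string]string `json:"flags"`
}

// Assumed unit suffixes for duration strings such as "20secs".
var units = []struct {
	suffix string
	unit   time.Duration
}{
	{"ms", time.Millisecond},
	{"secs", time.Second},
	{"mins", time.Minute},
	{"hrs", time.Hour},
}

// parseFlagDuration converts a string like "20secs" into a time.Duration.
func parseFlagDuration(s string) (time.Duration, error) {
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseFloat(strings.TrimSuffix(s, u.suffix), 64)
			if err != nil {
				return 0, fmt.Errorf("bad duration %q: %v", s, err)
			}
			return time.Duration(n * float64(u.unit)), nil
		}
	}
	return 0, fmt.Errorf("unknown unit in duration %q", s)
}

// graceFromState extracts the grace period from an already-fetched state document.
func graceFromState(doc []byte) (time.Duration, error) {
	var st slaveState
	if err := json.Unmarshal(doc, &st); err != nil {
		return 0, err
	}
	raw, ok := st.Flags[graceFlag]
	if !ok {
		return 0, fmt.Errorf("flag %q not present in state", graceFlag)
	}
	return parseFlagDuration(raw)
}

func main() {
	// Sample document standing in for the slave's real state output.
	doc := []byte(`{"flags": {"executor_shutdown_grace_period": "20secs"}}`)
	d, err := graceFromState(doc)
	if err != nil {
		fmt.Println("could not determine grace period:", err)
		return
	}
	fmt.Println("grace period:", d)
}
```

Even in sketch form this depends on the layout of an endpoint the executor does not control, which is part of why the approach feels hackish.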

It would be great to have an environment variable, `MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD` or something similar, provided by the slave containerizer, so that the executor can decide whether to send (via Docker) a TERM signal and wait 10s, or a KILL signal right away.
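
A minimal Go sketch of how the executor could use such a variable if it existed. The variable name is only the one proposed above and is not something the slave is known to set. The value format (a Go duration string such as "20s") and the helper names `gracePeriod` and `chooseSignal` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Hypothetical variable name, as proposed in this comment.
const gracePeriodEnv = "MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD"

// Time Docker normally allows a container between TERM and KILL.
const dockerStopTimeout = 10 * time.Second

// gracePeriod reads the grace period through the given lookup function.
// Assumption: the value is a Go duration string such as "20s".
func gracePeriod(lookup func(string) (string, bool)) (time.Duration, bool) {
	raw, ok := lookup(gracePeriodEnv)
	if !ok {
		return 0, false
	}
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return 0, false
	}
	return d, true
}

// chooseSignal returns "TERM" only when the grace period is longer than
// Docker's stop timeout, so a graceful stop can finish before the slave
// escalates. Otherwise it returns "KILL".
func chooseSignal(grace time.Duration) string {
	if grace > dockerStopTimeout {
		return "TERM"
	}
	return "KILL"
}

func main() {
	grace, ok := gracePeriod(os.LookupEnv)
	if !ok {
		// Without the variable the executor cannot know its budget,
		// so it keeps today's behavior.
		fmt.Println("grace period unknown; falling back to KILL")
		return
	}
	fmt.Printf("grace period %v: send %s\n", grace, chooseSignal(grace))
}
```

Passing the lookup function in, instead of calling os.LookupEnv directly, keeps the decision logic easy to exercise without touching the real environment.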

> Signal escalation timeout is not configurable
> ---------------------------------------------
>
>                 Key: MESOS-1571
>                 URL: https://issues.apache.org/jira/browse/MESOS-1571
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Niklas Quarfot Nielsen
>            Assignee: Alexander Rukletsov
>
> Even though the executor shutdown grace period is set to a larger interval, the signal escalation timeout will still be 3 seconds. It should either be configurable or dependent on EXECUTOR_SHUTDOWN_GRACE_PERIOD.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)