Posted to issues@mesos.apache.org by "Tomasz Janiszewski (JIRA)" <ji...@apache.org> on 2017/01/17 11:26:26 UTC

[jira] [Created] (MESOS-6933) Executor does not respect grace period

Tomasz Janiszewski created MESOS-6933:
-----------------------------------------

             Summary: Executor does not respect grace period
                 Key: MESOS-6933
                 URL: https://issues.apache.org/jira/browse/MESOS-6933
             Project: Mesos
          Issue Type: Bug
          Components: executor
            Reporter: Tomasz Janiszewski


The Mesos default executor tries to support a grace period with escalation, but unfortunately this does not work. It launches {{command}} by wrapping it in {{sh -c}}, which makes the process tree look like this:

{code}
Received killTask
Shutting down
Sending SIGTERM to process tree at pid 18
Sent SIGTERM to the following process trees:
[ 
-+- 18 sh -c cd offer-i18n-0.1.24 && LD_PRELOAD=../librealresources.so ./bin/offer-i18n -e prod -p $PORT0 
 \--- 19 command...
]
Command terminated with signal Terminated (pid: 18)
{code}

This causes {{sh}} to exit immediately, and the executor with it, while the wrapped {{command}} might need more time to finish. As a result, the executor thinks the command terminated gracefully, so it won't [escalate|https://github.com/apache/mesos/blob/1.1.0/src/launcher/executor.cpp#L695] to SIGKILL.
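
The behaviour can be reproduced outside Mesos. The following is a minimal Python sketch, not executor code: {{CHILD}} is a hypothetical stand-in for a wrapped command that ignores SIGTERM, and {{sigterm_process_tree}} is an illustrative helper name. It assumes a POSIX system with {{sh}} on the PATH.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for the wrapped command: it ignores SIGTERM,
# reports its pid on stdout, then lingers.
CHILD = (
    "import os, signal, time; "
    "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
    "print(os.getpid(), flush=True); "
    "time.sleep(30)"
)


def pid_alive(pid):
    """Return True if a process with this pid still exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    return True


def sigterm_process_tree():
    """Send SIGTERM to sh and its child; return (sh_exit_code, child_survived)."""
    # The trailing `; true` keeps the shell from exec-ing the command in
    # place of itself, so `sh` stays the parent, as in the log above.
    wrapper = subprocess.Popen(
        ["sh", "-c", '"$0" -c "$1"; true', sys.executable, CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    # The child prints its pid only after installing the SIGTERM handler.
    child_pid = int(wrapper.stdout.readline())

    os.kill(child_pid, signal.SIGTERM)    # ignored by the child
    os.kill(wrapper.pid, signal.SIGTERM)  # terminates sh
    exit_code = wrapper.wait(timeout=5)

    survived = pid_alive(child_pid)
    os.kill(child_pid, signal.SIGKILL)    # clean up the leaked process
    wrapper.stdout.close()
    return exit_code, survived
```

With a typical {{sh}}, the wrapper is expected to report death by SIGTERM (a negative exit code in Python) while the child is still alive, which is the state the executor mistakes for a graceful shutdown.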

This causes leaks when the POSIX containerizer is used, because a command that ignores SIGTERM is reparented to init and never gets killed. Using pid/namespace only masks the problem, because the hanging process is captured before it can shut down gracefully.

The fix is to send SIGTERM only to the children of {{sh}}. {{sh}} will exit when all of its subprocesses finish. If they do not finish, they will be killed by the escalation to SIGKILL.
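
The proposed policy can be sketched as follows. This is an illustrative Python outline, not the executor's C++ implementation: {{kill_with_grace}}, {{descendants}}, and the injected {{send_signal}} and {{wait_for_exit}} callbacks are hypothetical names, and the process tree is assumed to be available as a {{pid -> parent pid}} snapshot.

```python
import signal


def descendants(root, parent_of):
    """Return every pid below root in a {pid: parent_pid} snapshot, parents first."""
    found, frontier = [], [root]
    while frontier:
        current = frontier.pop(0)
        kids = sorted(p for p, parent in parent_of.items() if parent == current)
        found.extend(kids)
        frontier.extend(kids)
    return found


def kill_with_grace(wrapper_pid, parent_of, send_signal, wait_for_exit, grace_seconds):
    """Hypothetical helper: SIGTERM the wrapper's children, escalate if needed.

    send_signal(pid, signum) and wait_for_exit(pid, seconds) -> bool are
    injected so the policy can be exercised without real processes.
    """
    # Signal only the direct children of `sh`, not `sh` itself.
    for pid in sorted(p for p, parent in parent_of.items() if parent == wrapper_pid):
        send_signal(pid, signal.SIGTERM)

    # `sh` exits once its children are done, so its exit marks a graceful stop.
    if wait_for_exit(wrapper_pid, grace_seconds):
        return "graceful"

    # Grace period expired: kill the wrapper and everything beneath it.
    for pid in [wrapper_pid] + descendants(wrapper_pid, parent_of):
        send_signal(pid, signal.SIGKILL)
    return "escalated"
```

In a real setting {{send_signal}} could be {{os.kill}} and {{wait_for_exit}} a timed wait on the wrapper. Because {{sh}} is left alone during the grace period, its exit only happens after the wrapped command has actually finished.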

All versions from 0.20 onward are affected.

This test should pass [src/tests/command_executor_tests.cpp:342|https://github.com/apache/mesos/blob/2c856178b59593ff8068ea8d6c6593943c33008c/src/tests/command_executor_tests.cpp#L342-L343]
[Mailing list thread|https://lists.apache.org/thread.html/1025dca0cf4418aee50b14330711500af864f08b53eb82d10cd5c04c@%3Cuser.mesos.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)