Posted to user@mesos.apache.org by Joerg Schad <jo...@mesosphere.io> on 2015/03/26 16:27:14 UTC

Slave recovery not recovering tasks when using systemd

Dear Mesos Users,
I just wanted to point out a resolved issue (https://issues.apache.org/jira/browse/MESOS-2419) where systemd's default behaviour prevents tasks from being recovered.

The problem is that the default KillMode for systemd services is control-group, i.e. the whole cgroup (http://www.freedesktop.org/software/systemd/man/systemd.kill.html), and hence all child processes, including the executors, are killed when the slave stops.
Explicitly setting KillMode=process makes systemd kill only the slave process itself, which allows the executors to survive and reconnect.

Feel free to check our configuration at: https://github.com/mesosphere/mesos-deb-packaging/blob/master/systemd/slave.systemd
Thanks,
Joerg
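
To check whether a given unit file carries this fix, something along these lines may help. It is only a minimal sketch: the unit text, the ExecStart path and the function names are made up for illustration and are not taken from the Mesosphere packages. It looks at a single unit file and ignores drop-in overrides, so `systemctl show` on the real host remains the authoritative answer.

```python
import configparser

# Hypothetical unit file contents, for illustration only.
# The ExecStart path is an assumption, not the packaged one.
UNIT_TEXT = """\
[Unit]
Description=Mesos Slave

[Service]
ExecStart=/usr/local/sbin/mesos-slave
KillMode=process
Restart=always
"""

# systemd falls back to this value when KillMode= is not set.
SYSTEMD_DEFAULT_KILL_MODE = "control-group"


def effective_kill_mode(unit_text):
    """Return the KillMode set in [Service], or the systemd default."""
    # strict=False tolerates repeated keys (e.g. several ExecStart= lines);
    # interpolation=None keeps '%' specifiers such as %i from raising errors.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # unit file keys are case-sensitive
    parser.read_string(unit_text)
    return parser.get("Service", "KillMode", fallback=SYSTEMD_DEFAULT_KILL_MODE)


def uses_process_kill_mode(unit_text):
    """True if only the main process is killed when the unit stops."""
    return effective_kill_mode(unit_text) == "process"


if __name__ == "__main__":
    print("KillMode:", effective_kill_mode(UNIT_TEXT))
    print("Executors can survive a slave stop:", uses_process_kill_mode(UNIT_TEXT))
```

A unit file without an explicit KillMode= line is reported as control-group, which is the case that causes the executors to be killed together with the slave.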

Re: Slave recovery not recovering tasks when using systemd

Posted by Jörg Schad <jo...@mesosphere.io>.
Lukas is rebuilding the packages right now, so they should be updated shortly.

Joerg

On Thu, Mar 26, 2015 at 4:31 PM, Jeff Schroeder <je...@computer.org>
wrote:

> On Thursday, March 26, 2015, Joerg Schad <jo...@mesosphere.io> wrote:
>
>> Dear Mesos Users,
>> I just wanted to point out a solved issue (
>> https://issues.apache.org/jira/browse/MESOS-2419) where the *systemd*
>> default behaviour prevents tasks from recovering.
>>
>> The problem is that the default KillMode for systemd processes is
>> *cgroup* (
>> http://www.freedesktop.org/software/systemd/man/systemd.kill.html) and
>> hence all child processes are killed when the slave stops.
>> Explicitly setting the KillMode to *process* allows the executors to
>> survive and reconnect.
>>
>> Feel free to check our configuration at:
>> https://github.com/mesosphere/mesos-deb-packaging/blob/master/systemd/slave.systemd
>>
>
> Thanks for the heads up! Will the RHEL7 packages be updated in the
> mesosphere repository to account for this?
>
>
> --
> Text by Jeff, typos by iPhone
>

Re: Slave recovery not recovering tasks when using systemd

Posted by Jeff Schroeder <je...@computer.org>.
On Thursday, March 26, 2015, Joerg Schad <jo...@mesosphere.io> wrote:

> Dear Mesos Users,
> I just wanted to point out a solved issue (
> https://issues.apache.org/jira/browse/MESOS-2419) where the *systemd*
> default behaviour prevents tasks from recovering.
>
> The problem is that the default KillMode for systemd processes is *cgroup*
> (http://www.freedesktop.org/software/systemd/man/systemd.kill.html) and
> hence all child processes are killed when the slave stops.
> Explicitly setting the KillMode to *process* allows the executors to
> survive and reconnect.
>
> Feel free to check our configuration at:
> https://github.com/mesosphere/mesos-deb-packaging/blob/master/systemd/slave.systemd
>

Thanks for the heads up! Will the RHEL7 packages be updated in the
mesosphere repository to account for this?


-- 
Text by Jeff, typos by iPhone