Posted to dev@mesos.apache.org by suppandi <su...@gmail.com> on 2016/02/10 23:37:47 UTC

Framework disconnect kills running tasks

Hi,

I am trying to write my first framework and I wanted to test task
reconciliation. But whenever I kill my framework (with a kill -9), Mesos
seems to clean up the tasks by updating their state to TASK_KILLED.

Is there a parameter, set when creating the framework or the task, that
causes this? I want my tasks to remain alive when the framework is
disconnected or dead.

Here is how I create my framework:
https://gist.github.com/anonymous/3357783ce938c4293947

and here is how I create my task:
https://gist.github.com/anonymous/d35f917ade791127f4c5

Thanks
suppandi

Re: Framework disconnect kills running tasks

Posted by Zameer Manji <zm...@apache.org>.
Setting `failover_timeout` is key. The Apache Aurora framework defaults
this value to 21 days to ensure there is no accidental destruction of tasks
in a production environment. FWIW, I think the Mesos default is terrible and
not desirable. I really think frameworks should opt in to this behaviour
rather than opt out. With the default, a minor ZK or network blip can cause
destruction of tasks.

On Wed, Feb 10, 2016 at 5:05 PM, Shuai Lin <li...@gmail.com> wrote:

> Hi suppandi,
>
> To make sure your tasks survive framework restarts, you need to:
>
> 1. When registering your framework, set the `failover_timeout` attribute of
> the FrameworkInfo PB. This is how long the master will wait for your
> framework to reconnect. By default it's 0, which is why your tasks are
> killed immediately when the framework exits.
>
> 2. When you re-register your framework, you need to use the same framework
> ID as in the previous run, so that the master can tell it's the same
> framework reconnecting.
>
> Regards,
> Shuai

Re: Framework disconnect kills running tasks

Posted by Shuai Lin <li...@gmail.com>.
Hi suppandi,

To make sure your tasks survive framework restarts, you need to:

1. When registering your framework, set the `failover_timeout` attribute of
the FrameworkInfo PB. This is how long the master will wait for your
framework to reconnect. By default it's 0, which is why your tasks are
killed immediately when the framework exits.

2. When you re-register your framework, you need to use the same framework
ID as in the previous run, so that the master can tell it's the same
framework reconnecting.
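
The two steps above could be sketched roughly like this in Python. This is
only an illustration, not code from the gists: `FrameworkInfoSketch` is a
plain stand-in for the real FrameworkInfo protobuf (only the field names
`failover_timeout` and `id` come from Mesos), and the one-week timeout, the
framework name, the ID file, and the helper functions are all assumptions
made up for the example.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Assumed value for illustration: let the master wait one week (in seconds).
ONE_WEEK_SECS = float(7 * 24 * 60 * 60)


@dataclass
class FrameworkInfoSketch:
    """Hypothetical stand-in for the FrameworkInfo protobuf, not the real class."""
    user: str
    name: str
    failover_timeout: float = 0.0  # seconds; 0 means tasks die with the framework
    id: Optional[str] = None       # framework ID from a previous run, if any


def load_framework_id(path: str) -> Optional[str]:
    """Return the framework ID saved by an earlier run, or None on first start."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read().strip() or None


def save_framework_id(path: str, framework_id: str) -> None:
    """Persist the ID the master assigned so the next run can reuse it."""
    with open(path, "w") as f:
        f.write(framework_id)


def build_framework_info(id_path: str) -> FrameworkInfoSketch:
    """Step 1: set a non-zero failover_timeout. Step 2: reuse the saved ID."""
    return FrameworkInfoSketch(
        user="",                      # placeholder; fill in as your setup needs
        name="example-framework",     # hypothetical name
        failover_timeout=ONE_WEEK_SECS,
        id=load_framework_id(id_path),
    )
```

In a real scheduler you would copy these values into the actual
FrameworkInfo message, and call something like `save_framework_id` from the
callback where the master hands you the framework ID on first registration.
On the first run the saved ID is missing, so `id` stays unset and the master
assigns a new one; on later runs the same ID is sent back.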

Regards,
Shuai

