Posted to user@mesos.apache.org by haosdent <ha...@gmail.com> on 2016/12/04 10:10:00 UTC

Re: MESOS-6233 Allow agents to re-register post a host reboot

> we can have the agent remove `rm -f <work_dir>/meta/slaves/latest`
> automatically upon recovery failure but only after the host has rebooted.

This sounds dangerous. When the difference in AgentInfo is caused by an
operator's typo, I think the operator would prefer to correct it and try
to start the agent again, rather than have the state removed automatically.
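
To make sure we are talking about the same thing, here is a rough sketch of
the proposed behavior as I understand it. It is not Mesos code: the function
names and the idea of comparing a stored boot id with the current one are my
own assumptions for illustration. Only the `meta/slaves/latest` path comes
from the proposal.

```python
import os


def should_reset_agent_state(recovery_failed, stored_boot_id, current_boot_id):
    """Hypothetical check: reset only if recovery failed AND the host rebooted.

    How the boot ids are obtained is left to the caller (assumption).
    """
    rebooted = stored_boot_id is not None and stored_boot_id != current_boot_id
    return bool(recovery_failed and rebooted)


def reset_agent_state(work_dir):
    """Rough equivalent of `rm -f <work_dir>/meta/slaves/latest`.

    Returns True if the path was removed, False if it did not exist.
    """
    latest = os.path.join(work_dir, "meta", "slaves", "latest")
    try:
        # unlink removes the symlink itself and leaves its target in place
        os.unlink(latest)
        return True
    except FileNotFoundError:
        # like `rm -f`, a missing path is not treated as an error
        return False
```

Note that nothing in this check can tell a hardware change apart from a typo
in the agent flags made around the time of a reboot, which is exactly the
case I am worried about.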

But if we decide to do that, please make sure to announce this behavior change
to the mailing lists in a separate email. Thank you!

On Wed, Nov 30, 2016 at 6:24 AM, tommy xiao <xi...@gmail.com> wrote:

> I agree with James's opinions.
>
> 2016-11-30 0:48 GMT+08:00 James Peach <jo...@gmail.com>:
>
> >
> > > On Nov 28, 2016, at 6:09 PM, Yan Xu <xu...@apple.com> wrote:
> > >
> > > So one thing that was brought up during offline conversations was that
> > > if the host reboot is associated with hardware change (e.g., a new memory
> > > stick):
> > >
> > >       • Currently: the agent would skip the recovery (and the chance of
> > >         running into incompatible agent info) and register as a new agent.
> > >       • With the change: the agent could run into incompatible agent
> > >         info due to resource change and flap indefinitely until the operator
> > >         intervenes.
> > >
> > > To mitigate this and maintain the current behavior, we can have the
> > > agent remove `rm -f <work_dir>/meta/slaves/latest` automatically upon
> > > recovery failure but only after the host has rebooted. This way the agent
> > > can restart as a new agent without operator intervention.
> > >
> > > Any thoughts?
> >
> > I still think you need a mechanism for the master/agent to tell you
> > whether it will honor the restart policy. Without this, you have to lock
> > the framework to a Mesos version.
> >
> > An empty RestartPolicy is also problematic since it precludes using
> > RestartPolicy in pods. If you later want to restart a task inside a pod but
> > not across agent restarts you would have no way to express that.
> >
> > J
>
>
>
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com
>



-- 
Best Regards,
Haosdent Huang