Posted to dev@mesos.apache.org by tommy xiao <xi...@gmail.com> on 2017/09/01 05:46:12 UTC

recovery_agent_removal_limit usage question?

Today I got curious and read the Mesos source code for
--recovery_agent_removal_limit. How does it work in the source code? I
have not found any useful logic for recovery_agent_removal_limit. Can
anyone do me a favor?

-- 
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com

Re: recovery_agent_removal_limit usage question?

Posted by tommy xiao <xi...@gmail.com>.
Thanks for your answer; it clarifies things and helps me understand. Thanks a lot.

2017-09-01 21:35 GMT+08:00 Ilya Pronin <ip...@twopensource.com>:

Re: recovery_agent_removal_limit usage question?

Posted by Ilya Pronin <ip...@twopensource.com>.
Hey,

I'm not sure I understood your question correctly, but AFAIK the
recovery_agent_removal_limit flag is intended to limit the share (given as
a percentage) of agents that will be marked unreachable after the
re-registration timeout that follows a master failover. If the master sees
that it would have to remove more agents than the limit allows, it will
fail over. Otherwise, agents that have not yet re-registered will be
marked unreachable at the rate set by slave_removal_rate_limit. Here's the
code that does that:
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L1946
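
The following toy Python model makes the behaviour above concrete. It shows what the master decides when the re-registration timeout fires. This is a sketch under stated assumptions and not the Mesos implementation, which is C++ in src/master/master.cpp. The following are assumptions made for illustration:

- the function names;
- the small table of duration units;
- treating the limit as a fraction of all agents in the registry;
- the first removal happening immediately.

A failover is modelled as SystemExit, standing in for the master process exiting so that another master can take over.

```python
from datetime import timedelta

# Hypothetical subset of duration suffixes; real Mesos accepts more.
_UNITS = {
    "secs": timedelta(seconds=1),
    "mins": timedelta(minutes=1),
    "hrs": timedelta(hours=1),
}


def parse_percentage(limit):
    """'25%' -> 0.25 (the flag value is written as a percentage)."""
    return float(limit.rstrip("%")) / 100.0


def parse_rate(rate):
    """'1/10mins' -> timedelta between two consecutive removals."""
    count, duration = rate.split("/")
    for suffix, unit in _UNITS.items():
        if duration.endswith(suffix):
            amount = float(duration[: -len(suffix)])
            return unit * amount / int(count)
    raise ValueError(f"unsupported duration: {duration}")


def on_reregister_timeout(registry_agents, reregistered, limit="100%", rate=None):
    """Toy model of the master's decision once the re-registration timeout fires.

    Returns (agent_id, delay) pairs saying when each agent that did not
    re-register is marked unreachable, or raises SystemExit to stand in for
    the master failing over.
    """
    if not registry_agents:
        return []
    missing = [a for a in registry_agents if a not in reregistered]
    fraction = len(missing) / len(registry_agents)
    # With the limit at "100%" this check can never trip in this model.
    if fraction > parse_percentage(limit):
        raise SystemExit(
            f"removal limit exceeded: {fraction:.0%} of agents did not "
            f"re-register (limit {limit})"
        )
    # No rate given: mark every missing agent unreachable at once.
    gap = parse_rate(rate) if rate else timedelta(0)
    return [(agent, gap * i) for i, agent in enumerate(missing)]


if __name__ == "__main__":
    agents = [f"agent-{i}" for i in range(10)]
    back = set(agents[:8])  # 8 of 10 agents re-registered
    for agent, delay in on_reregister_timeout(agents, back, "25%", "1/10mins"):
        print(agent, delay)
```

Consider 10 registered agents, of which 8 re-register, with a 25% limit and a rate of 1/10mins. The sketch marks agent-8 immediately and agent-9 ten minutes later. Lowering the limit to 10% makes the 20% shortfall exceed it, which triggers the modelled failover.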

We no longer shut down agents if they try to re-register after being marked
unreachable, so we can safely remove those agents from the registry.
However, a large number of agents failing to re-register may still be a
good signal for the operator to investigate.

-- 
Ilya Pronin