Posted to issues@mesos.apache.org by "Neil Conway (JIRA)" <ji...@apache.org> on 2017/03/31 17:34:41 UTC

[jira] [Created] (MESOS-7333) Clarify log message when agent rate removal limit is applied

Neil Conway created MESOS-7333:
----------------------------------

             Summary: Clarify log message when agent rate removal limit is applied
                 Key: MESOS-7333
                 URL: https://issues.apache.org/jira/browse/MESOS-7333
             Project: Mesos
          Issue Type: Bug
          Components: master
            Reporter: Neil Conway


When the master begins to mark an agent unreachable and the agent removal rate limit is set, we log:

{noformat}
Scheduling removal of agent 07ae6114-a59a-41d5-a3d5-32e6681eb17d-S2 at slave(1)@192.168.10.45:5051 (192.168.10.45); did not re-register within 10mins after disconnecting
{noformat}

This can be improved. The important question for an operator is: _how long will it take for the agent to be removed?_ If removing this agent stays within the rate limit, the agent will be removed immediately; if it does not, the removal might not happen for a long time. It would be great to distinguish between these two cases in the log output.
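To make the two cases concrete, here is a minimal sketch of how a removal rate limit separates "removed immediately" from "removed much later". It assumes a limit of the form "at most `permits` removals per `duration_secs`", with removals spaced evenly (one every `duration_secs / permits` seconds). The class `RemovalRateLimiter` and its `schedule` method are hypothetical illustrations, not the Mesos master's actual rate limiter.

```python
from dataclasses import dataclass


@dataclass
class RemovalRateLimiter:
    """Hypothetical model: at most `permits` agent removals per
    `duration_secs`, spaced evenly one interval apart."""
    permits: int
    duration_secs: float
    next_free: float = 0.0  # earliest time (secs) the next removal may run

    def schedule(self, now: float) -> float:
        """Reserve a removal slot and return how long (secs) it must wait."""
        interval = self.duration_secs / self.permits
        start = max(now, self.next_free)  # wait for the next free slot
        self.next_free = start + interval  # the following removal waits one interval
        return start - now


# Example: a limit of 1 removal per 20 minutes, three agents disconnect at t=0.
limiter = RemovalRateLimiter(permits=1, duration_secs=20 * 60)
waits = [limiter.schedule(0.0) for _ in range(3)]  # 0, 1200 and 2400 seconds
```

Under this model, only the first agent is removed immediately; the third waits 40 minutes. That delay is exactly what the current log message hides.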

For example: if the rate limit is configured but we're going to remove the agent immediately anyway, then just log the same output we normally do (skip "Scheduling..."). If, on the other hand, the rate limit is going to delay removing the agent, we should (a) make that _clear_ in the output and (b) ideally include some prediction of how long it will take for the agent to be removed, e.g., "Scheduling removal of agent ABC; agent removal rate limit of X is in effect, waiting Y until removing the agent."
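The proposed behavior could look roughly like the sketch below, given a predicted wait however the master computes it. The function names `format_duration` and `removal_log_message` are hypothetical. The rate limit is passed in as a display string such as "1/20mins" (an assumed format), and the wording of the immediate-removal message is a placeholder, not the master's actual log text.

```python
def format_duration(secs: float) -> str:
    """Render whole minutes as 'Nmins', anything else as 'Nsecs'."""
    if secs >= 60 and secs % 60 == 0:
        return f"{int(secs // 60)}mins"
    return f"{secs:g}secs"


def removal_log_message(agent_id: str, rate_limit: str, wait_secs: float) -> str:
    """Hypothetical log line for an agent removal under a rate limit."""
    if wait_secs <= 0:
        # Rate limit is configured but not binding: log the usual
        # (placeholder) message and skip "Scheduling...".
        return f"Removing agent {agent_id}"
    # Rate limit delays the removal: say so and predict the wait.
    return (
        f"Scheduling removal of agent {agent_id}; agent removal rate limit of "
        f"{rate_limit} is in effect, waiting {format_duration(wait_secs)} "
        "until removing the agent"
    )
```

For example, `removal_log_message("ABC", "1/20mins", 1200)` yields the "Scheduling removal ... waiting 20mins ..." form, while a wait of 0 yields the plain message.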



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)