Posted to dev@mesos.apache.org by "Dominic Hamon (JIRA)" <ji...@apache.org> on 2014/06/20 18:26:26 UTC

[jira] [Commented] (MESOS-1517) Maintain a queue of messages that arrive before the master recovers.

    [ https://issues.apache.org/jira/browse/MESOS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038988#comment-14038988 ] 

Dominic Hamon commented on MESOS-1517:
--------------------------------------

This seems like a worthwhile change.

There will be a subtle change in behaviour for message senders. Currently, senders either get a response within a certain time (if the master has recovered) or no response at all. With this change, senders will always* get a response, but it may take longer if recovery is in progress. I doubt that any senders rely on the current behaviour (they shouldn't).

* for certain definitions of always

> Maintain a queue of messages that arrive before the master recovers.
> --------------------------------------------------------------------
>
>                 Key: MESOS-1517
>                 URL: https://issues.apache.org/jira/browse/MESOS-1517
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>            Reporter: Benjamin Mahler
>              Labels: reliability
>             Fix For: 0.19.0
>
>
> Currently, when the master is recovering, we drop all incoming messages. If slaves and frameworks learned about the leading master only after it had recovered, we would only expect to see messages after recovery.
> We previously considered enqueuing all messages through the recovery future, but this has the downside of forcing all messages to go through the master's queue twice:
> {code}
>   // TODO(bmahler): Consider instead re-enqueuing *all* messages
>   // through recover(). What are the performance implications of
>   // the additional queueing delay and the accumulated backlog
>   // of messages post-recovery?
>   if (!recovered.get().isReady()) {
>     VLOG(1) << "Dropping '" << event.message->name << "' message since "
>             << "not recovered yet";
>     ++metrics.dropped_messages;
>     return;
>   }
> {code}
> However, an easy solution to this problem is to maintain an explicit queue of incoming messages that gets flushed once we finish recovery. This ensures that all messages post-recovery are processed normally.
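
The explicit queue described above could look roughly like the following. This is only a minimal, single-threaded sketch and not the actual Mesos implementation: `Message`, `RecoveryQueue`, `receive()`, `recovered()` and the message names in `demo()` are hypothetical names chosen for illustration, and the sketch assumes all calls happen on one thread (as they would inside a single actor).

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an incoming master message.
struct Message {
  std::string name;
};

// Hypothetical helper: holds messages that arrive before recovery
// completes, then hands them to the normal handler in arrival order.
class RecoveryQueue {
public:
  explicit RecoveryQueue(std::function<void(const Message&)> handler)
    : handler_(std::move(handler)) {}

  // Called for every incoming message.
  void receive(Message message) {
    if (!recovered_) {
      pending_.push_back(std::move(message));  // Hold instead of dropping.
      return;
    }
    handler_(message);  // After recovery: normal path, no extra queueing.
  }

  // Called once when recovery finishes. Messages that arrive while the
  // backlog is being flushed are appended and drained by the same loop,
  // so arrival order is preserved.
  void recovered() {
    while (!pending_.empty()) {
      Message message = std::move(pending_.front());
      pending_.pop_front();
      handler_(message);
    }
    recovered_ = true;
  }

  std::size_t pending() const { return pending_.size(); }

private:
  std::function<void(const Message&)> handler_;
  std::deque<Message> pending_;
  bool recovered_ = false;
};

// Usage example: returns the names of the messages in the order in
// which the handler processed them.
std::vector<std::string> demo() {
  std::vector<std::string> processed;
  RecoveryQueue queue(
      [&processed](const Message& m) { processed.push_back(m.name); });

  queue.receive({"register"});   // Buffered: recovery not finished.
  queue.receive({"status"});     // Buffered.
  queue.recovered();             // Flushes both, in arrival order.
  queue.receive({"heartbeat"});  // Handled directly.
  return processed;
}
```

Only messages that arrive during recovery pay the extra queueing cost; once the backlog is flushed, `receive()` calls the handler directly.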



--
This message was sent by Atlassian JIRA
(v6.2#6252)