Posted to user@mesos.apache.org by Benjamin Mahler <be...@gmail.com> on 2014/02/07 21:18:00 UTC

Re: Stateful Master

This had been put on the back-burner in favor of shipping other features.

Work is beginning on the Registrar again; I've created a design doc here:
https://cwiki.apache.org/confluence/display/MESOS/Registrar+Design+Document

Expect to see new reviews in this area as well! Please reach out with any
questions / feedback.


On Thu, Oct 31, 2013 at 5:02 PM, Benjamin Mahler
<be...@gmail.com> wrote:

> Hi All,
>
> I'd like to mention some changes that have been discussed amongst the
> committers but have not yet been shared broadly with the list.
>
> The central component of Mesos is the "Master". The Master is responsible
> for administering slaves, frameworks, and resource offers. It also handles
> task launching requests, status updates, and framework messages. As you may
> or may not know, the Master is currently stateless, in that it does not
> persist any information across failovers. Rather, the Master currently
> recovers all of its state from the slaves and frameworks that re-register
> after a failover.
>
> This design has many benefits. First, failing over a Master is a trivial
> operation. Second, we do not have the performance overhead and complexity
> of dealing with persistent state. However, this design opens up a few cases
> for information loss in the system. For example, if a Slave fails
> permanently while no Master is running, the failed-over Master has no
> knowledge of that Slave or of its failure.
>
> In order to detect these events, we'd like to add persistence of the
> registered slaves. The first step for this was creating the Registrar:
> https://reviews.apache.org/r/14383/
> https://reviews.apache.org/r/14384/
> https://reviews.apache.org/r/15099/
> https://reviews.apache.org/r/15100/
>
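
To make the failure case above concrete, here is a minimal sketch of how a persisted list of registered slaves could be used after a failover. It is not the actual Mesos implementation: the function name, the slave IDs, and the idea of comparing sets after some grace period are all assumptions made for illustration.

```python
# Hypothetical illustration only; none of these names come from Mesos.

def find_lost_slaves(persisted_slave_ids, reregistered_slave_ids):
    """Return slaves that were recorded before the failover but did not
    re-register with the new Master afterwards."""
    return sorted(set(persisted_slave_ids) - set(reregistered_slave_ids))


# Slaves that were written to persistent storage before the Master went down.
persisted = ["slave-1", "slave-2", "slave-3"]

# Slaves that re-registered with the failed-over Master (assumed to be
# collected after some grace period).
reregistered = ["slave-1", "slave-3"]

lost = find_lost_slaves(persisted, reregistered)
print(lost)
```

Without the persisted list, the failed-over Master would only ever see `reregistered` and could not tell that `slave-2` had existed at all.
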
> The Registrar is responsible for keeping the official records of the
> master. This will initially include SlaveInfo in order to correctly handle
> cases like the example I provided above. The Registrar is agnostic to the
> underlying data storage and can be backed by a local LevelDB, by ZooKeeper
> (for high availability Masters), and in the future by our reconfigurable
> replicated log.
>
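
As a rough sketch of what "agnostic to the underlying data storage" could look like, the example below puts a small storage interface between the Registrar and its backend. Everything here is hypothetical: the `Storage`, `InMemoryStorage`, and `Registrar` classes, their methods, and the JSON encoding are assumptions for illustration and do not reflect the real Mesos interfaces. The in-memory backend stands in for LevelDB, ZooKeeper, or the replicated log.

```python
# Hypothetical illustration only; this is not the Mesos Registrar API.
import json
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Storage(ABC):
    """Minimal storage interface the Registrar depends on."""

    @abstractmethod
    def load(self) -> Optional[bytes]:
        """Return the last stored blob, or None if nothing was stored."""

    @abstractmethod
    def store(self, data: bytes) -> None:
        """Replace the stored blob."""


class InMemoryStorage(Storage):
    """Stand-in backend; a real one might wrap LevelDB or ZooKeeper."""

    def __init__(self) -> None:
        self._data: Optional[bytes] = None

    def load(self) -> Optional[bytes]:
        return self._data

    def store(self, data: bytes) -> None:
        self._data = data


class Registrar:
    """Keeps the record of admitted slaves and persists every change."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage
        raw = storage.load()
        # Recover any previously persisted records.
        self._slaves: Dict[str, dict] = (
            json.loads(raw.decode("utf-8")) if raw is not None else {}
        )

    def admit(self, slave_id: str, slave_info: dict) -> None:
        self._slaves[slave_id] = slave_info
        self._persist()

    def remove(self, slave_id: str) -> None:
        self._slaves.pop(slave_id, None)
        self._persist()

    def slaves(self) -> Dict[str, dict]:
        return dict(self._slaves)

    def _persist(self) -> None:
        self._storage.store(json.dumps(self._slaves).encode("utf-8"))


# A first Master admits a slave; a second Registrar over the same storage
# (simulating a failed-over Master) recovers the record.
storage = InMemoryStorage()
Registrar(storage).admit("slave-1", {"hostname": "host-a"})
recovered = Registrar(storage)
print(recovered.slaves())
```

Because the Registrar only calls `load` and `store`, swapping the backend would not require changes to the Registrar itself in this sketch.
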
> The next step is to implement "statefulness" in the Master using the
> Registrar. So far I've sent out some of the preliminary cleanup work, and I
> have a few pending patches, which I'm in the process of cleaning up, that
> implement this fully, so keep an eye out for those.
>
> In the longer term, we will add persistence of framework information in
> the same vein, so that framework failures can be handled in the presence
> of Master failures.
>
> Cheers!
> Ben
>