Posted to issues@mesos.apache.org by "Elouan Keryell-Even (JIRA)" <ji...@apache.org> on 2015/12/01 09:15:10 UTC

[jira] [Comment Edited] (MESOS-3548) Investigate federations of Mesos masters

    [ https://issues.apache.org/jira/browse/MESOS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031948#comment-15031948 ] 

Elouan Keryell-Even edited comment on MESOS-3548 at 12/1/15 8:14 AM:
---------------------------------------------------------------------

My team is also interested in multi-cluster management with Mesos.

We have set up a test architecture consisting of two separate clusters, with a single Mesos master managing both of them.

The use case we are interested in is having both clusters collaborate, with each one able to borrow a few slaves from the other when facing a load peak (this is indeed "bursting").

I think this would imply that each cluster is managed by its own Mesos master. One of the solutions we thought about for resource borrowing was to have the two masters communicate with each other to temporarily lend available resources.

Elouan KERYELL-EVEN
Software engineer @ Atos Integration
Toulouse, France


was (Author: winstonsurechill):
My team is also interested in multi-cluster management with Mesos.

For now we have set up a test architecture consisting of two separate clusters, with a single Mesos master managing both of them.

The use case we are interested in is having multiple clusters collaborate, with each one able to borrow a few slaves from another when facing a load peak (this is indeed "bursting"). I think that would imply that each cluster is managed by one Mesos master, and that the various masters could communicate in some way or another for the resource lending/borrowing.

Elouan KERYELL-EVEN
Software engineer @ Atos Integration
Toulouse, France

> Investigate federations of Mesos masters
> ----------------------------------------
>
>                 Key: MESOS-3548
>                 URL: https://issues.apache.org/jira/browse/MESOS-3548
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Neil Conway
>              Labels: federation, mesosphere, multi-dc
>
> In a large Mesos installation, the operator might want to ensure that even if the Mesos masters are inaccessible or have failed, new tasks can still be scheduled (across multiple different frameworks). HA masters are only a partial solution here: the masters might still be inaccessible due to a correlated failure (e.g., ZooKeeper misconfiguration/human error).
> To address this, we could introduce the notion of "hierarchies" or "federations" of Mesos masters. In a Mesos installation with 10k machines, the operator might configure 10 Mesos masters (each of which might be HA) to manage 1k machines each. Then an additional "meta-master" would manage the allocation of cluster resources to the 10 masters. Hence, the failure of any individual master would impact 1k machines at most. The meta-master might not have a lot of work to do: e.g., it might be limited to occasionally reallocating cluster resources among the 10 masters, or ensuring that newly added cluster resources are allocated among the masters as appropriate. Hence, the failure of the meta-master would not prevent any of the individual masters from scheduling new tasks. A single framework instance probably wouldn't be able to use more resources than have been assigned to a single master, but that seems like a reasonable restriction.
> This feature might also be a good fit for a multi-datacenter deployment of Mesos: each Mesos master instance would manage a single DC. Naturally, reducing the traffic between frameworks and the meta-master would be important for performance reasons in a configuration like this.
> Operationally, this might be simpler if Mesos processes were self-hosting ([MESOS-3547]).
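
The meta-master arrangement described in the issue can be illustrated with a toy model. This is only a sketch under stated assumptions, not an existing Mesos API: the names Master, MetaMaster, allocate, reallocate and schedulable_machines are all hypothetical, machines are represented as plain integers, and "allocation" simply means recording which master manages which machine.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical stand-in for one (possibly HA) Mesos master."""
    name: str
    machines: set = field(default_factory=set)
    alive: bool = True


class MetaMaster:
    """Hypothetical coordinator whose only job is assigning machines to masters."""

    def __init__(self, masters):
        self.masters = list(masters)

    def allocate(self, machines):
        # Hand each newly added machine to the master that manages the fewest.
        for machine in machines:
            target = min(self.masters, key=lambda m: len(m.machines))
            target.machines.add(machine)

    def reallocate(self, source, target, count):
        # Occasionally move `count` machines from one master to another.
        moved = sorted(source.machines)[:count]
        source.machines.difference_update(moved)
        target.machines.update(moved)
        return moved


def schedulable_machines(masters):
    # New tasks can only be placed on machines whose own master is alive.
    # The meta-master is not consulted here, so its failure changes nothing.
    return sum(len(m.machines) for m in masters if m.alive)


masters = [Master(f"master-{i}") for i in range(10)]
meta = MetaMaster(masters)
meta.allocate(range(10_000))  # 10k machines spread over 10 masters

masters[3].alive = False  # simulate the failure of one master
print(schedulable_machines(masters))  # 9000: only that master's 1k machines are affected
```

In this model the scheduling path never touches the meta-master, which mirrors the claim that a meta-master failure would not stop the individual masters from scheduling new tasks. The same reallocate step could also stand in for the temporary lending of slaves between clusters mentioned in the comment above, although a real design would need the masters to agree on when resources are returned.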



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)