Posted to issues@mesos.apache.org by "Stefano (JIRA)" <ji...@apache.org> on 2016/04/14 01:11:25 UTC
[jira] [Comment Edited] (MESOS-3548) Investigate federations of Mesos masters

    [ https://issues.apache.org/jira/browse/MESOS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240195#comment-15240195 ] 

Stefano edited comment on MESOS-3548 at 4/13/16 11:11 PM:
----------------------------------------------------------

Hi all

My name is Stefano Bianchi. I'm an Italian student of telecommunication engineering at the University of Bologna.
I'm currently doing my master's thesis at INFN-CNAF, in the computer science division that manages the Tier 1 datacenter infrastructure, which is directly connected to CERN in Geneva through the Grid federated computing infrastructure.
The topic of my master's thesis is Dynamic Virtual Networking.
First of all, I should say that I'm working on an IaaS based on OpenStack.
The professors asked me to build a distributed PaaS, so that scientists can run their particular tasks (how the tasks themselves are done is another story) in a transparent way.
I mean that a scientist should be able to run, for instance, a Hadoop container on the resources of the local datacenter he is working on, but also on the resources of other datacenters belonging to the Grid computing infrastructure, for example because the workload on the local DC is too high. What is important is that the scientist does not have to care which resources he is using.
To do that, I need to use Mesos as the task scheduler and Project Calico to distribute the tasks among different clusters.
My main problem is that Calico and Mesos both assume that all nodes are reachable via a unique hostname, and I don't have a DNS in OpenStack.
First of all, I need to understand how to create the federation of Mesos clusters. Since you mentioned that you have started deploying something, I would like to follow a tutorial if one is available. From what I understand, the clusters in this federation must communicate with each other in order to share the datacenters' resources.
Obviously, before trying the real interconnection among datacenters, I need to simulate this scenario using OpenStack virtual networks.
My current configuration:
MesosCluster1_on_Network1:
Mesos-Master1
Mesos-Slave11, Mesos-Slave12 (running mesos-dns)

MesosCluster2_on_Network2:
Mesos-Master2
Mesos-Slave21, Mesos-Slave22

Could you please help me?
I am not a computer science engineer and I don't have a lot of experience with these topics, so if you could share a way to build this kind of interconnection, that would be great!
Thanks to all.

Best regards

Stefano
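One common workaround when no DNS server is available, offered here only as a hedged sketch and not as something Mesos or Calico prescribes, is to give every VM a static /etc/hosts file that lists all nodes in both networks. The Python snippet below writes down the two-cluster test topology from the comment and generates those entries. The lowercase hostnames and the 10.0.1.x / 10.0.2.x addresses are hypothetical, and the sketch assumes the two OpenStack networks are already routed to each other.

```python
# Hypothetical sketch: build /etc/hosts entries for the two-cluster test
# topology so every node can resolve every other node by hostname without DNS.
# All IP addresses and hostnames below are illustrative assumptions.

TOPOLOGY = {
    "MesosCluster1_on_Network1": {
        "mesos-master1": "10.0.1.10",
        "mesos-slave11": "10.0.1.11",
        "mesos-slave12": "10.0.1.12",  # also runs mesos-dns
    },
    "MesosCluster2_on_Network2": {
        "mesos-master2": "10.0.2.10",
        "mesos-slave21": "10.0.2.11",
        "mesos-slave22": "10.0.2.12",
    },
}


def hosts_entries(topology):
    """Return one 'IP<TAB>hostname' line per node across all clusters."""
    lines = []
    for cluster in sorted(topology):
        for hostname, ip in sorted(topology[cluster].items()):
            lines.append(f"{ip}\t{hostname}")
    return lines


def check_unique(topology):
    """Raise ValueError if a hostname or IP appears more than once overall."""
    seen_hosts, seen_ips = set(), set()
    for nodes in topology.values():
        for hostname, ip in nodes.items():
            if hostname in seen_hosts or ip in seen_ips:
                raise ValueError(f"duplicate entry: {hostname} {ip}")
            seen_hosts.add(hostname)
            seen_ips.add(ip)


if __name__ == "__main__":
    check_unique(TOPOLOGY)
    # The same output would be appended to /etc/hosts on every VM in both networks.
    print("\n".join(hosts_entries(TOPOLOGY)))
```

Because every node gets the identical list, a hostname resolves to the same address no matter which cluster the lookup comes from, which is the "unique hostname" property the comment says Mesos and Calico rely on.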


was (Author: jazzista88):
Hi all

My name is Stefano Bianchi. I'm an Italian student of telecommunication engineering at the University of Bologna.
I'm currently doing my master's thesis at INFN-CNAF, in the computer science division that manages the Tier 1 datacenter infrastructure, which is directly connected to CERN in Geneva through the Grid federated computing infrastructure.
The topic of my master's thesis is Dynamic Virtual Networking.
First of all, I should say that I'm working on an IaaS based on OpenStack.
The professors asked me to build a distributed PaaS, so that scientists can run their particular tasks (how the tasks themselves are done is another story) in a transparent way.
I mean that a scientist should be able to run, for instance, a Hadoop container on the resources of the local datacenter he is working on, but also on the resources of other datacenters belonging to the Grid computing infrastructure, for example because the workload on the local DC is too high. What is important is that the scientist does not have to care which resources he is using.
To do that, I need to use Mesos as the task scheduler and Project Calico to distribute the tasks among different clusters.
My main problem is that Calico and Mesos both assume that all nodes are reachable via a unique hostname, and I don't have a DNS in OpenStack.
First of all, I need to understand how to create the federation of Mesos clusters. Since you mentioned that you have started deploying something, I would like to follow a tutorial if one is available. From what I understand, the clusters in this federation must communicate with each other in order to share the datacenters' resources.
Obviously, before trying the real interconnection among datacenters, I need to simulate this scenario using OpenStack virtual networks.
My current configuration:
MesosCluster1_on_Network1:
Mesos-Master1
Mesos-Slave11, Mesos-Slave12 (running mesos-dns)

MesosCluster2_on_Network2:
Mesos-Master2
Mesos-Slave21, Mesos-Slave22

Could you please help me?
Is there a tutorial I can follow to deploy such a system?

Thanks to all.

Best regards

Stefano

> Investigate federations of Mesos masters
> ----------------------------------------
>
>                 Key: MESOS-3548
>                 URL: https://issues.apache.org/jira/browse/MESOS-3548
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Neil Conway
>              Labels: federation, mesosphere, multi-dc
>
> In a large Mesos installation, the operator might want to ensure that even if the Mesos masters are inaccessible or failed, new tasks can still be scheduled (across multiple different frameworks). HA masters are only a partial solution here: the masters might still be inaccessible due to a correlated failure (e.g., ZooKeeper misconfiguration/human error).
> To support this, we could support the notion of "hierarchies" or "federations" of Mesos masters. In a Mesos installation with 10k machines, the operator might configure 10 Mesos masters (each of which might be HA) to manage 1k machines each. Then an additional "meta-Master" would manage the allocation of cluster resources to the 10 masters. Hence, the failure of any individual master would impact 1k machines at most. The meta-master might not have a lot of work to do: e.g., it might be limited to occasionally reallocating cluster resources among the 10 masters, or ensuring that newly added cluster resources are allocated among the masters as appropriate. Hence, the failure of the meta-master would not prevent any of the individual masters from scheduling new tasks. A single framework instance probably wouldn't be able to use more resources than have been assigned to a single Master, but that seems like a reasonable restriction.
> This feature might also be a good fit for a multi-datacenter deployment of Mesos: each Mesos master instance would manage a single DC. Naturally, reducing the traffic between frameworks and the meta-master would be important for performance reasons in a configuration like this.
> Operationally, this might be simpler if Mesos processes were self-hosting ([MESOS-3547]).
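The meta-master idea in the description can be illustrated with a small Python model. This is a hypothetical sketch, not a Mesos API: `MetaMaster`, `initial_partition`, `add_agent`, `rebalance` and `impact_of_failure` are invented names, and whole agents stand in for "cluster resources" (a real design would more likely allocate CPU and memory). It splits agents evenly among masters, hands newly added agents to the least-loaded master, and occasionally moves agents so that no master manages more than one agent beyond any other.

```python
# Hypothetical model of the "meta-master" described in MESOS-3548.
# Not a Mesos API: all class/method names are illustrative assumptions.


class MetaMaster:
    def __init__(self, masters):
        # master name -> set of agent IDs that master currently manages
        self.assignment = {m: set() for m in masters}

    def _load_key(self, master):
        # Order masters by number of agents, breaking ties by name.
        return (len(self.assignment[master]), master)

    def initial_partition(self, agents):
        """Split agents round-robin so each master gets an equal share."""
        masters = sorted(self.assignment)
        for i, agent in enumerate(agents):
            self.assignment[masters[i % len(masters)]].add(agent)

    def add_agent(self, agent):
        """Give a newly added agent to the master managing the fewest agents."""
        target = min(self.assignment, key=self._load_key)
        self.assignment[target].add(agent)
        return target

    def rebalance(self):
        """Move agents until master sizes differ by at most one; return the moves."""
        moves = []
        while True:
            largest = max(self.assignment, key=self._load_key)
            smallest = min(self.assignment, key=self._load_key)
            if len(self.assignment[largest]) - len(self.assignment[smallest]) <= 1:
                return moves
            agent = min(self.assignment[largest])  # deterministic choice
            self.assignment[largest].remove(agent)
            self.assignment[smallest].add(agent)
            moves.append((agent, largest, smallest))

    def impact_of_failure(self, master):
        """Number of agents left without a master if `master` fails."""
        return len(self.assignment[master])


if __name__ == "__main__":
    meta = MetaMaster([f"master-{i}" for i in range(10)])
    meta.initial_partition([f"agent-{i}" for i in range(10_000)])
    print(max(meta.impact_of_failure(m) for m in meta.assignment))  # prints 1000
```

With 10 masters and 10,000 agents, each master ends up with 1,000 agents, so the failure of any single master affects at most 1,000 machines, matching the arithmetic in the description. The meta-master only acts in `add_agent` and `rebalance`, which reflects the point that it "might not have a lot of work to do" and that individual masters keep scheduling if it fails.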



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)