Posted to issues@aurora.apache.org by "Jake Farrell (JIRA)" <ji...@apache.org> on 2016/10/11 13:29:20 UTC

[jira] [Assigned] (AURORA-1790) Aurora CNI Support

     [ https://issues.apache.org/jira/browse/AURORA-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Farrell reassigned AURORA-1790:
------------------------------------

    Assignee: Jake Farrell

> Aurora CNI Support
> ------------------
>
>                 Key: AURORA-1790
>                 URL: https://issues.apache.org/jira/browse/AURORA-1790
>             Project: Aurora
>          Issue Type: Epic
>            Reporter: Stephan Erb
>            Assignee: Jake Farrell
>
> The [Container Network Interface (CNI)|https://github.com/containernetworking/cni/blob/master/SPEC.md] is a plug-in based networking solution for containers. CNI is [supported by the Mesos Unified Containerizer|https://github.com/apache/mesos/blob/master/docs/cni.md].
> CNI support in Aurora would enable cluster operators to isolate tasks at the network level. This includes features such as IP-per-container, or security policies ensuring that only designated subsets of containers can communicate with each other. Both are important features for multi-tenant environments.
> h2. Mesos Protobufs
> In order to launch a task using CNI, Mesos requires frameworks to populate the [NetworkInfo|https://github.com/apache/mesos/blob/0f97117bac3e1382744e9a847ce11b7589fc45bd/include/mesos/mesos.proto#L1916-L1999] protobuf. The following shows the relevant subset of fields:
> {code}
> /**
>  * Describes a container configuration and allows extensible
>  * configurations for different container implementations.
>  *
>  * NOTE: In the Aurora case, this is set as part of ExecutorInfo
>  */
> message ContainerInfo {
>   ...
>   // A list of network requests. A framework can request multiple IP addresses
>   // for the container.
>   repeated NetworkInfo network_infos = 7;
>   ...
> }
> /**
>  * Describes a network request from a framework as well as network resolution
>  * provided by Mesos.
>  */
> message NetworkInfo {
>   ...
>   // For the CNI case, empty during task/executor launch and only used
>   // in TaskStatus messages to inform the framework scheduler about
>   // the IP addresses bound to a container
>   repeated IPAddress ip_addresses = 5;
>   // Name of the network which will be used by network isolator to determine
>   // the network that the container joins. It's up to the network isolator
>   // to decide how to interpret this field.
>   optional string name = 6;
>   // To tag certain metadata to be used by Isolator/IPAM, e.g., rack, etc.
>   // Opaque to Mesos but interpreted by the CNI plugin
>   optional Labels labels = 4;
>   ...
> }
> /**
>  * Container related information that is resolved during container
>  * setup. The information is sent back to the framework as part of the
>  * TaskStatus message.
>  */
> message ContainerStatus {
>   // This field can be reliably used to identify the container IP address.
>   repeated NetworkInfo network_infos = 1;
>   ...
> }
> {code}
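To make the launch-time side of the excerpt above concrete, here is a minimal sketch of how a framework might assemble one NetworkInfo request per network. It uses plain Python dataclasses as stand-ins for the protobuf messages rather than the real Mesos bindings; the helper `build_container_info` and the network names are hypothetical and only illustrate that `name` and `labels` are set by the framework while `ip_addresses` stays empty until Mesos reports back.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NetworkInfo:
    """Stand-in for the NetworkInfo subset shown in the excerpt (not the real protobuf)."""
    name: str                                               # network the container joins
    labels: Dict[str, str] = field(default_factory=dict)    # opaque to Mesos, read by the CNI plugin
    ip_addresses: List[str] = field(default_factory=list)   # left empty at launch for CNI


@dataclass
class ContainerInfo:
    """Stand-in for ContainerInfo; only network_infos is modelled."""
    network_infos: List[NetworkInfo] = field(default_factory=list)


def build_container_info(networks: Dict[str, Dict[str, str]]) -> ContainerInfo:
    """Hypothetical helper: create one NetworkInfo request per requested network."""
    infos = [
        NetworkInfo(name=name, labels=dict(labels))
        for name, labels in sorted(networks.items())
    ]
    return ContainerInfo(network_infos=infos)


# Example: a task that joins two operator-defined networks (names are made up).
container = build_container_info({"tenant-a": {"rack": "r1"}, "monitoring": {}})
```

Because a task may request several networks, the sketch keeps a list of requests instead of a single network name.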
> h2. Challenges
> * In contrast to ports or other resources, this is the first time an important detail is only discovered asynchronously after a task has been launched, i.e. the scheduler will only learn about the IP addresses of the launched task after having received its first {{TaskStatus}}.
> * A task can now live in multiple networks and can have multiple IP addresses.
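Both challenges can be illustrated with a small sketch of what the scheduler would have to do when a status update arrives. It assumes the TaskStatus has already been converted into nested dicts; the keys `container_status` and `ip_address` are assumptions loosely modelled on the Mesos protobufs and do not appear in the excerpt, and `ips_by_network` is a hypothetical helper, not existing Aurora code.

```python
from typing import Dict, List


def ips_by_network(task_status: dict) -> Dict[str, List[str]]:
    """Hypothetical helper: collect the IPs reported in a status update, grouped by network name."""
    result: Dict[str, List[str]] = {}
    # A status update may carry no container status at all; treat that as "no IPs known yet".
    container_status = task_status.get("container_status", {})
    for info in container_status.get("network_infos", []):
        name = info.get("name", "")
        for addr in info.get("ip_addresses", []):
            ip = addr.get("ip_address")
            if ip:
                result.setdefault(name, []).append(ip)
    return result


# Example update for a task that joined two networks (all values are made up).
status = {
    "container_status": {
        "network_infos": [
            {"name": "tenant-a", "ip_addresses": [{"ip_address": "10.0.0.5"}]},
            {"name": "monitoring",
             "ip_addresses": [{"ip_address": "192.168.1.7"},
                              {"ip_address": "192.168.1.8"}]},
        ]
    }
}
addresses = ips_by_network(status)
```

The result is a mapping from network name to a list of addresses, since a single host or IP field cannot represent a task that lives in several networks.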
> h2. Necessary Changes
> In order to implement CNI support in Aurora, several changes across the entire code base are needed.
> h3. Mesos
> * As of today, there seems to be no reliable way to discover CNI-assigned IPs from within an executor (see MESOS-6281). This is crucial for us, as Thermos is responsible for announcing itself in ZooKeeper serversets.
> h3. Thermos
> * The Observer UI needs to be updated to handle multiple IP addresses.
> * The ZK serverset announcement needs to be adjusted to publish all IP addresses.
> * A replacement/addition for pystachio {{{{mesos.hostname}}}} is required so that user code can discover its current IP addresses. This relates to MESOS-6281.
> h3. Aurora Scheduler
> * Feature toggle allowing operators to enable/disable CNI support.
> * Plumbing of NetworkInfo name and labels touching Thrift API, storage, and task launch mechanism.
> * Extension of {{TaskStatusHandlerImpl}} and [{{StateManager}}|https://github.com/apache/aurora/blob/783baaefb9a814ca01fad78181fe3df3de5b34af/src/main/java/org/apache/aurora/scheduler/state/StateManager.java] storage layer to persist received IP addresses.
> h3. Aurora Client
> * Extension of the Pystachio configuration so that user-defined jobs can join operator-enabled networks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)