Posted to commits@mesos.apache.org by nn...@apache.org on 2015/10/05 22:55:33 UTC

mesos git commit: Added initial draft of networking user-doc.

Repository: mesos
Updated Branches:
  refs/heads/master 63709b19c -> 4e1bc0d8f


Added initial draft of networking user-doc.

Review: https://reviews.apache.org/r/38963


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/4e1bc0d8
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/4e1bc0d8
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/4e1bc0d8

Branch: refs/heads/master
Commit: 4e1bc0d8f4368592d66740ae5ac0abdc707eb19e
Parents: 63709b1
Author: Kapil Arya <ka...@mesosphere.io>
Authored: Mon Oct 5 13:48:06 2015 -0700
Committer: Niklas Q. Nielsen <ni...@mesosphere.io>
Committed: Mon Oct 5 13:48:07 2015 -0700

----------------------------------------------------------------------
 docs/home.md                                    |   1 +
 docs/images/networking-architecture.png         | Bin 0 -> 50637 bytes
 docs/networking-for-mesos-managed-containers.md | 282 +++++++++++++++++++
 3 files changed, 283 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/4e1bc0d8/docs/home.md
----------------------------------------------------------------------
diff --git a/docs/home.md b/docs/home.md
index 86320f6..e7d6930 100644
--- a/docs/home.md
+++ b/docs/home.md
@@ -33,6 +33,7 @@ layout: documentation
 
 * [Attributes and Resources](/documentation/attributes-resources/) for how to describe the slaves that comprise a cluster.
 * [Fetcher Cache](/documentation/latest/fetcher/) for how to configure the Mesos fetcher cache.
+* [Networking for Mesos-managed Containers](/documentation/latest/networking-for-mesos-managed-containers/)
 * [Oversubscription](/documentation/latest/oversubscription/) for how to configure Mesos to take advantage of unused resources to launch "best-effort" tasks.
 * [Persistent Volume](/documentation/latest/persistent-volume/) for how to allow tasks to access persistent storage resources.
 * [Reservation](/documentation/latest/reservation/) for how to configure Mesos to allow slaves to reserve resources.

http://git-wip-us.apache.org/repos/asf/mesos/blob/4e1bc0d8/docs/images/networking-architecture.png
----------------------------------------------------------------------
diff --git a/docs/images/networking-architecture.png b/docs/images/networking-architecture.png
new file mode 100644
index 0000000..58838cf
Binary files /dev/null and b/docs/images/networking-architecture.png differ

http://git-wip-us.apache.org/repos/asf/mesos/blob/4e1bc0d8/docs/networking-for-mesos-managed-containers.md
----------------------------------------------------------------------
diff --git a/docs/networking-for-mesos-managed-containers.md b/docs/networking-for-mesos-managed-containers.md
new file mode 100644
index 0000000..33568a8
--- /dev/null
+++ b/docs/networking-for-mesos-managed-containers.md
@@ -0,0 +1,282 @@
+---
+layout: documentation
+---
+
+# Networking for Mesos-managed containers
+
+While networking plays a key role in data center infrastructure, it is -- for
+now -- beyond the scope of Mesos to try to address the concerns of networking
+setup, topology and performance. However, Mesos can ease integration with
+existing networking solutions and enable features such as IP-per-container,
+task-granular isolation, and service discovery. More often than not, it
+will be challenging to provide a one-size-fits-all networking solution. The
+requirements and available solutions will vary across cloud-only,
+on-premises, and hybrid deployments.
+
+One of the primary goals for the networking support in Mesos was to have a
+pluggable mechanism to allow users to enable custom networking solutions as
+needed. As a result, several extensions were added to Mesos components in
+version 0.25.0 to enable networking support. Further, all the extensions are
+opt-in to allow older frameworks and applications without networking support to
+coexist with the newer ones.
+
+The rest of this document describes the overall architecture of all the involved
+components, configuration steps for enabling IP-per-container, and required
+framework changes.
+
+## How does it work?
+
+![Mesos Networking Architecture](images/networking-architecture.png)
+
+
+A key observation is that the networking support is enabled via a Mesos module
+and thus the Mesos master and agents are completely oblivious to it. It is
+completely up to the networking module to provide the desired support. Next,
+IP requests are served on a best-effort basis. Thus, the framework should
+be prepared to handle ignored (in cases where the module(s) are not present) or
+declined (the IPs can't be assigned due to various reasons) requests.
+
+To maximize backwards-compatibility with existing frameworks, schedulers must
+opt in to network isolation per container. Schedulers opt in to network
+isolation using new data structures in the TaskInfo message.
+
+### Terminology
+
+* IP Address Management (IPAM) Server:
+  * assigns IPs on demand
+  * recycles IPs once they have been released
+  * (optionally) can tag IPs with a given string/id
+
+* IPAM client:
+  * tightly coupled with a particular IPAM server
+  * acts as a bridge between the "Network Isolator Module" and the IPAM server
+  * communicates with the server to request/release IPs
+
+* Network Isolator Module (NIM):
+  * a Mesos module for the Agent implementing the `Isolator` interface
+  * looks at TaskInfos to detect the IP requirements for the tasks
+  * communicates with the IPAM client to request/release IPs
+  * communicates with an external network virtualizer/isolator to enable network
+    isolation
+
+* Cleanup Module:
+  * responsible for cleanup (releasing IPs, etc.) during an Agent-lost
+    event; dormant otherwise
+
+### Framework requests IP address for containers
+
+1. A Mesos framework uses the TaskInfo message to request IPs for each
+   container being launched. (The request is ignored if the Mesos cluster
+   doesn't have support for IP-per-container.)
+
+2. Mesos Master processes TaskInfos and forwards them to the Agent for launching
+   tasks.
+
+### Network isolator module gets IP from IPAM server
+
+3. Mesos Agent inspects the TaskInfo to detect the container requirements
+   (MesosContainerizer in this case) and prepares various Isolators for the
+   to-be-launched container.
+   * The NIM inspects the TaskInfo to decide whether to enable network
+     isolation or not.
+
+4. If network isolation is to be enabled, the NIM requests IP address(es) via
+   the IPAM client and informs the Agent.
+
+### Agent launches container with a network namespace
+
+5. The Agent launches a container within a new network namespace.
+   * The Agent calls into NIM to perform "isolation"
+   * The NIM then calls into the network virtualizer to isolate the container.
+
+### Network virtualizer assigns IP address to the container and isolates it
+
+6. NIM then "decorates" the TaskStatus with the IP information.
+   * The IP address(es) from TaskStatus are made available at Master's
+     state endpoint.
+   * The TaskStatus is also forwarded to the framework to inform it of the IP
+     addresses.
+   * When a task is killed or lost, NIM communicates with IPAM client to release
+     corresponding IP address(es).
+
+### Cleanup module detects lost Agents and performs cleanup
+
+7. The cleanup module gets notified if there is an Agent-lost event.
+
+8. The cleanup module communicates with the IPAM client to release all IP
+   address(es) associated with the lost Agent. The IPAM may have a grace period
+   before the address(es) are recycled.
+
+## Configuration
+
+The network isolator module is not part of the standard Mesos distribution.
+However, there is an example implementation at
+https://github.com/mesosphere/net-modules.
+
+Once the network isolator module has been built into a shared dynamic library,
+it can be loaded into the Mesos Agent (see [modules documentation](modules.md)
+for instructions on building and loading a module).
+
+## Enabling frameworks for IP-per-container capability
+
+### NetworkInfo
+
+A new NetworkInfo message has been introduced:
+
+```{.proto}
+message NetworkInfo {
+  enum Protocol {
+    IPv4 = 0;
+    IPv6 = 1;
+  }
+
+  optional Protocol protocol = 1;
+
+  optional string ip_address = 2;
+
+  repeated string groups = 3;
+
+  optional Labels labels = 4;
+};
+```
+
+When requesting an IP address from the IPAM, one needs to set the `protocol`
+field to `IPv4` or `IPv6`. Setting `ip_address` to a valid IP address allows the
+framework to specify a static IP address for the container (if supported by the
+NIM). This is helpful in situations where a task must be bound to a particular
+IP address even as it is killed and restarted on a different node.
+
+
+### Examples of specifying network requirements
+
+Frameworks that want to enable IP-per-container need to provide a `NetworkInfo`
+message in TaskInfo. Here are a few examples:
+
+1. A request for one address of unspecified protocol version using the default
+   command executor
+
+
+   ```
+   TaskInfo {
+     ...
+     command: ...,
+     container: ContainerInfo {
+       network_infos: [
+         NetworkInfo {
+           protocol: None;
+           ip_address: None;
+           groups: [];
+           labels: None;
+         }
+       ]
+     }
+   }
+   ```
+
+2. A request for one IPv4 and one IPv6 address, in two separate groups using the
+   default command executor
+
+   ```
+   TaskInfo {
+     ...
+     command: ...,
+     container: ContainerInfo {
+       network_infos: [
+         NetworkInfo {
+           protocol: IPv4;
+           ip_address: None;
+           groups: ["public"];
+           labels: None;
+         },
+         NetworkInfo {
+           protocol: IPv6;
+           ip_address: None;
+           groups: ["private"];
+           labels: None;
+         }
+       ]
+     }
+   }
+   ```
+
+3. A request for a specific IP address using a custom executor
+
+   ```
+   TaskInfo {
+     ...
+     executor: ExecutorInfo {
+       ...,
+       container: ContainerInfo {
+         network_infos: [
+           NetworkInfo {
+             protocol: None;
+             ip_address: "10.1.2.3";
+             groups: [];
+             labels: None;
+           }
+         ]
+       }
+     }
+   }
+   ```
+
+NOTE: The Mesos Containerizer will reject any CommandInfo that has a ContainerInfo. For this reason, when opting in to network isolation with the Mesos Containerizer, set TaskInfo.ContainerInfo.NetworkInfo.
+
+## Address Discovery
+
+The NetworkInfo message allows frameworks to request IP address(es) to be
+assigned at task launch time on the Mesos agent.  After opting in to network
+isolation for a given executor’s container in this way, frameworks will need to
+know what address(es) were ultimately assigned in order to perform health
+checks, or any other out-of-band communication.
+
+This is accomplished by adding a new field to the TaskStatus message.
+
+```{.proto}
+message ContainerStatus {
+  repeated NetworkInfo network_infos;
+}
+
+message TaskStatus {
+  ...
+  optional ContainerStatus container;
+  ...
+};
+```
+
+Further, the container IP addresses are also exposed via Master's state
+endpoint. The JSON output from Master's state endpoint contains a list of task
+statuses. If a task's container was started with its own IP address, the
+assigned IP address will be exposed as part of the `TASK_RUNNING` status.
+
+NOTE: Since per-container address(es) are strictly opt-in from the framework,
+the framework may ignore the IP address(es) provided in StatusUpdate if it
+didn't set NetworkInfo in the first place.
+
+## Writing a Custom Network Isolator Module
+
+A network isolator module implements the Isolator interface provided by Mesos.
+The module is loaded as a dynamic shared library into the Mesos Agent and gets
+hooked up in the container launch sequence. A network isolator may communicate
+with external IPAM and network virtualizer tools to fulfill framework
+requirements.
+
+In terms of the Isolator API, there are three key callbacks that a network
+isolator module should implement:
+
+1. `Isolator::prepare()` provides the module with a chance to decide whether or
+   not to enable network isolation for the given task container. If network
+   isolation is to be enabled, the `Isolator::prepare()` call informs the Agent
+   to create a private network namespace for the container. This call is also
+   where an IP address is generated (statically or with the help of an
+   external IPAM agent) for the container.
+
+2. `Isolator::isolate()` provides the module with the opportunity to _isolate_
+   the container _after_ it has been created but before the executor is launched
+   inside the container. This would involve creating a virtual Ethernet adapter
+   for the container and assigning it an IP address. The module can also use
+   the help of an external network virtualizer/isolator to set up networking
+   for the container.
+
+3. `Isolator::cleanup()` is called when the container terminates. This allows the
+   module to perform any cleanup, such as recovering resources and releasing IP
+   addresses as needed.
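
The framework-side handling described under "Examples of specifying network
requirements" and "Address Discovery" can be sketched in Python as follows.
This is a minimal illustration under stated assumptions, not part of the patch
and not the Mesos bindings: the `NetworkInfo` dataclass, `validate()`, and
`assigned_addresses()` are hypothetical names, and the TaskStatus is modelled
as a plain dict whose `container` / `network_infos` keys simply mirror the
field names shown in the document.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NetworkInfo:
    """Hypothetical stand-in mirroring the NetworkInfo fields in the document."""
    protocol: Optional[str] = None    # "IPv4", "IPv6", or None (unspecified)
    ip_address: Optional[str] = None  # static address request, if any
    groups: List[str] = field(default_factory=list)
    # `labels` is omitted here for brevity.

    def validate(self) -> None:
        """Raise ValueError if a static address is malformed or contradicts `protocol`."""
        if self.ip_address is None:
            return
        # ipaddress.ip_address() raises ValueError on a malformed address.
        version = ipaddress.ip_address(self.ip_address).version
        if self.protocol is not None and self.protocol != f"IPv{version}":
            raise ValueError(
                f"{self.ip_address} is IPv{version} but protocol is {self.protocol}"
            )


def assigned_addresses(task_status: dict) -> List[str]:
    """Collect assigned addresses from a TaskStatus-like dict.

    Returns an empty list when the request was ignored (no container
    status present), since IP requests are served on a best-effort basis.
    """
    container = task_status.get("container") or {}
    return [
        info["ip_address"]
        for info in container.get("network_infos", [])
        if info.get("ip_address")
    ]
```

A framework could call `validate()` before submitting a request for a static
address, and treat an empty result from `assigned_addresses()` as "no
per-container IP was assigned" rather than as an error.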