Posted to commits@mesos.apache.org by ka...@apache.org on 2016/02/01 03:49:28 UTC
svn commit: r1727886 [14/16] - in /mesos/site: ./ publish/ publish/assets/img/documentation/ publish/community/ publish/documentation/ publish/documentation/allocation-module/ publish/documentation/app-framework-development-guide/ publish/documentation...

Added: mesos/site/source/documentation/latest/containerizer-internals.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/containerizer-internals.md?rev=1727886&view=auto
==============================================================================
--- mesos/site/source/documentation/latest/containerizer-internals.md (added)
+++ mesos/site/source/documentation/latest/containerizer-internals.md Mon Feb  1 02:49:25 2016
@@ -0,0 +1,141 @@
+---
+layout: documentation
+---
+
+
+# Containerizer
+
+Containerizers are Mesos components responsible for launching
+containers. They own the containers launched for the tasks/executors,
+and are responsible for their isolation, resource management, and
+events (e.g., statistics).
+
+# Containerizer internals
+
+### Containerizer creation and launch
+
+* Agent creates a containerizer based on the flags (using slave flag
+  `--containerizers`). If multiple containerizers (e.g., docker,
+  mesos) are specified using the `--containerizers` flag, then the
+  composing containerizer will be used to create a containerizer.
+* If an executor is not specified in `TaskInfo`, the Mesos agent will use
+  the default executor for the task (depending on the containerizer
+  the agent is using, it could be `mesos-executor` or
+  `mesos-docker-executor`). TODO: Update this after MESOS-1718 is
+  completed. After this change, master will be responsible for
+  generating executor information.
+
+### Types of containerizers
+
+Mesos currently supports the following containerizers:
+
+* Composing
+* [Docker](/documentation/latest/docker-containerizer/)
+* [Mesos](/documentation/latest/containerizer/)
+* [External](/documentation/latest/external-containerizer/) (deprecated)
+
+#### Composing Containerizer
+
+The composing containerizer composes the specified containerizers
+(using slave flag `--containerizers`) and acts like a single
+containerizer. This is an implementation of the `composite` design
+pattern.
+
+#### Docker Containerizer
+
+Docker containerizer manages containers using docker engine provided
+in the docker package.
+
+##### Container launch
+
+* Docker containerizer will attempt to launch the task in docker only
+  if `ContainerInfo::type` is set to DOCKER.
+* Docker containerizer will first pull the image.
+* Calls pre-launch hook.
+* The executor will be launched in one of the two ways:
+
+A) Mesos agent runs in a docker container
+
+* This is indicated by the presence of agent flag
+  `--docker_mesos_image`. In this case, the value of flag
+  `--docker_mesos_image` is assumed to be the docker image used to
+  launch the Mesos agent.
+* If the task includes an executor (custom executor), then that executor is
+  launched in a docker container.
+* If the task does not include an executor i.e. it defines a command, the
+  default executor `mesos-docker-executor` is launched in a docker container to
+  execute the command via Docker CLI.
+
+B) Mesos agent does not run in a docker container
+
+* If the task includes an executor (custom executor), then that executor is
+  launched in a docker container.
+* If the task does not include an executor i.e. it defines a command, a subprocess
+  is forked to execute the default executor `mesos-docker-executor`.
+  `mesos-docker-executor` then spawns a shell to execute the command via Docker
+  CLI.
+
+#### Mesos Containerizer
+
+Mesos containerizer is the native Mesos containerizer. Mesos
+Containerizer will handle any executor/task that does not specify
+`ContainerInfo::DockerInfo`.
+
+##### Container launch
+
+* Calls prepare on each isolator.
+* Forks the executor using Launcher (see [Launcher](#Launcher)). The
+  forked child is blocked from executing until it has been isolated.
+* Isolate the executor. Call isolate with the pid for each isolator
+  (see [Isolators](#Isolators)).
+* Fetch the executor.
+* Exec the executor. The forked child is signalled to continue. It
+  will first execute any preparation commands from isolators and then
+  exec the executor.
+
+<a name="Launcher"></a>
+##### Launcher
+
+Launcher is responsible for forking/destroying containers.
+
+* Forks a new process in the containerized context. The child will
+  exec the binary at the given path with the given argv, flags, and
+  environment.
+* The I/O of the child will be redirected according to the specified
+  I/O descriptors.
+
+###### Linux launcher
+
+* Creates a "freezer" cgroup for the container.
+* Creates a posix "pipe" to enable communication between host (parent
+  process) and container process.
+* Spawns a child process (container process) using the `clone` system call.
+* Moves the new container process to the freezer hierarchy.
+* Signals the child process to continue (exec'ing) by writing a
+  character to the write end of the pipe in the parent process.
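The parent/child handshake in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Mesos implementation: it uses Python's `os.fork` rather than `clone`, leaves out the freezer cgroup entirely, and `launch_blocked_child` and `before_continue` are hypothetical names standing in for the launcher and the isolation step.

```python
import os


def launch_blocked_child(argv, before_continue=None):
    """Fork a child that waits on a pipe before exec'ing argv.

    Hypothetical helper: before_continue(pid) stands in for the isolation
    work the parent does while the child is still blocked.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: wait for one byte from the parent, then exec.
        status = 127  # conventional "could not exec" exit code
        try:
            os.close(write_fd)
            if os.read(read_fd, 1):
                os.close(read_fd)
                os.execvp(argv[0], argv)  # does not return on success
            else:
                status = 1  # EOF: the parent gave up before signalling
        except OSError:
            pass
        finally:
            os._exit(status)

    # Parent: the child is blocked in os.read() until we write.
    os.close(read_fd)
    try:
        if before_continue is not None:
            before_continue(pid)
        os.write(write_fd, b"x")  # signal the child to continue
    finally:
        os.close(write_fd)
    return pid
```

If `before_continue` raises, the parent closes the pipe without writing, so the child sees end-of-file and exits instead of running un-isolated. The caller is still responsible for reaping the child with `os.waitpid`.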
+
+###### Posix launcher (TBD)
+
+<a name="Isolators"></a>
+##### Isolators
+
+Isolators are responsible for creating an environment for the
+containers where resources like cpu, network, storage and memory can
+be isolated from other containers.
+
+### Containerizer states
+
+#### Docker
+
+* FETCHING
+* PULLING
+* RUNNING
+* DESTROYING
+
+#### Mesos
+
+* PREPARING
+* ISOLATING
+* FETCHING
+* RUNNING
+* DESTROYING

Modified: mesos/site/source/documentation/latest/containerizer.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/containerizer.md?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/source/documentation/latest/containerizer.md (original)
+++ mesos/site/source/documentation/latest/containerizer.md Mon Feb  1 02:49:25 2016
@@ -2,83 +2,93 @@
 layout: documentation
 ---
 
-# Mesos Containerizer
+# Containerizer
 
-The MesosContainerizer provides lightweight containerization and
-resource isolation of executors using Linux-specific functionality
-such as control cgroups and namespaces. It is composable so operators
-can selectively enable different isolators.
-
-It also provides basic support for POSIX systems (e.g., OSX) but
-without any actual isolation, only resource usage reporting.
-
-### Shared Filesystem
-
-The SharedFilesystem isolator can optionally be used on Linux hosts to
-enable modifications to each container's view of the shared
-filesystem.
-
-The modifications are specified in the ContainerInfo included in the
-ExecutorInfo, either by a framework or by using the
-`--default_container_info` slave flag.
-
-ContainerInfo specifies Volumes which map parts of the shared
-filesystem (host\_path) into the container's view of the filesystem
-(container\_path), as read-write or read-only. The host\_path can be
-absolute, in which case it will make the filesystem subtree rooted at
-host\_path also accessible under container\_path for each container.
-If host\_path is relative then it is considered as a directory
-relative to the executor's work directory. The directory will be
-created and permissions copied from the corresponding directory (which
-must exist) in the shared filesystem.
-
-The primary use-case for this isolator is to selectively make parts of
-the shared filesystem private to each container. For example, a
-private "/tmp" directory can be achieved with `host_path="tmp"` and
-`container_path="/tmp"` which will create a directory "tmp" inside the
-executor's work directory (mode 1777) and simultaneously mount it as
-/tmp inside the container. This is transparent to processes running
-inside the container. Containers will not be able to see the host's
-/tmp or any other container's /tmp.
-
-### Pid Namespace
-
-The Pid Namespace isolator can be used to isolate each container in
-a separate pid namespace with two main benefits:
-
-1. Visibility: Processes running in the container (executor and
-   descendants) are unable to see or signal processes outside the
-   namespace.
-
-2. Clean termination: Termination of the leading process in a pid
-   namespace will result in the kernel terminating all other processes
-   in the namespace.
-
-The Launcher will use (2) during destruction of a container in
-preference to the freezer cgroup, avoiding known kernel issues related
-to freezing cgroups under OOM conditions.
-
-/proc will be mounted for containers so tools such as 'ps' will work
-correctly.
-
-
-### Posix Disk Isolator
-
-The Posix Disk isolator provides basic disk isolation. It is able to
-report the disk usage for each sandbox and optionally enforce the disk
-quota. It can be used on both Linux and OS X.
-
-To enable the Posix Disk isolator, append `posix/disk` to the
-`--isolation` flag when starting the slave.
-
-By default, the disk quota enforcement is disabled. To enable it,
-specify `--enforce_container_disk_quota` when starting the slave.
-
-The Posix Disk isolator reports disk usage for each sandbox by
-periodically running the `du` command. The disk usage can be retrieved
-from the resource statistics endpoint (`/monitor/statistics.json`).
-
-The interval between two `du`s can be controlled by the slave flag
-`--container_disk_watch_interval`. For example,
-`--container_disk_watch_interval=1mins` sets the interval to be 1
-minute. The default interval is 15 seconds.
+## Motivation
+
+Containerizers are intended to run tasks in 'containers' which in turn are used
+to:
+
+* Isolate a task from other running tasks.
+* 'Contain' tasks to run in a resource-limited runtime environment.
+* Control a task's individual resources (e.g., CPU, memory) programmatically.
+* Run software in a pre-packaged file system image, allowing it to run in
+  different environments.
+
+
+## Types of containerizers
+
+Mesos plays well with existing container technologies (e.g., docker) and also
+provides its own container technology. It also supports composing different
+container technologies (e.g., docker and mesos).
+
+Mesos implements the following containerizers:
+
+* [Composing](#Composing)
+* [Docker](#Docker)
+* [Mesos (default)](#Mesos)
+* External (deprecated)
+
+Users can specify the types of containerizers to use via the agent flag
+`--containerizers`.
+
+
+<a name="Composing"></a>
+### Composing containerizer
+
+This feature allows multiple container technologies to play together. It is
+enabled when you configure the `--containerizers` agent flag with multiple
+comma-separated containerizer names (e.g., `--containerizers=mesos,docker`). The
+order of the comma-separated list is important, as the first containerizer that
+supports the task's container configuration will be used to launch the task.
+
+Use cases:
+
+* For testing tasks with different types of resource isolations. Since 'mesos'
+  containerizers have more isolation abilities, a framework can use composing
+  containerizer to test a task using 'mesos' containerizer's controlled
+  environment and at the same time test it to work with 'docker' containers by
+  just changing the container parameters for the task.
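The "first containerizer that supports the task wins" rule can be sketched as follows. This is an illustration only: the two support checks paraphrase the internals description (the docker containerizer launches only when `ContainerInfo::type` is DOCKER; the mesos containerizer handles anything that does not specify `DockerInfo`), and the dict layout and function names are hypothetical, not Mesos APIs.

```python
def docker_supports(container_info):
    """Assumed rule: only tasks whose ContainerInfo type is DOCKER."""
    return container_info is not None and container_info.get("type") == "DOCKER"


def mesos_supports(container_info):
    """Assumed rule: anything that does not carry DockerInfo."""
    return container_info is None or "docker" not in container_info


# Hypothetical registry keyed by the names used in --containerizers.
SUPPORTS = {"docker": docker_supports, "mesos": mesos_supports}


def choose_containerizer(flag_value, container_info):
    """Return the first name in the comma-separated flag that accepts the task."""
    for name in flag_value.split(","):
        if SUPPORTS[name](container_info):
            return name
    return None  # no configured containerizer supports this task


# A command task without ContainerInfo goes to the first match, 'mesos'.
print(choose_containerizer("mesos,docker", None))
```

With `--containerizers=docker` alone, the same task would match nothing in this sketch, which is why the order and contents of the list matter.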
+
+
+<a name="Docker"></a>
+### Docker containerizer
+
+Docker containerizer allows tasks to be run inside a docker container. This
+containerizer is enabled when you configure the agent flag as
+`--containerizers=docker`.
+
+Use cases:
+
+* If task needs to be run with the tooling that comes with the docker package.
+* If Mesos agent is running inside a docker container.
+
+For more details, see
+[Docker Containerizer](/documentation/latest/docker-containerizer/).
+
+<a name="Mesos"></a>
+### Mesos containerizer
+
+This containerizer allows tasks to be run with an array of pluggable isolators
+provided by Mesos. This is the native Mesos containerizer solution and is
+enabled when you configure the agent flag as `--containerizers=mesos`.
+
+Use cases:
+
+* Allow Mesos to control the task's runtime environment without depending on
+  other container technologies (e.g., docker).
+* Want fine grained operating system controls (e.g., cgroups/namespaces provided
+  by linux).
+* Want Mesos's latest container technology features.
+* Need additional resource controls like disk usage limits, which
+  might not be provided by other container technologies.
+* Want to add custom isolation for tasks.
+
+For more details, see
+[Mesos Containerizer](/documentation/latest/mesos-containerizer/).
+
+
+## References
+
+* [Containerizer Internals](/documentation/latest/containerizer-internals/) for
+  implementation details of containerizers.

Modified: mesos/site/source/documentation/latest/docker-containerizer.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/docker-containerizer.md?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/source/documentation/latest/docker-containerizer.md (original)
+++ mesos/site/source/documentation/latest/docker-containerizer.md Mon Feb  1 02:49:25 2016
@@ -25,7 +25,7 @@ iptables -A INPUT -s 172.17.0.0/16 -i do
 
 ## How do I use the Docker Containerizer?
 
-TaskInfo before 0.20.0 used to only support either setting a CommandInfo that launches a task running the bash command, or a ExecutorInfo that launches a custom Executor
+TaskInfo before 0.20.0 used to only support either setting a CommandInfo that launches a task running the bash command, or an ExecutorInfo that launches a custom Executor
 that will launches the task.
 
 With 0.20.0 we added a ContainerInfo field to TaskInfo and ExecutorInfo that allows a Containerizer such as Docker to be configured to run the task or executor.
@@ -56,11 +56,11 @@ When launching the docker image as an Ex
 
 Note that we currently default to host networking when running a docker image, to easier support running a docker image as an Executor.
 
-The containerizer also supports optional force pulling of the image, and if disabled the docker image will only be updated again if it's not available on the host.
+The containerizer also supports optional force pulling of the image. It is disabled by default, so the docker image will only be pulled again if it's not available on the host. To enable force pulling of an image, `force_pull_image` has to be set to true.
 
 ## Private Docker repository
 
-To run a image from a private repository, one can include the uri pointing to a `.dockercfg` that contains login information. The `.dockercfg` file will be pulled into the sandbox the Docker Containerizer
+To run an image from a private repository, one can include the uri pointing to a `.dockercfg` that contains login information. The `.dockercfg` file will be pulled into the sandbox the Docker Containerizer
 set the HOME environment variable pointing to the sandbox so docker cli will automatically pick up the config file.
 
 ## CommandInfo to run Docker images

Modified: mesos/site/source/documentation/latest/effective-code-reviewing.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/effective-code-reviewing.md?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/source/documentation/latest/effective-code-reviewing.md (original)
+++ mesos/site/source/documentation/latest/effective-code-reviewing.md Mon Feb  1 02:49:25 2016
@@ -22,7 +22,7 @@ to consider before sending review reques
    change clear in the review request, so the reviewer is not left
    guessing. It is highly recommended to attach a JIRA issue with your
    review for additional context.
-4. **Follow the [style guide](http://mesos.apache.org/documentation/latest/c++-style-guide/)
+4. **Follow the [style guide](/documentation/latest/c++-style-guide/)
    and the style of code around you**.
 5. **Do a self-review of your changes before publishing**: Approach it
    from the perspective of a reviewer with no context. Is it easy to figure

Added: mesos/site/source/documentation/latest/executor-http-api.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/executor-http-api.md?rev=1727886&view=auto
==============================================================================
--- mesos/site/source/documentation/latest/executor-http-api.md (added)
+++ mesos/site/source/documentation/latest/executor-http-api.md Mon Feb  1 02:49:25 2016
@@ -0,0 +1,358 @@
+---
+layout: documentation
+---
+
+# Executor HTTP API
+
+Mesos 0.27.0 added **experimental** support for V1 Executor HTTP API.
+
+
+## Overview
+
+The executor interacts with Mesos via the "/api/v1/executor" endpoint hosted by the Mesos agent. The fully qualified URL of the endpoint might look like:
+
+    http://agenthost:5051/api/v1/executor
+
+Note that we refer to this endpoint with its suffix "/executor" in the rest of this document. This endpoint accepts HTTP POST requests with data encoded as JSON (Content-Type: application/json) or binary Protobuf (Content-Type: application/x-protobuf). The first request that the executor sends to "/executor" endpoint is called SUBSCRIBE and results in a streaming response ("200 OK" status code with Transfer-Encoding: chunked). **Executors are expected to keep the subscription connection open as long as possible (barring errors in network, agent process restarts, software bugs etc.) and incrementally process the response** (NOTE: HTTP client libraries that can only parse the response after the connection is closed cannot be used). For the encoding used, please refer to **Events** section below.
+
+All the subsequent (non subscribe) requests to "/executor" endpoint (see details below in **Calls** section) must be sent using a different connection(s) than the one being used for subscription. Agent responds to these HTTP POST requests with "202 Accepted" status codes (or, for unsuccessful requests, with 4xx or 5xx status codes; details in later sections). The "202 Accepted" response means that a request has been accepted for processing, not that the processing of the request has been completed. The request might or might not be acted upon by Mesos (e.g., agent fails during the processing of the request). Any asynchronous responses from these requests will be streamed on the long-lived subscription connection.
+
+## Calls
+
+The following calls are currently accepted by the agent. The canonical source of this information is [executor.proto](https://github.com/apache/mesos/blob/master/include/mesos/v1/executor/executor.proto) (NOTE: The protobuf definitions are subject to change before the Beta API is finalized). Note that when sending JSON encoded Calls, executors should encode raw bytes in Base64 and strings in UTF-8.
+
+### SUBSCRIBE
+
+This is the first step in the communication process between the executor and agent. This is also to be considered as subscription to the "/executor" events stream.
+
+To subscribe with the agent, the executor sends an HTTP POST request with an encoded `SUBSCRIBE` message. The HTTP response is a stream with [RecordIO](scheduler-http-api.md#recordio-response-format) encoding, with the first event being the `SUBSCRIBED` event (see details in **Events** section).
+
+Additionally, if the executor is connecting to the agent after a [disconnection](#disconnections), it can also send a list of:
+
+* **Unacknowledged Status Updates**: The executor is expected to maintain a list of status updates not acknowledged by the agent via the `ACKNOWLEDGED` events.
+* **Unacknowledged Tasks**: The executor is expected to maintain a list of tasks that have not been acknowledged by the agent. A task is considered acknowledged if at least one of the status updates for this task is acknowledged by the agent.
+
+```
+SUBSCRIBE Request (JSON):
+
+POST /api/v1/executor HTTP/1.1
+Host: agenthost:5051
+Content-Type: application/json
+Accept: application/json
+
+{
+  "type": "SUBSCRIBE",
+  "executor_id": {
+    "value": "387aa966-8fc5-4428-a794-5a868a60d3eb"
+  },
+  "framework_id": {
+    "value": "49154f1b-8cf6-4421-bf13-8bd11dccd1f1"
+  },
+  "subscribe": {
+    "unacknowledged_tasks": [
+      {
+        "name": "dummy-task",
+        "task_id": {
+          "value": "d40f3f3e-bbe3-44af-a230-4cb1eae72f67"
+        },
+        "agent_id": {
+          "value": "f1c9cdc5-195e-41a7-a0d7-adaa9af07f81"
+        },
+        "command": {
+          "value": "ls",
+          "arguments": [
+            "-l",
+            "\/tmp"
+          ]
+        }
+      }
+    ],
+    "unacknowledged_updates": [
+      {
+        "framework_id": {
+          "value": "49154f1b-8cf6-4421-bf13-8bd11dccd1f1"
+        },
+        "status": {
+          "source": "SOURCE_EXECUTOR",
+          "task_id": {
+            "value": "d40f3f3e-bbe3-44af-a230-4cb1eae72f67"
+          },
+          "state": "TASK_RUNNING",
+          "uuid": "ZDQwZjNmM2UtYmJlMy00NGFmLWEyMzAtNGNiMWVhZTcyZjY3Cg=="
+        }
+      }
+    ]
+  }
+}
+
+SUBSCRIBE Response Event (JSON):
+HTTP/1.1 200 OK
+Content-Type: application/json
+Transfer-Encoding: chunked
+
+<event-length>
+{
+  "type": "SUBSCRIBED",
+  "subscribed": {
+    "executor_info": {
+      "executor_id": {
+        "value": "387aa966-8fc5-4428-a794-5a868a60d3eb"
+      },
+      "command": {
+        "value": "\/path\/to\/executor"
+      },
+      "framework_id": {
+        "value": "49154f1b-8cf6-4421-bf13-8bd11dccd1f1"
+      }
+    },
+    "framework_info": {
+      "user": "foo",
+      "name": "my_framework"
+    },
+    "agent_id": {
+      "value": "f1c9cdc5-195e-41a7-a0d7-adaa9af07f81"
+    },
+    "agent_info": {
+      "host": "agenthost",
+      "port": 5051
+    }
+  }
+}
+<more events>
+```
+
+NOTE: Once an executor is launched, the agent waits for a duration of `--executor_registration_timeout` (configurable at agent startup) for the executor to subscribe. If the executor fails to subscribe within this duration, the agent forcefully destroys the container the executor is running in.
+
+### UPDATE
+
+Sent by the executor to reliably communicate the state of managed tasks. It is crucial that a terminal update (e.g., `TASK_FINISHED`, `TASK_KILLED` or `TASK_FAILED`) is sent to the agent as soon as the task terminates, in order to allow Mesos to release the resources allocated to the task.
+
+The scheduler must explicitly respond to this call through an `ACKNOWLEDGE` message (see `ACKNOWLEDGED` in the Events section below for the semantics). The executor must maintain a list of unacknowledged updates. If for some reason, the executor is disconnected from the agent, these updates must be sent as part of `SUBSCRIBE` request in the `unacknowledged_updates` field.
+
+```
+UPDATE Request (JSON):
+
+POST /api/v1/executor HTTP/1.1
+Host: agenthost:5051
+Content-Type: application/json
+Accept: application/json
+
+{
+  "executor_id": {
+    "value": "387aa966-8fc5-4428-a794-5a868a60d3eb"
+  },
+  "framework_id": {
+    "value": "9aaa9d0d-e00d-444f-bfbd-23dd197939a0-0000"
+  },
+  "type": "UPDATE",
+  "update": {
+    "status": {
+      "executor_id": {
+        "value": "387aa966-8fc5-4428-a794-5a868a60d3eb"
+      },
+      "source": "SOURCE_EXECUTOR",
+      "state": "TASK_RUNNING",
+      "task_id": {
+        "value": "66724cec-2609-4fa0-8d93-c5fb2099d0f8"
+      },
+      "uuid": "ZDQwZjNmM2UtYmJlMy00NGFmLWEyMzAtNGNiMWVhZTcyZjY3Cg=="
+    }
+  }
+}
+
+UPDATE Response:
+HTTP/1.1 202 Accepted
+```
+
+### MESSAGE
+
+Sent by the executor to send arbitrary binary data to the scheduler. Note that Mesos neither interprets this data nor makes any guarantees about the delivery of this message to the scheduler. The `data` field is raw bytes encoded in Base64.
+
+```
+MESSAGE Request (JSON):
+
+POST /api/v1/executor HTTP/1.1
+Host: agenthost:5051
+Content-Type: application/json
+Accept: application/json
+
+{
+  "executor_id": {
+    "value": "387aa966-8fc5-4428-a794-5a868a60d3eb"
+  },
+  "framework_id": {
+    "value": "9aaa9d0d-e00d-444f-bfbd-23dd197939a0-0000"
+  },
+  "type": "MESSAGE",
+  "message": {
+    "data": "t+Wonz5fRFKMzCnEptlv5A=="
+  }
+}
+
+MESSAGE Response:
+HTTP/1.1 202 Accepted
+```
+
+## Events
+
+The executor is expected to keep a **persistent** connection open to the "/executor" endpoint even after getting a `SUBSCRIBED` HTTP Response event. This is indicated by "Connection: keep-alive" and "Transfer-Encoding: chunked" headers with *no* "Content-Length" header set. All subsequent events that are relevant to this executor generated by Mesos are streamed on this connection. The agent encodes each Event in [RecordIO](scheduler-http-api.md#recordio-response-format) format, i.e., a string representation of the length of the event in bytes followed by the JSON or binary Protobuf (possibly compressed) encoded event. Note that the value of length will never be "0" and the size of the length will be the size of an unsigned integer (i.e., 64 bits). Also, note that the `RecordIO` encoding should be decoded by the executor whereas the underlying HTTP chunked encoding is typically invisible at the application (executor) layer. The type of content encoding used for the events will be determined by the accept header of the POST request (e.g., "Accept: application/json").
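Since the executor has to decode the RecordIO framing itself, an incremental decoder is a natural building block. The sketch below assumes the framing described in the linked RecordIO section: a decimal length, a newline, then exactly that many bytes of event data. `RecordIODecoder` and `decode_events` are illustrative names, not part of any Mesos client library, and only JSON events are handled.

```python
import json


class RecordIODecoder:
    """Incrementally splits a byte stream into length-prefixed records."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add received bytes and return every record completed so far."""
        self._buffer += chunk
        records = []
        while True:
            newline = self._buffer.find(b"\n")
            if newline < 0:
                break  # the length prefix itself is still incomplete
            length = int(self._buffer[:newline])
            start = newline + 1
            end = start + length
            if len(self._buffer) < end:
                break  # wait for the rest of this record
            records.append(self._buffer[start:end])
            self._buffer = self._buffer[end:]
        return records


def decode_events(decoder, chunk):
    """Decode any complete JSON events contained in the bytes seen so far."""
    return [json.loads(record) for record in decoder.feed(chunk)]
```

Because `feed` keeps partial data in its buffer, it can be called with whatever byte chunks the HTTP client delivers, regardless of where the chunk boundaries fall relative to event boundaries.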
+
+The following events are currently sent by the agent. The canonical source of this information is [executor.proto](https://github.com/apache/mesos/blob/master/include/mesos/v1/executor/executor.proto). Note that when sending JSON encoded events, the agent encodes raw bytes in Base64 and strings in UTF-8.
+
+### SUBSCRIBED
+
+The first event sent by the agent when the executor sends a `SUBSCRIBE` request on the persistent connection. See `SUBSCRIBE` in Calls section for the format.
+
+### LAUNCH
+
+Sent by the agent whenever it needs to assign a new task to the executor. The executor is required to send an `UPDATE` message back to the agent indicating the success or failure of the task initialization.
+
+The executor must maintain a list of unacknowledged tasks (see `SUBSCRIBE` in `Calls` section). If for some reason the executor is disconnected from the agent, these tasks must be sent as part of the `SUBSCRIBE` request in the `unacknowledged_tasks` field.
+
+```
+LAUNCH Event (JSON)
+<event-length>
+{
+  "type": "LAUNCH",
+  "launch": {
+    "framework_info": {
+      "id": {
+        "value": "49154f1b-8cf6-4421-bf13-8bd11dccd1f1"
+      },
+      "user": "foo",
+      "name": "my_framework"
+    },
+    "task": {
+      "name": "dummy-task",
+      "task_id": {
+        "value": "d40f3f3e-bbe3-44af-a230-4cb1eae72f67"
+      },
+      "agent_id": {
+        "value": "f1c9cdc5-195e-41a7-a0d7-adaa9af07f81"
+      },
+      "command": {
+        "value": "sleep",
+        "arguments": [
+          "100"
+        ]
+      }
+    }
+  }
+}
+```
+
+### KILL
+
+The `KILL` event is sent whenever the scheduler needs to stop execution of a specific task. The executor is required to send a terminal update (e.g., `TASK_FINISHED`, `TASK_KILLED` or `TASK_FAILED`) back to the agent once it has stopped/killed the task. Mesos will mark the task resources as freed once the terminal update is received.
+
+```
+KILL Event (JSON)
+<event-length>
+{
+  "type" : "KILL",
+  "kill" : {
+    "task_id" : {"value" : "d40f3f3e-bbe3-44af-a230-4cb1eae72f67"}
+  }
+}
+```
+
+### ACKNOWLEDGED
+
+Sent by the agent in order to signal the executor that a status update was received as part of the reliable message passing mechanism. Acknowledged updates must not be retried.
+
+```
+ACKNOWLEDGED Event (JSON)
+<event-length>
+{
+  "type" : "ACKNOWLEDGED",
+  "acknowledged" : {
+    "task_id" : {"value" : "d40f3f3e-bbe3-44af-a230-4cb1eae72f67"},
+    "uuid" : "ZDQwZjNmM2UtYmJlMy00NGFmLWEyMzAtNGNiMWVhZTcyZjY3Cg=="
+  }
+}
+```
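The bookkeeping implied by `UPDATE` and `ACKNOWLEDGED` can be sketched as a small tracker keyed by the update's `uuid`. This is one possible arrangement, not a prescribed one; `UnacknowledgedUpdates` and its method names are hypothetical, and the dict shapes follow the JSON examples in this document.

```python
class UnacknowledgedUpdates:
    """Tracks status updates until the agent acknowledges them."""

    def __init__(self):
        # uuid -> update, kept in the order the updates were sent.
        self._pending = {}

    def record(self, update):
        """Remember an update (the "update" object of an UPDATE call)."""
        self._pending[update["status"]["uuid"]] = update

    def handle_acknowledged(self, event):
        """Drop the update named by an ACKNOWLEDGED event.

        Returns True if a pending update was removed, False otherwise.
        """
        uuid = event["acknowledged"]["uuid"]
        return self._pending.pop(uuid, None) is not None

    def pending(self):
        """Updates to resend as unacknowledged_updates on re-subscription."""
        return list(self._pending.values())
```

After a disconnection, the list returned by `pending()` is what would go into the `unacknowledged_updates` field of the next `SUBSCRIBE` request.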
+
+### MESSAGE
+
+Custom message generated by the scheduler and forwarded all the way to the executor. These messages are delivered "as-is" by Mesos and have no delivery guarantees. It is up to the scheduler to retry if a message is dropped for any reason. Note that `data` is raw bytes encoded as Base64.
+
+```
+MESSAGE Event (JSON)
+<event-length>
+{
+  "type" : "MESSAGE",
+  "message" : {
+    "data" : "c2FtcGxlIGRhdGE="
+  }
+}
+```
+
+### SHUTDOWN
+
+Sent by the agent in order to shut down the executor. Once an executor gets a `SHUTDOWN` event it is required to kill all its tasks, send `TASK_KILLED` updates and gracefully exit. If an executor doesn't terminate within a certain period after the event was emitted (`grace_period_seconds`), the agent will forcefully destroy the container where the executor is running. The agent would then send `TASK_LOST` updates for any remaining active tasks of this executor.
+
+```
+SHUTDOWN Event (JSON)
+<event-length>
+{
+  "type" : "SHUTDOWN",
+  "shutdown" : {
+    "grace_period_seconds" : 5
+  }
+}
+```
+
+### ERROR
+
+Sent by the agent when an asynchronous error event is generated. It is recommended that the executor abort when it receives an error event and retry subscription.
+
+```
+ERROR Event (JSON)
+<event-length>
+{
+  "type" : "ERROR",
+  "error" : {
+    "message" : "Unrecoverable error"
+  }
+}
+```
+
+## Executor Environment Variables
+
+The following environment variables are set by the agent that can be used by the executor upon startup:
+
+* `MESOS_FRAMEWORK_ID`: `FrameworkID` of the scheduler needed as part of the `SUBSCRIBE` call.
+* `MESOS_EXECUTOR_ID`: `ExecutorID` of the executor needed as part of the `SUBSCRIBE` call.
+* `MESOS_DIRECTORY`: Path to the working directory for the executor.
+* `MESOS_AGENT_ENDPOINT`: Agent endpoint i.e. ip:port to be used by the executor to connect to the agent.
+* `MESOS_CHECKPOINT`: If set to true, denotes that framework has checkpointing enabled.
+
+If `MESOS_CHECKPOINT` is set i.e. when framework checkpointing is enabled, the following additional variables are also set that can be used by the executor for retrying upon a disconnection with the agent:
+
+* `MESOS_RECOVERY_TIMEOUT`: The total duration that the executor should spend retrying before shutting itself down when it is disconnected from the agent (e.g., `15mins`, `5secs` etc.). This is configurable at agent startup via the flag `--recovery_timeout`.
+* `MESOS_SUBSCRIPTION_BACKOFF_MAX`: The maximum backoff duration to be used by the executor between two retries when disconnected (e.g., `250ms`, `1mins` etc.). This is configurable at agent startup via the flag `--executor_reregistration_timeout`.
+
+NOTE: Additionally, the executor also inherits all the agent's environment variables.
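As a rough illustration of how these variables feed the first request, the sketch below builds the endpoint URL and the `SUBSCRIBE` body from the environment. The field layout follows the `SUBSCRIBE` example in the Calls section; the function names are hypothetical, and the `http` scheme is an assumption.

```python
import json
import os


def executor_url(environ=os.environ):
    """Endpoint URL built from MESOS_AGENT_ENDPOINT (ip:port); http assumed."""
    return "http://{}/api/v1/executor".format(environ["MESOS_AGENT_ENDPOINT"])


def build_subscribe_call(environ=os.environ, tasks=(), updates=()):
    """Return the SUBSCRIBE call as a dict, ready to be JSON encoded.

    tasks and updates are the executor's unacknowledged tasks and status
    updates; both are empty on the very first subscription.
    """
    return {
        "type": "SUBSCRIBE",
        "executor_id": {"value": environ["MESOS_EXECUTOR_ID"]},
        "framework_id": {"value": environ["MESOS_FRAMEWORK_ID"]},
        "subscribe": {
            "unacknowledged_tasks": list(tasks),
            "unacknowledged_updates": list(updates),
        },
    }


if __name__ == "__main__":
    # Example values only; a real executor reads them from its environment.
    env = {
        "MESOS_AGENT_ENDPOINT": "agenthost:5051",
        "MESOS_EXECUTOR_ID": "387aa966-8fc5-4428-a794-5a868a60d3eb",
        "MESOS_FRAMEWORK_ID": "49154f1b-8cf6-4421-bf13-8bd11dccd1f1",
    }
    print(executor_url(env))
    print(json.dumps(build_subscribe_call(env), indent=2))
```

Passing the environment as a parameter keeps the helpers easy to exercise outside a running agent.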
+
+## Disconnections
+
+An executor considers itself disconnected if the persistent subscription connection (opened via SUBSCRIBE request) to "/executor" breaks. The disconnection can happen due to an agent process failure etc.
+
+Upon detecting a disconnection from the agent, the retry behavior depends on whether framework checkpointing is enabled:
+
+* If framework checkpointing is disabled, the executor should not retry subscription and should exit gracefully.
+* If framework checkpointing is enabled, the executor is supposed to retry subscription using a suitable [backoff strategy](#backoff-strategies) for a duration of `MESOS_RECOVERY_TIMEOUT`. If it is not able to establish a subscription with the agent within this duration, it should gracefully exit.
+
+## Agent Recovery
+
+Upon agent startup, an agent performs [recovery](/documentation/latest/slave-recovery/). This allows the agent to recover status updates and reconnect with old executors. Currently, the agent supports the following recovery mechanisms specified via the `--recover` flag:
+
+* **reconnect** (default): This mode allows the agent to reconnect with any of its old live executors provided the framework has enabled checkpointing. The recovery of the agent is only marked complete once all the disconnected executors have connected and hung executors have been destroyed. Hence, it is mandatory that every executor retries at least once within the interval (`MESOS_SUBSCRIPTION_BACKOFF_MAX`) to ensure it is not shut down by the agent due to being hung/unresponsive.
+* **cleanup** : This mode kills any old live executors and then exits the agent. This is usually done by operators when making a non-compatible slave/executor upgrade. Upon receiving a `SUBSCRIBE` request from the executor of a framework with checkpointing enabled, the agent would send it a `SHUTDOWN` event as soon as it reconnects. For hung executors, the agent would wait for a duration of `--executor_shutdown_grace_period` (configurable at agent startup) and then forcefully kill the container where the executor is running.
+
+
+## Backoff Strategies
+
+Executors are encouraged to retry subscription using a suitable backoff strategy, such as linear backoff, when they notice a disconnection from the agent. A disconnection typically happens when the agent process terminates (e.g., when it is restarted for an upgrade). Each retry interval should be bounded by the value of `MESOS_SUBSCRIPTION_BACKOFF_MAX`, which is set as an environment variable.
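Note on the backoff guidance in the file above: a minimal sketch of one way an executor could schedule its retries, assuming a linear backoff and assuming the two environment variables hold strings like the examples given (`250ms`, `5secs`, `15mins`). The helper names `parse_duration` and `linear_backoff_delays`, the `step` parameter, and the `hrs` unit are illustrative assumptions, not part of any Mesos API.

```python
import re

# Unit suffixes taken from the examples in the text (`250ms`, `5secs`,
# `15mins`); `hrs` is an assumption added for illustration.
_UNIT_SECONDS = {"ms": 0.001, "secs": 1.0, "mins": 60.0, "hrs": 3600.0}


def parse_duration(text):
    """Convert a duration string such as '250ms' or '15mins' to seconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s*", text)
    if match is None or match.group(2) not in _UNIT_SECONDS:
        raise ValueError("unrecognized duration: %r" % text)
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def linear_backoff_delays(step, backoff_max, recovery_timeout):
    """Yield retry delays step, 2*step, ..., each capped at backoff_max.

    Stops once the summed delays would exceed recovery_timeout; at that
    point the executor is expected to exit gracefully.
    """
    elapsed = 0.0
    attempt = 1
    while True:
        delay = min(step * attempt, backoff_max)
        if elapsed + delay > recovery_timeout:
            return
        yield delay
        elapsed += delay
        attempt += 1
```

In an executor, `backoff_max` and `recovery_timeout` would come from `parse_duration(os.environ["MESOS_SUBSCRIPTION_BACKOFF_MAX"])` and `parse_duration(os.environ["MESOS_RECOVERY_TIMEOUT"])`, and the loop would sleep for each yielded delay before re-sending `SUBSCRIBE`. Capping every delay at `backoff_max` is what keeps the executor from being treated as hung during agent recovery.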

Modified: mesos/site/source/documentation/latest/external-containerizer.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/external-containerizer.md?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/source/documentation/latest/external-containerizer.md (original)
+++ mesos/site/source/documentation/latest/external-containerizer.md Mon Feb  1 02:49:25 2016
@@ -4,6 +4,8 @@ layout: documentation
 
 # External Containerizer
 
+**NOTE:**  The external containerizer is deprecated. See
+[MESOS-3370](https://issues.apache.org/jira/browse/MESOS-3370) for details.
 
 * EC = external containerizer. A part of the mesos slave that provides
 an API for containerizing via external plugin executables.

Modified: mesos/site/source/documentation/latest/fetcher.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/fetcher.md?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/source/documentation/latest/fetcher.md (original)
+++ mesos/site/source/documentation/latest/fetcher.md Mon Feb  1 02:49:25 2016
@@ -11,10 +11,11 @@ from local file systems.
 
 ## What is the Mesos fetcher?
 
-The Mesos fetcher is a mechanism to download resources into the sandbox
-directory of a task in preparation of running the task. As part of a TaskInfo
-message, the framework ordering the task's execution provides a list of
-`CommandInfo::URI` protobuf values, which becomes the input to the Mesos fetcher.
+The Mesos fetcher is a mechanism to download resources into the [sandbox
+directory](/documentation/latest/sandbox/) of a task in preparation of running
+the task. As part of a TaskInfo message, the framework ordering the task's
+execution provides a list of `CommandInfo::URI` protobuf values, which becomes
+the input to the Mesos fetcher.
 
 The Mesos fetcher can copy files from a local filesytem and it also natively
 supports the HTTP, HTTPS, FTP and FTPS protocols. If the requested URI is based
@@ -220,6 +221,43 @@ If multiple evictions happen concurrentl
 separate space goals. However, leftover freed up space from one effort is
 automatically awarded to others.
 
+## HTTP and SOCKS proxy settings
+
+Sometimes it is desirable to use a proxy to download the file. The Mesos
+fetcher uses libcurl internally for downloading content from
+HTTP/HTTPS/FTP/FTPS servers, and libcurl can use a proxy automatically if
+certain environment variables are set.
+
+The respective environment variable name is `[protocol]_proxy`, where
+`protocol` can be one of socks4, socks5, http, https.
+
+For example, the value of the `http_proxy` environment variable would be used
+as the proxy for fetching http contents, while `https_proxy` would be used for
+fetching https contents. Note that these variable names must be entirely
+lowercase.
+
+The value of the proxy variable is of the format
+`[protocol://][user:password@]machine[:port]`, where `protocol` can be one of
+socks4, socks5, http, https.
+
+FTP/FTPS requests with a proxy also make use of an HTTP/HTTPS proxy. Even
+though in general this constrains the available FTP protocol operations,
+everything the fetcher uses is supported.
+
+Your proxy settings can be placed in `/etc/default/mesos-slave`. Here is an
+example:
+
+```
+export http_proxy=https://proxy.example.com:3128
+export https_proxy=https://proxy.example.com:3128
+```
+
+The fetcher will pick up these environment variable settings because the utility
+program `mesos-fetcher`, which it employs, is a child process of mesos-slave.
+
+For more details, please check the
+[libcurl manual](http://curl.haxx.se/libcurl/c/libcurl-tutorial.html).
+
 ## Slave flags
 
 It is highly recommended to set these flags explicitly to values other than
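Note on the proxy value format added in the fetcher.md hunk above (`[protocol://][user:password@]machine[:port]`): the sketch below shows one way to split such a value into its parts, for example to sanity-check a setting before placing it in `/etc/default/mesos-slave`. The function name `parse_proxy` and the regular expression are illustrative assumptions; this is not how libcurl parses the value, and IPv6 literals are not handled.

```python
import re

# Hypothetical pattern mirroring the documented format:
#   [protocol://][user:password@]machine[:port]
_PROXY_RE = re.compile(
    r"^(?:(?P<protocol>socks4|socks5|http|https)://)?"
    r"(?:(?P<user>[^:@/]+):(?P<password>[^@/]*)@)?"
    r"(?P<machine>[^:@/]+)"
    r"(?::(?P<port>\d+))?$"
)


def parse_proxy(value):
    """Split a proxy setting into a dict; absent optional parts are None."""
    match = _PROXY_RE.match(value)
    if match is None:
        raise ValueError("not a valid proxy value: %r" % value)
    parts = match.groupdict()
    if parts["port"] is not None:
        parts["port"] = int(parts["port"])
    return parts
```

For the example value from the hunk, `parse_proxy("https://proxy.example.com:3128")` yields protocol `https`, machine `proxy.example.com`, and port `3128`, with user and password left as `None`.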

Modified: mesos/site/source/documentation/latest/frameworks.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/frameworks.md?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/source/documentation/latest/frameworks.md (original)
+++ mesos/site/source/documentation/latest/frameworks.md Mon Feb  1 02:49:25 2016
@@ -26,9 +26,11 @@ layout: documentation
 * [Chronos](https://github.com/airbnb/chronos) is a distributed job scheduler that supports complex job topologies. It can be used as a more fault-tolerant replacement for Cron.
 * [Jenkins](https://github.com/jenkinsci/mesos-plugin) is a continuous integration server. The mesos-jenkins plugin allows it to dynamically launch workers on a Mesos cluster depending on the workload.
 * [JobServer](http://www.grandlogic.com/content/html_docs/jobserver.html) is a distributed job scheduler and processor  which allows developers to build custom batch processing Tasklets using point and click web UI.
+* [GoDocker](https://bitbucket.org/osallou/go-docker) is a batch computing job scheduler like SGE, Torque, etc. It schedules batch computing tasks via webui, API or CLI for system or LDAP users, mounting their home directory or other shared resources in a Docker container. It targets scientists, not developers, and provides plugin mechanisms to extend or modify the default behavior.
 
 ## Data Storage
 
 * [Cassandra](https://github.com/mesosphere/cassandra-mesos) is a performant and highly available distributed database. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.
 * [ElasticSearch](https://github.com/mesosphere/elasticsearch-mesos) is a distributed search engine. Mesos makes it easy to run and scale.
 * [Hypertable](https://code.google.com/p/hypertable/wiki/Mesos) is a high performance, scalable, distributed storage and processing system for structured and unstructured data.
+* [Tachyon](http://tachyon-project.org) is a memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster frameworks.

Modified: mesos/site/source/documentation/latest/getting-started.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/getting-started.md?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/source/documentation/latest/getting-started.md (original)
+++ mesos/site/source/documentation/latest/getting-started.md Mon Feb  1 02:49:25 2016
@@ -8,18 +8,20 @@ layout: documentation
 
 There are different ways you can get Mesos:
 
-1. Download the latest stable release from [Apache](http://mesos.apache.org/downloads/) (***Recommended***)
+1\. Download the latest stable release from [Apache](http://mesos.apache.org/downloads/) (***Recommended***)
 
-        $ wget http://www.apache.org/dist/mesos/0.24.0/mesos-0.24.0.tar.gz
-        $ tar -zxf mesos-0.24.0.tar.gz
+    $ wget http://www.apache.org/dist/mesos/0.27.0/mesos-0.27.0.tar.gz
+    $ tar -zxf mesos-0.27.0.tar.gz
 
-2. Clone the Mesos git [repository](https://git-wip-us.apache.org/repos/asf/mesos.git) (***Advanced Users Only***)
+2\. Clone the Mesos git [repository](https://git-wip-us.apache.org/repos/asf/mesos.git) (***Advanced Users Only***)
 
-        $ git clone https://git-wip-us.apache.org/repos/asf/mesos.git
+    $ git clone https://git-wip-us.apache.org/repos/asf/mesos.git
+
+*NOTE: If you have problems running the above commands, you may need to first run through the ***System Requirements*** section below to install the `wget`, `tar`, and `git` utilities for your system.*
 
 ## System Requirements
 
-Mesos runs on Linux (64 Bit) and Mac OS X (64 Bit).
+Mesos runs on Linux (64 Bit) and Mac OS X (64 Bit). To build Mesos from source, GCC 4.8.1+ or Clang 3.5+ is required.
 
 For full support of process isolation under Linux a recent kernel >=3.10 is required.
 
@@ -29,155 +31,190 @@ Make sure your hostname is resolvable vi
 
 Following are the instructions for stock Ubuntu 14.04. If you are using a different OS, please install the packages accordingly.
 
-        # Update the packages.
-        $ sudo apt-get update
+    # Update the packages.
+    $ sudo apt-get update
+
+    # Install a few utility tools.
+    $ sudo apt-get install -y tar wget git
+
+    # Install the latest OpenJDK.
+    $ sudo apt-get install -y openjdk-7-jdk
+
+    # Install autotools (Only necessary if building from git repository).
+    $ sudo apt-get install -y autoconf libtool
 
-        # Install the latest OpenJDK.
-        $ sudo apt-get install -y openjdk-7-jdk
+    # Install other Mesos dependencies.
+    $ sudo apt-get -y install build-essential python-dev python-boto libcurl4-nss-dev libsasl2-dev libsasl2-modules maven libapr1-dev libsvn-dev
 
-        # Install autotools (Only necessary if building from git repository).
-        $ sudo apt-get install -y autoconf libtool
+### Mac OS X Yosemite & El Capitan
 
-        # Install other Mesos dependencies.
-        $ sudo apt-get -y install build-essential python-dev python-boto libcurl4-nss-dev libsasl2-dev maven libapr1-dev libsvn-dev
+Following are the instructions for stock Mac OS X Yosemite and El Capitan. If you are using a different OS, please install the packages accordingly.
 
-### Mac OS X Yosemite
+    # Install Command Line Tools.
+    $ xcode-select --install
 
-Following are the instructions for stock Mac OS X Yosemite. If you are using a different OS, please install the packages accordingly.
+    # Install Homebrew.
+    $ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
 
-        # Install Command Line Tools.
-        $ xcode-select --install
+    # Install Java.
+    $ brew install Caskroom/cask/java
 
-        # Install Homebrew.
-        $ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
+    # Install libraries.
+    $ brew install wget git autoconf automake libtool subversion maven
 
-        # Install libraries.
-        $ brew install autoconf automake libtool subversion maven
+*NOTE: When upgrading from Yosemite to El Capitan, make sure to rerun `xcode-select --install` after the upgrade.*
 
 ### CentOS 6.6
 
 Following are the instructions for stock CentOS 6.6. If you are using a different OS, please install the packages accordingly.
 
-        # Install a recent kernel for full support of process isolation.
-        $ sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
-        $ sudo rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
-        $ sudo yum --enablerepo=elrepo-kernel install -y kernel-lt
-
-        # Make the just installed kernel the one booted by default, and reboot.
-        $ sudo sed -i 's/default=1/default=0/g' /boot/grub/grub.conf
-        $ sudo reboot
-
-        # Install a few utility tools. This also forces an update of `nss`,
-        # which is necessary for the Java bindings to build properly.
-        $ sudo yum install -y tar wget which nss
-
-        # 'Mesos > 0.21.0' requires a C++ compiler with full C++11 support,
-        # (e.g. GCC > 4.8) which is available via 'devtoolset-2'.
-        # Fetch the Scientific Linux CERN devtoolset repo file.
-        $ sudo wget -O /etc/yum.repos.d/slc6-devtoolset.repo http://linuxsoft.cern.ch/cern/devtoolset/slc6-devtoolset.repo
-
-        # Import the CERN GPG key.
-        $ sudo rpm --import http://linuxsoft.cern.ch/cern/centos/7/os/x86_64/RPM-GPG-KEY-cern
-
-        # Fetch the Apache Maven repo file.
-        $ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
-
-        # 'Mesos > 0.21.0' requires 'subversion > 1.8' devel package, which is
-        # not available in the default repositories.
-        # Add the WANdisco SVN repo file: '/etc/yum.repos.d/wandisco-svn.repo' with content:
-
-          [WANdiscoSVN]
-          name=WANdisco SVN Repo 1.8
-          enabled=1
-          baseurl=http://opensource.wandisco.com/centos/6/svn-1.8/RPMS/$basearch/
-          gpgcheck=1
-          gpgkey=http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco
-
-        # Install essential development tools.
-        $ sudo yum groupinstall -y "Development Tools"
-
-        # Install 'devtoolset-2-toolchain' which includes GCC 4.8.2 and related packages.
-        $ sudo yum install -y devtoolset-2-toolchain
-
-        # Install other Mesos dependencies.
-        $ sudo yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel
-
-        # Enter a shell with 'devtoolset-2' enabled.
-        $ scl enable devtoolset-2 bash
-        $ g++ --version  # Make sure you've got GCC > 4.8!
+    # Install a recent kernel for full support of process isolation.
+    $ sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
+    $ sudo rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
+    $ sudo yum --enablerepo=elrepo-kernel install -y kernel-lt
+
+    # Make the just installed kernel the one booted by default, and reboot.
+    $ sudo sed -i 's/default=1/default=0/g' /boot/grub/grub.conf
+    $ sudo reboot
+
+    # Install a few utility tools. This also forces an update of `nss`,
+    # which is necessary for the Java bindings to build properly.
+    $ sudo yum install -y tar wget git which nss
+
+    # 'Mesos > 0.21.0' requires a C++ compiler with full C++11 support,
+    # (e.g. GCC > 4.8) which is available via 'devtoolset-2'.
+    # Fetch the Scientific Linux CERN devtoolset repo file.
+    $ sudo wget -O /etc/yum.repos.d/slc6-devtoolset.repo http://linuxsoft.cern.ch/cern/devtoolset/slc6-devtoolset.repo
+
+    # Import the CERN GPG key.
+    $ sudo rpm --import http://linuxsoft.cern.ch/cern/centos/7/os/x86_64/RPM-GPG-KEY-cern
+
+    # Fetch the Apache Maven repo file.
+    $ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
+
+    # 'Mesos > 0.21.0' requires 'subversion > 1.8' devel package, which is
+    # not available in the default repositories.
+    # Create a WANdisco SVN repo file to install the correct version:
+    $ sudo tee /etc/yum.repos.d/wandisco-svn.repo > /dev/null <<'EOF'
+    [WANdiscoSVN]
+    name=WANdisco SVN Repo 1.8
+    enabled=1
+    baseurl=http://opensource.wandisco.com/centos/6/svn-1.8/RPMS/$basearch/
+    gpgcheck=1
+    gpgkey=http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco
+    EOF
+
+    # Install essential development tools.
+    $ sudo yum groupinstall -y "Development Tools"
+
+    # Install 'devtoolset-2-toolchain' which includes GCC 4.8.2 and related packages.
+    $ sudo yum install -y devtoolset-2-toolchain
+
+    # Install other Mesos dependencies.
+    $ sudo yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel
+
+    # Enter a shell with 'devtoolset-2' enabled.
+    $ scl enable devtoolset-2 bash
+    $ g++ --version  # Make sure you've got GCC > 4.8!
+
+    # Process isolation is using cgroups that are managed by 'cgconfig'.
+    # The 'cgconfig' service is not started by default on CentOS 6.6.
+    # Also the default configuration does not attach the 'perf_event' subsystem.
+    # To do this, add 'perf_event = /cgroup/perf_event;' to the entries in '/etc/cgconfig.conf'.
+    $ sudo yum install -y libcgroup
+    $ sudo service cgconfig start
 
 ### CentOS 7.1
 
 Following are the instructions for stock CentOS 7.1. If you are using a different OS, please install the packages accordingly.
 
-        # Install a few utility tools
-        $ sudo yum install -y tar wget
+    # Install a few utility tools
+    $ sudo yum install -y tar wget git
 
-        # Fetch the Apache Maven repo file.
-        $ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
+    # Fetch the Apache Maven repo file.
+    $ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
 
-        # 'Mesos > 0.21.0' requires 'subversion > 1.8' devel package, which is
-        # not available in the default repositories.
-        # Add the WANdisco SVN repo file: '/etc/yum.repos.d/wandisco-svn.repo' with content:
-
-          [WANdiscoSVN]
-          name=WANdisco SVN Repo 1.9
-          enabled=1
-          baseurl=http://opensource.wandisco.com/centos/7/svn-1.9/RPMS/$basearch/
-          gpgcheck=1
-          gpgkey=http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco
+    # Install the EPEL repo so that we can pull in 'libserf-1' as part of our
+    # subversion install below.
+    $ sudo yum install -y epel-release
+
+    # 'Mesos > 0.21.0' requires 'subversion > 1.8' devel package,
+    # which is not available in the default repositories.
+    # Create a WANdisco SVN repo file to install the correct version:
+    $ sudo tee /etc/yum.repos.d/wandisco-svn.repo > /dev/null <<'EOF'
+    [WANdiscoSVN]
+    name=WANdisco SVN Repo 1.9
+    enabled=1
+    baseurl=http://opensource.wandisco.com/centos/7/svn-1.9/RPMS/$basearch/
+    gpgcheck=1
+    gpgkey=http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco
+    EOF
+
+    # Parts of Mesos require systemd in order to operate. However, Mesos
+    # only supports versions of systemd that contain the 'Delegate' flag.
+    # This flag was first introduced in 'systemd version 218', which is
+    # newer than the default version installed by CentOS. Luckily, CentOS
+    # 7.1 has a patched 'systemd < 218' that contains the 'Delegate' flag.
+    # Explicitly update systemd to this patched version.
+    $ sudo yum update systemd
 
-        # Install essential development tools.
-        $ sudo yum groupinstall -y "Development Tools"
+    # Install essential development tools.
+    $ sudo yum groupinstall -y "Development Tools"
 
-        # Install other Mesos dependencies.
-        $ sudo yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel
+    # Install other Mesos dependencies.
+    $ sudo yum install -y apache-maven python-devel java-1.8.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel
 
 ## Building Mesos
 
-        # Change working directory.
-        $ cd mesos
+    # Change working directory.
+    $ cd mesos
 
-        # Bootstrap (Only required if building from git repository).
-        $ ./bootstrap
+    # Bootstrap (Only required if building from git repository).
+    $ ./bootstrap
 
-        # Configure and build.
-        $ mkdir build
-        $ cd build
-        $ ../configure
-        $ make
+    # Configure and build.
+    $ mkdir build
+    $ cd build
+    $ ../configure
+    $ make
 
 In order to speed up the build and reduce verbosity of the logs, you can append `-j <number of cores> V=0` to `make`.
 
-        # Run test suite.
-        $ make check
+    # Run test suite.
+    $ make check
 
-        # Install (Optional).
-        $ make install
+    # Install (Optional).
+    $ make install
 
 ## Examples
 
 Mesos comes bundled with example frameworks written in C++, Java and Python.
+The framework binaries will only be available after running `make check`, as
+described in the ***Building Mesos*** section above.
 
-        # Change into build directory.
-        $ cd build
+    # Change into build directory.
+    $ cd build
 
-        # Start mesos master (Ensure work directory exists and has proper permissions).
-        $ ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos
+    # Start mesos master (Ensure work directory exists and has proper permissions).
+    $ ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos
 
-        # Start mesos slave.
-        $ ./bin/mesos-slave.sh --master=127.0.0.1:5050
+    # Start mesos slave.
+    $ ./bin/mesos-slave.sh --master=127.0.0.1:5050
 
-        # Visit the mesos web page.
-        $ http://127.0.0.1:5050
+    # Visit the mesos web page.
+    $ http://127.0.0.1:5050
 
-        # Run C++ framework (Exits after successfully running some tasks.).
-        $ ./src/test-framework --master=127.0.0.1:5050
+    # Run C++ framework (Exits after successfully running some tasks.).
+    $ ./src/test-framework --master=127.0.0.1:5050
 
-        # Run Java framework (Exits after successfully running some tasks.).
-        $ ./src/examples/java/test-framework 127.0.0.1:5050
+    # Run Java framework (Exits after successfully running some tasks.).
+    $ ./src/examples/java/test-framework 127.0.0.1:5050
 
-        # Run Python framework (Exits after successfully running some tasks.).
-        $ ./src/examples/python/test-framework 127.0.0.1:5050
+    # Run Python framework (Exits after successfully running some tasks.).
+    $ ./src/examples/python/test-framework 127.0.0.1:5050
 
-*NOTE: To build the example frameworks, make sure you build the test suite by doing `make check`.*
+*Note: These examples assume you are running Mesos on your local machine.
+Following them will not allow you to access the Mesos web page in a production
+environment (e.g. on AWS). For that you will need to specify the actual IP of
+your host when launching the Mesos master and ensure your firewall settings
+allow access to port 5050 from the outside world.*

Added: mesos/site/source/documentation/latest/high-availability-framework-guide.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/high-availability-framework-guide.md?rev=1727886&view=auto
==============================================================================
--- mesos/site/source/documentation/latest/high-availability-framework-guide.md (added)
+++ mesos/site/source/documentation/latest/high-availability-framework-guide.md Mon Feb  1 02:49:25 2016
@@ -0,0 +1,293 @@
+---
+layout: documentation
+---
+
+# Designing Highly Available Mesos Frameworks
+
+A Mesos framework manages tasks. For a Mesos framework to be highly available,
+it must continue to manage tasks correctly in the presence of a variety of
+failure scenarios. The most common failure conditions that framework authors
+should consider include:
+
+* The Mesos master that a framework scheduler is connected to might fail, for
+  example by crashing or by losing network connectivity. If the master has been
+  configured to use [high-availability mode](/documentation/latest/high-availability/), this will
+  result in promoting another Mesos master replica to become the current
+  leader. In this situation, the scheduler should re-register with the new
+  master and ensure that task state is consistent.
+
+* The host where a framework scheduler is running might fail. To ensure that the
+  framework remains available and can continue to schedule new tasks, framework
+  authors should ensure that multiple copies of the scheduler run on different
+  nodes, and that a backup copy is promoted to become the new leader when the
+  previous leader fails. Mesos itself does not dictate how framework authors
+  should handle this situation, although we provide some suggestions below. It
+  can be useful to deploy multiple copies of your framework scheduler using
+  a long-running task scheduler such as Apache Aurora or Marathon.
+
+* The host where a task is running might fail. Alternatively, the node itself
+  might not have failed but the Mesos agent on the node might be unable to
+  communicate with the Mesos master, e.g., due to a network partition.
+
+Note that more than one of these failures might occur simultaneously.
+
+## Mesos Architecture
+
+Before discussing the specific failure scenarios outlined above, it is worth
+highlighting some aspects of how Mesos is designed that influence high
+availability:
+
+* Mesos provides unreliable messaging between components by default: messages
+  are delivered "at-most-once" (they might be dropped). Framework authors should
+  expect that messages they send might not be received and be prepared to take
+  appropriate corrective action. To detect that a message might be lost,
+  frameworks typically use timeouts. For example, if a framework attempts to
+  launch a task, that message might not be received by the Mesos master (e.g.,
+  due to a transient network failure). To address this, the framework scheduler
+  should set a timeout after attempting to launch a new task. If the scheduler
+  hasn't seen a status update for the new task before the timeout fires, it
+  should take corrective action---for example, by performing [task state reconciliation](/documentation/latest/reconciliation/),
+  and then launching a new copy of the task if necessary.
+
+  * In general, distributed systems cannot distinguish between "lost" messages
+    and messages that are merely delayed. In the example above, the scheduler
+    might see a status update for the first task launch attempt immediately
+    _after_ its timeout has fired and it has already begun taking corrective
+    action. Scheduler authors should be aware of this possibility and program
+    accordingly.
+
+  * Mesos actually provides ordered (but unreliable) message delivery between
+    any pair of processes: for example, if a framework sends messages M1 and
+    M2 to the master, the master might receive no messages, just M1, just M2, or
+    M1 followed by M2 -- it will _not_ receive M2 followed by M1.
+
+  * As a convenience for framework authors, Mesos provides reliable delivery of
+    task status updates. The agent persists task status updates to disk and then
+    forwards them to the master. The master sends status updates to the
+    appropriate framework scheduler. When a scheduler acknowledges a status
+    update, the master forwards the acknowledgment back to the agent, which
+    allows the stored status update to be garbage collected. If the agent does
+    not receive an acknowledgment for a task status update within a certain
+    amount of time, it will repeatedly resend the status update to the master,
+    which will again forward the update to the scheduler. Hence, task status
+    updates will be delivered "at least once", assuming that the agent and the
+    scheduler both remain available. To handle the fact that task status updates
+    might be delivered more than once, it can be helpful to make the framework
+    logic that processes them [idempotent](https://en.wikipedia.org/wiki/Idempotence).
+
+* The Mesos master stores information about the active tasks and registered
+  frameworks _in memory_: it does not persist it to disk or attempt to ensure
+  that this information is preserved after a master failover. This helps the
+  Mesos master scale to large clusters with many tasks and frameworks. A
+  downside of this design is that after a failure, more work is required to
+  recover the lost in-memory master state.
+
+* If all the Mesos masters are unavailable (e.g., crashed or unreachable), the
+  cluster should continue to operate: existing Mesos agents and user tasks should
+  continue running. However, new tasks cannot be scheduled, and frameworks will
+  not receive resource offers or status updates about previously launched tasks.
+
+* Mesos does not dictate how frameworks should be implemented and does not try
+  to assume responsibility for how frameworks should deal with failures.
+  Instead, Mesos tries to provide framework developers with the tools they need
+  to implement this behavior themselves. Different frameworks might choose to
+  handle failures differently, depending on their exact requirements.
+
+## Recommendations for Highly Available Frameworks
+
+Highly available framework designs typically follow a few common patterns:
+
+1. To tolerate scheduler failures, frameworks run multiple scheduler instances
+   (three instances is typical). At any given time, only one of these scheduler
+   instances is the _leader_: this instance is connected to the Mesos master,
+   receives resource offers and task status updates, and launches new tasks. The
+   other scheduler replicas are _followers_: they are used only when the leader
+   fails, in which case one of the followers is chosen to become the new leader.
+
+2. Schedulers need a mechanism to decide when the current scheduler leader has
+   failed and to elect a new leader. This is typically accomplished using a
+   coordination service like [Apache ZooKeeper](https://zookeeper.apache.org/)
+   or [etcd](https://github.com/coreos/etcd). Consult the documentation of the
+   coordination system you are using for more information on how to correctly
+   implement leader election.
+
+3. After electing a new leading scheduler, the new leader needs to ensure that
+   its local state is consistent with the current state of the cluster. For
+   example, suppose that the previous leading scheduler attempted to launch a
+   new task and then immediately failed. The task might have launched
+   successfully, at which point the newly elected leader will begin to receive
+   status updates about it. To handle this situation, frameworks typically use a
+   strongly consistent distributed data store to record information about active
+   and pending tasks. In fact, the same coordination service that is used for
+   leader election (such as ZooKeeper or etcd) can often be used for this
+   purpose. Some Mesos frameworks (such as Apache Aurora) use the Mesos
+   replicated log for this purpose.
+
+   * The data store should be used to record the actions that the scheduler
+     _intends_ to take, before it takes them. For example, if a scheduler
+     decides to launch a new task, it _first_ writes this intent to its data
+     store. Then it sends a "launch task" message to the Mesos master. If this
+     instance of the scheduler fails and a new scheduler is promoted to become
+     the leader, the new leader can consult the data store to find _all possible
+     tasks_ that might be running on the cluster. This is an instance of the
+     [write-ahead logging](https://en.wikipedia.org/wiki/Write-ahead_logging)
+     pattern often employed by database systems and filesystems to improve
+     reliability. Two aspects of this design are worth emphasizing.
+
+     1. The scheduler must persist its intent _before_ launching the task: if
+        the task is launched first and then the scheduler fails before it can
+        write to the data store, the new leading scheduler won't know about the
+        new task. If this occurs, the new scheduler instance will begin
+        receiving task status updates for a task that it has no knowledge of;
+        there is often not a good way to recover from this situation.
+
+     2. The scheduler should ensure that its intent has been durably
+        recorded in the data store before continuing to launch the task (for
+        example, it should wait for a quorum of replicas in the data store to
+        have acknowledged receipt of the write operation). For more details on
+        how to do this, consult the documentation for the data store you are
+        using.
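
As a rough illustration of the ordering described above, the following Python sketch records the intent before sending the launch request. The `IntentStore` class, its `persist` method, and the `launch` callback are hypothetical stand-ins, not part of any Mesos API; a real framework would write to a replicated store such as ZooKeeper or etcd and wait for a quorum acknowledgement.

```python
class IntentStore:
    """Hypothetical in-memory stand-in for a durable, replicated data store."""

    def __init__(self):
        self.intents = {}

    def persist(self, task_id, task_spec):
        # A real store would return only after a quorum of replicas
        # has acknowledged the write.
        self.intents[task_id] = task_spec

    def possible_tasks(self):
        # Every task that might be running, whether or not the launch succeeded.
        return set(self.intents)


def launch_with_intent(store, launch, task_id, task_spec):
    """Persist the intent first, then send the launch request."""
    store.persist(task_id, task_spec)
    launch(task_id, task_spec)


store = IntentStore()
sent = []


def failing_launch(task_id, task_spec):
    # Simulate the scheduler crashing right after sending the launch message.
    sent.append(task_id)
    raise RuntimeError("scheduler crashed")


try:
    launch_with_intent(store, failing_launch, "task-1", {"cpus": 1})
except RuntimeError:
    pass

# A newly elected leader can still discover that "task-1" may exist.
recovered = store.possible_tasks()
```

Because the write happens first, the worst case is an intent with no matching task, which reconciliation can resolve; the reverse order can leave a running task that no scheduler knows about.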
+
+## The Life Cycle of a Task
+
+A Mesos task transitions through a sequence of states. The authoritative "source
+of truth" for the current state of a task is the agent on which the task is
+running. A framework scheduler learns about the current state of a task by
+communicating with the Mesos master---specifically, by listening for task status
+updates and by performing task state reconciliation.
+
+Frameworks can represent the state of a task using a state machine, with one
+initial state and several possible terminal states:
+
+* A task begins in the `TASK_STAGING` state. A task is in this state when the
+  master has received the framework's request to launch the task but the task
+  has not yet started to run. In this state, the task's dependencies are
+  fetched---for example, using the [Mesos fetcher cache](/documentation/latest/fetcher/).
+
+* The `TASK_STARTING` state is optional and intended primarily for use by
+  custom executors. It can be used to describe the fact that a custom executor
+  has learned about the task (and maybe started fetching its dependencies) but has
+  not yet started to run it.
+
+* A task transitions to the `TASK_RUNNING` state after it starts running
+  successfully (if the task fails to start, it transitions to one of the
+  terminal states listed below).
+
+  * If a framework attempts to launch a task but does not receive a status
+    update for it within a timeout, the framework should perform
+    [reconciliation](/documentation/latest/reconciliation/). That is, it should ask the master for
+    the current state of the task. The master will reply with `TASK_LOST` for
+    unknown tasks. The framework can then use this to distinguish between tasks
+    that are slow to launch and tasks that the master has never heard about
+    (e.g., because the task launch message was dropped).
+
+    * Note that the correctness of this technique depends on the fact that
+      messaging between the scheduler and the master is ordered.
+
+* There are several terminal states:
+
+  * `TASK_FINISHED` is used when a task completes successfully.
+  * `TASK_FAILED` indicates that a task aborted with an error.
+  * `TASK_KILLED` indicates that a task was killed by the executor.
+  * `TASK_LOST` indicates that the task was running on an agent that has lost
+    contact with the current master (typically due to a network partition or the
+    agent host crashing). This case is described further below.
+  * `TASK_ERROR` indicates that a task launch attempt failed because of an error
+    in the task specification.
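
A framework might track these states along the following lines. This is a minimal sketch: the `TERMINAL_STATES` set mirrors the list above, while `is_terminal`, `needs_reconciliation`, and the timeout value are illustrative assumptions rather than Mesos APIs.

```python
TERMINAL_STATES = {
    "TASK_FINISHED",
    "TASK_FAILED",
    "TASK_KILLED",
    "TASK_LOST",
    "TASK_ERROR",
}


def is_terminal(state):
    """Return True if no further transitions are expected from this state."""
    return state in TERMINAL_STATES


def needs_reconciliation(last_update_time, launch_time, now, timeout_secs):
    """Return True if no status update arrived within the timeout after launch."""
    if last_update_time is not None:
        # At least one status update was received; nothing to reconcile yet.
        return False
    return now - launch_time >= timeout_secs
```

A scheduler could call `needs_reconciliation` periodically for each launched task and ask the master for the task's state whenever it returns `True`.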
+
+## Dealing with Partitioned or Failed Agents
+
+The Mesos master tracks the availability and health of the registered agents
+using two different mechanisms:
+
+ 1) The state of a persistent TCP connection to the agent.
+
+ 2) Health checks: the master periodically sends ping messages to the agent and expects a pong in response
+    (this behavior is controlled by the `--slave_ping_timeout` and `--max_slave_ping_timeouts` master flags).
+
+If the persistent TCP connection to the agent breaks or the agent fails health checks, the master decides
+that the agent has failed and takes steps to remove it from the cluster. Specifically:
+
+* If the TCP connection breaks, the agent is considered disconnected. The semantics when a registered
+  agent gets disconnected are as follows for each framework running on that agent:
+
+  * If the framework is [checkpointing](/documentation/latest/slave-recovery/): No immediate action is taken. The agent is
+    given a chance to reconnect until health checks time out.
+
+  * If the framework is not checkpointing: All the framework's tasks and executors are considered lost. The master
+    immediately sends `TASK_LOST` status updates for the tasks. These updates are not delivered reliably to the
+    scheduler (see NOTE below). The agent is given a chance to reconnect until health checks time out.
+
+* If the agent fails health checks, it is scheduled for removal. Removals can be rate limited by the master
+  (see the `--slave_removal_rate_limit` master flag) to avoid removing a large number of agents at once (e.g., during a
+  network partition event).
+
+* Once it is time to remove an agent, the master marks it as "removed" in the master's durable state (this
+  will survive master failover). If an agent marked as "removed" attempts to reconnect to the
+  master (e.g., after the network partition is healed), the connection attempt will be refused
+  and the agent will be asked to shut down. An agent that is shutting down stops all running tasks and executors,
+  but any persistent volumes and dynamic reservations are preserved.
+
+  * To allow the removed agent node to rejoin the cluster, a new `mesos-slave`
+    process can be started. This ensures that the agent receives a new agent ID and registers with the master,
+    possibly with previously created persistent volumes and dynamic reservations. In effect, the agent will
+    be treated as a newly joined agent.
+
+* For each agent that is marked "removed", the scheduler receives a `slaveLost` callback and `TASK_LOST` status
+  updates for each task that was running on the agent.
+
+  > NOTE: Neither the callback nor the updates are reliably delivered by the master. For example, if
+  > the master or scheduler fails over or there is a network connection issue during the delivery
+  > of these messages, they will not be resent.
+
+Typically, frameworks respond to this situation by scheduling new copies of the
+tasks that were running on the lost agent. This should be done with caution,
+however: it is possible that the lost agent is still alive, but is partitioned
+from the master and is unable to communicate with it. Depending on the nature of
+the network partition, tasks on the agent might still be able to communicate
+with external clients or other hosts in the cluster. Frameworks can take steps
+to prevent this (e.g., by having tasks connect to ZooKeeper and cease operation
+if their ZooKeeper session expires), but Mesos leaves such details to framework
+authors.
+
+## Dealing with Partitioned or Failed Masters
+
+The behavior described above does not apply during the period immediately after
+a new Mesos master is elected. As noted above, most Mesos master state is kept
+in-memory; hence, when the leading master fails and a new master is elected, the
+new master will have little knowledge of the current state of the cluster.
+Instead, it rebuilds this information as the frameworks and agents notice that a
+new master has been elected and then _reregister_ with it.
+
+### Framework Reregistration
+When master failover occurs, frameworks that were connected to the previous
+leading master should reconnect to the new leading master. The
+`MesosSchedulerDriver` handles most of the details of detecting when the
+previous leading master has failed and connecting to the new leader; when the
+framework has successfully reregistered with the new leading master, the
+`reregistered` scheduler callback will be invoked.
+
+When a highly available framework scheduler initially connects to the master, it
+should set the `failover_timeout` field in its `FrameworkInfo`. This specifies
+how long the master will wait for a framework to reconnect after a failover
+before the framework's state is garbage-collected and any running tasks
+associated with the framework are killed. It is recommended that frameworks set
+a generous `failover_timeout` (e.g., 1 week) to avoid their tasks being killed
+unintentionally.
+
+### Agent Reregistration
+During the period after a new master has been elected but before a given agent
+has reregistered or the `slave_reregister_timeout` has fired, attempting to
+reconcile the state of a task running on that agent will not return any
+information (because the master cannot accurately determine the state of the
+task).
+
+If an agent does not reregister with the new master within a timeout (controlled
+by the `--slave_reregister_timeout` configuration flag), the master marks the
+agent as failed and follows the same steps described above. However, there is
+one difference: by default, agents are _allowed to reconnect_ following master
+failover, even after the `slave_reregister_timeout` has fired. This means that
+frameworks might see a `TASK_LOST` update for a task but then later discover
+that the task is running (because the agent where it was running was allowed to
+reconnect). This behavior can be avoided by enabling the `--registry_strict`
+configuration flag, which will be the default in a future version of Mesos.
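
Because an agent may reconnect after its tasks were reported lost, a scheduler's bookkeeping should tolerate a `TASK_LOST` update that is later superseded. The sketch below is one hypothetical way to do this in Python; `TaskTracker` and its methods are illustrative names, not part of the Mesos scheduler API.

```python
class TaskTracker:
    """Tracks the latest known state per task, tolerating LOST -> RUNNING."""

    def __init__(self):
        self.states = {}
        self.resurrected = set()

    def on_status_update(self, task_id, state):
        previous = self.states.get(task_id)
        if previous == "TASK_LOST" and state == "TASK_RUNNING":
            # The agent reconnected after the task was reported lost. The
            # framework may already have launched a replacement, so record
            # the task for follow-up (e.g., killing the duplicate).
            self.resurrected.add(task_id)
        self.states[task_id] = state


tracker = TaskTracker()
tracker.on_status_update("task-1", "TASK_RUNNING")
tracker.on_status_update("task-1", "TASK_LOST")
tracker.on_status_update("task-1", "TASK_RUNNING")
```

What to do with a resurrected task (keep it, or kill it in favor of its replacement) is a policy decision left to the framework.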

Modified: mesos/site/source/documentation/latest/high-availability.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/high-availability.md?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/source/documentation/latest/high-availability.md (original)
+++ mesos/site/source/documentation/latest/high-availability.md Mon Feb  1 02:49:25 2016
@@ -2,10 +2,12 @@
 layout: documentation
 ---
 
-# Mesos High Availability Mode
+# Mesos High-Availability Mode
 
 If the Mesos master is unavailable, existing tasks can continue to execute, but new resources cannot be allocated and new tasks cannot be launched. To reduce the chance of this situation occurring, Mesos has a high-availability mode that uses multiple Mesos masters: one active master (called the _leader_ or leading master) and several _backups_ in case it fails. The masters elect the leader, with [Apache ZooKeeper](http://zookeeper.apache.org/) both coordinating the election and handling leader detection by masters, slaves, and scheduler drivers. More information regarding [how leader election works](http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection) is available on the Apache Zookeeper website.
 
+This document describes how to configure Mesos to run in high-availability mode. For more information on developing highly available frameworks, see a [companion document](/documentation/latest/high-availability-framework-guide/).
+
 **Note**: This document assumes you know how to start, run, and work with ZooKeeper, whose client library is included in the standard Mesos build.
 
 ## Usage
@@ -19,11 +21,11 @@ To put Mesos into high-availability mode
 
     * Start the mesos-slave binaries with `--master=zk://host1:port1,host2:port2,.../path`
 
-    * Start any framework schedulers using the same `zk` path as in the last two steps. The SchedulerDriver must be constructed with this path, as shown in the [Framework Development Guide]( http://mesos.apache.org/documentation/latest/app-framework-development-guide/).
+    * Start any framework schedulers using the same `zk` path as in the last two steps. The SchedulerDriver must be constructed with this path, as shown in the [Framework Development Guide](/documentation/latest/app-framework-development-guide/).
 
 From now on, the Mesos masters and slaves all communicate with ZooKeeper to find out which master is the current leading master. This is in addition to the usual communication between the leading master and the slaves.
 
-Refer to the [Scheduler API](http://mesos.apache.org/documentation/latest/app-framework-development-guide/) for how to deal with leadership changes.
+Refer to the [Scheduler API](/documentation/latest/app-framework-development-guide/) for how to deal with leadership changes.
 
 ## Component Disconnection Handling
 When a network partition disconnects a component (master, slave, or scheduler driver) from ZooKeeper, the component's Master Detector induces a timeout event. This notifies the component that it has no leading master. Depending on the component, the following happens. (Note that while a component is disconnected from ZooKeeper, a master may still be in communication with slaves or schedulers and vice versa.)
@@ -43,7 +45,7 @@ When a network partition disconnects a s
 
 * The slave fails health checks from the leader.
 
-* The leader marks the slave as deactivated and sends its tasks to the LOST state. The  [Framework Development Guide](http://mesos.apache.org/documentation/latest/app-framework-development-guide/) describes these various task states.
+* The leader marks the slave as deactivated and sends its tasks to the LOST state. The  [Framework Development Guide](/documentation/latest/app-framework-development-guide/) describes these various task states.
 
 * Deactivated slaves may not re-register with the leader and are told to shut down upon any post-deactivation communication.
 

Added: mesos/site/source/documentation/latest/logging.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/logging.md?rev=1727886&view=auto
==============================================================================
--- mesos/site/source/documentation/latest/logging.md (added)
+++ mesos/site/source/documentation/latest/logging.md Mon Feb  1 02:49:25 2016
@@ -0,0 +1,192 @@
+---
+layout: documentation
+---
+
+# Logging
+
+Mesos handles the logs of each Mesos component differently depending on the
+degree of control Mesos has over the source code of the component.
+
+Roughly, these categories are:
+
+* [Internal](#Internal) - Master and Agent.
+* [Containers](#Containers) - Executors and Tasks.
+* External - Components launched outside of Mesos, like
+  Frameworks and [ZooKeeper](/documentation/latest/high-availability/).  These are expected to
+  implement their own logging solution.
+
+## <a name="Internal"></a>Internal
+
+The Mesos Master and Agent use
+[Google's logging library](https://github.com/google/glog).
+For information regarding the command-line options used to configure this
+library, see the [configuration documentation](/documentation/latest/configuration/).
+Google logging options that are not explicitly mentioned there can be
+configured via environment variables.
+
+Both Master and Agent also expose an HTTP endpoint which temporarily toggles
+verbose logging:
+
+```
+POST <ip:port>/logging/toggle?level=[1|2|3]&duration=VALUE
+```
+
+The effect is analogous to setting the `GLOG_v` environment variable prior
+to starting the Master/Agent, except the logging level will revert to the
+original level after the given duration.
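
As an illustration, the request could be built from Python's standard library as sketched below. The address `127.0.0.1:5050` and the duration string `5mins` are assumptions chosen for the example, and `build_toggle_request` is a hypothetical helper; only the `/logging/toggle` path and its `level` and `duration` parameters come from the endpoint description above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_toggle_request(host, port, level, duration):
    """Build (but do not send) a POST request for /logging/toggle."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2, or 3")
    query = urlencode({"level": level, "duration": duration})
    url = "http://{}:{}/logging/toggle?{}".format(host, port, query)
    # An empty body with an explicit method makes this a POST request.
    return Request(url, data=b"", method="POST")


# Assumed address and duration format, for illustration only.
request = build_toggle_request("127.0.0.1", 5050, 1, "5mins")
# urllib.request.urlopen(request) would send it to a running Master or Agent.
```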
+
+## <a name="Containers"></a>Containers
+
+For background, see [the containerizer documentation](/documentation/latest/containerizer/).
+
+Mesos does not assume any structured logging for entities running inside
+containers.  Instead, Mesos will store the stdout and stderr of containers
+into plain files ("stdout" and "stderr") located inside
+[the sandbox](sandbox.md#where-is-it).
+
+In some cases, the default container logging behavior of Mesos is not ideal:
+
+* Logging may not be standardized across containers.
+* Logs are not easily aggregated.
+* Log file sizes are not managed.  Given enough time, the "stdout" and "stderr"
+  files can fill up the Agent's disk.
+
+## `ContainerLogger` Module
+
+The `ContainerLogger` module was introduced in Mesos 0.27.0 and aims to address
+the shortcomings of the default logging behavior for containers.  The module
+can be used to change how Mesos redirects the stdout and stderr of containers.
+
+The [interface for a `ContainerLogger` can be found here](https://github.com/apache/mesos/blob/master/include/mesos/slave/container_logger.hpp).
+
+Mesos comes with two `ContainerLogger` modules:
+
+* The `SandboxContainerLogger` implements the existing logging behavior as
+  a `ContainerLogger`.  This is the default behavior.
+* The `LogrotateContainerLogger` addresses the problem of unbounded log file
+  sizes.
+
+### `LogrotateContainerLogger`
+
+The `LogrotateContainerLogger` constrains the total size of a container's
+stdout and stderr files.  The module does this by rotating log files based
+on the parameters to the module.  When a log file reaches its specified
+maximum size, it is renamed by appending a `.N` to the end of the filename,
+where `N` increments each rotation.  Older log files are deleted when the
+specified maximum number of files is reached.
+
+#### Invoking the module
+
+The `LogrotateContainerLogger` can be loaded by specifying the library
+`liblogrotate_container_logger.so` in the
+[`--modules` flag](modules.md#Invoking) when starting the Agent and by
+setting the `--container_logger` Agent flag to
+`org_apache_mesos_LogrotateContainerLogger`.
+
+#### Module parameters
+
+<table class="table table-striped">
+  <thead>
+    <tr>
+      <th width="30%">
+        Key
+      </th>
+      <th>
+        Explanation
+      </th>
+    </tr>
+  </thead>
+
+  <tr>
+    <td>
+      <code>max_stdout_size</code>/<code>max_stderr_size</code>
+    </td>
+    <td>
+      Maximum size, in bytes, of a single stdout/stderr log file.
+      When the size is reached, the file will be rotated.
+
+      Defaults to 10 MB.  Minimum size of 1 (memory) page, usually around 4 KB.
+    </td>
+  </tr>
+
+  <tr>
+    <td>
+      <code>logrotate_stdout_options</code>/
+      <code>logrotate_stderr_options</code>
+    </td>
+    <td>
+      Additional config options to pass into <code>logrotate</code> for
+      stdout/stderr. This string will be inserted into a <code>logrotate</code>
+      configuration file. For example, for "stdout":
+      <pre>
+/path/to/stdout {
+  [logrotate_stdout_options]
+  size [max_stdout_size]
+}</pre>
+      NOTE: The <code>size</code> option will be overridden by this module.
+    </td>
+  </tr>
+
+  <tr>
+    <td>
+      <code>launcher_dir</code>
+    </td>
+    <td>
+      Directory path of Mesos binaries.
+      The <code>LogrotateContainerLogger</code> will find the
+      <code>mesos-logrotate-logger</code> binary under this directory.
+
+      Defaults to <code>/usr/local/libexec/mesos</code>.
+    </td>
+  </tr>
+
+  <tr>
+    <td>
+      <code>logrotate_path</code>
+    </td>
+    <td>
+      If specified, the <code>LogrotateContainerLogger</code> will use the
+      specified <code>logrotate</code> instead of the system's
+      <code>logrotate</code>.  If <code>logrotate</code> is not found, then
+      the module will exit with an error.
+    </td>
+  </tr>
+</table>
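
Putting the invocation flags and module parameters together, the `--modules` value could be assembled as in the following Python sketch. The library path and the `max_stdout_size` value are placeholders chosen for illustration, and the JSON layout is assumed to follow the general shape described in the modules documentation; check both against your Mesos version.

```python
import json

modules = {
    "libraries": [
        {
            # Placeholder path; use the location of the library on your Agent.
            "file": "/path/to/liblogrotate_container_logger.so",
            "modules": [
                {
                    "name": "org_apache_mesos_LogrotateContainerLogger",
                    "parameters": [
                        # Illustrative value; the accepted format is an assumption.
                        {"key": "max_stdout_size", "value": "2MB"},
                    ],
                }
            ],
        }
    ]
}

# Flag strings to pass to the Agent on startup.
modules_flag = "--modules=" + json.dumps(modules)
container_logger_flag = (
    "--container_logger=org_apache_mesos_LogrotateContainerLogger"
)
```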
+
+#### How it works
+
+1. Every time a container starts up, the `LogrotateContainerLogger`
+   starts up companion subprocesses of the `mesos-logrotate-logger` binary.
+2. The module instructs Mesos to redirect the container's stdout/stderr
+   to the `mesos-logrotate-logger`.
+3. As the container outputs to stdout/stderr, `mesos-logrotate-logger` will
+   pipe the output into the "stdout"/"stderr" files.  As the files grow,
+   `mesos-logrotate-logger` will call `logrotate` to keep the files strictly
+   under the configured maximum size.
+4. When the container exits, `mesos-logrotate-logger` will finish logging before
+   exiting as well.
+
+The `LogrotateContainerLogger` is designed to be resilient across Agent
+failover.  If the Agent process dies, any instances of `mesos-logrotate-logger`
+will continue to run.
+
+### Writing a Custom `ContainerLogger`
+
+For basics on module writing, see [the modules documentation](/documentation/latest/modules/).
+
+There are several caveats to consider when designing a new `ContainerLogger`:
+
+* Logging by the `ContainerLogger` should be resilient to Agent failover.
+  If the Agent process dies (which includes the `ContainerLogger` module),
+  logging should continue.  This is usually achieved by using subprocesses.
+* When containers shut down, the `ContainerLogger` is not explicitly notified.
+  Instead, encountering `EOF` in the container's stdout/stderr signifies
+  that the container has exited.  This provides a stronger guarantee that the
+  `ContainerLogger` has seen all the logs before exiting itself.
+* The `ContainerLogger` should not assume that containers have been launched
+  with any specific `ContainerLogger`.  The Agent may be restarted with a
+  different `ContainerLogger`.
+* Each [containerizer](/documentation/latest/containerizer/) running on an Agent uses its own
+  instance of the `ContainerLogger`.  This means more than one `ContainerLogger`
+  may be running in a single Agent.  However, each Agent will only run a single
+  type of `ContainerLogger`.
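
The EOF caveat above can be illustrated with a small Python sketch. It is not how any Mesos logger is implemented: an OS pipe stands in for the container's stdout, and `drain_until_eof` is a hypothetical helper that plays the role of a companion logging process.

```python
import os


def drain_until_eof(read_fd, chunk_size=4096):
    """Read from read_fd until EOF and return everything that was read."""
    chunks = []
    while True:
        chunk = os.read(read_fd, chunk_size)
        if not chunk:
            # An empty read means every writer has closed its end of the pipe,
            # i.e. the (simulated) container has exited.
            break
        chunks.append(chunk)
    return b"".join(chunks)


read_fd, write_fd = os.pipe()
os.write(write_fd, b"line 1\nline 2\n")
os.close(write_fd)  # Simulates the container exiting.
captured = drain_until_eof(read_fd)
os.close(read_fd)
```

Since the reader stops only on EOF, it has necessarily consumed everything the writer produced before it exits, which is the guarantee the caveat describes.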

Modified: mesos/site/source/documentation/latest/maintenance.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/maintenance.md?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/source/documentation/latest/maintenance.md (original)
+++ mesos/site/source/documentation/latest/maintenance.md Mon Feb  1 02:49:25 2016
@@ -17,7 +17,7 @@ Frameworks require visibility into any a
 in order to meet Service Level Agreements or to ensure uninterrupted services
 for their end users.  Therefore, to reconcile the requirements of frameworks
 and operators, frameworks must be aware of planned maintenance events and
-operators must be aware of frameworksâ ability to adapt to maintenance.
+operators must be aware of frameworks' ability to adapt to maintenance.
 Maintenance primitives add a layer to facilitate communication between the
 frameworks and operator.
 

Modified: mesos/site/source/documentation/latest/markdown-style-guide.md
URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/markdown-style-guide.md?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/source/documentation/latest/markdown-style-guide.md (original)
+++ mesos/site/source/documentation/latest/markdown-style-guide.md Mon Feb  1 02:49:25 2016
@@ -61,7 +61,7 @@ We use single backticks to highlight sam
 
 Files and path references should be specified as follows:
 
-~~~{.text}
+~~~{.txt}
 Remember you can also use the `file:///path/to/file` or `/path/to/file`
 ~~~
 
@@ -70,36 +70,33 @@ Remember you can also use the `file:///p
 
 In order to avoid problems with markdown formatting we should specify tables in html directly:
 
-~~~{.html}
-<table class="table table-striped">
-  <thead>
-    <tr>
-      <th width="30%">
-        Flag
-      </th>
-      <th>
-        Explanation
-      </th>
-  </thead>
-  <tr>
-    <td>
-      --ip=VALUE
-    </td>
-    <td>
-      IP address to listen on
-    </td>
-  </tr>
-  <tr>
-    <td>
-      --[no-]help
-    </td>
-    <td>
-      Prints this help message (default: false)
-
-    </td>
-  </tr>
-</table>
-~~~
+    <table class="table table-striped">
+      <thead>
+        <tr>
+          <th width="30%">
+            Flag
+          </th>
+          <th>
+            Explanation
+          </th>
+      </thead>
+      <tr>
+        <td>
+          --ip=VALUE
+        </td>
+        <td>
+          IP address to listen on
+        </td>
+      </tr>
+      <tr>
+        <td>
+          --[no-]help
+        </td>
+        <td>
+          Prints this help message (default: false)
+        </td>
+      </tr>
+    </table>
 
 
 ## Indendation and Whitespace