Posted to commits@mesos.apache.org by ka...@apache.org on 2016/02/01 03:49:28 UTC
svn commit: r1727886 [6/16] - in /mesos/site: ./ publish/ publish/assets/img/documentation/ publish/community/ publish/documentation/ publish/documentation/allocation-module/ publish/documentation/app-framework-development-guide/ publish/documentation/...

Modified: mesos/site/publish/documentation/latest/containerizer-internals/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/containerizer-internals/index.html?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/containerizer-internals/index.html (original)
+++ mesos/site/publish/documentation/latest/containerizer-internals/index.html Mon Feb  1 02:49:25 2016
@@ -112,9 +112,9 @@ generating executor information.</li>
 
 <ul>
 <li>Composing</li>
-<li>Docker</li>
-<li>Mesos</li>
-<li>External (deprecated)</li>
+<li><a href="/documentation/latest/docker-containerizer/">Docker</a></li>
+<li><a href="/documentation/latest/containerizer/">Mesos</a></li>
+<li><a href="/documentation/latest/external-containerizer/">External</a> (deprecated)</li>
 </ul>
 
 
@@ -148,21 +148,23 @@ if <code>ContainerInfo::type</code> is s
 <code>--docker_mesos_image</code>. In this case, the value of flag
 <code>--docker_mesos_image</code> is assumed to be the docker image used to
 launch the Mesos agent.</li>
-<li>If the task uses an executor to run (not the default command
-executor), then that executor is launched in a docker container.</li>
-<li>If the task uses <code>TaskInfo</code>, then the default executor
-<code>mesos-docker-executor</code> is lunched in a docker container.</li>
+<li>If the task includes an executor (custom executor), then that executor is
+launched in a docker container.</li>
+<li>If the task does not include an executor (i.e., it defines a command), the
+default executor <code>mesos-docker-executor</code> is launched in a docker container to
+execute the command via the Docker CLI.</li>
 </ul>
 
 
 <p>B) Mesos agent does not run in a docker container</p>
 
 <ul>
-<li>If task uses <code>TaskInfo</code>, then it forks a subprocess to execute
-<code>mesos-docker-executor</code>. mesos-docker-executor spawns shell to
-execute docker commands (e.g., docker run).</li>
-<li>If the task uses a custom executor, then its launched in a docker
-container.</li>
+<li>If the task includes an executor (custom executor), then that executor is
+launched in a docker container.</li>
+<li>If the task does not include an executor (i.e., it defines a command), a subprocess
+is forked to execute the default executor <code>mesos-docker-executor</code>.
+<code>mesos-docker-executor</code> then spawns a shell to execute the command via the Docker
+CLI.</li>
 </ul>
 
 

Modified: mesos/site/publish/documentation/latest/containerizer/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/containerizer/index.html?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/containerizer/index.html (original)
+++ mesos/site/publish/documentation/latest/containerizer/index.html Mon Feb  1 02:49:25 2016
@@ -81,87 +81,113 @@
 		<p>See our <a href="/community/">community</a> page for more details.</p>
 	</div>
 	<div class="col-md-8">
-		<h1>Mesos Containerizer</h1>
+		<h1>Containerizer</h1>
+
+<h2>Motivation</h2>
+
+<p>Containerizers are intended to run tasks in &lsquo;containers&rsquo; which in turn are used
+to:</p>
+
+<ul>
+<li>Isolate a task from other running tasks.</li>
+<li>&lsquo;Contain&rsquo; tasks so that they run in a resource-limited runtime environment.</li>
+<li>Control a task&rsquo;s individual resources (e.g., CPU, memory) programmatically.</li>
+<li>Run software in a pre-packaged file system image, allowing it to run in
+different environments.</li>
+</ul>
+
+
+<h2>Types of containerizers</h2>
+
+<p>Mesos plays well with existing container technologies (e.g., docker) and also
+provides its own container technology. It also supports composing different
+container technologies (e.g., docker and mesos).</p>
+
+<p>Mesos implements the following containerizers:</p>
+
+<ul>
+<li><a href="#Composing">Composing</a></li>
+<li><a href="#Docker">Docker</a></li>
+<li><a href="#Mesos">Mesos (default)</a></li>
+<li>External (deprecated)</li>
+</ul>
+
+
+<p>Users can specify the types of containerizers to use via the agent flag
+<code>--containerizers</code>.</p>
+
+<p><a name="Composing"></a></p>
+
+<h3>Composing containerizer</h3>
+
+<p>This feature allows multiple container technologies to play together. It is
+enabled when you configure the <code>--containerizers</code> agent flag with multiple
+comma-separated containerizer names (e.g., <code>--containerizers=mesos,docker</code>). The order
+of the comma-separated list is important, as the first containerizer that
+supports the task&rsquo;s container configuration will be used to launch the task.</p>
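+
+<p>The ordering rule can be illustrated with a minimal sketch. The predicates below
+are hypothetical stand-ins, not Mesos code: they assume the 'mesos' containerizer
+accepts tasks with no container type or type MESOS, and the 'docker' containerizer
+accepts only type DOCKER.</p>

```python
from typing import Callable, Dict, Optional

# Hypothetical support checks (assumptions for illustration, not the Mesos API).
# Each one answers: can this containerizer launch a task with this container type?
SUPPORTS: Dict[str, Callable[[Optional[str]], bool]] = {
    "mesos": lambda container_type: container_type in (None, "MESOS"),
    "docker": lambda container_type: container_type == "DOCKER",
}


def pick_containerizer(flag_value: str, container_type: Optional[str]) -> Optional[str]:
    """Return the first containerizer, in flag order, that supports the task."""
    for name in flag_value.split(","):
        if SUPPORTS[name](container_type):
            return name
    return None  # no configured containerizer can launch this task
```

+<p>With <code>--containerizers=mesos,docker</code>, a task whose container type is DOCKER
+skips 'mesos' and is picked up by 'docker' in this sketch, while a task without
+any container configuration is taken by 'mesos' because it is listed first.</p>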
+
+<p>Use cases:</p>
+
+<ul>
+<li>For testing tasks with different types of resource isolations. Since &lsquo;mesos&rsquo;
+containerizers have more isolation abilities, a framework can use composing
+containerizer to test a task using &lsquo;mesos&rsquo; containerizer&rsquo;s controlled
+environment and at the same time test it to work with &lsquo;docker&rsquo; containers by
+just changing the container parameters for the task.</li>
+</ul>
+
+
+<p><a name="Docker"></a></p>
+
+<h3>Docker containerizer</h3>
+
+<p>The Docker containerizer allows tasks to be run inside a docker container. This
+containerizer is enabled when you configure the agent flag as
+<code>--containerizers=docker</code>.</p>
+
+<p>Use cases:</p>
+
+<ul>
+<li>If a task needs to be run with the tooling that comes with the docker package.</li>
+<li>If the Mesos agent is running inside a docker container.</li>
+</ul>
+
+
+<p>For more details, see
+<a href="/documentation/latest/docker-containerizer/">Docker Containerizer</a>.</p>
+
+<p><a name="Mesos"></a></p>
+
+<h3>Mesos containerizer</h3>
+
+<p>This containerizer allows tasks to be run with an array of pluggable isolators
+provided by Mesos. This is the native Mesos containerizer solution and is
+enabled when you configure the agent flag as <code>--containerizers=mesos</code>.</p>
+
+<p>Use cases:</p>
+
+<ul>
+<li>Allow Mesos to control the task&rsquo;s runtime environment without depending on
+other container technologies (e.g., docker).</li>
+<li>Want fine-grained operating system controls (e.g., cgroups/namespaces provided
+by Linux).</li>
+<li>Want Mesos&rsquo;s latest container technology features.</li>
+<li>Need additional resource controls like disk usage limits, which
+might not be provided by other container technologies.</li>
+<li>Want to add custom isolation for tasks.</li>
+</ul>
+
+
+<p>For more details, see
+<a href="/documentation/latest/mesos-containerizer/">Mesos Containerizer</a>.</p>
+
+<h2>References</h2>
+
+<ul>
+<li><a href="/documentation/latest/containerizer-internals/">Containerizer Internals</a> for
+implementation details of containerizers.</li>
+</ul>
 
-<p>The MesosContainerizer provides lightweight containerization and
-resource isolation of executors using Linux-specific functionality
-such as control cgroups and namespaces. It is composable so operators
-can selectively enable different isolators.</p>
-
-<p>It also provides basic support for POSIX systems (e.g., OSX) but
-without any actual isolation, only resource usage reporting.</p>
-
-<h3>Shared Filesystem</h3>
-
-<p>The SharedFilesystem isolator can optionally be used on Linux hosts to
-enable modifications to each container&rsquo;s view of the shared
-filesystem.</p>
-
-<p>The modifications are specified in the ContainerInfo included in the
-ExecutorInfo, either by a framework or by using the
-<code>--default_container_info</code> slave flag.</p>
-
-<p>ContainerInfo specifies Volumes which map parts of the shared
-filesystem (host_path) into the container&rsquo;s view of the filesystem
-(container_path), as read-write or read-only. The host_path can be
-absolute, in which case it will make the filesystem subtree rooted at
-host_path also accessible under container_path for each container.
-If host_path is relative then it is considered as a directory
-relative to the executor&rsquo;s work directory. The directory will be
-created and permissions copied from the corresponding directory (which
-must exist) in the shared filesystem.</p>
-
-<p>The primary use-case for this isolator is to selectively make parts of
-the shared filesystem private to each container. For example, a
-private &ldquo;/tmp&rdquo; directory can be achieved with <code>host_path="tmp"</code> and
-<code>container_path="/tmp"</code> which will create a directory &ldquo;tmp&rdquo; inside the
-executor&rsquo;s work directory (mode 1777) and simultaneously mount it as
-/tmp inside the container. This is transparent to processes running
-inside the container. Containers will not be able to see the host&rsquo;s
-/tmp or any other container&rsquo;s /tmp.</p>
-
-<h3>Pid Namespace</h3>
-
-<p>The Pid Namespace isolator can be used to isolate each container in
-a separate pid namespace with two main benefits:</p>
-
-<ol>
-<li><p>Visibility: Processes running in the container (executor and
-descendants) are unable to see or signal processes outside the
-namespace.</p></li>
-<li><p>Clean termination: Termination of the leading process in a pid
-namespace will result in the kernel terminating all other processes
-in the namespace.</p></li>
-</ol>
-
-
-<p>The Launcher will use (2) during destruction of a container in
-preference to the freezer cgroup, avoiding known kernel issues related
-to freezing cgroups under OOM conditions.</p>
-
-<p>/proc will be mounted for containers so tools such as &lsquo;ps&rsquo; will work
-correctly.</p>
-
-<h3>Posix Disk Isolator</h3>
-
-<p>The Posix Disk isolator provides basic disk isolation. It is able to
-report the disk usage for each sandbox and optionally enforce the disk
-quota. It can be used on both Linux and OS X.</p>
-
-<p>To enable the Posix Disk isolator, append <code>posix/disk</code> to the
-<code>--isolation</code> flag when starting the slave.</p>
-
-<p>By default, the disk quota enforcement is disabled. To enable it,
-specify <code>--enforce_container_disk_quota</code> when starting the slave.</p>
-
-<p>The Posix Disk isolator reports disk usage for each sandbox by
-periodically running the <code>du</code> command. The disk usage can be retrieved
-from the resource statistics endpoint (<code>/monitor/statistics.json</code>).</p>
-
-<p>The interval between two <code>du</code>s can be controlled by the slave flag
-<code>--container_disk_watch_interval</code>. For example,
-<code>--container_disk_watch_interval=1mins</code> sets the interval to be 1
-minute. The default interval is 15 seconds.</p>
 
 	</div>
 </div>

Modified: mesos/site/publish/documentation/latest/docker-containerizer/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/docker-containerizer/index.html?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/docker-containerizer/index.html (original)
+++ mesos/site/publish/documentation/latest/docker-containerizer/index.html Mon Feb  1 02:49:25 2016
@@ -104,7 +104,7 @@ iptables -A INPUT -s 172.17.0.0/16 -i do
 
 <h2>How do I use the Docker Containerizer?</h2>
 
-<p>TaskInfo before 0.20.0 used to only support either setting a CommandInfo that launches a task running the bash command, or a ExecutorInfo that launches a custom Executor
+<p>TaskInfo before 0.20.0 used to only support either setting a CommandInfo that launches a task running the bash command, or an ExecutorInfo that launches a custom Executor
 that will launch the task.</p>
 
 <p>With 0.20.0 we added a ContainerInfo field to TaskInfo and ExecutorInfo that allows a Containerizer such as Docker to be configured to run the task or executor.</p>
@@ -135,11 +135,11 @@ Note that the Docker image is expected t
 
 <p>Note that we currently default to host networking when running a docker image, to easier support running a docker image as an Executor.</p>
 
-<p>The containerizer also supports optional force pulling of the image, and if disabled the docker image will only be updated again if it&rsquo;s not available on the host.</p>
+<p>The containerizer also supports optional force pulling of the image. It is disabled by default, so the docker image is only pulled if it&rsquo;s not already available on the host. To enable force pulling of an image, set <code>force_pull_image</code> to true.</p>
 
 <h2>Private Docker repository</h2>
 
-<p>To run a image from a private repository, one can include the uri pointing to a <code>.dockercfg</code> that contains login information. The <code>.dockercfg</code> file will be pulled into the sandbox the Docker Containerizer
+<p>To run an image from a private repository, one can include the uri pointing to a <code>.dockercfg</code> that contains login information. The <code>.dockercfg</code> file will be pulled into the sandbox, and the Docker Containerizer
 sets the HOME environment variable to point to the sandbox so the docker cli will automatically pick up the config file.</p>
 
 <h2>CommandInfo to run Docker images</h2>

Modified: mesos/site/publish/documentation/latest/effective-code-reviewing/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/effective-code-reviewing/index.html?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/effective-code-reviewing/index.html (original)
+++ mesos/site/publish/documentation/latest/effective-code-reviewing/index.html Mon Feb  1 02:49:25 2016
@@ -102,7 +102,7 @@ reviews based on commits. Become familia
 change clear in the review request, so the reviewer is not left
 guessing. It is highly recommended to attach a JIRA issue with your
 review for additional context.</li>
-<li><strong>Follow the <a href="http://mesos.apache.org/documentation/latest/c++-style-guide/">style guide</a>
+<li><strong>Follow the <a href="/documentation/latest/c++-style-guide/">style guide</a>
 and the style of code around you</strong>.</li>
 <li><strong>Do a self-review of your changes before publishing</strong>: Approach it
 from the perspective of a reviewer with no context. Is it easy to figure

Added: mesos/site/publish/documentation/latest/executor-http-api/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/executor-http-api/index.html?rev=1727886&view=auto
==============================================================================
--- mesos/site/publish/documentation/latest/executor-http-api/index.html (added)
+++ mesos/site/publish/documentation/latest/executor-http-api/index.html Mon Feb  1 02:49:25 2016
@@ -0,0 +1,462 @@
+<!DOCTYPE html>
+<html>
+    <head>
+        <meta charset="utf-8">
+        <title></title>
+		    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+		    <link href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css" rel="stylesheet">
+		    <link rel="alternate" type="application/atom+xml" title="Apache Mesos Blog" href="/blog/feed.xml">
+		    
+		    <link href="../../../assets/css/main.css" media="screen" rel="stylesheet" type="text/css" />
+				
+		    
+			
+			<!-- Google Analytics Magic -->
+			<script type="text/javascript">
+			  var _gaq = _gaq || [];
+			  _gaq.push(['_setAccount', 'UA-20226872-1']);
+			  _gaq.push(['_setDomainName', 'apache.org']);
+			  _gaq.push(['_trackPageview']);
+
+			  (function() {
+			    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+			    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+			    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+			  })();
+			</script>
+    </head>
+    <body>
+			<!-- magical breadcrumbs -->
+			<div class="topnav">
+			<ul class="breadcrumb">
+			  <li>
+					<div class="dropdown">
+					  <a data-toggle="dropdown" href="#">Apache Software Foundation <span class="caret"></span></a>
+					  <ul class="dropdown-menu" role="menu" aria-labelledby="dLabel">
+							<li><a href="http://www.apache.org">Apache Homepage</a></li>
+							<li><a href="http://www.apache.org/licenses/">License</a></li>
+					  	<li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>  
+					  	<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+							<li><a href="http://www.apache.org/security/">Security</a></li>
+					  </ul>
+					</div>
+				</li>
+				<li><a href="http://mesos.apache.org">Apache Mesos</a></li>
+				
+				
+					<li><a href="/documentation
+/">Documentation
+</a></li>
+				
+				
+			</ul><!-- /breadcrumb -->
+			</div>
+			
+			<!-- navbar excitement -->
+	    <div class="navbar navbar-static-top" role="navigation">
+	      <div class="navbar-inner">
+	        <div class="container">
+						<a href="/" class="logo"><img src="/assets/img/mesos_logo.png" alt="Apache Mesos logo" /></a>
+					<div class="nav-collapse">
+						<ul class="nav nav-pills navbar-right">
+						  <li><a href="/gettingstarted/">Getting Started</a></li>
+						  <li><a href="/documentation/latest/">Documentation</a></li>
+						  <li><a href="/downloads/">Downloads</a></li>
+						  <li><a href="/community/">Community</a></li>
+						</ul>
+					</div>
+	        </div>
+	      </div>
+	    </div><!-- /.navbar -->
+
+      <div class="container">
+
+			<div class="row-fluid">
+	<div class="col-md-4">
+		<h4>If you're new to Mesos</h4>
+		<p>See the <a href="/gettingstarted/">getting started</a> page for more information about downloading, building, and deploying Mesos.</p>
+		
+		<h4>If you'd like to get involved or you're looking for support</h4>
+		<p>See our <a href="/community/">community</a> page for more details.</p>
+	</div>
+	<div class="col-md-8">
+		<h1>Executor HTTP API</h1>
+
+<p>Mesos 0.27.0 added <strong>experimental</strong> support for the V1 Executor HTTP API.</p>
+
+<h2>Overview</h2>
+
+<p>The executor interacts with Mesos via the &ldquo;/api/v1/executor&rdquo; endpoint hosted by the Mesos agent. The fully qualified URL of the endpoint might look like:</p>
+
+<pre><code>http://agenthost:5051/api/v1/executor
+</code></pre>
+
+<p>Note that we refer to this endpoint with its suffix &ldquo;/executor&rdquo; in the rest of this document. This endpoint accepts HTTP POST requests with data encoded as JSON (Content-Type: application/json) or binary Protobuf (Content-Type: application/x-protobuf). The first request that the executor sends to the &ldquo;/executor&rdquo; endpoint is called SUBSCRIBE and results in a streaming response (&ldquo;200 OK&rdquo; status code with Transfer-Encoding: chunked). <strong>Executors are expected to keep the subscription connection open as long as possible (barring network errors, agent process restarts, software bugs, etc.) and incrementally process the response</strong> (NOTE: HTTP client libraries that can only parse the response after the connection is closed cannot be used). For the encoding used, please refer to the <strong>Events</strong> section below.</p>
+
+<p>All subsequent (non-subscribe) requests to the &ldquo;/executor&rdquo; endpoint (see details below in the <strong>Calls</strong> section) must be sent using one or more connections other than the one being used for subscription. The agent responds to these HTTP POST requests with &ldquo;202 Accepted&rdquo; status codes (or, for unsuccessful requests, with 4xx or 5xx status codes; details in later sections). The &ldquo;202 Accepted&rdquo; response means that a request has been accepted for processing, not that the processing of the request has been completed. The request might or might not be acted upon by Mesos (e.g., the agent fails during the processing of the request). Any asynchronous responses from these requests will be streamed on the long-lived subscription connection.</p>
+
+<h2>Calls</h2>
+
+<p>The following calls are currently accepted by the agent. The canonical source of this information is <a href="https://github.com/apache/mesos/blob/master/include/mesos/v1/executor/executor.proto">executor.proto</a> (NOTE: The protobuf definitions are subject to change before the Beta API is finalized). Note that when sending JSON encoded Calls, executors should encode raw bytes in Base64 and strings in UTF-8.</p>
+
+<h3>SUBSCRIBE</h3>
+
+<p>This is the first step in the communication process between the executor and the agent. It also serves as the subscription to the &ldquo;/executor&rdquo; events stream.</p>
+
+<p>To subscribe with the agent, the executor sends an HTTP POST request with an encoded <code>SUBSCRIBE</code> message. The HTTP response is a stream with <a href="scheduler-http-api.md#recordio-response-format">RecordIO</a> encoding, with the first event being a <code>SUBSCRIBED</code> event (see details in the <strong>Events</strong> section).</p>
+
+<p>Additionally, if the executor is connecting to the agent after a <a href="#disconnections">disconnection</a>, it can also send a list of:</p>
+
+<ul>
+<li><strong>Unacknowledged Status Updates</strong>: The executor is expected to maintain a list of status updates not acknowledged by the agent via <code>ACKNOWLEDGED</code> events.</li>
+<li><strong>Unacknowledged Tasks</strong>: The executor is expected to maintain a list of tasks that have not been acknowledged by the agent. A task is considered acknowledged if at least one of the status updates for this task is acknowledged by the agent.</li>
+</ul>
+
+
+<pre><code>SUBSCRIBE Request (JSON):
+
+POST /api/v1/executor HTTP/1.1
+Host: agenthost:5051
+Content-Type: application/json
+Accept: application/json
+
+{
+  "type": "SUBSCRIBE",
+  "executor_id": {
+    "value": "387aa966-8fc5-4428-a794-5a868a60d3eb"
+  },
+  "framework_id": {
+    "value": "49154f1b-8cf6-4421-bf13-8bd11dccd1f1"
+  },
+  "subscribe": {
+    "unacknowledged_tasks": [
+      {
+        "name": "dummy-task",
+        "task_id": {
+          "value": "d40f3f3e-bbe3-44af-a230-4cb1eae72f67"
+        },
+        "agent_id": {
+          "value": "f1c9cdc5-195e-41a7-a0d7-adaa9af07f81"
+        },
+        "command": {
+          "value": "ls",
+          "arguments": [
+            "-l",
+            "\/tmp"
+          ]
+        }
+      }
+    ],
+    "unacknowledged_updates": [
+      {
+        "framework_id": {
+          "value": "49154f1b-8cf6-4421-bf13-8bd11dccd1f1"
+        },
+        "status": {
+          "source": "SOURCE_EXECUTOR",
+          "task_id": {
+            "value": "d40f3f3e-bbe3-44af-a230-4cb1eae72f67"
+          },
+          "state": "TASK_RUNNING",
+          "uuid": "ZDQwZjNmM2UtYmJlMy00NGFmLWEyMzAtNGNiMWVhZTcyZjY3Cg=="
+        }
+      }
+    ]
+  }
+}
+
+SUBSCRIBE Response Event (JSON):
+HTTP/1.1 200 OK
+Content-Type: application/json
+Transfer-Encoding: chunked
+
+&lt;event-length&gt;
+{
+  "type": "SUBSCRIBED",
+  "subscribed": {
+    "executor_info": {
+      "executor_id": {
+        "value": "387aa966-8fc5-4428-a794-5a868a60d3eb"
+      },
+      "command": {
+        "value": "\/path\/to\/executor"
+      },
+      "framework_id": {
+        "value": "49154f1b-8cf6-4421-bf13-8bd11dccd1f1"
+      }
+    },
+    "framework_info": {
+      "user": "foo",
+      "name": "my_framework"
+    },
+    "agent_id": {
+      "value": "f1c9cdc5-195e-41a7-a0d7-adaa9af07f81"
+    },
+    "agent_info": {
+      "host": "agenthost",
+      "port": 5051
+    }
+  }
+}
+&lt;more events&gt;
+</code></pre>
+
+<p>NOTE: Once an executor is launched, the agent waits for a duration of <code>--executor_registration_timeout</code> (configurable at agent startup) for the executor to subscribe. If the executor fails to subscribe within this duration, the agent forcefully destroys the container the executor is running in.</p>
+
+<h3>UPDATE</h3>
+
+<p>Sent by the executor to reliably communicate the state of managed tasks. It is crucial that a terminal update (e.g., <code>TASK_FINISHED</code>, <code>TASK_KILLED</code> or <code>TASK_FAILED</code>) is sent to the agent as soon as the task terminates, in order to allow Mesos to release the resources allocated to the task.</p>
+
+<p>The scheduler must explicitly respond to this call through an <code>ACKNOWLEDGE</code> message (see <code>ACKNOWLEDGED</code> in the Events section below for the semantics). The executor must maintain a list of unacknowledged updates. If, for some reason, the executor is disconnected from the agent, these updates must be sent as part of the <code>SUBSCRIBE</code> request in the <code>unacknowledged_updates</code> field.</p>
+
+<pre><code>UPDATE Request (JSON):
+
+POST /api/v1/executor HTTP/1.1
+Host: agenthost:5051
+Content-Type: application/json
+Accept: application/json
+
+{
+  "executor_id": {
+    "value": "387aa966-8fc5-4428-a794-5a868a60d3eb"
+  },
+  "framework_id": {
+    "value": "9aaa9d0d-e00d-444f-bfbd-23dd197939a0-0000"
+  },
+  "type": "UPDATE",
+  "update": {
+    "status": {
+      "executor_id": {
+        "value": "387aa966-8fc5-4428-a794-5a868a60d3eb"
+      },
+      "source": "SOURCE_EXECUTOR",
+      "state": "TASK_RUNNING",
+      "task_id": {
+        "value": "66724cec-2609-4fa0-8d93-c5fb2099d0f8"
+      },
+      "uuid": "ZDQwZjNmM2UtYmJlMy00NGFmLWEyMzAtNGNiMWVhZTcyZjY3Cg=="
+    }
+  }
+}
+
+UPDATE Response:
+HTTP/1.1 202 Accepted
+</code></pre>
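+
+<p>As a rough illustration of the bookkeeping described above, the following Python
+sketch builds an UPDATE call body and remembers each update until it is
+acknowledged. The class and method names are hypothetical, and using a random
+16-byte UUID for the <code>uuid</code> field is an assumption; only the JSON field names are
+taken from the example above.</p>

```python
import base64
import json
import uuid


class UpdateTracker:
    """Builds UPDATE call bodies and remembers the ones not yet acknowledged."""

    def __init__(self, executor_id, framework_id):
        self.executor_id = executor_id
        self.framework_id = framework_id
        # Maps the Base64 uuid of each pending update to its status object.
        self.unacknowledged = {}

    def build_update(self, task_id, state):
        """Return the JSON body of an UPDATE call for the given task state."""
        # Raw bytes are Base64-encoded in JSON calls (assumed: 16-byte UUID).
        encoded_uuid = base64.b64encode(uuid.uuid4().bytes).decode("ascii")
        status = {
            "executor_id": {"value": self.executor_id},
            "task_id": {"value": task_id},
            "source": "SOURCE_EXECUTOR",
            "state": state,
            "uuid": encoded_uuid,
        }
        self.unacknowledged[encoded_uuid] = status
        return json.dumps({
            "type": "UPDATE",
            "executor_id": {"value": self.executor_id},
            "framework_id": {"value": self.framework_id},
            "update": {"status": status},
        })

    def acknowledged(self, event):
        """Forget the update named by an ACKNOWLEDGED event, if still pending."""
        self.unacknowledged.pop(event["acknowledged"]["uuid"], None)
```

+<p>After a disconnection, whatever is still held in <code>unacknowledged</code> is what
+the executor would resend in the <code>unacknowledged_updates</code> field of its next
+<code>SUBSCRIBE</code> request.</p>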
+
+<h3>MESSAGE</h3>
+
+<p>Sent by the executor to send arbitrary binary data to the scheduler. Note that Mesos neither interprets this data nor makes any guarantees about the delivery of this message to the scheduler. The <code>data</code> field is raw bytes encoded in Base64.</p>
+
+<pre><code>MESSAGE Request (JSON):
+
+POST /api/v1/executor HTTP/1.1
+Host: agenthost:5051
+Content-Type: application/json
+Accept: application/json
+
+{
+  "executor_id": {
+    "value": "387aa966-8fc5-4428-a794-5a868a60d3eb"
+  },
+  "framework_id": {
+    "value": "9aaa9d0d-e00d-444f-bfbd-23dd197939a0-0000"
+  },
+  "type": "MESSAGE",
+  "message": {
+    "data": "t+Wonz5fRFKMzCnEptlv5A=="
+  }
+}
+
+MESSAGE Response:
+HTTP/1.1 202 Accepted
+</code></pre>
+
+<h2>Events</h2>
+
+<p>The executor is expected to keep a <strong>persistent</strong> connection open to the &ldquo;/executor&rdquo; endpoint even after getting a <code>SUBSCRIBED</code> HTTP Response event. This is indicated by &ldquo;Connection: keep-alive&rdquo; and &ldquo;Transfer-Encoding: chunked&rdquo; headers with <em>no</em> &ldquo;Content-Length&rdquo; header set. All subsequent events that are relevant to this executor generated by Mesos are streamed on this connection. The agent encodes each Event in <a href="scheduler-http-api.md#recordio-response-format">RecordIO</a> format, i.e., a string representation of the length of the event in bytes followed by the JSON or binary Protobuf (possibly compressed) encoded event. Note that the value of the length will never be &ldquo;0&rdquo; and the size of the length will be the size of an unsigned integer (i.e., 64 bits). Also, note that the <code>RecordIO</code> encoding should be decoded by the executor whereas the underlying HTTP chunked encoding is typically invisible at the application (executor) layer. The type of content encoding used for the events will be determined by the Accept header of the POST request (e.g., &ldquo;Accept: application/json&rdquo;).</p>
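+
+<p>A minimal sketch of decoding such a stream on the executor side is shown below.
+It assumes JSON-encoded events and that the length string is terminated by a
+newline before the event bytes, which matches how <code>&lt;event-length&gt;</code> appears
+on its own line in the examples in this document. The function name is
+hypothetical.</p>

```python
import json
from typing import BinaryIO, Iterator


def read_events(stream: BinaryIO) -> Iterator[dict]:
    """Yield JSON events from a length-prefixed (RecordIO-style) byte stream.

    Assumed framing: "<length-in-bytes>\n" followed by exactly that many
    bytes of JSON, repeated for each event.
    """
    while True:
        header = stream.readline()
        if not header:
            return  # the agent closed the stream
        length = int(header.strip())
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("stream ended in the middle of an event")
        yield json.loads(payload.decode("utf-8"))
```

+<p>Because the function is a generator, each event is handed to the caller as soon
+as its bytes have arrived, which fits the requirement to process the response
+incrementally rather than after the connection closes.</p>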
+
+<p>The following events are currently sent by the agent. The canonical source of this information is at <a href="include/mesos/v1/executor/executor.proto">executor.proto</a>. Note that when sending JSON encoded events, the agent encodes raw bytes in Base64 and strings in UTF-8.</p>
+
+<h3>SUBSCRIBED</h3>
+
+<p>The first event sent by the agent when the executor sends a <code>SUBSCRIBE</code> request on the persistent connection. See <code>SUBSCRIBE</code> in the Calls section for the format.</p>
+
+<h3>LAUNCH</h3>
+
+<p>Sent by the agent whenever it needs to assign a new task to the executor. The executor is required to send an <code>UPDATE</code> message back to the agent indicating the success or failure of the task initialization.</p>
+
+<p>The executor must maintain a list of unacknowledged tasks (see <code>SUBSCRIBE</code> in the <code>Calls</code> section). If, for some reason, the executor is disconnected from the agent, these tasks must be sent as part of the <code>SUBSCRIBE</code> request in the <code>unacknowledged_tasks</code> field.</p>
+
+<pre><code>LAUNCH Event (JSON)
+&lt;event-length&gt;
+{
+  "type": "LAUNCH",
+  "launch": {
+    "framework_info": {
+      "id": {
+        "value": "49154f1b-8cf6-4421-bf13-8bd11dccd1f1"
+      },
+      "user": "foo",
+      "name": "my_framework"
+    },
+    "task": {
+      "name": "dummy-task",
+      "task_id": {
+        "value": "d40f3f3e-bbe3-44af-a230-4cb1eae72f67"
+      },
+      "agent_id": {
+        "value": "f1c9cdc5-195e-41a7-a0d7-adaa9af07f81"
+      },
+      "command": {
+        "value": "sleep",
+        "arguments": [
+          "100"
+        ]
+      }
+    }
+  }
+}
+</code></pre>
+
+<h3>KILL</h3>
+
+<p>The <code>KILL</code> event is sent whenever the scheduler needs to stop execution of a specific task. The executor is required to send a terminal update (e.g., <code>TASK_FINISHED</code>, <code>TASK_KILLED</code> or <code>TASK_FAILED</code>) back to the agent once it has stopped/killed the task. Mesos will mark the task resources as freed once the terminal update is received.</p>
+
+<pre><code>KILL Event (JSON)
+&lt;event-length&gt;
+{
+  "type" : "KILL",
+  "kill" : {
+    "task_id" : {"value" : "d40f3f3e-bbe3-44af-a230-4cb1eae72f67"}
+  }
+}
+</code></pre>
+
+<h3>ACKNOWLEDGED</h3>
+
+<p>Sent by the agent in order to signal the executor that a status update was received as part of the reliable message passing mechanism. Acknowledged updates must not be retried.</p>
+
+<pre><code>ACKNOWLEDGED Event (JSON)
+&lt;event-length&gt;
+{
+  "type" : "ACKNOWLEDGED",
+  "acknowledged" : {
+    "task_id" : {"value" : "d40f3f3e-bbe3-44af-a230-4cb1eae72f67"},
+    "uuid" : "ZDQwZjNmM2UtYmJlMy00NGFmLWEyMzAtNGNiMWVhZTcyZjY3Cg=="
+  }
+}
+</code></pre>
+
+<h3>MESSAGE</h3>
+
+<p>Custom message generated by the scheduler and forwarded all the way to the executor. These messages are delivered &ldquo;as-is&rdquo; by Mesos and have no delivery guarantees. It is up to the scheduler to retry if a message is dropped for any reason. Note that <code>data</code> is raw bytes encoded as Base64.</p>
+
+<pre><code>MESSAGE Event (JSON)
+&lt;event-length&gt;
+{
+  "type" : "MESSAGE",
+  "message" : {
+    "data" : "c2FtcGxlIGRhdGE="
+  }
+}
+</code></pre>
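+
+<p>Since <code>data</code> arrives Base64-encoded in JSON events, the executor has to decode
+it before use. A small sketch, assuming the event has already been extracted
+from the stream as a JSON string (the function name is hypothetical):</p>

```python
import base64
import json


def message_payload(event_json: str) -> bytes:
    """Return the raw bytes carried by a JSON-encoded MESSAGE event."""
    event = json.loads(event_json)
    if event.get("type") != "MESSAGE":
        raise ValueError("not a MESSAGE event")
    # JSON events carry raw bytes as Base64 text.
    return base64.b64decode(event["message"]["data"])
```

+<p>Applied to the example event above, this yields the bytes of the ASCII string
+<code>sample data</code>.</p>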
+
+<h3>SHUTDOWN</h3>
+
+<p>Sent by the agent in order to shut down the executor. Once an executor gets a <code>SHUTDOWN</code> event it is required to kill all its tasks, send <code>TASK_KILLED</code> updates and gracefully exit. If an executor doesn&rsquo;t terminate within a certain period after the event was emitted (<code>grace_period_seconds</code>), the agent will forcefully destroy the container where the executor is running. The agent would then send <code>TASK_LOST</code> updates for any remaining active tasks of this executor.</p>
+
+<pre><code>SHUTDOWN Event (JSON)
+&lt;event-length&gt;
+{
+  "type" : "SHUTDOWN",
+  "shutdown" : {
+    "grace_period_seconds" : 5
+  }
+}
+</code></pre>
+
+<h3>ERROR</h3>
+
+<p>Sent by the agent when an asynchronous error event is generated. It is recommended that the executor abort when it receives an error event and retry subscription.</p>
+
+<pre><code>ERROR Event (JSON)
+&lt;event-length&gt;
+{
+  "type" : "ERROR",
+  "error" : {
+    "message" : "Unrecoverable error"
+  }
+}
+</code></pre>
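+
+<p>The <code>&lt;event-length&gt;</code> placeholder in the examples above indicates length-prefixed framing. The following Python sketch decodes such a stream, assuming each event is sent as its length in bytes (a decimal number), a newline, and then exactly that many bytes of JSON. The names <code>read_events</code> and <code>frame</code> are hypothetical helpers, and the stream is assumed to be buffered so that <code>read(n)</code> returns <code>n</code> bytes unless the connection ends.</p>
+
```python
import io
import json
from typing import BinaryIO, Dict, Iterator


def read_events(stream: BinaryIO) -> Iterator[Dict]:
    """Yield decoded events from a length-prefixed byte stream."""
    while True:
        header = stream.readline()
        if not header:
            return  # the stream ended cleanly between two events
        length = int(header.decode("ascii").strip())
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("stream ended in the middle of an event")
        yield json.loads(payload.decode("utf-8"))


def frame(event: Dict) -> bytes:
    """Encode one event with the same framing (useful for local testing)."""
    body = json.dumps(event).encode("utf-8")
    return str(len(body)).encode("ascii") + b"\n" + body
```
+
+<p>An executor could dispatch on the <code>type</code> field of each yielded event, for example starting its shutdown sequence on <code>SHUTDOWN</code> and aborting on <code>ERROR</code>.</p>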
+
+<h2>Executor Environment Variables</h2>
+
+<p>The agent sets the following environment variables, which the executor can use upon startup:</p>
+
+<ul>
+<li><code>MESOS_FRAMEWORK_ID</code>: <code>FrameworkID</code> of the scheduler needed as part of the <code>SUBSCRIBE</code> call.</li>
+<li><code>MESOS_EXECUTOR_ID</code>: <code>ExecutorID</code> of the executor needed as part of the <code>SUBSCRIBE</code> call.</li>
+<li><code>MESOS_DIRECTORY</code>: Path to the working directory for the executor.</li>
+<li><code>MESOS_AGENT_ENDPOINT</code>: Agent endpoint (i.e., ip:port) to be used by the executor to connect to the agent.</li>
+<li><code>MESOS_CHECKPOINT</code>: If set to true, denotes that the framework has checkpointing enabled.</li>
+</ul>
+
+
+<p>If <code>MESOS_CHECKPOINT</code> is set (i.e., framework checkpointing is enabled), the agent also sets the following variables, which the executor can use when retrying after a disconnection from the agent:</p>
+
+<ul>
+<li><code>MESOS_RECOVERY_TIMEOUT</code>: The total duration that the executor should spend retrying before shutting itself down when it is disconnected from the agent (e.g., <code>15mins</code>, <code>5secs</code>). This is configurable at agent startup via the flag <code>--recovery_timeout</code>.</li>
+<li><code>MESOS_SUBSCRIPTION_BACKOFF_MAX</code>: The maximum backoff duration to be used by the executor between two retries when disconnected (e.g., <code>250ms</code>, <code>1mins</code>). This is configurable at agent startup via the flag <code>--executor_reregistration_timeout</code>.</li>
+</ul>
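+
+<p>As an illustration, an executor might collect these variables at startup as in the Python sketch below. The names <code>ExecutorEnv</code>, <code>load_executor_env</code> and <code>parse_duration</code> are hypothetical. The unit table and the accepted truthy values for <code>MESOS_CHECKPOINT</code> are assumptions that go beyond the examples given above.</p>
+
```python
import os
import re
from typing import Mapping, NamedTuple, Optional

# Assumed unit suffixes; only "ms", "secs" and "mins" appear in the text above.
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "ms": 1e-3, "secs": 1.0,
    "mins": 60.0, "hrs": 3600.0, "days": 86400.0, "weeks": 604800.0,
}


def parse_duration(text: str) -> float:
    """Convert a duration such as '15mins' or '250ms' to seconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s*", text)
    if match is None or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unrecognized duration: {text!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


class ExecutorEnv(NamedTuple):
    framework_id: str
    executor_id: str
    directory: str
    agent_endpoint: str
    checkpoint: bool
    recovery_timeout: Optional[float]          # seconds, None without checkpointing
    subscription_backoff_max: Optional[float]  # seconds, None without checkpointing


def load_executor_env(env: Mapping[str, str] = os.environ) -> ExecutorEnv:
    """Read the executor settings from the environment."""
    # Assumption: the flag is exported as "1" or "true" when enabled.
    checkpoint = env.get("MESOS_CHECKPOINT", "0").lower() in ("1", "true")
    return ExecutorEnv(
        framework_id=env["MESOS_FRAMEWORK_ID"],
        executor_id=env["MESOS_EXECUTOR_ID"],
        directory=env["MESOS_DIRECTORY"],
        agent_endpoint=env["MESOS_AGENT_ENDPOINT"],
        checkpoint=checkpoint,
        recovery_timeout=(
            parse_duration(env["MESOS_RECOVERY_TIMEOUT"]) if checkpoint else None
        ),
        subscription_backoff_max=(
            parse_duration(env["MESOS_SUBSCRIPTION_BACKOFF_MAX"]) if checkpoint else None
        ),
    )
```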
+
+
+<p>NOTE: The executor also inherits all of the agent&rsquo;s environment variables.</p>
+
+<h2>Disconnections</h2>
+
+<p>An executor considers itself disconnected if the persistent subscription connection (opened via a <code>SUBSCRIBE</code> request) to &ldquo;/executor&rdquo; breaks. A disconnection can happen, for example, because of an agent process failure.</p>
+
+<p>Upon detecting a disconnection from the agent, the retry behavior depends on whether framework checkpointing is enabled:</p>
+
+<ul>
+<li>If framework checkpointing is disabled, the executor should not retry subscription and should exit gracefully.</li>
+<li>If framework checkpointing is enabled, the executor is supposed to retry subscription using a suitable <a href="#backoff-strategies">backoff strategy</a> for a duration of <code>MESOS_RECOVERY_TIMEOUT</code>. If it is not able to establish a subscription with the agent within this duration, it should gracefully exit.</li>
+</ul>
+
+
+<h2>Agent Recovery</h2>
+
+<p>Upon agent startup, an agent performs <a href="/documentation/latest/slave-recovery/">recovery</a>. This allows the agent to recover status updates and reconnect with old executors. Currently, the agent supports the following recovery mechanisms specified via the <code>--recover</code> flag:</p>
+
+<ul>
+<li><strong>reconnect</strong> (default): This mode allows the agent to reconnect with any of its old live executors, provided the framework has enabled checkpointing. The recovery of the agent is only marked complete once all the disconnected executors have connected and hung executors have been destroyed. Hence, it is mandatory that every executor retries at least once within the interval (<code>MESOS_SUBSCRIPTION_BACKOFF_MAX</code>) to ensure it is not shut down by the agent for being hung or unresponsive.</li>
+<li><strong>cleanup</strong>: This mode kills any old live executors and then exits the agent. Operators usually use it when making an incompatible slave/executor upgrade. Upon receiving a <code>SUBSCRIBE</code> request from the executor of a framework with checkpointing enabled, the agent sends it a <code>SHUTDOWN</code> event as soon as it reconnects. For hung executors, the agent waits for a duration of <code>--executor_shutdown_grace_period</code> (configurable at agent startup) and then forcefully kills the container in which the executor is running.</li>
+</ul>
+
+
+<h2>Backoff Strategies</h2>
+
+<p>Executors are encouraged to retry subscription using a suitable backoff strategy, such as linear backoff, when they notice a disconnection from the agent. A disconnection typically happens when the agent process terminates (e.g., when it is restarted for an upgrade). Each retry interval should be bounded by the value of <code>MESOS_SUBSCRIPTION_BACKOFF_MAX</code>, which is set as an environment variable.</p>
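+
+<p>One possible linear backoff schedule is sketched below in Python. It is only an illustration: the function name <code>linear_backoff</code> and the <code>step</code> parameter are hypothetical, and all arguments are assumed to be already converted to seconds. Each wait is capped by the maximum backoff, and the total waiting time never exceeds the recovery timeout.</p>
+
```python
from typing import Iterator


def linear_backoff(step: float, backoff_max: float,
                   recovery_timeout: float) -> Iterator[float]:
    """Yield wait times (seconds) of step, 2*step, 3*step, ...

    Each wait is capped at backoff_max, and the sum of all waits never
    exceeds recovery_timeout. When the generator is exhausted, the
    executor has used up its retry budget.
    """
    if step <= 0 or backoff_max <= 0:
        raise ValueError("step and backoff_max must be positive")
    remaining = recovery_timeout
    attempt = 1
    while remaining > 0:
        delay = min(step * attempt, backoff_max, remaining)
        yield delay
        remaining -= delay
        attempt += 1
```
+
+<p>An executor would sleep for each yielded duration and then retry its <code>SUBSCRIBE</code> request, stopping as soon as a subscription succeeds. If the schedule runs out first, the executor should exit gracefully.</p>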
+
+	</div>
+</div>
+
+			
+	      <hr>
+
+				<!-- footer -->
+	      <div class="footer">
+	        <p>&copy; 2012-2015 <a href="http://apache.org">The Apache Software Foundation</a>.
+	        Apache Mesos, the Apache feather logo, and the Apache Mesos project logo are trademarks of The Apache Software Foundation.<p>
+	      </div><!-- /footer -->
+
+	    </div> <!-- /container -->
+
+	    <!-- JS -->
+	    <script src="//code.jquery.com/jquery-1.11.0.min.js" type="text/javascript"></script>
+			<script src="//netdna.bootstrapcdn.com/bootstrap/3.1.1/js/bootstrap.min.js" type="text/javascript"></script>
+    </body>
+</html>

Modified: mesos/site/publish/documentation/latest/external-containerizer/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/external-containerizer/index.html?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/external-containerizer/index.html (original)
+++ mesos/site/publish/documentation/latest/external-containerizer/index.html Mon Feb  1 02:49:25 2016
@@ -83,6 +83,9 @@
 	<div class="col-md-8">
 		<h1>External Containerizer</h1>
 
+<p><strong>NOTE:</strong>  The external containerizer is deprecated. See
+<a href="https://issues.apache.org/jira/browse/MESOS-3370">MESOS-3370</a> for details.</p>
+
 <ul>
 <li>EC = external containerizer. A part of the mesos slave that provides
 an API for containerizing via external plugin executables.</li>
@@ -171,7 +174,7 @@ EC.</li>
 <p>A container is in a staging state and now gets started and observed
 until it gets into a final state.</p>
 
-<p><img src="/assets/img/documentation/ec_launch_seqdiag.png?raw=true" alt="Container Launching Scheme" /></p>
+<p><img src="images/ec_launch_seqdiag.png?raw=true" alt="Container Launching Scheme" /></p>
 
 <h3>Container Running</h3>
 
@@ -180,14 +183,14 @@ being in a non terminal state by the sla
 will get triggered multiple times at the ECP over the lifetime of a
 container. Their order however is not determined.</p>
 
-<p><img src="/assets/img/documentation/ec_lifecycle_seqdiag.png?raw=true" alt="Container Running Scheme" /></p>
+<p><img src="images/ec_lifecycle_seqdiag.png?raw=true" alt="Container Running Scheme" /></p>
 
 <h3>Resource Limitation</h3>
 
 <p>While a container is active, a resource limitation was identified
 (e.g. out of memory) by the ECP isolation mechanism of choice.</p>
 
-<p><img src="/assets/img/documentation/ec_kill_seqdiag.png?raw=true" alt="Resource Limitation Scheme" /></p>
+<p><img src="images/ec_kill_seqdiag.png?raw=true" alt="Resource Limitation Scheme" /></p>
 
 <h2>Slave Recovery Overview</h2>
 
@@ -220,14 +223,14 @@ ultimate command reaper.</li>
 
 <p>While containers are active, the slave fails over.</p>
 
-<p><img src="/assets/img/documentation/ec_recover_seqdiag.png?raw=true" alt="Recovery Scheme" /></p>
+<p><img src="images/ec_recover_seqdiag.png?raw=true" alt="Recovery Scheme" /></p>
 
 <h3>Orphan Destruction</h3>
 
 <p>Containers identified by the ECP as being active but not slave state
 recoverable are getting terminated.</p>
 
-<p><img src="/assets/img/documentation/ec_orphan_seqdiag.png?raw=true" alt="Orphan Destruction Scheme" /></p>
+<p><img src="images/ec_orphan_seqdiag.png?raw=true" alt="Orphan Destruction Scheme" /></p>
 
 <h1>Command Details</h1>
 

Modified: mesos/site/publish/documentation/latest/fetcher-cache-internals/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/fetcher-cache-internals/index.html?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/fetcher-cache-internals/index.html (original)
+++ mesos/site/publish/documentation/latest/fetcher-cache-internals/index.html Mon Feb  1 02:49:25 2016
@@ -136,7 +136,7 @@
 
 <p>Based on this setup, the main program flow in the fetcher process is concerned with assembling a list of parameters to the mesos-fetcher program that describe items to be fetched. This figure illustrates the high-level collaboration of the fetcher process with mesos-fetcher program runs. It also depicts the next level of detail of the fetcher process, which will be described in the following section.</p>
 
-<p><img src="/assets/img/documentation/fetch_components.jpg" alt="Fetcher Separation of Labor" /></p>
+<p><img src="images/fetch_components.jpg" alt="Fetcher Separation of Labor" /></p>
 
 <h3>Cache state representation and manipulation</h3>
 
@@ -148,7 +148,7 @@
 
 <p>This figure illustrates the different states which a cache entry can be in.</p>
 
-<p><img src="/assets/img/documentation/fetch_state.jpg" alt="Fetcher Cache State" /></p>
+<p><img src="images/fetch_state.jpg" alt="Fetcher Cache State" /></p>
 
 <p>While a cache entry is referenced, it cannot be evicted by the current or any other concurrent fetch attempt in order to make space for a download of a new cache file.</p>
 
@@ -171,7 +171,7 @@
 
 <p>As mentioned above, the fetcher process' main control flow concerns sorting out what to do with each URI presented to it in a fetch request. An overview of the ensuing control flow for a given URI is depicted in this figure.</p>
 
-<p><img src="/assets/img/documentation/fetch_flow.jpg" alt="Determining Fetcher Actions" /></p>
+<p><img src="images/fetch_flow.jpg" alt="Determining Fetcher Actions" /></p>
 
 <p>After going through this procedure for each URI, the fetcher process assembles the gathered list of per-URI actions into a JSON object (<code>FetcherInfo</code>), which is passed to the mesos-fetcher program in an environment variable. The possible fetch actions for a URI are shown at the bottom of the flow chart. After they are determined, the fetcher process invokes mesos-fetcher.</p>
 
@@ -204,7 +204,7 @@
 
 <h3>Cache eviction</h3>
 
-<p><img src="/assets/img/documentation/fetch_evict1.jpg" alt="Before eviction" /></p>
+<p><img src="images/fetch_evict1.jpg" alt="Before eviction" /></p>
 
 <p>The resources named &ldquo;A&rdquo; and &ldquo;B&rdquo; have been fetched with caching into sandbox 1 and 2 below. In the course of this, two cache entries have been created and two files have been downloaded into the cache and named &ldquo;1&rdquo; and &ldquo;2&rdquo;. (Cache file names have unique names that comprise serial numbers.)</p>
 
@@ -216,11 +216,11 @@
 </ol>
 
 
-<p><img src="/assets/img/documentation/fetch_evict2.jpg" alt="After eviction" /></p>
+<p><img src="images/fetch_evict2.jpg" alt="After eviction" /></p>
 
 <p>The next figure then shows what happens if the first URI is fetched once again. Here we again assume that the cache is so full that eviction is necessary, and this time the entry and file for &ldquo;B&rdquo; are the victims.</p>
 
-<p><img src="/assets/img/documentation/fetch_evict3.jpg" alt="After another eviction" /></p>
+<p><img src="images/fetch_evict3.jpg" alt="After another eviction" /></p>
 
 	</div>
 </div>

Modified: mesos/site/publish/documentation/latest/fetcher/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/fetcher/index.html?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/fetcher/index.html (original)
+++ mesos/site/publish/documentation/latest/fetcher/index.html Mon Feb  1 02:49:25 2016
@@ -90,10 +90,11 @@ from local file systems.</p>
 
 <h2>What is the Mesos fetcher?</h2>
 
-<p>The Mesos fetcher is a mechanism to download resources into the sandbox
-directory of a task in preparation of running the task. As part of a TaskInfo
-message, the framework ordering the task&rsquo;s execution provides a list of
-<code>CommandInfo::URI</code> protobuf values, which becomes the input to the Mesos fetcher.</p>
+<p>The Mesos fetcher is a mechanism to download resources into the <a href="/documentation/latest/sandbox/">sandbox
+directory</a> of a task in preparation for running
+the task. As part of a TaskInfo message, the framework ordering the task&rsquo;s
+execution provides a list of <code>CommandInfo::URI</code> protobuf values, which becomes
+the input to the Mesos fetcher.</p>
 
 <p>The Mesos fetcher can copy files from a local filesystem and it also natively
 supports the HTTP, HTTPS, FTP and FTPS protocols. If the requested URI is based
@@ -311,6 +312,42 @@ falls back on bypassing the cache for th
 separate space goals. However, leftover freed up space from one effort is
 automatically awarded to others.</p>
 
+<h2>HTTP and SOCKS proxy settings</h2>
+
+<p>Sometimes it is desirable to use a proxy to download the file. The Mesos
+fetcher uses libcurl internally for downloading content from
+HTTP/HTTPS/FTP/FTPS servers, and libcurl can use a proxy automatically if
+certain environment variables are set.</p>
+
+<p>The respective environment variable name is <code>[protocol]_proxy</code>, where
+<code>protocol</code> can be one of socks4, socks5, http, https.</p>
+
+<p>For example, the value of the <code>http_proxy</code> environment variable would be used
+as the proxy for fetching http contents, while <code>https_proxy</code> would be used for
+fetching https contents. Note that these variable names must be
+entirely in lower case.</p>
+
+<p>The value of the proxy variable is of the format
+<code>[protocol://][user:password@]machine[:port]</code>, where <code>protocol</code> can be one of
+socks4, socks5, http, https.</p>
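+
+<p>To make the value format concrete, the Python sketch below splits a proxy setting into its parts. It is an illustration only and not how libcurl parses the value; libcurl accepts more forms (for example IPv6 literals) than this simplified pattern. The names <code>ProxySpec</code> and <code>parse_proxy</code> are hypothetical.</p>
+
```python
import re
from typing import NamedTuple, Optional


class ProxySpec(NamedTuple):
    protocol: Optional[str]
    user: Optional[str]
    password: Optional[str]
    machine: str
    port: Optional[int]


# Simplified pattern for [protocol://][user:password@]machine[:port].
_PROXY_RE = re.compile(
    r"(?:(?P<protocol>socks4|socks5|https?)://)?"
    r"(?:(?P<user>[^:@/]+):(?P<password>[^@/]*)@)?"
    r"(?P<machine>[^:@/]+)"
    r"(?::(?P<port>\d+))?/?"
)


def parse_proxy(value: str) -> ProxySpec:
    """Split a proxy setting into protocol, credentials, machine and port."""
    match = _PROXY_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognized proxy value: {value!r}")
    port = match.group("port")
    return ProxySpec(
        protocol=match.group("protocol"),
        user=match.group("user"),
        password=match.group("password"),
        machine=match.group("machine"),
        port=int(port) if port else None,
    )
```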
+
+<p>FTP/FTPS requests with a proxy also make use of an HTTP/HTTPS proxy. Even
+though in general this constrains the available FTP protocol operations,
+everything the fetcher uses is supported.</p>
+
+<p>Your proxy settings can be placed in <code>/etc/default/mesos-slave</code>. Here is an
+example:</p>
+
+<pre><code>export http_proxy=https://proxy.example.com:3128
+export https_proxy=https://proxy.example.com:3128
+</code></pre>
+
+<p>The fetcher will pick up these environment variable settings since the utility
+program <code>mesos-fetcher</code> which it employs is a child of mesos-slave.</p>
+
+<p>For more details, please check the
+<a href="http://curl.haxx.se/libcurl/c/libcurl-tutorial.html">libcurl manual</a>.</p>
+
 <h2>Slave flags</h2>
 
 <p>It is highly recommended to set these flags explicitly to values other than

Modified: mesos/site/publish/documentation/latest/frameworks/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/frameworks/index.html?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/frameworks/index.html (original)
+++ mesos/site/publish/documentation/latest/frameworks/index.html Mon Feb  1 02:49:25 2016
@@ -113,6 +113,7 @@
 <li><a href="https://github.com/airbnb/chronos">Chronos</a> is a distributed job scheduler that supports complex job topologies. It can be used as a more fault-tolerant replacement for Cron.</li>
 <li><a href="https://github.com/jenkinsci/mesos-plugin">Jenkins</a> is a continuous integration server. The mesos-jenkins plugin allows it to dynamically launch workers on a Mesos cluster depending on the workload.</li>
 <li><a href="http://www.grandlogic.com/content/html_docs/jobserver.html">JobServer</a> is a distributed job scheduler and processor  which allows developers to build custom batch processing Tasklets using point and click web UI.</li>
+<li><a href="https://bitbucket.org/osallou/go-docker">GoDocker</a> is a batch computing job scheduler like SGE, Torque, etc. It schedules batch computing tasks via webui, API or CLI for system or LDAP users, mounting their home directory or other shared resources in a Docker container. It targets scientists, not developers, and provides plugin mechanisms to extend or modify the default behavior.</li>
 </ul>
 
 
@@ -122,6 +123,7 @@
 <li><a href="https://github.com/mesosphere/cassandra-mesos">Cassandra</a> is a performant and highly available distributed database. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.</li>
 <li><a href="https://github.com/mesosphere/elasticsearch-mesos">ElasticSearch</a> is a distributed search engine. Mesos makes it easy to run and scale.</li>
 <li><a href="https://code.google.com/p/hypertable/wiki/Mesos">Hypertable</a> is a high performance, scalable, distributed storage and processing system for structured and unstructured data.</li>
+<li><a href="http://tachyon-project.org">Tachyon</a> is a memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster frameworks.</li>
 </ul>
 
 

Modified: mesos/site/publish/documentation/latest/getting-started/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/getting-started/index.html?rev=1727886&r1=1727885&r2=1727886&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/getting-started/index.html (original)
+++ mesos/site/publish/documentation/latest/getting-started/index.html Mon Feb  1 02:49:25 2016
@@ -87,22 +87,22 @@
 
 <p>There are different ways you can get Mesos:</p>
 
-<ol>
-<li><p>Download the latest stable release from <a href="http://mesos.apache.org/downloads/">Apache</a> (<strong><em>Recommended</em></strong>)</p>
+<p>1. Download the latest stable release from <a href="http://mesos.apache.org/downloads/">Apache</a> (<strong><em>Recommended</em></strong>)</p>
 
-<pre><code> $ wget http://www.apache.org/dist/mesos/0.26.0/mesos-0.26.0.tar.gz
- $ tar -zxf mesos-0.26.0.tar.gz
-</code></pre></li>
-<li><p>Clone the Mesos git <a href="https://git-wip-us.apache.org/repos/asf/mesos.git">repository</a> (<strong><em>Advanced Users Only</em></strong>)</p>
-
-<pre><code> $ git clone https://git-wip-us.apache.org/repos/asf/mesos.git
-</code></pre></li>
-</ol>
+<pre><code>$ wget http://www.apache.org/dist/mesos/0.27.0/mesos-0.27.0.tar.gz
+$ tar -zxf mesos-0.27.0.tar.gz
+</code></pre>
+
+<p>2. Clone the Mesos git <a href="https://git-wip-us.apache.org/repos/asf/mesos.git">repository</a> (<strong><em>Advanced Users Only</em></strong>)</p>
+
+<pre><code>$ git clone https://git-wip-us.apache.org/repos/asf/mesos.git
+</code></pre>
 
+<p><em>NOTE: If you have problems running the above commands, you may need to first run through the </em><strong>System Requirements</strong><em> section below to install the <code>wget</code>, <code>tar</code>, and <code>git</code> utilities for your system.</em></p>
 
 <h2>System Requirements</h2>
 
-<p>Mesos runs on Linux (64 Bit) and Mac OS X (64 Bit).</p>
+<p>Mesos runs on Linux (64 Bit) and Mac OS X (64 Bit). To build Mesos from source, GCC 4.8.1+ or Clang 3.5+ is required.</p>
 
 <p>For full support of process isolation under Linux a recent kernel >=3.10 is required.</p>
 
@@ -112,165 +112,200 @@
 
 <p>Following are the instructions for stock Ubuntu 14.04. If you are using a different OS, please install the packages accordingly.</p>
 
-<pre><code>    # Update the packages.
-    $ sudo apt-get update
+<pre><code># Update the packages.
+$ sudo apt-get update
 
-    # Install the latest OpenJDK.
-    $ sudo apt-get install -y openjdk-7-jdk
+# Install a few utility tools.
+$ sudo apt-get install -y tar wget git
 
-    # Install autotools (Only necessary if building from git repository).
-    $ sudo apt-get install -y autoconf libtool
+# Install the latest OpenJDK.
+$ sudo apt-get install -y openjdk-7-jdk
 
-    # Install other Mesos dependencies.
-    $ sudo apt-get -y install build-essential python-dev python-boto libcurl4-nss-dev libsasl2-dev maven libapr1-dev libsvn-dev
+# Install autotools (Only necessary if building from git repository).
+$ sudo apt-get install -y autoconf libtool
+
+# Install other Mesos dependencies.
+$ sudo apt-get -y install build-essential python-dev python-boto libcurl4-nss-dev libsasl2-dev libsasl2-modules maven libapr1-dev libsvn-dev
 </code></pre>
 
-<h3>Mac OS X Yosemite</h3>
+<h3>Mac OS X Yosemite &amp; El Capitan</h3>
+
+<p>Following are the instructions for stock Mac OS X Yosemite and El Capitan. If you are using a different OS, please install the packages accordingly.</p>
 
-<p>Following are the instructions for stock Mac OS X Yosemite. If you are using a different OS, please install the packages accordingly.</p>
+<pre><code># Install Command Line Tools.
+$ xcode-select --install
 
-<pre><code>    # Install Command Line Tools.
-    $ xcode-select --install
+# Install Homebrew.
+$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
 
-    # Install Homebrew.
-    $ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
+# Install Java.
+$ brew install Caskroom/cask/java
 
-    # Install libraries.
-    $ brew install autoconf automake libtool subversion maven
+# Install libraries.
+$ brew install wget git autoconf automake libtool subversion maven
 </code></pre>
 
+<p><em>NOTE: When upgrading from Yosemite to El Capitan, make sure to rerun <code>xcode-select --install</code> after the upgrade.</em></p>
+
 <h3>CentOS 6.6</h3>
 
 <p>Following are the instructions for stock CentOS 6.6. If you are using a different OS, please install the packages accordingly.</p>
 
-<pre><code>    # Install a recent kernel for full support of process isolation.
-    $ sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
-    $ sudo rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
-    $ sudo yum --enablerepo=elrepo-kernel install -y kernel-lt
-
-    # Make the just installed kernel the one booted by default, and reboot.
-    $ sudo sed -i 's/default=1/default=0/g' /boot/grub/grub.conf
-    $ sudo reboot
-
-    # Install a few utility tools. This also forces an update of `nss`,
-    # which is necessary for the Java bindings to build properly.
-    $ sudo yum install -y tar wget which nss
-
-    # 'Mesos &gt; 0.21.0' requires a C++ compiler with full C++11 support,
-    # (e.g. GCC &gt; 4.8) which is available via 'devtoolset-2'.
-    # Fetch the Scientific Linux CERN devtoolset repo file.
-    $ sudo wget -O /etc/yum.repos.d/slc6-devtoolset.repo http://linuxsoft.cern.ch/cern/devtoolset/slc6-devtoolset.repo
-
-    # Import the CERN GPG key.
-    $ sudo rpm --import http://linuxsoft.cern.ch/cern/centos/7/os/x86_64/RPM-GPG-KEY-cern
-
-    # Fetch the Apache Maven repo file.
-    $ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
-
-    # 'Mesos &gt; 0.21.0' requires 'subversion &gt; 1.8' devel package, which is
-    # not available in the default repositories.
-    # Add the WANdisco SVN repo file: '/etc/yum.repos.d/wandisco-svn.repo' with content:
-
-      [WANdiscoSVN]
-      name=WANdisco SVN Repo 1.8
-      enabled=1
-      baseurl=http://opensource.wandisco.com/centos/6/svn-1.8/RPMS/$basearch/
-      gpgcheck=1
-      gpgkey=http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco
-
-    # Install essential development tools.
-    $ sudo yum groupinstall -y "Development Tools"
-
-    # Install 'devtoolset-2-toolchain' which includes GCC 4.8.2 and related packages.
-    $ sudo yum install -y devtoolset-2-toolchain
-
-    # Install other Mesos dependencies.
-    $ sudo yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel
-
-    # Enter a shell with 'devtoolset-2' enabled.
-    $ scl enable devtoolset-2 bash
-    $ g++ --version  # Make sure you've got GCC &gt; 4.8!
+<pre><code># Install a recent kernel for full support of process isolation.
+$ sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
+$ sudo rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
+$ sudo yum --enablerepo=elrepo-kernel install -y kernel-lt
+
+# Make the just installed kernel the one booted by default, and reboot.
+$ sudo sed -i 's/default=1/default=0/g' /boot/grub/grub.conf
+$ sudo reboot
+
+# Install a few utility tools. This also forces an update of `nss`,
+# which is necessary for the Java bindings to build properly.
+$ sudo yum install -y tar wget git which nss
+
+# 'Mesos &gt; 0.21.0' requires a C++ compiler with full C++11 support,
+# (e.g. GCC &gt; 4.8) which is available via 'devtoolset-2'.
+# Fetch the Scientific Linux CERN devtoolset repo file.
+$ sudo wget -O /etc/yum.repos.d/slc6-devtoolset.repo http://linuxsoft.cern.ch/cern/devtoolset/slc6-devtoolset.repo
+
+# Import the CERN GPG key.
+$ sudo rpm --import http://linuxsoft.cern.ch/cern/centos/7/os/x86_64/RPM-GPG-KEY-cern
+
+# Fetch the Apache Maven repo file.
+$ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
+
+# 'Mesos &gt; 0.21.0' requires 'subversion &gt; 1.8' devel package, which is
+# not available in the default repositories.
+# Create a WANdisco SVN repo file to install the correct version.
+# ('sudo tee' writes the file as root; quoting 'EOF' keeps $basearch literal.)
+$ sudo tee /etc/yum.repos.d/wandisco-svn.repo &gt; /dev/null &lt;&lt;'EOF'
+[WANdiscoSVN]
+name=WANdisco SVN Repo 1.8
+enabled=1
+baseurl=http://opensource.wandisco.com/centos/6/svn-1.8/RPMS/$basearch/
+gpgcheck=1
+gpgkey=http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco
+EOF
+
+# Install essential development tools.
+$ sudo yum groupinstall -y "Development Tools"
+
+# Install 'devtoolset-2-toolchain' which includes GCC 4.8.2 and related packages.
+$ sudo yum install -y devtoolset-2-toolchain
+
+# Install other Mesos dependencies.
+$ sudo yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel
+
+# Enter a shell with 'devtoolset-2' enabled.
+$ scl enable devtoolset-2 bash
+$ g++ --version  # Make sure you've got GCC &gt; 4.8!
+
+# Process isolation uses cgroups, which are managed by 'cgconfig'.
+# The 'cgconfig' service is not started by default on CentOS 6.6.
+# Also, the default configuration does not attach the 'perf_event' subsystem.
+# To attach it, add 'perf_event = /cgroup/perf_event;' to the entries in '/etc/cgconfig.conf'.
+$ sudo yum install -y libcgroup
+$ sudo service cgconfig start
 </code></pre>
 
 <h3>CentOS 7.1</h3>
 
 <p>Following are the instructions for stock CentOS 7.1. If you are using a different OS, please install the packages accordingly.</p>
 
-<pre><code>    # Install a few utility tools
-    $ sudo yum install -y tar wget
+<pre><code># Install a few utility tools
+$ sudo yum install -y tar wget git
 
-    # Fetch the Apache Maven repo file.
-    $ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
+# Fetch the Apache Maven repo file.
+$ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
 
-    # 'Mesos &gt; 0.21.0' requires 'subversion &gt; 1.8' devel package, which is
-    # not available in the default repositories.
-    # Add the WANdisco SVN repo file: '/etc/yum.repos.d/wandisco-svn.repo' with content:
-
-      [WANdiscoSVN]
-      name=WANdisco SVN Repo 1.9
-      enabled=1
-      baseurl=http://opensource.wandisco.com/centos/7/svn-1.9/RPMS/$basearch/
-      gpgcheck=1
-      gpgkey=http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco
+# Install the EPEL repo so that we can pull in 'libserf-1' as part of our
+# subversion install below.
+$ sudo yum install -y epel-release
+
+# 'Mesos &gt; 0.21.0' requires 'subversion &gt; 1.8' devel package,
+# which is not available in the default repositories.
+# Create a WANdisco SVN repo file to install the correct version.
+# ('sudo tee' writes the file as root; quoting 'EOF' keeps $basearch literal.)
+$ sudo tee /etc/yum.repos.d/wandisco-svn.repo &gt; /dev/null &lt;&lt;'EOF'
+[WANdiscoSVN]
+name=WANdisco SVN Repo 1.9
+enabled=1
+baseurl=http://opensource.wandisco.com/centos/7/svn-1.9/RPMS/$basearch/
+gpgcheck=1
+gpgkey=http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco
+EOF
+
+# Parts of Mesos require systemd in order to operate. However, Mesos
+# only supports versions of systemd that contain the 'Delegate' flag.
+# This flag was first introduced in 'systemd version 218', which is
+# higher than the default version installed by CentOS. Luckily, CentOS
+# 7.1 has a patched 'systemd &lt; 218' that contains the 'Delegate' flag.
+# Explicitly update systemd to this patched version.
+$ sudo yum update systemd
 
-    # Install essential development tools.
-    $ sudo yum groupinstall -y "Development Tools"
+# Install essential development tools.
+$ sudo yum groupinstall -y "Development Tools"
 
-    # Install other Mesos dependencies.
-    $ sudo yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel
+# Install other Mesos dependencies.
+$ sudo yum install -y apache-maven python-devel java-1.8.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel
 </code></pre>
 
 <h2>Building Mesos</h2>
 
-<pre><code>    # Change working directory.
-    $ cd mesos
+<pre><code># Change working directory.
+$ cd mesos
 
-    # Bootstrap (Only required if building from git repository).
-    $ ./bootstrap
+# Bootstrap (Only required if building from git repository).
+$ ./bootstrap
 
-    # Configure and build.
-    $ mkdir build
-    $ cd build
-    $ ../configure
-    $ make
+# Configure and build.
+$ mkdir build
+$ cd build
+$ ../configure
+$ make
 </code></pre>
 
 <p>In order to speed up the build and reduce verbosity of the logs, you can append <code>-j &lt;number of cores&gt; V=0</code> to <code>make</code>.</p>
 
-<pre><code>    # Run test suite.
-    $ make check
+<pre><code># Run test suite.
+$ make check
 
-    # Install (Optional).
-    $ make install
+# Install (Optional).
+$ make install
 </code></pre>
 
 <h2>Examples</h2>
 
-<p>Mesos comes bundled with example frameworks written in C++, Java and Python.</p>
+<p>Mesos comes bundled with example frameworks written in C++, Java and Python.
+The framework binaries will only be available after running <code>make check</code>, as
+described in the <strong><em>Building Mesos</em></strong> section above.</p>
 
-<pre><code>    # Change into build directory.
-    $ cd build
+<pre><code># Change into build directory.
+$ cd build
 
-    # Start mesos master (Ensure work directory exists and has proper permissions).
-    $ ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos
+# Start mesos master (Ensure work directory exists and has proper permissions).
+$ ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos
 
-    # Start mesos slave.
-    $ ./bin/mesos-slave.sh --master=127.0.0.1:5050
+# Start mesos slave.
+$ ./bin/mesos-slave.sh --master=127.0.0.1:5050
 
-    # Visit the mesos web page.
-    $ http://127.0.0.1:5050
+# Visit the Mesos web page by opening this URL in a browser:
+#   http://127.0.0.1:5050
 
-    # Run C++ framework (Exits after successfully running some tasks.).
-    $ ./src/test-framework --master=127.0.0.1:5050
+# Run C++ framework (Exits after successfully running some tasks.).
+$ ./src/test-framework --master=127.0.0.1:5050
 
-    # Run Java framework (Exits after successfully running some tasks.).
-    $ ./src/examples/java/test-framework 127.0.0.1:5050
+# Run Java framework (Exits after successfully running some tasks.).
+$ ./src/examples/java/test-framework 127.0.0.1:5050
 
-    # Run Python framework (Exits after successfully running some tasks.).
-    $ ./src/examples/python/test-framework 127.0.0.1:5050
+# Run Python framework (Exits after successfully running some tasks.).
+$ ./src/examples/python/test-framework 127.0.0.1:5050
 </code></pre>
 
-<p><em>NOTE: To build the example frameworks, make sure you build the test suite by doing <code>make check</code>.</em></p>
+<p><em>Note: These examples assume you are running Mesos on your local machine.
+Following them will not allow you to access the Mesos web page in a production
+environment (e.g. on AWS). For that you will need to specify the actual IP of
+your host when launching the Mesos master and ensure your firewall settings
+allow access to port 5050 from the outside world.</em></p>
 
 	</div>
 </div>

Added: mesos/site/publish/documentation/latest/high-availability-framework-guide/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/high-availability-framework-guide/index.html?rev=1727886&view=auto
==============================================================================
--- mesos/site/publish/documentation/latest/high-availability-framework-guide/index.html (added)
+++ mesos/site/publish/documentation/latest/high-availability-framework-guide/index.html Mon Feb  1 02:49:25 2016
@@ -0,0 +1,416 @@
+<!DOCTYPE html>
+<html>
+    <head>
+        <meta charset="utf-8">
+        <title></title>
+		    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+		    <link href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css" rel="stylesheet">
+		    <link rel="alternate" type="application/atom+xml" title="Apache Mesos Blog" href="/blog/feed.xml">
+		    
+		    <link href="../../../assets/css/main.css" media="screen" rel="stylesheet" type="text/css" />
+				
+		    
+			
+			<!-- Google Analytics Magic -->
+			<script type="text/javascript">
+			  var _gaq = _gaq || [];
+			  _gaq.push(['_setAccount', 'UA-20226872-1']);
+			  _gaq.push(['_setDomainName', 'apache.org']);
+			  _gaq.push(['_trackPageview']);
+
+			  (function() {
+			    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+			    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+			    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+			  })();
+			</script>
+    </head>
+    <body>
+			<!-- magical breadcrumbs -->
+			<div class="topnav">
+			<ul class="breadcrumb">
+			  <li>
+					<div class="dropdown">
+					  <a data-toggle="dropdown" href="#">Apache Software Foundation <span class="caret"></span></a>
+					  <ul class="dropdown-menu" role="menu" aria-labelledby="dLabel">
+							<li><a href="http://www.apache.org">Apache Homepage</a></li>
+							<li><a href="http://www.apache.org/licenses/">License</a></li>
+					  	<li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>  
+					  	<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+							<li><a href="http://www.apache.org/security/">Security</a></li>
+					  </ul>
+					</div>
+				</li>
+				<li><a href="http://mesos.apache.org">Apache Mesos</a></li>
+				
+				
+					<li><a href="/documentation
+/">Documentation
+</a></li>
+				
+				
+			</ul><!-- /breadcrumb -->
+			</div>
+			
+			<!-- navbar excitement -->
+	    <div class="navbar navbar-static-top" role="navigation">
+	      <div class="navbar-inner">
+	        <div class="container">
+						<a href="/" class="logo"><img src="/assets/img/mesos_logo.png" alt="Apache Mesos logo" /></a>
+					<div class="nav-collapse">
+						<ul class="nav nav-pills navbar-right">
+						  <li><a href="/gettingstarted/">Getting Started</a></li>
+						  <li><a href="/documentation/latest/">Documentation</a></li>
+						  <li><a href="/downloads/">Downloads</a></li>
+						  <li><a href="/community/">Community</a></li>
+						</ul>
+					</div>
+	        </div>
+	      </div>
+	    </div><!-- /.navbar -->
+
+      <div class="container">
+
+			<div class="row-fluid">
+	<div class="col-md-4">
+		<h4>If you're new to Mesos</h4>
+		<p>See the <a href="/gettingstarted/">getting started</a> page for more information about downloading, building, and deploying Mesos.</p>
+		
+		<h4>If you'd like to get involved or you're looking for support</h4>
+		<p>See our <a href="/community/">community</a> page for more details.</p>
+	</div>
+	<div class="col-md-8">
+		<h1>Designing Highly Available Mesos Frameworks</h1>
+
+<p>A Mesos framework manages tasks. For a Mesos framework to be highly available,
+it must continue to manage tasks correctly in the presence of a variety of
+failure scenarios. The most common failure conditions that framework authors
+should consider include:</p>
+
+<ul>
+<li><p>The Mesos master that a framework scheduler is connected to might fail, for
+example by crashing or by losing network connectivity. If the master has been
+configured to use <a href="/documentation/latest/high-availability/">high-availability mode</a>, this will
+result in promoting another Mesos master replica to become the current
+leader. In this situation, the scheduler should re-register with the new
+master and ensure that task state is consistent.</p></li>
+<li><p>The host where a framework scheduler is running might fail. To ensure that the
+framework remains available and can continue to schedule new tasks, framework
+authors should ensure that multiple copies of the scheduler run on different
+nodes, and that a backup copy is promoted to become the new leader when the
+previous leader fails. Mesos itself does not dictate how framework authors
+should handle this situation, although we provide some suggestions below. It
+can be useful to deploy multiple copies of your framework scheduler using
+a long-running task scheduler such as Apache Aurora or Marathon.</p></li>
+<li><p>The host where a task is running might fail. Alternatively, the node itself
+might not have failed but the Mesos agent on the node might be unable to
+communicate with the Mesos master, e.g., due to a network partition.</p></li>
+</ul>
+
+
+<p>Note that more than one of these failures might occur simultaneously.</p>
+
+<h2>Mesos Architecture</h2>
+
+<p>Before discussing the specific failure scenarios outlined above, it is worth
+highlighting some aspects of how Mesos is designed that influence high
+availability:</p>
+
+<ul>
+<li><p>Mesos provides unreliable messaging between components by default: messages
+are delivered &ldquo;at-most-once&rdquo; (they might be dropped). Framework authors should
+expect that messages they send might not be received and be prepared to take
+appropriate corrective action. To detect that a message might be lost,
+frameworks typically use timeouts. For example, if a framework attempts to
+launch a task, that message might not be received by the Mesos master (e.g.,
+due to a transient network failure). To address this, the framework scheduler
+should set a timeout after attempting to launch a new task. If the scheduler
+hasn&rsquo;t seen a status update for the new task before the timeout fires, it
+should take corrective action&mdash;for example, by performing <a href="/documentation/latest/reconciliation/">task state reconciliation</a>,
+and then launching a new copy of the task if necessary.</p>
+
+<ul>
+<li><p>In general, distributed systems cannot distinguish between &ldquo;lost&rdquo; messages
+and messages that are merely delayed. In the example above, the scheduler
+might see a status update for the first task launch attempt immediately
+<em>after</em> its timeout has fired and it has already begun taking corrective
+action. Scheduler authors should be aware of this possibility and program
+accordingly.</p></li>
+<li><p>Mesos actually provides ordered (but unreliable) message delivery between
+any pair of processes: for example, if a framework sends messages M1 and
+M2 to the master, the master might receive no messages, just M1, just M2, or
+M1 followed by M2 &ndash; it will <em>not</em> receive M2 followed by M1.</p></li>
+<li><p>As a convenience for framework authors, Mesos provides reliable delivery of
+task status updates. The agent persists task status updates to disk and then
+forwards them to the master. The master sends status updates to the
+appropriate framework scheduler. When a scheduler acknowledges a status
+update, the master forwards the acknowledgment back to the agent, which
+allows the stored status update to be garbage collected. If the agent does
+not receive an acknowledgment for a task status update within a certain
+amount of time, it will repeatedly resend the status update to the master,
+which will again forward the update to the scheduler. Hence, task status
+updates will be delivered &ldquo;at least once&rdquo;, assuming that the agent and the
+scheduler both remain available. To handle the fact that task status updates
+might be delivered more than once, it can be helpful to make the framework
+logic that processes them <a href="https://en.wikipedia.org/wiki/Idempotence">idempotent</a>.</p></li>
+</ul>
+</li>
+<li><p>The Mesos master stores information about the active tasks and registered
+frameworks <em>in memory</em>: it does not persist it to disk or attempt to ensure
+that this information is preserved after a master failover. This helps the
+Mesos master scale to large clusters with many tasks and frameworks. A
+downside of this design is that after a failure, more work is required to
+recover the lost in-memory master state.</p></li>
+<li><p>If all the Mesos masters are unavailable (e.g., crashed or unreachable), the
+cluster should continue to operate: existing Mesos agents and user tasks should
+continue running. However, new tasks cannot be scheduled, and frameworks will
+not receive resource offers or status updates about previously launched tasks.</p></li>
+<li><p>Mesos does not dictate how frameworks should be implemented and does not try
+to assume responsibility for how frameworks should deal with failures.
+Instead, Mesos tries to provide framework developers with the tools they need
+to implement this behavior themselves. Different frameworks might choose to
+handle failures differently, depending on their exact requirements.</p></li>
+</ul>
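+
+<p>The timeout advice above, together with idempotent handling of repeated
+status updates, could be sketched roughly as follows. This is a minimal
+illustration rather than Mesos API code: <code>LaunchTracker</code> and its method names
+are hypothetical, and a real scheduler would call them from its driver
+callbacks.</p>
+
```python
import time


class LaunchTracker:
    """Tracks launch attempts and de-duplicates status updates.

    Hypothetical helper for illustration; not part of the Mesos API.
    """

    def __init__(self, timeout_secs, clock=time.monotonic):
        self.timeout_secs = timeout_secs
        self.clock = clock
        self.pending = {}  # task_id -> time the launch message was sent
        self.states = {}   # task_id -> last state seen for the task

    def launched(self, task_id):
        # Record when the launch was sent; the message itself may be dropped.
        self.pending[task_id] = self.clock()

    def status_update(self, task_id, state):
        # Idempotent: applying the same update twice leaves the same result.
        # Returns True only if the recorded state actually changed.
        self.pending.pop(task_id, None)
        changed = self.states.get(task_id) != state
        self.states[task_id] = state
        return changed

    def needs_reconciliation(self):
        # Tasks for which no status update arrived before the timeout fired.
        now = self.clock()
        return sorted(
            task_id
            for task_id, sent_at in self.pending.items()
            if now - sent_at >= self.timeout_secs
        )
```
+
+<p>A scheduler might call <code>needs_reconciliation()</code> periodically and ask the
+master about the returned task IDs. Because a delayed update can still arrive
+after the timeout, <code>status_update()</code> accepts updates for tasks that are no
+longer pending.</p>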
+
+
+<h2>Recommendations for Highly Available Frameworks</h2>
+
+<p>Highly available framework designs typically follow a few common patterns:</p>
+
+<ol>
+<li><p>To tolerate scheduler failures, frameworks run multiple scheduler instances
+(three instances is typical). At any given time, only one of these scheduler
+instances is the <em>leader</em>: this instance is connected to the Mesos master,
+receives resource offers and task status updates, and launches new tasks. The
+other scheduler replicas are <em>followers</em>: they are used only when the leader
+fails, in which case one of the followers is chosen to become the new leader.</p></li>
+<li><p>Schedulers need a mechanism to decide when the current scheduler leader has
+failed and to elect a new leader. This is typically accomplished using a
+coordination service like <a href="https://zookeeper.apache.org/">Apache ZooKeeper</a>
+or <a href="https://github.com/coreos/etcd">etcd</a>. Consult the documentation of the
+coordination system you are using for more information on how to correctly
+implement leader election.</p></li>
+<li><p>After electing a new leading scheduler, the new leader needs to ensure that
+its local state is consistent with the current state of the cluster. For
+example, suppose that the previous leading scheduler attempted to launch a
+new task and then immediately failed. The task might have launched
+successfully, at which point the newly elected leader will begin to receive
+status updates about it. To handle this situation, frameworks typically use a
+strongly consistent distributed data store to record information about active
+and pending tasks. In fact, the same coordination service that is used for
+leader election (such as ZooKeeper or etcd) can often be used for this
+purpose. Some Mesos frameworks (such as Apache Aurora) use the Mesos
+replicated log for this purpose.</p>
+
+<ul>
+<li><p>The data store should be used to record the actions that the scheduler
+<em>intends</em> to take, before it takes them. For example, if a scheduler
+decides to launch a new task, it <em>first</em> writes this intent to its data
+store. Then it sends a &ldquo;launch task&rdquo; message to the Mesos master. If this
+instance of the scheduler fails and a new scheduler is promoted to become
+the leader, the new leader can consult the data store to find <em>all possible
+tasks</em> that might be running on the cluster. This is an instance of the
+<a href="https://en.wikipedia.org/wiki/Write-ahead_logging">write-ahead logging</a>
+pattern often employed by database systems and filesystems to improve
+reliability. Two aspects of this design are worth emphasizing.</p>
+
+<ol>
+<li><p>First, the scheduler must persist its intent <em>before</em> launching the task: if
+the task is launched first and then the scheduler fails before it can
+write to the data store, the new leading scheduler won&rsquo;t know about the
+new task. If this occurs, the new scheduler instance will begin
+receiving task status updates for a task that it has no knowledge of;
+there is often not a good way to recover from this situation.</p></li>
+<li><p>Second, the scheduler should ensure that its intent has been durably
+recorded in the data store before continuing to launch the task (for
+example, it should wait for a quorum of replicas in the data store to
+have acknowledged receipt of the write operation). For more details on
+how to do this, consult the documentation for the data store you are
+using.</p></li>
+</ol>
+</li>
+</ul>
+</li>
+</ol>
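+
+<p>The write-ahead pattern described above might look roughly like the
+following sketch. <code>IntentStore</code> is a hypothetical in-memory stand-in for a
+durable, replicated store such as ZooKeeper or etcd, and <code>send_launch</code> is an
+assumed callback that sends the launch message to the master; neither is part
+of Mesos.</p>
+
```python
class IntentStore:
    """In-memory stand-in for a durable, replicated data store (hypothetical)."""

    def __init__(self):
        self.records = {}

    def write(self, task_id, record):
        # A real store should return only after the write is durable,
        # for example once a quorum of replicas has acknowledged it.
        self.records[task_id] = dict(record)

    def all_task_ids(self):
        return sorted(self.records)


def launch_with_intent(store, send_launch, task_id, spec):
    # 1. Persist the intent before contacting the master.
    store.write(task_id, {"spec": spec, "state": "INTENDED"})
    # 2. Only then send the launch message, which may still be lost.
    send_launch(task_id, spec)


def tasks_to_reconcile(store):
    # A newly elected leader treats every recorded task as possibly running
    # and reconciles each of them with the master.
    return store.all_task_ids()
```
+
+<p>If the scheduler crashes between the two steps, the record is already in the
+store, so the next leader knows that the task might exist and can reconcile
+it.</p>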
+
+
+<h2>The Life Cycle of a Task</h2>
+
+<p>A Mesos task transitions through a sequence of states. The authoritative &ldquo;source
+of truth&rdquo; for the current state of a task is the agent on which the task is
+running. A framework scheduler learns about the current state of a task by
+communicating with the Mesos master&mdash;specifically, by listening for task status
+updates and by performing task state reconciliation.</p>
+
+<p>Frameworks can represent the state of a task using a state machine, with one
+initial state and several possible terminal states:</p>
+
+<ul>
+<li><p>A task begins in the <code>TASK_STAGING</code> state. A task is in this state when the
+master has received the framework&rsquo;s request to launch the task but the task
+has not yet started to run. In this state, the task&rsquo;s dependencies are
+fetched&mdash;for example, using the <a href="/documentation/latest/fetcher/">Mesos fetcher cache</a>.</p></li>
+<li><p>The <code>TASK_STARTING</code> state is optional and intended primarily for use by
+custom executors. It can be used to describe the fact that a custom executor
+has learned about the task (and maybe started fetching its dependencies) but has
+not yet started to run it.</p></li>
+<li><p>A task transitions to the <code>TASK_RUNNING</code> state after it starts running
+successfully (if the task fails to start, it transitions to one of the
+terminal states listed below).</p>
+
+<ul>
+<li><p>If a framework attempts to launch a task but does not receive a status
+update for it within a timeout, the framework should perform
+<a href="/documentation/latest/reconciliation/">reconciliation</a>. That is, it should ask the master for
+the current state of the task. The master will reply with <code>TASK_LOST</code> for
+unknown tasks. The framework can then use this to distinguish between tasks
+that are slow to launch and tasks that the master has never heard about
+(e.g., because the task launch message was dropped).</p>
+
+<ul>
+<li>Note that the correctness of this technique depends on the fact that
+messaging between the scheduler and the master is ordered.</li>
+</ul>
+</li>
+</ul>
+</li>
+<li><p>There are several terminal states:</p>
+
+<ul>
+<li><code>TASK_FINISHED</code> is used when a task completes successfully.</li>
+<li><code>TASK_FAILED</code> indicates that a task aborted with an error.</li>
+<li><code>TASK_KILLED</code> indicates that a task was killed by the executor.</li>
+<li><code>TASK_LOST</code> indicates that the task was running on an agent that has lost
+contact with the current master (typically due to a network partition or the
+agent host crashing). This case is described further below.</li>
+<li><code>TASK_ERROR</code> indicates that a task launch attempt failed because of an error
+in the task specification.</li>
+</ul>
+</li>
+</ul>
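+
+<p>One way a framework might encode this state machine is sketched below. The
+state names come from the list above; the transition policy is an assumption
+made for illustration: stale non-terminal updates are ignored, terminal states
+other than <code>TASK_LOST</code> are treated as final, and <code>TASK_LOST</code> may be
+superseded because a lost agent can later reconnect.</p>
+
```python
TERMINAL_STATES = frozenset(
    {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST", "TASK_ERROR"}
)

# Relative order of the non-terminal states, used to ignore stale updates.
ORDER = {"TASK_STAGING": 0, "TASK_STARTING": 1, "TASK_RUNNING": 2}


def is_terminal(state):
    return state in TERMINAL_STATES


def apply_update(current, update):
    """Return the state to record after receiving `update` (illustrative policy)."""
    if is_terminal(current) and current != "TASK_LOST":
        # Assumption: terminal states other than TASK_LOST are final.
        return current
    if is_terminal(update):
        return update
    if current == "TASK_LOST":
        # The agent reconnected and reported the task as still alive.
        return update
    # Both states are non-terminal: never move backwards.
    return update if ORDER[update] >= ORDER[current] else current
```
+
+<p>Because the function returns the same result when an update is applied twice,
+it also tolerates the duplicate status updates that at-least-once delivery can
+produce.</p>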
+
+
+<h2>Dealing with Partitioned or Failed Agents</h2>
+
+<p>The Mesos master tracks the availability and health of registered agents
+using two different mechanisms:</p>
+
+<ol>
+<li><p>The state of a persistent TCP connection to the agent.</p></li>
+<li><p>Health checks: the master periodically sends ping messages to the agent
+and expects a pong in response (this behavior is controlled by the
+<code>--slave_ping_timeout</code> and <code>--max_slave_ping_timeouts</code> master flags).</p></li>
+</ol>
+
+<p>If the persistent TCP connection to the agent breaks or the agent fails health checks, the master decides
+that the agent has failed and takes steps to remove it from the cluster. Specifically:</p>
+
+<ul>
+<li><p>If the TCP connection breaks, the agent is considered disconnected. When a registered
+agent is disconnected, the following applies to each framework running on that agent:</p>
+
+<ul>
+<li><p>If the framework is <a href="/documentation/latest/slave-recovery/">checkpointing</a>: No immediate action is taken. The agent is
+given a chance to reconnect until health checks time out.</p></li>
+<li><p>If the framework is not checkpointing: All the framework&rsquo;s tasks and executors are considered lost. The master
+immediately sends <code>TASK_LOST</code> status updates for the tasks. These updates are not delivered reliably to the
+scheduler (see NOTE below). The agent is given a chance to reconnect until health checks time out.</p></li>
+</ul>
+</li>
+<li><p>If the agent fails health checks, it is scheduled for removal. Removals can be rate limited by the master
+(see the <code>--slave_removal_rate_limit</code> master flag) to avoid removing a slew of agents at once (e.g., during a
+network partition event).</p></li>
+<li><p>Once it is time to remove an agent, the master marks it as &ldquo;removed&rdquo; in the master&rsquo;s durable state (this
+will survive master failover). If an agent marked as &ldquo;removed&rdquo; attempts to reconnect to the
+master (e.g., after a network partition heals), the connection attempt will be refused
+and the agent will be asked to shut down. An agent that is shutting down stops all running tasks and executors,
+but any persistent volumes and dynamic reservations are preserved.</p>
+
+<ul>
+<li>To allow the removed agent node to rejoin the cluster, a new <code>mesos-slave</code>
+process can be started. The new process receives a new agent ID and registers with the master,
+possibly with previously created persistent volumes and dynamic reservations. In effect, the agent will
+be treated as a newly joined agent.</li>
+</ul>
+</li>
+<li><p>For each agent that is marked &ldquo;removed&rdquo;, the scheduler receives a <code>slaveLost</code> callback and <code>TASK_LOST</code> status
+updates for each task that was running on the agent.</p>
+
+<blockquote><p>NOTE: Neither the callback nor the updates are reliably delivered by the master. For example, if
+  the master or scheduler fails over or there is a network connection issue during the delivery
+  of these messages, they will not be resent.</p></blockquote></li>
+</ul>
+
+
+<p>Typically, frameworks respond to this situation by scheduling new copies of the
+tasks that were running on the lost agent. This should be done with caution,
+however: it is possible that the lost agent is still alive, but is partitioned
+from the master and is unable to communicate with it. Depending on the nature of
+the network partition, tasks on the agent might still be able to communicate
+with external clients or other hosts in the cluster. Frameworks can take steps
+to prevent this (e.g., by having tasks connect to ZooKeeper and cease operation
+if their ZooKeeper session expires), but Mesos leaves such details to framework
+authors.</p>
+
+<h2>Dealing with Partitioned or Failed Masters</h2>
+
+<p>The behavior described above does not apply during the period immediately after
+a new Mesos master is elected. As noted above, most Mesos master state is kept
+in-memory; hence, when the leading master fails and a new master is elected, the
+new master will have little knowledge of the current state of the cluster.
+Instead, it rebuilds this information as the frameworks and agents notice that a
+new master has been elected and then <em>reregister</em> with it.</p>
+
+<h3>Framework Reregistration</h3>
+
+<p>When master failover occurs, frameworks that were connected to the previous
+leading master should reconnect to the new leading master. The
+<code>MesosSchedulerDriver</code> handles most of the details of detecting when the
+previous leading master has failed and connecting to the new leader; when the
+framework has successfully reregistered with the new leading master, the
+<code>reregistered</code> scheduler callback will be invoked.</p>
+
+<p>When a highly available framework scheduler initially connects to the master, it
+should set the <code>failover_timeout</code> field in its <code>FrameworkInfo</code>. This specifies
+how long the master will wait for a framework to reconnect after a failover
+before the framework&rsquo;s state is garbage-collected and any running tasks
+associated with the framework are killed. It is recommended that frameworks set
+a generous <code>failover_timeout</code> (e.g., 1 week) to avoid their tasks being killed
+unintentionally.</p>
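+
+<p>As a small illustration of the recommendation above, the sketch below builds
+the relevant fields as a plain dictionary. It assumes that <code>failover_timeout</code>
+is expressed in seconds; <code>framework_info</code> is a hypothetical helper, and a real
+scheduler would set the same field on the <code>FrameworkInfo</code> message that it
+passes to the scheduler driver.</p>
+
```python
from datetime import timedelta

# One week expressed in seconds (assumes failover_timeout is in seconds).
ONE_WEEK_SECS = timedelta(weeks=1).total_seconds()


def framework_info(name, user="", failover_timeout=ONE_WEEK_SECS):
    # Hypothetical helper: returns the fields as a dict for illustration only.
    return {"name": name, "user": user, "failover_timeout": failover_timeout}
```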
+
+<h3>Agent Reregistration</h3>
+
+<p>During the period after a new master has been elected but before a given agent
+has reregistered or the <code>slave_reregister_timeout</code> has fired, attempting to
+reconcile the state of a task running on that agent will not return any
+information (because the master cannot accurately determine the state of the
+task).</p>
+
+<p>If an agent does not reregister with the new master within a timeout (controlled
+by the <code>--slave_reregister_timeout</code> configuration flag), the master marks the
+agent as failed and follows the same steps described above. However, there is
+one difference: by default, agents are <em>allowed to reconnect</em> following master
+failover, even after the <code>slave_reregister_timeout</code> has fired. This means that
+frameworks might see a <code>TASK_LOST</code> update for a task but then later discover
+that the task is running (because the agent where it was running was allowed to
+reconnect). This behavior can be avoided by enabling the <code>--registry_strict</code>
+configuration flag, which will be the default in a future version of Mesos.</p>
+
+	</div>
+</div>
+
+			
+	      <hr>
+
+				<!-- footer -->
+	      <div class="footer">
+	        <p>&copy; 2012-2015 <a href="http://apache.org">The Apache Software Foundation</a>.
+	        Apache Mesos, the Apache feather logo, and the Apache Mesos project logo are trademarks of The Apache Software Foundation.</p>
+	      </div><!-- /footer -->
+
+	    </div> <!-- /container -->
+
+	    <!-- JS -->
+	    <script src="//code.jquery.com/jquery-1.11.0.min.js" type="text/javascript"></script>
+			<script src="//netdna.bootstrapcdn.com/bootstrap/3.1.1/js/bootstrap.min.js" type="text/javascript"></script>
+    </body>
+</html>