Posted to commits@mesos.apache.org by dl...@apache.org on 2015/10/09 10:10:09 UTC
svn commit: r1707671 [1/3] - in /mesos/site: publish/ publish/documentation/ publish/documentation/committers/ publish/documentation/framework-rate-limiting/ publish/documentation/high-availability/ publish/documentation/latest/ publish/documentation/l...

Author: dlester
Date: Fri Oct  9 08:10:08 2015
New Revision: 1707671

URL: http://svn.apache.org/viewvc?rev=1707671&view=rev
Log:
Updates documentation.

Added:
    mesos/site/publish/documentation/latest/networking-for-mesos-managed-containers/
    mesos/site/publish/documentation/latest/networking-for-mesos-managed-containers/index.html
    mesos/site/publish/documentation/networking-for-mesos-managed-containers/
    mesos/site/publish/documentation/networking-for-mesos-managed-containers/index.html
    mesos/site/source/documentation/latest/networking-for-mesos-managed-containers.md
Modified:
    mesos/site/publish/documentation/committers/index.html
    mesos/site/publish/documentation/framework-rate-limiting/index.html
    mesos/site/publish/documentation/high-availability/index.html
    mesos/site/publish/documentation/index.html
    mesos/site/publish/documentation/latest/committers/index.html
    mesos/site/publish/documentation/latest/framework-rate-limiting/index.html
    mesos/site/publish/documentation/latest/high-availability/index.html
    mesos/site/publish/documentation/latest/index.html
    mesos/site/publish/documentation/latest/operational-guide/index.html
    mesos/site/publish/documentation/latest/upgrades/index.html
    mesos/site/publish/documentation/operational-guide/index.html
    mesos/site/publish/documentation/upgrades/index.html
    mesos/site/publish/sitemap.xml
    mesos/site/source/documentation/latest.html.md
    mesos/site/source/documentation/latest/committers.md
    mesos/site/source/documentation/latest/framework-rate-limiting.md
    mesos/site/source/documentation/latest/high-availability.md
    mesos/site/source/documentation/latest/operational-guide.md
    mesos/site/source/documentation/latest/upgrades.md

Modified: mesos/site/publish/documentation/committers/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/committers/index.html?rev=1707671&r1=1707670&r2=1707671&view=diff
==============================================================================
--- mesos/site/publish/documentation/committers/index.html (original)
+++ mesos/site/publish/documentation/committers/index.html Fri Oct  9 08:10:08 2015
@@ -87,7 +87,7 @@
 
 <h2>Becoming a committer</h2>
 
-<p>Every new committer has to be proposed by a current committer and then voted in by the members of the Mesos PMC. For details about this process and for candidate requirements see the general <a href="https://community.apache.org/newcommitter.html">Apache guidelines for assessing new candidates for committership</a>. Candidates prepare for their nomination as committer by contributing to the Mesos project and its community, by acting according to the <a href="http://theapacheway.com">Apache Way</a>, and by generally following the path <a href="https://community.apache.org/contributors/">from contributor to committer</a> for Apache projects. Specifically for the Mesos project, you can make use of the <a href="https://community.apache.org/committer-candidate-checklist/">Apache Mesos Committer Candidate Checklist</a> for suggestions of what kind of contributions and demonstrated behaviors can be instrumental, and to keep track of your progress.</p>
+<p>Every new committer has to be proposed by a current committer and then voted in by the members of the Mesos PMC. For details about this process and for candidate requirements see the general <a href="https://community.apache.org/newcommitter.html">Apache guidelines for assessing new candidates for committership</a>. Candidates prepare for their nomination as committer by contributing to the Mesos project and its community, by acting according to the <a href="http://theapacheway.com">Apache Way</a>, and by generally following the path <a href="https://community.apache.org/contributors/">from contributor to committer</a> for Apache projects. Specifically for the Mesos project, you can make use of the <a href="/documentation/latest/committer-candidate-checklist/">Apache Mesos Committer Candidate Checklist</a> for suggestions of what kind of contributions and demonstrated behaviors can be instrumental, and to keep track of your progress.</p>
 
 <h2>Current Committers</h2>
 

Modified: mesos/site/publish/documentation/framework-rate-limiting/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/framework-rate-limiting/index.html?rev=1707671&r1=1707670&r2=1707671&view=diff
==============================================================================
--- mesos/site/publish/documentation/framework-rate-limiting/index.html (original)
+++ mesos/site/publish/documentation/framework-rate-limiting/index.html Fri Oct  9 08:10:08 2015
@@ -89,7 +89,7 @@
 
 <p>In a multi-framework environment, this feature aims to protect the throughput of high-SLA (e.g., production, service) frameworks by having the master throttle messages from other (e.g., development, batch) frameworks.</p>
 
-<p>To throttle messages from a framework, the Mesos cluster operator sets a <code>qps</code> (queries per seconds) value for each framework identified by its principal (You can also throttle a group of frameworks together but we&rsquo;ll assume individual frameworks in this doc unless otherwise stated; see the <code>RateLimits</code> <a href="https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto">ProtoBuf definition</a> and the configuration notes below). The master then promises not to process messages from that framework at a rate above <code>qps</code>. The outstanding messages are stored in memory on the master.</p>
+<p>To throttle messages from a framework, the Mesos cluster operator sets a <code>qps</code> (queries per second) value for each framework identified by its principal (You can also throttle a group of frameworks together but we&rsquo;ll assume individual frameworks in this doc unless otherwise stated; see the <code>RateLimits</code> <a href="https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto">Protobuf definition</a> and the configuration notes below). The master then promises not to process messages from that framework at a rate above <code>qps</code>. The outstanding messages are stored in memory on the master.</p>
 
 <h2>Rate Limits Configuration</h2>
 

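The rate-limiting paragraph in the diff above says that the master does not process a framework's messages faster than the configured `qps` and keeps the backlog in memory. The following minimal Python sketch shows that behaviour under stated assumptions. `PrincipalThrottler`, `enqueue`, and `drain` are hypothetical names, not Mesos APIs, and the master's actual C++ implementation is not reproduced here.

```python
from collections import deque


class PrincipalThrottler:
    """Hypothetical per-principal throttle: queue messages, process at most `qps` per second."""

    def __init__(self, qps):
        if qps <= 0:
            raise ValueError("qps must be positive")
        self.interval = 1.0 / qps  # minimum spacing between processed messages
        self.queue = deque()        # outstanding messages, held in memory
        self.next_allowed = 0.0     # earliest time (seconds) the next message may run

    def enqueue(self, message):
        """Accept a message immediately; it waits in the queue until its turn."""
        self.queue.append(message)

    def drain(self, now):
        """Return the messages that may be processed at time `now` (in seconds)."""
        processed = []
        while self.queue and now >= self.next_allowed:
            processed.append(self.queue.popleft())
            # Idle periods do not build up credit, so there are no bursts above qps.
            self.next_allowed = max(self.next_allowed, now) + self.interval
        return processed
```

With `qps=2`, a backlog of five messages is released one every 0.5 seconds. Any calls to `drain` in between return an empty list. Strict spacing is a design choice in this sketch: a token bucket would allow short bursts after idle periods.
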
Modified: mesos/site/publish/documentation/high-availability/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/high-availability/index.html?rev=1707671&r1=1707670&r2=1707671&view=diff
==============================================================================
--- mesos/site/publish/documentation/high-availability/index.html (original)
+++ mesos/site/publish/documentation/high-availability/index.html Fri Oct  9 08:10:08 2015
@@ -83,7 +83,7 @@
 	<div class="col-md-8">
 		<h1>Mesos High Availability Mode</h1>
 
-<p>Mesos has a high-availability mode that uses multiple Mesos masters; one active master (called the leader or leading master) and several backups in case it fails. The masters elect the leader, with <a href="http://zookeeper.apache.org/">Apache ZooKeeper</a> both coordinating the election and handling leader detection by masters, slaves, and scheduler drivers. More information regarding <a href="http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection">how leader election works</a> is available on the Apache Zookeeper website.</p>
+<p>If the Mesos master is unavailable, existing tasks can continue to execute, but new resources cannot be allocated and new tasks cannot be launched. To reduce the chance of this situation occurring, Mesos has a high-availability mode that uses multiple Mesos masters: one active master (called the <em>leader</em> or leading master) and several <em>backups</em> in case it fails. The masters elect the leader, with <a href="http://zookeeper.apache.org/">Apache ZooKeeper</a> both coordinating the election and handling leader detection by masters, slaves, and scheduler drivers. More information regarding <a href="http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection">how leader election works</a> is available on the Apache ZooKeeper website.</p>
 
 <p><strong>Note</strong>: This document assumes you know how to start, run, and work with ZooKeeper, whose client library is included in the standard Mesos build.</p>
 
@@ -96,7 +96,7 @@
 <li><p>Provide the znode path to all masters, slaves, and framework schedulers as follows:</p>
 
 <ul>
-<li><p>Start the mesos-master binaries using the <code>--zk</code> flag, e.g. `&ndash;zk=zk://host1:port1,host2:port2,&hellip;/path'</p></li>
+<li><p>Start the mesos-master binaries using the <code>--zk</code> flag, e.g. <code>--zk=zk://host1:port1,host2:port2,.../path</code></p></li>
 <li><p>Start the mesos-slave binaries with <code>--master=zk://host1:port1,host2:port2,.../path</code></p></li>
 <li><p>Start any framework schedulers using the same <code>zk</code> path as in the last two steps. The SchedulerDriver must be constructed with this path, as shown in the <a href="http://mesos.apache.org/documentation/latest/app-framework-development-guide/">Framework Development Guide</a>.</p></li>
 </ul>
@@ -108,40 +108,21 @@
 
 <p>Refer to the <a href="http://mesos.apache.org/documentation/latest/app-framework-development-guide/">Scheduler API</a> for how to deal with leadership changes.</p>
 
-<h2>Implementation Details</h2>
-
-<p>Mesos implements two levels of ZooKeeper leader election abstractions, one in <code>src/zookeeper</code> and the other in <code>src/master</code> (look for <code>contender|detector.hpp|cpp</code>).</p>
-
-<ul>
-<li><p>The lower level <code>LeaderContender</code> and <code>LeaderDetector</code> implement a generic ZooKeeper election algorithm loosely modeled after this
-<a href="http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection">recipe</a> (sans herd effect handling due to the master group&rsquo;s small size, which is often 3).</p></li>
-<li><p>The higher level <code>MasterContender</code> and <code>MasterDetector</code> wrap around ZooKeeper&rsquo;s contender and detector abstractions as adapters to provide/interpret the ZooKeeper data.</p></li>
-<li><p>Each Mesos master simultaneously uses both a contender and a detector to try to elect themselves and detect who the current leader is. A separate detector is necessary because each master&rsquo;s WebUI redirects browser traffic to the current leader when that master is not elected. Other Mesos components (i.e. slaves and scheduler drivers) use the detector to find the current leader and connect to it.</p></li>
-</ul>
-
-
-<p>The notion of the group of leader candidates is implemented in <code>Group</code>. This abstraction handles reliable (through queues and retries of retryable errors under the covers) ZooKeeper group membership registration, cancellation, and monitoring. It watches for several ZooKeeper session events:</p>
-
-<ul>
-<li>Connection</li>
-<li>Reconnection</li>
-<li>Session Expiration</li>
-<li>ZNode creation, deletion, updates</li>
-</ul>
-
-
-<p>We also explicitly timeout our sessions when disconnected from ZooKeeper for a specified amount of time. See <code>MASTER_CONTENDER_ZK_SESSION_TIMEOUT</code> and <code>MASTER_DETECTOR_ZK_SESSION_TIMEOUT</code>. This is because the ZooKeeper client libraries only notify of session expiration upon reconnection. These timeouts are of particular interest for network partitions.</p>
-
 <h2>Component Disconnection Handling</h2>
 
-<p>When a network partition disconnects a component (master, slave, scheduler driver) from ZooKeeper, the component&rsquo;s Master Detector induces a timeout event. This notifies the component that it has no leading master. Depending on the component, the following happens. (Note that while a component is disconnected from ZooKeeper, a master may still be in communication with slaves or schedulers and vice versa.)</p>
+<p>When a network partition disconnects a component (master, slave, or scheduler driver) from ZooKeeper, the component&rsquo;s Master Detector induces a timeout event. This notifies the component that it has no leading master. Depending on the component, the following happens. (Note that while a component is disconnected from ZooKeeper, a master may still be in communication with slaves or schedulers and vice versa.)</p>
 
 <ul>
 <li><p>Slaves disconnected from ZooKeeper no longer know which master is the leader. They ignore messages from masters to ensure they don&rsquo;t act on a non-leader&rsquo;s decisions. When a slave reconnects to ZooKeeper, ZooKeeper informs it of the current leader and the slave stops ignoring messages from the leader.</p></li>
 <li><p>Masters enter leaderless state irrespective of whether they are a leader or not before the disconnection.</p>
 
 <ul>
-<li><p>If the leader was disconnected from ZooKeeper, it aborts its process. The user/developer/administrator can start a new, connected to ZooKeeper, master instance that starts as a backup.</p></li>
+<li><p>If the leader was disconnected from ZooKeeper, it aborts its process. The user/developer/administrator can then start a new master instance which will try to reconnect to ZooKeeper.</p>
+
+<ul>
+<li>Note that many production deployments of Mesos use a process supervisor (such as systemd or supervisord) that is configured to automatically restart the Mesos master if the process aborts unexpectedly.</li>
+</ul>
+</li>
 <li><p>Otherwise, the disconnected backup waits to reconnect with ZooKeeper and possibly get elected as the new leading master.</p></li>
 </ul>
 </li>
@@ -154,10 +135,34 @@
 <ul>
 <li><p>The slave fails health checks from the leader.</p></li>
 <li><p>The leader marks the slave as deactivated and sends its tasks to the LOST state. The  <a href="http://mesos.apache.org/documentation/latest/app-framework-development-guide/">Framework Development Guide</a> describes these various task states.</p></li>
-<li><p>Deactivated slaves may not re-register with the leader, and are told to shut down upon any post-deactivation communication.</p></li>
+<li><p>Deactivated slaves may not re-register with the leader and are told to shut down upon any post-deactivation communication.</p></li>
+</ul>
+
+
+<h2>Implementation Details</h2>
+
+<p>Mesos implements two levels of ZooKeeper leader election abstractions, one in <code>src/zookeeper</code> and the other in <code>src/master</code> (look for <code>contender|detector.hpp|cpp</code>).</p>
+
+<ul>
+<li><p>The lower level <code>LeaderContender</code> and <code>LeaderDetector</code> implement a generic ZooKeeper election algorithm loosely modeled after this
+<a href="http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection">recipe</a> (sans herd effect handling due to the master group&rsquo;s small size, which is often 3).</p></li>
+<li><p>The higher level <code>MasterContender</code> and <code>MasterDetector</code> wrap around ZooKeeper&rsquo;s contender and detector abstractions as adapters to provide/interpret the ZooKeeper data.</p></li>
+<li><p>Each Mesos master simultaneously uses both a contender and a detector to try to elect itself and to detect who the current leader is. A separate detector is necessary because each master&rsquo;s WebUI redirects browser traffic to the current leader when that master is not elected. Other Mesos components (i.e., slaves and scheduler drivers) use the detector to find the current leader and connect to it.</p></li>
 </ul>
 
 
+<p>The notion of the group of leader candidates is implemented in <code>Group</code>. This abstraction handles reliable (through queues and retries of retryable errors under the covers) ZooKeeper group membership registration, cancellation, and monitoring. It watches for several ZooKeeper session events:</p>
+
+<ul>
+<li>Connection</li>
+<li>Reconnection</li>
+<li>Session Expiration</li>
+<li>ZNode creation, deletion, updates</li>
+</ul>
+
+
+<p>We also explicitly time out our sessions when disconnected from ZooKeeper for a specified amount of time. See <code>MASTER_CONTENDER_ZK_SESSION_TIMEOUT</code> and <code>MASTER_DETECTOR_ZK_SESSION_TIMEOUT</code>. This is because the ZooKeeper client libraries only notify of session expiration upon reconnection. These timeouts are of particular interest for network partitions.</p>
+
 	</div>
 </div>
 

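In the high-availability change, the same ZooKeeper URL is passed to masters (`--zk`), slaves (`--master`), and framework schedulers. The Python sketch below illustrates what that string encodes by splitting `zk://host1:port1,host2:port2,.../path` into a server list and a znode path. `parse_zk_url` is a hypothetical helper. It assumes the plain form shown in the docs and does not handle credentials or any other URL forms that Mesos may accept.

```python
def parse_zk_url(url):
    """Split 'zk://host1:port1,host2:port2/path' into ([(host, port), ...], '/path').

    Hypothetical helper for illustration only; it ignores authentication
    prefixes and any other URL forms Mesos itself may support.
    """
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError("expected a URL starting with zk://")
    rest = url[len(prefix):]
    # The comma-separated server list ends at the first '/', which starts the znode path.
    hosts_part, sep, path = rest.partition("/")
    if not sep or not path:
        raise ValueError("missing znode path")
    servers = []
    for entry in hosts_part.split(","):
        host, colon, port = entry.rpartition(":")
        if not colon or not host:
            raise ValueError("expected host:port, got %r" % entry)
        servers.append((host, int(port)))
    return servers, "/" + path
```

For example, `parse_zk_url("zk://10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181/mesos")` yields three `(host, 2181)` pairs and the path `/mesos`. All masters, slaves, and schedulers must agree on that path for leader detection to work.
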
Modified: mesos/site/publish/documentation/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/index.html?rev=1707671&r1=1707670&r2=1707671&view=diff
==============================================================================
--- mesos/site/publish/documentation/index.html (original)
+++ mesos/site/publish/documentation/index.html Fri Oct  9 08:10:08 2015
@@ -119,6 +119,7 @@
 <ul>
 <li><a href="/documentation/attributes-resources/">Attributes and Resources</a> for how to describe the slaves that comprise a cluster.</li>
 <li><a href="/documentation/latest/fetcher/">Fetcher Cache</a> for how to configure the Mesos fetcher cache.</li>
+<li><a href="/documentation/latest/networking-for-mesos-managed-containers/">Networking for Mesos-managed Containers</a></li>
 <li><a href="/documentation/latest/oversubscription/">Oversubscription</a> for how to configure Mesos to take advantage of unused resources to launch &ldquo;best-effort&rdquo; tasks.</li>
 <li><a href="/documentation/latest/persistent-volume/">Persistent Volume</a> for how to allow tasks to access persistent storage resources.</li>
 <li><a href="/documentation/latest/reservation/">Reservation</a> for how to configure Mesos to allow slaves to reserve resources.</li>

Modified: mesos/site/publish/documentation/latest/committers/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/committers/index.html?rev=1707671&r1=1707670&r2=1707671&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/committers/index.html (original)
+++ mesos/site/publish/documentation/latest/committers/index.html Fri Oct  9 08:10:08 2015
@@ -87,7 +87,7 @@
 
 <h2>Becoming a committer</h2>
 
-<p>Every new committer has to be proposed by a current committer and then voted in by the members of the Mesos PMC. For details about this process and for candidate requirements see the general <a href="https://community.apache.org/newcommitter.html">Apache guidelines for assessing new candidates for committership</a>. Candidates prepare for their nomination as committer by contributing to the Mesos project and its community, by acting according to the <a href="http://theapacheway.com">Apache Way</a>, and by generally following the path <a href="https://community.apache.org/contributors/">from contributor to committer</a> for Apache projects. Specifically for the Mesos project, you can make use of the <a href="https://community.apache.org/committer-candidate-checklist/">Apache Mesos Committer Candidate Checklist</a> for suggestions of what kind of contributions and demonstrated behaviors can be instrumental, and to keep track of your progress.</p>
+<p>Every new committer has to be proposed by a current committer and then voted in by the members of the Mesos PMC. For details about this process and for candidate requirements see the general <a href="https://community.apache.org/newcommitter.html">Apache guidelines for assessing new candidates for committership</a>. Candidates prepare for their nomination as committer by contributing to the Mesos project and its community, by acting according to the <a href="http://theapacheway.com">Apache Way</a>, and by generally following the path <a href="https://community.apache.org/contributors/">from contributor to committer</a> for Apache projects. Specifically for the Mesos project, you can make use of the <a href="/documentation/latest/committer-candidate-checklist/">Apache Mesos Committer Candidate Checklist</a> for suggestions of what kind of contributions and demonstrated behaviors can be instrumental, and to keep track of your progress.</p>
 
 <h2>Current Committers</h2>
 

Modified: mesos/site/publish/documentation/latest/framework-rate-limiting/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/framework-rate-limiting/index.html?rev=1707671&r1=1707670&r2=1707671&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/framework-rate-limiting/index.html (original)
+++ mesos/site/publish/documentation/latest/framework-rate-limiting/index.html Fri Oct  9 08:10:08 2015
@@ -89,7 +89,7 @@
 
 <p>In a multi-framework environment, this feature aims to protect the throughput of high-SLA (e.g., production, service) frameworks by having the master throttle messages from other (e.g., development, batch) frameworks.</p>
 
-<p>To throttle messages from a framework, the Mesos cluster operator sets a <code>qps</code> (queries per seconds) value for each framework identified by its principal (You can also throttle a group of frameworks together but we&rsquo;ll assume individual frameworks in this doc unless otherwise stated; see the <code>RateLimits</code> <a href="https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto">ProtoBuf definition</a> and the configuration notes below). The master then promises not to process messages from that framework at a rate above <code>qps</code>. The outstanding messages are stored in memory on the master.</p>
+<p>To throttle messages from a framework, the Mesos cluster operator sets a <code>qps</code> (queries per second) value for each framework identified by its principal (You can also throttle a group of frameworks together but we&rsquo;ll assume individual frameworks in this doc unless otherwise stated; see the <code>RateLimits</code> <a href="https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto">Protobuf definition</a> and the configuration notes below). The master then promises not to process messages from that framework at a rate above <code>qps</code>. The outstanding messages are stored in memory on the master.</p>
 
 <h2>Rate Limits Configuration</h2>
 

Modified: mesos/site/publish/documentation/latest/high-availability/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/high-availability/index.html?rev=1707671&r1=1707670&r2=1707671&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/high-availability/index.html (original)
+++ mesos/site/publish/documentation/latest/high-availability/index.html Fri Oct  9 08:10:08 2015
@@ -83,7 +83,7 @@
 	<div class="col-md-8">
 		<h1>Mesos High Availability Mode</h1>
 
-<p>Mesos has a high-availability mode that uses multiple Mesos masters; one active master (called the leader or leading master) and several backups in case it fails. The masters elect the leader, with <a href="http://zookeeper.apache.org/">Apache ZooKeeper</a> both coordinating the election and handling leader detection by masters, slaves, and scheduler drivers. More information regarding <a href="http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection">how leader election works</a> is available on the Apache Zookeeper website.</p>
+<p>If the Mesos master is unavailable, existing tasks can continue to execute, but new resources cannot be allocated and new tasks cannot be launched. To reduce the chance of this situation occurring, Mesos has a high-availability mode that uses multiple Mesos masters: one active master (called the <em>leader</em> or leading master) and several <em>backups</em> in case it fails. The masters elect the leader, with <a href="http://zookeeper.apache.org/">Apache ZooKeeper</a> both coordinating the election and handling leader detection by masters, slaves, and scheduler drivers. More information regarding <a href="http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection">how leader election works</a> is available on the Apache ZooKeeper website.</p>
 
 <p><strong>Note</strong>: This document assumes you know how to start, run, and work with ZooKeeper, whose client library is included in the standard Mesos build.</p>
 
@@ -96,7 +96,7 @@
 <li><p>Provide the znode path to all masters, slaves, and framework schedulers as follows:</p>
 
 <ul>
-<li><p>Start the mesos-master binaries using the <code>--zk</code> flag, e.g. `&ndash;zk=zk://host1:port1,host2:port2,&hellip;/path'</p></li>
+<li><p>Start the mesos-master binaries using the <code>--zk</code> flag, e.g. <code>--zk=zk://host1:port1,host2:port2,.../path</code></p></li>
 <li><p>Start the mesos-slave binaries with <code>--master=zk://host1:port1,host2:port2,.../path</code></p></li>
 <li><p>Start any framework schedulers using the same <code>zk</code> path as in the last two steps. The SchedulerDriver must be constructed with this path, as shown in the <a href="http://mesos.apache.org/documentation/latest/app-framework-development-guide/">Framework Development Guide</a>.</p></li>
 </ul>
@@ -108,40 +108,21 @@
 
 <p>Refer to the <a href="http://mesos.apache.org/documentation/latest/app-framework-development-guide/">Scheduler API</a> for how to deal with leadership changes.</p>
 
-<h2>Implementation Details</h2>
-
-<p>Mesos implements two levels of ZooKeeper leader election abstractions, one in <code>src/zookeeper</code> and the other in <code>src/master</code> (look for <code>contender|detector.hpp|cpp</code>).</p>
-
-<ul>
-<li><p>The lower level <code>LeaderContender</code> and <code>LeaderDetector</code> implement a generic ZooKeeper election algorithm loosely modeled after this
-<a href="http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection">recipe</a> (sans herd effect handling due to the master group&rsquo;s small size, which is often 3).</p></li>
-<li><p>The higher level <code>MasterContender</code> and <code>MasterDetector</code> wrap around ZooKeeper&rsquo;s contender and detector abstractions as adapters to provide/interpret the ZooKeeper data.</p></li>
-<li><p>Each Mesos master simultaneously uses both a contender and a detector to try to elect themselves and detect who the current leader is. A separate detector is necessary because each master&rsquo;s WebUI redirects browser traffic to the current leader when that master is not elected. Other Mesos components (i.e. slaves and scheduler drivers) use the detector to find the current leader and connect to it.</p></li>
-</ul>
-
-
-<p>The notion of the group of leader candidates is implemented in <code>Group</code>. This abstraction handles reliable (through queues and retries of retryable errors under the covers) ZooKeeper group membership registration, cancellation, and monitoring. It watches for several ZooKeeper session events:</p>
-
-<ul>
-<li>Connection</li>
-<li>Reconnection</li>
-<li>Session Expiration</li>
-<li>ZNode creation, deletion, updates</li>
-</ul>
-
-
-<p>We also explicitly timeout our sessions when disconnected from ZooKeeper for a specified amount of time. See <code>MASTER_CONTENDER_ZK_SESSION_TIMEOUT</code> and <code>MASTER_DETECTOR_ZK_SESSION_TIMEOUT</code>. This is because the ZooKeeper client libraries only notify of session expiration upon reconnection. These timeouts are of particular interest for network partitions.</p>
-
 <h2>Component Disconnection Handling</h2>
 
-<p>When a network partition disconnects a component (master, slave, scheduler driver) from ZooKeeper, the component&rsquo;s Master Detector induces a timeout event. This notifies the component that it has no leading master. Depending on the component, the following happens. (Note that while a component is disconnected from ZooKeeper, a master may still be in communication with slaves or schedulers and vice versa.)</p>
+<p>When a network partition disconnects a component (master, slave, or scheduler driver) from ZooKeeper, the component&rsquo;s Master Detector induces a timeout event. This notifies the component that it has no leading master. Depending on the component, the following happens. (Note that while a component is disconnected from ZooKeeper, a master may still be in communication with slaves or schedulers and vice versa.)</p>
 
 <ul>
 <li><p>Slaves disconnected from ZooKeeper no longer know which master is the leader. They ignore messages from masters to ensure they don&rsquo;t act on a non-leader&rsquo;s decisions. When a slave reconnects to ZooKeeper, ZooKeeper informs it of the current leader and the slave stops ignoring messages from the leader.</p></li>
 <li><p>Masters enter leaderless state irrespective of whether they are a leader or not before the disconnection.</p>
 
 <ul>
-<li><p>If the leader was disconnected from ZooKeeper, it aborts its process. The user/developer/administrator can start a new, connected to ZooKeeper, master instance that starts as a backup.</p></li>
+<li><p>If the leader was disconnected from ZooKeeper, it aborts its process. The user/developer/administrator can then start a new master instance which will try to reconnect to ZooKeeper.</p>
+
+<ul>
+<li>Note that many production deployments of Mesos use a process supervisor (such as systemd or supervisord) that is configured to automatically restart the Mesos master if the process aborts unexpectedly.</li>
+</ul>
+</li>
 <li><p>Otherwise, the disconnected backup waits to reconnect with ZooKeeper and possibly get elected as the new leading master.</p></li>
 </ul>
 </li>
@@ -154,10 +135,34 @@
 <ul>
 <li><p>The slave fails health checks from the leader.</p></li>
 <li><p>The leader marks the slave as deactivated and sends its tasks to the LOST state. The  <a href="http://mesos.apache.org/documentation/latest/app-framework-development-guide/">Framework Development Guide</a> describes these various task states.</p></li>
-<li><p>Deactivated slaves may not re-register with the leader, and are told to shut down upon any post-deactivation communication.</p></li>
+<li><p>Deactivated slaves may not re-register with the leader and are told to shut down upon any post-deactivation communication.</p></li>
+</ul>
+
+
+<h2>Implementation Details</h2>
+
+<p>Mesos implements two levels of ZooKeeper leader election abstractions, one in <code>src/zookeeper</code> and the other in <code>src/master</code> (look for <code>contender|detector.hpp|cpp</code>).</p>
+
+<ul>
+<li><p>The lower level <code>LeaderContender</code> and <code>LeaderDetector</code> implement a generic ZooKeeper election algorithm loosely modeled after this
+<a href="http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection">recipe</a> (sans herd effect handling due to the master group&rsquo;s small size, which is often 3).</p></li>
+<li><p>The higher level <code>MasterContender</code> and <code>MasterDetector</code> wrap around ZooKeeper&rsquo;s contender and detector abstractions as adapters to provide/interpret the ZooKeeper data.</p></li>
+<li><p>Each Mesos master simultaneously uses both a contender and a detector to try to elect itself and to detect who the current leader is. A separate detector is necessary because each master&rsquo;s WebUI redirects browser traffic to the current leader when that master is not elected. Other Mesos components (i.e., slaves and scheduler drivers) use the detector to find the current leader and connect to it.</p></li>
 </ul>
 
 
+<p>The notion of the group of leader candidates is implemented in <code>Group</code>. This abstraction handles reliable (through queues and retries of retryable errors under the covers) ZooKeeper group membership registration, cancellation, and monitoring. It watches for several ZooKeeper session events:</p>
+
+<ul>
+<li>Connection</li>
+<li>Reconnection</li>
+<li>Session Expiration</li>
+<li>ZNode creation, deletion, updates</li>
+</ul>
+
+
+<p>We also explicitly time out our sessions when disconnected from ZooKeeper for a specified amount of time. See <code>MASTER_CONTENDER_ZK_SESSION_TIMEOUT</code> and <code>MASTER_DETECTOR_ZK_SESSION_TIMEOUT</code>. This is because the ZooKeeper client libraries only notify of session expiration upon reconnection, so a process that stays disconnected would otherwise never learn that its session has expired. These timeouts are of particular interest for network partitions.</p>
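<p>The explicit session timeout can be sketched as a small timer that starts when the client reports a disconnection and is cancelled on reconnection. The class and method names below are hypothetical, and the clock is injectable so the behavior can be checked without sleeping. The real timeout values come from <code>MASTER_CONTENDER_ZK_SESSION_TIMEOUT</code> and <code>MASTER_DETECTOR_ZK_SESSION_TIMEOUT</code> and are not reproduced here.</p>

```python
import time
from typing import Callable, Optional


class DisconnectTimeout:
    """Expire a session locally after staying disconnected for `timeout` seconds.

    Hypothetical sketch: the ZooKeeper client only reports expiration after
    reconnecting, so this tracker makes the decision from the local clock.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.disconnected_at: Optional[float] = None

    def on_disconnected(self) -> None:
        # Start the timer only on the first disconnection event.
        if self.disconnected_at is None:
            self.disconnected_at = self.clock()

    def on_connected(self) -> None:
        # Reconnecting in time cancels the pending expiration.
        self.disconnected_at = None

    def expired(self) -> bool:
        """True once we have been disconnected for at least `timeout` seconds."""
        if self.disconnected_at is None:
            return False
        return self.clock() - self.disconnected_at >= self.timeout
```

<p>A contender or detector would poll <code>expired()</code> (or schedule a timer) and treat the session as lost once it returns true, without waiting for the client library.</p>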
+
 	</div>
 </div>
 

Modified: mesos/site/publish/documentation/latest/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/index.html?rev=1707671&r1=1707670&r2=1707671&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/index.html (original)
+++ mesos/site/publish/documentation/latest/index.html Fri Oct  9 08:10:08 2015
@@ -119,6 +119,7 @@
 <ul>
 <li><a href="/documentation/attributes-resources/">Attributes and Resources</a> for how to describe the slaves that comprise a cluster.</li>
 <li><a href="/documentation/latest/fetcher/">Fetcher Cache</a> for how to configure the Mesos fetcher cache.</li>
+<li><a href="/documentation/latest/networking-for-mesos-managed-containers/">Networking for Mesos-managed Containers</a></li>
 <li><a href="/documentation/latest/oversubscription/">Oversubscription</a> for how to configure Mesos to take advantage of unused resources to launch &ldquo;best-effort&rdquo; tasks.</li>
 <li><a href="/documentation/latest/persistent-volume/">Persistent Volume</a> for how to allow tasks to access persistent storage resources.</li>
 <li><a href="/documentation/latest/reservation/">Reservation</a> for how to configure Mesos to allow slaves to reserve resources.</li>

Added: mesos/site/publish/documentation/latest/networking-for-mesos-managed-containers/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/networking-for-mesos-managed-containers/index.html?rev=1707671&view=auto
==============================================================================
--- mesos/site/publish/documentation/latest/networking-for-mesos-managed-containers/index.html (added)
+++ mesos/site/publish/documentation/latest/networking-for-mesos-managed-containers/index.html Fri Oct  9 08:10:08 2015
@@ -0,0 +1,415 @@
+<!DOCTYPE html>
+<html>
+    <head>
+        <meta charset="utf-8">
+        <title></title>
+		    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+		    <link href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css" rel="stylesheet">
+		    <link rel="alternate" type="application/atom+xml" title="Apache Mesos Blog" href="/blog/feed.xml">
+		    
+		    <link href="../../../assets/css/main.css" media="screen" rel="stylesheet" type="text/css" />
+				
+		    
+			
+			<!-- Google Analytics Magic -->
+			<script type="text/javascript">
+			  var _gaq = _gaq || [];
+			  _gaq.push(['_setAccount', 'UA-20226872-1']);
+			  _gaq.push(['_setDomainName', 'apache.org']);
+			  _gaq.push(['_trackPageview']);
+
+			  (function() {
+			    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+			    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+			    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+			  })();
+			</script>
+    </head>
+    <body>
+			<!-- magical breadcrumbs -->
+			<div class="topnav">
+			<ul class="breadcrumb">
+			  <li>
+					<div class="dropdown">
+					  <a data-toggle="dropdown" href="#">Apache Software Foundation <span class="caret"></span></a>
+					  <ul class="dropdown-menu" role="menu" aria-labelledby="dLabel">
+							<li><a href="http://www.apache.org">Apache Homepage</a></li>
+							<li><a href="http://www.apache.org/licenses/">License</a></li>
+					  	<li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>  
+					  	<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+							<li><a href="http://www.apache.org/security/">Security</a></li>
+					  </ul>
+					</div>
+				</li>
+				<li><a href="http://mesos.apache.org">Apache Mesos</a></li>
+				
+				
+					<li><a href="/documentation
+/">Documentation
+</a></li>
+				
+				
+			</ul><!-- /breadcrumb -->
+			</div>
+			
+			<!-- navbar excitement -->
+	    <div class="navbar navbar-static-top" role="navigation">
+	      <div class="navbar-inner">
+	        <div class="container">
+						<a href="/" class="logo"><img src="/assets/img/mesos_logo.png" alt="Apache Mesos logo" /></a>
+					<div class="nav-collapse">
+						<ul class="nav nav-pills navbar-right">
+						  <li><a href="/gettingstarted/">Getting Started</a></li>
+						  <li><a href="/documentation/latest/">Documentation</a></li>
+						  <li><a href="/downloads/">Downloads</a></li>
+						  <li><a href="/community/">Community</a></li>
+						</ul>
+					</div>
+	        </div>
+	      </div>
+	    </div><!-- /.navbar -->
+
+      <div class="container">
+
+			<div class="row-fluid">
+	<div class="col-md-4">
+		<h4>If you're new to Mesos</h4>
+		<p>See the <a href="/gettingstarted/">getting started</a> page for more information about downloading, building, and deploying Mesos.</p>
+		
+		<h4>If you'd like to get involved or you're looking for support</h4>
+		<p>See our <a href="/community/">community</a> page for more details.</p>
+	</div>
+	<div class="col-md-8">
+		<h1>Networking for Mesos-managed containers</h1>
+
+<p>While networking plays a key role in data center infrastructure, it is &ndash; for
+now &ndash; beyond the scope of Mesos to try to address the concerns of networking
+setup, topology and performance. However, Mesos can ease integrations with
+existing networking solutions and enable features such as IP per container,
+task-granular isolation, and service discovery. More often than not, it
+will be challenging to provide a one-size-fits-all networking solution. The
+requirements and available solutions will vary across cloud-only,
+on-premises, and hybrid deployments.</p>
+
+<p>One of the primary goals for the networking support in Mesos was to have a
+pluggable mechanism to allow users to enable custom networking solutions as
+needed. As a result, several extensions were added to Mesos components in
+version 0.25.0 to enable networking support. Further, all the extensions are
+opt-in to allow older frameworks and applications without networking support to
+coexist with the newer ones.</p>
+
+<p>The rest of this document describes the overall architecture of all the involved
+components, configuration steps for enabling IP-per-container, and required
+framework changes.</p>
+
+<h2>How does it work?</h2>
+
+<p><img src="images/networking-architecture.png" alt="Mesos Networking Architecture" /></p>
+
+<p>A key observation is that networking support is enabled via a Mesos module,
+so the Mesos master and agents are completely oblivious to it. It is
+entirely up to the networking module to provide the desired support. Next,
+IP requests are handled on a best-effort basis. Thus, the framework should
+be prepared to handle requests that are ignored (when the module(s) are not
+present) or declined (when the IPs can&rsquo;t be assigned for various reasons).</p>
+
+<p>To maximize backwards-compatibility with existing frameworks, schedulers must
+opt in to per-container network isolation. Schedulers opt in to network
+isolation using new data structures in the TaskInfo message.</p>
+
+<h3>Terminology</h3>
+
+<ul>
+<li><p>IP Address Management (IPAM) Server</p>
+
+<ul>
+<li>assigns IPs on demand</li>
+<li>recycles IPs once they have been released</li>
+<li>(optionally) can tag IPs with a given string/id.</li>
+</ul>
+</li>
+<li><p>IPAM client</p>
+
+<ul>
+<li>tightly coupled with a particular IPAM server</li>
+<li>acts as a bridge between the &ldquo;Network Isolator Module&rdquo; and the IPAM server</li>
+<li>communicates with the server to request/release IPs</li>
+</ul>
+</li>
+<li><p>Network Isolator Module (NIM):</p>
+
+<ul>
+<li>a Mesos module for the Agent implementing the <code>Isolator</code> interface</li>
+<li>looks at TaskInfos to detect the IP requirements for the tasks</li>
+<li>communicates with the IPAM client to request/release IPs</li>
+<li>communicates with an external network virtualizer/isolator to enable network
+isolation</li>
+</ul>
+</li>
+<li><p>Cleanup Module:</p>
+
+<ul>
+<li>responsible for doing a cleanup (releasing IPs, etc.) during an Agent-lost
+event; dormant otherwise</li>
+</ul>
+</li>
+</ul>
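<p>To make the IPAM server&rsquo;s duties concrete, here is a toy in-memory allocator in Python. It is a hypothetical stand-in, not the API of any real IPAM product: it assigns addresses from a CIDR block on demand, recycles released addresses, and optionally tags each address with a string (here an agent id).</p>

```python
import ipaddress
from typing import Dict, List, Optional


class ToyIPAM:
    """Hypothetical in-memory IPAM server used only for illustration."""

    def __init__(self, cidr: str):
        # Free pool: every usable host address in the given network.
        self.free: List[str] = [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]
        self.tags: Dict[str, Optional[str]] = {}  # assigned IP -> optional tag

    def assign(self, tag: Optional[str] = None) -> str:
        """Hand out the next free address, optionally tagging it."""
        if not self.free:
            raise RuntimeError("address pool exhausted")
        ip = self.free.pop(0)
        self.tags[ip] = tag
        return ip

    def release(self, ip: str) -> None:
        """Return an assigned address to the pool so it can be recycled."""
        if ip in self.tags:
            del self.tags[ip]
            self.free.append(ip)

    def release_tag(self, tag: str) -> List[str]:
        """Release every address carrying `tag` (e.g. all IPs of one agent)."""
        released = [ip for ip, t in self.tags.items() if t == tag]
        for ip in released:
            self.release(ip)
        return released
```

<p>Tagging by agent id is one way a cleanup module could later release everything that belonged to a lost Agent in a single call.</p>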
+
+
+<h3>Framework requests IP address for containers</h3>
+
+<ol>
+<li><p>A Mesos framework uses the TaskInfo message to request IPs for each
+container being launched. (The request is ignored if the Mesos cluster
+doesn&rsquo;t have support for IP-per-container.)</p></li>
+<li><p>Mesos Master processes TaskInfos and forwards them to the Agent for launching
+tasks.</p></li>
+</ol>
+
+
+<h3>Network isolator module gets IP from IPAM server</h3>
+
+<ol>
+<li><p>Mesos Agent inspects the TaskInfo to detect the container requirements
+(MesosContainerizer in this case) and prepares various Isolators for the
+to-be-launched container.</p>
+
+<ul>
+<li>The NIM inspects the TaskInfo to decide whether to enable network isolation
+or not.</li>
+</ul>
+</li>
+<li><p>If network isolation is to be enabled, the NIM requests IP address(es) via the IPAM
+  client and informs the Agent.</p></li>
+</ol>
+
+
+<h3>Agent launches container with a network namespace</h3>
+
+<ol>
+<li>The Agent launches a container within a new network namespace.
+
+<ul>
+<li>The Agent calls into the NIM to perform &ldquo;isolation&rdquo;.</li>
+<li>The NIM then calls into the network virtualizer to isolate the container.</li>
+</ul>
+</li>
+</ol>
+
+
+<h3>Network virtualizer assigns IP address to the container and isolates it</h3>
+
+<ol>
+<li>NIM then &ldquo;decorates&rdquo; the TaskStatus with the IP information.
+
+<ul>
+<li>The IP address(es) from TaskStatus are made available at Master&rsquo;s
+state endpoint.</li>
+<li>The TaskStatus is also forwarded to the framework to inform it of the IP
+addresses.</li>
+<li>When a task is killed or lost, the NIM communicates with the IPAM client to release the
+corresponding IP address(es).</li>
+</ul>
+</li>
+</ol>
+
+
+<h3>Cleanup module detects lost Agents and performs cleanup</h3>
+
+<ol>
+<li><p>The cleanup module gets notified if there is an Agent-lost event.</p></li>
+<li><p>The cleanup module communicates with the IPAM client to release all IP
+address(es) associated with the lost Agent. The IPAM may have a grace period
+before the address(es) are recycled.</p></li>
+</ol>
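<p>The grace period mentioned in the last step can be sketched as a quarantine: addresses from a lost Agent are released immediately but become reusable only once the grace period has passed, which lowers the chance that a still-running task on a partitioned Agent collides with the address&rsquo;s next owner. The class below is hypothetical and assumes the caller supplies timestamps in seconds.</p>

```python
from typing import Dict, List, Set


class GracefulRecycler:
    """Hypothetical cleanup helper: quarantine a lost Agent's IPs before reuse."""

    def __init__(self, grace_seconds: float):
        self.grace = grace_seconds
        self.by_agent: Dict[str, Set[str]] = {}  # agent id -> IPs in use there
        self.quarantine: Dict[str, float] = {}   # IP -> time it becomes reusable

    def record(self, agent_id: str, ip: str) -> None:
        """Remember which Agent an assigned address belongs to."""
        self.by_agent.setdefault(agent_id, set()).add(ip)

    def on_agent_lost(self, agent_id: str, now: float) -> List[str]:
        """Release all of the Agent's IPs into quarantine; return them sorted."""
        ips = sorted(self.by_agent.pop(agent_id, set()))
        for ip in ips:
            self.quarantine[ip] = now + self.grace
        return ips

    def reusable(self, now: float) -> List[str]:
        """Pop and return the quarantined IPs whose grace period has elapsed."""
        ready = sorted(ip for ip, until in self.quarantine.items() if until <= now)
        for ip in ready:
            del self.quarantine[ip]
        return ready
```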
+
+
+<h2>Configuration</h2>
+
+<p>The network isolator module is not part of the standard Mesos distribution. However,
+there is an example implementation at <a href="https://github.com/mesosphere/net-modules">https://github.com/mesosphere/net-modules</a>.</p>
+
+<p>Once the network isolator module has been built into a shared dynamic library,
+we can load it into the Mesos Agent (see the <a href="/documentation/latest/modules/">modules documentation</a> for
+instructions on building and loading a module).</p>
+
+<h2>Enabling frameworks for IP-per-container capability</h2>
+
+<h3>NetworkInfo</h3>
+
+<p>A new NetworkInfo message has been introduced:</p>
+
+<pre><code class="{.proto}">message NetworkInfo {
+  enum Protocol {
+    IPv4 = 0;
+    IPv6 = 1;
+  }
+
+  optional Protocol protocol = 1;
+
+  optional string ip_address = 2;
+
+  repeated string groups = 3;
+
+  optional Labels labels = 4;
+};
+</code></pre>
+
+<p>When requesting an IP address from the IPAM, one needs to set the <code>protocol</code>
+field to <code>IPv4</code> or <code>IPv6</code>. Setting <code>ip_address</code> to a valid IP address allows the
+framework to specify a static IP address for the container (if supported by the
+NIM). This is helpful in situations where a task must be bound to a particular
+IP address even as it is killed and restarted on a different node.</p>
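<p>A framework might sanity-check its requests before sending them. The Python dataclass below mirrors the fields of the <code>NetworkInfo</code> message above (omitting <code>labels</code>) and rejects a static <code>ip_address</code> that is malformed or contradicts the requested <code>protocol</code>. Both the class and the check are hypothetical client-side conveniences; whether a static address is honored is still up to the NIM.</p>

```python
import ipaddress
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Protocol(Enum):
    # Same names and values as the enum in the NetworkInfo message above.
    IPv4 = 0
    IPv6 = 1


@dataclass
class NetworkInfo:
    protocol: Optional[Protocol] = None
    ip_address: Optional[str] = None
    groups: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if a static address is malformed or mismatched."""
        if self.ip_address is None:
            return  # dynamic request: the IPAM picks the address
        # ip_address() raises ValueError for strings that are not IPv4/IPv6.
        version = ipaddress.ip_address(self.ip_address).version
        if self.protocol is not None:
            expected = 4 if self.protocol is Protocol.IPv4 else 6
            if version != expected:
                raise ValueError(
                    f"{self.ip_address} is IPv{version}, "
                    f"but protocol is {self.protocol.name}")


# Usage: a static IPv4 request with a matching protocol passes validation.
NetworkInfo(protocol=Protocol.IPv4, ip_address="10.1.2.3").validate()
```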
+
+<h3>Examples of specifying network requirements</h3>
+
+<p>Frameworks wanting to enable IP per container need to provide a <code>NetworkInfo</code>
+message in TaskInfo. Here are a few examples:</p>
+
+<ol>
+<li><p>A request for one address of unspecified protocol version using the default
+command executor</p>
+
+<pre><code>TaskInfo {
+  ...
+  command: ...,
+  container: ContainerInfo {
+    network_infos: [
+      NetworkInfo {
+        protocol: None;
+        ip_address: None;
+        groups: [];
+        labels: None;
+      }
+    ]
+  }
+}
+</code></pre></li>
+<li><p>A request for one IPv4 and one IPv6 address, in two separate groups using the
+default command executor</p>
+
+<pre><code>TaskInfo {
+  ...
+  command: ...,
+  container: ContainerInfo {
+    network_infos: [
+      NetworkInfo {
+        protocol: IPv4;
+        ip_address: None;
+        groups: ["public"];
+        labels: None;
+      },
+      NetworkInfo {
+        protocol: IPv6;
+        ip_address: None;
+        groups: ["private"];
+        labels: None;
+      }
+    ]
+  }
+}
+</code></pre></li>
+<li><p>A request for a specific IP address using a custom executor</p>
+
+<pre><code>TaskInfo {
+  ...
+  executor: ExecutorInfo {
+    ...,
+    container: ContainerInfo {
+      network_infos: [
+        NetworkInfo {
+          protocol: None;
+          ip_address: "10.1.2.3";
+          groups: [];
+          labels: None;
+        }
+      ]
+    }
+  }
+}
+</code></pre></li>
+</ol>
+
+
+<p>NOTE: The Mesos Containerizer will reject any CommandInfo that has a ContainerInfo. For this reason, when opting in to network isolation with the Mesos Containerizer, set TaskInfo.ContainerInfo.NetworkInfo.</p>
+
+<h2>Address Discovery</h2>
+
+<p>The NetworkInfo message allows frameworks to request IP address(es) to be
+assigned at task launch time on the Mesos agent. After opting in to network
+isolation for a given executor&rsquo;s container in this way, frameworks will need to
+know what address(es) were ultimately assigned in order to perform health
+checks, or any other out-of-band communication.</p>
+
+<p>This is accomplished by adding a new field to the TaskStatus message.</p>
+
+<pre><code class="{.proto}">message ContainerStatus {
+   repeated NetworkInfo network_infos;
+}
+
+message TaskStatus {
+  ...
+  optional ContainerStatus container;
+  ...
+};
+</code></pre>
+
+<p>Further, the container IP addresses are also exposed via Master&rsquo;s state
+endpoint. The JSON output from Master&rsquo;s state endpoint contains a list of task
+statuses. If a task&rsquo;s container was started with its own IP address, the
+assigned IP address will be exposed as part of the <code>TASK_RUNNING</code> status.</p>
+
+<p>NOTE: Since per-container address(es) are strictly opt-in from the framework,
+the framework may ignore the IP address(es) provided in StatusUpdate if it
+didn&rsquo;t set NetworkInfo in the first place.</p>
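<p>A client that polls the Master&rsquo;s state endpoint could collect the assigned addresses roughly as follows. This Python sketch uses only the standard library; the key layout it walks (<code>frameworks</code>, <code>tasks</code>, <code>statuses</code>, <code>state</code>, and especially <code>container_status.network_infos[].ip_address</code>) is an assumption to check against the actual output of your Mesos version. Tasks that did not opt in, or whose request was ignored or declined, simply get no entry.</p>

```python
import json
from typing import Dict, List


def task_ips(state_json: str) -> Dict[str, List[str]]:
    """Map task id -> IP addresses reported in its TASK_RUNNING status.

    The key names used below are assumptions about the state endpoint's JSON.
    """
    state = json.loads(state_json)
    result: Dict[str, List[str]] = {}
    for framework in state.get("frameworks", []):
        for task in framework.get("tasks", []):
            for status in task.get("statuses", []):
                if status.get("state") != "TASK_RUNNING":
                    continue
                infos = status.get("container_status", {}).get("network_infos", [])
                ips = [info["ip_address"] for info in infos if "ip_address" in info]
                if ips:
                    result[task["id"]] = ips
    return result
```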
+
+<h2>Writing a Custom Network Isolator Module</h2>
+
+<p>A network isolator module implements the Isolator interface provided by Mesos.
+The module is loaded as a dynamic shared library into the Mesos Agent and gets
+hooked up in the container launch sequence. A network isolator may communicate
+with external IPAM and network virtualizer tools to fulfill framework
+requirements.</p>
+
+<p>In terms of the Isolator API, there are three key callbacks that a network
+isolator module should implement:</p>
+
+<ol>
+<li><p><code>Isolator::prepare()</code> provides the module with a chance to decide whether or
+not to enable network isolation for the given task container. If network
+isolation is to be enabled, the <code>Isolator::prepare()</code> call would inform the Agent
+to create a private network namespace for the container. This
+interface will also generate an IP address (statically or with the help
+of an external IPAM agent) for the container.</p></li>
+<li><p><code>Isolator::isolate()</code> provides the module with the opportunity to <em>isolate</em>
+the container <em>after</em> it has been created but before the executor is launched
+inside the container. This would involve creating a virtual Ethernet adapter
+for the container and assigning it an IP address. The module can also enlist the
+help of an external network virtualizer/isolator to set up networking for
+the container.</p></li>
+<li><p><code>Isolator::cleanup()</code> is called when the container terminates. This allows the
+module to perform any cleanups such as recovering resources and releasing IP
+addresses as needed.</p></li>
+</ol>
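<p>The division of work between the three callbacks can be sketched in Python. This only mirrors the <em>shape</em> of the lifecycle: the real <code>Isolator</code> interface is C++ and its method signatures differ. <code>FakeIPAMClient</code> and <code>NetworkIsolatorSketch</code> are hypothetical names, and <code>isolate()</code> only records the process id where a real module would configure the container&rsquo;s network namespace.</p>

```python
from typing import Dict, List, Optional


class FakeIPAMClient:
    """Hypothetical IPAM client that leases addresses from a fixed pool."""

    def __init__(self, pool: List[str]):
        self.pool = list(pool)

    def request(self) -> str:
        return self.pool.pop(0)

    def release(self, ip: str) -> None:
        self.pool.append(ip)


class NetworkIsolatorSketch:
    def __init__(self, ipam: FakeIPAMClient):
        self.ipam = ipam
        self.leased: Dict[str, List[str]] = {}  # container -> IPAM-leased IPs
        self.isolated: Dict[str, int] = {}      # container -> executor pid

    def prepare(self, container_id: str, network_infos: List[dict]) -> Optional[List[str]]:
        """Decide whether to isolate and obtain one address per NetworkInfo."""
        if not network_infos:
            return None  # no opt-in: no private network namespace needed
        ips: List[str] = []
        leased: List[str] = []
        for info in network_infos:
            static_ip = info.get("ip_address")
            if static_ip:
                ips.append(static_ip)  # static request: nothing to lease
            else:
                ip = self.ipam.request()
                ips.append(ip)
                leased.append(ip)
        self.leased[container_id] = leased
        return ips

    def isolate(self, container_id: str, pid: int) -> None:
        # A real module would create a virtual Ethernet pair here, move one
        # end into the network namespace of `pid`, and assign the addresses.
        self.isolated[container_id] = pid

    def cleanup(self, container_id: str) -> None:
        """Forget the container and return only the IPAM-leased addresses."""
        self.isolated.pop(container_id, None)
        for ip in self.leased.pop(container_id, []):
            self.ipam.release(ip)
```

<p>Static addresses are not handed back to the IPAM in <code>cleanup()</code>, because the sketch never leased them in the first place.</p>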
+
+
+	</div>
+</div>
+
+			
+	      <hr>
+
+				<!-- footer -->
+	      <div class="footer">
+	        <p>&copy; 2012-2015 <a href="http://apache.org">The Apache Software Foundation</a>.
+	        Apache Mesos, the Apache feather logo, and the Apache Mesos project logo are trademarks of The Apache Software Foundation.<p>
+	      </div><!-- /footer -->
+
+	    </div> <!-- /container -->
+
+	    <!-- JS -->
+	    <script src="//code.jquery.com/jquery-1.11.0.min.js" type="text/javascript"></script>
+			<script src="//netdna.bootstrapcdn.com/bootstrap/3.1.1/js/bootstrap.min.js" type="text/javascript"></script>
+    </body>
+</html>

Modified: mesos/site/publish/documentation/latest/operational-guide/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/operational-guide/index.html?rev=1707671&r1=1707670&r2=1707671&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/operational-guide/index.html (original)
+++ mesos/site/publish/documentation/latest/operational-guide/index.html Fri Oct  9 08:10:08 2015
@@ -83,11 +83,17 @@
 	<div class="col-md-8">
 		<h1>Operational Guide</h1>
 
+<h2>Using a process supervisor</h2>
+
+<p>Mesos uses a &ldquo;<a href="https://en.wikipedia.org/wiki/Fail-fast">fail-fast</a>&rdquo; approach to error handling: if a serious error occurs, Mesos will typically exit rather than trying to continue running in a possibly erroneous state. For example, when Mesos is configured for <a href="/documentation/latest/high-availability">high availability</a>, the leading master will abort itself when it discovers it has been partitioned away from the ZooKeeper quorum. This is a safety precaution to ensure the previous leader doesn&rsquo;t continue communicating in an unsafe state.</p>
+
+<p>To ensure that such failures are handled appropriately, production deployments of Mesos typically use a <em>process supervisor</em> (such as systemd or supervisord) to detect when Mesos processes exit. The supervisor can be configured to restart the failed process automatically and/or to notify the cluster operator to investigate the situation.</p>
+
 <h2>Changing the master quorum</h2>
 
-<p>Currently the master leverages a paxos-based replicated log as its storage backend (<code>--registry=replicated_log</code> is the only storage backend supported). Each master participates in the ensemble as a log replica. The <code>--quorum</code> flag determines a majority of the masters.</p>
+<p>The master leverages a Paxos-based replicated log as its storage backend (<code>--registry=replicated_log</code> is the only storage backend currently supported). Each master participates in the ensemble as a log replica. The <code>--quorum</code> flag determines a majority of the masters.</p>
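<p>Assuming <code>--quorum</code> is set to a strict majority of the masters, the quorum size and the number of master failures the replicated log can tolerate follow from simple arithmetic, sketched here in Python (the function names are illustrative only):</p>

```python
def quorum_size(num_masters: int) -> int:
    """Smallest strict majority of num_masters replicas: floor(n / 2) + 1."""
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """Masters that can fail while a quorum of replicas remains available."""
    return num_masters - quorum_size(num_masters)
```

<p>For 3 masters this gives a quorum of 2 and tolerates 1 failure; for 5 masters, a quorum of 3 tolerating 2 failures. An even count adds no tolerance: 4 masters need a quorum of 3 and still tolerate only 1 failure.</p>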
 
-<p>The following table shows the tolerance to master failures, for each quorum size:</p>
+<p>The following table shows the tolerance to master failures for each quorum size:</p>
 
 <table>
 <thead>
@@ -151,7 +157,7 @@
 <p>To increase the quorum by N, repeat this process to increment the quorum size N times.</p>
 
 <p>NOTE: Currently, moving out of a single master setup requires wiping the replicated log
-state and starting fresh. This will wipe all persistent data (e.g. slaves, maintenance
+state and starting fresh. This will wipe all persistent data (e.g., slaves, maintenance
 information, quota information, etc.). To move from 1 master to 3 masters:</p>
 
 <ol>
@@ -167,7 +173,7 @@ information, quota information, etc). To
 
 <ol>
 <li>Initially, 5 masters are running with <code>--quorum=3</code></li>
-<li>Remove 2 masters from the cluster, ensure they will not be restarted (See NOTE section above). Now 3 masters are running with <code>--quorum=3</code></li>
+<li>Remove 2 masters from the cluster, ensure they will not be restarted (see NOTE section above). Now 3 masters are running with <code>--quorum=3</code></li>
 <li>Restart the 3 masters with <code>--quorum=2</code></li>
 </ol>
 
@@ -178,9 +184,9 @@ information, quota information, etc). To
 
 <p>Please see the NOTE section above. So long as the failed master is guaranteed to not re-join the ensemble, it is safe to start a new master <em>with an empty log</em> and allow it to catch up.</p>
 
-<h2>External access for mesos master</h2>
+<h2>External access for Mesos master</h2>
 
-<p>If the default ip (or the command line arg <code>--ip</code>) points to an internal IP, then external entities such as framework scheduler would not be able to reach the master. To address that scenario, an externally accessible IP:port can be setup via the <code>--advertise_ip</code> and <code>--advertise_port</code> command line arguments of mesos master. If configured, external entities such as framework scheduler interact with the advertise_ip:advertise_port from where the request needs to be proxied to the internal IP:Port on which mesos master is listening.</p>
+<p>If the default IP (or the command line arg <code>--ip</code>) is an internal IP, then external entities such as framework schedulers will be unable to reach the master. To address that scenario, an externally accessible IP:port can be set up via the <code>--advertise_ip</code> and <code>--advertise_port</code> command line arguments of <code>mesos-master</code>. If configured, external entities such as framework schedulers interact with the advertise_ip:advertise_port from where the request needs to be proxied to the internal IP:port on which the Mesos master is listening.</p>
 
 	</div>
 </div>

Modified: mesos/site/publish/documentation/latest/upgrades/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/latest/upgrades/index.html?rev=1707671&r1=1707670&r2=1707671&view=diff
==============================================================================
--- mesos/site/publish/documentation/latest/upgrades/index.html (original)
+++ mesos/site/publish/documentation/latest/upgrades/index.html Fri Oct  9 08:10:08 2015
@@ -115,6 +115,18 @@
 </ul>
 
 
+<p>In order to upgrade a running cluster:</p>
+
+<ul>
+<li>Rebuild and install any modules so that upgraded masters/slaves can use them.</li>
+<li>Install the new master binaries and restart the masters.</li>
+<li>Install the new slave binaries and restart the slaves.</li>
+<li>Upgrade the schedulers by linking the latest native library / jar / egg (if necessary).</li>
+<li>Restart the schedulers.</li>
+<li>Upgrade the executors by linking the latest native library / jar / egg (if necessary).</li>
+</ul>
+
+
 <h2>Upgrading from 0.23.x to 0.24.x</h2>
 
 <p><strong>NOTE:</strong> Support for live upgrading a driver-based scheduler to an HTTP-based (experimental) scheduler has been added.</p>

Added: mesos/site/publish/documentation/networking-for-mesos-managed-containers/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/networking-for-mesos-managed-containers/index.html?rev=1707671&view=auto
==============================================================================
--- mesos/site/publish/documentation/networking-for-mesos-managed-containers/index.html (added)
+++ mesos/site/publish/documentation/networking-for-mesos-managed-containers/index.html Fri Oct  9 08:10:08 2015
@@ -0,0 +1,415 @@
+<!DOCTYPE html>
+<html>
+    <head>
+        <meta charset="utf-8">
+        <title></title>
+		    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+		    <link href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css" rel="stylesheet">
+		    <link rel="alternate" type="application/atom+xml" title="Apache Mesos Blog" href="/blog/feed.xml">
+		    
+		    <link href="../../assets/css/main.css" media="screen" rel="stylesheet" type="text/css" />
+				
+		    
+			
+    </head>
+    <body>
+
+      <div class="container">
+
+			<div class="row-fluid">
+	<div class="col-md-4">
+		<h4>If you're new to Mesos</h4>
+		<p>See the <a href="/gettingstarted/">getting started</a> page for more information about downloading, building, and deploying Mesos.</p>
+		
+		<h4>If you'd like to get involved or you're looking for support</h4>
+		<p>See our <a href="/community/">community</a> page for more details.</p>
+	</div>
+	<div class="col-md-8">
+		<h1>Networking for Mesos-managed containers</h1>
+
+<p>While networking plays a key role in data center infrastructure, it is &ndash; for
+now &ndash; beyond the scope of Mesos to try to address the concerns of networking
+setup, topology and performance. However, Mesos can ease integrations with
+existing networking solutions and enable features such as IP per container,
+task-granular isolation, and service discovery. More often than not, it
+will be challenging to provide a one-size-fits-all networking solution. The
+requirements and available solutions will vary across cloud-only,
+on-premises, and hybrid deployments.</p>
+
+<p>One of the primary goals for the networking support in Mesos was to have a
+pluggable mechanism to allow users to enable custom networking solutions as
+needed. As a result, several extensions were added to Mesos components in
+version 0.25.0 to enable networking support. Further, all the extensions are
+opt-in to allow older frameworks and applications without networking support to
+coexist with the newer ones.</p>
+
+<p>The rest of this document describes the overall architecture of all the involved
+components, configuration steps for enabling IP-per-container, and required
+framework changes.</p>
+
+<h2>How does it work?</h2>
+
+<p><img src="images/networking-architecture.png" alt="Mesos Networking Architecture" /></p>
+
+<p>A key observation is that networking support is enabled via a Mesos module,
+so the Mesos master and agents are completely oblivious to it. It is
+entirely up to the networking module to provide the desired support. Next,
+IP requests are handled on a best-effort basis. Thus, the framework should
+be prepared to handle requests that are ignored (when the module(s) are not
+present) or declined (when the IPs can&rsquo;t be assigned for various reasons).</p>
+
+<p>To maximize backwards-compatibility with existing frameworks, schedulers must
+opt in to per-container network isolation. Schedulers opt in to network
+isolation using new data structures in the TaskInfo message.</p>
+
+<h3>Terminology</h3>
+
+<ul>
+<li><p>IP Address Management (IPAM) Server</p>
+
+<ul>
+<li>assigns IPs on demand</li>
+<li>recycles IPs once they have been released</li>
+<li>(optionally) can tag IPs with a given string/id.</li>
+</ul>
+</li>
+<li><p>IPAM client</p>
+
+<ul>
+<li>tightly coupled with a particular IPAM server</li>
+<li>acts as a bridge between the &ldquo;Network Isolator Module&rdquo; and the IPAM server</li>
+<li>communicates with the server to request/release IPs</li>
+</ul>
+</li>
+<li><p>Network Isolator Module (NIM):</p>
+
+<ul>
+<li>a Mesos module for the Agent implementing the <code>Isolator</code> interface</li>
+<li>looks at TaskInfos to detect the IP requirements for the tasks</li>
+<li>communicates with the IPAM client to request/release IPs</li>
+<li>communicates with an external network virtualizer/isolator to enable network
+isolation</li>
+</ul>
+</li>
+<li><p>Cleanup Module:</p>
+
+<ul>
+<li>responsible for doing a cleanup (releasing IPs, etc.) during an Agent-lost
+event; dormant otherwise</li>
+</ul>
+</li>
+</ul>
+
+
+<h3>Framework requests IP address for containers</h3>
+
+<ol>
+<li><p>A Mesos framework uses the TaskInfo message to request IPs for each
+container being launched. (The request is ignored if the Mesos cluster
+doesn&rsquo;t have support for IP-per-container.)</p></li>
+<li><p>Mesos Master processes TaskInfos and forwards them to the Agent for launching
+tasks.</p></li>
+</ol>
+
+
+<h3>Network isolator module gets IP from IPAM server</h3>
+
+<ol>
+<li><p>Mesos Agent inspects the TaskInfo to detect the container requirements
+(MesosContainerizer in this case) and prepares various Isolators for the
+to-be-launched container.</p>
+
+<ul>
+<li>The NIM inspects the TaskInfo to decide whether to enable network isolation
+or not.</li>
+</ul>
+</li>
+<li><p>If network isolation is to be enabled, the NIM requests IP address(es) via the IPAM
+  client and informs the Agent.</p></li>
+</ol>
+
+
+<h3>Agent launches container with a network namespace</h3>
+
+<ol>
+<li>The Agent launches a container within a new network namespace.
+
+<ul>
+<li>The Agent calls into the NIM to perform &ldquo;isolation&rdquo;.</li>
+<li>The NIM then calls into the network virtualizer to isolate the container.</li>
+</ul>
+</li>
+</ol>
+
+
+<h3>Network virtualizer assigns IP address to the container and isolates it</h3>
+
+<ol>
+<li>NIM then &ldquo;decorates&rdquo; the TaskStatus with the IP information.
+
+<ul>
+<li>The IP address(es) from TaskStatus are made available at Master&rsquo;s
+state endpoint.</li>
+<li>The TaskStatus is also forwarded to the framework to inform it of the IP
+addresses.</li>
+<li>When a task is killed or lost, the NIM communicates with the IPAM client to release the
+corresponding IP address(es).</li>
+</ul>
+</li>
+</ol>
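+<p>A rough sketch of the decorate-and-release behavior, assuming the status update is a plain dict and that the IPAM client exposes a <code>release(ip)</code> callable. The <code>container</code> key mirrors the <code>TaskStatus</code> field shown under Address Discovery; <code>decorate_status</code> and <code>on_status_update</code> are hypothetical names.</p>

```python
from typing import Callable, Dict, List

# States in which, per the steps above, the NIM releases a task's addresses.
RELEASE_STATES = {"TASK_KILLED", "TASK_LOST"}


def decorate_status(status: Dict, ips: List[str]) -> Dict:
    """Return a copy of `status` carrying the assigned IPs as NetworkInfos."""
    decorated = dict(status)
    decorated["container"] = {"network_infos": [{"ip_address": ip} for ip in ips]}
    return decorated


def on_status_update(status: Dict, ips: List[str],
                     release: Callable[[str], None]) -> Dict:
    """Decorate every update; release the addresses once the task is killed or lost."""
    if status["state"] in RELEASE_STATES:
        for ip in ips:
            release(ip)
    return decorate_status(status, ips)
```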
+
+
+<h3>Cleanup module detects lost Agents and performs cleanup</h3>
+
+<ol>
+<li><p>The cleanup module is notified when an Agent-lost event occurs.</p></li>
+<li><p>The cleanup module communicates with the IPAM client to release all IP
+address(es) associated with the lost Agent. The IPAM may have a grace period
+before the address(es) are recycled.</p></li>
+</ol>
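+<p>The grace period can be modeled as a quarantine between release and reuse. The sketch below is hypothetical (an IPAM-side lease table keyed by Agent ID, with an explicit <code>now</code> argument for testability); a real IPAM decides its own recycling policy.</p>

```python
import time
from typing import Dict, List, Optional


class LeaseTable:
    """Hypothetical IPAM-side view of which Agent holds which address."""

    def __init__(self, grace_period_s: float):
        self.grace_period_s = grace_period_s
        self.leases: Dict[str, str] = {}        # ip -> agent id
        self.quarantine: Dict[str, float] = {}  # ip -> time it becomes reusable
        self.free: List[str] = []

    def on_agent_lost(self, agent_id: str, now: Optional[float] = None) -> List[str]:
        """Release every address of the lost Agent into quarantine."""
        now = time.monotonic() if now is None else now
        released = [ip for ip, agent in self.leases.items() if agent == agent_id]
        for ip in released:
            del self.leases[ip]
            self.quarantine[ip] = now + self.grace_period_s
        return released

    def recycle(self, now: Optional[float] = None) -> List[str]:
        """Move addresses whose grace period has elapsed back to the free pool."""
        now = time.monotonic() if now is None else now
        ready = [ip for ip, t in self.quarantine.items() if t <= now]
        for ip in ready:
            del self.quarantine[ip]
            self.free.append(ip)
        return ready
```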
+
+
+<h2>Configuration</h2>
+
+<p>The network isolator module is not part of the standard Mesos distribution. However,
+there is an example implementation at <a href="https://github.com/mesosphere/net-modules">https://github.com/mesosphere/net-modules</a>.</p>
+
+<p>Once the network isolation module has been built into a shared dynamic library,
+it can be loaded into the Mesos Agent (see the <a href="/documentation/latest/modules/">modules documentation</a> for
+instructions on building and loading a module).</p>
+
+<h2>Enabling frameworks for IP-per-container capability</h2>
+
+<h3>NetworkInfo</h3>
+
+<p>A new NetworkInfo message has been introduced:</p>
+
+<pre><code class="{.proto}">message NetworkInfo {
+  enum Protocol {
+    IPv4 = 0;
+    IPv6 = 1;
+  }
+
+  optional Protocol protocol = 1;
+
+  optional string ip_address = 2;
+
+  repeated string groups = 3;
+
+  optional Labels labels = 4;
+};
+</code></pre>
+
+<p>When requesting an IP address from the IPAM, one needs to set the <code>protocol</code>
+field to <code>IPv4</code> or <code>IPv6</code>. Setting <code>ip_address</code> to a valid IP address allows the
+framework to specify a static IP address for the container (if supported by the
+NIM). This is helpful in situations where a task must be bound to a particular
+IP address even as it is killed and restarted on a different node.</p>
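+<p>Before putting a NetworkInfo into a TaskInfo, a framework might sanity-check it. The following Python sketch uses dicts in place of the generated protobuf classes, and its checks (known protocol, well-formed address, address family matching <code>protocol</code>) are assumptions for illustration, not validation that Mesos is documented to perform. <code>labels</code> is omitted for brevity and <code>network_info</code> is a hypothetical helper.</p>

```python
import ipaddress
from typing import Dict, List, Optional


def network_info(protocol: Optional[str] = None, ip_address: Optional[str] = None,
                 groups: Optional[List[str]] = None) -> Dict:
    """Build a NetworkInfo-shaped dict, rejecting inconsistent fields early."""
    if protocol not in (None, "IPv4", "IPv6"):
        raise ValueError(f"unknown protocol {protocol!r}")
    if ip_address is not None:
        # Raises ValueError if the address is malformed.
        version = ipaddress.ip_address(ip_address).version
        if protocol is not None and protocol != f"IPv{version}":
            raise ValueError(f"{ip_address} is not an {protocol} address")
    info: Dict = {"groups": list(groups or [])}
    if protocol is not None:
        info["protocol"] = protocol
    if ip_address is not None:
        info["ip_address"] = ip_address  # static address request
    return info
```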
+
+<h3>Examples of specifying network requirements</h3>
+
+<p>Frameworks that want to enable IP-per-container need to provide a <code>NetworkInfo</code>
+message in the TaskInfo. Here are a few examples:</p>
+
+<ol>
+<li><p>A request for one address of unspecified protocol version using the default
+command executor</p>
+
+<pre><code>TaskInfo {
+  ...
+  command: ...,
+  container: ContainerInfo {
+    network_infos: [
+      NetworkInfo {
+        protocol: None;
+        ip_address: None;
+        groups: [];
+        labels: None;
+      }
+    ]
+  }
+}
+</code></pre></li>
+<li><p>A request for one IPv4 and one IPv6 address, in two separate groups using the
+default command executor</p>
+
+<pre><code>TaskInfo {
+  ...
+  command: ...,
+  container: ContainerInfo {
+    network_infos: [
+      NetworkInfo {
+        protocol: IPv4;
+        ip_address: None;
+        groups: ["public"];
+        labels: None;
+      },
+      NetworkInfo {
+        protocol: IPv6;
+        ip_address: None;
+        groups: ["private"];
+        labels: None;
+      }
+    ]
+  }
+}
+</code></pre></li>
+<li><p>A request for a specific IP address using a custom executor</p>
+
+<pre><code>TaskInfo {
+  ...
+  executor: ExecutorInfo {
+    ...,
+    container: ContainerInfo {
+      network_infos: [
+        NetworkInfo {
+          protocol: None;
+          ip_address: "10.1.2.3";
+          groups: [];
+          labels: None;
+        }
+      ]
+    }
+  }
+}
+</code></pre></li>
+</ol>
+
+
+<p>NOTE: The Mesos Containerizer will reject any CommandInfo that has a ContainerInfo. For this reason, when opting in to network isolation with the Mesos Containerizer, set the NetworkInfo in TaskInfo.ContainerInfo rather than in CommandInfo.</p>
+
+<h2>Address Discovery</h2>
+
+<p>The NetworkInfo message allows frameworks to request IP address(es) to be
+assigned at task launch time on the Mesos agent. After opting in to network
+isolation for a given executor&rsquo;s container in this way, frameworks will need to
+know what address(es) were ultimately assigned in order to perform health
+checks or any other out-of-band communication.</p>
+
+<p>This is accomplished by adding a new field to the TaskStatus message.</p>
+
+<pre><code class="{.proto}">message ContainerStatus {
+   repeated NetworkInfo network_infos;
+}
+
+message TaskStatus {
+  ...
+  optional ContainerStatus container;
+  ...
+};
+</code></pre>
+
+<p>Further, the container IP addresses are also exposed via Master&rsquo;s state
+endpoint. The JSON output from Master&rsquo;s state endpoint contains a list of task
+statuses. If a task&rsquo;s container was started with its own IP address, the
+assigned IP address will be exposed as part of the <code>TASK_RUNNING</code> status.</p>
+
+<p>NOTE: Since per-container address(es) are strictly opt-in from the framework,
+the framework may ignore the IP address(es) provided in StatusUpdate if it
+didn&rsquo;t set NetworkInfo in the first place.</p>
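+<p>A framework or operator tool could pull the assigned addresses out of the state endpoint output along these lines. The sketch assumes the JSON is laid out as frameworks, then tasks, then statuses, with addresses under <code>container.network_infos</code> to mirror the message above; the exact key names may differ between Mesos versions, so check them against your master's actual output. <code>task_ips</code> is a hypothetical helper.</p>

```python
import json
from typing import Dict, List


def task_ips(state_json: str) -> Dict[str, List[str]]:
    """Map task id -> IP addresses reported in its TASK_RUNNING status(es)."""
    state = json.loads(state_json)
    result: Dict[str, List[str]] = {}
    for framework in state.get("frameworks", []):
        for task in framework.get("tasks", []):
            for status in task.get("statuses", []):
                if status.get("state") != "TASK_RUNNING":
                    continue  # addresses are exposed on the TASK_RUNNING status
                for info in status.get("container", {}).get("network_infos", []):
                    if "ip_address" in info:
                        result.setdefault(task.get("id"), []).append(info["ip_address"])
    return result
```

+<p>Tasks that did not opt in simply do not appear in the result, matching the note above that frameworks which never set NetworkInfo can ignore these fields.</p>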
+
+<h2>Writing a Custom Network Isolator Module</h2>
+
+<p>A network isolator module implements the Isolator interface provided by Mesos.
+The module is loaded as a dynamic shared library into the Mesos Agent and gets
+hooked up in the container launch sequence. A network isolator may communicate
+with external IPAM and network virtualizer tools to fulfill framework
+requirements.</p>
+
+<p>In terms of the Isolator API, there are three key callbacks that a network
+isolator module should implement:</p>
+
+<ol>
+<li><p><code>Isolator::prepare()</code> provides the module with a chance to decide whether or
+not to enable network isolation for the given task container. If network
+isolation is to be enabled, the <code>Isolator::prepare()</code> call informs the Agent
+to create a private network namespace for the container. This callback also
+generates an IP address for the container (statically or with the help of an
+external IPAM agent).</p></li>
+<li><p><code>Isolator::isolate()</code> provides the module with the opportunity to <em>isolate</em>
+the container <em>after</em> it has been created but before the executor is launched
+inside the container. This involves creating a virtual Ethernet adapter
+for the container and assigning it an IP address. The module can also rely on
+an external network virtualizer/isolator to set up networking for
+the container.</p></li>
+<li><p><code>Isolator::cleanup()</code> is called when the container terminates. This allows the
+module to perform any cleanups such as recovering resources and releasing IP
+addresses as needed.</p></li>
+</ol>
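+<p>The division of labor between the three callbacks can be outlined as below. This is a Python outline only: the real <code>Isolator</code> interface is C++ with different, asynchronous signatures (see the Mesos headers), and <code>NetworkIsolatorSketch</code> and its parameters are hypothetical. No namespace or veth setup is performed; <code>isolate()</code> just records the assignment.</p>

```python
from typing import Callable, Dict, List, Optional


class NetworkIsolatorSketch:
    """Hypothetical outline of the prepare/isolate/cleanup responsibilities."""

    def __init__(self, allocate: Callable[[Optional[str]], str],
                 release: Callable[[str], None]):
        self._allocate = allocate  # e.g. an IPAM client request
        self._release = release    # e.g. an IPAM client release
        self._ips: Dict[str, List[str]] = {}
        self.isolated: Dict[str, int] = {}  # container id -> pid

    def prepare(self, container_id: str, network_infos: List[Dict]) -> bool:
        """Allocate addresses; True means the Agent should create a network namespace."""
        if not network_infos:
            return False
        self._ips[container_id] = [self._allocate(ni.get("ip_address"))
                                   for ni in network_infos]
        return True

    def isolate(self, container_id: str, pid: int) -> List[str]:
        """Runs after the container exists but before the executor starts.
        A real module would create a veth pair in the namespace of `pid`
        and assign the addresses; this sketch only records them."""
        ips = self._ips.get(container_id, [])
        if ips:
            self.isolated[container_id] = pid
        return ips

    def cleanup(self, container_id: str) -> None:
        """Release the container's addresses once it has terminated."""
        for ip in self._ips.pop(container_id, []):
            self._release(ip)
        self.isolated.pop(container_id, None)
```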
+
+
+	</div>
+</div>
+
+			
+	      <hr>
+
+				<!-- footer -->
+	      <div class="footer">
+	        <p>&copy; 2012-2015 <a href="http://apache.org">The Apache Software Foundation</a>.
+	        Apache Mesos, the Apache feather logo, and the Apache Mesos project logo are trademarks of The Apache Software Foundation.</p>
+	      </div><!-- /footer -->
+
+	    </div> <!-- /container -->
+
+	    <!-- JS -->
+	    <script src="//code.jquery.com/jquery-1.11.0.min.js" type="text/javascript"></script>
+			<script src="//netdna.bootstrapcdn.com/bootstrap/3.1.1/js/bootstrap.min.js" type="text/javascript"></script>
+    </body>
+</html>

Modified: mesos/site/publish/documentation/operational-guide/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/operational-guide/index.html?rev=1707671&r1=1707670&r2=1707671&view=diff
==============================================================================
--- mesos/site/publish/documentation/operational-guide/index.html (original)
+++ mesos/site/publish/documentation/operational-guide/index.html Fri Oct  9 08:10:08 2015
@@ -83,11 +83,17 @@
 	<div class="col-md-8">
 		<h1>Operational Guide</h1>
 
+<h2>Using a process supervisor</h2>
+
+<p>Mesos uses a &ldquo;<a href="https://en.wikipedia.org/wiki/Fail-fast">fail-fast</a>&rdquo; approach to error handling: if a serious error occurs, Mesos will typically exit rather than trying to continue running in a possibly erroneous state. For example, when Mesos is configured for <a href="/documentation/latest/high-availability">high availability</a>, the leading master will abort itself when it discovers it has been partitioned away from the ZooKeeper quorum. This is a safety precaution to ensure the previous leader doesn&rsquo;t continue communicating in an unsafe state.</p>
+
+<p>To ensure that such failures are handled appropriately, production deployments of Mesos typically use a <em>process supervisor</em> (such as systemd or supervisord) to detect when Mesos processes exit. The supervisor can be configured to restart the failed process automatically and/or to notify the cluster operator to investigate the situation.</p>
+
 <h2>Changing the master quorum</h2>
 
-<p>Currently the master leverages a paxos-based replicated log as its storage backend (<code>--registry=replicated_log</code> is the only storage backend supported). Each master participates in the ensemble as a log replica. The <code>--quorum</code> flag determines a majority of the masters.</p>
+<p>The master leverages a Paxos-based replicated log as its storage backend (<code>--registry=replicated_log</code> is the only storage backend currently supported). Each master participates in the ensemble as a log replica. The <code>--quorum</code> flag determines a majority of the masters.</p>
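+<p>The quorum arithmetic is the usual majority rule. As a quick sketch (the helper names are hypothetical), the smallest valid <code>--quorum</code> for N masters and the number of master failures that can then be tolerated are:</p>

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest --quorum value that is a strict majority of num_masters."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """Masters that can fail while the remaining ones still form a quorum."""
    return num_masters - majority_quorum(num_masters)
```

+<p>For example, 5 masters need <code>--quorum=3</code> and tolerate 2 failures, while 3 masters need <code>--quorum=2</code> and tolerate 1.</p>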
 
-<p>The following table shows the tolerance to master failures, for each quorum size:</p>
+<p>The following table shows the tolerance to master failures for each quorum size:</p>
 
 <table>
 <thead>
@@ -151,7 +157,7 @@
 <p>To increase the quorum by N, repeat this process to increment the quorum size N times.</p>
 
 <p>NOTE: Currently, moving out of a single master setup requires wiping the replicated log
-state and starting fresh. This will wipe all persistent data (e.g. slaves, maintenance
+state and starting fresh. This will wipe all persistent data (e.g., slaves, maintenance
 information, quota information, etc.). To move from 1 master to 3 masters:</p>
 
 <ol>
@@ -167,7 +173,7 @@ information, quota information, etc). To
 
 <ol>
 <li>Initially, 5 masters are running with <code>--quorum=3</code></li>
-<li>Remove 2 masters from the cluster, ensure they will not be restarted (See NOTE section above). Now 3 masters are running with <code>--quorum=3</code></li>
+<li>Remove 2 masters from the cluster and ensure they will not be restarted (see NOTE section above). Now 3 masters are running with <code>--quorum=3</code></li>
 <li>Restart the 3 masters with <code>--quorum=2</code></li>
 </ol>
 
@@ -178,9 +184,9 @@ information, quota information, etc). To
 
 <p>Please see the NOTE section above. So long as the failed master is guaranteed to not re-join the ensemble, it is safe to start a new master <em>with an empty log</em> and allow it to catch up.</p>
 
-<h2>External access for mesos master</h2>
+<h2>External access for Mesos master</h2>
 
-<p>If the default ip (or the command line arg <code>--ip</code>) points to an internal IP, then external entities such as framework scheduler would not be able to reach the master. To address that scenario, an externally accessible IP:port can be setup via the <code>--advertise_ip</code> and <code>--advertise_port</code> command line arguments of mesos master. If configured, external entities such as framework scheduler interact with the advertise_ip:advertise_port from where the request needs to be proxied to the internal IP:Port on which mesos master is listening.</p>
+<p>If the default IP (or the command line arg <code>--ip</code>) is an internal IP, then external entities such as framework schedulers will be unable to reach the master. To address that scenario, an externally accessible IP:port can be set up via the <code>--advertise_ip</code> and <code>--advertise_port</code> command line arguments of <code>mesos-master</code>. If configured, external entities such as framework schedulers interact with advertise_ip:advertise_port, and requests must be proxied from there to the internal IP:port on which the Mesos master is listening.</p>
 
 	</div>
 </div>

Modified: mesos/site/publish/documentation/upgrades/index.html
URL: http://svn.apache.org/viewvc/mesos/site/publish/documentation/upgrades/index.html?rev=1707671&r1=1707670&r2=1707671&view=diff
==============================================================================
--- mesos/site/publish/documentation/upgrades/index.html (original)
+++ mesos/site/publish/documentation/upgrades/index.html Fri Oct  9 08:10:08 2015
@@ -115,6 +115,18 @@
 </ul>
 
 
+<p>In order to upgrade a running cluster:</p>
+
+<ul>
+<li>Rebuild and install any modules so that upgraded masters/slaves can use them.</li>
+<li>Install the new master binaries and restart the masters.</li>
+<li>Install the new slave binaries and restart the slaves.</li>
+<li>Upgrade the schedulers by linking the latest native library / jar / egg (if necessary).</li>
+<li>Restart the schedulers.</li>
+<li>Upgrade the executors by linking the latest native library / jar / egg (if necessary).</li>
+</ul>
+
+
 <h2>Upgrading from 0.23.x to 0.24.x</h2>
 
 <p><strong>NOTE:</strong> Support for live upgrading a driver-based scheduler to an HTTP-based (experimental) scheduler has been added.</p>