You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@flink.apache.org by se...@apache.org on 2015/07/08 20:31:10 UTC

[3/9] flink git commit: [FLINK-2288] [docs] Add docs for HA/ZooKeeper setup

[FLINK-2288] [docs] Add docs for HA/ZooKeeper setup

This closes #886


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/9c0dd974
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/9c0dd974
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/9c0dd974

Branch: refs/heads/master
Commit: 9c0dd9742966011322be36343611146ed7b862f0
Parents: 8c72b50
Author: Ufuk Celebi <uc...@apache.org>
Authored: Fri Jul 3 16:45:15 2015 +0200
Committer: Stephan Ewen <se...@apache.org>
Committed: Wed Jul 8 20:28:40 2015 +0200

----------------------------------------------------------------------
 docs/_includes/navbar.html                 |   1 +
 docs/page/css/flink.css                    |  14 ++-
 docs/setup/fig/jobmanager_ha_overview.png  | Bin 0 -> 57875 bytes
 docs/setup/jobmanager_high_availability.md | 121 ++++++++++++++++++++++++
 4 files changed, 133 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/9c0dd974/docs/_includes/navbar.html
----------------------------------------------------------------------
diff --git a/docs/_includes/navbar.html b/docs/_includes/navbar.html
index 740ec9f..dc7ef30 100644
--- a/docs/_includes/navbar.html
+++ b/docs/_includes/navbar.html
@@ -53,6 +53,7 @@ under the License.
                 <li><a href="{{ setup }}/yarn_setup.html">YARN</a></li>
                 <li><a href="{{ setup }}/gce_setup.html">GCloud</a></li>
                 <li><a href="{{ setup }}/flink_on_tez.html">Flink on Tez <span class="badge">Beta</span></a></li>
+                <li><a href="{{ setup }}/jobmanager_high_availability.html">JobManager High Availability<a></li>
 
                 <li class="divider"></li>
                 <li><a href="{{ setup }}/config.html">Configuration</a></li>

http://git-wip-us.apache.org/repos/asf/flink/blob/9c0dd974/docs/page/css/flink.css
----------------------------------------------------------------------
diff --git a/docs/page/css/flink.css b/docs/page/css/flink.css
index 9074e23..3b09e54 100644
--- a/docs/page/css/flink.css
+++ b/docs/page/css/flink.css
@@ -113,11 +113,19 @@ h2, h3 {
 
 code {
 	background: #f5f5f5;
-  padding: 0;
-  color: #333333;
-  font-family: "Menlo", "Lucida Console", monospace;
+	padding: 0;
+	color: #333333;
+	font-family: "Menlo", "Lucida Console", monospace;
 }
 
 pre {
 	font-size: 85%;
 }
+
+img.center {
+	display: block;
+	margin-left: auto;
+    margin-right: auto;
+}
+
+

http://git-wip-us.apache.org/repos/asf/flink/blob/9c0dd974/docs/setup/fig/jobmanager_ha_overview.png
----------------------------------------------------------------------
diff --git a/docs/setup/fig/jobmanager_ha_overview.png b/docs/setup/fig/jobmanager_ha_overview.png
new file mode 100644
index 0000000..ff82cae
Binary files /dev/null and b/docs/setup/fig/jobmanager_ha_overview.png differ

http://git-wip-us.apache.org/repos/asf/flink/blob/9c0dd974/docs/setup/jobmanager_high_availability.md
----------------------------------------------------------------------
diff --git a/docs/setup/jobmanager_high_availability.md b/docs/setup/jobmanager_high_availability.md
new file mode 100644
index 0000000..dec0cdc
--- /dev/null
+++ b/docs/setup/jobmanager_high_availability.md
@@ -0,0 +1,121 @@
+---
+title: "JobManager High Availability (HA)"
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+The JobManager is the coordinator of each Flink deployment. It is responsible for both *scheduling* and *resource management*.
+
+By default, there is a single JobManager instance per Flink cluster. This creates a *single point of failure* (SPOF): if the JobManager crashes, no new programs can be submitted and running programs fail.
+
+With JobManager High Availability, you can run multiple JobManager instances per Flink cluster and thereby circumvent the *SPOF*.
+
+The general idea of JobManager high availability is that there is a **single leading JobManager** at any time and **multiple standby JobManagers** to take over leadership in case the leader fails. This guarantees that there is **no single point of failure** and programs can make progress as soon as a standby JobManager has taken leadership. There is no explicit distinction between standby and master JobManager instances. Each JobManager can take the role of master or standby.
+
+As an example, consider the following setup with three JobManager instances:
+
+<img src="fig/jobmanager_ha_overview.png" class="center" />
+
+## Configuration
+
+To enable JobManager High Availability you have to configure a **ZooKeeper quorum** and set up a **masters file** with all JobManagers hosts.
+
+Flink leverages **[ZooKeeper](http://zookeeper.apache.org)** for  *distributed coordination* between all running JobManager instances. ZooKeeper is a separate service from Flink, which provides highly reliable distirbuted coordination via leader election and light-weight consistent state storage. Check out [ZooKeeper's Getting Started Guide](http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html) for more information about ZooKeeper.
+
+Configuring a ZooKeeper quorum in `conf/flink-conf.yaml` *enables* high availability mode and all Flink components try to connect to a JobManager via coordination through ZooKeeper.
+
+- **ZooKeeper quorum** (required): A *ZooKeeper quorum* is a replicated group of ZooKeeper servers, which provide the distributed coordination service.
+  
+  <pre>ha.zookeeper.quorum: address1:2181[,...],addressX:2181</pre>
+
+  Each *addressX:port* refers to a ZooKeeper server, which is reachable by Flink at the given address and port.
+
+- The following configuration keys are optional:
+
+  - `ha.zookeeper.dir: /flink [default]`: ZooKeeper directory to use for coordination
+  - TODO Add client configuration keys
+
+## Starting an HA-cluster
+
+In order to start an HA-cluster configure the *masters* file in `conf/masters`:
+
+- **masters file**: The *masters file* contains all hosts, on which JobManagers are started.
+
+  <pre>
+jobManagerAddress1
+[...]
+jobManagerAddressX
+  </pre>
+
+After configuring the masters and the ZooKeeper quorum, you can use the provided cluster startup scripts as usual. They will start a HA-cluster. **Keep in mind that the ZooKeeper quorum has to be running when you call the scripts**.
+
+## Running ZooKeeper
+
+If you don't have a running ZooKeeper installation, you can use the helper scripts, which ship with Flink.
+
+There is a ZooKeeper configuration template in `conf/zoo.cfg`. You can configure the hosts to run ZooKeeper on with the `server.X` entries, where X is a unique ID of each server:
+
+<pre>
+server.X=addressX:peerPort:leaderPort
+[...]
+server.Y=addressY:peerPort:leaderPort
+</pre>
+
+The script `bin/start-zookeeper-quorum.sh` will start a ZooKeeper server on each of the configured hosts. The started processes start ZooKeeper servers via a Flink wrapper, which reads the configuration from `conf/zoo.cfg` and makes sure to set some rqeuired configuration values for convenience. In production setups, it is recommended to manage your own ZooKeeper installation.
+
+## Example: Start and stop a local HA-cluster with 2 JobManagers
+
+1. **Configure ZooKeeper quorum** in `conf/flink.yaml`:
+   
+   <pre>ha.zookeeper.quorum: localhost</pre>
+
+2. **Configure masters** in `conf/masters`:
+
+   <pre>
+localhost
+localhost</pre>
+
+3. **Configure ZooKeeper server** in `conf/zoo.cfg` (currently it's only possible to run a single ZooKeeper server per machine):
+
+   <pre>server.0=localhost:2888:3888</pre>
+
+4. **Start ZooKeeper quorum**:
+
+   <pre>
+$ bin/start-zookeeper-quorum.sh
+Starting zookeeper daemon on host localhost.</pre>
+
+5. **Start an HA-cluster**:
+   
+   <pre>
+$ bin/start-cluster-streaming.sh
+Starting HA cluster (streaming mode) with 2 masters and 1 peers in ZooKeeper quorum.
+Starting jobmanager daemon on host localhost.
+Starting jobmanager daemon on host localhost.
+Starting taskmanager daemon on host localhost.</pre>
+
+6. **Stop ZooKeeper quorum and cluster**:
+
+   <pre>
+$ bin/stop-cluster.sh
+Stopping taskmanager daemon (pid: 7647) on localhost.
+Stopping jobmanager daemon (pid: 7495) on host localhost.
+Stopping jobmanager daemon (pid: 7349) on host localhost.
+$ bin/stop-zookeeper-quorum.sh
+Stopping zookeeper daemon (pid: 7101) on host localhost.</pre>