Posted to commits@zeppelin.apache.org by zj...@apache.org on 2020/07/06 02:25:34 UTC

[zeppelin] branch master updated: [ZEPPELIN-4874]. Add document for interpreter yarn launch mode

This is an automated email from the ASF dual-hosted git repository.

zjffdu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/zeppelin.git


The following commit(s) were added to refs/heads/master by this push:
     new 1b31947  [ZEPPELIN-4874]. Add document for interpreter yarn launch mode
1b31947 is described below

commit 1b319475d1625149de69253aceecd25fdb1a1179
Author: Jeff Zhang <zj...@apache.org>
AuthorDate: Wed Jun 17 23:15:42 2020 +0800

    [ZEPPELIN-4874]. Add document for interpreter yarn launch mode
    
    ### What is this PR for?
    
    Document the YARN launch mode.
    * Add a section under `Run Mode`
    * Add a section about how to integrate with Hadoop
    
    ### What type of PR is it?
    [ Documentation ]
    
    ### Todos
    * [ ] - Task
    
    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-4874
    
    ### How should this be tested?
    No test needed
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Do the license files need an update? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? No
    
    Author: Jeff Zhang <zj...@apache.org>
    
    Closes #3835 from zjffdu/ZEPPELIN-4874 and squashes the following commits:
    
    a01912b4f [Jeff Zhang] [ZEPPELIN-4874]. Add document for interpreter yarn launch mode
---
 docs/_includes/themes/zeppelin/_navigation.html |  2 +
 docs/quickstart/yarn.md                         | 75 +++++++++++++++++++++++++
 docs/setup/basics/hadoop_integration.md         | 39 +++++++++++++
 3 files changed, 116 insertions(+)

diff --git a/docs/_includes/themes/zeppelin/_navigation.html b/docs/_includes/themes/zeppelin/_navigation.html
index 0940863..5f0eac4 100644
--- a/docs/_includes/themes/zeppelin/_navigation.html
+++ b/docs/_includes/themes/zeppelin/_navigation.html
@@ -31,6 +31,7 @@
                 <li class="title"><span>Run Mode</span></li>
                 <li><a href="{{BASE_PATH}}/quickstart/kubernetes.html">Kubernetes</a></li>
                 <li><a href="{{BASE_PATH}}/quickstart/docker.html">Docker</a></li>
+                <li><a href="{{BASE_PATH}}/quickstart/yarn.html">Yarn</a></li>
                 <li role="separator" class="divider"></li>
                 <li><a href="{{BASE_PATH}}/quickstart/spark_with_zeppelin.html">Spark with Zeppelin</a></li>
                 <li><a href="{{BASE_PATH}}/quickstart/sql_with_zeppelin.html">SQL with Zeppelin</a></li>
@@ -85,6 +86,7 @@
               <ul class="dropdown-menu scrollable-menu">
                 <li class="title"><span>Basics</span></li>
                 <li><a href="{{BASE_PATH}}/setup/basics/how_to_build.html">How to Build Zeppelin</a></li>
+                <li><a href="{{BASE_PATH}}/setup/basics/hadoop_integration.html">Hadoop Integration</a></li>
                 <li><a href="{{BASE_PATH}}/setup/basics/multi_user_support.html">Multi-user Support</a></li>
                 <li role="separator" class="divider"></li>
                 <li class="title"><span>Deployment</span></li>
diff --git a/docs/quickstart/yarn.md b/docs/quickstart/yarn.md
new file mode 100644
index 0000000..c283a2a
--- /dev/null
+++ b/docs/quickstart/yarn.md
@@ -0,0 +1,75 @@
+---
+layout: page
+title: "Zeppelin on Yarn"
+description: "Apache Zeppelin supports running interpreter processes in YARN containers"
+group: usage/interpreter 
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+{% include JB/setup %}
+
+# Zeppelin on Yarn
+
+<div id="toc"></div>
+
+Zeppelin on YARN runs interpreter processes in YARN containers instead of on the Zeppelin server host. The key benefit is scalability:
+the Zeppelin server host won't run out of memory when you run a large number of interpreter processes.
+
+## Prerequisites
+YARN interpreter mode requires the following:
+
+* The Hadoop client (both 2.x and 3.x are supported) is installed.
+* `$HADOOP_HOME/bin` is on your `PATH`, because Zeppelin internally runs `hadoop classpath` to collect the Hadoop jars and add them to its classpath.
+* `USE_HADOOP` is set to `true` in `zeppelin-env.sh`.
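+
+A minimal `zeppelin-env.sh` sketch covering these prerequisites is shown below. The install path `/opt/hadoop` is an assumption; replace it with the location of your own Hadoop client.
+
+```shell
+# conf/zeppelin-env.sh (sketch; /opt/hadoop is a placeholder path)
+export HADOOP_HOME=/opt/hadoop
+# Put the hadoop command on PATH so Zeppelin can run `hadoop classpath`
+export PATH="$HADOOP_HOME/bin:$PATH"
+# Include the Hadoop jars in the Zeppelin classpath
+export USE_HADOOP=true
+```
+
+Running `hadoop classpath` in a shell is a quick way to confirm that the command resolves and prints the Hadoop jar locations.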
+
+## Configuration
+
+YARN interpreter mode is configured per interpreter: set the interpreter property `zeppelin.interpreter.launcher` to `yarn` to launch that interpreter in YARN mode.
+You can also specify the properties in the following table.
+
+<table class="table-configuration">
+  <tr>
+    <th>Name</th>
+    <th>Default Value</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>zeppelin.interpreter.yarn.resource.memory</td>
+    <td>1024</td>
+    <td>Memory for the interpreter process, in MB</td>
+  </tr>
+  <tr>
+    <td>zeppelin.interpreter.yarn.resource.memoryOverhead</td>
+    <td></td>
+    <td>Amount of non-heap memory to be allocated per interpreter process in YARN interpreter mode, in MiB unless otherwise specified. This memory accounts for things like VM overheads, interned strings, and other native overheads.</td>
+  </tr>
+  <tr>
+    <td>zeppelin.interpreter.yarn.resource.cores</td>
+    <td>1</td>
+    <td>Number of CPU cores for the interpreter process</td>
+  </tr>
+  <tr>
+    <td>zeppelin.interpreter.yarn.queue</td>
+    <td>default</td>
+    <td>Name of the YARN queue the interpreter application is submitted to</td>
+  </tr>
+</table>
+
+## Differences from non-YARN interpreter mode (local mode)
+
+YARN interpreter mode differs from non-YARN interpreter mode (local mode) in several ways:
+
+* A new YARN application is created for the interpreter process.
+* Local path settings don't work in YARN interpreter mode, because the interpreter process may be launched on any node of the YARN cluster. E.g. if you run the Python interpreter in YARN interpreter mode, make sure the Python executable specified by `zeppelin.python` exists on all nodes of the YARN cluster.
+* Don't use YARN interpreter mode for the Spark interpreter. Use Spark's built-in `yarn-client` or `yarn-cluster` mode instead, which is better suited to it.
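+
+Since the interpreter process runs as a YARN application, the standard YARN CLI can help with troubleshooting. The commands below are a sketch: the application ID is a hypothetical placeholder, and `yarn logs` only returns container logs when log aggregation is enabled on the cluster.
+
+```shell
+# Nodes an interpreter container may be scheduled on; executables referenced by
+# local path settings (e.g. zeppelin.python) must exist on every one of them
+yarn node -list
+# Running YARN applications, including those created for interpreter processes
+yarn application -list -appStates RUNNING
+# Container logs of one interpreter application (hypothetical placeholder ID)
+yarn logs -applicationId application_1592400000000_0001
+```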
\ No newline at end of file
diff --git a/docs/setup/basics/hadoop_integration.md b/docs/setup/basics/hadoop_integration.md
new file mode 100644
index 0000000..9417ede
--- /dev/null
+++ b/docs/setup/basics/hadoop_integration.md
@@ -0,0 +1,39 @@
+---
+layout: page
+title: "How to integrate with Hadoop"
+description: "How to integrate with Hadoop"
+group: setup/basics
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+{% include JB/setup %}
+
+# Integrate with Hadoop
+
+<div id="toc"></div>
+
+Hadoop is an optional component of Zeppelin. You only need it if you use one of the following features:
+
+* Store notes in HDFS
+* Store interpreter configuration in HDFS
+* Store recovery data in HDFS
+* Launch interpreters in YARN mode
+
+## Requirements
+
+Zeppelin 0.9 doesn't ship with Hadoop dependencies, so you need to include the Hadoop jars yourself via the following steps:
+
+* Install the Hadoop client (both 2.x and 3.x are supported).
+* Add `$HADOOP_HOME/bin` to `PATH`. Zeppelin internally runs `hadoop classpath` to collect the Hadoop jars and add them to its classpath.
+* Set `USE_HADOOP` to `true` in `zeppelin-env.sh`.
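+
+The commands below are a quick sanity check of these steps, as a sketch. They assume you run them from the Zeppelin installation directory as the user that starts Zeppelin.
+
+```shell
+command -v hadoop                       # prints the location of the hadoop script if it is on PATH
+hadoop classpath                        # prints the Hadoop configuration and jar locations
+grep USE_HADOOP conf/zeppelin-env.sh    # should include an uncommented line: export USE_HADOOP=true
+hdfs dfs -ls /                          # only relevant for the HDFS-based features listed above
+```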