Posted to commits@zeppelin.apache.org by mo...@apache.org on 2015/11/12 10:28:05 UTC

incubator-zeppelin git commit: [ZEPPELIN-407] Improve document on how to manage external libraries in spark interpreter

Repository: incubator-zeppelin
Updated Branches:
  refs/heads/master b52c86c0d -> 76cdcd8cd


[ZEPPELIN-407] Improve document on how to manage external libraries in spark interpreter

Improve docs on library loading in the spark interpreter.
* Add information on loading libraries via Spark properties, in addition to dependency loading with the %dep interpreter
* Describe the different ways of dependency loading for different versions of Zeppelin
* Move Spark-specific information from install.md to spark.md

Author: Mina Lee <mi...@nflabs.com>

Closes #410 from minahlee/ZEPPELIN-407 and squashes the following commits:

8f62790 [Mina Lee] [ZEPPELIN-407] Improve document on how to manage external libraries in spark interpreter
857d10b [Mina Lee] [ZEPPELIN-407] Improve document on how to manage external libraries in spark interpreter


Project: http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/commit/76cdcd8c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/tree/76cdcd8c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/diff/76cdcd8c

Branch: refs/heads/master
Commit: 76cdcd8cdd3a27d007dd767d1fa5ab4af4f17c83
Parents: b52c86c
Author: Mina Lee <mi...@nflabs.com>
Authored: Wed Nov 11 13:33:04 2015 +0900
Committer: Lee moon soo <mo...@apache.org>
Committed: Thu Nov 12 18:28:20 2015 +0900

----------------------------------------------------------------------
 docs/docs/install/install.md   | 10 ------
 docs/docs/interpreter/spark.md | 61 +++++++++++++++++++++++++++++++++++--
 2 files changed, 58 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/76cdcd8c/docs/docs/install/install.md
----------------------------------------------------------------------
diff --git a/docs/docs/install/install.md b/docs/docs/install/install.md
index 03bc6f9..2dc4930 100644
--- a/docs/docs/install/install.md
+++ b/docs/docs/install/install.md
@@ -101,16 +101,6 @@ Configuration can be done by both environment variable(conf/zeppelin-env.sh) and
     <td>JVM Options</td>
 </table>
 
-#### Add jars, files
-
-spark.jars, spark.files property in *ZEPPELIN\_JAVA\_OPTS* adds jars, files into SparkContext.
-for example, 
-
-    ZEPPELIN_JAVA_OPTS="-Dspark.jars=/mylib1.jar,/mylib2.jar -Dspark.files=/myfile1.dat,/myfile2.dat"
-
-or you can do it dynamically with [dependency loader](../interpreter/spark.html#dependencyloading)
-
-
 ## Start/Stop
 #### Start Zeppelin
 

http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/76cdcd8c/docs/docs/interpreter/spark.md
----------------------------------------------------------------------
diff --git a/docs/docs/interpreter/spark.md b/docs/docs/interpreter/spark.md
index 06aee94..58fce0b 100644
--- a/docs/docs/interpreter/spark.md
+++ b/docs/docs/interpreter/spark.md
@@ -54,7 +54,10 @@ Note that scala / python environment shares the same SparkContext, SQLContext, Z
 <a name="dependencyloading"> </a>
 <br />
 <br />
-### Dependency loading
+### Dependency Management
+There are two ways to load external libraries in the spark interpreter. The first is using Zeppelin's %dep interpreter and the second is loading Spark properties.
+
+#### 1. Dynamic Dependency Loading via %dep interpreter
 
 When your code requires external library, instead of doing download/copy/restart Zeppelin, you can easily do following jobs using %dep interpreter.
 
@@ -64,6 +67,7 @@ When your code requires external library, instead of doing download/copy/restart
  * Automatically add libraries to SparkCluster (You can turn off)
 
 Dep interpreter leverages scala environment. So you can write any Scala code here.
+Note that the %dep interpreter should be used before %spark, %pyspark, %sql.
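+For example (a minimal sketch; the maven coordinate and file path are only illustrative), a note can declare a dependency in a %dep paragraph and then use it from a following %spark paragraph:
+
+```
+%dep
+z.reset()                                      // clean up previously added artifacts and repositories
+z.load("com.databricks:spark-csv_2.10:1.2.0")  // illustrative coordinate; replace with the library you need
+```
+
+```
+%spark
+// the library loaded above is now on the SparkContext classpath
+val df = sqlContext.read.format("com.databricks.spark.csv").load("/path/to/file.csv")
+```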
 
 Here's usages.
 
@@ -78,7 +82,7 @@ z.addRepo("RepoName").url("RepoURL")
 z.addRepo("RepoName").url("RepoURL").snapshot()
 
 // add credentials for private maven repository
-z.addRepo("RepoName").url("RepoURL).username("username").password("password")
+z.addRepo("RepoName").url("RepoURL").username("username").password("password")
 
 // add artifact from filesystem
 z.load("/path/to.jar")
@@ -101,7 +105,58 @@ z.load("groupId:artifactId:version").exclude("groupId:*")
 z.load("groupId:artifactId:version").local()
 ```
 
-Note that %dep interpreter should be used before %spark, %pyspark, %sql.
+
+<br />
+#### 2. Loading Spark Properties
+Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit` as the spark interpreter runner. `spark-submit` supports two ways to load configurations. The first is command line options such as `--master`; Zeppelin can pass these options to `spark-submit` by exporting `SPARK_SUBMIT_OPTIONS` in `conf/zeppelin-env.sh`. The second is reading configuration options from `SPARK_HOME/conf/spark-defaults.conf`. The Spark properties that users can set to distribute libraries are:
+
+<table class="table-configuration">
+  <tr>
+    <th>spark-defaults.conf</th>
+    <th>SPARK_SUBMIT_OPTIONS</th>
+    <th>Applicable Interpreter</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>spark.jars</td>
+    <td>--jars</td>
+    <td>%spark</td>
+    <td>Comma-separated list of local jars to include on the driver and executor classpaths.</td>
+  </tr>
+  <tr>
+    <td>spark.jars.packages</td>
+    <td>--packages</td>
+    <td>%spark</td>
+    <td>Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version.</td>
+  </tr>
+  <tr>
+    <td>spark.files</td>
+    <td>--files</td>
+    <td>%pyspark</td>
+    <td>Comma-separated list of files to be placed in the working directory of each executor.</td>
+  </tr>
+</table>
+Note that adding jars to pyspark is only available via the %dep interpreter at the moment.
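+For instance (a minimal sketch; the jar path is only illustrative), a jar needed by a %pyspark paragraph can be declared beforehand in a %dep paragraph:
+
+```
+%dep
+// classes in this jar become reachable from the following %pyspark paragraphs
+z.load("/path/to/mylib.jar")
+```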
+
+<br/>
+Here are a few examples:
+
+##### 0.5.5 and later
+* SPARK\_SUBMIT\_OPTIONS in conf/zeppelin-env.sh
+
+		export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0 --jars /path/mylib1.jar,/path/mylib2.jar --files /path/mylib1.py,/path/mylib2.zip,/path/mylib3.egg"
+
+* SPARK_HOME/conf/spark-defaults.conf
+
+		spark.jars				/path/mylib1.jar,/path/mylib2.jar
+		spark.jars.packages		com.databricks:spark-csv_2.10:1.2.0
+		spark.files				/path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip
+
+##### 0.5.0
+* ZEPPELIN\_JAVA\_OPTS in conf/zeppelin-env.sh
+
+		export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/path/mylib1.jar,/path/mylib2.jar -Dspark.files=/path/myfile1.dat,/path/myfile2.dat"
+<br />
 
 
 <a name="zeppelincontext"> </a>