Posted to commits@zeppelin.apache.org by zj...@apache.org on 2019/10/30 13:45:27 UTC
svn commit: r1869172 [21/37] - in /zeppelin/site/docs/0.9.0-SNAPSHOT: ./
assets/themes/zeppelin/img/screenshots/ development/
development/contribution/ development/helium/ interpreter/ quickstart/
setup/basics/ setup/deployment/ setup/operation/ setup/...
Modified: zeppelin/site/docs/0.9.0-SNAPSHOT/search_data.json
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.9.0-SNAPSHOT/search_data.json?rev=1869172&r1=1869171&r2=1869172&view=diff
==============================================================================
--- zeppelin/site/docs/0.9.0-SNAPSHOT/search_data.json (original)
+++ zeppelin/site/docs/0.9.0-SNAPSHOT/search_data.json Wed Oct 30 13:45:26 2019
@@ -3,7 +3,7 @@
"/interpreter/livy.html": {
"title": "Livy Interpreter for Apache Zeppelin",
- "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Livy Interpreter for Apache ZeppelinOverviewLivy is an open source REST interface for interacting with Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in YARN.Interactive Scala, Python and R shellsBatch submissions in Scala, Java, PythonMulti users can share the same server (impersonation support)Can be used for submitting jobs from anywhere with RESTDoes not require a
ny code change to your programsRequirementsAdditional requirements for the Livy interpreter are:Spark 1.3 or above.Livy server.ConfigurationWe added some common configurations for spark, and you can set any configuration you want.You can find all Spark configurations in here.And instead of starting property with spark. it should be replaced with livy.spark..Example: spark.driver.memory to livy.spark.driver.memory Property Default Description zeppelin.livy.url http://localhost:8998 URL where livy server is running zeppelin.livy.spark.sql.maxResult 1000 Max number of Spark SQL result to display. zeppelin.livy.spark.sql.field.truncate true Whether to truncate field values longer than 20 characters or not zeppelin.livy.session.create_timeout 120 Timeout in seconds for session creation zeppelin.livy.displayAppInfo true Whether to display app info zeppelin.livy.pull_status.interval.millis 1000 The int
erval for checking paragraph execution status livy.spark.driver.cores Driver cores. ex) 1, 2. livy.spark.driver.memory Driver memory. ex) 512m, 32g. livy.spark.executor.instances Executor instances. ex) 1, 4. livy.spark.executor.cores Num cores per executor. ex) 1, 4. livy.spark.executor.memory Executor memory per worker instance. ex) 512m, 32g. livy.spark.dynamicAllocation.enabled Use dynamic resource allocation. ex) True, False. livy.spark.dynamicAllocation.cachedExecutorIdleTimeout Remove an executor which has cached data blocks. livy.spark.dynamicAllocation.minExecutors Lower bound for the number of executors. livy.spark.dynamicAllocation.initialExecutors Initial number of executors to run. livy.spark.dynamicAllocation.maxExecutors Upper bound for the number of executors. livy.spark.jars.packages Adding extra libr
aries to livy interpreter zeppelin.livy.ssl.trustStore client trustStore file. Used when livy ssl is enabled zeppelin.livy.ssl.trustStorePassword password for trustStore file. Used when livy ssl is enabled zeppelin.livy.http.headers key_1: value_1; key_2: value_2 custom http headers when calling livy rest api. Each http header is separated by `;`, and each header is one key value pair where key value is separated by `:` We remove livy.spark.master in zeppelin-0.7. Because we sugguest user to use livy 0.3 in zeppelin-0.7. And livy 0.3 don&#39;t allow to specify livy.spark.master, it enfornce yarn-cluster mode.Adding External librariesYou can load dynamic library to livy interpreter by set livy.spark.jars.packages property to comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. The format for the coordinates should be groupId:artifactId:version.Example Property Example Description
livy.spark.jars.packages io.spray:spray-json_2.10:1.3.1 Adding extra libraries to livy interpreter How to useBasically, you can usespark%livy.sparksc.versionpyspark%livy.pysparkprint &quot;1&quot;sparkR%livy.sparkrhello &lt;- function( name ) { sprintf( &quot;Hello, %s&quot;, name );}hello(&quot;livy&quot;)ImpersonationWhen Zeppelin server is running with authentication enabled,then this interpreter utilizes Livyâs user impersonation featurei.e. sends extra parameter for creating and running a session (&quot;proxyUser&quot;: &quot;${loggedInUser}&quot;).This is particularly useful when multi users are sharing a Notebook server.Apply Zeppelin Dynamic FormsYou can leverage Zeppelin Dynamic Form. Form templates is only avalible for livy sql interpreter.%livy.sqlselect * from products where ${product_id=1}And creating dynamic formst programmatically is not feasible in livy interpreter, because ZeppelinContext i
s not available in livy interpreter.Shared SparkContextStarting from livy 0.5 which is supported by Zeppelin 0.8.0, SparkContext is shared between scala, python, r and sql.That means you can query the table via %livy.sql when this table is registered in %livy.spark, %livy.pyspark, $livy.sparkr.FAQLivy debugging: If you see any of these in error consoleConnect to livyhost:8998 [livyhost/127.0.0.1, livyhost/0:0:0:0:0:0:0:1] failed: Connection refusedLooks like the livy server is not up yet or the config is wrongException: Session not found, Livy server would have restarted, or lost session.The session would have timed out, you may need to restart the interpreter.Blacklisted configuration values in session config: spark.masterEdit conf/spark-blacklist.conf file in livy server and comment out #spark.master line.If you choose to work on livy in apps/spark/java directory in https://github.com/cloudera/hue,copy spark-user-configurable-options.template to spark-user-configurable-options.con
f file in livy server and comment out #spark.master.",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Livy Interpreter for Apache ZeppelinOverviewLivy is an open source REST interface for interacting with Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in YARN.Interactive Scala, Python and R shellsBatch submissions in Scala, Java, PythonMulti users can share the same server (impersonation support)Can be used for submitting jobs from anywhere with RESTDoes not require a
ny code change to your programsRequirementsAdditional requirements for the Livy interpreter are:Spark 1.3 or above.Livy server.ConfigurationWe added some common Spark configurations, and you can set any configuration you want.You can find all Spark configurations here.Instead of starting a property with spark., prefix it with livy.spark..Example: spark.driver.memory becomes livy.spark.driver.memory Property Default Description zeppelin.livy.url http://localhost:8998 URL where the livy server is running zeppelin.livy.spark.sql.maxResult 1000 Max number of Spark SQL results to display. zeppelin.livy.spark.sql.field.truncate true Whether to truncate field values longer than 20 characters or not zeppelin.livy.session.create_timeout 120 Timeout in seconds for session creation zeppelin.livy.displayAppInfo true Whether to display app info zeppelin.livy.pull_status.interval.millis 1000 The int
erval for checking paragraph execution status livy.spark.driver.cores Driver cores. ex) 1, 2. livy.spark.driver.memory Driver memory. ex) 512m, 32g. livy.spark.executor.instances Executor instances. ex) 1, 4. livy.spark.executor.cores Num cores per executor. ex) 1, 4. livy.spark.executor.memory Executor memory per worker instance. ex) 512m, 32g. livy.spark.dynamicAllocation.enabled Use dynamic resource allocation. ex) True, False. livy.spark.dynamicAllocation.cachedExecutorIdleTimeout Remove an executor which has cached data blocks. livy.spark.dynamicAllocation.minExecutors Lower bound for the number of executors. livy.spark.dynamicAllocation.initialExecutors Initial number of executors to run. livy.spark.dynamicAllocation.maxExecutors Upper bound for the number of executors. livy.spark.jars.packages Adding extra libr
aries to livy interpreter zeppelin.livy.ssl.trustStore client trustStore file. Used when livy ssl is enabled zeppelin.livy.ssl.trustStorePassword password for trustStore file. Used when livy ssl is enabled zeppelin.livy.ssl.trustStoreType JKS type of truststore. Either JKS or PKCS12. zeppelin.livy.ssl.keyStore client keyStore file. Needed if Livy requires two way SSL authentication. zeppelin.livy.ssl.keyStorePassword password for keyStore file. zeppelin.livy.ssl.keyStoreType JKS type of keystore. Either JKS or PKCS12. zeppelin.livy.ssl.keyPassword password for key in the keyStore file. Defaults to zeppelin.livy.ssl.keyStorePassword. zeppelin.livy.http.headers key_1: value_1; key_2: value_2 custom http headers when calling livy rest api. Each http header is separated by `;`, and each header is one key value pair where key and value are separated by `:` We removed livy.spark.master in zeppelin-0.7 because we suggest users use livy 0.3 with zeppelin-0.7, and livy 0.3 doesn&#39;t allow specifying livy.spark.master; it enforces yarn-cluster mode.Adding External librariesYou can load dynamic libraries into the livy interpreter by setting the livy.spark.jars.packages property to a comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. The format for the coordinates should be groupId:artifactId:version.Example Property Example Description livy.spark.jars.packages io.spray:spray-json_2.10:1.3.1 Adding extra libraries to livy interpreter How to useBasically, you can usespark%livy.sparksc.versionpyspark%livy.pysparkprint &quot;1&quot;sparkR%livy.sparkrhello &lt;- function( name ) { sprintf( &quot;Hello, %s&quot;, name );}hello(&quot;livy&quot;)ImpersonationWhen the Zeppelin server is running with authentication enabled, this interpreter utilizes Livy&#39;s user impersonation feature, i.e. it sends an extra parameter for creating and running a session (&quot;proxyUser&quot;: &quot;${loggedInUser}&quot;).This is particularly useful when multiple users are sharing a Notebook server.Apply Zeppelin Dynamic FormsYou can leverage Zeppelin Dynamic Form. Form templates are only available for the livy sql interpreter.%livy.sqlselect * from products where ${product_id=1}Creating dynamic forms programmatically is not feasible in the livy interpreter, because ZeppelinContext is not available in the livy interpreter.Shared SparkContextStarting from livy 0.5, which is supported by Zeppelin 0.8.0, SparkContext is shared between scala, python, r and sql.That means you can query a table via %livy.sql when that table is registered in %livy.spark, %livy.pyspark, %livy.sparkr.FAQLivy debugging: If you see any of these in the error consoleConnect to livyhost:8998 [livyhost/127.0.0.1, livyhost/0:0:0:0:0:0:0:1] failed: Connection refusedLooks like the livy server is not up yet or the config is wrongException: Session not found, Livy server would have restarted, or lost session.The session would have timed out; you may need to restart the interpreter.Blacklisted configuration values in session config: spark.masterEdit the conf/spark-blacklist.conf file in the livy server and comment out the #spark.master line.If you choose to work on livy in the apps/spark/java directory in https://github.com/cloudera/hue,copy spark-user-configurable-options.template to spark-user-configurable-options.conf file in livy server and comment out #spark.master.",
"url": " /interpreter/livy.html",
"group": "interpreter",
"excerpt": "Livy is an open source REST interface for interacting with Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in YARN."
@@ -25,7 +25,7 @@
"/interpreter/markdown.html": {
"title": "Markdown Interpreter for Apache Zeppelin",
- "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Markdown Interpreter for Apache ZeppelinOverviewMarkdown is a plain text formatting syntax designed so that it can be converted to HTML.Apache Zeppelin uses pegdown and markdown4j as markdown parsers.In Zeppelin notebook, you can use %md in the beginning of a paragraph to invoke the Markdown interpreter and generate static html from Markdown plain text.In Zeppelin, Markdown interpreter is enabled by default and uses the pegdown par
ser.ExampleThe following example demonstrates the basic usage of Markdown in a Zeppelin notebook.Mathematical expressionMarkdown interpreter leverages %html display system internally. That means you can mix mathematical expressions with markdown syntax. For more information, please see Mathematical Expression section.Configuration Name Default Value Description markdown.parser.type pegdown Markdown Parser Type. Available values: pegdown, markdown4j. Pegdown Parserpegdown parser provides github flavored markdown.pegdown parser provides YUML and Websequence plugins also. Markdown4j ParserSince pegdown parser is more accurate and provides much more markdown syntax markdown4j option might be removed later. But keep this parser for the backward compatibility.",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Markdown Interpreter for Apache ZeppelinOverviewMarkdown is a plain text formatting syntax designed so that it can be converted to HTML.Apache Zeppelin uses flexmark, pegdown and markdown4j as markdown parsers.In Zeppelin notebook, you can use %md in the beginning of a paragraph to invoke the Markdown interpreter and generate static html from Markdown plain text.In Zeppelin, Markdown interpreter is enabled by default and uses the p
egdown parser.ExampleThe following example demonstrates the basic usage of Markdown in a Zeppelin notebook.Mathematical expressionMarkdown interpreter leverages %html display system internally. That means you can mix mathematical expressions with markdown syntax. For more information, please see Mathematical Expression section.Configuration Name Default Value Description markdown.parser.type flexmark Markdown Parser Type. Available values: flexmark, pegdown, markdown4j. Flexmark parser (Default Markdown Parser)CommonMark/Markdown Java parser with source level AST.flexmark parser provides YUML and Websequence extensions also.Pegdown Parserpegdown parser provides github flavored markdown. Although still one of the most popular Markdown parsing libraries for the JVM, pegdown has reached its end of life.The project is essentially unmaintained with tickets piling up and crucial bugs not being fixed.pegdown&#39;s parsing performance isn&#39;t great. But we keep this parser for backward compatibility.Markdown4j ParserSince the pegdown parser is more accurate and supports much more markdown syntax, the markdown4j option might be removed later. But we keep this parser for backward compatibility.",
"url": " /interpreter/markdown.html",
"group": "interpreter",
"excerpt": "Markdown is a plain text formatting syntax designed so that it can be converted to HTML. Apache Zeppelin uses markdown4j."
@@ -34,6 +34,17 @@
+ "/interpreter/submarine.html": {
+ "title": "Apache Hadoop Submarine Interpreter for Apache Zeppelin",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Submarine Interpreter for Apache ZeppelinHadoop Submarine is the latest machine learning framework subproject in the Hadoop 3.1 release. It allows Hadoop to support Tensorflow, MXNet, Caffe, Spark, etc. A variety of deep learning frameworks provide a full-featured system framework for machine learning algorithm development, distributed model training, model management, and model publishing, combined with hadoop&#39;s intrinsic
data storage and data processing capabilities to enable data scientists to effectively mine the value of the data.A deep learning algorithm project requires data acquisition, data processing, data cleaning, interactive visual programming for parameter tuning, algorithm testing, algorithm publishing, algorithm job scheduling, offline model training, online model services and many other steps. Zeppelin is a web-based notebook that supports interactive data analysis. You can use SQL, Scala, Python, etc. to create data-driven, interactive, collaborative documents.You can use the more than 20 interpreters in zeppelin (for example: spark, hive, Cassandra, Elasticsearch, Kylin, HBase, etc.) to collect, clean, and extract features from the data in Hadoop, completing the data preprocessing before machine learning model training.By integrating submarine in zeppelin, we use zeppelin&#39;s data discovery, data analysis and data visualization and coll
aboration capabilities to visualize the results of algorithm development and parameter adjustment during machine learning model training.ArchitectureAs shown in the figure above, the system architecture explains how Submarine develops and trains machine learning algorithms through Zeppelin.After installing and deploying Hadoop 3.1+ and Zeppelin, submarine will create a fully separate Zeppelin Submarine interpreter Docker container for each user in YARN. This container contains the development and runtime environment for Tensorflow. Zeppelin Server connects to the Zeppelin Submarine interpreter Docker container in YARN, allowing algorithm engineers to perform algorithm development and data visualization in Tensorflow&#39;s stand-alone environment in Zeppelin Notebook.After the algorithm is developed, the algorithm engineer can submit the algorithm directly to YARN for offline training from Zeppelin, with real-time demonstration of model training via Submarine&#39;s TensorBoard for each algorithm engineer.You can not only complete the model training of the algorithm, but also use the more than twenty interpreters in Zeppelin to complete the data preprocessing of the model; for example, you can perform data extraction, filtering, and feature extraction through the Spark interpreter in Zeppelin in the Algorithm Note.In the future, you can also use Zeppelin&#39;s upcoming Workflow orchestration service to complete Spark, Hive data processing and Tensorflow model training in one Note, organized into a workflow through visualization, with job scheduling performed in the production environment.OverviewAs shown in the figure above, from the internal implementation, Submarine combines Zeppelin&#39;s machine learning algorithm development and model training as follows.The algorithm engineer creates a Tensorflow notebook (left image) in Zeppelin by using the Submarine interpreter.It is important to not
e that you need to complete the development of the entire algorithm in a Note.You can use Spark for data preprocessing in some of the paragraphs in Note.Use Python for algorithm development and debugging of Tensorflow in other paragraphs of notebook, Submarine creates a Zeppelin Submarine Interpreter Docker Container for you in YARN, which contains the following features and services:Shell Command line tool: Allows you to view the system environment in the Zeppelin Submarine Interpreter Docker Container, Install the extension tools you need or the Python dependencies.Kerberos lib: Allows you to perform kerberos authentication and access to Hadoop clusters with Kerberos authentication enabled.Tensorflow environment: Allows you to develop tensorflow algorithm code.Python environment: Allows you to develop tensorflow code.Complete a complete algorithm development with a Note in Zeppelin. If this algorithm contains multiple modules, You can write different algorithm modules in multiple paragraphs in Note. The title of each paragraph is the name of the algorithm module. The content of the paragraph is the code content of this algorithm module.HDFS Client: Zeppelin Submarine Interpreter will automatically submit the algorithm code you wrote in Note to HDFS.Submarine interpreter Docker Image It is Submarine that provides you with an image file that supports Tensorflow (CPU and GPU versions).And installed the algorithm library commonly used by Python.You can also install other development dependencies you need on top of the base image provided by Submarine.When you complete the development of the algorithm module, You can do this by creating a new paragraph in Note and typing %submarine dashboard. Zeppelin will create a Submarine Dashboard. The machine learning algorithm written in this Note can be submitted to YARN as a JOB by selecting the JOB RUN command option in the Control Panel. Create a Tensorflow Model Training Docker Container, The contai
ner contains the following sections:Tensorflow environmentHDFS Client Will automatically download the algorithm file from HDFS and mount it into the container for distributed model training, mounting the algorithm file to the Work Dir path of the container.Submarine Tensorflow Docker Image Submarine provides you with an image file that supports Tensorflow (CPU and GPU versions), with the algorithm libraries commonly used by Python installed. You can also install other development dependencies you need on top of the base image provided by Submarine. Name Class Description %submarine SubmarineInterpreter Provides interpreter for Apache Submarine dashboard %submarine.sh SubmarineShellInterpreter Provides interpreter for Apache Submarine shell %submarine.python PySubmarineInterpreter Provides interpreter for Apache Submarine python Submarine shellAfter creating a Note with Submarine Interpreter in Zeppelin, you can add a paragraph to N
ote if you need it. Using the %submarine.sh identifier, you can use the Shell command to perform various operations on the Submarine Interpreter Docker Container, such as:View the Python version in the ContainerView the system environment of the ContainerInstall the dependencies you need yourselfKerberos authentication with kinitUse Hadoop in Container for HDFS operations, etc.Submarine pythonYou can add one or more paragraphs to Note. Write the algorithm module for Tensorflow in Python using the %submarine.python identifier.Submarine DashboardAfter writing the Tensorflow algorithm by using %submarine.python, you can add a paragraph to Note, enter %submarine dashboard and execute it. Zeppelin will create a Submarine Dashboard.With Submarine Dashboard you can do all the operational control of Submarine, for example:Usage: Display Submarine&#39;s command description to help developers locate problems.Refresh: Zeppelin will erase all your input in the Dashboard.Tensorboard: You will be redirected to the Tensorboard WEB system created by Submarine for each user. With Tensorboard you can view the status of the Tensorflow model training in real time.CommandJOB RUN: Selecting JOB RUN will display the parameter input interface for submitting a JOB. Name Description Checkpoint Path Submarine sets up a separate Checkpoint path for each user's Note for Tensorflow training. It saves the training data for this Note's history and the output of model training; Tensorboard uses the data in this path for model presentation. Users cannot modify it. For example: `hdfs://cluster1/...` , The environment variable name for Checkpoint Path is `%checkpoint_path%`, You can use `%checkpoint_path%` instead of the input value in Data Path in `PS Launch Cmd` and `Worker Launch Cmd`. Input Path The user specifies the data directory of the Tensorflow algorithm. Only HDFS-enabled directories are supported. The envir
onment variable name for Data Path is `%input_path%`, You can use `%input_path%` instead of the input value in Data Path in `PS Launch Cmd` and `Worker Launch Cmd`. PS Launch Cmd Tensorflow Parameter services launch command, e.g. `python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --num-gpus=0 ...` Worker Launch Cmd Tensorflow Worker services launch command, e.g. `python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --num-gpus=1 ...` JOB STOPYou can choose to execute the JOB STOP command to stop a Tensorflow model training task that has been submitted and is running.TENSORBOARD STARTYou can choose to execute the TENSORBOARD START command to create your TENSORBOARD Docker Container.TENSORBOARD STOPYou can choose to execute the TENSORBOARD STOP command to stop and destroy your TENSORBOARD Docker Container.Run Command: Execute the action command of your choice.Clean Checkpoint: Checking this option will clear the data in this Note&#39;s Checkpoint Path before each JOB RUN execution.ConfigurationZeppelin Submarine interpreter provides the following properties to customize the Submarine interpreter Attribute name Attribute value Description DOCKER_CONTAINER_TIME_ZONE Etc/UTC Set the time zone in the container | DOCKER_HADOOP_HDFS_HOME /hadoop-3.1-0 Hadoop path in the following 3 images (SUBMARINE_INTERPRETER_DOCKER_IMAGE, tf.parameter.services.docker.image, tf.worker.services.docker.image) | DOCKER_JAVA_HOME /opt/java JAVA path in the following 3 images (SUBMARINE_INTERPRETER_DOCKER_IMAGE, tf.parameter.services.docker.image, tf.worker.services.docker.image) | HADOOP_YARN_SUBMARINE_JAR Path to the Submarine JAR package in the Hadoop-3.1+ release installed on the Zeppelin server | INTERPRETER_LAUNCH_MODE local/yarn Run the S
ubmarine interpreter instance in local or YARN local mainly for submarine interpreter development and debugging YARN mode for production environment | SUBMARINE_HADOOP_CONF_DIR Set the HADOOP-CONF path to support multiple Hadoop cluster environments SUBMARINE_HADOOP_HOME Hadoop-3.1+ above path installed on the Zeppelin server SUBMARINE_HADOOP_KEYTAB Keytab file path for a hadoop cluster with kerberos authentication turned on SUBMARINE_HADOOP_PRINCIPAL PRINCIPAL information for the keytab file of the hadoop cluster with kerberos authentication turned on SUBMARINE_INTERPRETER_DOCKER_IMAGE At INTERPRETER_LAUNCH_MODE=yarn, Submarine uses this image to create a Zeppelin Submarine interpreter container to create an algorithm development environment for the user. | docker.container.network YARN's Docker network name machinelearing.distributed.enable Whether to use the model training of the
distributed mode JOB RUN submission shell.command.timeout.millisecs 60000 Execute timeout settings for shell commands in the Submarine interpreter container submarine.algorithm.hdfs.path Save machine-learning algorithms developed using the Submarine interpreter to HDFS as files submarine.yarn.queue root.default Submarine submits model training YARN queue name tf.checkpoint.path Tensorflow checkpoint path. Each user gets a second-level checkpoint path named after the username under this path, and each algorithm submitted by the user creates a third-level checkpoint path named after the note id (the user's Tensorboard uses the checkpoint data in this path for visual display) tf.parameter.services.cpu Number of CPU cores applied to Tensorflow parameter services when Submarine submits model distributed training tf.parameter.services.docker.image Docker image used by Submarine for Tensorflow parameter services when submitting model distributed training tf.parameter.services.gpu GPU cores applied to Tensorflow parameter services when Submarine submits model distributed training tf.parameter.services.memory 2G Memory resources requested by Tensorflow parameter services when Submarine submits model distributed training tf.parameter.services.num Number of Tensorflow parameter services used by Submarine to submit model distributed training tf.tensorboard.enable true Create a separate Tensorboard for each user tf.worker.services.cpu CPU resources for Tensorflow worker services when Submarine submits model training tf.worker.services.docker.image Docker image used by Submarine for Tensorflow worker services when submitting model distributed training tf.worker.services.gpu GPU resources for Tensorflow worker services when Submarine submits model training tf.worker.services.memory Memory resources for Tensorflow worker services when Submarine submits model training tf.worker.services.num Number of Tensorflow worker services used by Submarine to submit model distributed training yarn.webapp.http.address http://hadoop:8088 YARN web ui address zeppelin.interpreter.rpc.portRange 29914 You need to export this port in the SUBMARINE_INTERPRETER_DOCKER_IMAGE configuration image. RPC communication for Zeppelin Server and Submarine interpreter containers zeppelin.ipython.grpc.message_size 33554432 Message size setting for IPython grpc in the Submarine interpreter container zeppelin.ipython.launch.timeout 30000 IPython execution timeout setting in the Submarine interpreter container zeppelin.python python Execution path of python in the Submarine interpreter container zeppelin.python.maxResult 10000 The maximum number of python execution results returned from the Subma
rine interpreter container zeppelin.python.useIPython false IPython is currently not supported and must be false zeppelin.submarine.auth.type simple/kerberos Whether Hadoop has kerberos authentication turned on Docker imagesThe docker images file is stored in the zeppelin/scripts/docker/submarine directory.submarine interpreter cpu versionsubmarine interpreter gpu versiontensorflow 1.10 &amp; hadoop 3.1.2 cpu versiontensorflow 1.10 &amp; hadoop 3.1.2 gpu versionChange Log0.1.0 (Zeppelin 0.9.0) :Support distributed or standalone tensorflow model training.Support submarine interpreter running locally.Support submarine interpreter running on YARN.Support Docker on YARN-3.3.0, planned to be compatible with lower versions of yarn.Bugs &amp; ContactsSubmarine interpreter BUGIf you encounter a bug for this interpreter, please create a sub JIRA ticket on ZEPPELIN-3856.Submarine Running problemIf you encounter a problem for the Submarine runtime, please create an ISSUE on hadoop-submarine-ecosystem.YARN Submarine BUGIf you encounter a bug for Yarn Submarine, please create a JIRA ticket on SUBMARINE.DependencyYARNSubmarine currently needs to run on Hadoop 3.3+The hadoop version of the hadoop submarine team git repository is periodically merged into the hadoop code repository.The git repository of the hadoop submarine team is released faster than the hadoop release cycle, so you can use the hadoop version from the hadoop submarine team git repository.Submarine runtime environmentYou can use Submarine-installer (https://github.com/hadoopsubmarine) to deploy Docker and network environments.MoreHadoop Submarine Project: https://hadoop.apache.org/submarineYoutube Submarine Channel: https://www.youtube.com/channel/UC4JBt8Y8VJ0BW0IM9YpdCyQ",
+ "url": " /interpreter/submarine.html",
+ "group": "interpreter",
+ "excerpt": "Hadoop Submarine is the latest machine learning framework subproject in the Hadoop 3.1 release. It allows Hadoop to support Tensorflow, MXNet, Caffe, Spark, etc."
+ }
+ ,
+
+
+
"/interpreter/mahout.html": {
"title": "Mahout Interpreter for Apache Zeppelin",
"content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Apache Mahout Interpreter for Apache ZeppelinInstallationApache Mahout is a collection of packages that enable machine learning and matrix algebra on underlying engines such as Apache Flink or Apache Spark. A convenience script for creating and configuring two Mahout enabled interpreters exists. The %sparkMahout and %flinkMahout interpreters do not exist by default but can be easily created using this script. Easy InstallationTo
quickly and easily get up and running using Apache Mahout, run the following command from the top-level directory of the Zeppelin install:python scripts/mahout/add_mahout.pyThis will create the %sparkMahout and %flinkMahout interpreters, and restart Zeppelin.Advanced InstallationThe add_mahout.py script contains several command line arguments for advanced users. Argument Description Example --zeppelin_home This is the path to the Zeppelin installation. This flag is not needed if the script is run from the top-level installation directory or from the zeppelin/scripts/mahout directory. /path/to/zeppelin --mahout_home If the user has already installed Mahout, this flag can set the path to MAHOUT_HOME. If this is set, downloading Mahout will be skipped. /path/to/mahout_home --restart_later Restarting is necessary for updates to take effect. By default the script will restart Zeppelin for you. Restart will be skipped if this flag is set.
NA --force_download This flag will force the script to re-download the binary even if it already exists. This is useful for previously failed downloads. NA --overwrite_existing This flag will force the script to overwrite existing %sparkMahout and %flinkMahout interpreters. Useful when you want to just start over. NA NOTE 1: Apache Mahout at this time only supports Spark 1.5 and Spark 1.6 and Scala 2.10. If the user is using another version of Spark (e.g. 2.0), the %sparkMahout will likely not work. The %flinkMahout interpreter will still work and the user is encouraged to develop with that engine as the code can be ported via copy and paste, as is evidenced by the tutorial notebook.NOTE 2: If using Apache Flink in cluster mode, the following libraries will also need to be copied to ${FLINK_HOME}/lib- mahout-math-0.12.2.jar- mahout-math-scala2.10-0.12.2.jar- mahout-flink2.10-0.12.2.jar- mahout-hdfs-0.12.2.jar- com.google.guava:guava:14.0.1Ov
erviewThe Apache Mahout™ project&#39;s goal is to build an environment for quickly creating scalable, performant machine learning applications.Apache Mahout software provides three major features:A simple and extensible programming environment and framework for building scalable algorithmsA wide variety of premade algorithms for Scala + Apache Spark, H2O, Apache FlinkSamsara, a vector math experimentation environment with R-like syntax which works at scaleIn other words:Apache Mahout provides a unified API for quickly creating machine learning algorithms on a variety of engines.How to useWhen starting a session with Apache Mahout, depending on which engine you are using (Spark or Flink), a few imports must be made and a Distributed Context must be declared. Copy and paste the following code and run once to get started.Flink%flinkMahoutimport org.apache.flink.api.scala._import org.apache.mahout.math.drm._import org.apache.mahout.math.drm.RLikeDrmOps._import org.apache.mahout
.flinkbindings._import org.apache.mahout.math._import scalabindings._import RLikeOps._implicit val ctx = new FlinkDistributedContext(benv)Spark%sparkMahoutimport org.apache.mahout.math._import org.apache.mahout.math.scalabindings._import org.apache.mahout.math.drm._import org.apache.mahout.math.scalabindings.RLikeOps._import org.apache.mahout.math.drm.RLikeDrmOps._import org.apache.mahout.sparkbindings._implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)Same Code, Different EnginesAfter importing and setting up the distributed context, the Mahout R-Like DSL is consistent across engines. The following code will run in both %flinkMahout and %sparkMahoutval drmData = drmParallelize(dense( (2, 2, 10.5, 10, 29.509541), // Apple Cinnamon Cheerios (1, 2, 12, 12, 18.042851), // Cap&#39;n&#39;Crunch (1, 1, 12, 13, 22.736446), // Cocoa Puffs (2, 1, 11, 13, 32.207582), // Froot Loops (1, 2, 12, 11, 21.871292), // Honey Graham Ohs (
2, 1, 16, 8, 36.187559), // Wheaties Honey Gold (6, 2, 17, 1, 50.764999), // Cheerios (3, 2, 13, 7, 40.400208), // Clusters (3, 3, 13, 4, 45.811716)), numPartitions = 2)drmData.collect(::, 0 until 4)val drmX = drmData(::, 0 until 4)val y = drmData.collect(::, 4)val drmXtX = drmX.t %*% drmXval drmXty = drmX.t %*% yval XtX = drmXtX.collectval Xty = drmXty.collect(::, 0)val beta = solve(XtX, Xty)Leveraging Resource Pools and R for VisualizationResource Pools are a powerful Zeppelin feature that lets us share information between interpreters. A fun trick is to take the output of our work in Mahout and analyze it in other languages.Setting up a Resource Pool in FlinkIn Spark based interpreters resource pools are accessed via the ZeppelinContext API. Putting and getting things from the resource pool can be done simply:val myVal = 1z.put(&quot;foo&quot;, myVal)val myFetchedVal = z.get(&quot;foo&quot;)To add this functionality to a Flink based interpreter we
declare the following%flinkMahoutimport org.apache.zeppelin.interpreter.InterpreterContextval z = InterpreterContext.get().getResourcePool()Now we can access the resource pool in a consistent manner from the %flinkMahout interpreter.Passing a variable from Mahout to R and PlottingIn this simple example, we use Mahout (on Flink or Spark, the code is the same) to create a random matrix and then take the sine of each element. We then randomly sample the matrix and create a tab separated string. Finally we pass that string to R where it is read as a .tsv file, and a DataFrame is created and plotted using native R plotting libraries.val mxRnd = Matrices.symmetricUniformView(5000, 2, 1234)val drmRand = drmParallelize(mxRnd)val drmSin = drmRand.mapBlock() {case (keys, block) =&gt; val blockB = block.like() for (i &lt;- 0 until block.nrow) { blockB(i, 0) = block(i, 0) blockB(i, 1) = Math.sin((block(i, 0) * 8)) } keys -&gt; blockB}z.put(&quot;sinDrm&quot;, org
.apache.mahout.math.drm.drmSampleToTSV(drmSin, 0.85))And then in an R paragraph...%spark.r {&quot;imageWidth&quot;: &quot;400px&quot;}library(&quot;ggplot2&quot;)sinStr = z.get(&quot;sinDrm&quot;)data &lt;- read.table(text= sinStr, sep=&quot;\t&quot;, header=FALSE)plot(data, col=&quot;red&quot;)",
@@ -47,7 +58,7 @@
"/interpreter/spark.html": {
"title": "Apache Spark Interpreter for Apache Zeppelin",
- "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Spark Interpreter for Apache ZeppelinOverviewApache Spark is a fast and general-purpose cluster computing system.It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.Apache Spark is supported in Zeppelin with Spark interpreter group which consists of below five interpreters. Name Class Description %spark SparkInterpreter Creates a SparkConte
xt and provides a Scala environment %spark.pyspark PySparkInterpreter Provides a Python environment %spark.r SparkRInterpreter Provides an R environment with SparkR support %spark.sql SparkSQLInterpreter Provides a SQL environment %spark.dep DepInterpreter Dependency loader ConfigurationThe Spark interpreter can be configured with properties provided by Zeppelin.You can also set other Spark properties which are not listed in the table. For a list of additional properties, refer to Spark Available Properties. Property Default Description args Spark commandline args master local[*] Spark master uri. ex) spark://masterhost:7077 spark.app.name Zeppelin The name of spark application. spark.cores.max Total number of cores to use. Empty value uses all available core. spark.executor.memory 1g Executor memory per worker instance. ex) 512m, 32g zeppelin.dep
.additionalRemoteRepository spark-packages, http://dl.bintray.com/spark-packages/maven, false; A list of id,remote-repository-URL,is-snapshot; for each remote repository. zeppelin.dep.localrepo local-repo Local repository for dependency loader PYSPARKPYTHON python Python binary executable to use for PySpark in both driver and workers (default is python). Property spark.pyspark.python take precedence if it is set PYSPARKDRIVERPYTHON python Python binary executable to use for PySpark in driver only (default is PYSPARKPYTHON). Property spark.pyspark.driver.python take precedence if it is set zeppelin.spark.concurrentSQL false Execute multiple SQL concurrently if set true. zeppelin.spark.concurrentSQL.max 10 Max number of SQL concurrently executed zeppelin.spark.maxResult 1000 Max number of Spark SQL result to display. zeppelin.spark.printREPLOutput true Print REPL o
utput zeppelin.spark.useHiveContext true Use HiveContext instead of SQLContext if it is true. zeppelin.spark.importImplicit true Import implicits, UDF collection, and sql if set true. zeppelin.spark.enableSupportedVersionCheck true Do not change - developer only setting, not for production use zeppelin.spark.sql.interpolation false Enable ZeppelinContext variable interpolation into paragraph text zeppelin.spark.uiWebUrl Overrides Spark UI default URL. Value should be a full URL (ex: http://{hostName}/{uniquePath} zeppelin.spark.scala.color true Whether to enable color output of spark scala interpreter Without any configuration, Spark interpreter works out of box in local mode. But if you want to connect to your Spark cluster, you&#39;ll need to follow below two simple steps.1. Export SPARK_HOMEIn conf/zeppelin-env.sh, export SPARK_HOME environment variable with your Spark installation path.For example,expo
rt SPARK_HOME=/usr/lib/sparkYou can optionally set more environment variables# set hadoop conf direxport HADOOP_CONF_DIR=/usr/lib/hadoop# set options to pass spark-submit commandexport SPARK_SUBMIT_OPTIONS=&quot;--packages com.databricks:spark-csv_2.10:1.2.0&quot;# extra classpath. e.g. set classpath for hive-site.xmlexport ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/hive/confFor Windows, ensure you have winutils.exe in %HADOOP_HOME%bin. Please see Problems running Hadoop on Windows for the details.2. Set master in Interpreter menuAfter start Zeppelin, go to Interpreter menu and edit master property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.For example,local[*] in local modespark://master:7077 in standalone clusteryarn-client in Yarn client modeyarn-cluster in Yarn cluster modemesos://host:5050 in Mesos clusterThat&#39;s it. Zeppelin will work with any version of Spark and any deployment type without rebuilding Zeppe
lin in this way.For the further information about Spark &amp; Zeppelin version compatibility, please refer to &quot;Available Interpreters&quot; section in Zeppelin download page.Note that without exporting SPARK_HOME, it&#39;s running in local mode with included version of Spark. The included version may vary depending on the build profile.3. Yarn modeZeppelin support both yarn client and yarn cluster mode (yarn cluster mode is supported from 0.8.0). For yarn mode, you must specify SPARK_HOME &amp; HADOOP_CONF_DIR.You can either specify them in zeppelin-env.sh, or in interpreter setting page. Specifying them in zeppelin-env.sh means you can use only one version of spark &amp; hadoop. Specifying themin interpreter setting page means you can use multiple versions of spark &amp; hadoop in one zeppelin instance.4. New Version of SparkInterpreterThere&#39;s one new version of SparkInterpreter with better spark support and code completion starting from Zep
pelin 0.8.0. We enable it by default, but user can still use the old version of SparkInterpreter by setting zeppelin.spark.useNew as false in its interpreter setting.SparkContext, SQLContext, SparkSession, ZeppelinContextSparkContext, SQLContext and ZeppelinContext are automatically created and exposed as variable names sc, sqlContext and z, respectively, in Scala, Python and R environments.Staring from 0.6.1 SparkSession is available as variable spark when you are using Spark 2.x.Note that Scala/Python/R environment shares the same SparkContext, SQLContext and ZeppelinContext instance. How to pass property to SparkConfThere&#39;re 2 kinds of properties that would be passed to SparkConfStandard spark property (prefix with spark.). e.g. spark.executor.memory will be passed to SparkConfNon-standard spark property (prefix with zeppelin.spark.). e.g. zeppelin.spark.property_1, property_1 will be passed to SparkConfDependency ManagementThere are two ways to load external libraries i
n Spark interpreter. First is using interpreter setting menu and second is loading Spark properties.1. Setting Dependencies via Interpreter SettingPlease see Dependency Management for the details.2. Loading Spark PropertiesOnce SPARK_HOME is set in conf/zeppelin-env.sh, Zeppelin uses spark-submit as spark interpreter runner. spark-submit supports two ways to load configurations.The first is command line options such as --master and Zeppelin can pass these options to spark-submit by exporting SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh. Second is reading configuration options from SPARK_HOME/conf/spark-defaults.conf. Spark properties that user can set to distribute libraries are: spark-defaults.conf SPARK_SUBMIT_OPTIONS Description spark.jars --jars Comma-separated list of local jars to include on the driver and executor classpaths. spark.jars.packages --packages Comma-separated list of maven coordinates of jars to include on the driver and execu
tor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version. spark.files --files Comma-separated list of files to be placed in the working directory of each executor. Here are few examples:SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.shexport SPARK_SUBMIT_OPTIONS=&quot;--packages com.databricks:spark-csv_2.10:1.2.0 --jars /path/mylib1.jar,/path/mylib2.jar --files /path/mylib1.py,/path/mylib2.zip,/path/mylib3.egg&quot;SPARK_HOME/conf/spark-defaults.confspark.jars /path/mylib1.jar,/path/mylib2.jarspark.jars.packages com.databricks:spark-csv_2.10:1.2.0spark.files /path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip3. Dynamic Dependency Loading via %spark.dep interpreterNote: %spark.dep interpreter loads libraries to %spark and %spark.pyspark but not to %spark.sql interpreter. So we recommend you to use the first opt
ion instead.When your code requires external library, instead of doing download/copy/restart Zeppelin, you can easily do following jobs using %spark.dep interpreter.Load libraries recursively from maven repositoryLoad libraries from local filesystemAdd additional maven repositoryAutomatically add libraries to SparkCluster (You can turn off)Dep interpreter leverages Scala environment. So you can write any Scala code here.Note that %spark.dep interpreter should be used before %spark, %spark.pyspark, %spark.sql.Here&#39;s usages.%spark.depz.reset() // clean up previously added artifact and repository// add maven repositoryz.addRepo(&quot;RepoName&quot;).url(&quot;RepoURL&quot;)// add maven snapshot repositoryz.addRepo(&quot;RepoName&quot;).url(&quot;RepoURL&quot;).snapshot()// add credentials for private maven repositoryz.addRepo(&quot;RepoName&quot;).url(&quot;RepoURL&quot;).username(&quot;username&quot;).password(&quot;p
assword&quot;)// add artifact from filesystemz.load(&quot;/path/to.jar&quot;)// add artifact from maven repository, with no dependencyz.load(&quot;groupId:artifactId:version&quot;).excludeAll()// add artifact recursivelyz.load(&quot;groupId:artifactId:version&quot;)// add artifact recursively except comma separated GroupID:ArtifactId listz.load(&quot;groupId:artifactId:version&quot;).exclude(&quot;groupId:artifactId,groupId:artifactId, ...&quot;)// exclude with patternz.load(&quot;groupId:artifactId:version&quot;).exclude(*)z.load(&quot;groupId:artifactId:version&quot;).exclude(&quot;groupId:artifactId:*&quot;)z.load(&quot;groupId:artifactId:version&quot;).exclude(&quot;groupId:*&quot;)// local() skips adding artifact to spark clusters (skipping sc.addJar())z.load(&quot;groupId:artifactId:version&quot;).local()ZeppelinContextZeppelin automatically injects ZeppelinContext as variable z in your
Scala/Python environment. ZeppelinContext provides some additional functions and utilities.See Zeppelin-Context for more details.Matplotlib Integration (pyspark)Both the python and pyspark interpreters have built-in support for inline visualization using matplotlib,a popular plotting library for python. More details can be found in the python interpreter documentation,since matplotlib support is identical. More advanced interactive plotting can be done with pyspark throughutilizing Zeppelin&#39;s built-in Angular Display System, as shown below:Running spark sql concurrentlyBy default, each sql statement would run sequentially in %spark.sql. But you can run them concurrently by following setup.set zeppelin.spark.concurrentSQL to true to enable the sql concurrent feature, underneath zeppelin will change to use fairscheduler for spark. And also set zeppelin.spark.concurrentSQL.max to control the max number of sql statements running concurrently.configure pools by creating fairsche
duler.xml under your SPARK_CONF_DIR, check the offical spark doc Configuring Pool Propertiesset pool property via setting paragraph property. e.g.%spark(pool=pool1)sql statementThis feature is available for both all versions of scala spark, pyspark. For sparkr, it is only available starting from 2.3.0.Interpreter setting optionYou can choose one of shared, scoped and isolated options wheh you configure Spark interpreter.Spark interpreter creates separated Scala compiler per each notebook but share a single SparkContext in scoped mode (experimental).It creates separated SparkContext per each notebook in isolated mode.IPython supportBy default, zeppelin would use IPython in pyspark when IPython is available, Otherwise it would fall back to the original PySpark implementation.If you don&#39;t want to use IPython, then you can set zeppelin.pyspark.useIPython as false in interpreter setting. For the IPython features, you can refer docPython InterpreterSetting up Zeppelin with Kerbero
sLogical setup with Zeppelin, Kerberos Key Distribution Center (KDC), and Spark on YARN:Configuration SetupOn the server that Zeppelin is installed, install Kerberos client modules and configuration, krb5.conf.This is to make the server communicate with KDC.Set SPARK_HOME in [ZEPPELIN_HOME]/conf/zeppelin-env.sh to use spark-submit(Additionally, you might have to set export HADOOP_CONF_DIR=/etc/hadoop/conf)Add the two properties below to Spark configuration ([SPARK_HOME]/conf/spark-defaults.conf):spark.yarn.principalspark.yarn.keytabNOTE: If you do not have permission to access for the above spark-defaults.conf file, optionally, you can add the above lines to the Spark Interpreter setting through the Interpreter tab in the Zeppelin UI.That&#39;s it. Play with Zeppelin!",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Spark Interpreter for Apache ZeppelinOverviewApache Spark is a fast and general-purpose cluster computing system.It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.Apache Spark is supported in Zeppelin with the Spark interpreter group, which consists of the five interpreters below. Name Class Description %spark SparkInterpreter Creates a SparkConte
xt and provides a Scala environment %spark.pyspark PySparkInterpreter Provides a Python environment %spark.r SparkRInterpreter Provides an R environment with SparkR support %spark.sql SparkSQLInterpreter Provides a SQL environment %spark.dep DepInterpreter Dependency loader ConfigurationThe Spark interpreter can be configured with properties provided by Zeppelin.You can also set other Spark properties which are not listed in the table. For a list of additional properties, refer to Spark Available Properties. Property Default Description args Spark commandline args master local[*] Spark master uri. ex) spark://masterhost:7077 spark.app.name Zeppelin The name of spark application. spark.cores.max Total number of cores to use. Empty value uses all available cores. spark.executor.memory 1g Executor memory per worker instance. ex) 512m, 32g zeppelin.dep
.additionalRemoteRepository spark-packages, http://dl.bintray.com/spark-packages/maven, false; A list of id,remote-repository-URL,is-snapshot; for each remote repository. zeppelin.dep.localrepo local-repo Local repository for dependency loader PYSPARK_PYTHON python Python binary executable to use for PySpark in both driver and workers (default is python). Property spark.pyspark.python takes precedence if it is set PYSPARK_DRIVER_PYTHON python Python binary executable to use for PySpark in driver only (default is PYSPARK_PYTHON). Property spark.pyspark.driver.python takes precedence if it is set zeppelin.spark.concurrentSQL false Execute multiple SQL concurrently if set true. zeppelin.spark.concurrentSQL.max 10 Max number of SQL concurrently executed zeppelin.spark.maxResult 1000 Max number of Spark SQL result to display. zeppelin.spark.printREPLOutput true Print RE
PL output zeppelin.spark.useHiveContext true Use HiveContext instead of SQLContext if it is true. zeppelin.spark.importImplicit true Import implicits, UDF collection, and sql if set true. zeppelin.spark.enableSupportedVersionCheck true Do not change - developer only setting, not for production use zeppelin.spark.sql.interpolation false Enable ZeppelinContext variable interpolation into paragraph text zeppelin.spark.uiWebUrl Overrides Spark UI default URL. Value should be a full URL (ex: http://{hostName}/{uniquePath}) zeppelin.spark.scala.color true Whether to enable color output of spark scala interpreter Without any configuration, Spark interpreter works out of the box in local mode. But if you want to connect to your Spark cluster, you&#39;ll need to follow the two simple steps below.1. Export SPARK_HOMEIn conf/zeppelin-env.sh, export SPARK_HOME environment variable with your Spark installation path.For example,
export SPARK_HOME=/usr/lib/sparkYou can optionally set more environment variables# set hadoop conf direxport HADOOP_CONF_DIR=/usr/lib/hadoop# set options to pass spark-submit commandexport SPARK_SUBMIT_OPTIONS=&quot;--packages com.databricks:spark-csv_2.10:1.2.0&quot;# extra classpath. e.g. set classpath for hive-site.xmlexport ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/hive/confFor Windows, ensure you have winutils.exe in %HADOOP_HOME%\bin. Please see Problems running Hadoop on Windows for the details.2. Set master in Interpreter menuAfter starting Zeppelin, go to the Interpreter menu and edit the master property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.For example,local[*] in local modespark://master:7077 in standalone clusteryarn-client in Yarn client modeyarn-cluster in Yarn cluster modemesos://host:5050 in Mesos clusterThat&#39;s it. Zeppelin will work with any version of Spark and any deployment type without rebuilding Z
eppelin in this way.For further information about Spark &amp; Zeppelin version compatibility, please refer to the &quot;Available Interpreters&quot; section in the Zeppelin download page.Note that without exporting SPARK_HOME, it&#39;s running in local mode with the included version of Spark. The included version may vary depending on the build profile.3. Yarn modeZeppelin supports both yarn client and yarn cluster mode (yarn cluster mode is supported from 0.8.0). For yarn mode, you must specify SPARK_HOME &amp; HADOOP_CONF_DIR.You can either specify them in zeppelin-env.sh, or in the interpreter setting page. Specifying them in zeppelin-env.sh means you can use only one version of spark &amp; hadoop. Specifying them in the interpreter setting page means you can use multiple versions of spark &amp; hadoop in one zeppelin instance.4. New Version of SparkInterpreterStarting from 0.9, we totally removed the old spark interpreter implementation, and made the new spark interpre
ter the official spark interpreter.SparkContext, SQLContext, SparkSession, ZeppelinContextSparkContext, SQLContext and ZeppelinContext are automatically created and exposed as variable names sc, sqlContext and z, respectively, in Scala, Python and R environments.Starting from 0.6.1 SparkSession is available as variable spark when you are using Spark 2.x.Note that Scala/Python/R environments share the same SparkContext, SQLContext and ZeppelinContext instance. How to pass property to SparkConfThere&#39;re 2 kinds of properties that would be passed to SparkConfStandard spark property (prefix with spark.). e.g. spark.executor.memory will be passed to SparkConfNon-standard spark property (prefix with zeppelin.spark.). e.g. zeppelin.spark.property_1, property_1 will be passed to SparkConfDependency ManagementFor spark interpreter, you should not use Zeppelin&#39;s Dependency Management for managing third party dependencies (%spark.dep also is not the recommended approach star
ting from Zeppelin 0.8). Instead you should set spark properties (spark.jars, spark.files, spark.jars.packages) in 2 ways. spark-defaults.conf SPARK_SUBMIT_OPTIONS Description spark.jars --jars Comma-separated list of local jars to include on the driver and executor classpaths. spark.jars.packages --packages Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version. spark.files --files Comma-separated list of files to be placed in the working directory of each executor. 1. Set spark properties on the zeppelin side. On the zeppelin side, you can either set them in the spark interpreter setting page or via Generic ConfInterpreter. It is not recommended to set them in SPARK_SUBMIT_OPTIONS. Because it will be shared by all spar
k interpreters, you cannot set different dependencies for different users.2. Set spark properties on the spark side. On the spark side, you can set them in spark-defaults.conf, e.g. spark.jars /path/mylib1.jar,/path/mylib2.jar spark.jars.packages com.databricks:spark-csv_2.10:1.2.0 spark.files /path/mylib1.py,/path/mylib2.egg,/path/mylib3.zipZeppelinContextZeppelin automatically injects ZeppelinContext as variable z in your Scala/Python environment. ZeppelinContext provides some additional functions and utilities.See Zeppelin-Context for more details.Matplotlib Integration (pyspark)Both the python and pyspark interpreters have built-in support for inline visualization using matplotlib, a popular plotting library for python. More details can be found in the python interpreter documentation, since matplotlib support is identical. More advanced interactive plotting can be done with pyspark through utilizing Zeppelin&#39;s built-in Angular Display System, as shown below:
Running spark sql concurrentlyBy default, each sql statement would run sequentially in %spark.sql. But you can run them concurrently by the following setup.set zeppelin.spark.concurrentSQL to true to enable the sql concurrent feature, underneath zeppelin will change to use fairscheduler for spark. And also set zeppelin.spark.concurrentSQL.max to control the max number of sql statements running concurrently.configure pools by creating fairscheduler.xml under your SPARK_CONF_DIR, check the official spark doc Configuring Pool Propertiesset pool property via setting paragraph property. e.g.%spark(pool=pool1)sql statementThis feature is available for all versions of scala spark and pyspark. For sparkr, it is only available starting from 2.3.0.Interpreter setting optionYou can choose one of shared, scoped and isolated options when you configure Spark interpreter.Spark interpreter creates a separate Scala compiler per notebook but shares a single SparkContext in scoped mode (experimental).
It creates a separate SparkContext per notebook in isolated mode.IPython supportBy default, zeppelin would use IPython in pyspark when IPython is available; otherwise it would fall back to the original PySpark implementation.If you don&#39;t want to use IPython, then you can set zeppelin.pyspark.useIPython as false in interpreter setting. For the IPython features, you can refer to the doc Python InterpreterSetting up Zeppelin with KerberosLogical setup with Zeppelin, Kerberos Key Distribution Center (KDC), and Spark on YARN:Deprecate Spark 2.2 and earlier versionsStarting from 0.9, Zeppelin deprecates Spark 2.2 and earlier versions. So you will see a warning message when you use Spark 2.2 and earlier.You can get rid of this message by setting zeppelin.spark.deprecatedMsg.show to false.Configuration SetupOn the server where Zeppelin is installed, install Kerberos client modules and configuration, krb5.conf.This is to make the server communicate with KDC.Set SPARK_HOME in [ZEPPELIN_HOME
]/conf/zeppelin-env.sh to use spark-submit(Additionally, you might have to set export HADOOP_CONF_DIR=/etc/hadoop/conf)Add the two properties below to Spark configuration ([SPARK_HOME]/conf/spark-defaults.conf):spark.yarn.principalspark.yarn.keytabNOTE: If you do not have permission to access the above spark-defaults.conf file, optionally, you can add the above lines to the Spark Interpreter setting through the Interpreter tab in the Zeppelin UI.That&#39;s it. Play with Zeppelin!",
"url": " /interpreter/spark.html",
"group": "interpreter",
"excerpt": "Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution engine."
@@ -80,7 +91,7 @@
"/interpreter/ignite.html": {
"title": "Ignite Interpreter for Apache Zeppelin",
- "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Ignite Interpreter for Apache ZeppelinOverviewApache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.You can use Zeppelin to retrieve distributed data from cache using Ignite SQL interpreter. Moreover, Ignite interpreter allo
ws you to execute any Scala code in cases when SQL doesn&#39;t fit to your requirements. For example, you can populate data into your caches or execute distributed computations.Installing and Running Ignite exampleIn order to use Ignite interpreters, you may install Apache Ignite in some simple steps:Ignite provides examples only with source or binary release. Download Ignite source release or binary release whatever you want. But you must download Ignite as the same version of Zeppelin&#39;s. If it is not, you can&#39;t use scala code on Zeppelin. The supported Ignite version is specified in Supported Interpreter table for each Zeppelin release. If you&#39;re using Zeppelin master branch, please see ignite.version in path/to/your-Zeppelin/ignite/pom.xml.Examples are shipped as a separate Maven project, so to start running you simply need to import provided &lt;dest_dir&gt;/apache-ignite-fabric-{version}-bin/examples/pom.xml file into your favourite IDE, such
as Eclipse.In case of Eclipse, Eclipse -&gt; File -&gt; Import -&gt; Existing Maven ProjectsSet examples directory path to Eclipse and select the pom.xml.Then start org.apache.ignite.examples.ExampleNodeStartup (or whatever you want) to run at least one or more ignite node. When you run example code, you may notice that the number of node is increase one by one.Tip. If you want to run Ignite examples on the cli not IDE, you can export executable Jar file from IDE. Then run it by using below command.nohup java -jar &lt;/path/to/your Jar file name&gt;Configuring Ignite InterpreterAt the &quot;Interpreters&quot; menu, you may edit Ignite interpreter or create new one. Zeppelin provides these properties for Ignite. Property Name value Description ignite.addresses 127.0.0.1:47500..47509 Coma separated list of Ignite cluster hosts. See [Ignite Cluster Configuration](https://apacheignite.readme.io/docs/cluster-config) section for more de
tails. ignite.clientMode true You can connect to the Ignite cluster as client or server node. See [Ignite Clients vs. Servers](https://apacheignite.readme.io/docs/clients-vs-servers) section for details. Use true or false values in order to connect in client or server mode respectively. ignite.config.url Configuration URL. Overrides all other settings. ignite.jdbc.url jdbc:ignite:cfg://default-ignite-jdbc.xml Ignite JDBC connection URL. ignite.peerClassLoadingEnabled true Enables peer-class-loading. See [Zero Deployment](https://apacheignite.readme.io/docs/zero-deployment) section for details. Use true or false values in order to enable or disable P2P class loading respectively. How to useAfter configuring Ignite interpreter, create your own notebook. Then you can bind interpreters like below image.For more interpreter binding information see here.Ignite SQL interpreterIn order to execute SQL query, use %ignite.ignitesql prefix.
Supposing you are running org.apache.ignite.examples.streaming.wordcount.StreamWords, then you can use &quot;words&quot; cache( Of course you have to specify this cache name to the Ignite interpreter setting section ignite.jdbc.url of Zeppelin ).For example, you can select top 10 words in the words cache using the following query%ignite.ignitesqlselect _val, count(_val) as cnt from String group by _val order by cnt desc limit 10As long as your Ignite version and Zeppelin Ignite version is same, you can also use scala code. Please check the Zeppelin Ignite version before you download your own Ignite.%igniteimport org.apache.ignite._import org.apache.ignite.cache.affinity._import org.apache.ignite.cache.query._import org.apache.ignite.configuration._import scala.collection.JavaConversions._val cache: IgniteCache[AffinityUuid, String] = ignite.cache(&quot;words&quot;)val qry = new SqlFieldsQuery(&quot;select avg(cnt), min(cnt), max(cnt) from (select count(_val) as c
nt from String group by _val)&quot;, true)val res = cache.query(qry).getAll()collectionAsScalaIterable(res).foreach(println _)Apache Ignite also provides a guide docs for Zeppelin &quot;Ignite with Apache Zeppelin&quot;",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Ignite Interpreter for Apache ZeppelinOverviewApache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.You can use Zeppelin to retrieve distributed data from cache using Ignite SQL interpreter. Moreover, Ignite interpreter allo
ws you to execute any Scala code in cases when SQL doesn&#39;t fit your requirements. For example, you can populate data into your caches or execute distributed computations.Installing and Running Ignite exampleIn order to use the Ignite interpreters, you can install Apache Ignite in a few simple steps:Ignite provides examples only with the source or binary release. Download either the Ignite source release or the binary release, but you must download the same Ignite version that Zeppelin supports; otherwise, you can&#39;t use scala code on Zeppelin. The supported Ignite version is specified in the Supported Interpreter table for each Zeppelin release. If you&#39;re using the Zeppelin master branch, please see ignite.version in path/to/your-Zeppelin/ignite/pom.xml.Examples are shipped as a separate Maven project, so to start running you simply need to import the provided &lt;dest_dir&gt;/apache-ignite-fabric-{version}-bin/examples/pom.xml file into your favourite IDE, such
 as Eclipse.In case of Eclipse, Eclipse -&gt; File -&gt; Import -&gt; Existing Maven ProjectsSet the examples directory path in Eclipse and select the pom.xml.Then start org.apache.ignite.examples.ExampleNodeStartup (or whatever you want) to run at least one ignite node. When you run the example code, you may notice that the number of nodes increases one by one.Tip. If you want to run Ignite examples on the CLI rather than in an IDE, you can export an executable Jar file from the IDE, then run it using the command below.nohup java -jar &lt;/path/to/your Jar file name&gt;Configuring Ignite InterpreterAt the &quot;Interpreters&quot; menu, you may edit the Ignite interpreter or create a new one. Zeppelin provides these properties for Ignite. Property Name value Description ignite.addresses 127.0.0.1:47500..47509 Comma-separated list of Ignite cluster hosts. See Ignite Cluster Configuration section for more details. ignite.clientMode true You can con
nect to the Ignite cluster as a client or a server node. See the Ignite Clients vs. Servers section for details. Use true or false to connect in client or server mode respectively. ignite.config.url Configuration URL. Overrides all other settings. ignite.jdbc.url jdbc:ignite:cfg://default-ignite-jdbc.xml Ignite JDBC connection URL. ignite.peerClassLoadingEnabled true Enables peer-class-loading. See the Zero Deployment section for details. Use true or false to enable or disable P2P class loading respectively. How to useAfter configuring the Ignite interpreter, create your own notebook. Then you can bind interpreters as in the image below.For more interpreter binding information see here.Ignite SQL interpreterIn order to execute a SQL query, use the %ignite.ignitesql prefix. Supposing you are running org.apache.ignite.examples.streaming.wordcount.StreamWords, you can use the &quot;words&quot; cache (of course, you have to specify t
his cache name in the Ignite interpreter setting section ignite.jdbc.url of Zeppelin ).For example, you can select the top 10 words in the words cache using the following query%ignite.ignitesqlselect _val, count(_val) as cnt from String group by _val order by cnt desc limit 10As long as your Ignite version and your Zeppelin Ignite version are the same, you can also use scala code. Please check the Zeppelin Ignite version before you download your own Ignite.%igniteimport org.apache.ignite._import org.apache.ignite.cache.affinity._import org.apache.ignite.cache.query._import org.apache.ignite.configuration._import scala.collection.JavaConversions._val cache: IgniteCache[AffinityUuid, String] = ignite.cache(&quot;words&quot;)val qry = new SqlFieldsQuery(&quot;select avg(cnt), min(cnt), max(cnt) from (select count(_val) as cnt from String group by _val)&quot;, true)val res = cache.query(qry).getAll()collectionAsScalaIterable(res).foreach(println _)Apache Ignite also provides a guide doc for Zeppelin, &quot;Ignite with Apache Zeppelin&quot;",
"url": " /interpreter/ignite.html",
"group": "interpreter",
"excerpt": "Apache Ignite in-memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies."
@@ -201,7 +212,7 @@
"/interpreter/flink.html": {
"title": "Flink Interpreter for Apache Zeppelin",
- "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Flink interpreter for Apache ZeppelinOverviewApache Flink is an open source platform for distributed stream and batch data processing. Flinkâs core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program opt
imization.How to start local Flink cluster, to test the interpreterZeppelin comes with pre-configured flink-local interpreter, which starts Flink in a local mode on your machine, so you do not need to install anything.How to configure interpreter to point to Flink clusterAt the &quot;Interpreters&quot; menu, you have to create a new Flink interpreter and provide next properties: property value Description host local host name of running JobManager. 'local' runs flink in local mode (default) port 6123 port of running JobManager For more information about Flink configuration, you can find it here.How to test it&#39;s workingYou can find an example of Flink usage in the Zeppelin Tutorial folder or try the following word count example, by using the Zeppelin notebook from Till Rohrmann&#39;s presentation Interactive data analysis with Apache Flink for Apache Flink Meetup.%shrm 10.txt.utf-8wget http://www.gutenberg.org/ebooks/1
0.txt.utf-8%flinkcase class WordCount(word: String, frequency: Int)val bible:DataSet[String] = benv.readTextFile(&quot;10.txt.utf-8&quot;)val partialCounts: DataSet[WordCount] = bible.flatMap{ line =&gt; &quot;&quot;&quot;bw+b&quot;&quot;&quot;.r.findAllIn(line).map(word =&gt; WordCount(word, 1))// line.split(&quot; &quot;).map(word =&gt; WordCount(word, 1))}val wordCounts = partialCounts.groupBy(&quot;word&quot;).reduce{ (left, right) =&gt; WordCount(left.word, left.frequency + right.frequency)}val result10 = wordCounts.first(10).collect()",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Flink interpreter for Apache ZeppelinOverviewApache Flink is an open source platform for distributed stream and batch data processing. Flink&#39;s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program opt
imization.Apache Flink is supported in Zeppelin with the Flink interpreter group, which consists of the five interpreters below. Name Class Description %flink FlinkInterpreter Creates ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment and provides a Scala environment %flink.pyflink PyFlinkInterpreter Provides a python environment %flink.ipyflink IPyFlinkInterpreter Provides an ipython environment %flink.ssql FlinkStreamSqlInterpreter Provides a stream sql environment %flink.bsql FlinkBatchSqlInterpreter Provides a batch sql environment ConfigurationThe Flink interpreter can be configured with properties provided by Zeppelin.You can also set other flink properties which are not listed in the table. For a list of additional properties, refer to Flink Available Properties. Property Default Description FLINK_HOME Location of the flink installation. It mus
t be specified; otherwise you cannot use flink in zeppelin flink.execution.mode local Execution mode of flink, e.g. local/yarn/remote flink.execution.remote.host jobmanager hostname when in remote mode flink.execution.remote.port jobmanager port when in remote mode flink.jm.memory 1024 Total memory (mb) of the JobManager flink.tm.memory 1024 Total memory (mb) of each TaskManager flink.tm.num 2 Number of TaskManagers flink.tm.slot 1 Number of slots per TaskManager flink.yarn.appName Zeppelin Flink Session Yarn app name flink.yarn.queue queue name of the yarn app flink.yarn.jars additional user jars (comma separated) zeppelin.flink.scala.color true whether to display scala shell output in colorful format zeppelin.flink.enableHive false whether to enable hive zeppelin.flink.printREPLOutput true Print REPL output
 zeppelin.flink.maxResult 1000 max number of rows returned by the sql interpreter zeppelin.flink.planner blink planner for the flink table api, blink or flink zeppelin.pyflink.python python python executable for pyflink StreamExecutionEnvironment, ExecutionEnvironment, StreamTableEnvironment, BatchTableEnvironmentZeppelin will create 4 variables to represent flink&#39;s entrypoints:* senv (StreamExecutionEnvironment)* env (ExecutionEnvironment)* stenv (StreamTableEnvironment)* btenv (BatchTableEnvironment)ZeppelinContextZeppelin automatically injects ZeppelinContext as the variable z in your Scala/Python environment. ZeppelinContext provides some additional functions and utilities.See Zeppelin-Context for more details.IPython supportBy default, zeppelin uses IPython in pyflink when IPython is available; otherwise it falls back to the original PyFlink implementation.If you don&#39;t want to use IPython, you can set zeppelin.pyflink.useIPython to false in the interpreter setting. For the IPython features, you can refer to the doc Python Interpreter",
"url": " /interpreter/flink.html",
"group": "interpreter",
"excerpt": "Apache Flink is an open source platform for distributed stream and batch data processing."
@@ -223,7 +234,7 @@
"/interpreter/cassandra.html": {
"title": "Cassandra CQL Interpreter for Apache Zeppelin",
[... 159 lines stripped ...]