Posted to commits@zeppelin.apache.org by zj...@apache.org on 2020/12/24 14:36:06 UTC
svn commit: r1884775 [30/49] - in /zeppelin/site/docs: 0.9.0-SNAPSHOT/
0.9.0/ 0.9.0/assets/ 0.9.0/assets/themes/ 0.9.0/assets/themes/zeppelin/
0.9.0/assets/themes/zeppelin/bootstrap/
0.9.0/assets/themes/zeppelin/bootstrap/css/ 0.9.0/assets/themes/zeppe...
Added: zeppelin/site/docs/0.9.0/search_data.json
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.9.0/search_data.json?rev=1884775&view=auto
==============================================================================
--- zeppelin/site/docs/0.9.0/search_data.json (added)
+++ zeppelin/site/docs/0.9.0/search_data.json Thu Dec 24 14:36:01 2020
@@ -0,0 +1,1112 @@
+{
+
+
+ "/interpreter/livy.html": {
+ "title": "Livy Interpreter for Apache Zeppelin",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Livy Interpreter for Apache ZeppelinOverviewLivy is an open source REST interface for interacting with Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in YARN.Interactive Scala, Python and R shellsBatch submissions in Scala, Java, PythonMulti users can share the same server (impersonation support)Can be used for submitting jobs from anywhere with RESTDoes not require a
ny code change to your programsRequirementsAdditional requirements for the Livy interpreter are:Spark 1.3 or above.Livy server.ConfigurationWe added some common configurations for spark, and you can set any configuration you want.You can find all Spark configurations in here.And instead of starting property with spark. it should be replaced with livy.spark..Example: spark.driver.memory to livy.spark.driver.memory Property Default Description zeppelin.livy.url http://localhost:8998 URL where livy server is running zeppelin.livy.spark.sql.maxResult 1000 Max number of Spark SQL result to display. zeppelin.livy.spark.sql.field.truncate true Whether to truncate field values longer than 20 characters or not zeppelin.livy.session.create_timeout 120 Timeout in seconds for session creation zeppelin.livy.displayAppInfo true Whether to display app info zeppelin.livy.pull_status.interval.millis 1000 The int
erval for checking paragraph execution status livy.spark.driver.cores Driver cores. ex) 1, 2. livy.spark.driver.memory Driver memory. ex) 512m, 32g. livy.spark.executor.instances Executor instances. ex) 1, 4. livy.spark.executor.cores Num cores per executor. ex) 1, 4. livy.spark.executor.memory Executor memory per worker instance. ex) 512m, 32g. livy.spark.dynamicAllocation.enabled Use dynamic resource allocation. ex) True, False. livy.spark.dynamicAllocation.cachedExecutorIdleTimeout Remove an executor which has cached data blocks. livy.spark.dynamicAllocation.minExecutors Lower bound for the number of executors. livy.spark.dynamicAllocation.initialExecutors Initial number of executors to run. livy.spark.dynamicAllocation.maxExecutors Upper bound for the number of executors. livy.spark.jars.packages Adding extra libr
aries to livy interpreter zeppelin.livy.ssl.trustStore client trustStore file. Used when livy ssl is enabled zeppelin.livy.ssl.trustStorePassword password for trustStore file. Used when livy ssl is enabled zeppelin.livy.ssl.trustStoreType JKS type of truststore. Either JKS or PKCS12. zeppelin.livy.ssl.keyStore client keyStore file. Needed if Livy requires two way SSL authentication. zeppelin.livy.ssl.keyStorePassword password for keyStore file. zeppelin.livy.ssl.keyStoreType JKS type of keystore. Either JKS or PKCS12. zeppelin.livy.ssl.keyPassword password for key in the keyStore file. Defaults to zeppelin.livy.ssl.keyStorePassword. zeppelin.livy.http.headers key_1: value_1; key_2: value_2 custom http headers when calling livy rest api. Each http header is separated by `;`, and each header is one key value pair where key value is separated by `:` We remove livy.spar
k.master in zeppelin-0.7, because we suggest users use livy 0.3 in zeppelin-0.7, and livy 0.3 doesn&#39;t allow specifying livy.spark.master; it enforces yarn-cluster mode.Adding External librariesYou can load a dynamic library into the livy interpreter by setting the livy.spark.jars.packages property to a comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. The format for the coordinates should be groupId:artifactId:version.Example Property Example Description livy.spark.jars.packages io.spray:spray-json_2.10:1.3.1 Adding extra libraries to livy interpreter How to useBasically, you can usespark%livy.sparksc.versionpyspark%livy.pysparkprint &quot;1&quot;sparkR%livy.sparkrhello &lt;- function( name ) { sprintf( &quot;Hello, %s&quot;, name );}hello(&quot;livy&quot;)ImpersonationWhen Zeppelin server is running with authentication enabled,then this interpreter utilizes Livy&#39;s user im
personation feature, i.e. it sends an extra parameter for creating and running a session (&quot;proxyUser&quot;: &quot;${loggedInUser}&quot;).This is particularly useful when multiple users are sharing a Notebook server.Apply Zeppelin Dynamic FormsYou can leverage Zeppelin Dynamic Form. Form templates are only available for the livy sql interpreter.%livy.sqlselect * from products where ${product_id=1}Creating dynamic forms programmatically is not feasible in the livy interpreter, because ZeppelinContext is not available in the livy interpreter.Shared SparkContextStarting from livy 0.5, which is supported by Zeppelin 0.8.0, SparkContext is shared between scala, python, r and sql.That means you can query the table via %livy.sql when this table is registered in %livy.spark, %livy.pyspark, or %livy.sparkr.FAQLivy debugging: If you see any of these in the error consoleConnect to livyhost:8998 [livyhost/127.0.0.1, livyhost/0:0:0:0:0:0:0:1] failed: Connection refusedLooks like the livy server is not u
p yet or the config is wrongException: Session not found, Livy server would have restarted, or lost session.The session would have timed out, you may need to restart the interpreter.Blacklisted configuration values in session config: spark.masterEdit conf/spark-blacklist.conf file in livy server and comment out #spark.master line.If you choose to work on livy in apps/spark/java directory in https://github.com/cloudera/hue,copy spark-user-configurable-options.template to spark-user-configurable-options.conf file in livy server and comment out #spark.master.",
+ "url": " /interpreter/livy.html",
+ "group": "interpreter",
+ "excerpt": "Livy is an open source REST interface for interacting with Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in YARN."
+ }
+ ,
+
+
+
+ "/interpreter/ksql.html": {
+ "title": "KSQL Interpreter for Apache Zeppelin",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->KSQL Interpreter for Apache ZeppelinOverviewKSQL is the streaming SQL engine for Apache Kafka®. It provides an easy-to-use yet powerful interactive SQL interface for stream processing on Kafka,Configuration Property Default Description ksql.url http://localhost:8080 The KSQL Endpoint base URL N.b. The interpreter supports all the KSQL properties, i.e. ksql.streams.auto.offset.
reset.The full list of KSQL parameters is here.Using the KSQL InterpreterIn a paragraph, use %ksql and start your SQL query in order to start to interact with KSQL.Following some examples:%ksqlPRINT &#39;orders&#39;;%ksqlCREATE STREAM ORDERS WITH (VALUE_FORMAT=&#39;AVRO&#39;, KAFKA_TOPIC =&#39;orders&#39;);%ksqlSELECT *FROM ORDERSLIMIT 10",
+ "url": " /interpreter/ksql.html",
+ "group": "interpreter",
+ "excerpt": "KSQL is the streaming SQL engine for Apache Kafka and provides an easy-to-use yet powerful interactive SQL interface for stream processing on Kafka."
+ }
+ ,
+
+
+
+ "/interpreter/pig.html": {
+ "title": "Pig Interpreter for Apache Zeppelin",
+ "content" : "Pig Interpreter for Apache ZeppelinOverviewApache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.Supported interpreter type%pig.script (default Pig interpreter, so you can use %pig)%pig.script is like the Pig grunt shell. Anything you can run in the Pig grunt shell can be run in the %pig.script interpreter. It is used for running Pig scripts where you don&#39;t need to visualize the data, and it is suitable for data munging. %pig.query%pig.query is a little different from %pig.script. It is used for exploratory data analysis via Pig Latin where you can leverage Zeppelin&#39;s visualization ability. There are 2 minor differences in the last statement between %pig
.script and %pig.queryNo pig alias in the last statement in %pig.query (read the examples below).The last statement must be in single line in %pig.queryHow to useHow to setup Pig execution modes.Local ModeSet zeppelin.pig.execType as local.MapReduce ModeSet zeppelin.pig.execType as mapreduce. HADOOP_CONF_DIR needs to be specified in ZEPPELIN_HOME/conf/zeppelin-env.sh.Tez Local ModeOnly Tez 0.7 is supported. Set zeppelin.pig.execType as tez_local.Tez ModeOnly Tez 0.7 is supported. Set zeppelin.pig.execType as tez. HADOOP_CONF_DIR and TEZ_CONF_DIR needs to be specified in ZEPPELIN_HOME/conf/zeppelin-env.sh.Spark Local ModeOnly Spark 1.6.x is supported, by default it is Spark 1.6.3. Set zeppelin.pig.execType as spark_local.Spark ModeOnly Spark 1.6.x is supported, by default it is Spark 1.6.3. Set zeppelin.pig.execType as spark. For now, only yarn-client mode is supported. To enable it, you need to set property SPARK_MASTER to yarn-client and set SPARK_JAR to the spark assembly jar.How
to choose custom Spark VersionBy default, Pig Interpreter would use Spark 1.6.3 built with scala 2.10, if you want to use another spark version or scala version, you need to rebuild Zeppelin by specifying the custom Spark version via -Dpig.spark.version= and scala version via -Dpig.scala.version= in the maven build command.How to configure interpreterAt the Interpreters menu, you have to create a new Pig interpreter. Pig interpreter has below properties by default.And you can set any Pig properties here which will be passed to Pig engine. (like tez.queue.name &amp; mapred.job.queue.name).Besides, we use paragraph title as job name if it exists, else use the last line of Pig script. So you can use that to find app running in YARN RM UI. Property Default Description zeppelin.pig.execType mapreduce Execution mode for pig runtime. local | mapreduce | tez_local | tez | spark_local | spark zeppelin.pig.includeJobSta
ts false whether display jobStats info in %pig.script zeppelin.pig.maxResult 1000 max row number displayed in %pig.query tez.queue.name default queue name for tez engine mapred.job.queue.name default queue name for mapreduce engine SPARK_MASTER local local | yarn-client SPARK_JAR The spark assembly jar, both jar in local or hdfs is supported. Put it on hdfs could have performance benefit Examplepig%pigbankText = load &#39;bank.csv&#39; using PigStorage(&#39;;&#39;);bank = foreach bankText generate $0 as age, $1 as job, $2 as marital, $3 as education, $5 as balance; bank = filter bank by age != &#39;&quot;age&quot;&#39;;bank = foreach bank generate (int)age, REPLACE(job,&#39;&quot;&#39;,&#39;&#39;) as job, REPLACE(marital, &#39;&quot;&#39;, &#39;&a
mp;#39;) as marital, (int)(REPLACE(balance, &#39;&quot;&#39;, &#39;&#39;)) as balance;store bank into &#39;clean_bank.csv&#39; using PigStorage(&#39;;&#39;); -- this statement is optional, it just shows you that most of the time %pig.script is used for data munging before querying the data. pig.queryGet the number of each age where age is less than 30%pig.querybank_data = filter bank by age &lt; 30;b = group bank_data by age;foreach b generate group, COUNT($1);The same as above, but use a dynamic text form so that users can specify the variable maxAge in a textbox (see screenshot below). Dynamic form is a very cool feature of Zeppelin; you can refer to this link for details.%pig.querybank_data = filter bank by age &lt; ${maxAge=40};b = group bank_data by age;foreach b generate group, COUNT($1) as count;Get the number of each age for a specific marital type, also using a dynamic form here. Users can choose the marital type in the dropdown list (see screenshot
below).%pig.querybank_data = filter bank by marital==&#39;${marital=single,single|divorced|married}&#39;;b = group bank_data by age;foreach b generate group, COUNT($1) as count;The above examples are in the Pig tutorial note in Zeppelin, you can check that for details. Here&#39;s the screenshot.Data is shared between %pig and %pig.query, so that you can do some common work in %pig, and do different kinds of query based on the data of %pig. Besides, we recommend you to specify alias explicitly so that the visualization can display the column name correctly. In the above example 2 and 3 of %pig.query, we name COUNT($1) as count. If you don&#39;t do this, then we will name it using position. E.g. in the above first example of %pig.query, we will use col_1 in chart to represent COUNT($1).",
+ "url": " /interpreter/pig.html",
+ "group": "manual",
+ "excerpt": "Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs."
+ }
+ ,
+
+
+
+ "/interpreter/markdown.html": {
+ "title": "Markdown Interpreter for Apache Zeppelin",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Markdown Interpreter for Apache ZeppelinOverviewMarkdown is a plain text formatting syntax designed so that it can be converted to HTML.Apache Zeppelin uses flexmark, pegdown and markdown4j as markdown parsers.In Zeppelin notebook, you can use %md in the beginning of a paragraph to invoke the Markdown interpreter and generate static html from Markdown plain text.In Zeppelin, Markdown interpreter is enabled by default and uses the f
lexmark parser.ExampleThe following example demonstrates the basic usage of Markdown in a Zeppelin notebook.Mathematical expressionMarkdown interpreter leverages %html display system internally. That means you can mix mathematical expressions with markdown syntax. For more information, please see Mathematical Expression section.Configuration Name Default Value Description markdown.parser.type flexmark Markdown Parser Type. Available values: flexmark, pegdown, markdown4j. Flexmark parser (Default Markdown Parser)CommonMark/Markdown Java parser with source level AST.flexmark parser provides YUML and Websequence extensions also.Pegdown Parserpegdown parser provides github flavored markdown. Although still one of the most popular Markdown parsing libraries for the JVM, pegdown has reached its end of life.The project is essentially unmaintained with tickets piling up and crucial bugs not being fixed.pegdown&#39;s parsing performance isn&#39;t great, but t
his parser is kept for backward compatibility.Markdown4j ParserSince the pegdown parser is more accurate and supports much more markdown syntax, the markdown4j option might be removed later, but this parser is kept for backward compatibility.",
+ "url": " /interpreter/markdown.html",
+ "group": "interpreter",
+ "excerpt": "Markdown is a plain text formatting syntax designed so that it can be converted to HTML. Apache Zeppelin uses flexmark, pegdown and markdown4j as markdown parsers."
+ }
+ ,
+
+
+
+ "/interpreter/submarine.html": {
+ "title": "Apache Hadoop Submarine Interpreter for Apache Zeppelin",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Submarine Interpreter for Apache ZeppelinHadoop Submarine is the latest machine learning framework subproject in the Hadoop 3.1 release. It allows Hadoop to support Tensorflow, MXNet, Caffe, Spark, etc. A variety of deep learning frameworks provide a full-featured system framework for machine learning algorithm development, distributed model training, model management, and model publishing, combined with hadoop&#39;s intrinsic
data storage and data processing capabilities to enable data scientists to better mine the value of the data.A deep learning algorithm project requires data acquisition, data processing, data cleaning, interactive visual programming with parameter adjustment, algorithm testing, algorithm publishing, algorithm job scheduling, offline model training, online model serving and many other processes. Zeppelin is a web-based notebook that supports interactive data analysis. You can use SQL, Scala, Python, etc. to make data-driven, interactive, collaborative documents.You can use the more than 20 interpreters in zeppelin (for example: spark, hive, Cassandra, Elasticsearch, Kylin, HBase, etc.) to collect data, clean data, extract features, etc. from the data in Hadoop, completing the data preprocessing before the machine learning model training.By integrating submarine in zeppelin, we use zeppelin&#39;s data discovery, data analysis and data visualization and coll
aboration capabilities to visualize the results of algorithm development and parameter adjustment during machine learning model training.ArchitectureAs shown in the figure above, how the Submarine develops and models the machine learning algorithms through Zeppelin is explained from the system architecture.After installing and deploying Hadoop 3.1+ and Zeppelin, submarine will create a fully separate Zeppelin Submarine interpreter Docker container for each user in YARN. This container contains the development and runtime environment for Tensorflow. Zeppelin Server connects to the Zeppelin Submarine interpreter Docker container in YARN. allows algorithmic engineers to perform algorithm development and data visualization in Tensorflow&#39;s stand-alone environment in Zeppelin Notebook.After the algorithm is developed, the algorithm engineer can submit the algorithm directly to the YARN in offline transfer training in Zeppelin, real-time demonstration of model training with Submari
ne&#39;s TensorBoard for each algorithm engineer.You can not only complete the model training of the algorithm, but you can also use the more than twenty interpreters in Zeppelin. Complete the data preprocessing of the model, For example, you can perform data extraction, filtering, and feature extraction through the Spark interpreter in Zeppelin in the Algorithm Note.In the future, you can also use Zeppelin&#39;s upcoming Workflow workflow orchestration service. You can complete Spark, Hive data processing and Tensorflow model training in one Note. It is organized into a workflow through visualization, etc., and the scheduling of jobs is performed in the production environment.OverviewAs shown in the figure above, from the internal implementation, how Submarine combines Zeppelin&#39;s machine learning algorithm development and model training.The algorithm engineer created a Tensorflow notebook (left image) in Zeppelin by using Submarine interpreter.It is important to not
e that you need to complete the development of the entire algorithm in a Note.You can use Spark for data preprocessing in some of the paragraphs in Note.Use Python for algorithm development and debugging of Tensorflow in other paragraphs of notebook, Submarine creates a Zeppelin Submarine Interpreter Docker Container for you in YARN, which contains the following features and services:Shell Command line tool: Allows you to view the system environment in the Zeppelin Submarine Interpreter Docker Container, and install the extension tools or Python dependencies you need.Kerberos lib: Allows you to perform kerberos authentication and access Hadoop clusters with Kerberos authentication enabled.Tensorflow environment: Allows you to develop tensorflow algorithm code.Python environment: Allows you to develop tensorflow code.Complete an entire algorithm development with a Note in Zeppelin. If this algorithm contains multiple modules, you can write different algorithm modu
les in multiple paragraphs in Note. The title of each paragraph is the name of the algorithm module. The content of the paragraph is the code content of this algorithm module.HDFS Client: Zeppelin Submarine Interpreter will automatically submit the algorithm code you wrote in Note to HDFS.Submarine interpreter Docker Image: Submarine provides you with an image file that supports Tensorflow (CPU and GPU versions) and has the algorithm libraries commonly used with Python installed.You can also install other development dependencies you need on top of the base image provided by Submarine.When you complete the development of the algorithm module, you can create a new paragraph in Note and type %submarine dashboard. Zeppelin will create a Submarine Dashboard. The machine learning algorithm written in this Note can be submitted to YARN as a JOB by selecting the JOB RUN command option in the Control Panel. Create a Tensorflow Model Training Docker Container, The contai
ner contains the following sections:Tensorflow environmentHDFS Client Will automatically download the algorithm file Mount from HDFS into the container for distributed model training. Mount the algorithm file to the Work Dir path of the container.Submarine Tensorflow Docker Image There is Submarine that provides you with an image file that supports Tensorflow (CPU and GPU versions). And installed the algorithm library commonly used by Python. You can also install other development dependencies you need on top of the base image provided by Submarine. Name Class Description %submarine SubmarineInterpreter Provides interpreter for Apache Submarine dashboard %submarine.sh SubmarineShellInterpreter Provides interpreter for Apache Submarine shell %submarine.python PySubmarineInterpreter Provides interpreter for Apache Submarine python Submarine shellAfter creating a Note with Submarine Interpreter in Zeppelin, You can add a paragraph to N
ote if you need it. Using the %submarine.sh identifier, you can use Shell commands to perform various operations on the Submarine Interpreter Docker Container, such as:View the Python version in the ContainerView the system environment of the ContainerInstall the dependencies you need yourselfKerberos authentication with kinitUse Hadoop in Container for HDFS operations, etc.Submarine pythonYou can add one or more paragraphs to Note. Write the algorithm module for Tensorflow in Python using the %submarine.python identifier.Submarine DashboardAfter writing the Tensorflow algorithm by using %submarine.python, you can add a paragraph to Note. Enter %submarine dashboard and execute it. Zeppelin will create a Submarine Dashboard.With Submarine Dashboard you can do all the operational control of Submarine, for example:Usage: Display Submarine&#39;s command description to help developers locate problems.Refresh: Zeppelin will erase all your input in the Dashboard.Tensorbo
ard: You will be redirected to the Tensorboard web system created by Submarine for each user. With Tensorboard you can view the status of the Tensorflow model training in real time.CommandJOB RUN: Selecting JOB RUN will display the parameter input interface for submitting a JOB. Name Description Checkpoint Path Submarine sets up a separate Checkpoint path for each user's Note for Tensorflow training. It stores the training history data for this Note and the model data output by training; Tensorboard uses the data in this path for model presentation. Users cannot modify it. For example: `hdfs://cluster1/...`. The environment variable name for Checkpoint Path is `%checkpoint_path%`. You can use `%checkpoint_path%` instead of the input value in Data Path in `PS Launch Cmd` and `Worker Launch Cmd`. Input Path The user specifies the data directory of the Tensorflow algorithm. Only HDFS-enabled directories are supported. The envir
onment variable name for Data Path is `%input_path%`. You can use `%input_path%` instead of the input value in Data Path in `PS Launch Cmd` and `Worker Launch Cmd`. PS Launch Cmd Tensorflow Parameter services launch command, for example: `python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --num-gpus=0 ...` Worker Launch Cmd Tensorflow Worker services launch command, for example: `python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --num-gpus=1 ...` JOB STOPYou can choose to execute the JOB STOP command to stop a Tensorflow model training task that has been submitted and is running.TENSORBOARD STARTYou can choose to execute the TENSORBOARD START command to create your TENSORBOARD Docker Container.TENSORBOARD STOPYou can choose to execute the TENSORBOARD STOP command to stop and destroy your TENSORBOARD Docker Container.Run Command: Execute the action command of your choiceClean Checkpoint: Che
cking this option will clear the data in this Note&#39;s Checkpoint Path before each JOB RUN execution.ConfigurationZeppelin Submarine interpreter provides the following properties to customize the Submarine interpreter Attribute name Attribute value Description DOCKER_CONTAINER_TIME_ZONE Etc/UTC Set the time zone in the container | DOCKER_HADOOP_HDFS_HOME /hadoop-3.1-0 Hadoop path in the following 3 images (SUBMARINE_INTERPRETER_DOCKER_IMAGE, tf.parameter.services.docker.image, tf.worker.services.docker.image) | DOCKER_JAVA_HOME /opt/java JAVA path in the following 3 images (SUBMARINE_INTERPRETER_DOCKER_IMAGE, tf.parameter.services.docker.image, tf.worker.services.docker.image) | HADOOP_YARN_SUBMARINE_JAR Path to the Submarine JAR package in the Hadoop-3.1+ release installed on the Zeppelin server | INTERPRETER_LAUNCH_MODE local/yarn Run the S
ubmarine interpreter instance in local or YARN local mainly for submarine interpreter development and debugging YARN mode for production environment | SUBMARINE_HADOOP_CONF_DIR Set the HADOOP-CONF path to support multiple Hadoop cluster environments SUBMARINE_HADOOP_HOME Hadoop-3.1+ above path installed on the Zeppelin server SUBMARINE_HADOOP_KEYTAB Keytab file path for a hadoop cluster with kerberos authentication turned on SUBMARINE_HADOOP_PRINCIPAL PRINCIPAL information for the keytab file of the hadoop cluster with kerberos authentication turned on SUBMARINE_INTERPRETER_DOCKER_IMAGE At INTERPRETER_LAUNCH_MODE=yarn, Submarine uses this image to create a Zeppelin Submarine interpreter container to create an algorithm development environment for the user. | docker.container.network YARN's Docker network name machinelearing.distributed.enable Whether to use the model training of the
distributed mode JOB RUN submission shell.command.timeout.millisecs 60000 Execute timeout settings for shell commands in the Submarine interpreter container submarine.algorithm.hdfs.path Save machine-based algorithms developed using Submarine interpreter to HDFS as files submarine.yarn.queue root.default Submarine submits model training YARN queue name tf.checkpoint.path Tensorflow checkpoint path, Each user will create a user's checkpoint secondary path using the username under this path. Each algorithm submitted by the user will create a checkpoint three-level path using the note id (the user's Tensorboard uses the checkpoint data in this path for visual display) tf.parameter.services.cpu Number of CPU cores applied to Tensorflow parameter services when Submarine submits model distributed training tf.parameter.services.docker.image Submarine creates a mirror for Tensorflow parameter services
when submitting model distributed training tf.parameter.services.gpu GPU cores applied to Tensorflow parameter services when Submarine submits model distributed training tf.parameter.services.memory 2G Memory resources requested by Tensorflow parameter services when Submarine submits model distributed training tf.parameter.services.num Number of Tensorflow parameter services used by Submarine to submit model distributed training tf.tensorboard.enable true Create a separate Tensorboard for each user tf.worker.services.cpu Submarine submits model resources for Tensorflow worker services when submitting model training tf.worker.services.docker.image Submarine creates a mirror for Tensorflow worker services when submitting model distributed training tf.worker.services.gpu Submarine submits GPU resources for Tensorflow worker services when submitting model training tf.worker.services.m
emory Submarine submits model resources for Tensorflow worker services when submitting model training tf.worker.services.num Number of Tensorflow worker services used by Submarine to submit model distributed training yarn.webapp.http.address http://hadoop:8088 YARN web ui address zeppelin.interpreter.rpc.portRange 29914 You need to export this port in the SUBMARINE_INTERPRETER_DOCKER_IMAGE configuration image. RPC communication for Zeppelin Server and Submarine interpreter containers zeppelin.ipython.grpc.message_size 33554432 Message size setting for IPython grpc in Submarine interpreter container zeppelin.ipython.launch.timeout 30000 IPython execution timeout setting in Submarine interpreter container zeppelin.python python Execution path of python in Submarine interpreter container zeppelin.python.maxResult 10000 The maximum number of python execution results returned from the Subma
rine interpreter container zeppelin.python.useIPython false IPython is currently not supported and must be false zeppelin.submarine.auth.type simple/kerberos Has Hadoop turned on kerberos authentication? Docker imagesThe docker image files are stored in the zeppelin/scripts/docker/submarine directory.submarine interpreter cpu versionsubmarine interpreter gpu versiontensorflow 1.10 &amp; hadoop 3.1.2 cpu versiontensorflow 1.10 &amp; hadoop 3.1.2 gpu versionChange Log0.1.0 (Zeppelin 0.9.0) :Support distributed or standalone tensorflow model training.Support running the submarine interpreter locally.Support running the submarine interpreter on YARN.Support Docker on YARN-3.3.0; compatibility with lower versions of YARN is planned.Bugs &amp; ContactsSubmarine interpreter BUGIf you encounter a bug for this interpreter, please create a sub JIRA ticket on ZEPPELIN-3856.Submarine Running problemIf you encounter a problem for Submarine runtime, please create an ISSUE on hadoop
-submarine-ecosystem.YARN Submarine BUGIf you encounter a bug for Yarn Submarine, please create a JIRA ticket on SUBMARINE.DependencyYARNSubmarine currently need to run on Hadoop 3.3+The hadoop version of the hadoop submarine team git repository is periodically submitted to the code repository of the hadoop.The version of the git repository for the hadoop submarine team will be faster than the hadoop version release cycle.You can use the hadoop version of the hadoop submarine team git repository.Submarine runtime environmentyou can use Submarine-installer https://github.com/hadoopsubmarine, Deploy Docker and network environments.MoreHadoop Submarine Project: https://hadoop.apache.org/submarineYoutube Submarine Channel: https://www.youtube.com/channel/UC4JBt8Y8VJ0BW0IM9YpdCyQ",
+ "url": " /interpreter/submarine.html",
+ "group": "interpreter",
+ "excerpt": "Hadoop Submarine is the latest machine learning framework subproject in the Hadoop 3.1 release. It allows Hadoop to support Tensorflow, MXNet, Caffe, Spark, etc."
+ }
+ ,
+
+
+
+ "/interpreter/mahout.html": {
+ "title": "Mahout Interpreter for Apache Zeppelin",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Apache Mahout Interpreter for Apache ZeppelinInstallationApache Mahout is a collection of packages that enable machine learning and matrix algebra on underlying engines such as Apache Flink or Apache Spark. A convenience script for creating and configuring two Mahout enabled interpreters exists. The %sparkMahout and %flinkMahout interpreters do not exist by default but can be easily created using this script. Easy InstallationTo
quickly and easily get up and running using Apache Mahout, run the following command from the top-level directory of the Zeppelin install:python scripts/mahout/add_mahout.pyThis will create the %sparkMahout and %flinkMahout interpreters, and restart Zeppelin.Advanced InstallationThe add_mahout.py script contains several command line arguments for advanced users. Argument Description Example --zeppelin_home This is the path to the Zeppelin installation. This flag is not needed if the script is run from the top-level installation directory or from the zeppelin/scripts/mahout directory. /path/to/zeppelin --mahout_home If the user has already installed Mahout, this flag can set the path to MAHOUT_HOME. If this is set, downloading Mahout will be skipped. /path/to/mahout_home --restart_later Restarting is necessary for updates to take effect. By default the script will restart Zeppelin for you. Restart will be skipped if this flag is set.
NA --force_download This flag will force the script to re-download the binary even if it already exists. This is useful for previously failed downloads. NA --overwrite_existing This flag will force the script to overwrite existing %sparkMahout and %flinkMahout interpreters. Useful when you want to just start over. NA NOTE 1: Apache Mahout at this time only supports Spark 1.5 and Spark 1.6 and Scala 2.10. If the user is using another version of Spark (e.g. 2.0), the %sparkMahout interpreter will likely not work. The %flinkMahout interpreter will still work and the user is encouraged to develop with that engine as the code can be ported via copy and paste, as is evidenced by the tutorial notebook.NOTE 2: If using Apache Flink in cluster mode, the following libraries will also need to be copied to ${FLINK_HOME}/lib- mahout-math-0.12.2.jar- mahout-math-scala_2.10-0.12.2.jar- mahout-flink_2.10-0.12.2.jar- mahout-hdfs-0.12.2.jar- com.google.guava:guava:14.0.1Ov
erviewThe Apache Mahout™ project&#39;s goal is to build an environment for quickly creating scalable performant machine learning applications.Apache Mahout software provides three major features:A simple and extensible programming environment and framework for building scalable algorithmsA wide variety of premade algorithms for Scala + Apache Spark, H2O, Apache FlinkSamsara, a vector math experimentation environment with R-like syntax which works at scaleIn other words:Apache Mahout provides a unified API for quickly creating machine learning algorithms on a variety of engines.How to useWhen starting a session with Apache Mahout, depending on which engine you are using (Spark or Flink), a few imports must be made and a Distributed Context must be declared. Copy and paste the following code and run once to get started.Flink%flinkMahoutimport org.apache.flink.api.scala._import org.apache.mahout.math.drm._import org.apache.mahout.math.drm.RLikeDrmOps._import org.apache.mahout
.flinkbindings._import org.apache.mahout.math._import scalabindings._import RLikeOps._implicit val ctx = new FlinkDistributedContext(benv)Spark%sparkMahoutimport org.apache.mahout.math._import org.apache.mahout.math.scalabindings._import org.apache.mahout.math.drm._import org.apache.mahout.math.scalabindings.RLikeOps._import org.apache.mahout.math.drm.RLikeDrmOps._import org.apache.mahout.sparkbindings._implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)Same Code, Different EnginesAfter importing and setting up the distributed context, the Mahout R-Like DSL is consistent across engines. The following code will run in both %flinkMahout and %sparkMahoutval drmData = drmParallelize(dense( (2, 2, 10.5, 10, 29.509541), // Apple Cinnamon Cheerios (1, 2, 12, 12, 18.042851), // Cap&#39;n&#39;Crunch (1, 1, 12, 13, 22.736446), // Cocoa Puffs (2, 1, 11, 13, 32.207582), // Froot Loops (1, 2, 12, 11, 21.871292), // Honey Graham Ohs (
2, 1, 16, 8, 36.187559), // Wheaties Honey Gold (6, 2, 17, 1, 50.764999), // Cheerios (3, 2, 13, 7, 40.400208), // Clusters (3, 3, 13, 4, 45.811716)), numPartitions = 2)drmData.collect(::, 0 until 4)val drmX = drmData(::, 0 until 4)val y = drmData.collect(::, 4)val drmXtX = drmX.t %*% drmXval drmXty = drmX.t %*% yval XtX = drmXtX.collectval Xty = drmXty.collect(::, 0)val beta = solve(XtX, Xty)Leveraging Resource Pools and R for VisualizationResource Pools are a powerful Zeppelin feature that lets us share information between interpreters. A fun trick is to take the output of our work in Mahout and analyze it in other languages.Setting up a Resource Pool in FlinkIn Spark based interpreters resource pools are accessed via the ZeppelinContext API. Putting things into and getting things from the resource pool is simple:val myVal = 1z.put(&quot;foo&quot;, myVal)val myFetchedVal = z.get(&quot;foo&quot;)To add this functionality to a Flink based interpreter we
declare the following:%flinkMahoutimport org.apache.zeppelin.interpreter.InterpreterContextval z = InterpreterContext.get().getResourcePool()Now we can access the resource pool in a consistent manner from the %flinkMahout interpreter.Passing a variable from Mahout to R and PlottingIn this simple example, we use Mahout (on Flink or Spark, the code is the same) to create a random matrix and then take the sine of each element. We then randomly sample the matrix and create a tab-separated string. Finally we pass that string to R where it is read as a .tsv file, and a DataFrame is created and plotted using native R plotting libraries.val mxRnd = Matrices.symmetricUniformView(5000, 2, 1234)val drmRand = drmParallelize(mxRnd)val drmSin = drmRand.mapBlock() {case (keys, block) =&gt; val blockB = block.like() for (i &lt;- 0 until block.nrow) { blockB(i, 0) = block(i, 0) blockB(i, 1) = Math.sin((block(i, 0) * 8)) } keys -&gt; blockB}z.put(&quot;sinDrm&quot;, org
.apache.mahout.math.drm.drmSampleToTSV(drmSin, 0.85))And then in an R paragraph...%spark.r {&quot;imageWidth&quot;: &quot;400px&quot;}library(&quot;ggplot2&quot;)sinStr = z.get(&quot;sinDrm&quot;)data &lt;- read.table(text= sinStr, sep=&quot;\t&quot;, header=FALSE)plot(data, col=&quot;red&quot;)",
+ "url": " /interpreter/mahout.html",
+ "group": "interpreter",
+ "excerpt": "Apache Mahout provides a unified API (the R-Like Scala DSL) for quickly creating machine learning algorithms on a variety of engines."
+ }
+ ,
+
+
+
+ "/interpreter/kotlin.html": {
+ "title": "Kotlin interpreter in Apache Zeppelin",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Kotlin interpreter for Apache ZeppelinOverviewKotlin is a cross-platform, statically typed, general-purpose programming language with type inference.It is designed to interoperate fully with Java, and the JVM version of its standard library depends on the Java Class Library, but type inference allows its syntax to be more concise.Configuration Name Default Description zeppelin.kot
lin.maxResult 1000 Max n zeppelin.kotlin.shortenTypes true Display shortened types instead of full, e.g. Int vs kotlin.Int Example%kotlin fun square(n: Int): Int = n * nKotlin ContextKotlin context is accessible via kc object bound to the interpreter. It holds vars and functions fields that return all user-defined variables and functions present in the interpreter.You can also print variables or functions by calling kc.showVars() or kc.showFunctions().Examplefun square(n: Int): Int = n * nval greeter = { s: String -&gt; println(&quot;Hello $s!&quot;) }val l = listOf(&quot;Drive&quot;, &quot;to&quot;, &quot;develop&quot;)kc.showVars()kc.showFunctions()Output:l: List&lt;String&gt; = [Drive, to, develop]greeter: (String) -&gt; Unit = (kotlin.String) -&gt; kotlin.Unitfun square(Int): Int",
+ "url": " /interpreter/kotlin.html",
+ "group": "interpreter",
+ "excerpt": "Kotlin is a cross-platform, statically typed, general-purpose programming language with type inference."
+ }
+ ,
+
+
+
+ "/interpreter/spark.html": {
+ "title": "Apache Spark Interpreter for Apache Zeppelin",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Spark Interpreter for Apache ZeppelinOverviewApache Spark is a fast and general-purpose cluster computing system.It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.Apache Spark is supported in Zeppelin with Spark interpreter group which consists of below six interpreters. Name Class Description %spark SparkInterpreter Creates a SparkContex
t/SparkSession and provides a Scala environment %spark.pyspark PySparkInterpreter Provides a Python environment %spark.ipyspark IPySparkInterpreter Provides a IPython environment %spark.r SparkRInterpreter Provides an R environment with SparkR support %spark.sql SparkSQLInterpreter Provides a SQL environment %spark.kotlin KotlinSparkInterpreter Provides a Kotlin environment ConfigurationThe Spark interpreter can be configured with properties provided by Zeppelin.You can also set other Spark properties which are not listed in the table. For a list of additional properties, refer to Spark Available Properties. Property Default Description SPARK_HOME Location of spark distribution spark.master local[*] Spark master uri. e.g. spark://masterhost:7077 spark.submit.deployMode The deploy mode of Spark driver program, either &quot;client&quot; or &quot;cluster&am
p;quot;, Which means to launch driver program locally (&quot;client&quot;) or remotely (&quot;cluster&quot;) on one of the nodes inside the cluster. spark.app.name Zeppelin The name of spark application. spark.driver.cores 1 Number of cores to use for the driver process, only in cluster mode. spark.driver.memory 1g Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in the same format as JVM memory strings with a size unit suffix (&quot;k&quot;, &quot;m&quot;, &quot;g&quot; or &quot;t&quot;) (e.g. 512m, 2g). spark.executor.cores 1 The number of cores to use on each executor spark.executor.memory 1g Executor memory per worker instance. e.g. 512m, 32g spark.executor.instances 2 The number of executors for static allocation spark.files Comma-separated list of files to be placed in the working directory of each exe
cutor. Globs are allowed. spark.jars Comma-separated list of jars to include on the driver and executor classpaths. Globs are allowed. spark.jars.packages Comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths. The coordinates should be groupId:artifactId:version. If spark.jars.ivySettings is given artifacts will be resolved according to the configuration in the file, otherwise artifacts will be searched for in the local maven repo, then maven central and finally any additional remote repositories given by the command-line option --repositories. PYSPARK_PYTHON python Python binary executable to use for PySpark in both driver and executors (default is python). Property spark.pyspark.python take precedence if it is set PYSPARK_DRIVER_PYTHON python Python binary executable to use for PySpark in driver only (default is PYSPARK_PYTHON). Property spark.pyspark.driver.pyt
hon take precedence if it is set zeppelin.pyspark.useIPython false Whether use IPython when the ipython prerequisites are met in %spark.pyspark zeppelin.R.cmd R R binary executable path. zeppelin.spark.concurrentSQL false Execute multiple SQL concurrently if set true. zeppelin.spark.concurrentSQL.max 10 Max number of SQL concurrently executed zeppelin.spark.maxResult 1000 Max number rows of Spark SQL result to display. zeppelin.spark.printREPLOutput true Print scala REPL output zeppelin.spark.useHiveContext true Use HiveContext instead of SQLContext if it is true. Enable hive for SparkSession zeppelin.spark.enableSupportedVersionCheck true Do not change - developer only setting, not for production use zeppelin.spark.sql.interpolation false Enable ZeppelinContext variable interpolation into spark sql zeppelin.spark.uiWebUrl Overrides Spark UI default
URL. Value should be a full URL (ex: http://{hostName}/{uniquePath}. In Kubernetes mode, value can be Jinja template string with 3 template variables &#39;PORT&#39;, &#39;SERVICENAME&#39; and &#39;SERVICEDOMAIN&#39;. (ex: http://-.) spark.webui.yarn.useProxy false whether use yarn proxy url as spark weburl, e.g. http://localhost:8088/proxy/application1583396598068_0004 spark.repl.target jvm-1.6 Manually specifying the Java version of Spark Interpreter Scala REPL,Available options: scala-compile v2.10.7 to v2.11.12 supports &quot;jvm-1.5, jvm-1.6, jvm-1.7 and jvm-1.8&quot;, and the default value is jvm-1.6. scala-compile v2.10.1 to v2.10.6 supports &quot;jvm-1.5, jvm-1.6, jvm-1.7&quot;, and the default value is jvm-1.6. scala-compile v2.12.x defaults to jvm-1.8, and only supports jvm-1.8. Without any configuration, Spark interpreter works out of box in local mode. But if you
want to connect to your Spark cluster, you&#39;ll need to follow below two simple steps.Export SPARK_HOMEThere are several options for setting SPARK_HOME.Set SPARK_HOME in zeppelin-env.shSet SPARK_HOME in Interpreter setting pageSet SPARK_HOME via inline generic configuration 1. Set SPARK_HOME in zeppelin-env.shIf you work with only one version of spark, then you can set SPARK_HOME in zeppelin-env.sh because any setting in zeppelin-env.sh is globally applied.e.g. export SPARK_HOME=/usr/lib/sparkYou can optionally set more environment variables in zeppelin-env.sh# set hadoop conf direxport HADOOP_CONF_DIR=/usr/lib/hadoop2. Set SPARK_HOME in Interpreter setting pageIf you want to use multiple versions of spark, then you need create multiple spark interpreters and set SPARK_HOME for each of them. e.g.Create a new spark interpreter spark24 for spark 2.4 and set SPARK_HOME in interpreter setting pageCreate a new spark interpreter spark16 for spark 1.6 and set SPARK_HOME in interprete
r setting page3. Set SPARK_HOME via inline generic configurationBesides setting SPARK_HOME in interpreter setting page, you can also use inline generic configuration to put the configuration with code together for more flexibility. e.g.Set master in Interpreter menuAfter starting Zeppelin, go to Interpreter menu and edit spark.master property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.For example,local[*] in local modespark://master:7077 in standalone clusteryarn-client in Yarn client mode (Not supported in spark 3.x, refer below for how to configure yarn-client in Spark 3.x)yarn-cluster in Yarn cluster mode (Not supported in spark 3.x, refer below for how to configure yarn-client in Spark 3.x)mesos://host:5050 in Mesos clusterThat&#39;s it. Zeppelin will work with any version of Spark and any deployment type without rebuilding Zeppelin in this way.For the further information about Spark &amp; Zeppelin version comp
atibility, please refer to the &quot;Available Interpreters&quot; section in the Zeppelin download page.Note that without exporting SPARK_HOME, it&#39;s running in local mode with the included version of Spark. The included version may vary depending on the build profile.Yarn client mode and local mode run the driver on the same machine as the zeppelin server, which is dangerous for production because it may run out of memory when there are many spark interpreters running at the same time. So we suggest you only allow yarn-cluster mode via setting zeppelin.spark.only_yarn_cluster in zeppelin-site.xml.Configure yarn mode for Spark 3.xSpecifying yarn-client &amp; yarn-cluster in spark.master is not supported in Spark 3.x any more; instead you need to use spark.master and spark.submit.deployMode together. Mode spark.master spark.submit.deployMode Yarn Client yarn client Yarn Cluster yarn cluster SparkContext, SQLContext, SparkSessi
on, ZeppelinContextSparkContext, SQLContext, SparkSession (for spark 2.x) and ZeppelinContext are automatically created and exposed as variable names sc, sqlContext, spark and z, respectively, in Scala, Kotlin, Python and R environments.Note that the Scala/Python/R environments share the same SparkContext, SQLContext, SparkSession and ZeppelinContext instance.YARN ModeZeppelin supports both yarn client and yarn cluster mode (yarn cluster mode is supported from 0.8.0). For yarn mode, you must specify SPARK_HOME &amp; HADOOP_CONF_DIR. Usually you only have one hadoop cluster, so you can set HADOOP_CONF_DIR in zeppelin-env.sh which is applied to all spark interpreters. If you want to use spark against multiple hadoop clusters, then you need to define HADOOP_CONF_DIR in interpreter setting or via inline generic configuration.Dependency ManagementFor spark interpreter, it is not recommended to use Zeppelin&#39;s Dependency Management for managing third party dependencies (%spark.dep is
removed from Zeppelin 0.9 as well). Instead you should set the standard Spark properties. Spark Property Spark Submit Argument Description spark.files --files Comma-separated list of files to be placed in the working directory of each executor. Globs are allowed. spark.jars --jars Comma-separated list of jars to include on the driver and executor classpaths. Globs are allowed. spark.jars.packages --packages Comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths. The coordinates should be groupId:artifactId:version. If spark.jars.ivySettings is given artifacts will be resolved according to the configuration in the file, otherwise artifacts will be searched for in the local maven repo, then maven central and finally any additional remote repositories given by the command-line option --repositories. You can either set Spark properties in interpreter setting page or set Spark submit arguments
in zeppelin-env.sh via environment variable SPARK_SUBMIT_OPTIONS. For example:export SPARK_SUBMIT_OPTIONS=&quot;--files &lt;my_file&gt; --jars &lt;my_jar&gt; --packages &lt;my_package&gt;&quot;However, setting them in SPARK_SUBMIT_OPTIONS is not recommended, because it is shared by all spark interpreters, which means you cannot set different dependencies for different users.PySparkThere are 2 ways to use PySpark in Zeppelin:Vanilla PySparkIPySparkVanilla PySpark (Not Recommended)Vanilla PySpark interpreter is almost the same as vanilla Python interpreter except Zeppelin injects SparkContext, SQLContext, SparkSession via variables sc, sqlContext, spark.By default, Zeppelin would use IPython in %spark.pyspark when IPython is available; otherwise it would fall back to the original PySpark implementation.If you don&#39;t want to use IPython, then you can set zeppelin.pyspark.useIPython to false in interpreter setting. For the IPython feature
s, refer to the Python Interpreter doc.IPySpark (Recommended)You can use IPySpark explicitly via %spark.ipyspark. IPySpark interpreter is almost the same as IPython interpreter except Zeppelin injects SparkContext, SQLContext, SparkSession via variables sc, sqlContext, spark.For the IPython features, refer to the Python Interpreter doc.SparkRZeppelin supports SparkR via %spark.r. Here&#39;s the configuration for SparkR Interpreter. Spark Property Default Description zeppelin.R.cmd R R binary executable path. zeppelin.R.knitr true Whether to use knitr or not. (It is recommended to install knitr and use it in Zeppelin) zeppelin.R.image.width 100% R plotting image width. zeppelin.R.render.options out.format = 'html', comment = NA, echo = FALSE, results = 'asis', message = F, warning = F, fig.retina = 2 R plotting options. SparkSqlSpark Sql Interpreter shares the same SparkContext/SparkSession with other Spark inte
rpreter. That means any table registered in scala, python or r code can be accessed by Spark Sql.For examples:%sparkcase class People(name: String, age: Int)var df = spark.createDataFrame(List(People(&quot;jeff&quot;, 23), People(&quot;andy&quot;, 20)))df.createOrReplaceTempView(&quot;people&quot;)%spark.sqlselect * from peopleBy default, each sql statement would run sequentially in %spark.sql. But you can run them concurrently by following setup.Set zeppelin.spark.concurrentSQL to true to enable the sql concurrent feature, underneath zeppelin will change to use fairscheduler for spark. And also set zeppelin.spark.concurrentSQL.max to control the max number of sql statements running concurrently.Configure pools by creating fairscheduler.xml under your SPARK_CONF_DIR, check the official spark doc Configuring Pool PropertiesSet pool property via setting paragraph property. e.g.%spark(pool=pool1)sql statementThis pool feature is also available for all versions o
f scala Spark, PySpark. For SparkR, it is only available starting from 2.3.0.Interpreter Setting OptionYou can choose one of shared, scoped and isolated options when you configure Spark interpreter.e.g. In scoped per user mode, Zeppelin creates separated Scala compiler for each user but share a single SparkContext.In isolated per user mode, Zeppelin creates separated SparkContext for each user.ZeppelinContextZeppelin automatically injects ZeppelinContext as variable z in your Scala/Python environment. ZeppelinContext provides some additional functions and utilities.See Zeppelin-Context for more details.Setting up Zeppelin with KerberosLogical setup with Zeppelin, Kerberos Key Distribution Center (KDC), and Spark on YARN:There&#39;re several ways to make spark work with kerberos enabled hadoop cluster in Zeppelin. Share one single hadoop cluster.In this case you just need to specify zeppelin.server.kerberos.keytab and zeppelin.server.kerberos.principal in zeppelin-site.xml, Spark
interpreter will use these settings by default.Work with multiple hadoop clusters.In this case you can specify spark.yarn.keytab and spark.yarn.principal to override zeppelin.server.kerberos.keytab and zeppelin.server.kerberos.principal.User ImpersonationIn yarn mode, the user who launches the zeppelin server will be used to launch the spark yarn application. This is not a good practice.Most of the time, you will enable shiro in Zeppelin and would like to use the login user to submit the spark yarn app. For this purpose, you need to enable user impersonation for more security control. In order to enable user impersonation, follow these steps:Step 1 Enable user impersonation in hadoop&#39;s core-site.xml. E.g. if you are using user zeppelin to launch Zeppelin, then add the following to core-site.xml, then restart both hdfs and yarn. &lt;property&gt; &lt;name&gt;hadoop.proxyuser.zeppelin.groups&lt;/name&gt; &lt;value&gt;*&lt;/val
ue&gt;&lt;/property&gt;&lt;property&gt; &lt;name&gt;hadoop.proxyuser.zeppelin.hosts&lt;/name&gt; &lt;value&gt;*&lt;/value&gt;&lt;/property&gt;Step 2 Enable interpreter user impersonation in Spark interpreter&#39;s interpreter setting. (Enable shiro first of course)Step 3(Optional) If you are using a kerberos cluster, then you need to set zeppelin.server.kerberos.keytab and zeppelin.server.kerberos.principal to the user (i.e. the user in Step 1) you want to impersonate in zeppelin-site.xml.Deprecate Spark 2.2 and earlier versionsStarting from 0.9, Zeppelin deprecates Spark 2.2 and earlier versions. So you will see a warning message when you use Spark 2.2 and earlier.You can get rid of this message by setting zeppelin.spark.deprecatedMsg.show to false.Configuration SetupOn the server where Zeppelin is installed, install Kerberos client modules and configuration, krb5.conf.This lets the server communicate with the KDC.Add the t
wo properties below to Spark configuration ([SPARK_HOME]/conf/spark-defaults.conf):spark.yarn.principalspark.yarn.keytabNOTE: If you do not have permission to access the above spark-defaults.conf file, optionally, you can add the above lines to the Spark Interpreter setting through the Interpreter tab in the Zeppelin UI.That&#39;s it. Play with Zeppelin!",
+ "url": " /interpreter/spark.html",
+ "group": "interpreter",
+ "excerpt": "Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs."
+ }
+ ,
+
+
+
+ "/interpreter/python.html": {
+ "title": "Python 2 & 3 Interpreter for Apache Zeppelin",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Python 2 &amp; 3 Interpreter for Apache ZeppelinOverviewZeppelin supports python language which is very popular in data analytics and machine learning. Name Class Description %python PythonInterpreter Vanilla python interpreter, with least dependencies, only python environment installed is required %python.ipython IPythonInterpreter Provide more fancy python runtime via IPython, almost the s
ame experience like Jupyter. It requires more things, but is the recommended interpreter for using python in Zeppelin, see below %python.sql PythonInterpreterPandasSql Provide sql capability to query data in Pandas DataFrame via pandasql Configuration Property Default Description zeppelin.python python Path of the installed Python binary (could be python2 or python3). You should set this property explicitly if python is not in your $PATH(example: /usr/bin/python). zeppelin.python.maxResult 1000 Max number of dataframe rows to display. zeppelin.python.useIPython true When this property is true, %python would be delegated to %python.ipython if IPython is available, otherwise IPython is only used in %python.ipython. Vanilla Python Interpreter (%python)The vanilla python interpreter provides basic python interpreter feature, only python installed is required.Matplotlib integrationThe vanilla python interprete
r can display matplotlib figures inline automatically using the matplotlib:%pythonimport matplotlib.pyplot as pltplt.plot([1, 2, 3])The output of this command will by default be converted to HTML by implicitly making use of the %html magic. Additional configuration can be achieved using the builtin z.configure_mpl() method. For example, z.configure_mpl(width=400, height=300, fmt=&#39;svg&#39;)plt.plot([1, 2, 3])Will produce a 400x300 image in SVG format, which by default are normally 600x400 and PNG respectively. In the future, another option called angular can be used to make it possible to update a plot produced from one paragraph directly from another (the output will be %angular instead of %html). However, this feature is already available in the pyspark interpreter. More details can be found in the included &quot;Zeppelin Tutorial: Python - matplotlib basic&quot; tutorial notebook. If Zeppelin cannot find the matplotlib backend files (which should usually be fou
nd in $ZEPPELIN_HOME/interpreter/lib/python) in your PYTHONPATH, then the backend will automatically be set to agg, and the (otherwise deprecated) instructions below can be used for more limited inline plotting.If you are unable to load the inline backend, use z.show(plt):%pythonimport matplotlib.pyplot as pltplt.figure()(.. ..)z.show(plt)plt.close()The z.show() function can take optional parameters to adapt graph dimensions (width and height) as well as output format (png or optionally svg).%pythonz.show(plt, width=&#39;50px&#39;)z.show(plt, height=&#39;150px&#39;, fmt=&#39;svg&#39;)IPython Interpreter (%python.ipython) (recommended)IPython is more powerful than the vanilla python interpreter with extra functionality. You can use IPython with Python2 or Python3 which depends on which python you set in zeppelin.python.For non-anaconda environment Prerequisites- Jupyter `pip install jupyter`- grpcio `pip install grpcio`- protobuf `pip install protobuf`For anac
onda environment (zeppelin.python points to the python under anaconda)Prerequisites- grpcio `pip install grpcio`- protobuf `pip install protobuf`In addition to all the basic functions of the vanilla python interpreter, you can use all the IPython advanced features as you would in Jupyter Notebook.e.g. Use IPython magic%python.ipython#python helprange?#timeit%timeit range(100)Use matplotlib%python.ipython%matplotlib inlineimport matplotlib.pyplot as pltprint(&quot;hello world&quot;)data=[1,2,3,4]plt.figure()plt.plot(data)Colored text outputMore types of visualizatione.g. IPython supports hvplotBetter code completionBy default, Zeppelin would use IPython in %python if IPython prerequisites are met, otherwise it would use vanilla Python interpreter in %python.If you don&#39;t want to use IPython via %python, then you can set zeppelin.python.useIPython to false in interpreter setting.Pandas integrationApache Zeppelin Table Display System provides built-in data visualizatio
n capabilities. Python interpreter leverages it to visualize Pandas DataFrames through a similar z.show() API, same as with Matplotlib integration.Example:%pythonimport pandas as pdrates = pd.read_csv(&quot;bank.csv&quot;, sep=&quot;;&quot;)z.show(rates)SQL over Pandas DataFramesThere is a convenience %python.sql interpreter that matches Apache Spark experience in Zeppelin and enables usage of SQL language to query Pandas DataFrames and visualization of results through the built-in Table Display System.PrerequisitesPandas pip install pandasPandaSQL pip install -U pandasqlHere&#39;s one example:first paragraph%pythonimport pandas as pdrates = pd.read_csv(&quot;bank.csv&quot;, sep=&quot;;&quot;)next paragraph%python.sqlSELECT * FROM rates WHERE age &lt; 40Using Zeppelin Dynamic FormsYou can leverage Zeppelin Dynamic Form inside your Python code.Example: %python### Input formprint(z.input(&quot;f1&quot;,&quot;defaultValue&quot;
))### Select formprint(z.select(&quot;f2&quot;,[(&quot;o1&quot;,&quot;1&quot;),(&quot;o2&quot;,&quot;2&quot;)],&quot;o1&quot;))### Checkbox formprint(&quot;&quot;.join(z.checkbox(&quot;f3&quot;, [(&quot;o1&quot;,&quot;1&quot;), (&quot;o2&quot;,&quot;2&quot;)],[&quot;o1&quot;])))ZeppelinContext APIThe Python interpreter creates a variable z which represents ZeppelinContext for you. Users can use it to do more advanced and complex things in Zeppelin. API Description z.put(key, value) Put object value with identifier key to distributed resource pool of Zeppelin, so that it can be used by other interpreters z.get(key) Get object with identifier key from distributed resource pool of Zeppelin z.remove(key) Remove object with identifier key from distributed resource pool of Zeppelin z.getAsDataFrame(key) Get object with identifier key from distri
buted resource pool of Zeppelin and converted into pandas dataframe. The object in the distributed resource pool must be table type, e.g. jdbc interpreter result. z.angular(name, noteId = None, paragraphId = None) Get the angular object with identifier name z.angularBind(name, value, noteId = None, paragraphId = None) Bind value to angular object with identifier name z.angularUnbind(name, noteId = None) Unbind value from angular object with identifier name z.show(p) Show python object p in Zeppelin, if it is pandas dataframe, it would be displayed in Zeppelin's table format, others will be converted to string z.textbox(name, defaultValue="") Create dynamic form Textbox name with defaultValue z.select(name, options, defaultValue="") Create dynamic form Select name with options and defaultValue. options should be a list of Tuple(first element is key, the second element is the displayed
value) e.g. z.select("f2",[("o1","1"),("o2","2")],"o1") z.checkbox(name, options, defaultChecked=[]) Create dynamic form Checkbox `name` with options and defaultChecked. options should be a list of Tuple(first element is key, the second element is the displayed value) e.g. z.checkbox("f3", [("o1","1"), ("o2","2")],["o1"]) z.noteTextbox(name, defaultValue="") Create note level dynamic form Textbox z.noteSelect(name, options, defaultValue="") Create note level dynamic form Select z.noteCheckbox(name, options, defaultChecked=[]) Create note level dynamic form Checkbox z.run(paragraphId) Run paragraph z.run(noteId, paragraphId) Run paragraph z.runNote(noteId) Run the whole note Python environmentsDefaultBy default, PythonInterpreter will use python command defined in zeppeli
n.python property to run python process.The interpreter can use all modules already installed (with pip, easy_install...)CondaConda is a package management system and environment management system for python.%python.conda interpreter lets you change between environments.Usageget the Conda Information: %python.conda infolist the Conda environments: %python.conda env listcreate a conda environment: %python.conda create --name [ENV NAME]activate an environment (python interpreter will be restarted): %python.conda activate [ENV NAME]deactivate%python.conda deactivateget installed package list inside the current environment%python.conda listinstall package%python.conda install [PACKAGE NAME]uninstall package%python.conda uninstall [PACKAGE NAME]Docker%python.docker interpreter allows PythonInterpreter to create a python process in a specified docker container.Usageactivate an environment%python.docker activate [Repository]%python.docker activate [Repository:Tag]%python.docker activate [Imag
e Id]deactivate%python.docker deactivateHere is an example# activate latest tensorflow image as a python environment%python.docker activate gcr.io/tensorflow/tensorflow:latestTechnical descriptionFor in-depth technical details on current implementation please refer to python/README.md.Some features not yet implemented in the vanilla Python interpreterInterrupting a paragraph execution (cancel() method) is currently only supported on Linux and macOS. If the interpreter runs on another operating system (for instance MS Windows), interrupting a paragraph will close the whole interpreter. A JIRA ticket (ZEPPELIN-893) is open to implement this feature in a future release of the interpreter.Progress bar in webUI (getProgress() method) is currently not implemented.",
+ "url": " /interpreter/python.html",
+ "group": "interpreter",
+ "excerpt": "Python is a programming language that lets you work quickly and integrate systems more effectively."
+ }
+ ,
+
+
+
+ "/interpreter/hive.html": {
+ "title": "Hive Interpreter for Apache Zeppelin",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Hive Interpreter for Apache ZeppelinImportant NoticeHive Interpreter will be deprecated and merged into JDBC Interpreter. You can use Hive Interpreter by using JDBC Interpreter with same functionality. See the example below of settings and dependencies.Properties Property Value hive.driver org.apache.hive.jdbc.HiveDriver hive.url jdbc:hive2://localhost:10000 hive.user hiveUser hive.passw
ord hivePassword Dependencies Artifact Exclude org.apache.hive:hive-jdbc:0.14.0 org.apache.hadoop:hadoop-common:2.6.0 Configuration Property Default Description default.driver org.apache.hive.jdbc.HiveDriver Class path of JDBC driver default.url jdbc:hive2://localhost:10000 Url for connection default.user ( Optional ) Username of the connection default.password ( Optional ) Password of the connection default.xxx ( Optional ) Other properties used by the driver ${prefix}.driver Driver class path of %hive(${prefix}) ${prefix}.url Url of %hive(${prefix}) ${prefix}.user ( Optional ) Username of the connection of %hive(${prefix}) ${prefix}.password ( Optional ) Password of the connection of %hive(${prefix}) ${prefix}.xxx ( Optional ) Other properties used by the driver of %hive(${prefix}) This interpr
eter provides multiple configurations with ${prefix}. Users can set multiple connection properties with this prefix. It can be used like %hive(${prefix}).OverviewThe Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.How to useBasically, you can use%hiveselect * from my_table;or%hive(etl)-- &#39;etl&#39; is a ${prefix}select * from my_table;You can also run multiple queries, up to 10 by default. Changing these settings is not implemented yet.Apply Zeppelin Dynamic FormsYou can leverage Zeppelin Dynamic Form inside your queries. You can use both the text input and select form parameterization fea
tures.%hiveSELECT ${group_by}, count(*) as countFROM retail_demo.order_lineitems_pxfGROUP BY ${group_by=product_id,product_id|product_name|customer_id|store_id}ORDER BY count ${order=DESC,DESC|ASC}LIMIT ${limit=10};",
+ "url": " /interpreter/hive.html",
+ "group": "interpreter",
+ "excerpt": "Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this..."
+ }
+ ,
+
+
+
+ "/interpreter/influxdb.html": {
+ "title": "InfluxDB Interpreter for Apache Zeppelin",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->InfluxDB Interpreter for Apache ZeppelinOverviewInfluxDB is an open-source time series database (TSDB) developed by InfluxData. It is written in Go and optimized for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring, application metrics, Internet of Things sensor data, and real-time analytics.This interpreter allows to perform queries in Flux Language in Zeppelin Notebook.Not
esThis interpreter is compatible with InfluxDB 1.8+ and InfluxDB 2.0+ (v2 API, Flux language)Code complete and syntax highlighting is not supported for nowExample notebookConfiguration Property Default Value influxdb.url http://localhost:9999 InfluxDB API connection url influxdb.org my-org organization name, Organizations are supported in InfluxDB 2.0+, use "-" as org for InfluxDB 1.8 influxdb.token my-token authorization token for InfluxDB API, token are supported in InfluxDB 2.0+, for InfluxDB 1.8 use 'username:password' as a token. influxdb.logLevel NONE InfluxDB client library verbosity level (for debugging purpose) Example configurationOverviewHow to useBasically, you can use%influxdbfrom(bucket: &quot;my-bucket&quot;) |&gt; range(start: -1h) |&gt; filter(fn: (r) =&gt; r._measurement == &quot;cpu&quot;) |&gt; filter(fn: (r) =&gt; r.cpu == &quot;cpu-total&
amp;quot;) |&gt; pivot(rowKey:[&quot;_time&quot;], columnKey: [&quot;_field&quot;], valueColumn: &quot;_value&quot;)In this example we use data collected by the [[inputs.cpu]] Telegraf input plugin. The result of a Flux command can contain one or more tables. In the case of multiple tables, each table is rendered as a separate %table structure. This example uses the pivot function to collect values from multiple tables into a single table. How to run InfluxDB 2.0 using dockerdocker pull quay.io/influxdb/influxdb:nightlydocker run --name influxdb -p 9999:9999 quay.io/influxdb/influxdb:nightly## Post onBoarding request, to setup initial user (my-user@my-password), org (my-org) and bucketSetup (my-bucket)&quot;curl -i -X POST http://localhost:9999/api/v2/setup -H &#39;accept: application/json&#39; -d &#39;{ &quot;username&quot;: &quot;my-user&quot;, &quot;password&quot;: &quot;my-password&q
uot;, &quot;org&quot;: &quot;my-org&quot;, &quot;bucket&quot;: &quot;my-bucket&quot;, &quot;token&quot;: &quot;my-token&quot; }&#39;",
+ "url": " /interpreter/influxdb.html",
+ "group": "interpreter",
+ "excerpt": "InfluxDB is an open-source time series database designed to handle high write and query loads."
+ }
+ ,
+
+
+
+ "/interpreter/ignite.html": {
+ "title": "Ignite Interpreter for Apache Zeppelin",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Ignite Interpreter for Apache ZeppelinOverviewApache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.You can use Zeppelin to retrieve distributed data from cache using Ignite SQL interpreter. Moreover, Ignite interpreter allo
ws you to execute any Scala code in cases when SQL doesn&#39;t fit to your requirements. For example, you can populate data into your caches or execute distributed computations.Installing and Running Ignite exampleIn order to use Ignite interpreters, you may install Apache Ignite in some simple steps:Ignite provides examples only with source or binary release. Download Ignite source release or binary release whatever you want. But you must download Ignite as the same version of Zeppelin&#39;s. If it is not, you can&#39;t use scala code on Zeppelin. The supported Ignite version is specified in Supported Interpreter table for each Zeppelin release. If you&#39;re using Zeppelin master branch, please see ignite.version in path/to/your-Zeppelin/ignite/pom.xml.Examples are shipped as a separate Maven project, so to start running you simply need to import provided &lt;dest_dir&gt;/apache-ignite-fabric-{version}-bin/examples/pom.xml file into your favourite IDE, such
as Eclipse.In case of Eclipse, Eclipse -&gt; File -&gt; Import -&gt; Existing Maven ProjectsSet examples directory path to Eclipse and select the pom.xml.Then start org.apache.ignite.examples.ExampleNodeStartup (or whatever you want) to run one or more Ignite nodes. When you run example code, you may notice that the number of nodes increases one by one.Tip. If you want to run Ignite examples from the CLI rather than an IDE, you can export an executable JAR file from the IDE. Then run it using the command below.nohup java -jar &lt;/path/to/your Jar file name&gt;Configuring Ignite InterpreterAt the &quot;Interpreters&quot; menu, you may edit the Ignite interpreter or create a new one. Zeppelin provides these properties for Ignite. Property Name value Description ignite.addresses 127.0.0.1:47500..47509 Comma-separated list of Ignite cluster hosts. See Ignite Cluster Configuration section for more details. ignite.clientMode true You can con
nect to the Ignite cluster as client or server node. See Ignite Clients vs. Servers section for details. Use true or false values in order to connect in client or server mode respectively. ignite.config.url Configuration URL. Overrides all other settings. ignite.jdbc.url jdbc:ignite:cfg://default-ignite-jdbc.xml Ignite JDBC connection URL. ignite.peerClassLoadingEnabled true Enables peer-class-loading. See Zero Deployment section for details. Use true or false values in order to enable or disable P2P class loading respectively. How to useAfter configuring Ignite interpreter, create your own notebook. Then you can bind interpreters like below image.For more interpreter binding information see here.Ignite SQL interpreterIn order to execute SQL query, use %ignite.ignitesql prefix. Supposing you are running org.apache.ignite.examples.streaming.wordcount.StreamWords, then you can use &quot;words&quot; cache( Of course you have to specify t
his cache name to the Ignite interpreter setting section ignite.jdbc.url of Zeppelin ).For example, you can select the top 10 words in the words cache using the following query%ignite.ignitesqlselect _val, count(_val) as cnt from String group by _val order by cnt desc limit 10As long as your Ignite version and the Zeppelin Ignite version are the same, you can also use scala code. Please check the Zeppelin Ignite version before you download your own Ignite.%igniteimport org.apache.ignite._import org.apache.ignite.cache.affinity._import org.apache.ignite.cache.query._import org.apache.ignite.configuration._import scala.collection.JavaConversions._val cache: IgniteCache[AffinityUuid, String] = ignite.cache(&quot;words&quot;)val qry = new SqlFieldsQuery(&quot;select avg(cnt), min(cnt), max(cnt) from (select count(_val) as cnt from String group by _val)&quot;, true)val res = cache.query(qry).getAll()collectionAsScalaIterable(res).foreach(println _)Apache Ignite also provides a guide d
ocs for Zeppelin &quot;Ignite with Apache Zeppelin&quot;",
+ "url": " /interpreter/ignite.html",
+ "group": "interpreter",
+ "excerpt": "Apache Ignite in-memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies."
+ }
+ ,
+
+
+
+ "/interpreter/groovy.html": {
+ "title": "Apache Groovy Interpreter for Apache Zeppelin",
[... 976 lines stripped ...]
Propchange: zeppelin/site/docs/0.9.0/search_data.json
------------------------------------------------------------------------------
svn:executable = *