Posted to commits@zeppelin.apache.org by mo...@apache.org on 2016/10/11 07:28:10 UTC
svn commit: r1764207 [2/2] - in /zeppelin/site/docs/0.7.0-SNAPSHOT: ./
assets/themes/zeppelin/img/docs-img/ development/ displaysystem/ install/
interpreter/ manual/ quickstart/ rest-api/ security/ storage/
Modified: zeppelin/site/docs/0.7.0-SNAPSHOT/search.html
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.7.0-SNAPSHOT/search.html?rev=1764207&r1=1764206&r2=1764207&view=diff
==============================================================================
--- zeppelin/site/docs/0.7.0-SNAPSHOT/search.html (original)
+++ zeppelin/site/docs/0.7.0-SNAPSHOT/search.html Tue Oct 11 07:28:09 2016
@@ -80,6 +80,7 @@
<li role="separator" class="divider"></li>
<li class="title"><span><b>More</b><span></li>
<li><a href="/docs/0.7.0-SNAPSHOT/install/upgrade.html">Upgrade Zeppelin Version</a></li>
+ <li><a href="/docs/0.7.0-SNAPSHOT/quickstart/install_with_flink_and_spark_cluster.html">Install Zeppelin with Flink and Spark Clusters Tutorial</a></li>
</ul>
</li>
<li>
Modified: zeppelin/site/docs/0.7.0-SNAPSHOT/search_data.json
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.7.0-SNAPSHOT/search_data.json?rev=1764207&r1=1764206&r2=1764207&view=diff
==============================================================================
--- zeppelin/site/docs/0.7.0-SNAPSHOT/search_data.json (original)
+++ zeppelin/site/docs/0.7.0-SNAPSHOT/search_data.json Tue Oct 11 07:28:09 2016
@@ -94,7 +94,7 @@
"/install/install.html": {
"title": "Quick Start",
- "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Quick StartWelcome to Apache Zeppelin! On this page are instructions to help you get started.InstallationApache Zeppelin officially supports and is tested on the following environments: Name Value Oracle JDK 1.7 (set JAVA_HOME) OS Mac OSX Ubuntu 14.X CentOS 6.X Windows 7 Pro SP1 To install Apache Zeppelin, you have two options:You can download pre-built binary packages from the archive. This is usua
lly easier than building from source, and you can download the latest stable version (or older versions, if necessary).You can also build from source. This gives you a development version of Zeppelin, which is more unstable but has new features.Downloading Binary PackageStable binary packages are available on the Apache Zeppelin Download Page. You can download a default package with all interpreters, or you can download the net-install package, which lets you choose which interpreters to install.If you downloaded the default package, just unpack it in a directory of your choice and you&#39;re ready to go. If you downloaded the net-install package, you should manually install additional interpreters first. You can also install everything by running ./bin/install-interpreter.sh --all.After unpacking, jump to the Starting Apache Zeppelin with Command Line.Building from SourceIf you want to build from source, you must first install the following dependencies: Name Value
Git (Any Version) Maven 3.1.x or higher If you haven&#39;t installed Git and Maven yet, check the Before Build section and follow the step by step instructions from there.1. Clone the Apache Zeppelin repositorygit clone https://github.com/apache/zeppelin.git2. Build source with optionsEach interpreter requires different build options. For more information about build options, please see the Build section.mvn clean package -DskipTests [Options]Here are some examples with several options:# build with spark-2.0, scala-2.11./dev/change_scala_version.sh 2.11mvn clean package -Pspark-2.0 -Phadoop-2.4 -Pyarn -Ppyspark -Psparkr -Pscala-2.11# build with spark-1.6, scala-2.10mvn clean package -Pspark-1.6 -Phadoop-2.4 -Pyarn -Ppyspark -Psparkr# spark-cassandra integrationmvn clean package -Pcassandra-spark-1.5 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests# with CDHmvn clean package -Pspark-1.5 -Dhadoop.version=2.6.0-cdh5.5.0 -Phadoop-2.6 -Pvendor-repo -DskipTests# with M
apRmvn clean package -Pspark-1.5 -Pmapr50 -DskipTestsFor further information about building from source, please see README.md in the Zeppelin repository.Starting Apache Zeppelin from the Command LineStarting Apache ZeppelinOn all platforms except for Windows:bin/zeppelin-daemon.sh startIf you are using Windows:binzeppelin.cmdAfter Zeppelin has started successfully, go to http://localhost:8080 with your web browser.Stopping Zeppelinbin/zeppelin-daemon.sh stop(Optional) Start Apache Zeppelin with a service managerNote : The below description was written based on Ubuntu Linux.Apache Zeppelin can be auto-started as a service with an init script, using a service manager like upstart.This is an example upstart script saved as /etc/init/zeppelin.confThis allows the service to be managed with commands such assudo service zeppelin start sudo service zeppelin stop sudo service zeppelin restartOther service managers could use a similar approach with the upstart argument passed to the zeppeli
n-daemon.sh script.bin/zeppelin-daemon.sh upstartzeppelin.confdescription &quot;zeppelin&quot;start on (local-filesystems and net-device-up IFACE!=lo)stop on shutdown# Respawn the process on unexpected terminationrespawn# respawn the job up to 7 times within a 5 second period.# If the job exceeds these values, it will be stopped and marked as failed.respawn limit 7 5# zeppelin was installed in /usr/share/zeppelin in this examplechdir /usr/share/zeppelinexec bin/zeppelin-daemon.sh upstartNext Steps:Congratulations, you have successfully installed Apache Zeppelin! Here are two next steps you might find useful:If you are new to Apache Zeppelin...For an in-depth overview of the Apache Zeppelin UI, head to Explore Apache Zeppelin UI.After getting familiar with the Apache Zeppelin UI, have fun with a short walk-through Tutorial that uses the Apache Spark backend.If you need more configuration for Apache Zeppelin, jump to the next section: Apache Zeppelin Configuration.If you need
more information about Spark or JDBC interpreter settings...Apache Zeppelin provides deep integration with Apache Spark. For more informtation, see Spark Interpreter for Apache Zeppelin. You can also use generic JDBC connections in Apache Zeppelin. Go to Generic JDBC Interpreter for Apache Zeppelin.If you are in a multi-user environment...You can set permissions for your notebooks and secure data resource in a multi-user environment. Go to More -&gt; Security section.Apache Zeppelin ConfigurationYou can configure Apache Zeppelin with either environment variables in conf/zeppelin-env.sh (confzeppelin-env.cmd for Windows) or Java properties in conf/zeppelin-site.xml. If both are defined, then the environment variables will take priority. zeppelin-env.sh zeppelin-site.xml Default value Description ZEPPELIN_PORT zeppelin.server.port 8080 Zeppelin server port ZEPPELIN_MEM N/A -Xmx1024m -XX:MaxPermSize=512m JVM mem options ZEPPELIN_
INTP_MEM N/A ZEPPELIN_MEM JVM mem options for interpreter process ZEPPELIN_JAVA_OPTS N/A JVM options ZEPPELIN_ALLOWED_ORIGINS zeppelin.server.allowed.origins * Enables a way to specify a ',' separated list of allowed origins for REST and websockets. i.e. http://localhost:8080 N/A zeppelin.anonymous.allowed true The anonymous user is allowed by default. ZEPPELIN_SERVER_CONTEXT_PATH zeppelin.server.context.path / Context path of the web application ZEPPELIN_SSL zeppelin.ssl false ZEPPELIN_SSL_CLIENT_AUTH zeppelin.ssl.client.auth false ZEPPELIN_SSL_KEYSTORE_PATH zeppelin.ssl.keystore.path keystore ZEPPELIN_SSL_KEYSTORE_TYPE zeppelin.ssl.keystore.type JKS ZEPPELIN_SSL_KEYSTORE_PASSWORD zeppelin.ssl.keystore.password ZEPPELIN_SSL_KEY_MANAGER_PASSWORD zeppelin.ssl.key.manager.password ZEPPELIN_S
SL_TRUSTSTORE_PATH zeppelin.ssl.truststore.path ZEPPELIN_SSL_TRUSTSTORE_TYPE zeppelin.ssl.truststore.type ZEPPELIN_SSL_TRUSTSTORE_PASSWORD zeppelin.ssl.truststore.password ZEPPELIN_NOTEBOOK_HOMESCREEN zeppelin.notebook.homescreen Display notebook IDs on the Apache Zeppelin homescreen i.e. 2A94M5J1Z ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE zeppelin.notebook.homescreen.hide false Hide the notebook ID set by ZEPPELIN_NOTEBOOK_HOMESCREEN on the Apache Zeppelin homescreen. For the further information, please read Customize your Zeppelin homepage. ZEPPELIN_WAR_TEMPDIR zeppelin.war.tempdir webapps Location of the jetty temporary directory ZEPPELIN_NOTEBOOK_DIR zeppelin.notebook.dir notebook The root directory where notebook directories are saved ZEPPELIN_NOTEBOOK_S3_BUCKET zeppelin.notebook.s3.bucket zeppelin S3 Bucket where notebook files will be saved ZEPPELIN_N
OTEBOOK_S3_USER zeppelin.notebook.s3.user user User name of an S3 bucketi.e. bucket/user/notebook/2A94M5J1Z/note.json ZEPPELIN_NOTEBOOK_S3_ENDPOINT zeppelin.notebook.s3.endpoint s3.amazonaws.com Endpoint for the bucket ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID zeppelin.notebook.s3.kmsKeyID AWS KMS Key ID to use for encrypting data in S3 (optional) ZEPPELIN_NOTEBOOK_S3_EMP zeppelin.notebook.s3.encryptionMaterialsProvider Class name of a custom S3 encryption materials provider implementation to use for encrypting data in S3 (optional) ZEPPELIN_NOTEBOOK_AZURE_CONNECTION_STRING zeppelin.notebook.azure.connectionString The Azure storage account connection stringi.e. DefaultEndpointsProtocol=https;AccountName=&lt;accountName&gt;;AccountKey=&lt;accountKey&gt; ZEPPELIN_NOTEBOOK_AZURE_SHARE zeppelin.notebook.azure.share zeppelin Azure Share where the notebook files will be saved ZEPPELIN_
NOTEBOOK_AZURE_USER zeppelin.notebook.azure.user user Optional user name of an Azure file sharei.e. share/user/notebook/2A94M5J1Z/note.json ZEPPELIN_NOTEBOOK_STORAGE zeppelin.notebook.storage org.apache.zeppelin.notebook.repo.VFSNotebookRepo Comma separated list of notebook storage locations ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC zeppelin.notebook.one.way.sync false If there are multiple notebook storage locations, should we treat the first one as the only source of truth? ZEPPELIN_INTERPRETERS zeppelin.interpreters org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.shell.ShellInterpreter, ... Comma separated interpreter configurations [Class] NOTE: This property is deprecated since Zeppelin-0.6.0 and will not be supported from Zeppeli
n-0.7.0 on. ZEPPELIN_INTERPRETER_DIR zeppelin.interpreter.dir interpreter Interpreter directory ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE zeppelin.websocket.max.text.message.size 1024000 Size (in characters) of the maximum text message that can be received by websocket. ",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Quick StartWelcome to Apache Zeppelin! On this page are instructions to help you get started.InstallationApache Zeppelin officially supports and is tested on the following environments: Name Value Oracle JDK 1.7 (set JAVA_HOME) OS Mac OSX Ubuntu 14.X CentOS 6.X Windows 7 Pro SP1 To install Apache Zeppelin, you have two options:You can download pre-built binary packages from the archive. This is usua
lly easier than building from source, and you can download the latest stable version (or older versions, if necessary).You can also build from source. This gives you a development version of Zeppelin, which is more unstable but has new features.Downloading Binary PackageStable binary packages are available on the Apache Zeppelin Download Page. You can download a default package with all interpreters, or you can download the net-install package, which lets you choose which interpreters to install.If you downloaded the default package, just unpack it in a directory of your choice and you&#39;re ready to go. If you downloaded the net-install package, you should manually install additional interpreters first. You can also install everything by running ./bin/install-interpreter.sh --all.After unpacking, jump to the Starting Apache Zeppelin with Command Line.Building from SourceIf you want to build from source, you must first install the following dependencies: Name Value
Git (Any Version) Maven 3.1.x or higher If you haven&#39;t installed Git and Maven yet, check the Before Build section and follow the step by step instructions from there.1. Clone the Apache Zeppelin repositorygit clone https://github.com/apache/zeppelin.git2. Build source with optionsEach interpreter requires different build options. For more information about build options, please see the Build section.mvn clean package -DskipTests [Options]Here are some examples with several options:# build with spark-2.0, scala-2.11./dev/change_scala_version.sh 2.11mvn clean package -Pspark-2.0 -Phadoop-2.4 -Pyarn -Ppyspark -Psparkr -Pscala-2.11# build with spark-1.6, scala-2.10mvn clean package -Pspark-1.6 -Phadoop-2.4 -Pyarn -Ppyspark -Psparkr# spark-cassandra integrationmvn clean package -Pcassandra-spark-1.5 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests# with CDHmvn clean package -Pspark-1.5 -Dhadoop.version=2.6.0-cdh5.5.0 -Phadoop-2.6 -Pvendor-repo -DskipTests# with M
apRmvn clean package -Pspark-1.5 -Pmapr50 -DskipTestsFor further information about building from source, please see README.md in the Zeppelin repository.Starting Apache Zeppelin from the Command LineStarting Apache ZeppelinOn all platforms except for Windows:bin/zeppelin-daemon.sh startIf you are using Windows:binzeppelin.cmdAfter Zeppelin has started successfully, go to http://localhost:8080 with your web browser.Stopping Zeppelinbin/zeppelin-daemon.sh stop(Optional) Start Apache Zeppelin with a service managerNote : The below description was written based on Ubuntu Linux.Apache Zeppelin can be auto-started as a service with an init script, using a service manager like upstart.This is an example upstart script saved as /etc/init/zeppelin.confThis allows the service to be managed with commands such assudo service zeppelin start sudo service zeppelin stop sudo service zeppelin restartOther service managers could use a similar approach with the upstart argument passed to the zeppeli
n-daemon.sh script.bin/zeppelin-daemon.sh upstartzeppelin.confdescription &quot;zeppelin&quot;start on (local-filesystems and net-device-up IFACE!=lo)stop on shutdown# Respawn the process on unexpected terminationrespawn# respawn the job up to 7 times within a 5 second period.# If the job exceeds these values, it will be stopped and marked as failed.respawn limit 7 5# zeppelin was installed in /usr/share/zeppelin in this examplechdir /usr/share/zeppelinexec bin/zeppelin-daemon.sh upstartNext Steps:Congratulations, you have successfully installed Apache Zeppelin! Here are two next steps you might find useful:If you are new to Apache Zeppelin...For an in-depth overview of the Apache Zeppelin UI, head to Explore Apache Zeppelin UI.After getting familiar with the Apache Zeppelin UI, have fun with a short walk-through Tutorial that uses the Apache Spark backend.If you need more configuration for Apache Zeppelin, jump to the next section: Apache Zeppelin Configuration.If you need
more information about Spark or JDBC interpreter settings...Apache Zeppelin provides deep integration with Apache Spark. For more information, see Spark Interpreter for Apache Zeppelin. You can also use generic JDBC connections in Apache Zeppelin. Go to Generic JDBC Interpreter for Apache Zeppelin.If you are in a multi-user environment...You can set permissions for your notebooks and secure data resources in a multi-user environment. Go to More -&gt; Security section.Apache Zeppelin ConfigurationYou can configure Apache Zeppelin with either environment variables in conf/zeppelin-env.sh (conf\zeppelin-env.cmd for Windows) or Java properties in conf/zeppelin-site.xml. If both are defined, then the environment variables will take priority. zeppelin-env.sh zeppelin-site.xml Default value Description ZEPPELIN_PORT zeppelin.server.port 8080 Zeppelin server port ZEPPELIN_SSL_PORT zeppelin.server.ssl.port 8443 Zeppelin Server ssl port (used w
hen ssl environment/property is set to true) ZEPPELIN_MEM N/A -Xmx1024m -XX:MaxPermSize=512m JVM mem options ZEPPELIN_INTP_MEM N/A ZEPPELIN_MEM JVM mem options for interpreter process ZEPPELIN_JAVA_OPTS N/A JVM options ZEPPELIN_ALLOWED_ORIGINS zeppelin.server.allowed.origins * Enables a way to specify a ',' separated list of allowed origins for REST and websockets. i.e. http://localhost:8080 N/A zeppelin.anonymous.allowed true The anonymous user is allowed by default. ZEPPELIN_SERVER_CONTEXT_PATH zeppelin.server.context.path / Context path of the web application ZEPPELIN_SSL zeppelin.ssl false ZEPPELIN_SSL_CLIENT_AUTH zeppelin.ssl.client.auth false ZEPPELIN_SSL_KEYSTORE_PATH zeppelin.ssl.keystore.path keystore ZEPPELIN_SSL_KEYSTORE_TYPE zeppelin.ssl.keystore.type JKS ZEPPELIN_SSL_KEYSTORE_PASSWORD z
eppelin.ssl.keystore.password ZEPPELIN_SSL_KEY_MANAGER_PASSWORD zeppelin.ssl.key.manager.password ZEPPELIN_SSL_TRUSTSTORE_PATH zeppelin.ssl.truststore.path ZEPPELIN_SSL_TRUSTSTORE_TYPE zeppelin.ssl.truststore.type ZEPPELIN_SSL_TRUSTSTORE_PASSWORD zeppelin.ssl.truststore.password ZEPPELIN_NOTEBOOK_HOMESCREEN zeppelin.notebook.homescreen Display notebook IDs on the Apache Zeppelin homescreen i.e. 2A94M5J1Z ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE zeppelin.notebook.homescreen.hide false Hide the notebook ID set by ZEPPELIN_NOTEBOOK_HOMESCREEN on the Apache Zeppelin homescreen. For the further information, please read Customize your Zeppelin homepage. ZEPPELIN_WAR_TEMPDIR zeppelin.war.tempdir webapps Location of the jetty temporary directory ZEPPELIN_NOTEBOOK_DIR zeppelin.notebook.dir notebook The root directory where notebook directories are saved
ZEPPELIN_NOTEBOOK_S3_BUCKET zeppelin.notebook.s3.bucket zeppelin S3 Bucket where notebook files will be saved ZEPPELIN_NOTEBOOK_S3_USER zeppelin.notebook.s3.user user User name of an S3 bucketi.e. bucket/user/notebook/2A94M5J1Z/note.json ZEPPELIN_NOTEBOOK_S3_ENDPOINT zeppelin.notebook.s3.endpoint s3.amazonaws.com Endpoint for the bucket ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID zeppelin.notebook.s3.kmsKeyID AWS KMS Key ID to use for encrypting data in S3 (optional) ZEPPELIN_NOTEBOOK_S3_EMP zeppelin.notebook.s3.encryptionMaterialsProvider Class name of a custom S3 encryption materials provider implementation to use for encrypting data in S3 (optional) ZEPPELIN_NOTEBOOK_AZURE_CONNECTION_STRING zeppelin.notebook.azure.connectionString The Azure storage account connection stringi.e. DefaultEndpointsProtocol=https;AccountName=&lt;accountName&gt;;AccountKey=&lt;accountKey&gt; ZEPP
ELIN_NOTEBOOK_AZURE_SHARE zeppelin.notebook.azure.share zeppelin Azure Share where the notebook files will be saved ZEPPELIN_NOTEBOOK_AZURE_USER zeppelin.notebook.azure.user user Optional user name of an Azure file sharei.e. share/user/notebook/2A94M5J1Z/note.json ZEPPELIN_NOTEBOOK_STORAGE zeppelin.notebook.storage org.apache.zeppelin.notebook.repo.VFSNotebookRepo Comma separated list of notebook storage locations ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC zeppelin.notebook.one.way.sync false If there are multiple notebook storage locations, should we treat the first one as the only source of truth? ZEPPELIN_INTERPRETERS zeppelin.interpreters org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.shell.ShellInterpreter, ... Comma sep
arated interpreter configurations [Class] NOTE: This property is deprecated since Zeppelin-0.6.0 and will not be supported from Zeppelin-0.7.0 on. ZEPPELIN_INTERPRETER_DIR zeppelin.interpreter.dir interpreter Interpreter directory ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE zeppelin.websocket.max.text.message.size 1024000 Size (in characters) of the maximum text message that can be received by websocket. ",
"url": " /install/install.html",
"group": "install",
"excerpt": "This page will help you get started and will guide you through installing Apache Zeppelin, running it in the command line and configuring options."
@@ -116,7 +116,7 @@
"/install/upgrade.html": {
"title": "Manual Zeppelin version upgrade procedure",
- "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Manual upgrade procedure for ZeppelinBasically, newer version of Zeppelin works with previous version notebook directory and configurations.So, copying notebook and conf directory should be enough.InstructionsStop Zeppelinbin/zeppelin-daemon.sh stopCopy your notebook and conf directory into a backup directoryDownload newer version of Zeppelin and Install. See Install page.Copy backup notebook and conf directory into newer version o
f Zeppelin notebook and conf directoryStart Zeppelinbin/zeppelin-daemon.sh startMigration GuideUpgrading from Zeppelin 0.6 to 0.7From 0.7, we don&#39;t use ZEPPELIN_JAVA_OPTS as default value of ZEPPELIN_INTP_JAVA_OPTS and also the same for ZEPPELIN_MEM/ZEPPELIN_INTP_MEM. If user want to configure the jvm opts of interpreter process, please set ZEPPELIN_INTP_JAVA_OPTS and ZEPPELIN_INTP_MEM explicitly.Mapping from %jdbc(prefix) to %prefix is no longer available. Instead, you can use %[interpreter alias] with multiple interpreter setttings on GUI.",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Manual upgrade procedure for ZeppelinBasically, newer version of Zeppelin works with previous version notebook directory and configurations.So, copying notebook and conf directory should be enough.InstructionsStop Zeppelinbin/zeppelin-daemon.sh stopCopy your notebook and conf directory into a backup directoryDownload newer version of Zeppelin and Install. See Install page.Copy backup notebook and conf directory into newer version o
f Zeppelin notebook and conf directoryStart Zeppelinbin/zeppelin-daemon.sh startMigration GuideUpgrading from Zeppelin 0.6 to 0.7From 0.7, we don&#39;t use ZEPPELIN_JAVA_OPTS as the default value of ZEPPELIN_INTP_JAVA_OPTS, and the same applies to ZEPPELIN_MEM/ZEPPELIN_INTP_MEM. If you want to configure the JVM options of the interpreter process, set ZEPPELIN_INTP_JAVA_OPTS and ZEPPELIN_INTP_MEM explicitly.Mapping from %jdbc(prefix) to %prefix is no longer available. Instead, you can use %[interpreter alias] with multiple interpreter settings in the GUI.Usage of ZEPPELIN_PORT is not supported in ssl mode. Instead, use ZEPPELIN_SSL_PORT to configure the ssl port. The value of ZEPPELIN_PORT is used only when ZEPPELIN_SSL is set to false.
"url": " /install/upgrade.html",
"group": "install",
"excerpt": "This document will guide you through a procedure of manual upgrade your Apache Zeppelin instance to a newer version. Apache Zeppelin keeps backward compatibility for the notebook file format."
@@ -270,7 +270,7 @@
"/interpreter/jdbc.html": {
"title": "Generic JDBC Interpreter for Apache Zeppelin",
- "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Generic JDBC Interpreter for Apache ZeppelinOverviewThis interpreter lets you create a JDBC connection to any data source, by now it has been tested with:PostgresMySqlMariaDBRedshiftApache HiveApache PhoenixApache Drill (Details on using Drill JDBC Driver)Apache TajoIf someone else used another database please report how it works to improve functionality.Create InterpreterWhen you create a interpreter by default use PostgreSQL with
the next properties: name value common.max_count 1000 default.driver org.postgresql.Driver default.password ******** default.url jdbc:postgresql://localhost:5432/ default.user gpadmin It is not necessary to add driver jar to the classpath for PostgreSQL as it is included in Zeppelin.Simple connectionPrior to creating the interpreter it is necessary to add maven coordinate or path of the JDBC driver to the Zeppelin classpath. To do this you must edit dependencies artifact(ex. mysql:mysql-connector-java:5.1.38) in interpreter menu as shown: To create the interpreter you need to specify connection parameters as shown in the table. name value common.max_count 1000 default.driver driver name default.password ******** default.url jdbc url default.user user name Multiple connectionsJDBC interpreter also allows connections to multiple data sources. It is
necessary to set a prefix for each connection to reference it in the paragraph in the form of %jdbc(prefix). Before you create the interpreter it is necessary to add each driver&#39;s maven coordinates or JDBC driver&#39;s jar file path to the Zeppelin classpath. To do this you must edit the dependencies of JDBC interpreter in interpreter menu as following: You can add all the jars you need to make multiple connections into the same JDBC interpreter. To create the interpreter you must specify the parameters. For example we will create two connections to MySQL and Redshift, the respective prefixes are default and redshift: name value common.max_count 1000 default.driver com.mysql.jdbc.Driver default.password ******** default.url jdbc:mysql://localhost:3306/ default.user mysql-user redshift.driver com.amazon.redshift.jdbc4.Driver redshift.password ******** redshift.url jdbc:redshift:
//examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439 redshift.user redshift-user Bind to NotebookIn the Notebook click on the settings icon at the top-right corner. Use select/deselect to specify the interpreters to be used in the Notebook.More PropertiesYou can modify the interpreter configuration in the Interpreter section. The most common properties are as follows, but you can specify other properties that need to be connected. Property Name Description {prefix}.url JDBC URL to connect, the URL must include the name of the database {prefix}.user JDBC user name {prefix}.password JDBC password {prefix}.driver JDBC driver name. common.max_result Max number of SQL result to display to prevent the browser overload. This is common properties for all connections zeppelin.jdbc.auth.type Types of authentications&#39; methods supported are SIMPLE, and KERBERO
S zeppelin.jdbc.principal The principal name to load from the keytab zeppelin.jdbc.keytab.location The path to the keytab file To develop this functionality use this method. For example if a connection needs a schema parameter, it would have to add the property as follows: name value {prefix}.schema schema_name ExamplesHiveProperties Name Value hive.driver org.apache.hive.jdbc.HiveDriver hive.url jdbc:hive2://localhost:10000 hive.user hiveuser hive.password hivepassword Dependencies Artifact Excludes org.apache.hive:hive-jdbc:0.14.0 org.apache.hadoop:hadoop-common:2.6.0 PhoenixPhoenix supports thick and thin connection types:Thick client is faster, but must connect directly to ZooKeeper and HBase RegionServers.Thin client has fewer dependencies and connects through a Phoenix Query Server instance.Use the appropriate phoen
ix.driver and phoenix.url for your connection type.Properties: Name Value Description phoenix.driver org.apache.phoenix.jdbc.PhoenixDriver &#39;Thick Client&#39;, connects directly to Phoenix phoenix.driver org.apache.phoenix.queryserver.client.Driver &#39;Thin Client&#39;, connects via Phoenix Query Server phoenix.url jdbc:phoenix:localhost:2181:/hbase-unsecure &#39;Thick Client&#39;, connects directly to Phoenix phoenix.url jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF &#39;Thin Client&#39;, connects via Phoenix Query Server phoenix.user phoenixuser phoenix.password phoenixpassword Dependencies:Include the dependency for your connection type (it should be only one of the following). Artifact Excludes Description org.apache.phoenix:phoenix-core:4.4.0-HBase-1.0 &#39;T
hick Client&#39;, connects directly to Phoenix org.apache.phoenix:phoenix-server-client:4.7.0-HBase-1.1 &#39;Thin Client&#39; for Phoenix 4.7, connects via Phoenix Query Server org.apache.phoenix:phoenix-queryserver-client:4.8.0-HBase-1.2 &#39;Thin Client&#39; for Phoenix 4.8+, connects via Phoenix Query Server TajoProperties Name Value tajo.driver org.apache.tajo.jdbc.TajoDriver tajo.url jdbc:tajo://localhost:26002/default Dependencies Artifact Excludes org.apache.tajo:tajo-jdbc:0.11.0 How to useReference in paragraphStart the paragraphs with the %jdbc, this will use the default prefix for connection. If you want to use other connection you should specify the prefix of it as follows %jdbc(prefix):%jdbcSELECT * FROM db_name;or%jdbc(prefix)SELECT * FROM db_name;Apply Zeppelin Dynamic FormsYou can leverage Zeppelin Dynamic Form inside your queries. You ca
n use both the text input and select form parametrization features%jdbc(prefix)SELECT name, country, performerFROM demo.performersWHERE name=&#39;&#39;Bugs &amp; ReportingIf you find a bug for this interpreter, please create a JIRA ticket.",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Generic JDBC Interpreter for Apache ZeppelinOverviewJDBC interpreter lets you create a JDBC connection to any data sources seamlessly. By now, it has been tested with: Postgresql - JDBC Driver Mysql - JDBC Driver MariaDB - JDBC Driver Redshift - JDBC Driver Apache Hive - JDBC Driver Apache Phoenix itself is a JDBC driver
Apache Drill - JDBC Driver Apache Tajo - JDBC Driver If you are using other databases not in the above list, please feel free to share your use case. It would be helpful to improve the functionality of JDBC interpreter.Create a new JDBC InterpreterFirst, click + Create button at the top-right corner in the interpreter setting page.Fill Interpreter name field with whatever you want to use as the alias(e.g. mysql, mysql2, hive, redshift, and etc..). Please note that this alias will be used as %interpreter_name to call the interpreter in the paragraph. Then select jdbc as an Interpreter group. The default driver of JDBC interpreter is set as PostgreSQL. It means Zeppelin includes PostgreSQL driver jar in itself.So you don&#39;t need to add any dependencies(e.g. the artifact name or path for PostgreSQL driver jar) for PostgreSQL connection.The JDBC interpreter properties are defined by default like below. Name Default Value Descrip
tion common.max_count 1000 The maximun number of SQL result to display default.driver org.postgresql.Driver JDBC Driver Name default.password The JDBC user password default.url jdbc:postgresql://localhost:5432/ The URL for JDBC default.user gpadmin The JDBC user name If you want to connect other databases such as Mysql, Redshift and Hive, you need to edit the property values. The below example is for Mysql connection.The last step is Dependency Setting. Since Zeppelin only includes PostgreSQL driver jar by default, you need to add each driver&#39;s maven coordinates or JDBC driver&#39;s jar file path for the other databases.That&#39;s it. You can find more JDBC connection setting examples(Mysql, Apache Hive, Apache Phoenix, and Apache Tajo) in this section.More propertiesThere are more JDBC interpreter properties you can specify like below. Property Name Description common.max_result Max
number of SQL result to display to prevent the browser overload. This is common properties for all connections zeppelin.jdbc.auth.type Types of authentications' methods supported are SIMPLE, and KERBEROS zeppelin.jdbc.principal The principal name to load from the keytab zeppelin.jdbc.keytab.location The path to the keytab file You can also add more properties by using this method.For example, if a connection needs a schema parameter, it would have to add the property as follows: name value default.schema schema_name Binding JDBC interpter to notebookTo bind the interpreters created in the interpreter setting page, click the gear icon at the top-right corner.Select(blue) or deselect(white) the interpreter buttons depending on your use cases. If you need to use more than one interpreter in the notebook, activate several buttons.Don&#39;t forget to click Save button, or you will face Interpreter *** is not found error.How to u
seRun the paragraph with JDBC interpreterTo test whether your databases and Zeppelin are successfully connected or not, type %jdbc_interpreter_name(e.g. %mysql) at the top of the paragraph and run show databases.%jdbc_interpreter_nameshow databasesIf the paragraph is FINISHED without any errors, a new paragraph will be automatically added after the previous one with %jdbc_interpreter_name.So you don&#39;t need to type this prefix in every paragraphs&#39; header.Apply Zeppelin Dynamic FormsYou can leverage Zeppelin Dynamic Form inside your queries. You can use both the text input and select form parametrization features.%jdbc_interpreter_nameSELECT name, country, performerFROM demo.performersWHERE name=&#39;{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}&#39;ExamplesHere are some examples you can refer to. Including the below connectors, you can connect every databases as long as it can be configured with it&#39;s JDBC driver.MysqlProperties Name Valu
e default.driver com.mysql.jdbc.Driver default.url jdbc:mysql://localhost:3306/ default.user mysql_user default.password mysql_password Dependencies Artifact Excludes mysql:mysql-connector-java:5.1.38 Apache HiveProperties Name Value default.driver org.apache.hive.jdbc.HiveDriver default.url jdbc:hive2://localhost:10000 default.user hive_user default.password hive_password Dependencies Artifact Excludes org.apache.hive:hive-jdbc:0.14.0 org.apache.hadoop:hadoop-common:2.6.0 Apache PhoenixPhoenix supports thick and thin connection types:Thick client is faster, but must connect directly to ZooKeeper and HBase RegionServers.Thin client has fewer dependencies and connects through a Phoenix Query Server instance.Use the appropriate default.driver, default.url, and the dependency artifact for your connection type.Thick client connectionProperties
Name Value default.driver org.apache.phoenix.jdbc.PhoenixDriver default.url jdbc:phoenix:localhost:2181:/hbase-unsecure default.user phoenix_user default.password phoenix_password Dependencies Artifact Excludes org.apache.phoenix:phoenix-core:4.4.0-HBase-1.0 Thin client connectionProperties Name Value default.driver org.apache.phoenix.queryserver.client.Driver default.url jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF default.user phoenix_user default.password phoenix_password DependenciesBefore Adding one of the below dependencies, check the Phoenix version first. Artifact Excludes Description org.apache.phoenix:phoenix-server-client:4.7.0-HBase-1.1 For Phoenix 4.7 org.apache.phoenix:phoenix-queryserver-client:4.8.0-HBase-1.2 For Phoenix 4.8+ Apache TajoProperties Name Value default.driver org
.apache.tajo.jdbc.TajoDriver default.url jdbc:tajo://localhost:26002/default Dependencies Artifact Excludes org.apache.tajo:tajo-jdbc:0.11.0 Bug reportingIf you find a bug using JDBC interpreter, please create a JIRA ticket.",
"url": " /interpreter/jdbc.html",
"group": "interpreter",
"excerpt": "Generic JDBC Interpreter lets you create a JDBC connection to any data source. You can use Postgres, MySql, MariaDB, Redshift, Apache Hive, Apache Phoenix, Apache Drill and Apache Tajo using JDBC interpreter."
@@ -380,7 +380,7 @@
"/manual/dependencymanagement.html": {
"title": "Dependency Management for Apache Spark Interpreter",
- "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Dependency Management for InterpreterYou can include external libraries to interpreter by setting dependencies in interpreter menu.When your code requires external library, instead of doing download/copy/restart Zeppelin, you can easily do following jobs in this menu.Load libraries recursively from Maven repositoryLoad libraries from local filesystemAdd additional maven repositoryAutomatically add libraries to SparkCluster
Load Dependencies to Interpreter Click 'Interpreter' menu in navigation bar. Click 'edit' button of the interpreter which you want to load dependencies to. Fill artifact and exclude field to your needs. You can enter not only groupId:artifactId:version but also local file in artifact field. Press 'Save' to restart the interpreter with loaded libraries. Add repository for dependency resolving Press icon in 'Interpreter' menu on the top right side. It will show you available repository lists. If you need to resolve dependencies from other than central maven repository or local ~/.m2 repository, hit icon next to repository lists. Fill out the form and click 'Add' button, then you will be able to see that new repository is added. ",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Dependency Management for InterpreterYou can include external libraries to interpreter by setting dependencies in interpreter menu.When your code requires external library, instead of doing download/copy/restart Zeppelin, you can easily do following jobs in this menu.Load libraries recursively from Maven repositoryLoad libraries from local filesystemAdd additional maven repositoryAutomatically add libraries to SparkCluster
Load Dependencies to Interpreter Click 'Interpreter' menu in navigation bar. Click 'edit' button of the interpreter which you want to load dependencies to. Fill artifact and exclude field to your needs. You can enter not only groupId:artifactId:version but also local file in artifact field. Press 'Save' to restart the interpreter with loaded libraries. Add repository for dependency resolving Press icon in 'Interpreter' menu on the top right side. It will show you available repository lists. If you need to resolve dependencies from other than central maven repository or local ~/.m2 repository, hit icon next to repository lists. Fill out the form and click 'Add' button, then you will be able to see that new repository is added. Optionally, if you are behind a corporate firewall, you can s
pecify all proxy settings as well, so that Zeppelin can download the dependencies using the given credentials. ",
"url": " /manual/dependencymanagement.html",
"group": "manual",
"excerpt": "Include external libraries to Apache Spark Interpreter by setting dependencies in interpreter menu."
@@ -467,6 +467,17 @@
+ "/quickstart/install_with_flink_and_spark_cluster.html": {
+ "title": "Install Zeppelin with Flink and Spark in cluster mode",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->This tutorial is extremely entry-level. It assumes no prior knowledge of Linux, git, or other tools. If you carefully type what I tell you when I tell you, you should be able to get Zeppelin running.Installing Zeppelin with Flink and Spark in cluster modeThis tutorial assumes the user has a machine (real or virtual with a fresh, minimal installation of Ubuntu 14.04.3 Server.Note: On the size requirements of the Virtual Machine, som
e users reported trouble when using the default virtual machine sizes, specifically that the hard drive needed to be at least 16GB- other users did not have this issue.There are many good tutorials on how to install Ubuntu Server on a virtual box, here is one of themRequired ProgramsAssuming the minimal install, there are several programs that we will need to install before Zeppelin, Flink, and Spark.gitopenssh-serverOpenJDK 7Maven 3.1+For git, openssh-server, and OpenJDK 7 we will be using the apt package manager.gitFrom the command prompt:sudo apt-get install gitopenssh-serversudo apt-get install openssh-serverOpenJDK 7sudo apt-get install openjdk-7-jdk openjdk-7-jre-libA note for those using Ubuntu 16.04: To install openjdk-7 on Ubuntu 16.04, one must add a repository. Sourcesudo add-apt-repository ppa:openjdk-r/ppasudo apt-get updatesudo apt-get install openjdk-7-jdk openjdk-7-jre-libMaven 3.1+Zeppelin requires maven version 3.x. The version available in the repositories at th
e time of writing is 2.x, so maven must be installed manually.Purge any existing versions of maven.sudo apt-get purge maven maven2Download the maven 3.3.9 binary.wget &quot;http://www.us.apache.org/dist/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz&quot;Unarchive the binary and move to the /usr/local directory.tar -zxvf apache-maven-3.3.9-bin.tar.gzsudo mv ./apache-maven-3.3.9 /usr/localCreate symbolic links in /usr/bin.sudo ln -s /usr/local/apache-maven-3.3.9/bin/mvn /usr/bin/mvnInstalling ZeppelinThis provides a quick overview of Zeppelin installation from source, however the reader is encouraged to review the Zeppelin Installation GuideFrom the command prompt:Clone Zeppelin.git clone https://github.com/apache/zeppelin.gitEnter the Zeppelin root directory.cd zeppelinPackage Zeppelin.mvn clean package -DskipTests -Pspark-1.6 -Dflink.version=1.1.2-DskipTests skips build tests- you&#39;re not developing (yet), so you don&#39;t need to do tests, the clone
version should build.-Pspark-1.6 tells maven to build a Zeppelin with Spark 1.6. This is important because Zeppelin has its own Spark interpreter and the versions must be the same.-Dflink.version=1.1.2 tells maven specifically to build Zeppelin with Flink version 1.1.2.Note: You may wish to include additional build flags such as -Ppyspark or -Psparkr. See the build section of github for more details.Note: You can build against any version of Spark that has a Zeppelin build profile available. The key is to make sure you check out the matching version of Spark to build. At the time of this writing, Spark 1.6 was the most recent Spark version available.Note: On build failures. Having installed Zeppelin close to 30 times now, I will tell you that sometimes the build fails for seemingly no reason.As long as you didn&#39;t edit any code, it is unlikely the build is failing because of something you did. What does tend to happen, is some dependency that maven is trying to download is
unreachable. If your build fails on this step here are some tips:- Don&#39;t get discouraged.- Scroll up and read through the logs. There will be clues there.- Retry (that is, run the mvn clean package -DskipTests -Pspark-1.6 again)- If there were clues that a dependency couldn&#39;t be downloaded wait a few hours or even days and retry again. Open source software when compiling is trying to download all of the dependencies it needs, if a server is off-line there is nothing you can do but wait for it to come back.- Make sure you followed all of the steps carefully.- Ask the community to help you. Go here and join the user mailing list. People are there to help you. Make sure to copy and paste the build output (everything that happened in the console) and include that in your message.Start the Zeppelin daemon.bin/zeppelin-daemon.sh startUse ifconfig to determine the host machine&#39;s IP address. If you are not familiar with how to do this, a fairly comprehensive post ca
n be found here.Open a web-browser on a machine connected to the same network as the host (or in the host operating system if using a virtual machine). Navigate to http://yourip:8080, where yourip is the IP address you found in ifconfig.See the Zeppelin tutorial for basic Zeppelin usage. It is also advised that you take a moment to check out the tutorial notebook that is included with each Zeppelin install, and to familiarize yourself with basic notebook functionality.Flink TestCreate a new notebook named &quot;Flink Test&quot; and copy and paste the following code.%flink // let Zeppelin know what interpreter to use.val text = env.fromElements(&quot;In the time of chimpanzees, I was a monkey&quot;, // some lines of text to analyze&quot;Butane in my veins and I&#39;m out to cut the junkie&quot;,&quot;With the plastic eyeballs, spray paint the vegetables&quot;,&quot;Dog food stalls with the beefcake pantyhose&quot;,&quot;Kill the hea
dlights and put it in neutral&quot;,&quot;Stock car flamin&#39; with a loser in the cruise control&quot;,&quot;Baby&#39;s in Reno with the Vitamin D&quot;,&quot;Got a couple of couches, sleep on the love seat&quot;,&quot;Someone came in sayin&#39; I&#39;m insane to complain&quot;,&quot;About a shotgun wedding and a stain on my shirt&quot;,&quot;Don&#39;t believe everything that you breathe&quot;,&quot;You get a parking violation and a maggot on your sleeve&quot;,&quot;So shave your face with some mace in the dark&quot;,&quot;Savin&#39; all your food stamps and burnin&#39; down the trailer park&quot;,&quot;Yo, cut it&quot;)/* The meat and potatoes: this tells Flink to iterate through the elements, in this case strings, transform the string to lower case and split the string at white space into individual words then finally aggregate the occurrence of e
ach word. This creates the count variable which is a list of tuples of the form (word, occurances)counts.collect().foreach(println(_)) // execute the script and print each element in the counts list*/val counts = text.flatMap{ _.toLowerCase.split(&quot;W+&quot;) }.map { (_,1) }.groupBy(0).sum(1)counts.collect().foreach(println(_)) // execute the script and print each element in the counts listRun the code to make sure the built-in Zeppelin Flink interpreter is working properly.Spark TestCreate a new notebook named &quot;Spark Test&quot; and copy and paste the following code.%spark // let Zeppelin know what interpreter to use.val text = sc.parallelize(List(&quot;In the time of chimpanzees, I was a monkey&quot;, // some lines of text to analyze&quot;Butane in my veins and I&#39;m out to cut the junkie&quot;,&quot;With the plastic eyeballs, spray paint the vegetables&quot;,&quot;Dog food stalls with the beefcake pantyhose&qu
ot;,&quot;Kill the headlights and put it in neutral&quot;,&quot;Stock car flamin&#39; with a loser in the cruise control&quot;,&quot;Baby&#39;s in Reno with the Vitamin D&quot;,&quot;Got a couple of couches, sleep on the love seat&quot;,&quot;Someone came in sayin&#39; I&#39;m insane to complain&quot;,&quot;About a shotgun wedding and a stain on my shirt&quot;,&quot;Don&#39;t believe everything that you breathe&quot;,&quot;You get a parking violation and a maggot on your sleeve&quot;,&quot;So shave your face with some mace in the dark&quot;,&quot;Savin&#39; all your food stamps and burnin&#39; down the trailer park&quot;,&quot;Yo, cut it&quot;))/* The meat and potatoes: this tells spark to iterate through the elements, in this case strings, transform the string to lower case and split the string at white space into individual words then finally ag
gregate the occurrence of each word. This creates the count variable which is a list of tuples of the form (word, occurances)*/val counts = text.flatMap { _.toLowerCase.split(&quot;W+&quot;) } .map { (_,1) } .reduceByKey(_ + _)counts.collect().foreach(println(_)) // execute the script and print each element in the counts listRun the code to make sure the built-in Zeppelin Flink interpreter is working properly.Finally, stop the Zeppelin daemon. From the command prompt run:bin/zeppelin-daemon.sh stopInstalling ClustersFlink ClusterDownload BinariesBuilding from source is recommended where possible, for simplicity in this tutorial we will download Flink and Spark Binaries.To download the Flink Binary use wgetwget &quot;http://mirror.cogentco.com/pub/apache/flink/flink-1.0.3/flink-1.0.3-bin-hadoop24-scala_2.10.tgz&quot;tar -xzvf flink-1.0.3-bin-hadoop24-scala_2.10.tgzThis will download Flink 1.0.3, compatible with Hadoop 2.4. Yo
u do not have to install Hadoop for this binary to work, but if you are using Hadoop, please change 24 to your appropriate version.Start the Flink Cluster.flink-1.0.3/bin/start-cluster.shBuilding From sourceIf you wish to build Flink from source, the following will be instructive. Note that if you have downloaded and used the binary version this should be skipped. The changing nature of build tools and versions across platforms makes this section somewhat precarious. For example, Java8 and Maven 3.0.3 are recommended for building Flink, which are not recommended for Zeppelin at the time of writing. If the user wishes to attempt to build from source, this section will provide some reference. If errors are encountered, please contact the Apache Flink community.See the Flink Installation guide for more detailed instructions.Return to the directory where you have been downloading, this tutorial assumes that is $HOME. Clone Flink, check out release-1.0, and build.cd $HOMEgit clone
https://github.com/apache/flink.gitcd flinkgit checkout release-1.0mvn clean install -DskipTestsStart the Flink Cluster in stand-alone modebuild-target/bin/start-cluster.shEnsure the cluster is upIn a browser, navigate to http://yourip:8082 to see the Flink Web-UI. Click on &#39;Task Managers&#39; in the left navigation bar. Ensure there is at least one Task Manager present.If no task managers are present, restart the Flink cluster with the following commands:(if binaries)flink-1.0.3/bin/stop-cluster.shflink-1.0.3/bin/start-cluster.sh(if built from source)build-target/bin/stop-cluster.shbuild-target/bin/start-cluster.shSpark 1.6 ClusterDownload BinariesBuilding from source is recommended where possible, for simplicity in this tutorial we will download Flink and Spark Binaries.Using binaries is alsoTo download the Spark Binary use wgetwget &quot;http://mirrors.koehn.com/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.4.tgz&quot;tar -xzvf spark-1.6.1-bin-hadoop2.4.t
gzmv spark-1.6.1-bin-hadoop2.4 sparkThis will download Spark 1.6.1, compatible with Hadoop 2.4. You do not have to install Hadoop for this binary to work, but if you are using Hadoop, please change 2.4 to your appropriate version.Building From sourceSpark is an extraordinarily large project, which takes considerable time to download and build. It is also prone to build failures for reasons similar to those listed in the Flink section. If the user wishes to attempt to build from source, this section will provide some reference. If errors are encountered, please contact the Apache Spark community.See the Spark Installation guide for more detailed instructions.Return to the directory where you have been downloading; this tutorial assumes that is $HOME. Clone Spark, check out branch-1.6, and build.Note: Recall, we&#39;re only checking out 1.6 because it is the most recent Spark for which a Zeppelin profile exists at the time of writing. You are free to check out other versions, just make
sure you build Zeppelin against the correct version of Spark.cd $HOMEClone, check out, and build Spark version 1.6.x.git clone https://github.com/apache/spark.gitcd sparkgit checkout branch-1.6mvn clean package -DskipTestsStart the Spark clusterReturn to the $HOME directory.cd $HOMEStart the Spark cluster in stand alone mode, specifying the webui-port as some port other than 8080 (the webui-port of Zeppelin).spark/sbin/start-master.sh --webui-port 8082Note: Why --webui-port 8082? There is a digression toward the end of this document that explains this.Open a browser and navigate to http://yourip:8082 to ensure the Spark master is running.Toward the top of the page there will be a URL: spark://yourhost:7077. Note this URL, the Spark Master URI, it will be needed in subsequent steps.Start the slave using the URI from the Spark master WebUI:spark/sbin/start-slave.sh spark://yourhostname:7077Return to the root directory and start the Zeppelin daemon.cd $HOMEzeppelin/bin/zeppelin-daemon
.sh startConfigure InterpretersOpen a web browser and go to the Zeppelin web-ui at http://yourip:8080.Now go back to the Zeppelin web-ui at http://yourip:8080 and this time click on anonymous at the top right, which will open a drop-down menu, select Interpreters to enter interpreter configuration.In the Spark section, click the edit button in the top right corner to make the property values editable (looks like a pencil).The only field that needs to be edited in the Spark interpreter is the master field. Change this value from local[*] to the URL you used to start the slave, mine was spark://ubuntu:7077.Click Save to update the parameters, and click OK when it asks you about restarting the interpreter.Now scroll down to the Flink section. Click the edit button and change the value of host from local to localhost. Click Save again.Reopen the examples and execute them again (I.e. you need to click the play button at the top of the screen, or the button on the paragraph .You should be
able check the Flink and Spark webuis (at something like http://yourip:8081, http://yourip:8082, http://yourip:8083) and see that jobs have been run against the clusters.Digression Sorry to be vague and use terms such as &#39;something like&#39;, but exactly what web-ui is at what port is going to depend on what order you started things. What is really going on here is you are pointing your browser at specific ports, namely 8081, 8082, and 8083. Flink and Spark all want to put their web-ui on port 8080, but are well behaved and will take the next port available. Since Zeppelin started first, it will get port 8080. When Flink starts (assuming you started Flink first), it will try to bind to port 8080, see that it is already taken, and go to the next one available, hopefully 8081. Spark has a webui for the master and the slave, so when they start they will try to bind to 8080 already taken by Zeppelin), then 8081 (already taken by Flink&#39;s webui), then 8082. If ev
erything goes smoothy and you followed the directions precisely, the webuis should be 8081 and 8082. It is possible to specify the port you want the webui to bind to (at the command line by passing the --webui-port &lt;port&gt; flag when you start the Flink and Spark, where &lt;port&gt; is the port you want to see that webui on. You can also set the default webui port of Spark and Flink (and Zeppelin) in the configuration files, but this is a tutorial for novices and slightly out of scope.Next StepsCheck out the tutorial for more cool things you can do with your new toy!Join the community, ask questions and contribute! Every little bit helps.",
+ "url": " /quickstart/install_with_flink_and_spark_cluster.html",
+ "group": "tutorial",
+ "excerpt": "Tutorial is valid for Spark 1.6.x and Flink 1.1.2"
+ }
+ ,
+
+
+
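A quick way to confirm that the master change described in the tutorial above took effect is to run a trivial job from a Zeppelin paragraph. The following is a minimal sketch (Scala, in a %spark paragraph), assuming the master property was set to the standalone master URI as described; spark://yourhostname:7077 is a placeholder, and sc is the SparkContext that Zeppelin injects into the Spark interpreter.

%spark
// Where is this SparkContext pointed? Should print the standalone master URI, not local[*]
println(s"Spark master: ${sc.master}")
// Run a trivial job; it should appear on the Spark master webui (http://yourip:8082 in this walkthrough)
val sum = sc.parallelize(1 to 1000).reduce(_ + _)
println(s"Sum of 1..1000 = $sum")  // expected: 500500

If the job shows up on the master webui, Zeppelin is executing against the standalone cluster rather than a local Spark.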
"/quickstart/tutorial.html": {
"title": "Apache Zeppelin Tutorial",
"content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->Zeppelin TutorialThis tutorial walks you through some of the fundamental Zeppelin concepts. We will assume you have already installed Zeppelin. If not, please see here first.Current main backend processing engine of Zeppelin is Apache Spark. If you&#39;re new to this system, you might want to start by getting an idea of how it processes data to get the most out of Zeppelin.Tutorial with Local FileData RefineBefore you start Zep
pelin tutorial, you will need to download bank.zip. First, to transform the CSV-format data into an RDD of Bank objects, run the following script. This will also remove the header using the filter function.val bankText = sc.textFile(&quot;yourPath/bank/bank-full.csv&quot;)case class Bank(age:Integer, job:String, marital : String, education : String, balance : Integer)// split each line, filter out header (starts with &quot;age&quot;), and map it into Bank case classval bank = bankText.map(s=&gt;s.split(&quot;;&quot;)).filter(s=&gt;s(0)!=&quot;&quot;age&quot;&quot;).map( s=&gt;Bank(s(0).toInt, s(1).replaceAll(&quot;&quot;&quot;, &quot;&quot;), s(2).replaceAll(&quot;&quot;&quot;, &quot;&quot;), s(3).replaceAll(&quot;&quot;&quot;, &quot;&quot;), s(5).replaceAll(&quot;&quot;&quot;, &quot;&quot;).toInt ))// convert to Data
Frame and create temporary tablebank.toDF().registerTempTable(&quot;bank&quot;)Data RetrievalSuppose we want to see the age distribution from bank. To do this, run:%sql select age, count(1) from bank where age &lt; 30 group by age order by ageYou can make an input box for setting the age condition by replacing 30 with ${maxAge=30}.%sql select age, count(1) from bank where age &lt; ${maxAge=30} group by age order by ageNow we want to see the age distribution for a certain marital status, and add a combo box to select the marital status. Run:%sql select age, count(1) from bank where marital=&quot;${marital=single,single|divorced|married}&quot; group by age order by ageTutorial with Streaming DataData RefineSince this tutorial is based on Twitter&#39;s sample tweet stream, you must configure authentication with a Twitter account. To do this, take a look at Twitter Credential Setup. After you get your API keys, you should fill out the credential-related values (apiKey, apiSecret, accessToken,
accessTokenSecret) with your API keys on following script.This will create a RDD of Tweet objects and register these stream data as a table:import org.apache.spark.streaming._import org.apache.spark.streaming.twitter._import org.apache.spark.storage.StorageLevelimport scala.io.Sourceimport scala.collection.mutable.HashMapimport java.io.Fileimport org.apache.log4j.Loggerimport org.apache.log4j.Levelimport sys.process.stringSeqToProcess/** Configures the Oauth Credentials for accessing Twitter */def configureTwitterCredentials(apiKey: String, apiSecret: String, accessToken: String, accessTokenSecret: String) { val configs = new HashMap[String, String] ++= Seq( &quot;apiKey&quot; -&gt; apiKey, &quot;apiSecret&quot; -&gt; apiSecret, &quot;accessToken&quot; -&gt; accessToken, &quot;accessTokenSecret&quot; -&gt; accessTokenSecret) println(&quot;Configuring Twitter OAuth&quot;) configs.foreach{ case(key, value) =&gt; if
(value.trim.isEmpty) { throw new Exception(&quot;Error setting authentication - value for &quot; + key + &quot; not set&quot;) } val fullKey = &quot;twitter4j.oauth.&quot; + key.replace(&quot;api&quot;, &quot;consumer&quot;) System.setProperty(fullKey, value.trim) println(&quot;tProperty &quot; + fullKey + &quot; set as [&quot; + value.trim + &quot;]&quot;) } println()}// Configure Twitter credentialsval apiKey = &quot;xxxxxxxxxxxxxxxxxxxxxxxxx&quot;val apiSecret = &quot;xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&quot;val accessToken = &quot;xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&quot;val accessTokenSecret = &quot;xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&quot;configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret)import org.apache.spark.streaming.twitter._val ssc = new StreamingContext(sc, Seconds(2))val tweets = TwitterUt
ils.createStream(ssc, None)val twt = tweets.window(Seconds(60))case class Tweet(createdAt:Long, text:String)twt.map(status=&gt; Tweet(status.getCreatedAt().getTime()/1000, status.getText())).foreachRDD(rdd=&gt; // Below line works only in spark 1.3.0. // For spark 1.1.x and spark 1.2.x, // use rdd.registerTempTable(&quot;tweets&quot;) instead. rdd.toDF().registerAsTable(&quot;tweets&quot;))twt.printssc.start()Data RetrievalFor each of the following scripts, every time you click the run button you will see a different result, since they are based on real-time data.Let&#39;s begin by extracting a maximum of 10 tweets which contain the word girl.%sql select * from tweets where text like &#39;%girl%&#39; limit 10This time, suppose we want to see how many tweets have been created per second during the last 60 seconds. To do this, run:%sql select createdAt, count(1) from tweets group by createdAt order by createdAtYou can make a user-defined function and use it in Spark SQL. Let&#
39;s try it by making a function named sentiment. This function will return one of three attitudes (positive, negative, neutral) towards the parameter.def sentiment(s:String) : String = { val positive = Array(&quot;like&quot;, &quot;love&quot;, &quot;good&quot;, &quot;great&quot;, &quot;happy&quot;, &quot;cool&quot;, &quot;the&quot;, &quot;one&quot;, &quot;that&quot;) val negative = Array(&quot;hate&quot;, &quot;bad&quot;, &quot;stupid&quot;, &quot;is&quot;) var st = 0; val words = s.split(&quot; &quot;) positive.foreach(p =&gt; words.foreach(w =&gt; if(p==w) st = st+1 ) ) negative.foreach(p=&gt; words.foreach(w=&gt; if(p==w) st = st-1 ) ) if(st&gt;0) &quot;positive&quot; else if(st&lt;0) &quot;negative&quot; else &quot
;neutral&quot;}// Below line works only in spark 1.3.0.// For spark 1.1.x and spark 1.2.x,// use sqlc.registerFunction(&quot;sentiment&quot;, sentiment _) instead.sqlc.udf.register(&quot;sentiment&quot;, sentiment _)To check how people think about girls using the sentiment function we&#39;ve made above, run this:%sql select sentiment(text), count(1) from tweets where text like &#39;%girl%&#39; group by sentiment(text)",
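The bank-loading snippet in the entry above is stored with its quotes HTML-escaped, which makes it hard to read. Below is a de-escaped sketch of the same code, assuming Spark 1.6.x (where registerTempTable is available), that it runs in a Zeppelin %spark paragraph with sc and the SQLContext implicits already in scope, and that yourPath is a placeholder for wherever bank.zip was unpacked.

// Load the CSV and define a case class describing each record
val bankText = sc.textFile("yourPath/bank/bank-full.csv")
case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)

// Split each line on ";", drop the header row (its first field is the quoted literal "age"),
// strip the surrounding double quotes from the string fields, and map into Bank
val bank = bankText
  .map(s => s.split(";"))
  .filter(s => s(0) != "\"age\"")
  .map(s => Bank(
    s(0).toInt,
    s(1).replaceAll("\"", ""),
    s(2).replaceAll("\"", ""),
    s(3).replaceAll("\"", ""),
    s(5).replaceAll("\"", "").toInt))

// Convert to a DataFrame and register a temporary table so the %sql paragraphs above can query it
bank.toDF().registerTempTable("bank")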
@@ -539,7 +550,7 @@
"/security/datasource_authorization.html": {
"title": "Data Source Authorization in Apache Zeppelin",
- "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->{% include JB/setup %}# Data Source Authorization in Apache Zeppelin## OverviewData source authorization involves authenticating to the data source like a Mysql database and letting it determine user permissions.Apache Zeppelin allows users to use their own credentials to authenticate with **Data Sources**.For example, let's assume you have an account in the Vertica databases with credentials. You might want to use this account
to create a JDBC connection instead of a shared account with all users who are defined in `conf/shiro.ini`. In this case, you can add your credential information to Apache Zeppelin and use them with below simple steps. ## How to save the credential information?You can add new credentials in the dropdown menu for your data source which can be passed to interpreters. **Entity** can be the key that distinguishes each credential sets. Type **Username & Password** for your own credentials. ex) user & password of Mysql The credentials saved as per users defined in `conf/shiro.ini`.If you didn't activate [shiro authentication in Apache Zeppelin](./shiroauthentication.html), your credential information will be saved as `anonymous`.All credential information also can be found in `conf/credentials.json`. #### JDBC interpreterYou need to maintain per-user connection pools.The interpret method takes the user string as a parameter and executes the jdbc call using a connection in th
e user's connection pool.#### Presto You don't need a password if the Presto DB server runs backend code using HDFS authorization for the user.#### Vertica and Mysql You have to store the password information for users.## Please noteAs a first step of data source authentication feature, [ZEPPELIN-828](https://issues.apache.org/jira/browse/ZEPPELIN-828) was proposed and implemented in Pull Request [#860](https://github.com/apache/zeppelin/pull/860).Currently, only customized 3rd party interpreters can use this feature. We are planning to apply this mechanism to [the community interpreters](../manual/interpreterinstallation.md#available-community-managed-interpreters) in the near future. Please keep track [ZEPPELIN-1070](https://issues.apache.org/jira/browse/ZEPPELIN-1070). ",
+ "content" : "<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License.-->{% include JB/setup %}# Data Source Authorization in Apache Zeppelin## OverviewData source authorization involves authenticating to the data source like a Mysql database and letting it determine user permissions.Apache Zeppelin allows users to use their own credentials to authenticate with **Data Sources**.For example, let's assume you have an account in the Vertica databases with credentials. You might want to use this account
to create a JDBC connection instead of a shared account with all users who are defined in `conf/shiro.ini`. In this case, you can add your credential information to Apache Zeppelin and use them with below simple steps. ## How to save the credential information?You can add new credentials in the dropdown menu for your data source which can be passed to interpreters. **Entity** can be the key that distinguishes each credential sets. Type **Username & Password** for your own credentials. ex) user & password of Mysql The credentials saved as per users defined in `conf/shiro.ini`.If you didn't activate [shiro authentication in Apache Zeppelin](./shiroauthentication.html), your credential information will be saved as `anonymous`.All credential information also can be found in `conf/credentials.json`. #### JDBC interpreterYou need to maintain per-user connection pools.The interpret method takes the user string as a parameter and executes the jdbc call using a connection in th
e user's connection pool.#### Presto You don't need a password if the Presto DB server runs backend code using HDFS authorization for the user.#### Vertica and Mysql You have to store the password information for users.## Please noteAs a first step of data source authentication feature, [ZEPPELIN-828](https://issues.apache.org/jira/browse/ZEPPELIN-828) was proposed and implemented in Pull Request [#860](https://github.com/apache/zeppelin/pull/860).Currently, only customized 3rd party interpreters can use this feature. We are planning to apply this mechanism to [the community managed interpreters](../manual/interpreterinstallation.html#available-community-managed-interpreters) in the near future. Please keep track [ZEPPELIN-1070](https://issues.apache.org/jira/browse/ZEPPELIN-1070). ",
"url": " /security/datasource_authorization.html",
"group": "security",
"excerpt": "Apache Zeppelin supports protected data sources. In case of a MySql database, every users can set up their own credentials to access it."
Modified: zeppelin/site/docs/0.7.0-SNAPSHOT/security/authentication.html
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.7.0-SNAPSHOT/security/authentication.html?rev=1764207&r1=1764206&r2=1764207&view=diff
==============================================================================
--- zeppelin/site/docs/0.7.0-SNAPSHOT/security/authentication.html (original)
+++ zeppelin/site/docs/0.7.0-SNAPSHOT/security/authentication.html Tue Oct 11 07:28:09 2016
@@ -80,6 +80,7 @@
<li role="separator" class="divider"></li>
<li class="title"><span><b>More</b><span></li>
<li><a href="/docs/0.7.0-SNAPSHOT/install/upgrade.html">Upgrade Zeppelin Version</a></li>
+ <li><a href="/docs/0.7.0-SNAPSHOT/quickstart/install_with_flink_and_spark_cluster.html">Install Zeppelin with Flink and Spark Clusters Tutorial</a></li>
</ul>
</li>
<li>
Modified: zeppelin/site/docs/0.7.0-SNAPSHOT/security/notebook_authorization.html
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.7.0-SNAPSHOT/security/notebook_authorization.html?rev=1764207&r1=1764206&r2=1764207&view=diff
==============================================================================
--- zeppelin/site/docs/0.7.0-SNAPSHOT/security/notebook_authorization.html (original)
+++ zeppelin/site/docs/0.7.0-SNAPSHOT/security/notebook_authorization.html Tue Oct 11 07:28:09 2016
@@ -80,6 +80,7 @@
<li role="separator" class="divider"></li>
<li class="title"><span><b>More</b><span></li>
<li><a href="/docs/0.7.0-SNAPSHOT/install/upgrade.html">Upgrade Zeppelin Version</a></li>
+ <li><a href="/docs/0.7.0-SNAPSHOT/quickstart/install_with_flink_and_spark_cluster.html">Install Zeppelin with Flink and Spark Clusters Tutorial</a></li>
</ul>
</li>
<li>
Modified: zeppelin/site/docs/0.7.0-SNAPSHOT/security/shiroauthentication.html
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.7.0-SNAPSHOT/security/shiroauthentication.html?rev=1764207&r1=1764206&r2=1764207&view=diff
==============================================================================
--- zeppelin/site/docs/0.7.0-SNAPSHOT/security/shiroauthentication.html (original)
+++ zeppelin/site/docs/0.7.0-SNAPSHOT/security/shiroauthentication.html Tue Oct 11 07:28:09 2016
@@ -80,6 +80,7 @@
<li role="separator" class="divider"></li>
<li class="title"><span><b>More</b><span></li>
<li><a href="/docs/0.7.0-SNAPSHOT/install/upgrade.html">Upgrade Zeppelin Version</a></li>
+ <li><a href="/docs/0.7.0-SNAPSHOT/quickstart/install_with_flink_and_spark_cluster.html">Install Zeppelin with Flink and Spark Clusters Tutorial</a></li>
</ul>
</li>
<li>
Modified: zeppelin/site/docs/0.7.0-SNAPSHOT/sitemap.txt
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.7.0-SNAPSHOT/sitemap.txt?rev=1764207&r1=1764206&r2=1764207&view=diff
==============================================================================
--- zeppelin/site/docs/0.7.0-SNAPSHOT/sitemap.txt (original)
+++ zeppelin/site/docs/0.7.0-SNAPSHOT/sitemap.txt Tue Oct 11 07:28:09 2016
@@ -45,6 +45,7 @@ http://zeppelin.apache.org/manual/notebo
http://zeppelin.apache.org/manual/publish.html
http://zeppelin.apache.org/pleasecontribute.html
http://zeppelin.apache.org/quickstart/explorezeppelinui.html
+http://zeppelin.apache.org/quickstart/install_with_flink_and_spark_cluster.html
http://zeppelin.apache.org/quickstart/tutorial.html
http://zeppelin.apache.org/rest-api/rest-configuration.html
http://zeppelin.apache.org/rest-api/rest-credential.html
Modified: zeppelin/site/docs/0.7.0-SNAPSHOT/storage/storage.html
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.7.0-SNAPSHOT/storage/storage.html?rev=1764207&r1=1764206&r2=1764207&view=diff
==============================================================================
--- zeppelin/site/docs/0.7.0-SNAPSHOT/storage/storage.html (original)
+++ zeppelin/site/docs/0.7.0-SNAPSHOT/storage/storage.html Tue Oct 11 07:28:09 2016
@@ -80,6 +80,7 @@
<li role="separator" class="divider"></li>
<li class="title"><span><b>More</b><span></li>
<li><a href="/docs/0.7.0-SNAPSHOT/install/upgrade.html">Upgrade Zeppelin Version</a></li>
+ <li><a href="/docs/0.7.0-SNAPSHOT/quickstart/install_with_flink_and_spark_cluster.html">Install Zeppelin with Flink and Spark Clusters Tutorial</a></li>
</ul>
</li>
<li>