Posted to users@zeppelin.apache.org by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/08/04 17:12:01 UTC

Zeppelin installation instructions

Introduction

Zeppelin is a web-based notebook that enables interactive data analytics. You
can make beautiful data-driven, interactive and collaborative documents with
SQL, Scala and more. More details: https://zeppelin.incubator.apache.org/.
Zeppelin integrates with Spark, SQL, Hive and others for data processing,
visualization and sharing of reports. This wiki page describes how to set up
Zeppelin on an existing YARN cluster. The YARN cluster here was created on
dev c3 using Ambari.
Setup

We assume that a YARN cluster is available along with a Spark history server.
c3 Instance

Create a c3 xLarge instance with CentOS 6.4.x as the OS. A new instance is
used to make sure there are enough resources available for Zeppelin.

Prepare

The Zeppelin node needs the Hadoop clients, so it must be prepared before the
clients can be installed from Ambari. Run the following commands as root to
prepare the Zeppelin node.
*Prepare*
echo "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAzBihXIpeZey1G1tQecThBZnJarkX2GjzbE+aQ8dL8TsHchAnwWGVwEmiSNes1O/2L7NV1OpO97gbG3DxhZ8joSxkv0or8WWh17FHY0wdS8ypypffE0YKWxeEJqTbTz6y0pizpZuexi2Sq07On3Nln2me9atVvDE0s0U0vH7JMYgcKSDTog/pvNk6Le54RRkQz5yi8bVDZiOMfhJfn2phXmNB42Upij+kiClXXOEz2E70fQo0Bo5+iTNF/oxSk1vrtYDOHtxGcPZYe60TEp8dASB8NG732vgOs6eecR4LQcGKiBN6uDEuMd3vWMK8or59tCVrEh+/h+2XipZ3hnmu7w== root@ambariserver-3409" >> /root/.ssh/authorized_keys
# So that the root user on the Ambari server can log in to the Zeppelin
# server without a password.

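# Start NTP so the node's clock stays in sync with the rest of the cluster
service ntpd start
chkconfig ntpd on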
hostname -f
setenforce 0
chkconfig iptables off
/etc/init.d/iptables stop
ipaddr=$(ifconfig | grep 'inet addr:' | grep -v '127.0.0.1' | cut -d: -f2 | awk '{ print $1 }')
fhost=`hostname -f`
echo "$ipaddr $fhost `hostname`"
echo "$ipaddr $fhost `hostname`" >> /etc/hosts
cat /etc/hosts
# Ensure the full hostname is present in /etc/hosts and that hostname -f
# shows the full hostname.
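Running the preparation steps a second time appends the same line to /etc/hosts again. A minimal sketch of an idempotent alternative, using a hypothetical helper named add_hosts_entry (not part of Zeppelin or Ambari): it appends the entry only when the FQDN is not already listed as a host name on a non-comment line.

```shell
# Hypothetical helper: add "IP FQDN SHORTNAME" to a hosts file only if the
# FQDN is not already listed, so re-running the preparation steps does not
# append duplicate entries.
add_hosts_entry() (
  hosts_file=$1 ip=$2 fqdn=$3 short=$4
  # Skip comment lines and compare the FQDN against whole host-name fields
  # (field 2 onwards), not as a substring of the line.
  if awk -v h="$fqdn" '/^[ \t]*#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
      END { exit !found }' "$hosts_file"; then
    echo "already present: $fqdn"
  else
    printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$hosts_file"
    echo "added: $ip $fqdn $short"
  fi
)

# Usage on the Zeppelin node (as root), reusing the variables from the steps above:
# add_hosts_entry /etc/hosts "$ipaddr" "$fhost" "$(hostname)"
```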

Once the Zeppelin server is prepared, go to the Ambari web interface and run
the action to add a new host from the Hosts tab. Install only the Hadoop clients.
Building Zeppelin

To build Zeppelin, Apache Maven, JDK 1.7 and Git need to be installed. You can
run the commands below as the root user (root permissions are not mandatory).
*Install Git/Java/Maven*
# Install Git
yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel
yum install gcc perl-ExtUtils-MakeMaker
yum remove git
cd /usr/src
wget https://www.kernel.org/pub/software/scm/git/git-2.0.4.tar.gz
tar xzf git-2.0.4.tar.gz
cd git-2.0.4
make prefix=/usr/local/git all
make prefix=/usr/local/git install
# Single quotes keep $PATH unexpanded until /etc/bashrc is sourced
echo 'export PATH=$PATH:/usr/local/git/bin' >> /etc/bashrc
source /etc/bashrc
git --version

# Install JDK 1.7
cd /usr/src
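# Download JDK 1.7 from Oracle (the license must be accepted first, and the
# AuthParam value in the URL below is tied to a single download)
#wget http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz?AuthParam=1438626626_b7fb864ed0343b3322bd003ced1e03f5
# wget keeps the query string in the saved file name, so strip it
mv jdk-7u79-linux-x64.tar.gz\?AuthParam\=1438626626_b7fb864ed0343b3322bd003ced1e03f5 jdk-7u79-linux-x64.tar.gz
tar -xf jdk-7u79-linux-x64.tar.gz
export JAVA_HOME=/usr/src/jdk1.7.0_79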

# Install Apache Maven
wget ftp://mirror.reverse.net/pub/apache/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
tar -xf apache-maven-3.3.3-bin.tar.gz
cd apache-maven-3.3.3
export MAVEN_HOME=/usr/src/apache-maven-3.3.3
echo 'export PATH=$PATH:/usr/src/apache-maven-3.3.3/bin' >> /etc/bashrc
source /etc/bashrc

git --version
mvn -version
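Both PATH changes append a line to /etc/bashrc every time they are run. A sketch of a guarded version, using a hypothetical helper named add_path_line; the single quotes keep $PATH literal in the file, so it is expanded each time /etc/bashrc is sourced rather than frozen at install time.

```shell
# Hypothetical helper: append "export PATH=$PATH:<dir>" to a shell rc file
# only once. $PATH stays literal in the written line.
add_path_line() (
  rc_file=$1 dir=$2
  line='export PATH=$PATH:'"$dir"
  # -x matches the whole line, -F treats it as a fixed string (no regex).
  if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$rc_file"
  fi
)

# Usage (as root):
# add_path_line /etc/bashrc /usr/local/git/bin
# add_path_line /etc/bashrc /usr/src/apache-maven-3.3.3/bin
```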

Create a new user, zeppelin, and switch to it.
*Zeppelin User*
useradd zeppelin
su - zeppelin

Check out Zeppelin from GitHub.
*Checkout Zeppelin*
git clone https://github.com/apache/incubator-zeppelin.git
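cd /home/zeppelin/incubator-zeppelin
# JAVA_HOME exported earlier in root's shell is not inherited by this login shell
export JAVA_HOME=/usr/src/jdk1.7.0_79
mvn -version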

Zeppelin must be built against specific versions of Hadoop and Spark, because
the matching Hadoop and Spark libraries are bundled into the build. To build
Zeppelin for Spark 1.3.1 and Hadoop 2.7.1.2.3.1.0-2574, run the following
command (note that it passes -Dhadoop.version=2.7.0 with the hadoop-2.6 profile).
*Build*
mvn clean package -Pspark-1.3 -Dspark.version=1.3.1 -Dhadoop.version=2.7.0 \
    -Phadoop-2.6 -Pyarn -DskipTests
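The Spark profile and -Dspark.version have to agree with each other. A sketch that derives the profile from the version, assuming (as in the command above) that the profile is named spark-MAJOR.MINOR; zeppelin_build_cmd is a hypothetical helper that only prints the command, so it can be reviewed before running it.

```shell
# Hypothetical helper: print the mvn build command for a Spark/Hadoop pair.
# Assumption: the Spark profile is spark-<major>.<minor> (1.3.1 -> spark-1.3).
zeppelin_build_cmd() (
  spark_version=$1 hadoop_version=$2 hadoop_profile=$3
  spark_profile="spark-${spark_version%.*}"   # drop the patch component
  printf 'mvn clean package -P%s -Dspark.version=%s -Dhadoop.version=%s -P%s -Pyarn -DskipTests\n' \
    "$spark_profile" "$spark_version" "$hadoop_version" "$hadoop_profile"
)

# Prints: mvn clean package -Pspark-1.3 -Dspark.version=1.3.1 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests
zeppelin_build_cmd 1.3.1 2.7.0 hadoop-2.6
```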

Once the build is complete (the first run takes a while), obtain the HDP
version of the Hadoop client. Because the Hadoop clients on the Zeppelin node
were installed through Ambari, hdp-select can report it.
*Configuration*
hdp-select status hadoop-client | sed 's/hadoop-client - \(.*\)/\1/'
# It returned 2.3.1.0-2574
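To reuse this version in later steps (for example in ZEPPELIN_JAVA_OPTS), it can be captured in a variable. A sketch, assuming hdp-select prints a line of the form "hadoop-client - 2.3.1.0-2574" (the prefix the sed expression above strips); hdp_version_from_status is a hypothetical helper name.

```shell
# Hypothetical helper: read "hadoop-client - <version>" on stdin and print
# only <version>. -n with the p flag prints nothing for non-matching lines.
hdp_version_from_status() {
  sed -n 's/^hadoop-client - \(.*\)$/\1/p'
}

# Usage on the Zeppelin node:
# HDP_VERSION=$(hdp-select status hadoop-client | hdp_version_from_status)
# echo "$HDP_VERSION"
```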

Zeppelin includes a Hive interpreter, so copy hive-site.xml into Zeppelin's
conf folder, which is present once Zeppelin is built.
*Hive*
cp /etc/hive/conf/hive-site.xml /home/zeppelin/incubator-zeppelin/conf

Finally, the Zeppelin configuration needs to be modified to point to the YARN
cluster. Create zeppelin-env.sh as a copy of its template.
*Configuration*
cp /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh.template \
   /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh

Then set the following properties in zeppelin-env.sh.
*Configuration*
export JAVA_HOME=/usr/src/jdk1.7.0_79
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_PORT=10008
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.1.0-2574"
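Rather than hard-coding the HDP version, the same four settings can be written from variables. A sketch with a hypothetical helper named write_zeppelin_env; it appends to the file, and if a variable ends up set twice, the later assignment wins when zeppelin-env.sh is sourced.

```shell
# Hypothetical helper: append the JAVA_HOME, HADOOP_CONF_DIR, ZEPPELIN_PORT
# and ZEPPELIN_JAVA_OPTS settings to a zeppelin-env.sh file.
write_zeppelin_env() (
  env_file=$1 java_home=$2 hdp_version=$3 port=$4
  # Unquoted EOF: the variables below are expanded when the file is written.
  cat >> "$env_file" <<EOF
export JAVA_HOME=$java_home
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_PORT=$port
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=$hdp_version"
EOF
)

# Usage (as the zeppelin user), with the values used in this guide:
# write_zeppelin_env /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh \
#   /usr/src/jdk1.7.0_79 2.3.1.0-2574 10008
```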
Run Zeppelin

Your Zeppelin server is ready to run. Start the server with
*Run*
/home/zeppelin/incubator-zeppelin/bin/zeppelin-daemon.sh start
#and you can stop it using
/home/zeppelin/incubator-zeppelin/bin/zeppelin-daemon.sh stop

Because the Hadoop clients are installed on the Zeppelin server, Ambari will
have created the hdfs user. Switch to the hdfs user, create a home directory
for the zeppelin user in HDFS and make zeppelin its owner.
*Zeppelin HDFS Directory*
su hdfs
hdfs dfs -mkdir /user/zeppelin
hdfs dfs -chown zeppelin:hdfs /user/zeppelin
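A re-runnable sketch of the same two HDFS commands, using a hypothetical helper named ensure_hdfs_home. It relies on the standard "hdfs dfs -test -d" check and "hdfs dfs -mkdir -p", and like the commands above it must be run as the hdfs user.

```shell
# Hypothetical helper: create /user/<user> in HDFS if it is missing and make
# <user>:<group> its owner. Run as the hdfs superuser.
ensure_hdfs_home() (
  user=$1 group=${2:-hdfs}
  dir=/user/$user
  # "hdfs dfs -test -d" exits 0 when the path exists and is a directory.
  if ! hdfs dfs -test -d "$dir"; then
    hdfs dfs -mkdir -p "$dir" || exit 1
  fi
  hdfs dfs -chown "$user:$group" "$dir"
)

# Usage (as the hdfs user):
# ensure_hdfs_home zeppelin hdfs
```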
Configure Zeppelin

Once the Zeppelin server is started, open the web interface at
http://zeppelin-server-hostname:10008. Switch to the Interpreter tab to see
the interpreters Zeppelin supports out of the box, including Spark, Hive,
Tajo, Ignite and Lens. In the Spark interpreter, add or modify the following
properties so that they are set appropriately:

master, spark.home, spark.driver.extraJavaOptions,
spark.yarn.am.extraJavaOptions and spark.yarn.jar.
The value of spark.home can be obtained from the Zeppelin node, since it has
the Spark clients installed.
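For reference, one possible set of values, printed by a hypothetical helper named spark_interpreter_props. These are assumptions for a Spark 1.3 / HDP 2.3 node, not values taken from this cluster: yarn-client as master, /usr/hdp/current/spark-client as spark.home, and the same -Dhdp.version option as in zeppelin-env.sh. spark.yarn.jar is left out because the location of the Spark assembly jar depends on the installation.

```shell
# Hypothetical helper: print example Spark interpreter properties as
# name=value lines. The master value and spark.home path are assumptions.
spark_interpreter_props() (
  hdp_version=$1
  # printf reuses its format for each name/value pair.
  printf '%s=%s\n' \
    master yarn-client \
    spark.home /usr/hdp/current/spark-client \
    spark.driver.extraJavaOptions "-Dhdp.version=$hdp_version" \
    spark.yarn.am.extraJavaOptions "-Dhdp.version=$hdp_version"
)

spark_interpreter_props 2.3.1.0-2574
```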

In the Hive interpreter, set the HiveServer2 hostname (keep the port as is).

Once these configurations are updated, Zeppelin will prompt you to restart
the interpreter. Accept the prompt and the interpreter will reload the
configurations.

At this point, we are ready to take the Zeppelin notebook for a spin.
Navigate to http://$host:10008.

You should see the Zeppelin home page.
[Screenshot omitted]

Debug

Zeppelin creates a log file for each interpreter and does not show error
messages in the paragraphs, so the cause of a failure has to be looked up in
the log files. They are in the logs directory under the Zeppelin installation
base directory.
*Debug*
[zeppelin@zeppelin-3529 logs]$ pwd
/home/zeppelin/incubator-zeppelin/logs
[zeppelin@zeppelin-3529 logs]$ ls -l
total 844
-rw-rw-r-- 1 zeppelin zeppelin 14648 Aug 3 14:45 zeppelin-interpreter-hive-zeppelin-zeppelin-3529.log
-rw-rw-r-- 1 zeppelin zeppelin 625050 Aug 3 16:05 zeppelin-interpreter-spark-zeppelin-zeppelin-3529.log
-rw-rw-r-- 1 zeppelin zeppelin 200394 Aug 3 21:15 zeppelin-zeppelin-zeppelin-3529.log
-rw-rw-r-- 1 zeppelin zeppelin 16162 Aug 3 14:03 zeppelin-zeppelin-zeppelin-3529.out
[zeppelin@zeppelin-3529 logs]$
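Since failures only show up in these files, a quick way to find them is to grep for ERROR and Exception lines. A sketch with a hypothetical helper named recent_log_errors; it prints the last matches from each .log and .out file in the directory, prefixed with the file name and line number.

```shell
# Hypothetical helper: show the most recent ERROR/Exception lines from each
# Zeppelin .log and .out file in a directory (default: last 20 per file).
recent_log_errors() (
  log_dir=$1 max_lines=${2:-20}
  for f in "$log_dir"/*.log "$log_dir"/*.out; do
    [ -f "$f" ] || continue   # the glob matched nothing
    # -H prints the file name, -n the line number; tail keeps the newest hits.
    grep -Hn -E 'ERROR|Exception' "$f" | tail -n "$max_lines"
  done
)

# Usage:
# recent_log_errors /home/zeppelin/incubator-zeppelin/logs 10
```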

--
Deepak

Re: Zeppelin installation instructions

Posted by moon soo Lee <mo...@apache.org>.

Hi,

Thanks for sharing this helpful document.
Do you mind sharing a link if this document is published?

Best,
moon

