Posted to user@hive.apache.org by Mich Talebzadeh <mi...@peridale.co.uk> on 2015/12/07 01:47:08 UTC

The Hive shell and Spark issue

Hi,

 

It looks like the problem with using Spark as the Hive execution engine
comes from the following lines in $HIVE_HOME/bin/hive:

 

# add Spark assembly jar to the classpath
if [[ -n "$SPARK_HOME" ]]
then
  sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"
fi

 

As we know, Hive cannot use Spark with the spark-assembly-*.jar that ships
in the pre-built Spark download. It will not work! For now, as a
work-around, you need to build Spark from source code with the Hive
libraries excluded, then copy the resulting spark-assembly-*.jar file from
$SPARK_BUILT_FROM_SOURCE_CODE_HOME/lib to $HIVE_HOME/lib. That is only
needed if you want to test Hive with the Spark engine.

 

So you either have to unset the environment variable SPARK_HOME before
starting the Hive CLI, or comment out
CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}" in $HIVE_HOME/bin/hive.
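
One way to apply the first option without touching your login environment is
to unset the variable only for the Hive process. A minimal sketch, assuming
hive is on the PATH; run_without_spark_home is a hypothetical helper name
and not part of Hive:

```shell
#!/bin/sh
# Run a command with SPARK_HOME removed from its environment only.
# The subshell ( ... ) keeps the unset from leaking into the calling shell.
# run_without_spark_home is a hypothetical name used for illustration.
run_without_spark_home() {
  ( unset SPARK_HOME; "$@" )
}

# Usage (assumes hive is on the PATH):
#   run_without_spark_home hive
```

Because the unset happens in a subshell, SPARK_HOME stays set for anything
else you run from the same session, such as spark-shell.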

 

But leaving the file "spark-assembly-*.jar" in $HIVE_HOME/lib seems to stop
the Hive server from starting properly, so client connections such as
beeline do not seem to work either.

 

I am still investigating. 

 

 

HTH

 

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

 
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner's Guide to Upgrading to Sybase ASE 15",
ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN
978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN:
978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume
one out shortly

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Technology
Ltd, its subsidiaries or their employees, unless expressly so stated. It is
the responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 


RE: The Hive shell and Spark issue

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
A few other observations, based on my experience of making Hive 1.2.1 work
with spark-1.3.1-bin-hadoop2.6 while using the jar file built from source
code (Spark version 1.3.1), spark-assembly-1.3.1-hadoop2.4.0.jar:

 

1.    Simply putting spark-assembly-1.3.1-hadoop2.4.0.jar in $HIVE_HOME/lib
is not going to work, as you will get all sorts of stack traces. This is
because of the way the shell script $HIVE_HOME/bin/hive builds the
CLASSPATH, which results in Hive not starting.

 

2.    The code simply does

 

for f in ${HIVE_LIB}/*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done

# add Spark assembly jar to the classpath
if [[ -n "$SPARK_HOME" ]]
then
  sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"
fi

 

The loop adds every jar file in ${HIVE_LIB} to the CLASSPATH, so
spark-assembly-1.3.1-hadoop2.4.0.jar ends up ahead of the Hadoop-related jar
files. The file spark-assembly-1.3.1-hadoop2.4.0.jar is a considerably older
version!

 

The if block that follows says that if $SPARK_HOME is set, then
spark-assembly-*.jar from $SPARK_HOME/lib is added to the CLASSPATH, which
we know will not work because of class dependencies.
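
To see where a spark-assembly jar actually lands in a given classpath, a
small check like the following may help. This is only a sketch:
list_spark_assembly_entries is a hypothetical helper, and it assumes you
pass it a colon-separated classpath string (for example one printed by a
temporary echo "$CLASSPATH" added to your own copy of the hive script).

```shell
#!/bin/sh
# Print "position:path" for every spark-assembly jar in a colon-separated
# classpath, so you can see whether it sits ahead of other jars.
# list_spark_assembly_entries is a hypothetical name used for illustration.
list_spark_assembly_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -n '/spark-assembly-[^/]*\.jar$'
}

# Usage:
#   list_spark_assembly_entries "$CLASSPATH"
```

The number before the colon is the position of the entry in the classpath,
counting from 1. No output means no spark-assembly jar was found.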

 

 

The proposed solution

 

Before starting Hive, do the following:

1.    unset SPARK_HOME

2.    Create a new environment variable to indicate that you want to use
Spark as the execution engine for Hive  --> HIVE_ON_SPARK='Y'

 

Modify the hive shell script ($HIVE_HOME/bin/hive) to do the following:

 

# Exclude the spark-assembly jar from the normal classpath build for Hive
for f in `ls ${HIVE_LIB}/*.jar|grep -v spark-assembly-1.3.1-hadoop2.4.0.jar`
do
  CLASSPATH=${CLASSPATH}:$f;
done

 

# Add Spark assembly jar to the classpath for future work. Otherwise ensure
# SPARK_HOME is unset outside of this shell
if [[ -n "$SPARK_HOME" ]]
then
  sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"
fi

 

# Add Spark assembly jar to the classpath for Hive on Spark engine as a
# work-around! Set HIVE_ON_SPARK='Y' outside of this shell
if [[ -n "$HIVE_ON_SPARK" ]]
then
  sparkAssemblyPath=`ls ${HIVE_HOME}/lib/spark-assembly-*.jar`
  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"
fi
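
The same idea can be sketched as a standalone function that is easy to try
out against a scratch directory before editing the real script. This is a
variant, not the code from $HIVE_HOME/bin/hive: build_hive_classpath is a
hypothetical name, it filters with a glob and case instead of ls | grep, and
it leaves out the SPARK_HOME branch for brevity.

```shell
#!/bin/sh
# Sketch: build a classpath from a Hive lib directory, keeping any
# spark-assembly jar out of the main list and appending it last only when
# HIVE_ON_SPARK is set. build_hive_classpath is a hypothetical helper.
# The body runs in a subshell ( ... ) so its variables do not leak out.
build_hive_classpath() (
  hive_lib="$1"
  cp=""
  for f in "$hive_lib"/*.jar; do
    if [ ! -e "$f" ]; then continue; fi       # glob matched nothing
    case "$f" in
      */spark-assembly-*.jar) continue ;;     # skip the Spark assembly here
    esac
    cp="${cp:+$cp:}$f"
  done
  if [ -n "${HIVE_ON_SPARK:-}" ]; then
    for f in "$hive_lib"/spark-assembly-*.jar; do
      if [ -e "$f" ]; then cp="${cp:+$cp:}$f"; fi
    done
  fi
  printf '%s\n' "$cp"
)

# Usage:
#   HIVE_ON_SPARK=Y build_hive_classpath "$HIVE_HOME/lib"
```

With HIVE_ON_SPARK unset, the assembly jar is left out entirely. With it
set, the jar is appended after all the other Hive jars rather than in
alphabetical position among them.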

 

HTH,

 

Mich Talebzadeh

 


 
