Posted to user@hive.apache.org by Xuefu Zhang <xz...@cloudera.com> on 2015/12/16 23:05:06 UTC
Re: making session setting "set spark.master=yarn-client" for Hive on Spark
Mich,
By switching the value of spark.master, you're basically asking Hive to
use your YARN cluster rather than your Spark standalone cluster. Both modes
are supported, in addition to local, local-cluster, and yarn-cluster, and
yarn-cluster is the recommended mode.
Thanks,
Xuefu
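(For reference, the recommended mode mentioned above would be selected the same way as the other session settings quoted below; a sketch, not taken verbatim from this thread:)

```sql
-- Hive session sketch: run Hive on Spark against YARN in cluster mode
set hive.execution.engine=spark;
set spark.master=yarn-cluster;
```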
On Wed, Dec 16, 2015 at 1:39 PM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:
> Hi,
>
> My environment:
>
> Hadoop 2.6.0
>
> Hive 1.2.1
>
> spark-1.3.1-bin-hadoop2.6 (downloaded as the prebuilt spark-1.3.1-bin-hadoop2.6.gz)
>
>
> The Jar file used in $HIVE_HOME/lib to link Hive to Spark was
> spark-assembly-1.3.1-hadoop2.4.0.jar
>
> (built from the source downloaded as the zipped file spark-1.3.1.gz and
> built with the command line make-distribution.sh --name
> "hadoop2-without-hive" --tgz
> "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided")
>
> I am trying to use Hive on Spark.
>
> Before I had:
>
> set spark.home=/usr/lib/spark-1.3.1-bin-hadoop2.6;
>
> set hive.execution.engine=spark;
>
> set spark.master=spark://50.140.197.217:7077;
>
> set spark.eventLog.enabled=true;
>
> set spark.eventLog.dir=/usr/lib/spark-1.3.1-bin-hadoop2.6/logs;
>
> set spark.executor.memory=512m;
>
> set spark.executor.cores=2;
>
> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>
> set hive.spark.client.server.connect.timeout=220000ms;
>
> set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
>
> set spark.SPARK_PID_DIR=/work/hadoop/tmp/spark;
>
> And it only worked sporadically.
>
> Today I changed spark.master to
>
> set spark.master=yarn-client;
>
> and it works fine without any intermittent connectivity issues. The Hadoop
> application UI shows the job as “Hive on Spark” and the application type
> as SPARK as well.
>
> What are the implications of this please?
>
> Thanks
>
> Mich Talebzadeh
>
> *Sybase ASE 15 Gold Medal Award 2008*
>
> A Winning Strategy: Running the most Critical Financial Data on ASE 15
>
>
> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
>
> Author of the books* "A Practitioner’s Guide to Upgrading to Sybase ASE
> 15", ISBN 978-0-9563693-0-7*.
>
> co-author *"Sybase Transact SQL Guidelines Best Practices", ISBN
> 978-0-9759693-0-4*
>
> *Publications due shortly:*
>
> *Complex Event Processing in Heterogeneous Environments*, ISBN:
> 978-0-9563693-3-8
>
> *Oracle and Sybase, Concepts and Contrasts*, ISBN: 978-0-9563693-1-4, volume
> one out shortly
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Technology
> Ltd, its subsidiaries or their employees, unless expressly so stated. It is
> the responsibility of the recipient to ensure that this email is virus
> free, therefore neither Peridale Ltd, its subsidiaries nor their employees
> accept any responsibility.
>
>
>
RE: making session setting "set spark.master=yarn-client" for Hive on Spark
Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
It sounds like, judging from the following list of session settings for Hive,
set spark.home=/usr/lib/spark-1.3.1-bin-hadoop2.6;
set hive.execution.engine=spark;
set spark.master=yarn-client;
set spark.master=spark://50.140.197.217:7077;
set spark.eventLog.enabled=true;
set spark.eventLog.dir=/usr/lib/spark-1.3.1-bin-hadoop2.6/logs;
set spark.executor.memory=512m;
set spark.executor.cores=2;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
set hive.spark.client.server.connect.timeout=220000ms;
set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
set spark.spark_pid_dir=/work/hadoop/tmp/spark;
the spark.home and spark.master parameters cannot be set in hive-site.xml; the rest can be.
In other words, for queries to work, every session has to set:
set spark.home=/usr/lib/spark-1.3.1-bin-hadoop2.6;
set spark.master=yarn-client;
Can someone please confirm this or correct the configuration settings below?
These are my settings:
<property>
<name>spark.home</name>
<value>/usr/lib/spark-1.3.1-bin-hadoop2.6</value>
<description>something</description>
</property>
<property>
<name>spark.master</name>
<value>yarn-client</value>
<description>something</description>
</property>
<property>
<name>spark.eventLog.enabled</name>
<value>true</value>
<description>something</description>
</property>
<property>
<name>spark.eventLog.dir</name>
<value>/usr/lib/spark-1.3.1-bin-hadoop2.6/logs</value>
<description>something</description>
</property>
<property>
<name>spark.executor.memory</name>
<value>512m</value>
<description>something</description>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
<description>something</description>
</property>
<property>
<name>hive.spark.client.server.connect.timeout</name>
<value>220000ms</value>
<description>something</description>
</property>
<property>
<name>spark.io.compression.codec</name>
<value>org.apache.spark.io.LZFCompressionCodec</value>
<description>something</description>
</property>
<property>
<name>spark.spark_pid_dir</name>
<value>/work/hadoop/tmp/spark</value>
<description>something</description>
</property>
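A quick way to check whether these hive-site.xml entries are being picked up (a sketch; the property names are taken from the list above) is to print their effective values in a Hive session, since SET with a name and no value displays the current setting:

```sql
-- In the Hive CLI or Beeline session: "set <name>;" with no value
-- prints the current effective value of that property
set spark.master;
set spark.home;
set hive.execution.engine;
```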
Mich Talebzadeh
RE: making session setting "set spark.master=yarn-client" for Hive on Spark
Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Thanks.
With spark.master=yarn-cluster I see much more stable connections and, better still, there is no need to start the Spark master on port 7077, etc.
Cheers,
Mich Talebzadeh