Posted to user@hive.apache.org by Xuefu Zhang <xz...@cloudera.com> on 2015/12/16 23:05:06 UTC
Re: making session setting "set spark.master=yarn-client" for Hive on Spark
Mich,
By switching the value of spark.master, you're basically asking Hive to
use your YARN cluster rather than your Spark standalone cluster. Both modes
are supported, in addition to local, local-cluster, and yarn-cluster, and
yarn-cluster is the recommended mode.
Thanks,
Xuefu
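(For reference, the recommended mode mentioned above would be selected the same way as the other session settings quoted below; a sketch, not taken verbatim from this thread:)

```sql
-- Hive session sketch: run Hive on Spark against YARN in cluster mode
set hive.execution.engine=spark;
set spark.master=yarn-cluster;
```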
On Wed, Dec 16, 2015 at 1:39 PM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:
> Hi,
>
> My environment:
>
> Hadoop 2.6.0
>
> Hive 1.2.1
>
> spark-1.3.1-bin-hadoop2.6 (downloaded as the prebuilt spark-1.3.1-bin-hadoop2.6.gz)
>
>
> The Jar file used in $HIVE_HOME/lib to link Hive to Spark was
> spark-assembly-1.3.1-hadoop2.4.0.jar
>
> (built from the source downloaded as the zipped file spark-1.3.1.gz and
> built with the command line make-distribution.sh --name
> "hadoop2-without-hive" --tgz
> "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided")
>
> I am trying to use Hive on Spark.
>
> Before I had:
>
> set spark.home=/usr/lib/spark-1.3.1-bin-hadoop2.6;
>
> set hive.execution.engine=spark;
>
> set spark.master=spark://50.140.197.217:7077;
>
> set spark.eventLog.enabled=true;
>
> set spark.eventLog.dir=/usr/lib/spark-1.3.1-bin-hadoop2.6/logs;
>
> set spark.executor.memory=512m;
>
> set spark.executor.cores=2;
>
> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>
> set hive.spark.client.server.connect.timeout=220000ms;
>
> set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
>
> set spark.SPARK_PID_DIR=/work/hadoop/tmp/spark;
>
> And it only worked sporadically.
>
> Today I changed spark.master to
>
> set spark.master=yarn-client;
>
> and it works fine without any intermittent connectivity issues. The Hadoop
> application UI shows the job as “Hive on Spark” and the application type
> as SPARK as well.
>
> What are the implications of this please?
>
> Thanks
>
> Mich Talebzadeh
>
> *Sybase ASE 15 Gold Medal Award 2008*
>
> A Winning Strategy: Running the most Critical Financial Data on ASE 15
>
>
> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
>
> Author of the books* "A Practitioner’s Guide to Upgrading to Sybase ASE
> 15", ISBN 978-0-9563693-0-7*.
>
> co-author *"Sybase Transact SQL Guidelines Best Practices", ISBN
> 978-0-9759693-0-4*
>
> *Publications due shortly:*
>
> *Complex Event Processing in Heterogeneous Environments*, ISBN:
> 978-0-9563693-3-8
>
> *Oracle and Sybase, Concepts and Contrasts*, ISBN: 978-0-9563693-1-4, volume
> one out shortly
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Technology
> Ltd, its subsidiaries or their employees, unless expressly so stated. It is
> the responsibility of the recipient to ensure that this email is virus
> free, therefore neither Peridale Ltd, its subsidiaries nor their employees
> accept any responsibility.
>
>
>
RE: making session setting "set spark.master=yarn-client" for Hive on Spark
Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
It sounds like, judging from the following list of session settings for Hive,
set spark.home=/usr/lib/spark-1.3.1-bin-hadoop2.6;
set hive.execution.engine=spark;
set spark.master=yarn-client;
set spark.master=spark://50.140.197.217:7077;
set spark.eventLog.enabled=true;
set spark.eventLog.dir=/usr/lib/spark-1.3.1-bin-hadoop2.6/logs;
set spark.executor.memory=512m;
set spark.executor.cores=2;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
set hive.spark.client.server.connect.timeout=220000ms;
set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
set spark.spark_pid_dir=/work/hadoop/tmp/spark;
the spark.home and spark.master parameters cannot be set in hive-site.xml; the rest can be.
In other words, for queries to work, every session has to set:
set spark.home=/usr/lib/spark-1.3.1-bin-hadoop2.6;
set spark.master=yarn-client;
Can someone please confirm this or correct the configuration settings below?
These are my settings:
<property>
<name>spark.home</name>
<value>/usr/lib/spark-1.3.1-bin-hadoop2.6</value>
<description>something</description>
</property>
<property>
<name>spark.master</name>
<value>yarn-client</value>
<description>something</description>
</property>
<property>
<name>spark.eventLog.enabled</name>
<value>true</value>
<description>something</description>
</property>
<property>
<name>spark.eventLog.dir</name>
<value>/usr/lib/spark-1.3.1-bin-hadoop2.6/logs</value>
<description>something</description>
</property>
<property>
<name>spark.executor.memory</name>
<value>512m</value>
<description>something</description>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
<description>something</description>
</property>
<property>
<name>hive.spark.client.server.connect.timeout</name>
<value>220000ms</value>
<description>something</description>
</property>
<property>
<name>spark.io.compression.codec</name>
<value>org.apache.spark.io.LZFCompressionCodec</value>
<description>something</description>
</property>
<property>
<name>spark.spark_pid_dir</name>
<value>/work/hadoop/tmp/spark</value>
<description>something</description>
</property>
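A quick way to check whether these hive-site.xml entries are being picked up (a sketch; the property names are taken from the list above) is to print their effective values in a Hive session, since SET with a name and no value displays the current setting:

```sql
-- In the Hive CLI or Beeline session: "set <name>;" with no value
-- prints the current effective value of that property
set spark.master;
set spark.home;
set hive.execution.engine;
```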
Mich Talebzadeh
RE: making session setting "set spark.master=yarn-client" for Hive on Spark
Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Thanks.
With spark.master=yarn-cluster I see much more stable connections and, better still, there is no need to start the Spark master on port 7077, etc.
Cheers,
Mich Talebzadeh