Posted to user@spark.apache.org by Sa...@wellsfargo.com on 2016/11/21 19:04:06 UTC
Cluster deploy mode driver location
Hello there,
I have a Spark program on 1.6.1; however, when I submit it to the cluster, it picks the driver node at random.
I know there is an option to specify the driver, but using it makes it mandatory to define many other options I am not familiar with. The trouble is, the .jars I am launching need to be available on the driver host, and I would like to keep these jars on just one specific host, which I would like to be the driver.
Any help?
Thanks!
Saif
Re: Cluster deploy mode driver location
Posted by Masood Krohy <ma...@intact.net>.
You may also try distributing your JARs along with your Spark app; see the
options below. You put whatever is necessary on the client node and
submit it all in each run. There is also a --files option, which you can
remove below, but it may be helpful for some configs.
You do not need to specify all the arguments; the default values are
picked up when they are not given explicitly.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 2 \
  --driver-memory 4g \
  --executor-memory 8g \
  --files /usr/hdp/current/spark-client/conf/hive-site.xml \
  --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar \
  --class "SparkApp" \
  /pathToAppOnTheClientNode/SparkApp.jar [arguments passed to the Spark app, if any]
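To illustrate the point about defaults, here is a minimal sketch (not from the thread; all paths are hypothetical placeholders) that omits the memory and executor options entirely and lets spark-defaults.conf fill them in:

```shell
# Minimal cluster-mode submit: unspecified options (driver/executor memory,
# executor count, etc.) fall back to the defaults in spark-defaults.conf.
# All paths below are hypothetical placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars /path/on/client/dep1.jar,/path/on/client/dep2.jar \
  --class "SparkApp" \
  /path/on/client/SparkApp.jar
```

Local jars listed with --jars are uploaded from the client at submit time, so they do not need to already exist on whichever node YARN chooses for the driver.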
Masood
------------------------------
Masood Krohy, Ph.D.
Data Scientist, Intact Lab-R&D
Intact Financial Corporation
http://ca.linkedin.com/in/masoodkh
From: Silvio Fiorito <si...@granturing.com>
To: "Saif.A.Ellafi@wellsfargo.com" <Sa...@wellsfargo.com>,
"user@spark.apache.org" <us...@spark.apache.org>
Date: 2016-11-22 08:02
Subject: Re: Cluster deploy mode driver location
Hi Saif!
Unfortunately, I don't think this is possible in YARN cluster deploy
mode. Regarding the JARs you're referring to, can you place them on HDFS
so they are in a central location, and refer to them that way
for dependencies?
http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
Thanks,
Silvio
Re: Cluster deploy mode driver location
Posted by Silvio Fiorito <si...@granturing.com>.
Hi Saif!
Unfortunately, I don't think this is possible in YARN cluster deploy mode. Regarding the JARs you're referring to, can you place them on HDFS so they are in a central location, and refer to them that way for dependencies?
http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
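As a concrete sketch of this approach (the HDFS paths and jar names are hypothetical), the jars can be uploaded once and then referenced by hdfs:// URI in every submit, so they no longer need to live on any particular host:

```shell
# One-time step: upload the dependency jars to a central HDFS location.
hdfs dfs -mkdir -p /apps/sparkapp/jars
hdfs dfs -put dep1.jar dep2.jar /apps/sparkapp/jars/

# Reference the jars by hdfs:// URI; Spark localizes them on whichever
# node YARN picks for the driver, as well as on the executors.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars hdfs:///apps/sparkapp/jars/dep1.jar,hdfs:///apps/sparkapp/jars/dep2.jar \
  --class "SparkApp" \
  SparkApp.jar
```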
Thanks,
Silvio