Posted to dev@spark.apache.org by Andrew Ash <an...@andrewash.com> on 2014/05/30 18:21:04 UTC
bin/spark-shell --jars option
Hi Spark users,
In past Spark releases I've always had to add jars in multiple places to use
them from the spark-shell, and I'm looking to cut down on that duplication.
The --jars option looks like it should do what I want, but it doesn't seem to
work. I ran a quick experiment on the latest branch-1.0 and found this:
# 0) jar not added anywhere
./bin/spark-shell --master spark://aash-mbp.local:7077
spark> import org.joda.time.DateTime
[fails -- expected because the .jar isn't anywhere]
# 1) just --jars
./bin/spark-shell --master spark://aash-mbp.local:7077 --jars /tmp/joda-time-2.3.jar
spark> import org.joda.time.DateTime
[fails -- but might work on non-standalone clusters?]
# 2) using --jars and sc.addJar()
./bin/spark-shell --master spark://aash-mbp.local:7077 --jars /tmp/joda-time-2.3.jar
spark> sc.addJar("/tmp/joda-time-2.3.jar")
spark> import org.joda.time.DateTime
[fails -- shouldn't sc.addJar() make imports possible?]
# 3) just --driver-class-path
./bin/spark-shell --master spark://aash-mbp.local:7077 --driver-class-path /tmp/joda-time-2.3.jar
spark> import org.joda.time.DateTime
spark> new DateTime()
res0: org.joda.time.DateTime = 2014-05-29T11:10:56.745-07:00
spark> sc.parallelize(1 to 10).map(k => new DateTime()).collect
[fails -- expected because jar wasn't ever sent to executors, only driver]
# 4) using --driver-class-path and sc.addJar()
./bin/spark-shell --master spark://aash-mbp.local:7077 --driver-class-path /tmp/joda-time-2.3.jar
spark> import org.joda.time.DateTime
spark> sc.addJar("/tmp/joda-time-2.3.jar")
spark> new DateTime()
res0: org.joda.time.DateTime = 2014-05-29T11:10:56.745-07:00
spark> sc.parallelize(1 to 10).map(k => new DateTime()).collect
[success!]
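For what it's worth, since the docs say --jars covers what SparkContext.addJar
does, I would have expected the following single invocation to be equivalent to
case 4 above with no sc.addJar() needed. I haven't tested this exact flag
combination, so treat it as a guess:

```shell
# Hypothetical flags-only equivalent of case 4: --driver-class-path puts the
# jar on the driver's classpath (so imports work in the shell), and --jars
# should ship it to the executors (so tasks can use it).
./bin/spark-shell --master spark://aash-mbp.local:7077 \
  --driver-class-path /tmp/joda-time-2.3.jar \
  --jars /tmp/joda-time-2.3.jar
```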
Looking at the documentation, it seems --jars doesn't work with standalone in
'cluster' deploy mode. Here are the relevant doc entries:
  --jars JARS                A comma-separated list of local jars to include on
                             the driver classpath and that SparkContext.addJar
                             will work with. Doesn't work on standalone with
                             'cluster' deploy mode.
  --driver-class-path        Extra class path entries to pass to the driver.
                             Note that jars added with --jars are automatically
                             included in the classpath.
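In case anyone wants to reproduce this, a quick way to tell from inside the
shell whether a class is actually visible to the driver (as opposed to the
executors) is to try loading it by name. This is a hypothetical diagnostic
helper I used while poking at this, not anything from Spark itself:

```scala
import scala.util.Try

// Returns true if the named class can be loaded by the driver's classloader.
// Distinguishes "jar missing on the driver" (import fails in the shell) from
// "jar missing on the executors" (tasks fail at collect time).
def onDriverClasspath(className: String): Boolean =
  Try(Class.forName(className)).isSuccess

// e.g. inside spark-shell:
//   onDriverClasspath("org.joda.time.DateTime")
```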
Regarding the note that --jars doesn't work with standalone, is this something
that can be fixed so the "1) just --jars" path above works? Or is there some
larger architectural reason that --jars can't work in standalone mode?
Appreciate it!
Andrew