Posted to dev@spark.apache.org by Andrew Ash <an...@andrewash.com> on 2014/05/30 18:21:04 UTC

bin/spark-shell --jars option

Hi Spark users,

In past Spark releases I always had to add jars in multiple places when
using the spark-shell, and I'm looking to cut down on that.  The --jars
option looks like it does what I want, but it doesn't work.  I ran a quick
experiment on the latest branch-1.0 and found the following:

*# 0) jar not added anywhere*
./bin/spark-shell --master spark://aash-mbp.local:7077
spark> import org.joda.time.DateTime
[fails -- expected, since the jar isn't on the driver or executor classpath]

*# 1) just --jars*
./bin/spark-shell --master spark://aash-mbp.local:7077 --jars /tmp/joda-time-2.3.jar
spark> import org.joda.time.DateTime
[fails -- but might work on non-standalone clusters?]
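
(One check I haven't run yet: whether --jars even put the jar on the driver
JVM's classpath.  Something like this from the REPL should show it -- not
conclusive, since the REPL has its own classloader, but it would rule out the
obvious case:)

spark> System.getProperty("java.class.path").split(":").filter(_.contains("joda")).foreach(println)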

*# 2) using --jars and sc.addJar()*
./bin/spark-shell --master spark://aash-mbp.local:7077 --jars /tmp/joda-time-2.3.jar
spark> sc.addJar("/tmp/joda-time-2.3.jar")
spark> import org.joda.time.DateTime
[fails -- shouldn't sc.addJar() make imports possible?]
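
(A follow-up check I'd like to run here: whether the class is at least visible
inside tasks on the executors, where addJar is supposed to put it, even though
the import fails in the REPL on the driver.  Untested, but something like:)

spark> sc.parallelize(1 to 3).map(_ => Class.forName("org.joda.time.DateTime").getName).collect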

*# 3) just --driver-class-path*
./bin/spark-shell --master spark://aash-mbp.local:7077 --driver-class-path /tmp/joda-time-2.3.jar
spark> import org.joda.time.DateTime
spark> new DateTime()
res0: org.joda.time.DateTime = 2014-05-29T11:10:56.745-07:00
spark> sc.parallelize(1 to 10).map(k => new DateTime()).collect
[fails -- expected because the jar was only on the driver classpath and was never sent to the executors]

*# 4) using --driver-class-path and sc.addJar()*
./bin/spark-shell --master spark://aash-mbp.local:7077 --driver-class-path /tmp/joda-time-2.3.jar
spark> import org.joda.time.DateTime
spark> sc.addJar("/tmp/joda-time-2.3.jar")
spark> new DateTime()
res0: org.joda.time.DateTime = 2014-05-29T11:10:56.745-07:00
spark> sc.parallelize(1 to 10).map(k => new DateTime()).collect
[success!]
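
(For reference, #4 matches what I'd expect to write in a compiled app against
the same cluster: the jar ends up on the driver's classpath via the build, and
setJars ships it to the executors the same way sc.addJar does.  Untested
sketch, same paths as above:)

import org.apache.spark.{SparkConf, SparkContext}
import org.joda.time.DateTime

object JodaTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://aash-mbp.local:7077")
      .setAppName("joda-test")
      .setJars(Seq("/tmp/joda-time-2.3.jar"))  // shipped to executors, like sc.addJar
    val sc = new SparkContext(conf)
    // DateTime resolves on the driver because the jar is on the app's classpath,
    // and on the executors because setJars shipped it there.
    println(sc.parallelize(1 to 10).map(_ => new DateTime()).collect().mkString(", "))
    sc.stop()
  }
}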


Looking at the documentation, it sounds like --jars isn't expected to work
with standalone in 'cluster' deploy mode.  Here are the relevant doc
entries:

  --jars JARS                 A comma-separated list of local jars to include
                              on the driver classpath and that
                              SparkContext.addJar will work with. Doesn't work
                              on standalone with 'cluster' deploy mode.

  --driver-class-path         Extra class path entries to pass to the driver.
                              Note that jars added with --jars are
                              automatically included in the classpath.


Regarding the comment about --jars not working with standalone: is this
something that can be fixed so that the "1) just --jars" path above works?
Or is there some larger architectural reason that --jars can't work with
standalone mode?

Appreciate it!
Andrew