Posted to user@spark.apache.org by jiml <ji...@megalearningllc.com> on 2016/01/12 07:34:33 UTC
Re: how to submit multiple jar files when using spark-submit script
in shell?
Question is: Looking for all the ways to specify a set of jars using --jars
on spark-submit
I know this is old, but I am about to submit a proposed docs change on
--jars, and I had an issue with --jars today.

When this user submitted the following command line, was that a proper way
to reference a jar?

hdfs://master:8000/srcdata/kmeans

(Is that a directory, or a jar that doesn't end with .jar? I have not
gotten into the machine learning libs yet, so I don't recognize it.)
I know the docs say, "Path to a bundled jar including your application and
all dependencies. The URL must be globally visible inside of your cluster,
for instance, an hdfs:// path or a file:// path that is present on all
nodes."
*So this application-jar can point to a directory and will be expanded? Or
needs to be a path to a single specific jar?*
I ask because when I was testing --jars today, we had to explicitly provide
a path to each jar:
/usr/local/spark/bin/spark-submit --class jpsgcs.thold.PipeLinkageData \
  --jars local:/usr/local/spark/jars/groovy-all-2.3.3.jar,local:/usr/local/spark/jars/guava-14.0.1.jar,local:/usr/local/spark/jars/jopt-simple-4.6.jar,local:/usr/local/spark/jars/jpsgcs-core-1.0.8-2.jar,local:/usr/local/spark/jars/jpsgcs-pipe-1.0.6-7.jar \
  /usr/local/spark/jars/thold-0.0.1-1.jar
(The only way I figured out that the list is comma-separated was a
StackOverflow answer that led me to look beyond the docs to the command
line; spark-submit --help shows:

  --jars JARS                 Comma-separated list of local jars to include
                              on the driver and executor classpaths.)
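Since --jars takes a comma-separated list rather than a directory, the list
has to be built by hand. A minimal sketch of doing that in the shell (the
directory and jar names below are hypothetical stand-ins, not paths from a
real cluster):

```shell
# --jars wants a comma-separated list of jar URLs, not a directory.
# Build the list from a directory of jars, then pass it to spark-submit.
JARS_DIR=$(mktemp -d)            # stand-in for e.g. /usr/local/spark/jars
touch "$JARS_DIR/groovy-all-2.3.3.jar" "$JARS_DIR/guava-14.0.1.jar"

# One path per line, then join the lines with commas.
JARS=$(printf '%s\n' "$JARS_DIR"/*.jar | paste -sd, -)
echo "$JARS"

# Then something like:
#   spark-submit --class jpsgcs.thold.PipeLinkageData --jars "$JARS" app.jar
```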
And it seems that we do not need to put the main jar in the --jars
argument. I have not yet tested whether other classes in the
application-jar (/usr/local/spark/jars/thold-0.0.1-1.jar) are shipped to
workers, or whether I need to put the application-jar in the --jars list
for classes other than the one named by --class to be seen.
Thanks for any ideas
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-submit-multiple-jar-files-when-using-spark-submit-script-in-shell-tp16662p25942.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Fwd: how to submit multiple jar files when using spark-submit
script in shell?
Posted by Jim Lohse <sp...@megalearningllc.com>.
Thanks for your answer; you are correct, it's just a different approach
than the one I am asking about :)

Building an uber- or assembly-jar goes against the idea of placing the
jars on all workers ahead of time: uber-jars increase network traffic,
while using local:/ paths in the classpath reduces it. Depending on
uber-jars can also eventually run into various problems.

Really, the question is narrowly aimed at understanding which arguments
can set up the classpath using --jars. Using an uber-jar is a workaround,
true, but one with downsides.
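To make the traffic point concrete, here is an illustrative submit command
contrasting the two URL schemes (class name and paths are made up). Per the
Spark docs, a file:/ jar is served from the driver's file server and pulled
over the network by each executor, while local:/ means the jar is expected
to already exist at that exact path on every worker node, so no network IO
is incurred:

```shell
# Illustrative only: contrast of URL schemes accepted by --jars.
#   local:/ -> jar must already be present at this path on every node (no copy)
#   file:/  -> jar is served from the driver to each executor (network copy)
CMD='spark-submit --class com.example.Main \
  --jars local:/opt/libs/guava-14.0.1.jar,file:/home/me/extra.jar \
  /opt/app/app.jar'
echo "$CMD"
```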
Thanks!
On 01/12/2016 12:06 AM, UMESH CHAUDHARY wrote:
>
> Could you build a fat jar by including all your dependencies along
> with your application? See here
> <http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management>
> and here
> <http://spark.apache.org/docs/latest/submitting-applications.html#bundling-your-applications-dependencies>.
>
> Also:
> *So this application-jar can point to a directory and will be
> expanded? Or needs to be a path to a single specific jar?*
>
> *This will be a path to a single specific JAR.*
> On Tue, Jan 12, 2016 at 12:04 PM, jiml <jim@megalearningllc.com> wrote:
>
>     Question is: Looking for all the ways to specify a set of jars
>     using --jars on spark-submit
Fwd: how to submit multiple jar files when using spark-submit script
in shell?
Posted by UMESH CHAUDHARY <um...@gmail.com>.
Could you build a fat jar by including all your dependencies along with
your application? See here
<http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management>
and here
<http://spark.apache.org/docs/latest/submitting-applications.html#bundling-your-applications-dependencies>.

Also:

*So this application-jar can point to a directory and will be expanded?
Or needs to be a path to a single specific jar?*

*This will be a path to a single specific JAR.*
On Tue, Jan 12, 2016 at 12:04 PM, jiml <ji...@megalearningllc.com> wrote:
> Question is: Looking for all the ways to specify a set of jars using --jars
> on spark-submit