Posted to user@spark.apache.org by jiml <ji...@megalearningllc.com> on 2016/01/12 07:34:33 UTC

Re: how to submit multiple jar files when using spark-submit script in shell?

The question is: what are all the ways to specify a set of jars using --jars
on spark-submit?

I know this thread is old, but I am about to submit a proposed docs change on
--jars, and I ran into an issue with --jars today.

When the original poster submitted the following, was that a proper way to
reference a jar?

hdfs://master:8000/srcdata/kmeans  (Is that a directory, or a jar that
doesn't end with .jar? I have not worked with the machine learning libs yet,
so I don't recognize it.)

I know the docs say, "Path to a bundled jar including your application and
all dependencies. The URL must be globally visible inside of your cluster,
for instance, an hdfs:// path or a file:// path that is present on all
nodes."

*So this application-jar can point to a directory and will be expanded? Or
needs to be a path to a single specific jar?*

I ask because when I was testing --jars today, we had to explicitly provide
a path to each jar:

/usr/local/spark/bin/spark-submit --class jpsgcs.thold.PipeLinkageData \
  --jars local:/usr/local/spark/jars/groovy-all-2.3.3.jar,local:/usr/local/spark/jars/guava-14.0.1.jar,local:/usr/local/spark/jars/jopt-simple-4.6.jar,local:/usr/local/spark/jars/jpsgcs-core-1.0.8-2.jar,local:/usr/local/spark/jars/jpsgcs-pipe-1.0.6-7.jar \
  /usr/local/spark/jars/thold-0.0.1-1.jar

(The only way I figured out the comma-separated form was a StackOverflow
answer that led me to look beyond the docs to the command line.) Running
spark-submit --help prints:

  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
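
So the working syntax is one comma-separated list after --jars. A minimal
sketch (jar names and paths are hypothetical; per the Spark
submitting-applications docs, the URLs may use schemes such as file:, hdfs:,
http:, or local:):

spark-submit --class com.example.Main \
  --jars file:/opt/libs/dep-one.jar,hdfs:///libs/dep-two.jar,local:/opt/libs/dep-three.jar \
  /opt/apps/my-app.jar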


And it seems that we do not need to put the main jar in the --jars argument.
I have not yet tested whether the other classes in the application-jar
(/usr/local/spark/jars/thold-0.0.1-1.jar) are shipped to workers, or whether
I need to add the application-jar to the --jars list so that classes other
than the one named by --class are visible.
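
If it helps anyone, here is how I plan to test that (a sketch; it assumes a
standalone cluster where executor stderr lands under /usr/local/spark/work
on each worker, and it reuses the jar names from the command above):

# Submit with only the dependency jars in --jars; the application jar is the
# final argument and is not repeated in --jars:
/usr/local/spark/bin/spark-submit --class jpsgcs.thold.PipeLinkageData \
  --jars local:/usr/local/spark/jars/jpsgcs-core-1.0.8-2.jar \
  /usr/local/spark/jars/thold-0.0.1-1.jar

# If classes from the application jar were not shipped, the executor logs on
# the workers should show ClassNotFoundException for them:
grep -R "ClassNotFoundException" /usr/local/spark/work/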

Thanks for any ideas




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-submit-multiple-jar-files-when-using-spark-submit-script-in-shell-tp16662p25942.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Fwd: how to submit multiple jar files when using spark-submit script in shell?

Posted by Jim Lohse <sp...@megalearningllc.com>.
Thanks for your answer; you are correct, but it's a different approach
than the one I am asking about. :)

Building an uber- or assembly-jar goes against the idea of placing the
jars on all workers: an uber-jar increases network traffic, while using
local:/ in the classpath reduces network traffic.
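
To make that concrete, this is the pattern I am after (a sketch; hostnames
are hypothetical, and it assumes every node keeps the jars at the same path):

# One-time copy of the dependency jars to the same path on each worker:
for host in worker1 worker2 worker3; do
  rsync -av /usr/local/spark/jars/ "$host:/usr/local/spark/jars/"
done

# After that, local:/ URLs resolve on every node, so spark-submit does not
# have to ship those jars over the network for each job:
spark-submit --class jpsgcs.thold.PipeLinkageData \
  --jars local:/usr/local/spark/jars/jpsgcs-core-1.0.8-2.jar \
  /usr/local/spark/jars/thold-0.0.1-1.jar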

Relying on uber-jars can also run into various problems down the road.

Really, the question is narrowly aimed at understanding what arguments can
set up the classpath via --jars. Using an uber-jar is a workaround, true,
but one with downsides.

Thanks!

On 01/12/2016 12:06 AM, UMESH CHAUDHARY wrote:
>
> Could you build a fat jar by including all your dependencies along
> with your application? See here
> <http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management>
> and here
> <http://spark.apache.org/docs/latest/submitting-applications.html#bundling-your-applications-dependencies> .
>
> Also:
> *So this application-jar can point to a directory and will be
> expanded? Or needs to be a path to a single specific jar?*
>
> *This will be a path to a single specific JAR.*
> On Tue, Jan 12, 2016 at 12:04 PM, jiml <jim@megalearningllc.com
> <ma...@megalearningllc.com>> wrote:
>
>     [original message quoted in full; snipped, see the top of this thread]
>


Fwd: how to submit multiple jar files when using spark-submit script in shell?

Posted by UMESH CHAUDHARY <um...@gmail.com>.
Could you build a fat jar by including all your dependencies along with your
application? See here
<http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management>
and here
<http://spark.apache.org/docs/latest/submitting-applications.html#bundling-your-applications-dependencies> .
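
For example, with sbt and the sbt-assembly plugin it looks roughly like this
(a sketch; the plugin version, Scala version, and project names are
hypothetical):

# Enable the assembly plugin (one line in project/assembly.sbt), then build:
echo 'addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")' >> project/assembly.sbt
sbt assembly

# Submit the single fat jar; no --jars list is needed:
spark-submit --class com.example.Main target/scala-2.10/my-app-assembly-0.1.jar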

Also:

*So this application-jar can point to a directory and will be expanded?
Or needs to be a path to a single specific jar?*

*This will be a path to a single specific JAR.*

On Tue, Jan 12, 2016 at 12:04 PM, jiml <ji...@megalearningllc.com> wrote:

> [original message quoted in full; snipped, see the top of this thread]