Posted to user@spark.apache.org by Ngoc Dao <ng...@gmail.com> on 2014/06/02 04:12:31 UTC

Re: Is uberjar a recommended way of running Spark/Scala applications?

Alternative solution:
https://github.com/xitrum-framework/xitrum-package

It collects all of your Scala program's dependency .jar files into a
directory. It doesn't merge the .jar files together; they are left
"as is".


On Sat, May 31, 2014 at 3:42 AM, Andrei <fa...@gmail.com> wrote:
> Thanks, Stephen. I eventually decided to go with assembly, but left the
> Spark and Hadoop jars out and let `spark-submit` provide these dependencies
> automatically. This way no resource conflicts arise and mergeStrategy needs
> no modification. To record this stable setup and share it with the
> community, I've put together a project [1] with a minimal working config.
> It is an SBT project with the assembly plugin, Spark 1.0 and Cloudera's
> Hadoop client. I hope it helps somebody get a Spark setup running quicker.
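
A minimal build.sbt sketch of that arrangement, marking Spark and the Hadoop
client as "provided" so sbt-assembly leaves them out of the uberjar (the
versions below are illustrative; the linked project [1] has the actual
config, and a Cloudera build would additionally need the Cloudera repository
as a resolver):

// Dependencies marked "provided" are compiled against but not bundled into
// the assembly; spark-submit supplies them on the cluster at runtime.
name := "sample-spark-app"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "1.0.0" % "provided",
  "org.apache.hadoop" %  "hadoop-client" % "2.3.0" % "provided"
)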
>
> Though I'm fine with this setup for final builds, I'm still looking for a
> more interactive dev setup - something that doesn't require a full rebuild.
>
> [1]: https://github.com/faithlessfriend/sample-spark-project
>
> Thanks and have a good weekend,
> Andrei
>
> On Thu, May 29, 2014 at 8:27 PM, Stephen Boesch <ja...@gmail.com> wrote:
>>
>>
>> The MergeStrategy combined with sbt assembly did work for me.  This is not
>> painless: it takes some trial and error, and the assembly may take several
>> minutes to build.
>>
>> You will likely want to filter out some additional classes from the
>> generated jar file.  Here is a Stack Overflow answer that explains how;
>> IMHO the best answer's snippet is included below (in this case the OP
>> understandably did not want to include javax.servlet.Servlet):
>>
>> http://stackoverflow.com/questions/7819066/sbt-exclude-class-from-jar
>>
>>
>> mappings in (Compile, packageBin) ~= { (ms: Seq[(File, String)]) =>
>>   ms filter { case (file, toPath) => toPath != "javax/servlet/Servlet.class" }
>> }
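
For the duplicate-file conflicts that mergeStrategy resolves, the override
typically looks something like the sketch below, written against the
sbt-assembly 0.11.x keys of that era (treat the exact key names as an
assumption and check the plugin documentation for your version):

import sbtassembly.Plugin._
import AssemblyKeys._

// Discard duplicated META-INF entries, defer to the default strategy otherwise.
mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
  {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x                             => old(x)
  }
}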
>>
>> There is also a setting to exclude the project files from the assembly,
>> but I do not recall it at the moment.
>>
>>
>>
>> 2014-05-29 10:13 GMT-07:00 Andrei <fa...@gmail.com>:
>>
>>> Thanks, Jordi, your gist looks pretty much like what I have in my project
>>> currently (with a few exceptions that I'm going to borrow).
>>>
>>> I like the idea of using "sbt package", since it doesn't require
>>> third-party plugins and, most importantly, doesn't create a mess of
>>> classes and resources. But in this case I'll have to handle the jar list
>>> manually via the Spark context. Is there a way to automate this process?
>>> E.g. when I was a Clojure guy, I could run "lein deps" (lein is a build
>>> tool similar to sbt) to download all dependencies and then just enumerate
>>> them from my app. Have you heard of something like that for Spark/SBT?
>>>
>>> Thanks,
>>> Andrei
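
On the question above about automating the jar list after a plain
"sbt package": one option is a small custom sbt task that copies the runtime
dependency classpath into a directory, which the application (or
spark-submit's --jars option) can then enumerate. A sketch for build.sbt,
with the task and directory names being assumptions:

// Copy every runtime dependency jar into target/lib, roughly what
// "lein deps" gives you in the Clojure world.
lazy val copyDependencies = taskKey[Unit]("Copy runtime dependency jars to target/lib")

copyDependencies := {
  val libDir = target.value / "lib"
  IO.createDirectory(libDir)
  (dependencyClasspath in Runtime).value
    .map(_.data)                          // Attributed[File] -> File
    .filter(_.getName.endsWith(".jar"))
    .foreach(jar => IO.copyFile(jar, libDir / jar.getName))
}

Running "sbt copyDependencies package" then leaves the thin application jar
in target/ and all of its dependencies in target/lib.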
>>>
>>>
>>> On Thu, May 29, 2014 at 3:48 PM, jaranda <jo...@bsc.es> wrote:
>>>>
>>>> Hi Andrei,
>>>>
>>>> I think the preferred way to deploy Spark jobs is by using the sbt
>>>> package task instead of using the sbt assembly plugin. In any case, as
>>>> you comment, the mergeStrategy in combination with some dependency
>>>> exclusions should fix your problems. Have a look at this gist
>>>> <https://gist.github.com/JordiAranda/bdbad58d128c14277a05> for further
>>>> details (I just followed some recommendations from the sbt assembly
>>>> plugin documentation).
>>>>
>>>> Up to now I haven't found a proper way to combine my development and
>>>> deployment phases, although I must say my experience with Spark is
>>>> pretty limited (it really depends on your deployment requirements as
>>>> well). On this point, I think someone else could give you some further
>>>> insights.
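
The dependency exclusions mentioned above sit next to the dependency itself
in build.sbt; a sketch with illustrative coordinates (the classic case is
dropping transitive jetty/servlet artifacts that collide with the copies
Spark already provides):

// Illustrative only -- exclude transitive jetty/servlet artifacts pulled in
// by hadoop-client so they don't clash with Spark's own copies.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0" excludeAll(
  ExclusionRule(organization = "org.mortbay.jetty"),
  ExclusionRule(organization = "javax.servlet", name = "servlet-api")
)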
>>>>
>>>> Best,
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Is-uberjar-a-recommended-way-of-running-Spark-Scala-applications-tp6518p6520.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>>
>>
>

Re: Is uberjar a recommended way of running Spark/Scala applications?

Posted by Pierre Borckmans <pi...@realimpactanalytics.com>.
You might want to look at another great plugin: “sbt-pack” (https://github.com/xerial/sbt-pack).

It collects all the dependency JARs and creates launch scripts for *nix (including Mac OS) and Windows.
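
A rough sketch of how sbt-pack is typically wired in; the plugin coordinates,
version and setting names below are from memory of the plugin's README and
should be treated as assumptions, so check the linked repository for the
authoritative instructions:

// project/plugins.sbt
addSbtPlugin("org.xerial.sbt" % "sbt-pack" % "0.6.1")

// build.sbt (older plugin versions may also need: import xerial.sbt.Pack._)
packSettings

// map each launch-script name to the main class it should run
packMain := Map("my-spark-app" -> "com.example.Main")

Running "sbt pack" then produces target/pack with a lib/ directory holding
all dependency jars and a bin/ launcher script per entry in packMain.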

HTH

Pierre



Re: Is uberjar a recommended way of running Spark/Scala applications?

Posted by Andrei <fa...@gmail.com>.
Thanks! This is even closer to what I am looking for. I'm on a trip right
now, so I'm going to give it a try when I come back.

