Posted to dev@spark.apache.org by "Yiming (John) Zhang" <sd...@gmail.com> on 2014/11/16 02:41:41 UTC

mvn or sbt for studying and developing Spark?

Hi,

 

I am new to developing Spark, and my current focus is the co-scheduling of
Spark tasks. However, I am confused by the build tools: the documentation
sometimes uses mvn and sometimes sbt.

So my question is: which one is the preferred tool of the Spark community?
And what is the technical difference between them? Thank you!

 

Cheers,

Yiming


Re: mvn or sbt for studying and developing Spark?

Posted by Nicholas Chammas <ni...@gmail.com>.
The docs on using sbt are here:
https://github.com/apache/spark/blob/master/docs/building-spark.md#building-with-sbt

They'll be published with 1.2.0 presumably.
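Following that doc, a build with sbt looks like the sketch below. The profile names are illustrative; check the linked building-spark page for the flags that match your Hadoop setup, since the sbt build reads the Maven POMs and accepts the same -P profiles.

```shell
# Build an assembly jar with sbt, selecting Hadoop/YARN support via
# Maven-style profiles (names here are illustrative examples).
sbt/sbt -Pyarn -Phadoop-2.3 assembly

# Or start a long-running interactive session and issue commands
# (compile, test, etc.) from the sbt prompt:
sbt/sbt
```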
On Mon, Nov 17, 2014 at 2:49 PM Michael Armbrust <mi...@databricks.com>
wrote:

> >
> > * I moved from sbt to maven in June specifically due to Andrew Or's
> > describing mvn as the default build tool.  Developers should keep in mind
> > that jenkins uses mvn so we need to run mvn before submitting PR's - even
> > if sbt were used for day to day dev work
> >
>
> To be clear, I think that the PR builder actually uses sbt
> <https://github.com/apache/spark/blob/master/dev/run-tests#L198>
> currently,
> but there are master builds that make sure maven doesn't break (amongst
> other things).
>
>
> > *  In addition, as Sean has alluded to, the Intellij seems to comprehend
> > the maven builds a bit more readily than sbt
> >
>
> Yeah, this is a very good point.  I have used `sbt/sbt gen-idea` in the
> past, but I'm currently using the maven integration of inteliJ since it
> seems more stable.
>
>
> > * But for command line and day to day dev purposes:  sbt sounds great to
> > use  Those sound bites you provided about exposing built-in test
> databases
> > for hive and for displaying available testcases are sweet.  Any
> > easy/convenient way to see "more of " those kinds of facilities available
> > through sbt ?
> >
>
> The Spark SQL developer readme
> <https://github.com/apache/spark/tree/master/sql> has a little bit of
> this,
> but we really should have some documentation on using SBT as well.
>
>  Integrating with those systems is generally easier if you are also working
> > with Spark in Maven.  (And I wouldn't classify all of those Maven-built
> > systems as "legacy", Michael :)
>
>
> Also a good point, though I've seen some pretty clever uses of sbt's
> external project references to link spark into other projects.  I'll
> certainly admit I have a bias towards new shiny things in general though,
> so my definition of legacy is probably skewed :)
>

Re: mvn or sbt for studying and developing Spark?

Posted by Michael Armbrust <mi...@databricks.com>.
>
> * I moved from sbt to maven in June specifically due to Andrew Or's
> describing mvn as the default build tool.  Developers should keep in mind
> that jenkins uses mvn so we need to run mvn before submitting PR's - even
> if sbt were used for day to day dev work
>

To be clear, I think that the PR builder actually uses sbt
<https://github.com/apache/spark/blob/master/dev/run-tests#L198> currently,
but there are master builds that make sure maven doesn't break (amongst
other things).


> *  In addition, as Sean has alluded to, the Intellij seems to comprehend
> the maven builds a bit more readily than sbt
>

Yeah, this is a very good point.  I have used `sbt/sbt gen-idea` in the
past, but I'm currently using the Maven integration of IntelliJ since it
seems more stable.


> * But for command line and day to day dev purposes:  sbt sounds great to
> use  Those sound bites you provided about exposing built-in test databases
> for hive and for displaying available testcases are sweet.  Any
> easy/convenient way to see "more of " those kinds of facilities available
> through sbt ?
>

The Spark SQL developer readme
<https://github.com/apache/spark/tree/master/sql> has a little bit of this,
but we really should have some documentation on using SBT as well.

 Integrating with those systems is generally easier if you are also working
> with Spark in Maven.  (And I wouldn't classify all of those Maven-built
> systems as "legacy", Michael :)


Also a good point, though I've seen some pretty clever uses of sbt's
external project references to link Spark into other projects.  I'll
certainly admit I have a bias towards shiny new things in general, though,
so my definition of legacy is probably skewed :)

Re: mvn or sbt for studying and developing Spark?

Posted by Stephen Boesch <ja...@gmail.com>.
Hi Michael,
 That insight is useful.   Some thoughts:

* I moved from sbt to Maven in June, specifically because Andrew Or
described mvn as the default build tool.  Developers should keep in mind
that Jenkins uses mvn, so we need to run mvn before submitting PRs, even
if sbt is used for day-to-day dev work.
*  In addition, as Sean has alluded to, IntelliJ seems to comprehend
the Maven builds a bit more readily than sbt.
* But for command-line and day-to-day dev purposes, sbt sounds great to
use.  Those sound bites you provided about exposing built-in test databases
for Hive and for displaying available test cases are sweet.  Is there any
easy/convenient way to see more of those kinds of facilities available
through sbt?
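On the point about Jenkins using mvn: a pre-PR sanity check against the reference Maven build might look like the sketch below. The module name is illustrative, and `dev/run-tests` is the script the automated builds invoke (as mentioned elsewhere in this thread).

```shell
# Compile and package with the reference Maven build, skipping tests
# first so compilation problems surface quickly.
mvn -DskipTests clean package

# Then run test suites, optionally scoped to the module you touched
# (module path below is an illustrative example).
mvn -pl sql/core test

# Or run the same checks the automated builders do:
dev/run-tests
```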


2014-11-16 13:23 GMT-08:00 Michael Armbrust <mi...@databricks.com>:

> I'm going to have to disagree here.  If you are building a release
> distribution or integrating with legacy systems then maven is probably the
> correct choice.  However most of the core developers that I know use sbt,
> and I think its a better choice for exploration and development overall.
> That said, this probably falls into the category of a religious argument so
> you might want to look at both options and decide for yourself.
>
> In my experience the SBT build is significantly faster with less effort
> (and I think sbt is still faster even if you go through the extra effort of
> installing zinc) and easier to read.  The console mode of sbt (just run
> sbt/sbt and then a long running console session is started that will accept
> further commands) is great for building individual subprojects or running
> single test suites.  In addition to being faster since its a long running
> JVM, its got a lot of nice features like tab-completion for test case
> names.
>
> For example, if I wanted to see what test cases are available in the SQL
> subproject you can do the following:
>
> [marmbrus@michaels-mbp spark (tpcds)]$ sbt/sbt
> [info] Loading project definition from
> /Users/marmbrus/workspace/spark/project/project
> [info] Loading project definition from
>
> /Users/marmbrus/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader/project
> [info] Set current project to spark-parent (in build
> file:/Users/marmbrus/workspace/spark/)
> > sql/test-only *<tab>*
> --
>  org.apache.spark.sql.CachedTableSuite
> org.apache.spark.sql.DataTypeSuite
>  org.apache.spark.sql.DslQuerySuite
> org.apache.spark.sql.InsertIntoSuite
> ...
>
> Another very useful feature is the development console, which starts an
> interactive REPL including the most recent version of the code and a lot of
> useful imports for some subprojects.  For example in the hive subproject it
> automatically sets up a temporary database with a bunch of test data
> pre-loaded:
>
> $ sbt/sbt hive/console
> > hive/console
> ...
> import org.apache.spark.sql.hive._
> import org.apache.spark.sql.hive.test.TestHive._
> import org.apache.spark.sql.parquet.ParquetTestData
> Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.7.0_45).
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> sql("SELECT * FROM src").take(2)
> res0: Array[org.apache.spark.sql.Row] = Array([238,val_238], [86,val_86])
>
> Michael
>
> On Sun, Nov 16, 2014 at 3:27 AM, Dinesh J. Weerakkody <
> dineshjweerakkody@gmail.com> wrote:
>
> > Hi Stephen and Sean,
> >
> > Thanks for correction.
> >
> > On Sun, Nov 16, 2014 at 12:28 PM, Sean Owen <so...@cloudera.com> wrote:
> >
> > > No, the Maven build is the main one.  I would use it unless you have a
> > > need to use the SBT build in particular.
> > > On Nov 16, 2014 2:58 AM, "Dinesh J. Weerakkody" <
> > > dineshjweerakkody@gmail.com> wrote:
> > >
> > >> Hi Yiming,
> > >>
> > >> I believe that both SBT and MVN is supported in SPARK, but SBT is
> > >> preferred
> > >> (I'm not 100% sure about this :) ). When I'm using MVN I got some
> build
> > >> failures. After that used SBT and works fine.
> > >>
> > >> You can go through these discussions regarding SBT vs MVN and learn
> pros
> > >> and cons of both [1] [2].
> > >>
> > >> [1]
> > >>
> > >>
> >
> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html
> > >>
> > >> [2]
> > >>
> > >>
> >
> https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ
> > >>
> > >> Thanks,
> > >>
> > >> On Sun, Nov 16, 2014 at 7:11 AM, Yiming (John) Zhang <
> sdiris@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> >
> > >> >
> > >> > I am new in developing Spark and my current focus is about
> > >> co-scheduling of
> > >> > spark tasks. However, I am confused with the building tools:
> sometimes
> > >> the
> > >> > documentation uses mvn but sometimes uses sbt.
> > >> >
> > >> >
> > >> >
> > >> > So, my question is that which one is the preferred tool of Spark
> > >> community?
> > >> > And what's the technical difference between them? Thank you!
> > >> >
> > >> >
> > >> >
> > >> > Cheers,
> > >> >
> > >> > Yiming
> > >> >
> > >> >
> > >>
> > >>
> > >> --
> > >> Thanks & Best Regards,
> > >>
> > >> *Dinesh J. Weerakkody*
> > >>
> > >
> >
> >
> > --
> > Thanks & Best Regards,
> >
> > *Dinesh J. Weerakkody*
> >
>

Re: mvn or sbt for studying and developing Spark?

Posted by Patrick Wendell <pw...@gmail.com>.
Neither is strictly optimal, which is why we ended up supporting both.
Our reference build for packaging is Maven, so with it you are less
likely to run into unexpected dependency issues. Many developers use
sbt as well. It's somewhat a matter of religion, and the best thing
might be to try both and see which you prefer.

- Patrick

On Sun, Nov 16, 2014 at 1:47 PM, Mark Hamstra <ma...@clearstorydata.com> wrote:
>>
>> The console mode of sbt (just run
>> sbt/sbt and then a long running console session is started that will accept
>> further commands) is great for building individual subprojects or running
>> single test suites.  In addition to being faster since its a long running
>> JVM, its got a lot of nice features like tab-completion for test case
>> names.
>
>
> We include the scala-maven-plugin in spark/pom.xml, so equivalent
> functionality is available using Maven.  You can start a console session
> with `mvn scala:console`.
>
>
> On Sun, Nov 16, 2014 at 1:23 PM, Michael Armbrust <mi...@databricks.com>
> wrote:
>
>> I'm going to have to disagree here.  If you are building a release
>> distribution or integrating with legacy systems then maven is probably the
>> correct choice.  However most of the core developers that I know use sbt,
>> and I think its a better choice for exploration and development overall.
>> That said, this probably falls into the category of a religious argument so
>> you might want to look at both options and decide for yourself.
>>
>> In my experience the SBT build is significantly faster with less effort
>> (and I think sbt is still faster even if you go through the extra effort of
>> installing zinc) and easier to read.  The console mode of sbt (just run
>> sbt/sbt and then a long running console session is started that will accept
>> further commands) is great for building individual subprojects or running
>> single test suites.  In addition to being faster since its a long running
>> JVM, its got a lot of nice features like tab-completion for test case
>> names.
>>
>> For example, if I wanted to see what test cases are available in the SQL
>> subproject you can do the following:
>>
>> [marmbrus@michaels-mbp spark (tpcds)]$ sbt/sbt
>> [info] Loading project definition from
>> /Users/marmbrus/workspace/spark/project/project
>> [info] Loading project definition from
>>
>> /Users/marmbrus/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader/project
>> [info] Set current project to spark-parent (in build
>> file:/Users/marmbrus/workspace/spark/)
>> > sql/test-only *<tab>*
>> --
>>  org.apache.spark.sql.CachedTableSuite
>> org.apache.spark.sql.DataTypeSuite
>>  org.apache.spark.sql.DslQuerySuite
>> org.apache.spark.sql.InsertIntoSuite
>> ...
>>
>> Another very useful feature is the development console, which starts an
>> interactive REPL including the most recent version of the code and a lot of
>> useful imports for some subprojects.  For example in the hive subproject it
>> automatically sets up a temporary database with a bunch of test data
>> pre-loaded:
>>
>> $ sbt/sbt hive/console
>> > hive/console
>> ...
>> import org.apache.spark.sql.hive._
>> import org.apache.spark.sql.hive.test.TestHive._
>> import org.apache.spark.sql.parquet.ParquetTestData
>> Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
>> 1.7.0_45).
>> Type in expressions to have them evaluated.
>> Type :help for more information.
>>
>> scala> sql("SELECT * FROM src").take(2)
>> res0: Array[org.apache.spark.sql.Row] = Array([238,val_238], [86,val_86])
>>
>> Michael
>>
>> On Sun, Nov 16, 2014 at 3:27 AM, Dinesh J. Weerakkody <
>> dineshjweerakkody@gmail.com> wrote:
>>
>> > Hi Stephen and Sean,
>> >
>> > Thanks for correction.
>> >
>> > On Sun, Nov 16, 2014 at 12:28 PM, Sean Owen <so...@cloudera.com> wrote:
>> >
>> > > No, the Maven build is the main one.  I would use it unless you have a
>> > > need to use the SBT build in particular.
>> > > On Nov 16, 2014 2:58 AM, "Dinesh J. Weerakkody" <
>> > > dineshjweerakkody@gmail.com> wrote:
>> > >
>> > >> Hi Yiming,
>> > >>
>> > >> I believe that both SBT and MVN is supported in SPARK, but SBT is
>> > >> preferred
>> > >> (I'm not 100% sure about this :) ). When I'm using MVN I got some
>> build
>> > >> failures. After that used SBT and works fine.
>> > >>
>> > >> You can go through these discussions regarding SBT vs MVN and learn
>> pros
>> > >> and cons of both [1] [2].
>> > >>
>> > >> [1]
>> > >>
>> > >>
>> >
>> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html
>> > >>
>> > >> [2]
>> > >>
>> > >>
>> >
>> https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ
>> > >>
>> > >> Thanks,
>> > >>
>> > >> On Sun, Nov 16, 2014 at 7:11 AM, Yiming (John) Zhang <
>> sdiris@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > Hi,
>> > >> >
>> > >> >
>> > >> >
>> > >> > I am new in developing Spark and my current focus is about
>> > >> co-scheduling of
>> > >> > spark tasks. However, I am confused with the building tools:
>> sometimes
>> > >> the
>> > >> > documentation uses mvn but sometimes uses sbt.
>> > >> >
>> > >> >
>> > >> >
>> > >> > So, my question is that which one is the preferred tool of Spark
>> > >> community?
>> > >> > And what's the technical difference between them? Thank you!
>> > >> >
>> > >> >
>> > >> >
>> > >> > Cheers,
>> > >> >
>> > >> > Yiming
>> > >> >
>> > >> >
>> > >>
>> > >>
>> > >> --
>> > >> Thanks & Best Regards,
>> > >>
>> > >> *Dinesh J. Weerakkody*
>> > >>
>> > >
>> >
>> >
>> > --
>> > Thanks & Best Regards,
>> >
>> > *Dinesh J. Weerakkody*
>> >
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: mvn or sbt for studying and developing Spark?

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Ok, strictly speaking, that's equivalent to your second class of
examples, the "development console", not the first, the sbt console.

On Sun, Nov 16, 2014 at 1:47 PM, Mark Hamstra <ma...@clearstorydata.com>
wrote:

> The console mode of sbt (just run
>> sbt/sbt and then a long running console session is started that will
>> accept
>> further commands) is great for building individual subprojects or running
>> single test suites.  In addition to being faster since its a long running
>> JVM, its got a lot of nice features like tab-completion for test case
>> names.
>
>
> We include the scala-maven-plugin in spark/pom.xml, so equivalent
> functionality is available using Maven.  You can start a console session
> with `mvn scala:console`.
>
>
> On Sun, Nov 16, 2014 at 1:23 PM, Michael Armbrust <mi...@databricks.com>
> wrote:
>
>> I'm going to have to disagree here.  If you are building a release
>> distribution or integrating with legacy systems then maven is probably the
>> correct choice.  However most of the core developers that I know use sbt,
>> and I think its a better choice for exploration and development overall.
>> That said, this probably falls into the category of a religious argument
>> so
>> you might want to look at both options and decide for yourself.
>>
>> In my experience the SBT build is significantly faster with less effort
>> (and I think sbt is still faster even if you go through the extra effort
>> of
>> installing zinc) and easier to read.  The console mode of sbt (just run
>> sbt/sbt and then a long running console session is started that will
>> accept
>> further commands) is great for building individual subprojects or running
>> single test suites.  In addition to being faster since its a long running
>> JVM, its got a lot of nice features like tab-completion for test case
>> names.
>>
>> For example, if I wanted to see what test cases are available in the SQL
>> subproject you can do the following:
>>
>> [marmbrus@michaels-mbp spark (tpcds)]$ sbt/sbt
>> [info] Loading project definition from
>> /Users/marmbrus/workspace/spark/project/project
>> [info] Loading project definition from
>>
>> /Users/marmbrus/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader/project
>> [info] Set current project to spark-parent (in build
>> file:/Users/marmbrus/workspace/spark/)
>> > sql/test-only *<tab>*
>> --
>>  org.apache.spark.sql.CachedTableSuite
>> org.apache.spark.sql.DataTypeSuite
>>  org.apache.spark.sql.DslQuerySuite
>> org.apache.spark.sql.InsertIntoSuite
>> ...
>>
>> Another very useful feature is the development console, which starts an
>> interactive REPL including the most recent version of the code and a lot
>> of
>> useful imports for some subprojects.  For example in the hive subproject
>> it
>> automatically sets up a temporary database with a bunch of test data
>> pre-loaded:
>>
>> $ sbt/sbt hive/console
>> > hive/console
>> ...
>> import org.apache.spark.sql.hive._
>> import org.apache.spark.sql.hive.test.TestHive._
>> import org.apache.spark.sql.parquet.ParquetTestData
>> Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
>> 1.7.0_45).
>> Type in expressions to have them evaluated.
>> Type :help for more information.
>>
>> scala> sql("SELECT * FROM src").take(2)
>> res0: Array[org.apache.spark.sql.Row] = Array([238,val_238], [86,val_86])
>>
>> Michael
>>
>> On Sun, Nov 16, 2014 at 3:27 AM, Dinesh J. Weerakkody <
>> dineshjweerakkody@gmail.com> wrote:
>>
>> > Hi Stephen and Sean,
>> >
>> > Thanks for correction.
>> >
>> > On Sun, Nov 16, 2014 at 12:28 PM, Sean Owen <so...@cloudera.com> wrote:
>> >
>> > > No, the Maven build is the main one.  I would use it unless you have a
>> > > need to use the SBT build in particular.
>> > > On Nov 16, 2014 2:58 AM, "Dinesh J. Weerakkody" <
>> > > dineshjweerakkody@gmail.com> wrote:
>> > >
>> > >> Hi Yiming,
>> > >>
>> > >> I believe that both SBT and MVN is supported in SPARK, but SBT is
>> > >> preferred
>> > >> (I'm not 100% sure about this :) ). When I'm using MVN I got some
>> build
>> > >> failures. After that used SBT and works fine.
>> > >>
>> > >> You can go through these discussions regarding SBT vs MVN and learn
>> pros
>> > >> and cons of both [1] [2].
>> > >>
>> > >> [1]
>> > >>
>> > >>
>> >
>> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html
>> > >>
>> > >> [2]
>> > >>
>> > >>
>> >
>> https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ
>> > >>
>> > >> Thanks,
>> > >>
>> > >> On Sun, Nov 16, 2014 at 7:11 AM, Yiming (John) Zhang <
>> sdiris@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > Hi,
>> > >> >
>> > >> >
>> > >> >
>> > >> > I am new in developing Spark and my current focus is about
>> > >> co-scheduling of
>> > >> > spark tasks. However, I am confused with the building tools:
>> sometimes
>> > >> the
>> > >> > documentation uses mvn but sometimes uses sbt.
>> > >> >
>> > >> >
>> > >> >
>> > >> > So, my question is that which one is the preferred tool of Spark
>> > >> community?
>> > >> > And what's the technical difference between them? Thank you!
>> > >> >
>> > >> >
>> > >> >
>> > >> > Cheers,
>> > >> >
>> > >> > Yiming
>> > >> >
>> > >> >
>> > >>
>> > >>
>> > >> --
>> > >> Thanks & Best Regards,
>> > >>
>> > >> *Dinesh J. Weerakkody*
>> > >>
>> > >
>> >
>> >
>> > --
>> > Thanks & Best Regards,
>> >
>> > *Dinesh J. Weerakkody*
>> >
>>
>
>

Re: mvn or sbt for studying and developing Spark?

Posted by Mark Hamstra <ma...@clearstorydata.com>.
>
> The console mode of sbt (just run
> sbt/sbt and then a long running console session is started that will accept
> further commands) is great for building individual subprojects or running
> single test suites.  In addition to being faster since its a long running
> JVM, its got a lot of nice features like tab-completion for test case
> names.


We include the scala-maven-plugin in spark/pom.xml, so equivalent
functionality is available using Maven.  You can start a console session
with `mvn scala:console`.
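For comparison, the two console entry points side by side, as a sketch; both assume you are at the root of a Spark checkout:

```shell
# Long-running interactive sbt session (build, test-only with tab
# completion, subproject commands, etc.):
sbt/sbt

# Scala REPL with the project classes on the classpath, provided by
# the scala-maven-plugin declared in spark/pom.xml:
mvn scala:console
```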


On Sun, Nov 16, 2014 at 1:23 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> I'm going to have to disagree here.  If you are building a release
> distribution or integrating with legacy systems then maven is probably the
> correct choice.  However most of the core developers that I know use sbt,
> and I think its a better choice for exploration and development overall.
> That said, this probably falls into the category of a religious argument so
> you might want to look at both options and decide for yourself.
>
> In my experience the SBT build is significantly faster with less effort
> (and I think sbt is still faster even if you go through the extra effort of
> installing zinc) and easier to read.  The console mode of sbt (just run
> sbt/sbt and then a long running console session is started that will accept
> further commands) is great for building individual subprojects or running
> single test suites.  In addition to being faster since its a long running
> JVM, its got a lot of nice features like tab-completion for test case
> names.
>
> For example, if I wanted to see what test cases are available in the SQL
> subproject you can do the following:
>
> [marmbrus@michaels-mbp spark (tpcds)]$ sbt/sbt
> [info] Loading project definition from
> /Users/marmbrus/workspace/spark/project/project
> [info] Loading project definition from
>
> /Users/marmbrus/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader/project
> [info] Set current project to spark-parent (in build
> file:/Users/marmbrus/workspace/spark/)
> > sql/test-only *<tab>*
> --
>  org.apache.spark.sql.CachedTableSuite
> org.apache.spark.sql.DataTypeSuite
>  org.apache.spark.sql.DslQuerySuite
> org.apache.spark.sql.InsertIntoSuite
> ...
>
> Another very useful feature is the development console, which starts an
> interactive REPL including the most recent version of the code and a lot of
> useful imports for some subprojects.  For example in the hive subproject it
> automatically sets up a temporary database with a bunch of test data
> pre-loaded:
>
> $ sbt/sbt hive/console
> > hive/console
> ...
> import org.apache.spark.sql.hive._
> import org.apache.spark.sql.hive.test.TestHive._
> import org.apache.spark.sql.parquet.ParquetTestData
> Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.7.0_45).
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> sql("SELECT * FROM src").take(2)
> res0: Array[org.apache.spark.sql.Row] = Array([238,val_238], [86,val_86])
>
> Michael
>
> On Sun, Nov 16, 2014 at 3:27 AM, Dinesh J. Weerakkody <
> dineshjweerakkody@gmail.com> wrote:
>
> > Hi Stephen and Sean,
> >
> > Thanks for correction.
> >
> > On Sun, Nov 16, 2014 at 12:28 PM, Sean Owen <so...@cloudera.com> wrote:
> >
> > > No, the Maven build is the main one.  I would use it unless you have a
> > > need to use the SBT build in particular.
> > > On Nov 16, 2014 2:58 AM, "Dinesh J. Weerakkody" <
> > > dineshjweerakkody@gmail.com> wrote:
> > >
> > >> Hi Yiming,
> > >>
> > >> I believe that both SBT and MVN is supported in SPARK, but SBT is
> > >> preferred
> > >> (I'm not 100% sure about this :) ). When I'm using MVN I got some
> build
> > >> failures. After that used SBT and works fine.
> > >>
> > >> You can go through these discussions regarding SBT vs MVN and learn
> pros
> > >> and cons of both [1] [2].
> > >>
> > >> [1]
> > >>
> > >>
> >
> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html
> > >>
> > >> [2]
> > >>
> > >>
> >
> https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ
> > >>
> > >> Thanks,
> > >>
> > >> On Sun, Nov 16, 2014 at 7:11 AM, Yiming (John) Zhang <
> sdiris@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> >
> > >> >
> > >> > I am new in developing Spark and my current focus is about
> > >> co-scheduling of
> > >> > spark tasks. However, I am confused with the building tools:
> sometimes
> > >> the
> > >> > documentation uses mvn but sometimes uses sbt.
> > >> >
> > >> >
> > >> >
> > >> > So, my question is that which one is the preferred tool of Spark
> > >> community?
> > >> > And what's the technical difference between them? Thank you!
> > >> >
> > >> >
> > >> >
> > >> > Cheers,
> > >> >
> > >> > Yiming
> > >> >
> > >> >
> > >>
> > >>
> > >> --
> > >> Thanks & Best Regards,
> > >>
> > >> *Dinesh J. Weerakkody*
> > >>
> > >
> >
> >
> > --
> > Thanks & Best Regards,
> >
> > *Dinesh J. Weerakkody*
> >
>

Re: mvn or sbt for studying and developing Spark?

Posted by Sean Owen <so...@cloudera.com>.
Yeah, my comment was mostly reflecting the fact that mvn is what
creates the releases and is the 'build of reference', from which the
SBT build is generated. The docs were recently changed to suggest that
Maven is the default build and SBT is for advanced users. I find Maven
plays nicer with IDEs, or at least with IntelliJ.

SBT is faster for incremental compilation and better for anyone who
knows and can leverage SBT's model.

If someone's new to it all, they're likely to have fewer problems
starting with Maven. YMMV.
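On the incremental-compilation gap: running zinc (an incremental scalac compile server) narrows it considerably for repeated Maven builds. The invocation below is a sketch; check the zinc documentation for the flags your version supports.

```shell
# Start the zinc compile server once per session (illustrative flags).
zinc -start

# Subsequent Maven compiles reuse the warm zinc JVM instead of paying
# scalac startup cost on every invocation.
mvn -DskipTests compile
```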

On Sun, Nov 16, 2014 at 9:23 PM, Michael Armbrust
<mi...@databricks.com> wrote:
> I'm going to have to disagree here.  If you are building a release
> distribution or integrating with legacy systems then maven is probably the
> correct choice.  However most of the core developers that I know use sbt,
> and I think its a better choice for exploration and development overall.
> That said, this probably falls into the category of a religious argument so
> you might want to look at both options and decide for yourself.
>
> In my experience the SBT build is significantly faster with less effort (and
> I think sbt is still faster even if you go through the extra effort of
> installing zinc) and easier to read.  The console mode of sbt (just run
> sbt/sbt and then a long running console session is started that will accept
> further commands) is great for building individual subprojects or running
> single test suites.  In addition to being faster since its a long running
> JVM, its got a lot of nice features like tab-completion for test case names.
>
> For example, if I wanted to see what test cases are available in the SQL
> subproject you can do the following:
>
> [marmbrus@michaels-mbp spark (tpcds)]$ sbt/sbt
> [info] Loading project definition from
> /Users/marmbrus/workspace/spark/project/project
> [info] Loading project definition from
> /Users/marmbrus/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader/project
> [info] Set current project to spark-parent (in build
> file:/Users/marmbrus/workspace/spark/)
>> sql/test-only <tab>
> --
> org.apache.spark.sql.CachedTableSuite
> org.apache.spark.sql.DataTypeSuite
> org.apache.spark.sql.DslQuerySuite
> org.apache.spark.sql.InsertIntoSuite
> ...
>
> Another very useful feature is the development console, which starts an
> interactive REPL including the most recent version of the code and a lot of
> useful imports for some subprojects.  For example in the hive subproject it
> automatically sets up a temporary database with a bunch of test data
> pre-loaded:
>
> $ sbt/sbt hive/console
>> hive/console
> ...
> import org.apache.spark.sql.hive._
> import org.apache.spark.sql.hive.test.TestHive._
> import org.apache.spark.sql.parquet.ParquetTestData
> Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.7.0_45).
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> sql("SELECT * FROM src").take(2)
> res0: Array[org.apache.spark.sql.Row] = Array([238,val_238], [86,val_86])
>
> Michael
>
> On Sun, Nov 16, 2014 at 3:27 AM, Dinesh J. Weerakkody
> <di...@gmail.com> wrote:
>>
>> Hi Stephen and Sean,
>>
>> Thanks for correction.
>>
>> On Sun, Nov 16, 2014 at 12:28 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> > No, the Maven build is the main one.  I would use it unless you have a
>> > need to use the SBT build in particular.
>> > On Nov 16, 2014 2:58 AM, "Dinesh J. Weerakkody" <
>> > dineshjweerakkody@gmail.com> wrote:
>> >
>> >> Hi Yiming,
>> >>
>> >> I believe that both SBT and MVN is supported in SPARK, but SBT is
>> >> preferred
>> >> (I'm not 100% sure about this :) ). When I'm using MVN I got some build
>> >> failures. After that used SBT and works fine.
>> >>
>> >> You can go through these discussions regarding SBT vs MVN and learn
>> >> pros
>> >> and cons of both [1] [2].
>> >>
>> >> [1]
>> >>
>> >>
>> >> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html
>> >>
>> >> [2]
>> >>
>> >>
>> >> https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ
>> >>
>> >> Thanks,
>> >>
>> >> On Sun, Nov 16, 2014 at 7:11 AM, Yiming (John) Zhang <sd...@gmail.com>
>> >> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> >
>> >> >
>> >> > I am new in developing Spark and my current focus is about
>> >> co-scheduling of
>> >> > spark tasks. However, I am confused with the building tools:
>> >> > sometimes
>> >> the
>> >> > documentation uses mvn but sometimes uses sbt.
>> >> >
>> >> >
>> >> >
>> >> > So, my question is that which one is the preferred tool of Spark
>> >> community?
>> >> > And what's the technical difference between them? Thank you!
>> >> >
>> >> >
>> >> >
>> >> > Cheers,
>> >> >
>> >> > Yiming
>> >> >
>> >> >
>> >>
>> >>
>> >> --
>> >> Thanks & Best Regards,
>> >>
>> >> *Dinesh J. Weerakkody*
>> >>
>> >
>>
>>
>> --
>> Thanks & Best Regards,
>>
>> *Dinesh J. Weerakkody*
>
>



Re: mvn or sbt for studying and developing Spark?

Posted by Michael Armbrust <mi...@databricks.com>.
I'm going to have to disagree here.  If you are building a release
distribution or integrating with legacy systems then maven is probably the
correct choice.  However, most of the core developers that I know use sbt,
and I think it's a better choice for exploration and development overall.
That said, this probably falls into the category of a religious argument, so
you might want to look at both options and decide for yourself.

In my experience the SBT build is significantly faster with less effort
(and I think sbt is still faster even if you go through the extra effort of
installing zinc) and easier to read.  The console mode of sbt (just run
sbt/sbt and a long-running console session starts that will accept
further commands) is great for building individual subprojects or running
single test suites.  In addition to being faster since it's a long-running
JVM, it's got a lot of nice features like tab completion for test case names.

For example, if I wanted to see what test cases are available in the SQL
subproject you can do the following:

[marmbrus@michaels-mbp spark (tpcds)]$ sbt/sbt
[info] Loading project definition from
/Users/marmbrus/workspace/spark/project/project
[info] Loading project definition from
/Users/marmbrus/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader/project
[info] Set current project to spark-parent (in build
file:/Users/marmbrus/workspace/spark/)
> sql/test-only *<tab>*
--
org.apache.spark.sql.CachedTableSuite
org.apache.spark.sql.DataTypeSuite
org.apache.spark.sql.DslQuerySuite
org.apache.spark.sql.InsertIntoSuite
...

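To make that workflow concrete, here is a small sketch of running one of the
suites listed above. The suite name comes from the tab-completed output; the
one-shot batch form is an assumed equivalent shown only for comparison:

```shell
# Inside the long-running sbt console (fast, no JVM restart between runs):
CONSOLE_CMD='sql/test-only org.apache.spark.sql.DslQuerySuite'

# The same thing as a one-shot invocation from the shell; this pays the JVM
# startup cost on every run, which is why the console mode is preferred:
BATCH_CMD="sbt/sbt \"$CONSOLE_CMD\""
```

Both forms run only the named suite rather than the whole subproject's tests.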
Another very useful feature is the development console, which starts an
interactive REPL including the most recent version of the code and a lot of
useful imports for some subprojects.  For example, in the hive subproject it
automatically sets up a temporary database with a bunch of test data
pre-loaded:

$ sbt/sbt hive/console
> hive/console
...
import org.apache.spark.sql.hive._
import org.apache.spark.sql.hive.test.TestHive._
import org.apache.spark.sql.parquet.ParquetTestData
Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
1.7.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> sql("SELECT * FROM src").take(2)
res0: Array[org.apache.spark.sql.Row] = Array([238,val_238], [86,val_86])

Michael

On Sun, Nov 16, 2014 at 3:27 AM, Dinesh J. Weerakkody <
dineshjweerakkody@gmail.com> wrote:

> Hi Stephen and Sean,
>
> Thanks for correction.
>
> On Sun, Nov 16, 2014 at 12:28 PM, Sean Owen <so...@cloudera.com> wrote:
>
> > No, the Maven build is the main one.  I would use it unless you have a
> > need to use the SBT build in particular.
> > On Nov 16, 2014 2:58 AM, "Dinesh J. Weerakkody" <
> > dineshjweerakkody@gmail.com> wrote:
> >
> >> Hi Yiming,
> >>
> >> I believe that both SBT and MVN is supported in SPARK, but SBT is
> >> preferred
> >> (I'm not 100% sure about this :) ). When I'm using MVN I got some build
> >> failures. After that used SBT and works fine.
> >>
> >> You can go through these discussions regarding SBT vs MVN and learn pros
> >> and cons of both [1] [2].
> >>
> >> [1]
> >>
> >>
> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html
> >>
> >> [2]
> >>
> >>
> https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ
> >>
> >> Thanks,
> >>
> >> On Sun, Nov 16, 2014 at 7:11 AM, Yiming (John) Zhang <sd...@gmail.com>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> >
> >> >
> >> > I am new in developing Spark and my current focus is about
> >> co-scheduling of
> >> > spark tasks. However, I am confused with the building tools: sometimes
> >> the
> >> > documentation uses mvn but sometimes uses sbt.
> >> >
> >> >
> >> >
> >> > So, my question is that which one is the preferred tool of Spark
> >> community?
> >> > And what's the technical difference between them? Thank you!
> >> >
> >> >
> >> >
> >> > Cheers,
> >> >
> >> > Yiming
> >> >
> >> >
> >>
> >>
> >> --
> >> Thanks & Best Regards,
> >>
> >> *Dinesh J. Weerakkody*
> >>
> >
>
>
> --
> Thanks & Best Regards,
>
> *Dinesh J. Weerakkody*
>

Re: mvn or sbt for studying and developing Spark?

Posted by "Dinesh J. Weerakkody" <di...@gmail.com>.
Hi Stephen and Sean,

Thanks for the correction.

On Sun, Nov 16, 2014 at 12:28 PM, Sean Owen <so...@cloudera.com> wrote:

> No, the Maven build is the main one.  I would use it unless you have a
> need to use the SBT build in particular.
> On Nov 16, 2014 2:58 AM, "Dinesh J. Weerakkody" <
> dineshjweerakkody@gmail.com> wrote:
>
>> Hi Yiming,
>>
>> I believe that both SBT and MVN is supported in SPARK, but SBT is
>> preferred
>> (I'm not 100% sure about this :) ). When I'm using MVN I got some build
>> failures. After that used SBT and works fine.
>>
>> You can go through these discussions regarding SBT vs MVN and learn pros
>> and cons of both [1] [2].
>>
>> [1]
>>
>> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html
>>
>> [2]
>>
>> https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ
>>
>> Thanks,
>>
>> On Sun, Nov 16, 2014 at 7:11 AM, Yiming (John) Zhang <sd...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> >
>> >
>> > I am new in developing Spark and my current focus is about
>> co-scheduling of
>> > spark tasks. However, I am confused with the building tools: sometimes
>> the
>> > documentation uses mvn but sometimes uses sbt.
>> >
>> >
>> >
>> > So, my question is that which one is the preferred tool of Spark
>> community?
>> > And what's the technical difference between them? Thank you!
>> >
>> >
>> >
>> > Cheers,
>> >
>> > Yiming
>> >
>> >
>>
>>
>> --
>> Thanks & Best Regards,
>>
>> *Dinesh J. Weerakkody*
>>
>


-- 
Thanks & Best Regards,

*Dinesh J. Weerakkody*

Re: mvn or sbt for studying and developing Spark?

Posted by Sean Owen <so...@cloudera.com>.
No, the Maven build is the main one.  I would use it unless you have a need
to use the SBT build in particular.
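For reference, the Maven invocations I have in mind look roughly like the
sketch below. The profile names (-Pyarn, -Phadoop-2.4) and the MAVEN_OPTS
values are assumed examples taken from the building-spark docs for recent
releases; check the docs for the flags matching your version. The commands
are kept as strings here since this is purely illustrative:

```shell
# Maven needs extra memory for a full Spark build:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512m"

# Assumed example: build the assembly without running tests.
MVN_PACKAGE='mvn -Pyarn -Phadoop-2.4 -DskipTests clean package'

# Assumed example: run the (slow) full test suite with the same profiles.
MVN_TEST='mvn -Pyarn -Phadoop-2.4 test'
```

The first command is the usual day-to-day build; the second is roughly what
the Jenkins master builds exercise.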
On Nov 16, 2014 2:58 AM, "Dinesh J. Weerakkody" <di...@gmail.com>
wrote:

> Hi Yiming,
>
> I believe that both SBT and MVN is supported in SPARK, but SBT is preferred
> (I'm not 100% sure about this :) ). When I'm using MVN I got some build
> failures. After that used SBT and works fine.
>
> You can go through these discussions regarding SBT vs MVN and learn pros
> and cons of both [1] [2].
>
> [1]
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html
>
> [2]
>
> https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ
>
> Thanks,
>
> On Sun, Nov 16, 2014 at 7:11 AM, Yiming (John) Zhang <sd...@gmail.com>
> wrote:
>
> > Hi,
> >
> >
> >
> > I am new in developing Spark and my current focus is about co-scheduling
> of
> > spark tasks. However, I am confused with the building tools: sometimes
> the
> > documentation uses mvn but sometimes uses sbt.
> >
> >
> >
> > So, my question is that which one is the preferred tool of Spark
> community?
> > And what's the technical difference between them? Thank you!
> >
> >
> >
> > Cheers,
> >
> > Yiming
> >
> >
>
>
> --
> Thanks & Best Regards,
>
> *Dinesh J. Weerakkody*
>

Re: mvn or sbt for studying and developing Spark?

Posted by "Dinesh J. Weerakkody" <di...@gmail.com>.
Hi Yiming,

I believe that both SBT and MVN are supported in Spark, but SBT is preferred
(I'm not 100% sure about this :) ). When I was using MVN I got some build
failures; after I switched to SBT it worked fine.

You can go through these discussions regarding SBT vs MVN and learn pros
and cons of both [1] [2].

[1]
http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html

[2]
https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ

Thanks,

On Sun, Nov 16, 2014 at 7:11 AM, Yiming (John) Zhang <sd...@gmail.com>
wrote:

> Hi,
>
>
>
> I am new in developing Spark and my current focus is about co-scheduling of
> spark tasks. However, I am confused with the building tools: sometimes the
> documentation uses mvn but sometimes uses sbt.
>
>
>
> So, my question is that which one is the preferred tool of Spark community?
> And what's the technical difference between them? Thank you!
>
>
>
> Cheers,
>
> Yiming
>
>


-- 
Thanks & Best Regards,

*Dinesh J. Weerakkody*