Posted to dev@spark.apache.org by Nicholas Chammas <ni...@gmail.com> on 2014/08/08 18:01:57 UTC

Unit tests in < 5 minutes

Howdy,

Do we think it's both feasible and worthwhile to invest in getting our unit
tests to finish in under 5 minutes (or something similarly brief) when run
by Jenkins?

Unit tests currently seem to take anywhere from 30 min to 2 hours. As
people add more tests, I imagine this time will only grow. I think it would
be better for both contributors and reviewers if they didn't have to wait
so long for test results; PR reviews would be shorter, if nothing else.

I don't know how this is normally done, but maybe it wouldn't be too
much work to get a test cycle to feel lighter.

Most unit tests are independent and can be run concurrently, right? Would
it make sense to build a given patch on many servers at once and send
disjoint sets of unit tests to each?
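
A minimal sketch of the "disjoint sets" idea, assuming each build server knows its own index and the total number of servers. The suite names and the shard_suites helper are hypothetical and only illustrate the split; nothing here is part of the Spark build.

```python
from typing import List


def shard_suites(suites: List[str], num_workers: int) -> List[List[str]]:
    """Split suites into num_workers disjoint shards, round-robin."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    # Sort first so every worker computes the same split independently.
    ordered = sorted(suites)
    return [ordered[i::num_workers] for i in range(num_workers)]


# Hypothetical suite names, for illustration only.
suites = ["CoreSuite", "SqlSuite", "StreamingSuite", "MLlibSuite", "GraphXSuite"]
shards = shard_suites(suites, 2)
for index, shard in enumerate(shards):
    print(f"worker {index}: {shard}")
```

Round-robin keeps the shards the same size to within one suite, but it ignores how long each suite takes; balancing by measured duration would need timing data first.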

I'd be interested in working on something like that if possible (and
sensible).

Nick

Re: Unit tests in < 5 minutes

Posted by Mridul Muralidharan <mr...@gmail.com>.
The issue with supporting this, imo, is that scala-test uses the
same VM for all the tests (the surefire plugin supports forking, but
scala-test ignores it, iirc).
So different tests would initialize different Spark contexts and could
potentially step on each other's toes.

Regards,
Mridul


On Fri, Aug 8, 2014 at 9:31 PM, Nicholas Chammas
<ni...@gmail.com> wrote:
> Howdy,
>
> Do we think it's both feasible and worthwhile to invest in getting our unit
> tests to finish in under 5 minutes (or something similarly brief) when run
> by Jenkins?
>
> Unit tests currently seem to take anywhere from 30 min to 2 hours. As
> people add more tests, I imagine this time will only grow. I think it would
> be better for both contributors and reviewers if they didn't have to wait
> so long for test results; PR reviews would be shorter, if nothing else.
>
> I don't know how this is normally done, but maybe it wouldn't be too
> much work to get a test cycle to feel lighter.
>
> Most unit tests are independent and can be run concurrently, right? Would
> it make sense to build a given patch on many servers at once and send
> disjoint sets of unit tests to each?
>
> I'd be interested in working on something like that if possible (and
> sensible).
>
> Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Unit tests in < 5 minutes

Posted by Patrick Wendell <pw...@gmail.com>.
Josh - that was actually fixed recently (we just bind to a random port
when running tests).
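
As a rough illustration of what "bind to a random port" means at the OS level (this is not Spark's code, and bind_to_free_port is a hypothetical helper): asking for port 0 lets the operating system pick a free port, so two listeners opened at the same time never collide.

```python
import socket


def bind_to_free_port(host: str = "127.0.0.1") -> socket.socket:
    """Bind a listening socket to an OS-assigned port by requesting port 0."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen(1)
    return sock


# Two listeners open at the same time each get their own port.
first = bind_to_free_port()
second = bind_to_free_port()
port_a = first.getsockname()[1]
port_b = second.getsockname()[1]
print(port_a, port_b)
first.close()
second.close()
```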

On Fri, Aug 8, 2014 at 12:00 PM, Josh Rosen <ro...@gmail.com> wrote:
> One simple optimization might be to disable the application web UI in tests
> that don't need it.  When running tests on my local machine while also
> running another Spark shell, I've noticed that the test logs fill up with
> errors when the web UI attempts to bind to the default port, fails, and
> tries a higher one.
>
> - Josh
>
> On August 8, 2014 at 11:54:24 AM, Patrick Wendell (pwendell@gmail.com)
> wrote:
>
> I dug around this a bit a while ago, I think if someone sat down and
> profiled the tests it's likely we could find some things to optimize.
> In particular, there may be overheads in starting up a local spark
> context that could be minimized and speed up all the tests. Also,
> there are some tests (especially in Streaming) that take really long,
> like 60 seconds for a single test (see some of the new flume tests).
> These could almost certainly be optimized.
>
> I think 5 minutes might be out of reach, but something like a 2X
> improvement might be possible and would be very valuable if
> accomplished.
>
> - Patrick
>
> On Fri, Aug 8, 2014 at 11:24 AM, Matei Zaharia <ma...@gmail.com>
> wrote:
>> Just as a note, when you're developing stuff, you can use "test-only" in
>> sbt, or the equivalent feature in Maven, to run just some of the tests. This
>> is what I do, I don't wait for Jenkins to run things. 90% of the time if it
>> passes the tests that I know could break stuff, it will pass all of Jenkins.
>>
>> Jenkins should always be doing all the integration tests, so I don't think
>> it will become *that* much shorter in the long run, though it can certainly
>> be improved.
>>
>> Matei
>>
>> On August 8, 2014 at 10:20:35 AM, Nicolas Liochon (nkeywal@gmail.com)
>> wrote:
>>
>> fwiw, when we did this work in HBase, we categorized the tests. Then some
>> tests can share a single jvm, while some others need to be isolated in
>> their own jvm. Nevertheless surefire can still run them in parallel by
>> starting/stopping several jvm.
>>
>> Nicolas
>>
>>
>> On Fri, Aug 8, 2014 at 7:10 PM, Reynold Xin <rx...@databricks.com> wrote:
>>
>>> ScalaTest actually has support for parallelization built-in. We can use
>>> that.
>>>
>>> The main challenge is to make sure all the test suites can work in
>>> parallel
>>> when running along side each other.
>>>
>>>
>>> On Fri, Aug 8, 2014 at 9:47 AM, Ted Yu <yu...@gmail.com> wrote:
>>>
>>> > How about using parallel execution feature of maven-surefire-plugin
>>> > (assuming all the tests were made parallel friendly) ?
>>> >
>>> >
>>> >
>>>
>>> http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html
>>> >
>>> > Cheers
>>> >
>>> >
>>> > On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen <so...@cloudera.com> wrote:
>>> >
>>> > > A common approach is to separate unit tests from integration tests.
>>> > > Maven has support for this distinction. I'm not sure it helps a lot
>>> > > though, since it only helps you to not run integration tests all the
>>> > > time. But lots of Spark tests are integration-test-like and are
>>> > > important to run to know a change works.
>>> > >
>>> > > I haven't heard of a plugin to run different test suites remotely on
>>> > > many machines, but I would not be surprised if it exists.
>>> > >
>>> > > The Jenkins servers aren't CPU-bound as far as I can tell. It's that
>>> > > the tests spend a lot of time waiting for bits to start up or
>>> > > complete. That implies the existing tests could be sped up by just
>>> > > running in parallel locally. I recall someone recently proposed this?
>>> > >
>>> > > And I think the problem with that is simply that some of the tests
>>> > > collide with each other, by opening up the same port at the same time
>>> > > for example. I know that kind of problem is being attacked even right
>>> > > now. But if all the tests were made parallel friendly, I imagine
>>> > > parallelism could be enabled and speed up builds greatly without any
>>> > > remote machines.
>>> > >
>>> > >
>


Re: Unit tests in < 5 minutes

Posted by Josh Rosen <ro...@gmail.com>.
One simple optimization might be to disable the application web UI in tests that don’t need it.  When running tests on my local machine while also running another Spark shell, I’ve noticed that the test logs fill up with errors when the web UI attempts to bind to the default port, fails, and tries a higher one.
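
The "bind, fail, try a higher port" behavior described above can be sketched as follows. This is a generic illustration, not Spark's implementation; bind_with_retries and the retry limit of 16 are assumptions made for the example.

```python
import socket
from typing import Tuple


def bind_with_retries(start_port: int, max_retries: int = 16,
                      host: str = "127.0.0.1") -> Tuple[socket.socket, int]:
    """Try start_port, then start_port + 1, and so on until a bind succeeds."""
    for offset in range(max_retries + 1):
        port = start_port + offset
        if port > 65535:
            break  # ran past the valid TCP port range
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(1)
            return sock, port
        except OSError:
            sock.close()  # port is taken; move on to the next one
    raise OSError(f"no free port found starting at {start_port}")


# Stand-in for "another Spark shell": occupy a port before the test starts.
occupant = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
occupant.bind(("127.0.0.1", 0))
occupant.listen(1)
taken_port = occupant.getsockname()[1]

sock, got_port = bind_with_retries(taken_port)
print(f"wanted {taken_port}, got {got_port}")
sock.close()
occupant.close()
```

Every failed attempt is a point where a real service would log an error, which is why the test logs fill up when the default port is already in use.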

- Josh
On August 8, 2014 at 11:54:24 AM, Patrick Wendell (pwendell@gmail.com) wrote:

I dug around this a bit a while ago, I think if someone sat down and  
profiled the tests it's likely we could find some things to optimize.  
In particular, there may be overheads in starting up a local spark  
context that could be minimized and speed up all the tests. Also,  
there are some tests (especially in Streaming) that take really long,  
like 60 seconds for a single test (see some of the new flume tests).  
These could almost certainly be optimized.  

I think 5 minutes might be out of reach, but something like a 2X  
improvement might be possible and would be very valuable if  
accomplished.  

- Patrick  

On Fri, Aug 8, 2014 at 11:24 AM, Matei Zaharia <ma...@gmail.com> wrote:  
> Just as a note, when you're developing stuff, you can use "test-only" in sbt, or the equivalent feature in Maven, to run just some of the tests. This is what I do, I don't wait for Jenkins to run things. 90% of the time if it passes the tests that I know could break stuff, it will pass all of Jenkins.  
>  
> Jenkins should always be doing all the integration tests, so I don't think it will become *that* much shorter in the long run, though it can certainly be improved.  
>  
> Matei  
>  
> On August 8, 2014 at 10:20:35 AM, Nicolas Liochon (nkeywal@gmail.com) wrote:  
>  
> fwiw, when we did this work in HBase, we categorized the tests. Then some  
> tests can share a single jvm, while some others need to be isolated in  
> their own jvm. Nevertheless surefire can still run them in parallel by  
> starting/stopping several jvm.  
>  
> Nicolas  
>  
>  
> On Fri, Aug 8, 2014 at 7:10 PM, Reynold Xin <rx...@databricks.com> wrote:  
>  
>> ScalaTest actually has support for parallelization built-in. We can use  
>> that.  
>>  
>> The main challenge is to make sure all the test suites can work in parallel  
>> when running along side each other.  
>>  
>>  
>> On Fri, Aug 8, 2014 at 9:47 AM, Ted Yu <yu...@gmail.com> wrote:  
>>  
>> > How about using parallel execution feature of maven-surefire-plugin  
>> > (assuming all the tests were made parallel friendly) ?  
>> >  
>> >  
>> >  
>> http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html  
>> >  
>> > Cheers  
>> >  
>> >  
>> > On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen <so...@cloudera.com> wrote:  
>> >  
>> > > A common approach is to separate unit tests from integration tests.  
>> > > Maven has support for this distinction. I'm not sure it helps a lot  
>> > > though, since it only helps you to not run integration tests all the  
>> > > time. But lots of Spark tests are integration-test-like and are  
>> > > important to run to know a change works.  
>> > >  
>> > > I haven't heard of a plugin to run different test suites remotely on  
>> > > many machines, but I would not be surprised if it exists.  
>> > >  
>> > > The Jenkins servers aren't CPU-bound as far as I can tell. It's that  
>> > > the tests spend a lot of time waiting for bits to start up or  
>> > > complete. That implies the existing tests could be sped up by just  
>> > > running in parallel locally. I recall someone recently proposed this?  
>> > >  
>> > > And I think the problem with that is simply that some of the tests  
>> > > collide with each other, by opening up the same port at the same time  
>> > > for example. I know that kind of problem is being attacked even right  
>> > > now. But if all the tests were made parallel friendly, I imagine  
>> > > parallelism could be enabled and speed up builds greatly without any  
>> > > remote machines.  
>> > >  
>> > >  



Re: Unit tests in < 5 minutes

Posted by Patrick Wendell <pw...@gmail.com>.
I dug into this a bit a while ago. I think if someone sat down and
profiled the tests, it's likely we could find some things to optimize.
In particular, there may be overheads in starting up a local Spark
context that could be minimized to speed up all the tests. Also,
there are some tests (especially in Streaming) that take really long,
like 60 seconds for a single test (see some of the new flume tests).
These could almost certainly be optimized.
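
A first profiling pass could be as simple as ranking per-test durations and checking how much of the total the slowest tests account for. The sketch below assumes the timings have already been collected somewhere; the test names and numbers are made up for illustration and are not measurements from the Spark build.

```python
from typing import Dict, List, Tuple


def slowest_tests(durations: Dict[str, float],
                  threshold_s: float) -> List[Tuple[str, float]]:
    """Return (name, seconds) for tests at or above threshold_s, slowest first."""
    slow = [(name, secs) for name, secs in durations.items() if secs >= threshold_s]
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Made-up timings in seconds, for illustration only.
durations = {
    "flume stream test": 62.0,
    "local context startup": 4.5,
    "rdd map": 0.3,
    "streaming window": 31.0,
}
slow = slowest_tests(durations, threshold_s=30.0)
total = sum(durations.values())
slow_share = sum(secs for _, secs in slow) / total
print(slow, f"{slow_share:.0%} of total time")
```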

I think 5 minutes might be out of reach, but something like a 2X
improvement might be possible and would be very valuable if
accomplished.

- Patrick

On Fri, Aug 8, 2014 at 11:24 AM, Matei Zaharia <ma...@gmail.com> wrote:
> Just as a note, when you're developing stuff, you can use "test-only" in sbt, or the equivalent feature in Maven, to run just some of the tests. This is what I do, I don't wait for Jenkins to run things. 90% of the time if it passes the tests that I know could break stuff, it will pass all of Jenkins.
>
> Jenkins should always be doing all the integration tests, so I don't think it will become *that* much shorter in the long run, though it can certainly be improved.
>
> Matei
>
> On August 8, 2014 at 10:20:35 AM, Nicolas Liochon (nkeywal@gmail.com) wrote:
>
> fwiw, when we did this work in HBase, we categorized the tests. Then some
> tests can share a single jvm, while some others need to be isolated in
> their own jvm. Nevertheless surefire can still run them in parallel by
> starting/stopping several jvm.
>
> Nicolas
>
>
> On Fri, Aug 8, 2014 at 7:10 PM, Reynold Xin <rx...@databricks.com> wrote:
>
>> ScalaTest actually has support for parallelization built-in. We can use
>> that.
>>
>> The main challenge is to make sure all the test suites can work in parallel
>> when running along side each other.
>>
>>
>> On Fri, Aug 8, 2014 at 9:47 AM, Ted Yu <yu...@gmail.com> wrote:
>>
>> > How about using parallel execution feature of maven-surefire-plugin
>> > (assuming all the tests were made parallel friendly) ?
>> >
>> >
>> >
>> http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html
>> >
>> > Cheers
>> >
>> >
>> > On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen <so...@cloudera.com> wrote:
>> >
>> > > A common approach is to separate unit tests from integration tests.
>> > > Maven has support for this distinction. I'm not sure it helps a lot
>> > > though, since it only helps you to not run integration tests all the
>> > > time. But lots of Spark tests are integration-test-like and are
>> > > important to run to know a change works.
>> > >
>> > > I haven't heard of a plugin to run different test suites remotely on
>> > > many machines, but I would not be surprised if it exists.
>> > >
>> > > The Jenkins servers aren't CPU-bound as far as I can tell. It's that
>> > > the tests spend a lot of time waiting for bits to start up or
>> > > complete. That implies the existing tests could be sped up by just
>> > > running in parallel locally. I recall someone recently proposed this?
>> > >
>> > > And I think the problem with that is simply that some of the tests
>> > > collide with each other, by opening up the same port at the same time
>> > > for example. I know that kind of problem is being attacked even right
>> > > now. But if all the tests were made parallel friendly, I imagine
>> > > parallelism could be enabled and speed up builds greatly without any
>> > > remote machines.
>> > >
>> > >



Re: Unit tests in < 5 minutes

Posted by Matei Zaharia <ma...@gmail.com>.
Just as a note, when you're developing stuff, you can use "test-only" in sbt, or the equivalent feature in Maven, to run just some of the tests. This is what I do; I don't wait for Jenkins to run things. 90% of the time, if it passes the tests that I know could break stuff, it will pass all of Jenkins.

Jenkins should always be doing all the integration tests, so I don't think it will become *that* much shorter in the long run, though it can certainly be improved.

Matei

On August 8, 2014 at 10:20:35 AM, Nicolas Liochon (nkeywal@gmail.com) wrote:

fwiw, when we did this work in HBase, we categorized the tests. Then some 
tests can share a single jvm, while some others need to be isolated in 
their own jvm. Nevertheless surefire can still run them in parallel by 
starting/stopping several jvm. 

Nicolas 


On Fri, Aug 8, 2014 at 7:10 PM, Reynold Xin <rx...@databricks.com> wrote: 

> ScalaTest actually has support for parallelization built-in. We can use 
> that. 
> 
> The main challenge is to make sure all the test suites can work in parallel 
> when running along side each other. 
> 
> 
> On Fri, Aug 8, 2014 at 9:47 AM, Ted Yu <yu...@gmail.com> wrote: 
> 
> > How about using parallel execution feature of maven-surefire-plugin 
> > (assuming all the tests were made parallel friendly) ? 
> > 
> > 
> > 
> http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html 
> > 
> > Cheers 
> > 
> > 
> > On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen <so...@cloudera.com> wrote: 
> > 
> > > A common approach is to separate unit tests from integration tests. 
> > > Maven has support for this distinction. I'm not sure it helps a lot 
> > > though, since it only helps you to not run integration tests all the 
> > > time. But lots of Spark tests are integration-test-like and are 
> > > important to run to know a change works. 
> > > 
> > > I haven't heard of a plugin to run different test suites remotely on 
> > > many machines, but I would not be surprised if it exists. 
> > > 
> > > The Jenkins servers aren't CPU-bound as far as I can tell. It's that 
> > > the tests spend a lot of time waiting for bits to start up or 
> > > complete. That implies the existing tests could be sped up by just 
> > > running in parallel locally. I recall someone recently proposed this? 
> > > 
> > > And I think the problem with that is simply that some of the tests 
> > > collide with each other, by opening up the same port at the same time 
> > > for example. I know that kind of problem is being attacked even right 
> > > now. But if all the tests were made parallel friendly, I imagine 
> > > parallelism could be enabled and speed up builds greatly without any 
> > > remote machines. 
> > > 
> > > 

Re: Unit tests in < 5 minutes

Posted by Nicolas Liochon <nk...@gmail.com>.
fwiw, when we did this work in HBase, we categorized the tests. Some
tests can then share a single JVM, while others need to be isolated in
their own JVM. Nevertheless, surefire can still run them in parallel by
starting/stopping several JVMs.
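
The categorization can be pictured as a plan that maps suites to forked JVMs. The sketch below is only an illustration: the category labels "shared" and "isolated", the suite names, and the plan_jvms helper are all hypothetical, not HBase's or surefire's actual scheme.

```python
from typing import Dict, List


def plan_jvms(categories: Dict[str, str]) -> List[List[str]]:
    """Return one list of suites per forked JVM.

    Suites tagged "shared" run together in a single JVM; each suite
    tagged "isolated" gets a JVM of its own.
    """
    shared: List[str] = []
    isolated: List[str] = []
    for suite, category in sorted(categories.items()):
        if category == "shared":
            shared.append(suite)
        elif category == "isolated":
            isolated.append(suite)
        else:
            raise ValueError(f"unknown category {category!r} for {suite}")
    plan = [shared] if shared else []
    plan.extend([suite] for suite in isolated)
    return plan


# Hypothetical suites and categories, for illustration only.
categories = {
    "ParserSuite": "shared",
    "UtilsSuite": "shared",
    "ClusterSuite": "isolated",
    "StreamingSuite": "isolated",
}
plan = plan_jvms(categories)
print(plan)
```

Each inner list could then be handed to a separate forked JVM, and the JVMs run side by side.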

Nicolas


On Fri, Aug 8, 2014 at 7:10 PM, Reynold Xin <rx...@databricks.com> wrote:

> ScalaTest actually has support for parallelization built-in. We can use
> that.
>
> The main challenge is to make sure all the test suites can work in parallel
> when running along side each other.
>
>
> On Fri, Aug 8, 2014 at 9:47 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > How about using parallel execution feature of maven-surefire-plugin
> > (assuming all the tests were made parallel friendly) ?
> >
> >
> >
> http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html
> >
> > Cheers
> >
> >
> > On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen <so...@cloudera.com> wrote:
> >
> > > A common approach is to separate unit tests from integration tests.
> > > Maven has support for this distinction. I'm not sure it helps a lot
> > > though, since it only helps you to not run integration tests all the
> > > time. But lots of Spark tests are integration-test-like and are
> > > important to run to know a change works.
> > >
> > > I haven't heard of a plugin to run different test suites remotely on
> > > many machines, but I would not be surprised if it exists.
> > >
> > > The Jenkins servers aren't CPU-bound as far as I can tell. It's that
> > > the tests spend a lot of time waiting for bits to start up or
> > > complete. That implies the existing tests could be sped up by just
> > > running in parallel locally. I recall someone recently proposed this?
> > >
> > > And I think the problem with that is simply that some of the tests
> > > collide with each other, by opening up the same port at the same time
> > > for example. I know that kind of problem is being attacked even right
> > > now. But if all the tests were made parallel friendly, I imagine
> > > parallelism could be enabled and speed up builds greatly without any
> > > remote machines.
> > >
> > >

Re: Unit tests in < 5 minutes

Posted by Ted Yu <yu...@gmail.com>.
> I may move on to trying Maven.

Maven is my favorite :-)

On Sat, Dec 6, 2014 at 10:54 AM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:

> Ted,
>
> I posted some updates
> <https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14236540&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14236540> on
> JIRA on my progress (or lack thereof) getting SBT to parallelize test
> suites properly. I'm currently stuck with SBT / ScalaTest, so I may move on
> to trying Maven.
>
> Andrew,
>
> Once we have a basic grasp of how to parallelize some of the tests, the
> next step will probably be to use containers (i.e. Docker) to allow more
> parallelization, especially for those tests that, for example, contend for
> ports.
>
> Nick
>
> On Fri Dec 05 2014 at 2:05:29 PM Andrew Or <an...@databricks.com> wrote:
>
>> @Patrick and Josh actually we went even further than that. We simply
>> disable the UI for most tests and these used to be the single largest
>> source of port conflict.
>>
>

Re: Unit tests in < 5 minutes

Posted by Nicholas Chammas <ni...@gmail.com>.
Ted,

I posted some updates
<https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14236540&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14236540>
on JIRA on my progress (or lack thereof) getting SBT to parallelize test
suites properly. I'm currently stuck with SBT / ScalaTest, so I may move on
to trying Maven.

Andrew,

Once we have a basic grasp of how to parallelize some of the tests, the
next step will probably be to use containers (e.g. Docker) to allow more
parallelization, especially for those tests that, for example, contend for
ports.

Nick

On Fri Dec 05 2014 at 2:05:29 PM Andrew Or <an...@databricks.com> wrote:

> @Patrick and Josh actually we went even further than that. We simply
> disable the UI for most tests and these used to be the single largest
> source of port conflict.
>

Re: Unit tests in < 5 minutes

Posted by Andrew Or <an...@databricks.com>.
@Patrick and Josh: actually, we went even further than that. We simply
disable the UI for most tests; the UI used to be the single largest
source of port conflicts.

Re: Unit tests in < 5 minutes

Posted by Ted Yu <yu...@gmail.com>.
Have you seen this thread http://search-hadoop.com/m/JW1q5xxSAa2 ?

Test categorization in HBase is done through maven-surefire-plugin

Cheers

On Thu, Dec 4, 2014 at 4:05 PM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:

> fwiw, when we did this work in HBase, we categorized the tests. Then some
> tests can share a single jvm, while some others need to be isolated in
> their own jvm. Nevertheless surefire can still run them in parallel by
> starting/stopping several jvm.
>
> I think we need to do this as well. Perhaps the test naming hierarchy can
> be used to group non-parallelizable tests in the same JVM.
>
> For example, here are some Hive tests from our project:
>
> org.apache.spark.sql.hive.StatisticsSuite
> org.apache.spark.sql.hive.execution.HiveQuerySuite
> org.apache.spark.sql.QueryTest
> org.apache.spark.sql.parquet.HiveParquetSuite
>
> If we group tests by the first 5 parts of their name (e.g.
> org.apache.spark.sql.hive), then we’d have the first 2 tests run in the
> same JVM, and the next 2 tests each run in their own JVM.
>
> I’m new to this stuff so I’m not sure if I’m going about this in the right
> way, but you can see my attempt with this approach on GitHub
> <https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L388-L397>,
> as well as the related discussion on JIRA
> <https://issues.apache.org/jira/browse/SPARK-3431>.
>
> If anyone has more feedback on this, I’d love to hear it (either on this
> thread or in the JIRA issue).
>
> Nick
>
> On Sun Sep 07 2014 at 8:28:51 PM Nicholas Chammas <nicholas.chammas@gmail.com> wrote:
>
>> On Fri, Aug 8, 2014 at 1:12 PM, Reynold Xin <rx...@databricks.com> wrote:
>>
>>> Nick,
>>>
>>> Would you like to file a ticket to track this?
>>>
>>
>> SPARK-3431 <https://issues.apache.org/jira/browse/SPARK-3431>:
>> Parallelize execution of tests
>> > Sub-task: SPARK-3432 <https://issues.apache.org/jira/browse/SPARK-3432>:
>> Fix logging of unit test execution time
>>
>> Nick
>>
>

Re: Unit tests in < 5 minutes

Posted by Nicholas Chammas <ni...@gmail.com>.
> fwiw, when we did this work in HBase, we categorized the tests. Then some
> tests can share a single jvm, while some others need to be isolated in
> their own jvm. Nevertheless surefire can still run them in parallel by
> starting/stopping several jvm.

I think we need to do this as well. Perhaps the test naming hierarchy can
be used to group non-parallelizable tests in the same JVM.

For example, here are some Hive tests from our project:

org.apache.spark.sql.hive.StatisticsSuite
org.apache.spark.sql.hive.execution.HiveQuerySuite
org.apache.spark.sql.QueryTest
org.apache.spark.sql.parquet.HiveParquetSuite

If we group tests by the first 5 parts of their name (e.g.
org.apache.spark.sql.hive), then we’d have the first 2 tests run in the
same JVM, and the next 2 tests each run in their own JVM.
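
To make the grouping rule above concrete, here is a minimal Scala sketch. The names SuiteGrouping, groupKey and groupSuites are made up for illustration and are not taken from SparkBuild.scala. It only shows which suites would share a JVM under the "first 5 parts" rule, and does not fork anything.

```scala
object SuiteGrouping {
  // Hypothetical helper: the group key is the first `depth` dot-separated
  // parts of the fully qualified suite name.
  def groupKey(suiteName: String, depth: Int = 5): String =
    suiteName.split('.').take(depth).mkString(".")

  // Suites that map to the same key would run together in one JVM.
  def groupSuites(suites: Seq[String], depth: Int = 5): Map[String, Seq[String]] =
    suites.groupBy(name => groupKey(name, depth))

  def main(args: Array[String]): Unit = {
    val suites = Seq(
      "org.apache.spark.sql.hive.StatisticsSuite",
      "org.apache.spark.sql.hive.execution.HiveQuerySuite",
      "org.apache.spark.sql.QueryTest",
      "org.apache.spark.sql.parquet.HiveParquetSuite"
    )
    // Print one line per group, sorted by key for stable output.
    groupSuites(suites).toSeq.sortBy(_._1).foreach { case (key, members) =>
      println(s"$key -> ${members.mkString(", ")}")
    }
  }
}
```

Note that org.apache.spark.sql.QueryTest has only 5 parts, so its key is its full name and it ends up in a group by itself.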

I’m new to this stuff so I’m not sure if I’m going about this in the right
way, but you can see my attempt with this approach on GitHub
<https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L388-L397>,
as well as the related discussion on JIRA
<https://issues.apache.org/jira/browse/SPARK-3431>.

If anyone has more feedback on this, I’d love to hear it (either on this
thread or in the JIRA issue).

Nick

On Sun Sep 07 2014 at 8:28:51 PM Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> On Fri, Aug 8, 2014 at 1:12 PM, Reynold Xin <rx...@databricks.com> wrote:
>
>> Nick,
>>
>> Would you like to file a ticket to track this?
>>
>
> SPARK-3431 <https://issues.apache.org/jira/browse/SPARK-3431>:
> Parallelize execution of tests
> > Sub-task: SPARK-3432 <https://issues.apache.org/jira/browse/SPARK-3432>:
> Fix logging of unit test execution time
>
> Nick
>

Re: Unit tests in < 5 minutes

Posted by Nicholas Chammas <ni...@gmail.com>.
On Fri, Aug 8, 2014 at 1:12 PM, Reynold Xin <rx...@databricks.com> wrote:

> Nick,
>
> Would you like to file a ticket to track this?
>

SPARK-3431 <https://issues.apache.org/jira/browse/SPARK-3431>: Parallelize
execution of tests
  Sub-task: SPARK-3432 <https://issues.apache.org/jira/browse/SPARK-3432>:
  Fix logging of unit test execution time

Nick

Re: Unit tests in < 5 minutes

Posted by Reynold Xin <rx...@databricks.com>.
Nick,

Would you like to file a ticket to track this?

I think the first baby step is to log the amount of time each test case
takes. This is supposed to happen already (see the flag), but somehow the
times are not showing. If you have some time to figure that out, that'd be
great.

https://github.com/apache/spark/blob/master/project/SparkBuild.scala#L350
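
Independent of whatever reporter flag the build passes to ScalaTest, the idea of recording how long each test takes can be sketched in plain Scala. TestTimer and timed are hypothetical names, not part of ScalaTest or the Spark build. This is only an illustration of the measurement, not a fix for the missing output.

```scala
object TestTimer {
  // Runs `body` and returns its result together with the elapsed
  // wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for a test body; a real suite would wrap each test case.
    val (sum, ms) = timed { (1 to 1000).sum }
    println(s"example test: result=$sum, took $ms ms")
  }
}
```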




On Fri, Aug 8, 2014 at 10:10 AM, Reynold Xin <rx...@databricks.com> wrote:

> ScalaTest actually has support for parallelization built-in. We can use
> that.
>
> The main challenge is to make sure all the test suites can work in
> parallel when running alongside each other.
>
>
> On Fri, Aug 8, 2014 at 9:47 AM, Ted Yu <yu...@gmail.com> wrote:
>
>> How about using parallel execution feature of maven-surefire-plugin
>> (assuming all the tests were made parallel friendly) ?
>>
>>
>> http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html
>>
>> Cheers
>>
>>
>> On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>> > A common approach is to separate unit tests from integration tests.
>> > Maven has support for this distinction. I'm not sure it helps a lot
>> > though, since it only helps you to not run integration tests all the
>> > time. But lots of Spark tests are integration-test-like and are
>> > important to run to know a change works.
>> >
>> > I haven't heard of a plugin to run different test suites remotely on
>> > many machines, but I would not be surprised if it exists.
>> >
>> > The Jenkins servers aren't CPU-bound as far as I can tell. It's that
>> > the tests spend a lot of time waiting for bits to start up or
>> > complete. That implies the existing tests could be sped up by just
>> > running in parallel locally. I recall someone recently proposed this?
>> >
>> > And I think the problem with that is simply that some of the tests
>> > collide with each other, by opening up the same port at the same time
>> > for example. I know that kind of problem is being attacked even right
>> > now. But if all the tests were made parallel friendly, I imagine
>> > parallelism could be enabled and speed up builds greatly without any
>> > remote machines.
>> >
>> >
>> > On Fri, Aug 8, 2014 at 5:01 PM, Nicholas Chammas
>> > <ni...@gmail.com> wrote:
>> > > Howdy,
>> > >
>> > > Do we think it's both feasible and worthwhile to invest in getting our
>> > unit
>> > > tests to finish in under 5 minutes (or something similarly brief) when
>> > run
>> > > by Jenkins?
>> > >
>> > > Unit tests currently seem to take anywhere from 30 min to 2 hours. As
>> > > people add more tests, I imagine this time will only grow. I think it
>> > would
>> > > be better for both contributors and reviewers if they didn't have to
>> wait
>> > > so long for test results; PR reviews would be shorter, if nothing
>> else.
>> > >
>> > > I don't know how this is normally done, but maybe it wouldn't be
>> too
>> > > much work to get a test cycle to feel lighter.
>> > >
>> > > Most unit tests are independent and can be run concurrently, right?
>> Would
>> > > it make sense to build a given patch on many servers at once and send
>> > > disjoint sets of unit tests to each?
>> > >
>> > > I'd be interested in working on something like that if possible (and
>> > > sensible).
>> > >
>> > > Nick
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> > For additional commands, e-mail: dev-help@spark.apache.org
>> >
>> >
>>
>
>

Re: Unit tests in < 5 minutes

Posted by Reynold Xin <rx...@databricks.com>.
ScalaTest actually has support for parallelization built-in. We can use
that.

The main challenge is to make sure all the test suites can work in parallel
when running alongside each other.


On Fri, Aug 8, 2014 at 9:47 AM, Ted Yu <yu...@gmail.com> wrote:

> How about using parallel execution feature of maven-surefire-plugin
> (assuming all the tests were made parallel friendly) ?
>
>
> http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html
>
> Cheers
>
>
> On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen <so...@cloudera.com> wrote:
>
> > A common approach is to separate unit tests from integration tests.
> > Maven has support for this distinction. I'm not sure it helps a lot
> > though, since it only helps you to not run integration tests all the
> > time. But lots of Spark tests are integration-test-like and are
> > important to run to know a change works.
> >
> > I haven't heard of a plugin to run different test suites remotely on
> > many machines, but I would not be surprised if it exists.
> >
> > The Jenkins servers aren't CPU-bound as far as I can tell. It's that
> > the tests spend a lot of time waiting for bits to start up or
> > complete. That implies the existing tests could be sped up by just
> > running in parallel locally. I recall someone recently proposed this?
> >
> > And I think the problem with that is simply that some of the tests
> > collide with each other, by opening up the same port at the same time
> > for example. I know that kind of problem is being attacked even right
> > now. But if all the tests were made parallel friendly, I imagine
> > parallelism could be enabled and speed up builds greatly without any
> > remote machines.
> >
> >
> > On Fri, Aug 8, 2014 at 5:01 PM, Nicholas Chammas
> > <ni...@gmail.com> wrote:
> > > Howdy,
> > >
> > > Do we think it's both feasible and worthwhile to invest in getting our
> > unit
> > > tests to finish in under 5 minutes (or something similarly brief) when
> > run
> > > by Jenkins?
> > >
> > > Unit tests currently seem to take anywhere from 30 min to 2 hours. As
> > > people add more tests, I imagine this time will only grow. I think it
> > would
> > > be better for both contributors and reviewers if they didn't have to
> wait
> > > so long for test results; PR reviews would be shorter, if nothing else.
> > >
> > > I don't know how this is normally done, but maybe it wouldn't be
> too
> > > much work to get a test cycle to feel lighter.
> > >
> > > Most unit tests are independent and can be run concurrently, right?
> Would
> > > it make sense to build a given patch on many servers at once and send
> > > disjoint sets of unit tests to each?
> > >
> > > I'd be interested in working on something like that if possible (and
> > > sensible).
> > >
> > > Nick
> >
> >
> >
>

Re: Unit tests in < 5 minutes

Posted by Ted Yu <yu...@gmail.com>.
How about using the parallel execution feature of maven-surefire-plugin
(assuming all the tests were made parallel friendly)?

http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html

Cheers


On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen <so...@cloudera.com> wrote:

> A common approach is to separate unit tests from integration tests.
> Maven has support for this distinction. I'm not sure it helps a lot
> though, since it only helps you to not run integration tests all the
> time. But lots of Spark tests are integration-test-like and are
> important to run to know a change works.
>
> I haven't heard of a plugin to run different test suites remotely on
> many machines, but I would not be surprised if it exists.
>
> The Jenkins servers aren't CPU-bound as far as I can tell. It's that
> the tests spend a lot of time waiting for bits to start up or
> complete. That implies the existing tests could be sped up by just
> running in parallel locally. I recall someone recently proposed this?
>
> And I think the problem with that is simply that some of the tests
> collide with each other, by opening up the same port at the same time
> for example. I know that kind of problem is being attacked even right
> now. But if all the tests were made parallel friendly, I imagine
> parallelism could be enabled and speed up builds greatly without any
> remote machines.
>
>
> On Fri, Aug 8, 2014 at 5:01 PM, Nicholas Chammas
> <ni...@gmail.com> wrote:
> > Howdy,
> >
> > Do we think it's both feasible and worthwhile to invest in getting our
> unit
> > tests to finish in under 5 minutes (or something similarly brief) when
> run
> > by Jenkins?
> >
> > Unit tests currently seem to take anywhere from 30 min to 2 hours. As
> > people add more tests, I imagine this time will only grow. I think it
> would
> > be better for both contributors and reviewers if they didn't have to wait
> > so long for test results; PR reviews would be shorter, if nothing else.
> >
> > I don't know how this is normally done, but maybe it wouldn't be too
> > much work to get a test cycle to feel lighter.
> >
> > Most unit tests are independent and can be run concurrently, right? Would
> > it make sense to build a given patch on many servers at once and send
> > disjoint sets of unit tests to each?
> >
> > I'd be interested in working on something like that if possible (and
> > sensible).
> >
> > Nick
>
>
>

Re: Unit tests in < 5 minutes

Posted by Sean Owen <so...@cloudera.com>.
A common approach is to separate unit tests from integration tests.
Maven has support for this distinction. I'm not sure it helps a lot
though, since it only helps you to not run integration tests all the
time. But lots of Spark tests are integration-test-like and are
important to run to know a change works.

I haven't heard of a plugin to run different test suites remotely on
many machines, but I would not be surprised if it exists.

The Jenkins servers aren't CPU-bound as far as I can tell. It's that
the tests spend a lot of time waiting for bits to start up or
complete. That implies the existing tests could be sped up by just
running in parallel locally. I recall someone recently proposed this?

And I think the problem with that is simply that some of the tests
collide with each other, by opening up the same port at the same time
for example. I know that kind of problem is being attacked even right
now. But if all the tests were made parallel friendly, I imagine
parallelism could be enabled and speed up builds greatly without any
remote machines.
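
One common way to avoid the port collisions mentioned above is to bind to port 0 and let the OS pick a free port. This is a general JVM technique, not something this thread says Spark's tests already do. FreePort and withEphemeralSocket are illustrative names only.

```scala
import java.net.ServerSocket

object FreePort {
  // Binds a server socket to port 0 so the OS assigns an unused port,
  // passes the socket to `use`, and always closes it afterwards.
  def withEphemeralSocket[A](use: ServerSocket => A): A = {
    val socket = new ServerSocket(0)
    try use(socket) finally socket.close()
  }

  def main(args: Array[String]): Unit = {
    // Two sockets that are open at the same time get different ports,
    // so concurrently running tests would not collide on a fixed port.
    withEphemeralSocket { a =>
      withEphemeralSocket { b =>
        println(s"ports: ${a.getLocalPort} and ${b.getLocalPort}")
      }
    }
  }
}
```

A test written this way has to read the assigned port back from the socket (getLocalPort) rather than assume a hard-coded value.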


On Fri, Aug 8, 2014 at 5:01 PM, Nicholas Chammas
<ni...@gmail.com> wrote:
> Howdy,
>
> Do we think it's both feasible and worthwhile to invest in getting our unit
> tests to finish in under 5 minutes (or something similarly brief) when run
> by Jenkins?
>
> Unit tests currently seem to take anywhere from 30 min to 2 hours. As
> people add more tests, I imagine this time will only grow. I think it would
> be better for both contributors and reviewers if they didn't have to wait
> so long for test results; PR reviews would be shorter, if nothing else.
>
> I don't know how this is normally done, but maybe it wouldn't be too
> much work to get a test cycle to feel lighter.
>
> Most unit tests are independent and can be run concurrently, right? Would
> it make sense to build a given patch on many servers at once and send
> disjoint sets of unit tests to each?
>
> I'd be interested in working on something like that if possible (and
> sensible).
>
> Nick
