Posted to dev@spark.apache.org by Nicholas Chammas <ni...@gmail.com> on 2014/12/05 01:05:26 UTC

Re: Unit tests in < 5 minutes

fwiw, when we did this work in HBase, we categorized the tests. Then some
tests can share a single JVM, while others need to be isolated in
their own JVM. Nevertheless, Surefire can still run them in parallel by
starting/stopping several JVMs.
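A minimal Scala sketch of that categorization idea. It assumes each suite has already been tagged by hand with one of two hypothetical categories, SharedJvm or OwnJvm. These names are illustrative and are not HBase's actual categories. Shared suites are packed into one group, and each isolated suite gets a group to itself. Every group would then be launched as its own JVM, with the JVMs running in parallel.

```scala
// Hypothetical categories: the names are illustrative, not HBase's own.
sealed trait Isolation
case object SharedJvm extends Isolation // safe to run alongside other suites
case object OwnJvm extends Isolation    // needs a fresh JVM (fixed ports, global state, ...)

final case class Suite(name: String, isolation: Isolation)

// One JVM to launch, plus the suites it runs one after another.
final case class JvmGroup(label: String, suites: Seq[String])

object CategorizedGrouping {
  /** Pack every SharedJvm suite into a single group and give each OwnJvm
    * suite a group of its own. Each group maps to one JVM, and the JVMs
    * can then be started in parallel. */
  def plan(suites: Seq[Suite]): Seq[JvmGroup] = {
    val (shared, isolated) = suites.partition(_.isolation == SharedJvm)
    val sharedGroup =
      if (shared.isEmpty) Seq.empty[JvmGroup]
      else Seq(JvmGroup("shared", shared.map(_.name)))
    sharedGroup ++ isolated.map(s => JvmGroup(s.name, Seq(s.name)))
  }
}
```

The trade-off is the one described above. Suites in the shared group pay the JVM startup cost only once, while isolated suites cost one JVM each but can no longer interfere with anything else.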

I think we need to do this as well. Perhaps the test naming hierarchy can
be used to group non-parallelizable tests in the same JVM.

For example, here are some Hive tests from our project:

org.apache.spark.sql.hive.StatisticsSuite
org.apache.spark.sql.hive.execution.HiveQuerySuite
org.apache.spark.sql.QueryTest
org.apache.spark.sql.parquet.HiveParquetSuite

If we group tests by the first 5 parts of their name (e.g.
org.apache.spark.sql.hive), then we’d have the first 2 tests run in the
same JVM, and the next 2 tests each run in their own JVM.
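A sketch of the name-based grouping in plain Scala, assuming suites are identified by their fully qualified class names. The helper groupByPrefix is hypothetical. It only computes the groups and does not run any tests.

```scala
object PrefixGrouping {
  /** Group fully qualified suite names by their first `depth`
    * dot-separated components. A name with at most `depth` components
    * is keyed by its full name. */
  def groupByPrefix(suites: Seq[String], depth: Int = 5): Map[String, Seq[String]] =
    suites.groupBy(name => name.split('.').take(depth).mkString("."))
}
```

Applied to the four suites above with depth 5, it yields three groups. The first is org.apache.spark.sql.hive, containing StatisticsSuite and HiveQuerySuite. QueryTest and HiveParquetSuite each get a group of their own. In an sbt build, groups like these would be passed to the testGrouping setting as Tests.Group values. The forking options that accompany them differ between sbt versions, so that wiring is left out here.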

I’m new to this stuff so I’m not sure if I’m going about this in the right
way, but you can see my attempt with this approach on GitHub
<https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L388-L397>,
as well as the related discussion on JIRA
<https://issues.apache.org/jira/browse/SPARK-3431>.

If anyone has more feedback on this, I’d love to hear it (either on this
thread or in the JIRA issue).

Nick

On Sun Sep 07 2014 at 8:28:51 PM Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> On Fri, Aug 8, 2014 at 1:12 PM, Reynold Xin <rx...@databricks.com> wrote:
>
>> Nick,
>>
>> Would you like to file a ticket to track this?
>>
>
> SPARK-3431 <https://issues.apache.org/jira/browse/SPARK-3431>:
> Parallelize execution of tests
> Sub-task: SPARK-3432 <https://issues.apache.org/jira/browse/SPARK-3432>:
> Fix logging of unit test execution time
>
> Nick
>


Posted by Ted Yu <yu...@gmail.com>.
bq. I may move on to trying Maven.

Maven is my favorite :-)

On Sat, Dec 6, 2014 at 10:54 AM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> Ted,
>
> I posted some updates
> <https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14236540&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14236540> on
> JIRA about my progress (or lack thereof) getting SBT to parallelize test
> suites properly. I'm currently stuck with SBT / ScalaTest, so I may move on
> to trying Maven.
>
> Andrew,
>
> Once we have a basic grasp of how to parallelize some of the tests, the
> next step will probably be to use containers (e.g. Docker) to allow more
> parallelization, especially for those tests that, for example, contend for
> ports.
>
> Nick
>
> On Fri Dec 05 2014 at 2:05:29 PM Andrew Or <an...@databricks.com> wrote:
>
>> @Patrick and Josh: actually, we went even further than that. We simply
>> disable the UI for most tests; the UI used to be the single largest
>> source of port conflicts.
>>
>


Posted by Nicholas Chammas <ni...@gmail.com>.
Ted,

I posted some updates
<https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14236540&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14236540>
on JIRA about my progress (or lack thereof) getting SBT to parallelize test
suites properly. I'm currently stuck with SBT / ScalaTest, so I may move on
to trying Maven.

Andrew,

Once we have a basic grasp of how to parallelize some of the tests, the
next step will probably be to use containers (e.g. Docker) to allow more
parallelization, especially for those tests that, for example, contend for
ports.
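Containers are the approach proposed here. For illustration, here is a lighter-weight technique often used for the same problem. It was not suggested in this thread. The idea is to stop hard-coding ports and let the OS hand out a free one by binding to port 0. Below is a minimal Scala sketch using only java.net.ServerSocket. The FreePort object is hypothetical.

```scala
import java.net.{InetAddress, ServerSocket}

object FreePort {
  /** Ask the OS for a currently unused TCP port on the loopback interface
    * by binding to port 0, then release it so a test server can use it.
    * Note: another process may grab the port between close() and reuse. */
  def find(): Int = {
    val socket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress)
    try socket.getLocalPort
    finally socket.close()
  }
}
```

The race noted in the comment is why this only reduces port contention and does not eliminate it. A test that binds its own server to port 0 and then reads back getLocalPort avoids the race entirely.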

Nick

On Fri Dec 05 2014 at 2:05:29 PM Andrew Or <an...@databricks.com> wrote:

> @Patrick and Josh: actually, we went even further than that. We simply
> disable the UI for most tests; the UI used to be the single largest
> source of port conflicts.
>


Posted by Andrew Or <an...@databricks.com>.
@Patrick and Josh: actually, we went even further than that. We simply
disable the UI for most tests; the UI used to be the single largest
source of port conflicts.


Posted by Ted Yu <yu...@gmail.com>.
Have you seen this thread http://search-hadoop.com/m/JW1q5xxSAa2 ?

Test categorization in HBase is done through the maven-surefire-plugin.

Cheers

On Thu, Dec 4, 2014 at 4:05 PM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:

> fwiw, when we did this work in HBase, we categorized the tests. Then some
> tests can share a single JVM, while others need to be isolated in
> their own JVM. Nevertheless, Surefire can still run them in parallel by
> starting/stopping several JVMs.
>
> I think we need to do this as well. Perhaps the test naming hierarchy can
> be used to group non-parallelizable tests in the same JVM.
>
> For example, here are some Hive tests from our project:
>
> org.apache.spark.sql.hive.StatisticsSuite
> org.apache.spark.sql.hive.execution.HiveQuerySuite
> org.apache.spark.sql.QueryTest
> org.apache.spark.sql.parquet.HiveParquetSuite
>
> If we group tests by the first 5 parts of their name (e.g.
> org.apache.spark.sql.hive), then we’d have the first 2 tests run in the
> same JVM, and the next 2 tests each run in their own JVM.
>
> I’m new to this stuff so I’m not sure if I’m going about this in the right
> way, but you can see my attempt with this approach on GitHub
> <https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L388-L397>,
> as well as the related discussion on JIRA
> <https://issues.apache.org/jira/browse/SPARK-3431>.
>
> If anyone has more feedback on this, I’d love to hear it (either on this
> thread or in the JIRA issue).
>
> Nick
>
> On Sun Sep 07 2014 at 8:28:51 PM Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> On Fri, Aug 8, 2014 at 1:12 PM, Reynold Xin <rx...@databricks.com> wrote:
>>
>>> Nick,
>>>
>>> Would you like to file a ticket to track this?
>>>
>>
>> SPARK-3431 <https://issues.apache.org/jira/browse/SPARK-3431>:
>> Parallelize execution of tests
>> Sub-task: SPARK-3432 <https://issues.apache.org/jira/browse/SPARK-3432>:
>> Fix logging of unit test execution time
>>
>> Nick
>>
>