Posted to dev@spark.apache.org by Marcelo Vanzin <va...@cloudera.com> on 2015/08/25 22:33:41 UTC

Paring down / tagging tests (or some other way to avoid timeouts)?

Hello y'all,

So I've been getting kinda annoyed with how many PR tests have been
timing out. I took one of the logs from one of my PRs and started to
do some crunching on the data from the output, and here's a list of
the 5 slowest suites:

307.14s HiveSparkSubmitSuite
382.641s VersionsSuite
398s CliSuite
410.52s HashJoinCompatibilitySuite
2508.61s HiveCompatibilitySuite

Looking at those, I'm not surprised at all that we see so many
timeouts. Is there any ongoing effort to trim down those tests
(especially HiveCompatibilitySuite) or somehow restrict when they're
run?

Almost 42 minutes to run a single test suite that affects a rather
isolated part of the code base looks a little excessive to me.
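
For reference, the kind of crunching described above can be sketched in
Python. This is only an illustration: it parses the five lines listed in
this message (the "<seconds>s <SuiteName>" layout is how they are written
here, which is an assumption about, not a copy of, the raw test-log
format) and converts the numbers to minutes and shares of the total.

```python
# Timings copied from the list above; the "<seconds>s <SuiteName>" layout is
# the one used in this message, not necessarily the raw test-log format.
TIMINGS = """\
307.14s HiveSparkSubmitSuite
382.641s VersionsSuite
398s CliSuite
410.52s HashJoinCompatibilitySuite
2508.61s HiveCompatibilitySuite
"""


def parse_timings(text):
    """Return a list of (suite_name, seconds) pairs, slowest first."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # First token is the duration (e.g. "398s"), the rest is the suite name.
        duration, suite = line.split(None, 1)
        pairs.append((suite, float(duration.rstrip("s"))))
    return sorted(pairs, key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    suites = parse_timings(TIMINGS)
    total = sum(seconds for _, seconds in suites)
    for suite, seconds in suites:
        print(f"{suite:28s} {seconds / 60:6.1f} min  {100 * seconds / total:5.1f}%")
    print(f"{'total':28s} {total / 60:6.1f} min")
```

On these numbers HiveCompatibilitySuite alone is roughly 42 minutes, a
bit under two thirds of the roughly 67 minutes the five suites take
together.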

-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Paring down / tagging tests (or some other way to avoid timeouts)?

Posted by Marcelo Vanzin <va...@cloudera.com>.
I chatted with Patrick briefly offline. It would be interesting to
know whether the scripts have some way of saying "run a smaller
version of certain tests" (e.g. by setting a system property that the
tests look at to decide what to run). That way, if there are no
changes under sql/, we could still run a small part of
HiveCompatibilitySuite, just not all of it. The reasoning being that
if a core change breaks something in Hive, it will probably break many
tests, not a specific one.
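
A rough sketch of that idea, in Python for brevity rather than in the
language of the actual suites: the harness reads a setting (here an
environment variable with the made-up name TEST_SAMPLE_PERCENT; neither
it nor the helper names below exist in Spark) and keeps a deterministic
subset of the cases. Hashing the case name instead of sampling randomly
means the same cases run on every build, so a failure stays reproducible.

```python
import os
import zlib


def select_cases(case_names, percent):
    """Keep roughly `percent`% of the cases, chosen deterministically by name."""
    if percent >= 100:
        return list(case_names)
    # crc32 is stable across runs and machines, unlike Python's built-in hash().
    return [name for name in case_names
            if zlib.crc32(name.encode("utf-8")) % 100 < percent]


def sample_percent_from_env(environ=os.environ):
    """Read the hypothetical TEST_SAMPLE_PERCENT setting; default is run everything."""
    return int(environ.get("TEST_SAMPLE_PERCENT", "100"))


if __name__ == "__main__":
    cases = [f"query_{i}" for i in range(1000)]  # placeholder case names
    chosen = select_cases(cases, sample_percent_from_env())
    print(f"running {len(chosen)} of {len(cases)} cases")
```

The build script would set the variable to a small value when nothing
under sql/ changed and leave it unset otherwise. A side effect of the
hashing scheme is that the 10% subset is always contained in the 50%
subset, so raising the percentage only ever adds cases.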

-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Paring down / tagging tests (or some other way to avoid timeouts)?

Posted by Michael Armbrust <mi...@databricks.com>.
I'd be okay skipping the HiveCompatibilitySuite for core-only changes.
It does often catch bugs in changes to catalyst or sql, though. The same
goes for HashJoinCompatibilitySuite/VersionsSuite.

HiveSparkSubmitSuite/CliSuite should probably stay, as they do test things
like addJar that have been broken by core in the past.
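
Expressed as a rule, that proposal might look like the sketch below. It
is written in Python purely for illustration; the helper name and the
"core-only" flag are assumptions, and only the suite names come from
this thread.

```python
# Suites that could be skipped when a change touches only core, per the
# proposal above. They would still run for any change to catalyst or sql.
SKIPPABLE_FOR_CORE_ONLY = {
    "HiveCompatibilitySuite",
    "HashJoinCompatibilitySuite",
    "VersionsSuite",
}

# Suites that stay on even for core-only changes, because they exercise
# things like addJar that core changes have broken before.
ALWAYS_RUN = {"HiveSparkSubmitSuite", "CliSuite"}


def suites_to_run(all_suites, core_only_change):
    """Filter the suite list according to the proposed policy (hypothetical helper)."""
    if not core_only_change:
        return list(all_suites)
    return [s for s in all_suites
            if s in ALWAYS_RUN or s not in SKIPPABLE_FOR_CORE_ONLY]


if __name__ == "__main__":
    slowest = [
        "HiveSparkSubmitSuite",
        "VersionsSuite",
        "CliSuite",
        "HashJoinCompatibilitySuite",
        "HiveCompatibilitySuite",
    ]
    print(suites_to_run(slowest, core_only_change=True))
```

Going by the timings at the top of the thread, skipping those three
suites would take roughly 55 minutes off a core-only run.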


Re: Paring down / tagging tests (or some other way to avoid timeouts)?

Posted by Patrick Wendell <pw...@gmail.com>.
There is already code in place that restricts which tests run
depending on which code is modified. However, changes inside of
Spark's core currently require running all dependent tests. If you
have some ideas about how to improve that heuristic, it would be
great.
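
To make that behaviour concrete, here is a toy model of this kind of
heuristic in Python. It is not the actual script: the module names, path
prefixes and dependency edges below are simplified assumptions chosen
for illustration. Changed files are mapped to modules, and every module
that transitively depends on a changed module is tested as well, which
is why a change in core ends up running everything.

```python
# Toy module graph: module -> modules it depends on (illustrative only).
DEPENDS_ON = {
    "core": set(),
    "catalyst": {"core"},
    "sql": {"catalyst"},
    "hive": {"sql"},
    "streaming": {"core"},
}

# Path prefix -> module that owns it (illustrative only).
PATH_PREFIXES = {
    "core/": "core",
    "sql/catalyst/": "catalyst",
    "sql/core/": "sql",
    "sql/hive/": "hive",
    "streaming/": "streaming",
}


def changed_modules(changed_files):
    """Map changed file paths to the modules that own them.

    Paths matching no prefix are ignored in this toy model.
    """
    found = set()
    for path in changed_files:
        for prefix, module in PATH_PREFIXES.items():
            if path.startswith(prefix):
                found.add(module)
    return found


def modules_to_test(changed_files):
    """Changed modules plus everything that transitively depends on them."""
    to_test = changed_modules(changed_files)
    grew = True
    while grew:  # repeat until no new dependent module is picked up
        grew = False
        for module, deps in DEPENDS_ON.items():
            if module not in to_test and deps & to_test:
                to_test.add(module)
                grew = True
    return to_test


if __name__ == "__main__":
    print(sorted(modules_to_test(["core/src/main/scala/Foo.scala"])))
    print(sorted(modules_to_test(["sql/hive/src/main/scala/Bar.scala"])))
```

In this model, improving the heuristic amounts to pruning or weakening
some of the edges that lead out of core, for example by letting a core
change trigger only a subset of the tests in a dependent module.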

- Patrick

