Posted to dev@spark.apache.org by Michael Allman <mi...@videoamp.com> on 2016/07/08 15:58:11 UTC
Spark performance regression test suite
Hello,
I've seen a few messages on the mailing list regarding Spark performance concerns, especially regressions from previous versions. It got me thinking that perhaps an automated performance regression suite would be a worthwhile contribution? Is anyone working on this? Do we have a Jira issue for it?
I cannot commit to taking charge of such a project. I just thought it would be a great contribution for someone who does have the time and the chops to build it.
Cheers,
Michael
---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
Re: Spark performance regression test suite
Posted by Adam Roberts <AR...@uk.ibm.com>.
Agreed. This is something we do regularly when producing our own Spark
distributions at IBM, so it would be beneficial to share updates with the
wider community. So far Spark 1.6.2 looks like the best out of the box on
spark-perf and HiBench (of course this may vary for real workloads,
individual applications and tuning efforts), but we have more 2.0 tests
still to run. We're not aware of any regressions between previous
versions, except perhaps the one I described in my Spark 2.0.0 post.
I'm looking for testing and feedback from any Spark gurus on my 2.0
changes for spark-perf (have a look at the open issue Holden mentioned:
https://github.com/databricks/spark-perf/issues/108) and the same goes for
HiBench (FWIW we see the same regression on HiBench too:
https://github.com/intel-hadoop/HiBench/issues/221).
One idea is that the benchmarking could be run optionally as part of the
existing contribution process. An ideal solution, IMO, would involve an
additional parameter for the Jenkins job that, when ticked, results in a
performance run being done with and without the change. As we don't have
direct access to the Jenkins build button in the community, contributors
could mark their change with something like @performance or
"jenkins performance test this please".
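As a rough illustration of the comment-trigger idea, here is a minimal Python sketch that checks a pull request comment for one of the two marker phrases suggested above. The function name and the matching rules are assumptions made for illustration only; this is not how the Spark Jenkins jobs are actually configured.

```python
import re

# Hypothetical trigger phrases, taken from the examples in this message.
# They are suggestions from the thread, not existing Jenkins triggers.
TRIGGER_PATTERNS = [
    re.compile(r"@performance\b", re.IGNORECASE),
    re.compile(r"\bjenkins,?\s+performance\s+test\s+this\s+please\b", re.IGNORECASE),
]


def wants_performance_run(comment: str) -> bool:
    """Return True if the comment text asks for a performance run."""
    return any(pattern.search(comment) for pattern in TRIGGER_PATTERNS)
```

A build job could call wants_performance_run on each new comment and, when it returns True, schedule one benchmark run with the change and one without.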
Alternatively the influential Spark folk could notice a change with a
potential performance impact and have it tested accordingly. While
microbenchmarks are useful it will be important to test the whole of
Spark. Then there's also the use of tags in the JIRA - lots for us to work
with if we wanted this.
This probably means adding, and therefore maintaining, dedicated machines
in the build farm, although it would highlight any regressions FAST rather
than later on in the development cycle.
If there is indeed a regression, we may have the fun task of binary
chopping (bisecting) commits between 1.6.2 and now. Again, this is TBC but
a real possibility, so I'm interested to see if anybody else is doing
regression testing and whether they see a similar problem.
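For readers unfamiliar with binary chopping, the following is a minimal Python sketch of the search, the same idea that git bisect automates. It assumes the commits are ordered oldest to newest, that the first one is known good and the last one known bad, and that once the regression appears it stays. The is_regressed callback is hypothetical: in practice it would build Spark at that commit and run a benchmark.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_regressed: Callable[[str], bool]) -> str:
    """Find the first regressed commit by binary search.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_regressed(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]
```

With N commits between the two releases this needs roughly log2(N) benchmark runs, which is why it is practical even across a large range.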
If we don't go down the "benchmark as you contribute" route, having such a
suite would be perfect. It would clone the latest version of each
benchmark, build it for the current version of Spark (the release can be
identified from the pom), run the benchmarks we care about (let's say in
Spark standalone mode with a couple of executors) and produce a geomean
score, highlighting any significant deviations.
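The scoring step could look something like the Python sketch below. It takes per-benchmark run times for a baseline and a candidate build, computes the geometric mean of the candidate/baseline ratios, and flags benchmarks whose ratio moves by more than a threshold. The function names, the dictionary layout and the 10% default threshold are assumptions for illustration; neither spark-perf nor HiBench is claimed to report results this way.

```python
import math
from typing import Dict, List, Tuple


def geomean(values: List[float]) -> float:
    """Geometric mean of strictly positive values."""
    if not values:
        raise ValueError("geomean of an empty list is undefined")
    if any(v <= 0 for v in values):
        raise ValueError("geomean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def compare_runs(baseline: Dict[str, float],
                 candidate: Dict[str, float],
                 threshold: float = 0.10) -> Tuple[float, List[str]]:
    """Compare run times (seconds) keyed by benchmark name.

    Returns the geomean of candidate/baseline ratios (above 1.0 means
    slower overall) and the sorted names whose ratio deviates from 1.0
    by more than the threshold.
    """
    ratios = {name: candidate[name] / baseline[name]
              for name in baseline if name in candidate}
    score = geomean(list(ratios.values()))
    flagged = sorted(name for name, ratio in ratios.items()
                     if abs(ratio - 1.0) > threshold)
    return score, flagged
```

Using ratios rather than raw times keeps a single long-running benchmark from dominating the score, and the flagged list still surfaces individual deviations that the overall geomean can hide.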
I'm happy to help with designing/reviewing this.
Cheers,
From: Michael Gummelt <mg...@mesosphere.io>
To: Eric Liang <ek...@databricks.com>
Cc: Holden Karau <ho...@pigscanfly.ca>, Ted Yu <yu...@gmail.com>,
Michael Allman <mi...@videoamp.com>, dev <de...@spark.apache.org>
Date: 11/07/2016 17:00
Subject: Re: Spark performance regression test suite
I second any effort to update, automate, and communicate the results of
spark-perf (https://github.com/databricks/spark-perf)
On Fri, Jul 8, 2016 at 12:28 PM, Eric Liang <ek...@databricks.com> wrote:
Something like speed.pypy.org or the Chrome performance dashboards would
be very useful.
On Fri, Jul 8, 2016 at 9:50 AM Holden Karau <ho...@pigscanfly.ca> wrote:
There are also the spark-perf and spark-sql-perf projects in the
Databricks github (although I see an open issue for Spark 2.0 support in
one of them).
On Friday, July 8, 2016, Ted Yu <yu...@gmail.com> wrote:
Found a few issues:
[SPARK-6810] Performance benchmarks for SparkR
[SPARK-2833] performance tests for linear regression
[SPARK-15447] Performance test for ALS in Spark 2.0
Haven't found one for Spark core.
On Fri, Jul 8, 2016 at 8:58 AM, Michael Allman <mi...@videoamp.com>
wrote:
Hello,
I've seen a few messages on the mailing list regarding Spark performance
concerns, especially regressions from previous versions. It got me
thinking that perhaps an automated performance regression suite would be a
worthwhile contribution? Is anyone working on this? Do we have a Jira
issue for it?
I cannot commit to taking charge of such a project. I just thought it
would be a great contribution for someone who does have the time and the
chops to build it.
Cheers,
Michael
--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
--
Michael Gummelt
Software Engineer
Mesosphere
Re: Spark performance regression test suite
Posted by Michael Gummelt <mg...@mesosphere.io>.
I second any effort to update, automate, and communicate the results of
spark-perf (https://github.com/databricks/spark-perf)
On Fri, Jul 8, 2016 at 12:28 PM, Eric Liang <ek...@databricks.com> wrote:
> Something like speed.pypy.org
> <http://speed.pypy.org/timeline/#/?exe=3,6,1,5&base=2+472&ben=grid&env=1&revs=200&equid=off> or
> the Chrome performance dashboards <https://chromeperf.appspot.com/> would
> be very useful.
>
> On Fri, Jul 8, 2016 at 9:50 AM Holden Karau <ho...@pigscanfly.ca> wrote:
>
>> There are also the spark-perf and spark-sql-perf projects in the
>> Databricks github (although I see an open issue for Spark 2.0 support in
>> one of them).
>>
>> On Friday, July 8, 2016, Ted Yu <yu...@gmail.com> wrote:
>>
>>> Found a few issues:
>>>
>>> [SPARK-6810] Performance benchmarks for SparkR
>>> [SPARK-2833] performance tests for linear regression
>>> [SPARK-15447] Performance test for ALS in Spark 2.0
>>>
>>> Haven't found one for Spark core.
>>>
>>> On Fri, Jul 8, 2016 at 8:58 AM, Michael Allman <mi...@videoamp.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I've seen a few messages on the mailing list regarding Spark
>>>> performance concerns, especially regressions from previous versions. It got
>>>> me thinking that perhaps an automated performance regression suite would be
>>>> a worthwhile contribution? Is anyone working on this? Do we have a Jira
>>>> issue for it?
>>>>
>>>> I cannot commit to taking charge of such a project. I just thought it
>>>> would be a great contribution for someone who does have the time and the
>>>> chops to build it.
>>>>
>>>> Cheers,
>>>>
>>>> Michael
>>>>
>>>>
>>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>>
--
Michael Gummelt
Software Engineer
Mesosphere
Re: Spark performance regression test suite
Posted by Eric Liang <ek...@databricks.com>.
Something like speed.pypy.org
<http://speed.pypy.org/timeline/#/?exe=3,6,1,5&base=2+472&ben=grid&env=1&revs=200&equid=off>
or
the Chrome performance dashboards <https://chromeperf.appspot.com/> would
be very useful.
On Fri, Jul 8, 2016 at 9:50 AM Holden Karau <ho...@pigscanfly.ca> wrote:
> There are also the spark-perf and spark-sql-perf projects in the
> Databricks github (although I see an open issue for Spark 2.0 support in
> one of them).
>
> On Friday, July 8, 2016, Ted Yu <yu...@gmail.com> wrote:
>
>> Found a few issues:
>>
>> [SPARK-6810] Performance benchmarks for SparkR
>> [SPARK-2833] performance tests for linear regression
>> [SPARK-15447] Performance test for ALS in Spark 2.0
>>
>> Haven't found one for Spark core.
>>
>> On Fri, Jul 8, 2016 at 8:58 AM, Michael Allman <mi...@videoamp.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I've seen a few messages on the mailing list regarding Spark performance
>>> concerns, especially regressions from previous versions. It got me thinking
>>> that perhaps an automated performance regression suite would be a
>>> worthwhile contribution? Is anyone working on this? Do we have a Jira issue
>>> for it?
>>>
>>> I cannot commit to taking charge of such a project. I just thought it
>>> would be a great contribution for someone who does have the time and the
>>> chops to build it.
>>>
>>> Cheers,
>>>
>>> Michael
>>>
>>>
>>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>
>
Re: Spark performance regression test suite
Posted by Holden Karau <ho...@pigscanfly.ca>.
There are also the spark-perf and spark-sql-perf projects in the Databricks
github (although I see an open issue for Spark 2.0 support in one of them).
On Friday, July 8, 2016, Ted Yu <yu...@gmail.com> wrote:
> Found a few issues:
>
> [SPARK-6810] Performance benchmarks for SparkR
> [SPARK-2833] performance tests for linear regression
> [SPARK-15447] Performance test for ALS in Spark 2.0
>
> Haven't found one for Spark core.
>
> On Fri, Jul 8, 2016 at 8:58 AM, Michael Allman <michael@videoamp.com> wrote:
>
>> Hello,
>>
>> I've seen a few messages on the mailing list regarding Spark performance
>> concerns, especially regressions from previous versions. It got me thinking
>> that perhaps an automated performance regression suite would be a
>> worthwhile contribution? Is anyone working on this? Do we have a Jira issue
>> for it?
>>
>> I cannot commit to taking charge of such a project. I just thought it
>> would be a great contribution for someone who does have the time and the
>> chops to build it.
>>
>> Cheers,
>>
>> Michael
>>
>>
>
--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
Re: Spark performance regression test suite
Posted by Ted Yu <yu...@gmail.com>.
Found a few issues:
[SPARK-6810] Performance benchmarks for SparkR
[SPARK-2833] performance tests for linear regression
[SPARK-15447] Performance test for ALS in Spark 2.0
Haven't found one for Spark core.
On Fri, Jul 8, 2016 at 8:58 AM, Michael Allman <mi...@videoamp.com> wrote:
> Hello,
>
> I've seen a few messages on the mailing list regarding Spark performance
> concerns, especially regressions from previous versions. It got me thinking
> that perhaps an automated performance regression suite would be a
> worthwhile contribution? Is anyone working on this? Do we have a Jira issue
> for it?
>
> I cannot commit to taking charge of such a project. I just thought it
> would be a great contribution for someone who does have the time and the
> chops to build it.
>
> Cheers,
>
> Michael
>
>