Posted to dev@solr.apache.org by Mike Drob <md...@mdrob.com> on 2022/07/21 19:59:33 UTC

On tests labelled @Slow...

Howdy devs,

I stumbled onto https://issues.apache.org/jira/browse/SOLR-16304 while
trying to upgrade our Lucene dependency and it's motivated me to take a
little bit of a look at our tests. I know that there are dragons here and
I'm under no illusions that I can fix everything, but I feel like a
thorough audit might be useful.

The short of it is that @Slow is going away. We have choices on what to do.
We currently have 112 tests annotated as such.

Let's start with some definitions. How slow does a test have to be to
count as @Slow? Obviously this will vary from machine to machine, but maybe
let's say that anything under 10s on my 2017 iMac Pro is fast and anything
longer is slow? It's arbitrary, and I reserve the right to move this later
if I feel there's a better cutoff.
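
As a rough illustration of the proposed cutoff, the sketch below splits a set of measured test runtimes at 10 seconds. The class name, method name, and sample test names and timings are all made up for illustration; real numbers would have to come from the build's own test reports. It also assumes that a test taking exactly 10s still counts as fast, which the proposal leaves open.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Solr build: flags tests whose
// measured runtime exceeds a configurable threshold.
final class SlowTestAudit {
  // The 10s cutoff proposed in the thread; arbitrary and expected to move.
  static final Duration THRESHOLD = Duration.ofSeconds(10);

  // Returns the names of tests that ran strictly longer than the threshold,
  // in the iteration order of the input map.
  static List<String> slowTests(Map<String, Duration> runtimes, Duration threshold) {
    List<String> slow = new ArrayList<>();
    for (Map.Entry<String, Duration> entry : runtimes.entrySet()) {
      if (entry.getValue().compareTo(threshold) > 0) {
        slow.add(entry.getKey());
      }
    }
    return slow;
  }

  public static void main(String[] args) {
    // Invented sample timings, for illustration only.
    Map<String, Duration> runtimes = new LinkedHashMap<>();
    runtimes.put("TestA", Duration.ofMillis(850));
    runtimes.put("TestB", Duration.ofSeconds(42));
    runtimes.put("TestC", Duration.ofSeconds(10));
    System.out.println(slowTests(runtimes, THRESHOLD));
  }
}
```

Passing the threshold as a parameter keeps the cutoff easy to move later, as the proposal anticipates.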

So maybe some tests get a new lease on life by being unlabelled. Maybe
some other ones get fixed (reducing data size is one idea...).

Some tests are slow because we have distributed systems and
propagation delay and lots of gross sleeps and waits, and I don't want to
touch those. Maybe those become Nightlies.

Are there other approaches? What do folks want to do to move us forward?

Mike

Re: On tests labelled @Slow...

Posted by David Smiley <ds...@apache.org>.

Or should Slow be disabled by default?  One or the other.

In the Lucene issue you linked to
(https://issues.apache.org/jira/browse/LUCENE-10532), Tomoko posted a
comparison table of test runs with and without Slow, and across thread
counts.  If we assume at least 4 workers, what are the results?  I wouldn't
be surprised if disabling Slow made a big difference for Solr, due to the
long tail of slower tests and Gradle's inability to keep all workers busy.
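
To make the long-tail concern concrete, here is a toy model of it. It assumes a simplified greedy scheduler in which each test goes to whichever worker frees up first; this is not a description of how Gradle actually assigns tests, and the class name and timings are invented.

```java
import java.util.PriorityQueue;

// Toy model only: greedy list scheduling of tests onto a fixed number of
// workers. Real build tools may distribute tests differently.
final class WorkerSimulation {
  // Assigns each test, in the order given, to the worker that becomes free
  // first, and returns the total wall-clock time (makespan) in seconds.
  static long makespan(long[] testSeconds, int workers) {
    PriorityQueue<Long> freeAt = new PriorityQueue<>();
    for (int i = 0; i < workers; i++) {
      freeAt.add(0L);
    }
    long end = 0;
    for (long seconds : testSeconds) {
      long start = freeAt.poll();   // earliest moment any worker is idle
      long finish = start + seconds;
      freeAt.add(finish);
      end = Math.max(end, finish);
    }
    return end;
  }

  public static void main(String[] args) {
    // Invented timings: eight quick tests, plus one slow test picked up last.
    long[] withSlow = {5, 5, 5, 5, 5, 5, 5, 5, 120};
    long[] withoutSlow = {5, 5, 5, 5, 5, 5, 5, 5};
    System.out.println("with slow test:    " + makespan(withSlow, 4) + "s");
    System.out.println("without slow test: " + makespan(withoutSlow, 4) + "s");
  }
}
```

In this model the single slow test starts only after the quick ones finish, so three of the four workers sit idle while it runs and it alone sets the wall-clock time.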

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jul 22, 2022 at 9:58 PM Mike Drob <md...@mdrob.com> wrote:

> OK, another correction: tests.slow is enabled by default, so if they're
> already running most of the time then it's pretty "safe" to just axe the
> annotations.

Re: On tests labelled @Slow...

Posted by Mike Drob <md...@mdrob.com>.

OK, another correction: tests.slow is enabled by default, so if they're
already running most of the time then it's pretty "safe" to just axe the
annotations.

On Thu, Jul 21, 2022 at 8:19 PM Mike Drob <md...@mdrob.com> wrote:

> Hmm... correction here: the failing Slow tests also happen to be
> AwaitsFix tests, so they were broken anyway. I wonder why my Gradle command
> decided to include them.

Re: On tests labelled @Slow...

Posted by Mike Drob <md...@mdrob.com>.

Hmm... correction here: the failing Slow tests also happen to be AwaitsFix
tests, so they were broken anyway. I wonder why my Gradle command decided to
include them.

On Thu, Jul 21, 2022 at 8:14 PM Mike Drob <md...@mdrob.com> wrote:

> While I would agree with you in principle, I don't think the Slow tests
> are running anywhere right now. I tried running them locally and
> immediately got three reproducible failures.
>
> Uwe's Jenkins doesn't run the slow tests, and I don't see any jobs on ASF
> Jenkins that seem to do that either.

Re: On tests labelled @Slow...

Posted by Mike Drob <md...@mdrob.com>.

While I would agree with you in principle, I don't think the Slow tests are
running anywhere right now. I tried running them locally and immediately
got three reproducible failures.

Uwe's Jenkins doesn't run the slow tests, and I don't see any jobs on ASF
Jenkins that seem to do that either.

On Thu, Jul 21, 2022 at 3:42 PM David Smiley <ds...@apache.org> wrote:

> Thanks for spearheading this!
>
> Your definition of "slow" seems fine.  We can change it later.  As long as
> the build publishes the tests whose runtime exceeds this threshold, we can
> maintain this easily.
>
> I think keeping @Slow makes sense so that we can identify these tests
> as such and avoid running them at the CLI during normal development, which
> keeps us productive.  Obviously, slow tests need to run _sometimes_, which I
> think should be at least in CI and probably in PR validation too.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley

Re: On tests labelled @Slow...

Posted by David Smiley <ds...@apache.org>.

Thanks for spearheading this!

Your definition of "slow" seems fine.  We can change it later.  As long as
the build publishes the tests whose runtime exceeds this threshold, we can
maintain this easily.

I think keeping @Slow makes sense so that we can identify these tests
as such and avoid running them at the CLI during normal development, which
keeps us productive.  Obviously, slow tests need to run _sometimes_, which I
think should be at least in CI and probably in PR validation too.
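
For illustration only, the idea of tagging slow tests and skipping them on demand could look something like the sketch below. The `SlowTest` annotation, the two example classes, and the filter are hypothetical stand-ins written in plain Java, not the actual Lucene/Solr annotation or test runner. Only the `tests.slow` property name comes from this thread, and the default of "enabled" is an assumption chosen to match what the thread reports.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical sketch of an annotation-based test group; the real test
// framework has its own mechanism, which this does not reproduce.
final class SlowGroupFilter {
  // Stand-in marker annotation (assumed name), readable via reflection.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface SlowTest {}

  // Two placeholder "test classes" used to exercise the filter.
  @SlowTest
  static final class ExampleSlowTest {}

  static final class ExampleFastTest {}

  // A class runs unless it is tagged slow and the slow group is disabled.
  static boolean shouldRun(Class<?> testClass, boolean slowEnabled) {
    return slowEnabled || !testClass.isAnnotationPresent(SlowTest.class);
  }

  // Reads the switch from a system property, treating "unset" as enabled.
  static boolean slowEnabledFromProperty() {
    return Boolean.parseBoolean(System.getProperty("tests.slow", "true"));
  }

  public static void main(String[] args) {
    boolean slowEnabled = slowEnabledFromProperty();
    System.out.println("slow: " + shouldRun(ExampleSlowTest.class, slowEnabled));
    System.out.println("fast: " + shouldRun(ExampleFastTest.class, slowEnabled));
  }
}
```

With a setup like this, a developer could pass `-Dtests.slow=false` to the JVM for a quick local run while CI leaves the property unset.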

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Jul 21, 2022 at 4:00 PM Mike Drob <md...@mdrob.com> wrote:

> Howdy devs,
>
> I stumbled onto https://issues.apache.org/jira/browse/SOLR-16304 while
> trying to upgrade our Lucene dependency and it's motivated me to take a
> little bit of a look at our tests. I know that there are dragons here and
> I'm under no illusions that I can fix everything, but I feel like a
> thorough audit might be useful.
>
> The short of it is that @Slow is going away. We have choices on what to do.
> We currently have 112 tests annotated as such.
>
> Let's start with some definitions. How slow does a test have to be to
> count as @Slow? Obviously this will vary from machine to machine, but maybe
> let's say that anything under 10s on my 2017 iMac Pro is fast and anything
> longer is slow? It's arbitrary, and I reserve the right to move this later
> if I feel there's a better cutoff.
>
> So maybe some tests get a new lease on life by being unlabelled. Maybe
> some other ones get fixed (reducing data size is one idea...).
>
> Some tests are slow because we have distributed systems and
> propagation delay and lots of gross sleeps and waits, and I don't want to
> touch those. Maybe those become Nightlies.
>
> Are there other approaches? What do folks want to do to move us forward?
>
> Mike
>