Posted to dev@lucene.apache.org by Shawn Heisey <ap...@elyograg.org> on 2018/08/14 06:37:55 UTC

Speeding up the really slow Solr tests

If there's anyone I need to target specifically with this email, I think 
it's anyone who has a good working mental map of SolrCloud internals -- 
how everything interacts with ZK and between multiple nodes.  Erick 
deserves special mention because he's down in the test trenches 
frequently, slogging through the scary places.

There are a number of Solr tests, particularly those having to do with 
SolrCloud, that take a long time even on a particularly good test run.  
A bunch of them that take a minute or longer seem to involve ZooKeeper 
in some way, or exercise some other part of SolrCloud.

Here are some examples of the outliers:

    [junit4] Suite: org.apache.solr.cloud.api.collections.ShardSplitTest
    [junit4] Completed [421/829] on J2 in 464.11s, 10 tests

    [junit4] Suite: org.apache.solr.cloud.BasicDistributedZkTest
    [junit4] Completed [446/829] on J0 in 518.82s, 1 test

I'm wondering how much of the time on these long-running cloud tests is 
spent waiting for 15-60 second timeouts rather than actually executing 
test code.  Could we possibly speed some of these tests up just by 
adjusting timeouts to lower values?  My thought is that if a subsystem 
failure is expected as part of a test, why not expedite things so it 
happens in 5 seconds or less, instead of waiting 30 or 60 seconds?  
Maybe just make this change on tests where we actually do expect 
timeouts to be exceeded, not tests where everything is supposed to work 
correctly.
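
As a rough illustration of the idea above, here is a minimal Java sketch in which a test that expects a timeout lowers it before exercising the failure path. It uses only the JDK. The property name `sketch.timeoutMs`, the 30-second default, and the class and method names are assumptions made for illustration; they do not correspond to any existing Solr or ZooKeeper setting.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ExpectedTimeoutSketch {

    /** Long production-style default; the value is an assumption for illustration. */
    static final Duration DEFAULT_TIMEOUT = Duration.ofSeconds(30);

    /** Hypothetical property name; not an existing Solr or Lucene setting. */
    static final String TIMEOUT_PROP = "sketch.timeoutMs";

    /** Reads the timeout from the hypothetical property, falling back to the long default. */
    static Duration configuredTimeout() {
        String raw = System.getProperty(TIMEOUT_PROP);
        return raw == null ? DEFAULT_TIMEOUT : Duration.ofMillis(Long.parseLong(raw));
    }

    /** Returns true if the operation did not complete within the given timeout. */
    static boolean timesOut(CompletableFuture<?> operation, Duration timeout)
            throws InterruptedException, ExecutionException {
        try {
            operation.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // A test that expects the timeout to be exceeded lowers it first.
        System.setProperty(TIMEOUT_PROP, "200");

        // Stand-in for a subsystem that never answers.
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();

        long start = System.nanoTime();
        boolean timedOut = timesOut(neverCompletes, configuredTimeout());
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        System.out.println("timedOut=" + timedOut + " after ~" + elapsedMs + " ms");
    }
}
```

Tests where everything is supposed to work would leave the property unset and keep the long default, so only the expected-failure tests get the shorter wait.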

I know that we won't be able to speed up EVERY test in this way.  The 
timeouts default to such long values because there have been observable 
situations in the wild where short timeouts just aren't enough.  But if 
the idea has merit at all, I think there might be an opportunity to 
substantially speed up an overall test run.

Is this idea completely insane?

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Speeding up the really slow Solr tests

Posted by Erick Erickson <er...@gmail.com>.
bq. So alternatively include them by default but have a flag that will
exclude them for everyday (and Lucene?) use.

+1.

On Tue, Aug 14, 2018 at 5:52 AM, Jan Høydahl <ja...@cominvent.com> wrote:
> > *think* your idea is that "ant test" would run the (probably quicker) unit
> > tests, but "ant precommit" would run a larger set that includes integration.
>
> Not necessarily in "precommit" but perhaps "ant integrationtest" or
> something. The risk is that they fall under the radar and become even more
> rotten than they are already. So alternatively include them by default but
> have a flag that will exclude them for everyday (and Lucene?) use.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 14 Aug 2018, at 14:29, Shawn Heisey <ap...@elyograg.org> wrote:
> >
> > On 8/14/2018 6:13 AM, Jan Høydahl wrote:
> >> It is normally good practice to separate unit tests from integration
> >> tests, but with Solr we run all tests (except nightly and badapple) every
> >> time.
> >> I think it would help everyday development workflow if the normal "ant test"
> >> runs would exclude integration tests (spinning up clusters etc), but rather
> >> require those to be run before actually committing. Before we make such a
> >> change we'd probably need to look at the code coverage without those
> >> integration tests and add more unit tests to cover weak areas with
> >> stubbed/mocked unit tests.
> >
> > If I'm understanding you correctly, it sounds like a good idea.  I *think*
> > your idea is that "ant test" would run the (probably quicker) unit tests,
> > but "ant precommit" would run a larger set that includes integration.
> > Probably going to need a new annotation.
> >
> > I do think that integration tests that are actually expecting a timeout
> > should be expedited when possible.  I can't say how frequently that would be
> > possible -- I'm not very familiar with all the inner workings of SolrCloud,
> > its interaction with ZK, and how the tests work.  I suspect that most
> > integration tests where we are NOT trying to cause failures with timeouts
> > probably will run relatively quickly, though I am sure there are some where
> > this is not the case.
> >
> > Thanks,
> > Shawn



Re: Speeding up the really slow Solr tests

Posted by Jan Høydahl <ja...@cominvent.com>.
> *think* your idea is that "ant test" would run the (probably quicker) unit tests, but "ant precommit" would run a larger set that includes integration.

Not necessarily in "precommit" but perhaps "ant integrationtest" or something. The risk is that they fall under the radar and become even more rotten than they are already. So alternatively include them by default but have a flag that will exclude them for everyday (and Lucene?) use.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 14 Aug 2018, at 14:29, Shawn Heisey <ap...@elyograg.org> wrote:
> 
> On 8/14/2018 6:13 AM, Jan Høydahl wrote:
>> It is normally good practice to separate unit tests from integration tests, but with Solr we run all tests (except nightly and badapple) every time.
>> I think it would help everyday development workflow if the normal "ant test" runs would exclude integration tests (spinning up clusters etc), but rather
>> require those to be run before actually committing. Before we make such a change we'd probably need to look at the code coverage without those
>> integration tests and add more unit tests to cover weak areas with stubbed/mocked unit tests.
> 
> If I'm understanding you correctly, it sounds like a good idea.  I *think* your idea is that "ant test" would run the (probably quicker) unit tests, but "ant precommit" would run a larger set that includes integration.  Probably going to need a new annotation.
> 
> I do think that integration tests that are actually expecting a timeout should be expedited when possible.  I can't say how frequently that would be possible -- I'm not very familiar with all the inner workings of SolrCloud, its interaction with ZK, and how the tests work.  I suspect that most integration tests where we are NOT trying to cause failures with timeouts probably will run relatively quickly, though I am sure there are some where this is not the case.
> 
> Thanks,
> Shawn
> 
> 
> 


Re: Speeding up the really slow Solr tests

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/14/2018 6:13 AM, Jan Høydahl wrote:
> It is normally good practice to separate unit tests from integration 
> tests, but with Solr we run all tests (except nightly and badapple) 
> every time.
> I think it would help everyday development workflow if the normal "ant 
> test" runs would exclude integration tests (spinning up clusters etc), 
> but rather require those to be run before actually committing. Before 
> we make such a change we'd probably need to look at the code coverage 
> without those integration tests and add more unit tests to cover weak 
> areas with stubbed/mocked unit tests.

If I'm understanding you correctly, it sounds like a good idea.  I 
*think* your idea is that "ant test" would run the (probably quicker) 
unit tests, but "ant precommit" would run a larger set that includes 
integration.  Probably going to need a new annotation.
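
To make the annotation idea concrete, here is a minimal Java sketch of a marker annotation plus a selection step driven by a system property. It uses only the JDK. The `@Integration` annotation, the `tests.integration` property, the placeholder test classes, and the choice to include integration tests unless the property is set to `false` are all assumptions for illustration; none of them is an existing Solr or Lucene facility.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class TestGroupSketch {

    /** Hypothetical marker for tests that spin up clusters; not an existing Solr annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Integration {}

    /** Hypothetical property name, e.g. passed to the JVM as -Dtests.integration=false. */
    static final String INTEGRATION_PROP = "tests.integration";

    /** Placeholder for a quick unit test suite. */
    static class FastParserTest {}

    /** Placeholder for a slow suite that would start a cluster. */
    @Integration
    static class SlowClusterTest {}

    /** Integration tests are included unless the property is explicitly set to "false". */
    static boolean integrationEnabled() {
        return Boolean.parseBoolean(System.getProperty(INTEGRATION_PROP, "true"));
    }

    /** Keeps every unmarked class, and marked classes only when the group is enabled. */
    static List<Class<?>> select(List<Class<?>> candidates) {
        boolean enabled = integrationEnabled();
        List<Class<?>> selected = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            if (enabled || !candidate.isAnnotationPresent(Integration.class)) {
                selected.add(candidate);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Class<?>> all = List.of(FastParserTest.class, SlowClusterTest.class);
        System.out.println("default run:  " + select(all));

        // An everyday run could switch the slow group off.
        System.setProperty(INTEGRATION_PROP, "false");
        System.out.println("everyday run: " + select(all));
    }
}
```

Flipping the default in `integrationEnabled()` would give the opposite policy, where the slow group only runs when explicitly requested.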

I do think that integration tests that are actually expecting a timeout 
should be expedited when possible.  I can't say how frequently that 
would be possible -- I'm not very familiar with all the inner workings 
of SolrCloud, its interaction with ZK, and how the tests work.  I 
suspect that most integration tests where we are NOT trying to cause 
failures with timeouts probably will run relatively quickly, though I am 
sure there are some where this is not the case.

Thanks,
Shawn




Re: Speeding up the really slow Solr tests

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

It is normally good practice to separate unit tests from integration tests, but with Solr we run all tests (except nightly and badapple) every time.
I think it would help everyday development workflow if the normal "ant test" runs would exclude integration tests (spinning up clusters etc), but rather
require those to be run before actually committing. Before we make such a change we'd probably need to look at the code coverage without those
integration tests and add more unit tests to cover weak areas with stubbed/mocked unit tests.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 14 Aug 2018, at 08:37, Shawn Heisey <ap...@elyograg.org> wrote:
> 
> If there's anyone I need to target specifically with this email, I think it's anyone who has a good working mental map of SolrCloud internals -- how everything interacts with ZK and between multiple nodes.  Erick deserves special mention because he's down in the test trenches frequently, slogging through the scary places.
> 
> There are a number of Solr tests, particularly those having to do with SolrCloud, that take a long time even on a particularly good test run.  A bunch of them that take a minute or longer seem to involve ZooKeeper in some way, or exercise some other part of SolrCloud.
> 
> Here are some examples of the outliers:
> 
>    [junit4] Suite: org.apache.solr.cloud.api.collections.ShardSplitTest
>    [junit4] Completed [421/829] on J2 in 464.11s, 10 tests
> 
>    [junit4] Suite: org.apache.solr.cloud.BasicDistributedZkTest
>    [junit4] Completed [446/829] on J0 in 518.82s, 1 test
> 
> I'm wondering how much of the time on these long-running cloud tests is spent waiting for 15-60 second timeouts rather than actually executing test code.  Could we possibly speed some of these tests up just by adjusting timeouts to lower values?  My thought is that if a subsystem failure is expected as part of a test, why not expedite things so it happens in 5 seconds or less, instead of waiting 30 or 60 seconds?  Maybe just make this change on tests where we actually do expect timeouts to be exceeded, not tests where everything is supposed to work correctly.
> 
> I know that we won't be able to speed up EVERY test in this way.  The timeouts default to such long values because there have been observable situations in the wild where short timeouts just aren't enough.  But if the idea has merit at all, I think there might be an opportunity to substantially speed up an overall test run.
> 
> Is this idea completely insane?
> 
> Thanks,
> Shawn
> 
> 
>