Posted to dev@lucene.apache.org by Mark Miller <ma...@gmail.com> on 2018/10/30 14:34:16 UTC

Update on the test fixing effort.

We are off to the races ...

With any luck I'll put my first patch up on SOLR-12801 today. It works
towards addressing 3-4 of the sub issues. Most importantly to me, it gives
ant test a chance to pass.

This was about 40 hours of almost continuous work and it is really just the
start. I've addressed a ton of test stuff, but we are not even close to
done. All of these tests need to be beasted thoroughly and defended for a
start. That is the effort I'll be taking on by package.

I'm working on making beasting super simple and adding doc for it. I'm also
working on making precommit do a light beasting to any new or modified
tests. I'm also working on some other things.

ant test was really, really bad. I'm going to need a lot of help to make
all this effort not turn into nothing. Without some real effort and change,
I promise these improvements will melt away faster than they are felt.

- Mark
-- 
- Mark
about.me/markrmiller

Re: Update on the test fixing effort.

Posted by Mark Miller <ma...@gmail.com>.
Okay, here is my current work:
https://github.com/apache/lucene-solr/pull/486

Please check it out and try 'ant clean test' in the solr directory. Report
your fails in SOLR-12932 <https://issues.apache.org/jira/browse/SOLR-12932>.

I'll work on those fails. If you want to help, you could do the same.


Mark

On Tue, Oct 30, 2018 at 2:57 PM Mark Miller <ma...@gmail.com> wrote:

> Here is something useful that someone might want to help with.
>
> I'm working on LUCENE-8541: Fix ant beast to not overwrite junit xml
> results for each beast.iters iteration.
>
> You can see there are a few linked issues around that.
>
> One thing that would be cool that I am not working on:
>
> Most junit xml result reporters don't handle multiple results for the same
> test correctly.
>
> I'm making it really easy to run 'ant beast' for a whole module or based
> on test matching patterns, which then leaves a bunch of folders with result
> xml files in them, one folder per beast round. We really could use a junit
> reporter that scans all those folders and processes all the xml files so
> it's easy to browse which test runs failed and what the fail message was
> for each test that was beasted. Same as the result html files that we
> produce now, but ones that take into account multiple result xml files per
> test#method combination.
>
> - Mark
-- 
- Mark
about.me/markrmiller

Re: Update on the test fixing effort.

Posted by Mark Miller <ma...@gmail.com>.
Here is something useful that someone might want to help with.

I'm working on LUCENE-8541: Fix ant beast to not overwrite junit xml
results for each beast.iters iteration.

You can see there are a few linked issues around that.

One thing that would be cool that I am not working on:

Most junit xml result reporters don't handle multiple results for the same
test correctly.

I'm making it really easy to run 'ant beast' for a whole module or based on
test matching patterns, which then leaves a bunch of folders with result
xml files in them, one folder per beast round. We really could use a junit
reporter that scans all those folders and processes all the xml files so
it's easy to browse which test runs failed and what the fail message was
for each test that was beasted. Same as the result html files that we
produce now, but ones that take into account multiple result xml files per
test#method combination.
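
A rough sketch of what such a reporter could look like, in Python. This is
only an illustration under assumptions, not an existing tool: the function
names collect_results and summarize are made up, the directory layout (one
sub-folder per beast round under a single root) is assumed from the
description above, and the XML handling assumes the usual JUnit-style
testsuite/testcase elements with failure, error and skipped children.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def collect_results(root_dir):
    """Map 'classname#method' -> list of (round folder, status, message).

    Hypothetical helper: assumes root_dir holds one sub-folder per beast
    round, each containing JUnit-style XML result files.
    """
    results = defaultdict(list)
    for xml_file in sorted(Path(root_dir).rglob("*.xml")):
        try:
            tree = ET.parse(xml_file)
        except ET.ParseError:
            continue  # skip truncated or non-XML files
        # First path component below root_dir is taken as the round folder.
        round_name = xml_file.relative_to(root_dir).parts[0]
        for case in tree.getroot().iter("testcase"):
            key = f"{case.get('classname')}#{case.get('name')}"
            status, message = "pass", ""
            for tag in ("failure", "error", "skipped"):
                node = case.find(tag)
                if node is not None:
                    status = tag
                    message = node.get("message") or (node.text or "").strip()
                    break
            results[key].append((round_name, status, message))
    return results


def summarize(results):
    """Return report lines: one per test, plus one per failed run."""
    lines = []
    for key in sorted(results):
        runs = results[key]
        bad = [r for r in runs if r[1] in ("failure", "error")]
        lines.append(f"{key}: {len(bad)}/{len(runs)} runs failed")
        for round_name, status, message in bad:
            lines.append(f"  [{round_name}] {status}: {message}")
    return lines
```

Calling summarize(collect_results(some_dir)) and printing the lines would
give a per test#method view across all rounds. An html version would be the
same aggregation with a different output step.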

- Mark

On Tue, Oct 30, 2018 at 1:33 PM Mark Miller <ma...@gmail.com> wrote:

> I've gotten a few pings on more concrete ideas on helping out. I'll keep
> working on that over time, but I'll share a starting blurb here. I'm about
> to share my first patch. I'll ping this thread when it's up. One thing you
> can do is check that out and try 'ant test'. Report any fails you see
> to SOLR-12932. If you have time, try to fix it. It's hard to do this with
> any confidence without beasting the test.
>
> Also, in no particular order:
>
> tlog replica types rely on waitForInSyncWithLeader - this wait will often
> keep waiting even when the indexes are already in sync, and just exhaust the
> timeout every time. This needs to be fixed for these tests to be re-enabled.
>
> I've done some work to stabilize the auto scaling tests, but more needs to
> be done. They need to be beasted and the @AwaitsFix annotations removed after
> fixing.
>
> @Nightly tests need to be beasted and addressed.
>
> Please report any ant test fails you see on the command line
> here: SOLR-12932
>
> Review tests with @AwaitsFix, try to address them.
>
> Beast tests that you write or change. Improvements and doc coming to help.
>
> Take your time committing. Run ant test more than once. When you see test
> fails, report them. Better yet, look into them. If you are making quick
> commits, you will almost certainly contribute to our test problem unless
> you have some super tight and isolated unit test.
>
> Know how long it takes to run the tests on your machine! Pay attention to
> fluctuation after making non-trivial changes! On my old 6-core machine
> running 8 JVMs at a time, I get this with my patch:
>
> BUILD SUCCESSFUL
> Total time: 22 minutes 37 seconds
>
> When I see it say 40 minutes, I'll know something has gone wrong!
>
> As things progress, I'll bring up more. This is focusing on short term
> firefighting. For that to be meaningful, there will be a lot of prevention
> work to do as well. I've still got some work to do before I have more to
> say on that.
>
> - Mark
-- 
- Mark
about.me/markrmiller

Re: Update on the test fixing effort.

Posted by Mark Miller <ma...@gmail.com>.
I've gotten a few pings on more concrete ideas on helping out. I'll keep
working on that over time, but I'll share a starting blurb here. I'm about
to share my first patch. I'll ping this thread when it's up. One thing you
can do is check that out and try 'ant test'. Report any fails you see
to SOLR-12932. If you have time, try to fix it. It's hard to do this with
any confidence without beasting the test.

Also, in no particular order:

tlog replica types rely on waitForInSyncWithLeader - this wait will often
keep waiting even when the indexes are already in sync, and just exhaust the
timeout every time. This needs to be fixed for these tests to be re-enabled.
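
To illustrate the behaviour that is wanted here (not how
waitForInSyncWithLeader is actually written, which is not shown in this
thread), here is a generic Python sketch of a wait that returns as soon as
its condition holds and only falls back to the timeout when it never does.
The name wait_until and its parameters are hypothetical.

```python
import time


def wait_until(condition, timeout_s, poll_interval_s=0.1):
    """Return True as soon as condition() holds, False once the timeout passes.

    Hypothetical helper for illustration only. The condition is checked
    before any sleeping, so an already-satisfied condition returns at once
    instead of burning the whole timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

The point is the ordering: check first, sleep second, and treat the timeout
as the failure path rather than the normal one.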

I've done some work to stabilize the auto scaling tests, but more needs to
be done. They need to be beasted and the @AwaitsFix annotations removed after
fixing.

@Nightly tests need to be beasted and addressed.

Please report any ant test fails you see on the command line
here: SOLR-12932

Review tests with @AwaitsFix, try to address them.

Beast tests that you write or change. Improvements and doc coming to help.

Take your time committing. Run ant test more than once. When you see test
fails, report them. Better yet, look into them. If you are making quick
commits, you will almost certainly contribute to our test problem unless
you have some super tight and isolated unit test.

Know how long it takes to run the tests on your machine! Pay attention to
fluctuation after making non-trivial changes! On my old 6-core machine
running 8 JVMs at a time, I get this with my patch:

BUILD SUCCESSFUL
Total time: 22 minutes 37 seconds

When I see it say 40 minutes, I'll know something has gone wrong!
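
If you want to automate that check, a small Python sketch along these lines
could work. It is an illustration only: the function names and the 1.5x
tolerance are made up, and the "Total time: N minutes M seconds" format is
assumed from the output shown above.

```python
import re

# Matches e.g. "Total time: 22 minutes 37 seconds" or "Total time: 5 seconds".
_TOTAL_TIME = re.compile(r"Total time:\s*(?:(\d+) minutes? )?(\d+) seconds?")


def total_seconds(ant_output):
    """Extract the reported build time in seconds, or None if not found."""
    m = _TOTAL_TIME.search(ant_output)
    if m is None:
        return None
    minutes = int(m.group(1) or 0)
    return minutes * 60 + int(m.group(2))


def looks_slow(ant_output, baseline_s, tolerance=1.5):
    """True if the run took more than tolerance times the known baseline."""
    elapsed = total_seconds(ant_output)
    return elapsed is not None and elapsed > baseline_s * tolerance
```

With my run above as the baseline (22 minutes 37 seconds, so 1357 seconds),
a 40 minute run would be flagged and a normal run would not.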

As things progress, I'll bring up more. This is focusing on short term
firefighting. For that to be meaningful, there will be a lot of prevention
work to do as well. I've still got some work to do before I have more to
say on that.

- Mark

On Tue, Oct 30, 2018 at 11:03 AM Erick Erickson <er...@gmail.com>
wrote:

> Sounds great, I'm anxious to see it in action.
>
> Let me know if I can help with beasting. I have two machines I can use.
>
> Of course having volunteered just about when things are getting under
> way I'll be gone Friday->Monday. Siiigggghhhh.
>
> Erick
-- 
- Mark
about.me/markrmiller

Re: Update on the test fixing effort.

Posted by Erick Erickson <er...@gmail.com>.
Sounds great, I'm anxious to see it in action.

Let me know if I can help with beasting. I have two machines I can use.

Of course having volunteered just about when things are getting under
way I'll be gone Friday->Monday. Siiigggghhhh.

Erick
On Tue, Oct 30, 2018 at 7:34 AM Mark Miller <ma...@gmail.com> wrote:
>
> We are off to the races ...
>
> With any luck I'll put my first patch up on SOLR-12801 today. It works towards addressing 3-4 of the sub issues. Most importantly to me, it gives ant test a chance to pass.
>
> This was about 40 hours of almost continuous work and it is really just the start. I've addressed a ton of test stuff, but we are not even close to done. All of these tests need to be beasted thoroughly and defended for a start. That is the effort I'll be taking on by package.
>
> I'm working on making beasting super simple and adding doc for it. I'm also working on making precommit do a light beasting to any new or modified tests. I'm also working on some other things.
>
> ant test was really, really bad. I'm going to need a lot of help to make all this effort not turn into nothing. Without some real effort and change, I promise these improvements will melt away faster than they are felt.
>
> - Mark
> --
> - Mark
> about.me/markrmiller
