Posted to dev@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2018/02/21 15:51:17 UTC

Test failures are out of control......

There's an elephant in the room, and it's that failing tests are being
ignored. Mind you, Solr and Lucene are progressing at a furious pace
with lots of great functionality being added. That said, we're
building up a considerable "technical debt" when it comes to testing.

And I should say up front that major new functionality is expected to
take a while to shake out (e.g. autoscaling, streaming, V2 API etc.),
and noise from tests of new functionality is expected while things
bake.

Below is a list of tests that have failed at least once since just
last night. This has been getting worse as time passes (the classic
broken-window problem). Some e-mails have 10 failing tests (+/-), so
unless I go through each and every one I don't know whether something
I've done is causing a problem or not.

I'm as guilty of letting things slide as anyone else; for instance,
there's a long-standing issue with TestLazyCores that I work on
sporadically and that's _probably_ "something in the test framework"....

Several folks have spent some time digging into test failures and
identifying at least some of the causes; kudos to them. It seems
they're voices crying out in the wilderness, though.

There is so much noise at this point that tests are becoming
irrelevant. I'm trying to work on SOLR-10809, for instance, where
there's a pretty good possibility that I'll close at least one thing
that shouldn't be closed. So I ran the full suite 10 times and
gathered all the failures. Now I have to try to separate the failures
caused by that JIRA from the ones that aren't related to it, so I
beast each of the failing tests 100 times against master. If I get a
failure on master too for a particular test, I'll assume it's "not my
problem" and drive on.
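
For anyone unfamiliar with "beasting": the idea is just to run one
test class over and over until some seed trips it. A sketch of the
kind of invocation meant here (exact flags assumed from the ant build
of that era; the test class is only an illustrative pick from the
list below):

ant beast -Dbeast.iters=100 -Dtestcase=TestLazyCores

Run that once against master and once against the patched branch, and
compare the failure counts.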

I freely acknowledge that this is poor practice. It's driven by
frustration and the desire to make progress. While it's poor practice,
it's not as bad as only looking at tests that I _think_ are related or
ignoring all test failures I can't instantly recognize as "my fault".

So what's our stance on this? Mark Miller had a terrific program at
one point that let you categorize test failures at a glance, but it
hasn't been updated in a while.  Steve Rowe is working on the problem
too. Hoss and Cassandra have both added to the efforts as well. And
I'm sure I'm leaving out others.

Then there's the @Ignore and @BadApple annotations....

So, as a community, are we going to devote some energy to this? Or
shall we just start ignoring all of the frequently failing tests?
Frankly, we'd be farther ahead at this point marking failing tests
that aren't getting any work with @Ignore or @BadApple, and getting
compulsive about not letting any _new_ tests fail, than continuing on
our current path. I don't _like_ this option, mind you, but it's
better than letting failures accumulate forever while the tests become
more and more difficult to use. As tests become more difficult to use,
they're used less and the problem gets worse.
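
For reference, a minimal sketch of how those annotations get applied
(class name, method names and JIRA ids are made up for illustration;
the comments about default behaviour are my understanding, so treat
them as assumptions):

import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.LuceneTestCase.AwaitsFix;
import org.apache.lucene.util.LuceneTestCase.BadApple;

public class SomeFlakyTest extends LuceneTestCase {

  // Known-flaky: still runs by default, but a Jenkins job can filter
  // it out with -Dtests.badapples=false.
  @BadApple(bugUrl = "https://issues.apache.org/jira/browse/SOLR-XXXXX")
  public void testOccasionallyFlaky() throws Exception {
    // ... the flaky scenario ...
  }

  // Known-broken: skipped by default, only run with -Dtests.awaitsfix=true.
  @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/SOLR-YYYYY")
  public void testKnownBroken() throws Exception {
    // ... the broken scenario ...
  }
}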

Note: I made no effort to separate suite vs. individual failure reports here.....

Erick

FAILED:  junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions
FAILED:  junit.framework.TestSuite.org.apache.lucene.index.TestIndexWriterDeleteByQuery
FAILED:  junit.framework.TestSuite.org.apache.lucene.store.TestSleepingLockWrapper
FAILED:  junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyFieldFacetCloudTest
FAILED:  junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyFieldFacetExtrasCloudTest
FAILED:  junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyQueryFacetCloudTest
FAILED:  junit.framework.TestSuite.org.apache.solr.client.solrj.TestLBHttpSolrClient
FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.TestSolrCloudWithSecureImpersonation
FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.autoscaling.AutoAddReplicasPlanActionTest
FAILED:  junit.framework.TestSuite.org.apache.solr.core.AlternateDirectoryTest
FAILED:  junit.framework.TestSuite.org.apache.solr.core.TestLazyCores
FAILED:  junit.framework.TestSuite.org.apache.solr.handler.component.DistributedFacetPivotSmallAdvancedTest
FAILED:  junit.framework.TestSuite.org.apache.solr.ltr.TestSelectiveWeightCreation
FAILED:  junit.framework.TestSuite.org.apache.solr.ltr.store.rest.TestModelManager
FAILED:  junit.framework.TestSuite.org.apache.solr.rest.schema.analysis.TestManagedSynonymFilterFactory
FAILED:  junit.framework.TestSuite.org.apache.solr.search.join.BlockJoinFacetDistribTest
FAILED:  junit.framework.TestSuite.org.apache.solr.security.TestAuthorizationFramework
FAILED:  junit.framework.TestSuite.org.apache.solr.update.processor.TestOpenNLPExtractNamedEntitiesUpdateProcessorFactory
FAILED:  org.apache.lucene.index.TestStressNRT.test
FAILED:  org.apache.solr.cloud.AddReplicaTest.test
FAILED:  org.apache.solr.cloud.DeleteShardTest.test
FAILED:  org.apache.solr.cloud.PeerSyncReplicationTest.test
FAILED:  org.apache.solr.cloud.ReplaceNodeNoTargetTest.test
FAILED:  org.apache.solr.cloud.TestUtilizeNode.test
FAILED:  org.apache.solr.cloud.api.collections.CollectionsAPIDistributedZkTest.testCollectionsAPI
FAILED:  org.apache.solr.cloud.api.collections.ShardSplitTest.testSplitAfterFailedSplit
FAILED:  org.apache.solr.cloud.autoscaling.AutoAddReplicasIntegrationTest.testSimple
FAILED:  org.apache.solr.cloud.autoscaling.ComputePlanActionTest.testNodeWithMultipleReplicasLost
FAILED:  org.apache.solr.cloud.autoscaling.HdfsAutoAddReplicasIntegrationTest.testSimple
FAILED:  org.apache.solr.cloud.autoscaling.SystemLogListenerTest.test
FAILED:  org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testEventQueue
FAILED:  org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testMetricTrigger
FAILED:  org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testSearchRate
FAILED:  org.apache.solr.cloud.autoscaling.sim.TestLargeCluster.testSearchRate
FAILED:  org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication
FAILED:  org.apache.solr.handler.admin.AutoscalingHistoryHandlerTest.testHistory
FAILED:  org.apache.solr.rest.schema.analysis.TestManagedSynonymFilterFactory.testCanHandleDecodingAndEncodingForSynonyms

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Erick Erickson <er...@gmail.com>.
That's fine. I'm not totally clear what the "anti-regression" path
forward is. This should make tests less flakey, right? I'd guess that
if we test with badapples=true and don't get failures for a while for
some tests we'll try un-BadAppling tests as time passes.

Erick

P.S. Besides, it's already done ;)

On Sun, Feb 25, 2018 at 12:55 PM, Mikhail Khludnev <mk...@apache.org> wrote:
> TLDR;
> I'm going to push https://issues.apache.org/jira/browse/SOLR-12027 in a day.
> Let me know if you think it's a bad idea.
>
> On Fri, Feb 23, 2018 at 8:06 PM, Erick Erickson <er...@gmail.com>
> wrote:
>>
>> Testing distributed systems requires, well, distributed systems which
>> is what starting clusters is all about. The great leap of faith of
>> individual-method unit testing is that if all the small parts are
>> tested, combining them in various ways will "just work". This is
>> emphatically not true with distributed systems.
>>
>> Which is also one of the reasons some of the tests are long. It takes
>> time (as you pointed out) to set up a cluster. So once a cluster is
>> started, testing a bunch of things amortizes the expense of setting up
>> the cluster. If each test of some bit of distributed functionality set
>> up and tore down a cluster, that would extend the time it takes to run
>> a full test suite by quite a bit. Note this is mostly a problem in
>> Solr, Lucene tests tend to run much faster.
>>
>> What Dawid said about randomness. All the randomization functions are
>> controlled by the "seed", that's what the "reproduce with" line in the
>> results is all about.  That "controlled randomization" has uncovered
>> any number of bugs for obscure things that would have been vastly more
>> painful to discover otherwise. One example I remember went along the
>> lines of "this particular functionality is broken when op systems X
>> thinks it's in the Turkish locale". Which is _also_ why all tests must
>> use the framework random() method provided by LuceneTestCase and never
>> the Java random functions.
>>
>> For that matter, one _other_ problem uncovered by the randomness is
>> that tests in a suite are executed in different order with different
>> seeds, so side effects of one test method that would affect another
>> are flushed out.
>>
>> Mind you, this doesn't help with race conditions that are sensitive
>> to, say, the clock speed of the machine you're running on....
>>
>> All that said, there's plenty of room for improving our tests. I'm
>> sure there are tests that spin up a cluster that don't need to.  All
>> patches welcome of course.
>>
>> Best,
>> Erick
>>
>>
>>
>> On Fri, Feb 23, 2018 at 8:20 AM, Dawid Weiss <da...@gmail.com>
>> wrote:
>> >> Randomness makes it difficult to correlate a failure to the commit that
>> >> made
>> >> the test to fail (as was pointed out earlier in the discussion). If
>> >> each
>> >> execution path is different, it may very well be that a failure you
>> >> experience is introduced several commits ago, so it may not be your
>> >> fault.
>> >
>> > This is true only to a certain degree. If you  don't randomize all you
>> > do is essentially run a fixed scenario. This protects you against a
>> > regression in this particular state, but it doesn't help in
>> > discovering new corner cases or environment quirks, which would be
>> > prohibitive to run as a full Cartesian product of all possibilities.
>> > So there is a tradeoff here and most folks in this project have agreed
>> > to it. If you look at how many problems randomization have helped
>> > discover I think it's a good tradeoff.
>> >
>> > Finally: your scenario can be actually reproduced with ease. Run the
>> > tests with a fixed seed before you apply a patch and after you apply
>> > it... if there is no regression you can assume your patch is fine (but
>> > it doesn't mean it won't fail later on on a different seed, which
>> > nobody will blame you for).
>> >
>> > Dawid
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: dev-help@lucene.apache.org
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Mikhail Khludnev <mk...@apache.org>.
TLDR;
I'm going to push https://issues.apache.org/jira/browse/SOLR-12027 in a
day.
Let me know if you think it's a bad idea.

On Fri, Feb 23, 2018 at 8:06 PM, Erick Erickson <er...@gmail.com>
wrote:

> Testing distributed systems requires, well, distributed systems which
> is what starting clusters is all about. The great leap of faith of
> individual-method unit testing is that if all the small parts are
> tested, combining them in various ways will "just work". This is
> emphatically not true with distributed systems.
>
> Which is also one of the reasons some of the tests are long. It takes
> time (as you pointed out) to set up a cluster. So once a cluster is
> started, testing a bunch of things amortizes the expense of setting up
> the cluster. If each test of some bit of distributed functionality set
> up and tore down a cluster, that would extend the time it takes to run
> a full test suite by quite a bit. Note this is mostly a problem in
> Solr, Lucene tests tend to run much faster.
>
> What Dawid said about randomness. All the randomization functions are
> controlled by the "seed", that's what the "reproduce with" line in the
> results is all about.  That "controlled randomization" has uncovered
> any number of bugs for obscure things that would have been vastly more
> painful to discover otherwise. One example I remember went along the
> lines of "this particular functionality is broken when op systems X
> thinks it's in the Turkish locale". Which is _also_ why all tests must
> use the framework random() method provided by LuceneTestCase and never
> the Java random functions.
>
> For that matter, one _other_ problem uncovered by the randomness is
> that tests in a suite are executed in different order with different
> seeds, so side effects of one test method that would affect another
> are flushed out.
>
> Mind you, this doesn't help with race conditions that are sensitive
> to, say, the clock speed of the machine you're running on....
>
> All that said, there's plenty of room for improving our tests. I'm
> sure there are tests that spin up a cluster that don't need to.  All
> patches welcome of course.
>
> Best,
> Erick
>
>
>
> On Fri, Feb 23, 2018 at 8:20 AM, Dawid Weiss <da...@gmail.com>
> wrote:
> >> Randomness makes it difficult to correlate a failure to the commit that
> made
> >> the test to fail (as was pointed out earlier in the discussion). If each
> >> execution path is different, it may very well be that a failure you
> >> experience is introduced several commits ago, so it may not be your
> fault.
> >
> > This is true only to a certain degree. If you  don't randomize all you
> > do is essentially run a fixed scenario. This protects you against a
> > regression in this particular state, but it doesn't help in
> > discovering new corner cases or environment quirks, which would be
> > prohibitive to run as a full Cartesian product of all possibilities.
> > So there is a tradeoff here and most folks in this project have agreed
> > to it. If you look at how many problems randomization have helped
> > discover I think it's a good tradeoff.
> >
> > Finally: your scenario can be actually reproduced with ease. Run the
> > tests with a fixed seed before you apply a patch and after you apply
> > it... if there is no regression you can assume your patch is fine (but
> > it doesn't mean it won't fail later on on a different seed, which
> > nobody will blame you for).
> >
> > Dawid
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: dev-help@lucene.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>


-- 
Sincerely yours
Mikhail Khludnev

Re: Test failures are out of control......

Posted by Erick Erickson <er...@gmail.com>.
Testing distributed systems requires, well, distributed systems,
which is what starting clusters is all about. The great leap of faith
of individual-method unit testing is that if all the small parts are
tested, combining them in various ways will "just work". This is
emphatically not true of distributed systems.

Which is also one of the reasons some of the tests are long. It takes
time (as you pointed out) to set up a cluster. So once a cluster is
started, testing a bunch of things amortizes the expense of setting it
up. If each test of some bit of distributed functionality set up and
tore down a cluster, that would extend the time it takes to run a full
test suite by quite a bit. Note this is mostly a problem in Solr;
Lucene tests tend to run much faster.

What Dawid said about randomness: all the randomization functions are
controlled by the "seed"; that's what the "reproduce with" line in the
results is all about.  That "controlled randomization" has uncovered
any number of bugs in obscure situations that would have been vastly
more painful to discover otherwise. One example I remember went along
the lines of "this particular functionality is broken when operating
system X thinks it's in the Turkish locale". Which is _also_ why all
tests must use the framework's random() method provided by
LuceneTestCase and never the plain Java random functions.
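
A minimal sketch of what that looks like in practice (the test class
and the values drawn are purely illustrative):

import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.TestUtil;

public class TestControlledRandomnessExample extends LuceneTestCase {
  public void testWithControlledRandomness() {
    // random() comes from LuceneTestCase and is derived from the run's
    // master seed, so a failure reproduces with the same -Dtests.seed=...
    int docCount = TestUtil.nextInt(random(), 1, 100);
    boolean verbose = random().nextBoolean();
    // ... build docCount documents and exercise the code under test;
    // never new java.util.Random() or Math.random().
  }
}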

For that matter, one _other_ problem uncovered by the randomness is
that tests in a suite are executed in different order with different
seeds, so side effects of one test method that would affect another
are flushed out.

Mind you, this doesn't help with race conditions that are sensitive
to, say, the clock speed of the machine you're running on....

All that said, there's plenty of room for improving our tests. I'm
sure there are tests that spin up a cluster that don't need to.  All
patches welcome of course.

Best,
Erick



On Fri, Feb 23, 2018 at 8:20 AM, Dawid Weiss <da...@gmail.com> wrote:
>> Randomness makes it difficult to correlate a failure to the commit that made
>> the test to fail (as was pointed out earlier in the discussion). If each
>> execution path is different, it may very well be that a failure you
>> experience is introduced several commits ago, so it may not be your fault.
>
> This is true only to a certain degree. If you  don't randomize all you
> do is essentially run a fixed scenario. This protects you against a
> regression in this particular state, but it doesn't help in
> discovering new corner cases or environment quirks, which would be
> prohibitive to run as a full Cartesian product of all possibilities.
> So there is a tradeoff here and most folks in this project have agreed
> to it. If you look at how many problems randomization have helped
> discover I think it's a good tradeoff.
>
> Finally: your scenario can be actually reproduced with ease. Run the
> tests with a fixed seed before you apply a patch and after you apply
> it... if there is no regression you can assume your patch is fine (but
> it doesn't mean it won't fail later on on a different seed, which
> nobody will blame you for).
>
> Dawid
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Dawid Weiss <da...@gmail.com>.
> Randomness makes it difficult to correlate a failure to the commit that made
> the test to fail (as was pointed out earlier in the discussion). If each
> execution path is different, it may very well be that a failure you
> experience is introduced several commits ago, so it may not be your fault.

This is true only to a certain degree. If you don't randomize, all
you do is essentially run a fixed scenario. This protects you against
a regression in that particular state, but it doesn't help in
discovering new corner cases or environment quirks, which would be
prohibitively expensive to cover as a full Cartesian product of all
possibilities. So there is a tradeoff here, and most folks in this
project have agreed to it. If you look at how many problems
randomization has helped discover, I think it's a good tradeoff.

Finally, your scenario can actually be reproduced with ease: run the
tests with a fixed seed before you apply a patch and after you apply
it... if there is no regression, you can assume your patch is fine
(though that doesn't mean it won't fail later on a different seed,
which nobody will blame you for).
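
Concretely, something along these lines (the seed value is an
arbitrary example, the test is one from Erick's list, and the exact
property names are assumed from the build of that era):

ant test -Dtestcase=TestStressNRT -Dtests.method=test -Dtests.seed=DEADBEEFCAFE

Run it once before applying the patch and once after, and compare the
two results.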

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Mihály Tóth <mi...@gmail.com>.
Hi,

After working a little with Solr, let me share my view on the test
failures. As I see it, there are certain technical aspects of the
tests themselves that make them gravitate toward being ignored.

*Randomness* makes it difficult to correlate a failure with the commit
that made the test fail (as was pointed out earlier in the
discussion). If each execution path is different, it may very well be
that a failure you experience was introduced several commits ago, so
it may not be your fault. Conversely, even if your tests are green you
cannot be sure you have not broken anything. The relevance of the test
run for your work is decreased. This contradicts the aim of these
tests: ship if and only if the tests are green. So in my opinion
randomized tests alone are not well suited to protecting against
regressions; they are best as an additional layer of
stability/endurance testing.

*Diagnosability* of tests could be improved. For example:

public static boolean waitForLiveAndActiveReplicaCount(ZkStateReader zkStateReader,
    String collection, int replicaCount, int timeoutInMs) {

A call like this only tells you whether the live replica count reached
the target. It does not tell you how many replicas there are, what the
others are doing, etc. You have to dig into the logs for the details.
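
A hedged sketch of the alternative being suggested: fail with the
observed cluster state instead of returning a bare boolean. The helper
name is invented and the Solr cloud API calls are assumed, so treat
this as an illustration rather than a drop-in replacement:

import java.util.concurrent.TimeUnit;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.ZkStateReader;
import static org.junit.Assert.fail;

public final class ClusterAssertions {
  public static void assertLiveAndActiveReplicaCount(ZkStateReader zkStateReader,
      String collection, int expectedReplicas, int timeoutInMs) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutInMs);
    String lastSeen = "(collection not found)";
    while (System.nanoTime() < deadline) {
      ClusterState clusterState = zkStateReader.getClusterState();
      DocCollection coll = clusterState.getCollectionOrNull(collection);
      int liveAndActive = 0;
      if (coll != null) {
        for (Replica replica : coll.getReplicas()) {
          if (replica.getState() == Replica.State.ACTIVE
              && clusterState.getLiveNodes().contains(replica.getNodeName())) {
            liveAndActive++;
          }
        }
        lastSeen = coll.toString(); // includes per-replica states and node names
      }
      if (liveAndActive == expectedReplicas) {
        return;
      }
      Thread.sleep(250);
    }
    fail("Expected " + expectedReplicas + " live+active replicas for '" + collection
        + "' within " + timeoutInMs + "ms, last observed state: " + lastSeen);
  }
}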

assertEquals(0, response3.getStatus());

This tells you that response3 failed, but not what the error was. You
could tell much more, because response3.getErrorMessages() carries the
details; including it in the failure message of the assertion would
make the problem much easier to diagnose.
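
That is, something along these lines (response3 and getErrorMessages()
are taken from the example above; the message text is illustrative):

assertEquals("request failed, errors: " + response3.getErrorMessages(),
    0, response3.getStatus());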

Many tests are *slower* than a traditional unit test, mostly because
they start a cluster of some sort. A unit test that exercises a single
class in the test thread is much faster and much easier to rerun,
debug and fix.

Many tests are *long* and could be broken into multiple test cases.
The individual expected behaviours are woven into a single test case.
For example, SharedFSAutoReplicaFailoverTest verifies that if
autoAddReplicas is true the failed core is recovered elsewhere, that
if it is false it is not recovered, that replicas don't exceed the
maxShardsPerNode limit, etc. If these were separate test cases it
would be easier to tell which behaviour is broken, easier to diagnose,
and easier to extend with additional functionality.
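
A rough sketch of that kind of split (the class and method names are
invented for illustration; the real test's setup and assertions are
elided):

import org.apache.solr.cloud.SolrCloudTestCase;

public class SharedFSAutoReplicaFailoverSplitExample extends SolrCloudTestCase {
  public void testFailedCoreIsRecoveredWhenAutoAddReplicasIsTrue() {
    // ... kill a node, then assert the replica comes back elsewhere
  }
  public void testFailedCoreIsNotRecoveredWhenAutoAddReplicasIsFalse() {
    // ... kill a node, then assert no new replica is created
  }
  public void testReplicasNeverExceedMaxShardsPerNode() {
    // ... assert the per-node replica count stays within the limit
  }
}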

These may sound like small things, but they add up and get multiplied
by the number of test cases, the number of runs and the number of
developers.

Best Regards,

  Misi


> --------------------------------------------
> From: Erick Erickson <er...@gmail.com>
> Date: 2018-02-22 20:39 GMT+01:00
> Subject: Re: Test failures are out of control......
> To: dev@lucene.apache.org
>
>
> See: https://issues.apache.org/jira/browse/SOLR-12016
>
> And there are already several linked issues that have been fixed, I'll
> try removing the annotations in SOLR-12017.
>
> Erick
>
> On Thu, Feb 22, 2018 at 9:55 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> >> : great and very helpful! Does it only contain Solr or are there also
> Lucene
> >> tests reported?
> >>
> >> That was just a dumb choice on my part when creating the fucit.org URL
> >> since the vast majority of the jenkins failures are in Solr tests.
> >>
> >> It's really every test failure reported from any build of any jenkins
> jobs
> >> in one of the feeds it's tracking...
> >>
> >> https://github.com/hossman/jenkins-reports/blob/master/venus.ini#L66
> >>
> >> (BTW: If you could setup a "jenkins view" for only the "Lucene/Solr"
> jobs
> >> on your jenkins server -- similar to what we have on builds.apache and
> >> sarowe's jenkins -- that would help reduce the noise in the report
> if/when
> >> there are any test failures in other non-lucene jenkins jobs you have
> >> running)
> >
> > Here you are: https://jenkins.thetaphi.de/view/Lucene-Solr/
> >
> >> If, for example, you type "org.apache.lucene" in the text box in the
> class
> >> column, it will filter down to only show you test failures from lucene
> >> package tests -- like for example: in the past 24 hours,
> >> oal.search.TestSearcherManager has had suite level failures in 10% of
> the
> >> jenkins builds in which it run...
> >>
> >>    http://fucit.org/solr-jenkins-reports/failure-report.html
> >
> > Wonderful!
> >
> > Uwe
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: dev-help@lucene.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
>

Re: Test failures are out of control......

Posted by Erick Erickson <er...@gmail.com>.
See: https://issues.apache.org/jira/browse/SOLR-12016

And there are already several linked issues that have been fixed, I'll
try removing the annotations in SOLR-12017.

Erick

On Thu, Feb 22, 2018 at 9:55 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> : great and very helpful! Does it only contain Solr or are there also Lucene
>> tests reported?
>>
>> That was just a dumb choice on my part when creating the fucit.org URL
>> since the vast majority of the jenkins failures are in Solr tests.
>>
>> It's really every test failure reported from any build of any jenkins jobs
>> in one of the feeds it's tracking...
>>
>> https://github.com/hossman/jenkins-reports/blob/master/venus.ini#L66
>>
>> (BTW: If you could setup a "jenkins view" for only the "Lucene/Solr" jobs
>> on your jenkins server -- similar to what we have on builds.apache and
>> sarowe's jenkins -- that would help reduce the noise in the report if/when
>> there are any test failures in other non-lucene jenkins jobs you have
>> running)
>
> Here you are: https://jenkins.thetaphi.de/view/Lucene-Solr/
>
>> If, for example, you type "org.apache.lucene" in the text box in the class
>> column, it will filter down to only show you test failures from lucene
>> package tests -- like for example: in the past 24 hours,
>> oal.search.TestSearcherManager has had suite level failures in 10% of the
>> jenkins builds in which it run...
>>
>>    http://fucit.org/solr-jenkins-reports/failure-report.html
>
> Wonderful!
>
> Uwe
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


RE: Test failures are out of control......

Posted by Uwe Schindler <uw...@thetaphi.de>.
> : great and very helpful! Does it only contain Solr or are there also Lucene
> tests reported?
> 
> That was just a dumb choice on my part when creating the fucit.org URL
> since the vast majority of the jenkins failures are in Solr tests.
> 
> It's really every test failure reported from any build of any jenkins jobs
> in one of the feeds it's tracking...
> 
> https://github.com/hossman/jenkins-reports/blob/master/venus.ini#L66
> 
> (BTW: If you could setup a "jenkins view" for only the "Lucene/Solr" jobs
> on your jenkins server -- similar to what we have on builds.apache and
> sarowe's jenkins -- that would help reduce the noise in the report if/when
> there are any test failures in other non-lucene jenkins jobs you have
> running)

Here you are: https://jenkins.thetaphi.de/view/Lucene-Solr/

> If, for example, you type "org.apache.lucene" in the text box in the class
> column, it will filter down to only show you test failures from lucene
> package tests -- like for example: in the past 24 hours,
> oal.search.TestSearcherManager has had suite level failures in 10% of the
> jenkins builds in which it run...
> 
>    http://fucit.org/solr-jenkins-reports/failure-report.html

Wonderful!

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


RE: Test failures are out of control......

Posted by Chris Hostetter <ho...@fucit.org>.
: great and very helpful! Does it only contain Solr or are there also Lucene tests reported?

That was just a dumb choice on my part when creating the fucit.org URL 
since the vast majority of the jenkins failures are in Solr tests.

It's really every test failure reported from any build of any jenkins job 
in one of the feeds it's tracking...

https://github.com/hossman/jenkins-reports/blob/master/venus.ini#L66

(BTW: If you could set up a "jenkins view" for only the "Lucene/Solr" jobs 
on your jenkins server -- similar to what we have on builds.apache and 
sarowe's jenkins -- that would help reduce the noise in the report if/when 
there are any test failures in other non-lucene jenkins jobs you have 
running)


If, for example, you type "org.apache.lucene" in the text box in the class 
column, it will filter down to only show you test failures from lucene 
package tests -- like for example: in the past 24 hours, 
oal.search.TestSearcherManager has had suite-level failures in 10% of the 
jenkins builds in which it ran...

   http://fucit.org/solr-jenkins-reports/failure-report.html



: > -----Original Message-----
: > From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
: > Sent: Thursday, February 22, 2018 1:56 AM
: > To: dev@lucene.apache.org
: > Subject: Re: Test failures are out of control......
: > 
: > 
: > : * Hoss has worked on aggregating all test failures from the 3 Jenkins
: > : systems (ASF, Policeman, and Steve's), downloading the test results & logs,
: > : and running some reports/stats on failures. He should be ready to share
: > : this more publicly soon.
: > 
: > I think Steve's linked to some of this before from jira comments, but it
: > was only recently I realized i've never explicitly said to the list "Hey
: > folks, here's a thing i've been working on" ...
: > 
: >   http://fucit.org/solr-jenkins-reports/
: >   https://github.com/hossman/jenkins-reports/
: > 
: > The most interesting bit is probably here...
: > 
: >   http://fucit.org/solr-jenkins-reports/failure-report.html
: > 
: > ...but there are currently a few caveats:
: > 
: > 1) there's some noise inthe '7days' data because I wasn't accounting for
: > the way jenkins reports some types of failure -- that will gradually clean
: > itself up
: > 
: > 2) I think i've been been blocked by builds.apache.org, so at the moment
: > the data seems to just be from the sarowe & policeman jenkins failures.
: > 
: > 3) allthough the system is archiving the past 7 days worth of jenkins logs
: > for any jobs with failures, there is currently no easy way to download
: > the relevant log(s) from that failure report -- you currently have to
: > download a CSV file like this one to corrolate the test failures to the
: > jenkins job, and then go look for that job in the job-data dirs...
: > 
: >   http://fucit.org/solr-jenkins-reports/reports/7days-method-failures.csv
: >   http://fucit.org/solr-jenkins-reports/job-data/
: > 
: > (My hope is to make #3 trivial from failure-report.html -- so you can say
: > "hey weird, this test has failed X times, let's go download those logs."
: > right from a single screen in your browser)
: > 
: > 
: > 
: > 
: > -Hoss
: > http://www.lucidworks.com/
: > 
: > ---------------------------------------------------------------------
: > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
: > For additional commands, e-mail: dev-help@lucene.apache.org
: 
: 
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
: For additional commands, e-mail: dev-help@lucene.apache.org
: 
: 

-Hoss
http://www.lucidworks.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


RE: Test failures are out of control......

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Hoss,

great and very helpful! Does it only contain Solr or are there also Lucene tests reported?

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
> Sent: Thursday, February 22, 2018 1:56 AM
> To: dev@lucene.apache.org
> Subject: Re: Test failures are out of control......
> 
> 
> : * Hoss has worked on aggregating all test failures from the 3 Jenkins
> : systems (ASF, Policeman, and Steve's), downloading the test results & logs,
> : and running some reports/stats on failures. He should be ready to share
> : this more publicly soon.
> 
> I think Steve's linked to some of this before from jira comments, but it
> was only recently I realized i've never explicitly said to the list "Hey
> folks, here's a thing i've been working on" ...
> 
>   http://fucit.org/solr-jenkins-reports/
>   https://github.com/hossman/jenkins-reports/
> 
> The most interesting bit is probably here...
> 
>   http://fucit.org/solr-jenkins-reports/failure-report.html
> 
> ...but there are currently a few caveats:
> 
> 1) there's some noise inthe '7days' data because I wasn't accounting for
> the way jenkins reports some types of failure -- that will gradually clean
> itself up
> 
> 2) I think i've been been blocked by builds.apache.org, so at the moment
> the data seems to just be from the sarowe & policeman jenkins failures.
> 
> 3) allthough the system is archiving the past 7 days worth of jenkins logs
> for any jobs with failures, there is currently no easy way to download
> the relevant log(s) from that failure report -- you currently have to
> download a CSV file like this one to corrolate the test failures to the
> jenkins job, and then go look for that job in the job-data dirs...
> 
>   http://fucit.org/solr-jenkins-reports/reports/7days-method-failures.csv
>   http://fucit.org/solr-jenkins-reports/job-data/
> 
> (My hope is to make #3 trivial from failure-report.html -- so you can say
> "hey weird, this test has failed X times, let's go download those logs."
> right from a single screen in your browser)
> 
> 
> 
> 
> -Hoss
> http://www.lucidworks.com/
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Shawn Heisey <ap...@elyograg.org>.
On 2/22/2018 9:04 AM, Erick Erickson wrote:
> Thanks all for contributing to the discussion. I'll write up a JIRA
> before Monday and try to summarize the discussion and we can go from
> there. Hmmm, a LUCENE or SOLR JIRA? Does it matter?

That's a head scratcher.  The majority of what makes the build system 
tick is either in Lucene or applies to both, but I think most of the 
noisy tests are in Solr.

In one of the emails for this thread I did see a handful of Lucene tests 
mentioned as failing, but they seem to be a minority.

My bias would be to make it a LUCENE issue, because what Erick is trying 
to accomplish affects the entire codebase.  If it's later determined 
that there is no real concern about Lucene, it can be moved.  But if 
there's strong opposition to this, I won't object to it being in SOLR 
from the start.

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Erick Erickson <er...@gmail.com>.
Thanks all for contributing to the discussion. I'll write up a JIRA
before Monday and try to summarize the discussion and we can go from
there. Hmmm, a LUCENE or SOLR JIRA? Does it matter?

One thing I also realized is that my use-case would be served just
fine with an easy way to identify runs with awaitsfix=true set,
especially if it were in the subject line. That's an easy filter to
create so I can look at whichever one suits me at the moment. I did
check yesterday and there are 20+ tests with the AwaitsFix annotation.
An additional question is which annotation to repurpose; we can
discuss that on the JIRA as well.

Hoss' program won't report on tests that aren't run, so there's
additional value in runs with awaitsfix=true to give it some data to
chew on.


Erick

On Thu, Feb 22, 2018 at 6:41 AM, Yonik Seeley <ys...@gmail.com> wrote:
> On Thu, Feb 22, 2018 at 8:59 AM, Adrien Grand <jp...@gmail.com> wrote:
>> I understand your point Yonik, but the practical consequences are worse
>
> I think that's what we were debating though.  IMO, the overall
> practical consequences of simply sweeping the problem under the rug by
> disabling all flakey tests is actually worse.
> I agree with the issue of noise in automated reports decreasing their
> relevance, but it's an independent (albeit related) issue and there
> are better ways to fix that.
>
> But at this point, I think I'll defer to the people who have made time
> to work on this problem (and adding @Ignore to flakey tests isn't
> actually decreasing the number of flakey tests).
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Yonik Seeley <ys...@gmail.com>.
On Thu, Feb 22, 2018 at 8:59 AM, Adrien Grand <jp...@gmail.com> wrote:
> I understand your point Yonik, but the practical consequences are worse

I think that's what we were debating though.  IMO, the overall
practical consequences of simply sweeping the problem under the rug by
disabling all flakey tests are actually worse.
I agree with the issue of noise in automated reports decreasing their
relevance, but it's an independent (albeit related) issue and there
are better ways to fix that.

But at this point, I think I'll defer to the people who have made time
to work on this problem (and adding @Ignore to flakey tests isn't
actually decreasing the number of flakey tests).

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Adrien Grand <jp...@gmail.com>.
+1 Dawid

I understand your point Yonik, but the practical consequences of
leaving these tests enabled are worse than those of disabling them, as
Erick pointed out in his initial emails.

If we are concerned about forgetting these disabled tests, which is a
concern I share, I think Uwe's idea to add a weekly job that runs with
-Dtests.awaitsfix=true is a good compromise.

Le jeu. 22 févr. 2018 à 07:55, Dawid Weiss <da...@gmail.com> a écrit :

>
> Don't know, Yonik... If I make a change I am interested in regressions
> from the start state; running flaky tests makes it impossible and
> frustrating (and pointless in my opinion). I don't think my expectations
> are that much off from the average - you may wake up being the only person
> who has the defaults enabled, which seems wrong.
>
> Dawid
>
>
>
> On Feb 22, 2018 01:55, "Chris Hostetter" <ho...@fucit.org> wrote:
>
>>
>> : * Hoss has worked on aggregating all test failures from the 3 Jenkins
>> : systems (ASF, Policeman, and Steve's), downloading the test results &
>> logs,
>> : and running some reports/stats on failures. He should be ready to share
>> : this more publicly soon.
>>
>> I think Steve's linked to some of this before from jira comments, but it
>> was only recently I realized i've never explicitly said to the list "Hey
>> folks, here's a thing i've been working on" ...
>>
>>   http://fucit.org/solr-jenkins-reports/
>>   https://github.com/hossman/jenkins-reports/
>>
>> The most interesting bit is probably here...
>>
>>   http://fucit.org/solr-jenkins-reports/failure-report.html
>>
>> ...but there are currently a few caveats:
>>
>> 1) there's some noise inthe '7days' data because I wasn't accounting for
>> the way jenkins reports some types of failure -- that will gradually clean
>> itself up
>>
>> 2) I think i've been been blocked by builds.apache.org, so at the moment
>> the data seems to just be from the sarowe & policeman jenkins failures.
>>
>> 3) allthough the system is archiving the past 7 days worth of jenkins logs
>> for any jobs with failures, there is currently no easy way to download
>> the relevant log(s) from that failure report -- you currently have to
>> download a CSV file like this one to corrolate the test failures to the
>> jenkins job, and then go look for that job in the job-data dirs...
>>
>>   http://fucit.org/solr-jenkins-reports/reports/7days-method-failures.csv
>>   http://fucit.org/solr-jenkins-reports/job-data/
>>
>> (My hope is to make #3 trivial from failure-report.html -- so you can say
>> "hey weird, this test has failed X times, let's go download those logs."
>> right from a single screen in your browser)
>>
>>
>>
>>
>> -Hoss
>> http://www.lucidworks.com/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>

Re: Test failures are out of control......

Posted by Dawid Weiss <da...@gmail.com>.
Don't know, Yonik... If I make a change I am interested in regressions from
the start state; running flaky tests makes it impossible and frustrating
(and pointless in my opinion). I don't think my expectations are that much
off from the average - you may wake up being the only person who has the
defaults enabled, which seems wrong.

Dawid



On Feb 22, 2018 01:55, "Chris Hostetter" <ho...@fucit.org> wrote:

>
> : * Hoss has worked on aggregating all test failures from the 3 Jenkins
> : systems (ASF, Policeman, and Steve's), downloading the test results &
> logs,
> : and running some reports/stats on failures. He should be ready to share
> : this more publicly soon.
>
> I think Steve's linked to some of this before from jira comments, but it
> was only recently I realized i've never explicitly said to the list "Hey
> folks, here's a thing i've been working on" ...
>
>   http://fucit.org/solr-jenkins-reports/
>   https://github.com/hossman/jenkins-reports/
>
> The most interesting bit is probably here...
>
>   http://fucit.org/solr-jenkins-reports/failure-report.html
>
> ...but there are currently a few caveats:
>
> 1) there's some noise inthe '7days' data because I wasn't accounting for
> the way jenkins reports some types of failure -- that will gradually clean
> itself up
>
> 2) I think i've been been blocked by builds.apache.org, so at the moment
> the data seems to just be from the sarowe & policeman jenkins failures.
>
> 3) allthough the system is archiving the past 7 days worth of jenkins logs
> for any jobs with failures, there is currently no easy way to download
> the relevant log(s) from that failure report -- you currently have to
> download a CSV file like this one to corrolate the test failures to the
> jenkins job, and then go look for that job in the job-data dirs...
>
>   http://fucit.org/solr-jenkins-reports/reports/7days-method-failures.csv
>   http://fucit.org/solr-jenkins-reports/job-data/
>
> (My hope is to make #3 trivial from failure-report.html -- so you can say
> "hey weird, this test has failed X times, let's go download those logs."
> right from a single screen in your browser)
>
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Test failures are out of control......

Posted by Dawid Weiss <da...@gmail.com>.
You can also increase the suite timeout for a particular test with
@TimeoutSuite(...) (although making it too long doesn't seem right
either; even nightlies should terminate in a sensible amount of time).
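
A minimal sketch of the annotation in use (the class name and the
chosen limit are illustrative only; TimeUnits is assumed to be the
helper from the Lucene test framework):

import com.carrotsearch.randomizedtesting.annotations.TimeoutSuite;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.TimeUnits;

@TimeoutSuite(millis = 2 * TimeUnits.HOUR) // raise the whole-suite timeout
public class TestGeo3dShapesExample extends LuceneTestCase {
  // ... the long-running randomized tests go here ...
}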

D.

On Tue, Feb 27, 2018 at 11:37 PM, Karl Wright <da...@gmail.com> wrote:
> It succeeds but it takes too long, so I committed a reduction in the number
> of iterations that are attempted for nightly from 200000 to 50000.  That
> should make it pass reliably.
>
> Karl
>
>
> On Tue, Feb 27, 2018 at 3:52 PM, Erick Erickson <er...@gmail.com>
> wrote:
>>
>> Karl:
>>
>> Let us know if you can't get this to fail locally, we can turn it back
>> on on Jenkins as long as it's getting active attention.
>>
>> Erick
>>
>> On Tue, Feb 27, 2018 at 12:40 PM, Karl Wright <da...@gmail.com> wrote:
>> > I looked at the Geo3d failures; these are timeouts exclusively.  I
>> > suspect
>> > we're seeing a test issue.  Will try to correct.
>> >
>> > Karl
>> >
>> >
>> > On Tue, Feb 27, 2018 at 1:51 PM, Chris Hostetter
>> > <ho...@fucit.org>
>> > wrote:
>> >>
>> >> : The most interesting bit is probably here...
>> >> :
>> >> :   http://fucit.org/solr-jenkins-reports/failure-report.html
>> >>
>> >> FYI:
>> >>
>> >> I realized this morning that the "Suite Runs" counts were being
>> >> artificially inflated for suites that are frequently SKIPPED (either
>> >> because they are @Nighly or @Slow and not run by all jenkins jobs).
>> >> I've
>> >> now fixed this, so at a glance the "suite level" failure rates are
>> >> much higher today then they have been if you looked at it in the past.
>> >>
>> >> It means it's also now possible for a Suite failure rate to be greater
>> >> then 100%, because sometimes it's reporting multiple suite level
>> >> failures
>> >> for a single run.
>> >>
>> >>
>> >> Other follow up on some previous comments...
>> >>
>> >> : 1) there's some noise inthe '7days' data because I wasn't accounting
>> >> for
>> >> : the way jenkins reports some types of failure -- that will gradually
>> >> clean
>> >> : itself up
>> >>
>> >> ...some of these "Class: xml" failures are still visible in the report
>> >> but
>> >> should drop off over the next few days
>> >>
>> >> : 2) I think i've been been blocked by builds.apache.org, so at the
>> >> moment
>> >> : the data seems to just be from the sarowe & policeman jenkins
>> >> failures.
>> >>
>> >> ...this is fixed.
>> >>
>> >> : 3) allthough the system is archiving the past 7 days worth of jenkins
>> >> logs
>> >> : for any jobs with failures, there is currently no easy way to
>> >> download
>> >> : the relevant log(s) from that failure report -- you currently have to
>> >>
>> >> ...this is also fixed.  now if you click on any row in the test you'll
>> >> get
>> >> a pop up showing you a link to all the job-data dirs that recorded a
>> >> failure on that report view.  The most recent jobs are listed first.
>> >>
>> >>
>> >>
>> >> -Hoss
>> >> http://www.lucidworks.com/
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: dev-help@lucene.apache.org
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Karl Wright <da...@gmail.com>.
It succeeds but it takes too long, so I committed a reduction in the number
of iterations that are attempted for nightly from 200000 to 50000.  That
should make it pass reliably.
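
For anyone curious what that kind of change looks like, a hedged
sketch (not the actual commit; the class name and numbers are
illustrative) of scaling iteration counts so nightly runs stay inside
the suite timeout:

import org.apache.lucene.util.LuceneTestCase;

public class TestGeo3dIterationsExample extends LuceneTestCase {
  public void testManyRandomShapes() {
    // 50000 iterations only on -Dtests.nightly=true runs, far fewer otherwise
    int iters = TEST_NIGHTLY ? 50000 : 1000;
    for (int i = 0; i < iters; i++) {
      // ... generate a random shape with random() and check its invariants
    }
  }
}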

Karl


On Tue, Feb 27, 2018 at 3:52 PM, Erick Erickson <er...@gmail.com>
wrote:

> Karl:
>
> Let us know if you can't get this to fail locally, we can turn it back
> on on Jenkins as long as it's getting active attention.
>
> Erick
>
> On Tue, Feb 27, 2018 at 12:40 PM, Karl Wright <da...@gmail.com> wrote:
> > I looked at the Geo3d failures; these are timeouts exclusively.  I
> suspect
> > we're seeing a test issue.  Will try to correct.
> >
> > Karl
> >
> >
> > On Tue, Feb 27, 2018 at 1:51 PM, Chris Hostetter <
> hossman_lucene@fucit.org>
> > wrote:
> >>
> >> : The most interesting bit is probably here...
> >> :
> >> :   http://fucit.org/solr-jenkins-reports/failure-report.html
> >>
> >> FYI:
> >>
> >> I realized this morning that the "Suite Runs" counts were being
> >> artificially inflated for suites that are frequently SKIPPED (either
> >> because they are @Nighly or @Slow and not run by all jenkins jobs).
> I've
> >> now fixed this, so at a glance the "suite level" failure rates are
> >> much higher today then they have been if you looked at it in the past.
> >>
> >> It means it's also now possible for a Suite failure rate to be greater
> >> then 100%, because sometimes it's reporting multiple suite level
> failures
> >> for a single run.
> >>
> >>
> >> Other follow up on some previous comments...
> >>
> >> : 1) there's some noise inthe '7days' data because I wasn't accounting
> for
> >> : the way jenkins reports some types of failure -- that will gradually
> >> clean
> >> : itself up
> >>
> >> ...some of these "Class: xml" failures are still visible in the report
> but
> >> should drop off over the next few days
> >>
> >> : 2) I think i've been been blocked by builds.apache.org, so at the
> moment
> >> : the data seems to just be from the sarowe & policeman jenkins
> failures.
> >>
> >> ...this is fixed.
> >>
> >> : 3) allthough the system is archiving the past 7 days worth of jenkins
> >> logs
> >> : for any jobs with failures, there is currently no easy way to download
> >> : the relevant log(s) from that failure report -- you currently have to
> >>
> >> ...this is also fixed.  now if you click on any row in the test you'll
> get
> >> a pop up showing you a link to all the job-data dirs that recorded a
> >> failure on that report view.  The most recent jobs are listed first.
> >>
> >>
> >>
> >> -Hoss
> >> http://www.lucidworks.com/
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: dev-help@lucene.apache.org
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Test failures are out of control......

Posted by Erick Erickson <er...@gmail.com>.
Karl:

Let us know if you can't get this to fail locally; we can turn it back
on on Jenkins as long as it's getting active attention.

Erick

On Tue, Feb 27, 2018 at 12:40 PM, Karl Wright <da...@gmail.com> wrote:
> I looked at the Geo3d failures; these are timeouts exclusively.  I suspect
> we're seeing a test issue.  Will try to correct.
>
> Karl
>
>
> On Tue, Feb 27, 2018 at 1:51 PM, Chris Hostetter <ho...@fucit.org>
> wrote:
>>
>> : The most interesting bit is probably here...
>> :
>> :   http://fucit.org/solr-jenkins-reports/failure-report.html
>>
>> FYI:
>>
>> I realized this morning that the "Suite Runs" counts were being
>> artificially inflated for suites that are frequently SKIPPED (either
>> because they are @Nighly or @Slow and not run by all jenkins jobs).  I've
>> now fixed this, so at a glance the "suite level" failure rates are
>> much higher today then they have been if you looked at it in the past.
>>
>> It means it's also now possible for a Suite failure rate to be greater
>> then 100%, because sometimes it's reporting multiple suite level failures
>> for a single run.
>>
>>
>> Other follow up on some previous comments...
>>
>> : 1) there's some noise inthe '7days' data because I wasn't accounting for
>> : the way jenkins reports some types of failure -- that will gradually
>> clean
>> : itself up
>>
>> ...some of these "Class: xml" failures are still visible in the report but
>> should drop off over the next few days
>>
>> : 2) I think i've been been blocked by builds.apache.org, so at the moment
>> : the data seems to just be from the sarowe & policeman jenkins failures.
>>
>> ...this is fixed.
>>
>> : 3) allthough the system is archiving the past 7 days worth of jenkins
>> logs
>> : for any jobs with failures, there is currently no easy way to download
>> : the relevant log(s) from that failure report -- you currently have to
>>
>> ...this is also fixed.  now if you click on any row in the test you'll get
>> a pop up showing you a link to all the job-data dirs that recorded a
>> failure on that report view.  The most recent jobs are listed first.
>>
>>
>>
>> -Hoss
>> http://www.lucidworks.com/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Karl Wright <da...@gmail.com>.
I looked at the Geo3d failures; these are timeouts exclusively.  I suspect
we're seeing a test issue.  Will try to correct.

Karl


On Tue, Feb 27, 2018 at 1:51 PM, Chris Hostetter <ho...@fucit.org>
wrote:

> : The most interesting bit is probably here...
> :
> :   http://fucit.org/solr-jenkins-reports/failure-report.html
>
> FYI:
>
> I realized this morning that the "Suite Runs" counts were being
> artificially inflated for suites that are frequently SKIPPED (either
> because they are @Nighly or @Slow and not run by all jenkins jobs).  I've
> now fixed this, so at a glance the "suite level" failure rates are
> much higher today then they have been if you looked at it in the past.
>
> It means it's also now possible for a Suite failure rate to be greater
> then 100%, because sometimes it's reporting multiple suite level failures
> for a single run.
>
>
> Other follow up on some previous comments...
>
> : 1) there's some noise inthe '7days' data because I wasn't accounting for
> : the way jenkins reports some types of failure -- that will gradually
> clean
> : itself up
>
> ...some of these "Class: xml" failures are still visible in the report but
> should drop off over the next few days
>
> : 2) I think i've been been blocked by builds.apache.org, so at the moment
> : the data seems to just be from the sarowe & policeman jenkins failures.
>
> ...this is fixed.
>
> : 3) allthough the system is archiving the past 7 days worth of jenkins
> logs
> : for any jobs with failures, there is currently no easy way to download
> : the relevant log(s) from that failure report -- you currently have to
>
> ...this is also fixed.  now if you click on any row in the test you'll get
> a pop up showing you a link to all the job-data dirs that recorded a
> failure on that report view.  The most recent jobs are listed first.
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Test failures are out of control......

Posted by Chris Hostetter <ho...@fucit.org>.
: The most interesting bit is probably here...
: 
:   http://fucit.org/solr-jenkins-reports/failure-report.html

FYI:

I realized this morning that the "Suite Runs" counts were being 
artificially inflated for suites that are frequently SKIPPED (either 
because they are @Nighly or @Slow and not run by all jenkins jobs).  I've 
now fixed this, so at a glance the "suite level" failure rates are 
much higher today then they have been if you looked at it in the past.

It means it's also now possible for a Suite failure rate to be greater 
than 100%, because sometimes it's reporting multiple suite level failures
for a single run.


Other follow up on some previous comments...

: 1) there's some noise in the '7days' data because I wasn't accounting for
: the way jenkins reports some types of failure -- that will gradually clean 
: itself up

...some of these "Class: xml" failures are still visible in the report but 
should drop off over the next few days

: 2) I think i've been blocked by builds.apache.org, so at the moment
: the data seems to just be from the sarowe & policeman jenkins failures.

...this is fixed.

: 3) although the system is archiving the past 7 days worth of jenkins logs
: for any jobs with failures, there is currently no easy way to download 
: the relevant log(s) from that failure report -- you currently have to 

...this is also fixed.  now if you click on any row in the test you'll get 
a pop up showing you a link to all the job-data dirs that recorded a 
failure on that report view.  The most recent jobs are listed first.



-Hoss
http://www.lucidworks.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Chris Hostetter <ho...@fucit.org>.
: * Hoss has worked on aggregating all test failures from the 3 Jenkins
: systems (ASF, Policeman, and Steve's), downloading the test results & logs,
: and running some reports/stats on failures. He should be ready to share
: this more publicly soon.

I think Steve's linked to some of this before from jira comments, but it 
was only recently I realized i've never explicitly said to the list "Hey 
folks, here's a thing i've been working on" ...

  http://fucit.org/solr-jenkins-reports/
  https://github.com/hossman/jenkins-reports/

The most interesting bit is probably here...

  http://fucit.org/solr-jenkins-reports/failure-report.html

...but there are currently a few caveats:

1) there's some noise in the '7days' data because I wasn't accounting for
the way jenkins reports some types of failure -- that will gradually clean 
itself up

2) I think i've been blocked by builds.apache.org, so at the moment
the data seems to just be from the sarowe & policeman jenkins failures.

3) although the system is archiving the past 7 days worth of jenkins logs
for any jobs with failures, there is currently no easy way to download 
the relevant log(s) from that failure report -- you currently have to 
download a CSV file like this one to correlate the test failures to the
jenkins job, and then go look for that job in the job-data dirs...

  http://fucit.org/solr-jenkins-reports/reports/7days-method-failures.csv
  http://fucit.org/solr-jenkins-reports/job-data/

(My hope is to make #3 trivial from failure-report.html -- so you can say 
"hey weird, this test has failed X times, let's go download those logs." 
right from a single screen in your browser)




-Hoss
http://www.lucidworks.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Cassandra Targett <ca...@gmail.com>.
This issue is hugely important.

At Lucidworks we have implemented a "Test Confidence" role that focuses on
improving the ability of all members of the community to trust that
reported failures from any of the Jenkins systems are actual failures and
not flakey tests. This role rotates among the committers on our Solr Team,
and a committer is assigned to the role for 2-week periods of time. Our
goal is to have at least one committer on our team focused full-time on
improving test confidence at all times. (Just a note on timing, we started
this last summer, but we only recently reconfirmed our commitment to having
someone assigned to it at all times.)

One of the guidelines we've agreed to is that the person in the role should
not look (only) at tests he has worked on. Instead, he should focus on
tests that fail less than 100% of the time and/or are hard to reproduce
*even if he didn't write the test or the code*.

Another aspect of the Test Confidence role is to try to develop tools that
can help the community overall in improving this situation. Two things have
grown out of this effort so far:

* Steve Rowe's work on a Jenkins job to reproduce test failures
(LUCENE-8106)
* Hoss has worked on aggregating all test failures from the 3 Jenkins
systems (ASF, Policeman, and Steve's), downloading the test results & logs,
and running some reports/stats on failures. He should be ready to share
this more publicly soon.

I think it's important to understand that flakey tests will *never* go
away. There will always be a new flakey test to review/fix. Our goal should
be to make it so that, most of the time, you can assume a reported failure is
real and only discover the test is flakey as part of digging.

The idea of @BadApple marking (or some other notation) is an OK one, but
the problem is so bad today that I worry it does nothing to ensure
the tests actually get fixed. Lots of JIRAs get filed for problems with tests - I count
about 180 open issues today - and many just sit there forever.

The biggest thing I want to avoid is making it even easier to
avoid/ignore them. We should try to make it easier to highlight them, and
we need a concerted effort to fix the tests once they've been identified as
flakey.

On Wed, Feb 21, 2018 at 5:03 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> Hi,
>
> > Flakey Test Problems:
> > a) Flakey tests create so much noise that people no longer pay
> > attention to the automated reporting via email.
> > b) When running unit tests manually before a commit (i.e. "ant test")
> > a flakey test can fail.
> >
> > Solutions:
> > We could fix (a) by marking as flakey and having a new target
> > "non-flakey" that is run by the jenkins jobs that are currently run
> > continuously.
>
> We have a solution for this already: Mark all those tests with @AwaitsFix
> or @BadApple
> By default those aren't executed in Jenkins runs and also not for
> developers, but devs can enable/disable them using -Dtests.awaitsfix=true
> and -Dtests.badapples=true:
>
>      [help] # Test groups. ------------------------------
> ----------------------
>      [help] #
>      [help] # test groups can be enabled or disabled (true/false). Default
>      [help] # value provided below in [brackets].
>      [help]
>      [help] ant -Dtests.nightly=[false]   - nightly test group (@Nightly)
>      [help] ant -Dtests.weekly=[false]    - weekly tests (@Weekly)
>      [help] ant -Dtests.awaitsfix=[false] - known issue (@AwaitsFix)
>      [help] ant -Dtests.slow=[true]       - slow tests (@Slow)
>
> We can of course also make a weekly jenkins job that enables those tests
> on Jenkins only weekly (like nightly stuff). We have "tests.badapples" and
> "tests.awaitsfix" - I don't know what's the difference between both.
>
> So we have 2 options to classify tests, let's choose one and apply it to
> all Flakey tests!
>
> Uwe
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Test failures are out of control......

Posted by Jason Gerlowski <ge...@gmail.com>.
I don't have strong opinions about what we do with our existing flaky
tests.  I think re-running failures before commit might theoretically
catch more bugs than ignoring the test outright, but with all the
noise and how standard it is to need to rerun tests I'd be surprised
if the numbers are all that different.

Where I see some potential common ground though is in preventing new
flaky tests.  And in the long run, I think what we do to prevent new
flakes is going to be much more important than how we handle the
BadApples we have at this particular instant.  If we can put our
finger in the dam, the existing flakiness becomes much easier to put a
dent in.

I'm curious what you guys had in mind when you mentioned preventing
new flaky tests from popping up.  What are our options for "enforcing"
that?  Were you imagining reopening JIRAs and asking the original
committer to investigate?  Or outright reverting commits that
introduce flaky tests?  Or something in between (like disabling
features with flaky tests prior to releases)?

Best,

Jason

On Wed, Feb 21, 2018 at 6:03 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> Hi,
>
>> Flakey Test Problems:
>> a) Flakey tests create so much noise that people no longer pay
>> attention to the automated reporting via email.
>> b) When running unit tests manually before a commit (i.e. "ant test")
>> a flakey test can fail.
>>
>> Solutions:
>> We could fix (a) by marking as flakey and having a new target
>> "non-flakey" that is run by the jenkins jobs that are currently run
>> continuously.
>
> We have a solution for this already: Mark all those tests with @AwaitsFix or @BadApple
> By default those aren't executed in Jenkins runs and also not for developers, but devs can enable/disable them using -Dtests.awaitsfix=true and -Dtests.badapples=true:
>
>      [help] # Test groups. ----------------------------------------------------
>      [help] #
>      [help] # test groups can be enabled or disabled (true/false). Default
>      [help] # value provided below in [brackets].
>      [help]
>      [help] ant -Dtests.nightly=[false]   - nightly test group (@Nightly)
>      [help] ant -Dtests.weekly=[false]    - weekly tests (@Weekly)
>      [help] ant -Dtests.awaitsfix=[false] - known issue (@AwaitsFix)
>      [help] ant -Dtests.slow=[true]       - slow tests (@Slow)
>
> We can of course also make a weekly jenkins job that enables those tests on Jenkins only weekly (like nightly stuff). We have "tests.badapples" and "tests.awaitsfix" - I don't know what the difference between the two is.
>
> So we have 2 options to classify tests, let's choose one and apply it to all Flakey tests!
>
> Uwe
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


RE: Test failures are out of control......

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

> Flakey Test Problems:
> a) Flakey tests create so much noise that people no longer pay
> attention to the automated reporting via email.
> b) When running unit tests manually before a commit (i.e. "ant test")
> a flakey test can fail.
> 
> Solutions:
> We could fix (a) by marking as flakey and having a new target
> "non-flakey" that is run by the jenkins jobs that are currently run
> continuously.

We have a solution for this already: Mark all those tests with @AwaitsFix or @BadApple
By default those aren't executed in Jenkins runs and also not for developers, but devs can enable/disable them using -Dtests.awaitsfix=true and -Dtests.badapples=true:

     [help] # Test groups. ----------------------------------------------------
     [help] #
     [help] # test groups can be enabled or disabled (true/false). Default
     [help] # value provided below in [brackets].
     [help]
     [help] ant -Dtests.nightly=[false]   - nightly test group (@Nightly)
     [help] ant -Dtests.weekly=[false]    - weekly tests (@Weekly)
     [help] ant -Dtests.awaitsfix=[false] - known issue (@AwaitsFix)
     [help] ant -Dtests.slow=[true]       - slow tests (@Slow)

We can of course also make a weekly jenkins job that enables those tests on Jenkins only weekly (like nightly stuff). We have "tests.badapples" and "tests.awaitsfix" - I don't know what the difference between the two is.

So we have 2 options to classify tests, let's choose one and apply it to all Flakey tests!
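
For example, marking one flakey test method would look roughly like this (just a
sketch -- it assumes the BadApple annotation from LuceneTestCase with its bugUrl
attribute; the class name and JIRA link are placeholders, not real ones):

     import org.apache.lucene.util.LuceneTestCase;
     import org.apache.lucene.util.LuceneTestCase.BadApple;

     public class TestFlakeyExample extends LuceneTestCase {

       // Marked as a known-flakey test; whether it actually runs is controlled
       // by the tests.badapples test group switch shown in the help output above.
       @BadApple(bugUrl = "https://issues.apache.org/jira/browse/SOLR-NNNNN")
       public void testSometimesFails() throws Exception {
         // ... the existing flakey assertions would live here ...
       }
     }

A developer then opts in locally with "ant test -Dtests.badapples=true", exactly
as in the help output above.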

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


RE: Test failures are out of control......

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hey Yonik,

Have you read my e-mail? I just said that there is no need to add another sysprop as it's already there! The default value for the sysprop is just a common-build.xml one-line change.

BTW, as I don't care about Solr tests most of the time, I disabled them completely on my local machine using lucene.build.properties in my user's home directory. Every developer can do the same in his own lucene.build.properties file (e.g. enable/disable bad apples); only the default needs to be decided here.
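
A per-developer override file then boils down to a couple of lines, e.g. (sketch --
the property names are just the test group switches from the ant help output, and
which of them are actually honored from lucene.build.properties should be
double-checked in common-build.xml):

     # ~/lucene.build.properties -- personal defaults, never committed
     tests.badapples=false
     tests.awaitsfix=false
     tests.slow=false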

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Yonik Seeley [mailto:yseeley@gmail.com]
> Sent: Thursday, February 22, 2018 12:17 AM
> To: Solr/Lucene Dev <de...@lucene.apache.org>
> Subject: Re: Test failures are out of control......
> 
> On Wed, Feb 21, 2018 at 6:13 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> >> > There are exactly three
> >> > BadApple annotations in the entire code base at present, is there
> >> > enough value in introducing another annotation to make it worthwhile?
> >>
> >> If we change BadApple tests to be executed by default for "ant test"
> >> (but not for the most frequent jenkins jobs), then that would be fine.
> >> Basically, add a -Dtests.disable-badapples and use that for the
> >> jenkins jobs that email the list all the time.
> >
> > No need for a new sysprop. It's already there, just inverted! Configuring
> Jenkins to enable/disable them is trivial.
> 
> The issue is that flakey tests should not be ignored by developers
> running unit tests before committing new changes.  That's the most
> important point in time for test coverage.
> 
> -Yonik
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Yonik Seeley <ys...@gmail.com>.
On Wed, Feb 21, 2018 at 6:13 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> > There are exactly three
>> > BadApple annotations in the entire code base at present, is there
>> > enough value in introducing another annotation to make it worthwhile?
>>
>> If we change BadApple tests to be executed by default for "ant test"
>> (but not for the most frequent jenkins jobs), then that would be fine.
>> Basically, add a -Dtests.disable-badapples and use that for the
>> jenkins jobs that email the list all the time.
>
> No need for a new sysprop. It's already there, just inverted! Configuring Jenkins to enable/disable them is trivial.

The issue is that flakey tests should not be ignored by developers
running unit tests before committing new changes.  That's the most
important point in time for test coverage.

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


RE: Test failures are out of control......

Posted by Uwe Schindler <uw...@thetaphi.de>.
> > There are exactly three
> > BadApple annotations in the entire code base at present, is there
> > enough value in introducing another annotation to make it worthwhile?
> 
> If we change BadApple tests to be executed by default for "ant test"
> (but not for the most frequent jenkins jobs), then that would be fine.
> Basically, add a -Dtests.disable-badapples and use that for the
> jenkins jobs that email the list all the time.

No need for a new sysprop. It's already there, just inverted! Configuring Jenkins to enable/disable them is trivial.
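
Concretely that is just (sketch -- assuming tests.badapples behaves like the other
test group switches):

     # noisy Jenkins jobs that mail the list: leave the flakey tests off
     ant test -Dtests.badapples=false

     # developer pre-commit run: keep them on
     ant test -Dtests.badapples=true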

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Yonik Seeley <ys...@gmail.com>.
On Wed, Feb 21, 2018 at 5:52 PM, Erick Erickson <er...@gmail.com> wrote:

> There are exactly three
> BadApple annotations in the entire code base at present, is there
> enough value in introducing another annotation to make it worthwhile?

If we change BadApple tests to be executed by default for "ant test"
(but not for the most frequent jenkins jobs), then that would be fine.
Basically, add a -Dtests.disable-badapples and use that for the
jenkins jobs that email the list all the time.

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Erick Erickson <er...@gmail.com>.
Yonik:

Good discussion. I'm not wedded to a particular solution, it's just
the current direction is not sustainable.

I'll back up a bit and see if I can state my goals more clearly, it
looks like we're arguing for much the same thing.

> I want e-mail messages with test failures to be worth looking at. When I see a test fail, I don't want to waste time trying to figure out whether it's something newly introduced or not. I also want some less painful way to say "this change broke tests" rather than "this change may or may not have broken tests. Could somebody beast the old and new versions 100 times and hope that's enough to make a determination?". This looks like your (a).

> When I make a change, I want to be able to quickly determine whether my changes are likely the cause of test failures or not. This looks like your (b). If we annotate all flakey tests, that would be a significant help since it would be easy to glance at the test to see if it's a known flakey test or not. Armed with that knowledge I can be more comfortable with having it succeed a few times and chalking it up to flakey tests.

> I want to stop the downward trend we've been experiencing lately with more and more tests failing.

An annotation makes that possible I think, although I'm not clear on
why @Flakey is superior to @BadApple. There are exactly three
BadApple annotations in the entire code base at present, is there
enough value in introducing another annotation to make it worthwhile?
Or could we just figure out whether any of those three tests that use
@BadApple should be changed to, say, @Ignore and then use @BadApple
for the rest? Perhaps we change the build system to enable BadApple by
default when running locally (or, conversely, enabling BadApple on
Jenkins).

Alternatively would it be possible to turn off e-mail notifications of
failures for @Flakey (or @BadApple, whatever) tests? That would do
too. That probably has the added advantage of allowing some reporting
tools to continue to function.

bq: And we can *always* decide to prevent new flakey tests, regardless
of what we do about the existing flakey tests...

We haven't been doing this though, flakey tests have been
proliferating. Mark's tool hasn't been run since last August unless
there's a newer URL than I'm looking at:
http://solr-tests.bitballoon.com/. I'm not so interested in what we
_could_ do as what we _are_ doing. And even tools such as this require
someone to monitor/complain/whimper. And I don't see volunteers
stepping forward. It's much easier to have a system where any failure
is unusual than count on people to wade through voluminous output.

bq: Just because we are frustrated doesn't mean that *any* change is positive.

Of course not. But nobody else seems to be bringing the topic up so I
thought I would.

On Wed, Feb 21, 2018 at 1:49 PM, Yonik Seeley <ys...@gmail.com> wrote:
> On Wed, Feb 21, 2018 at 3:26 PM, Erick Erickson <er...@gmail.com> wrote:
>> Yonik:
>>
>> What I'm frustrated by now is that variations on these themes haven't
>> cured the problem, and it's spun out of control and is getting worse.
>
> I understand, but what problem(s) are you trying to solve?  Just
> because we are frustrated doesn't mean that *any* change is positive.
> Some changes can have a definite negative effect on software quality.
>
> You didn't respond to the main thrust of my message, so let me try to
> explain it again more succinctly:
>
> Flakey Test Problems:
> a) Flakey tests create so much noise that people no longer pay
> attention to the automated reporting via email.
> b) When running unit tests manually before a commit (i.e. "ant test")
> a flakey test can fail.
>
> Solutions:
> We could fix (a) by marking as flakey and having a new target
> "non-flakey" that is run by the jenkins jobs that are currently run
> continuously.
> For (b) "ant test" should still include the flakey tests since it's
> better to have to re-run a seemingly unrelated test to determine if
> one broke something rather than increase committed bugs due to loss of
> test coverage.  It's a pain, but perhaps it should be.  It's a real
> problem that needs fixing and @Ignoring it won't work as a better
> mechanism to get it fixed.  Sweeping it under the rug would seem to
> ensure that it gets less attention.
>
> And we can *always* decide to prevent new flakey tests, regardless of
> what we do about the existing flakey tests.  Mark's tool is a good way
> to see what the current list of flakey tests is.
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Yonik Seeley <ys...@gmail.com>.
On Wed, Feb 21, 2018 at 3:26 PM, Erick Erickson <er...@gmail.com> wrote:
> Yonik:
>
> What I'm frustrated by now is that variations on these themes haven't
> cured the problem, and it's spun out of control and is getting worse.

I understand, but what problem(s) are you trying to solve?  Just
because we are frustrated doesn't mean that *any* change is positive.
Some changes can have a definite negative effect on software quality.

You didn't respond to the main thrust of my message, so let me try to
explain it again more succinctly:

Flakey Test Problems:
a) Flakey tests create so much noise that people no longer pay
attention to the automated reporting via email.
b) When running unit tests manually before a commit (i.e. "ant test")
a flakey test can fail.

Solutions:
We could fix (a) by marking as flakey and having a new target
"non-flakey" that is run by the jenkins jobs that are currently run
continuously.
For (b) "ant test" should still include the flakey tests since it's
better to have to re-run a seemingly unrelated test to determine if
one broke something rather than increase committed bugs due to loss of
test coverage.  It's a pain, but perhaps it should be.  It's a real
problem that needs fixing and @Ignoring it won't work as a better
mechanism to get it fixed.  Sweeping it under the rug would seem to
ensure that it gets less attention.

And we can *always* decide to prevent new flakey tests, regardless of
what we do about the existing flakey tests.  Mark's tool is a good way
to see what the current list of flakey tests is.

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Erick Erickson <er...@gmail.com>.
Yonik:

What I'm frustrated by now is that variations on these themes haven't
cured the problem, and it's spun out of control and is getting worse.
It's the "getting worse" part that is most disturbing. Continuing as
we have in the past isn't working, it's time to try something else.

There are 17 open JIRAs for tests right now. Some as far back as 2010,
listed below. Since last night I've collected 49 distinct failures
from the dev e-mails (haven't triaged them completely to see if some
are contained in others, but "sort < all_of_them | uniq > my_file"
results in a file 49 lines long).

What I'm after here is a way to keep from backsliding further, and a
path to getting better. That's what's behind the straw-man proposal
that we get the tests to be clean, even if that means disabling the
ones that are flakey. I should have emphasized more strongly that the
corollary to disabling flakey tests is that we need to get aggressive
about not tolerating _new_ flaky tests. I'm volunteering (I guess) to
be the enforcer here, as much as public comments can be construed to
be "enforcing" ;)

If someone has a favorite test that they think adds value even if it
fails fairly frequently, we can un-BadApple it provided someone is
actively working on it. Or it can be un-BadAppled locally and/or
temporarily. I'm perfectly fine with flakey tests being run and
reported _if_ that helps resolve it.

Also I'm volunteering to produce a "weekly BadApple" list so people
can work on them as they see fit expressly to keep them from getting
lost.

(1) I have no idea how to do this, or even if it's possible. What do
you have in mind?

(2) doesn't seem to be working based on the open JIRAs below and the
number of failing tests that are accumulating.

(3a) Well, the first part is what my "enforcing" comment is about
above I think ;)

(3b) I'd argue that the part about "without losing test coverage" is
partly addressed by the notion of running periodically with BadApple
disabled. Which each individual can also do at their discretion if
they care to. More importantly though, the test coverage isn't very
useful when failures are ignored anyway.

(4) I thoroughly applaud that one long term, but I'll settle in the
short term for doing something to keep even _more_ from accumulating.

I should also emphasize that disabling tests is certainly _NOT_ my
preference, fixing them all is. I doubt that declaring a moratorium on
all commits until all the tests were fixed is practical though ;) And
without something changing in our approach, I don't see much progress
being made.

SOLR-10053
SOLR-10070
SOLR-10071
SOLR-10139
SOLR-10287
SOLR-10815
SOLR-11911
SOLR-2175
SOLR-4147
SOLR-5880
SOLR-6423
SOLR-6944
SOLR-6961
SOLR-6974
SOLR-8122
SOLR-8182
SOLR-9869


On Wed, Feb 21, 2018 at 11:12 AM, Yonik Seeley <ys...@gmail.com> wrote:
> We should be careful not to conflate running of unit tests with
> automated reporting, and the differing roles that flakey tests play in
> different scenarios.
> For example, I no longer pay attention to automated failure reports,
> esp if I haven't committed anything recently.
> However, when I'm making code changes and do "ant test", I certainly
> pay attention to failures and re-run any failing tests.  It sucks to
> have to re-run a test just because it's flakey, but it's better than
> accidentally committing a bug because test coverage was reduced.
>
> I'd suggest:
> 1) fix/tweak automated test reporting to increase relevancy for developers
> 2) open a JIRA for each flakey test and evaluate impact of removal on
> test coverage
> 3) If a new feature is added, and the test turns out to be flakey,
> then the feature itself should be disabled before release.  This both
> prevents new flakey tests without resulting in loss of test
> coverage, and motivates those who care about the feature to fix
> the tests.
> 4) fix flakey tests ;-)
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Yonik Seeley <ys...@gmail.com>.
We should be careful not to conflate running of unit tests with
automated reporting, and the differing roles that flakey tests play in
different scenarios.
For example, I no longer pay attention to automated failure reports,
esp if I haven't committed anything recently.
However, when I'm making code changes and do "ant test", I certainly
pay attention to failures and re-run any failing tests.  It sucks to
have to re-run a test just because it's flakey, but it's better than
accidentally committing a bug because test coverage was reduced.

I'd suggest:
1) fix/tweak automated test reporting to increase relevancy for developers
2) open a JIRA for each flakey test and evaluate impact of removal on
test coverage
3) If a new feature is added, and the test turns out to be flakey,
then the feature itself should be disabled before release.  This both
prevents new flakey tests without resulting in loss of test
coverage, and motivates those who care about the feature to fix
the tests.
4) fix flakey tests ;-)

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Erick Erickson <er...@gmail.com>.
Dawid:
Yep, definitely a recurring theme. But this time I may actually, you
know, do something about it ;)

Mark is one of the advocates of this theme, perhaps he got exhausted
trying to push that stone up the hill ;). Maybe it's my turn to pick
up the baton.... Comments about there being value to seeing these are
well taken, but outweighed IMO by the harm in there being so much
noise that failures that _should_ get attention are so easy to
overlook.

bq: The noise in Solr tests has increased to a degree that I stopped looking.

Exactly. To one degree or another I think this has happened to a _lot_
of people, myself certainly included.

And you've certainly done more than your share of fixing things in the
infrastructure, many thanks!

----------------

I'm not sure blanket @BadApple-ing these is The Right Thing To Do for
_all_ of them though as I know lots of active work is being done in
some areas. I'd hate for someone to be working in some area and
currently trying to fix something and have the failures disappear and
think they were fixed when in reality they just weren't run.

Straw-man proposal:

> I'll volunteer to gather failing tests through the next few days from the dev e-mails. I'll create yet another umbrella JIRA that proposes to @BadApple _all_ of them unless someone steps up and volunteers to actively work on a particular test failure. Since I brought it up I'll get aggressive about @BadApple-ing failing tests in future. I'll link the current JIRAs for failing tests in as well (on a cursory glance there are 16 open ones)...

> If someone objects to @BadApple-ing a particular test, they should create a JIRA, assign it to themselves and actively work on it. Short shrift given to "I don't think we should @BadApple that test because someday someone might want to try to fix it".... In this proposal, it's perfectly acceptable to remove the @BadApple notation and push it, as long as it's being actively worked on.

> Would someone who knows the test infrastructure better than me be willing to volunteer to set up a run periodically with BadApple annotations disabled. Perhaps weekly? Even nightly? That way interested parties can see these but the rest of us would only have _one_ e-mail to ignore, not 10-20 a day. It'd be great if the subject mentioned something allowing the WithBadApple runs to be identified just by glancing at the subject..... Then errors that didn't have BadApple annotations would stand out from the noise since they would be in _other_ emails.

> It's easy enough to find all the BadApple-labeled tests, I'll also volunteer to post a weekly list.
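
Finding them is a one-liner, something like (sketch):

     grep -rl --include="*.java" "@BadApple" lucene solr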

Getting e-mails for flakey tests is acceptable IMO only if people are
working on them. I've certainly been in situations where I can't get
something to fail locally and have to rely on Jenkins etc to gather
logging info or see if my fixes really work. I _do_ care that we are
accumulating more and more failures and it's getting harder and harder
to know when failures are a function of new code or not.

WDYT?
Erick

On Wed, Feb 21, 2018 at 8:36 AM, Tommaso Teofili
<to...@gmail.com> wrote:
> +1, agree with Adrien, thanks for bringing this up Erick!
>
>
>
> Il giorno mer 21 feb 2018 alle ore 17:15 Adrien Grand <jp...@gmail.com> ha
> scritto:
>>
>> Thanks for bringing this up Erick. I agree with you we should silence
>> those frequent failures. Like you said, the side-effects of not silencing
>> them are even worse. I'll add that these flaky tests also make releasing
>> harder, it took me three runs last time (Lucene/Solr 7.2) for the release
>> build to succeed because of failed tests.
>>
>> Le mer. 21 févr. 2018 à 16:52, Erick Erickson <er...@gmail.com> a
>> écrit :
>>>
>>> There's an elephant in the room, and it's that failing tests are being
>>> ignored. Mind you, Solr and Lucene are progressing at a furious pace
>>> with lots of great functionality being added. That said, we're
>>> building up a considerable "technical debt" when it comes to testing.
>>>
>>> And I should say up front that major new functionality is expected to
>>> take a while to shake out (e.g. autoscaling, streaming, V2 API etc.),
>>> and noise from tests of new functionality is expected while things
>>> bake.
>>>
>>> Below is a list of tests that have failed at least once since just
>>> last night. This has been getting worse as time passes, the broken
>>> window problem. Some e-mails have 10 failing tests (+/-) so unless I
>>> go through each and every one I don't know whether something I've done
>>> is causing a problem or not.
>>>
>>> I'm as guilty of letting things slide as anyone else, there's been a
>>> long-standing issue with TestLazyCores that I work on sporadically for
>>> instance that's _probably_ "something in the test framework"....
>>>
>>> Several folks have spent some time digging into test failures and
>>> identifying at least some of the causes, kudos to them. It seems
>>> they're voices crying out in the wilderness though.
>>>
>>> There is so much noise at this point that tests are becoming
>>> irrelevant. I'm trying to work on SOLR-10809 for instance, where
>>> there's a pretty good possibility that I'll close at least one thing
>>> that shouldn't be closed. So I ran the full suite 10 times and
>>> gathered all the failures. Now I have to try to separate the failures
>>> caused by that JIRA from the ones that aren't related to it so I beast
>>> each of the failing tests 100 times against master. If I get a failure
>>> on master too for a particular test, I'll assume it's "not my problem"
>>> and drive on.
>>>
>>> I freely acknowledge that this is poor practice. It's driven by
>>> frustration and the desire to make progress. While it's poor practice,
>>> it's not as bad as only looking at tests that I _think_ are related or
>>> ignoring all tests failures I can't instantly recognize as "my fault".
>>>
>>> So what's our stance on this? Mark Miller had a terrific program at
>>> one point allowing categorization of tests that failed at a glance,
>>> but it hasn't been updated in a while.  Steve Rowe is working on the
>>> problem too. Hoss and Cassandra have both added to the efforts as
>>> well. And I'm sure I'm leaving out others.
>>>
>>> Then there's the @Ignore and @BadApple annotations....
>>>
>>> So, as a community, are we going to devote some energy to this? Or
>>> shall we just start ignoring all of the frequently failing tests?
>>> Frankly we'd be farther ahead at this point marking failing tests that
>>> aren't getting any work with @Ignore or @BadApple and getting
>>> compulsive about not letting any _new_ tests fail than continuing our
>>> current path. I don't _like_ this option mind you, but it's better
>>> than letting these accumulate forever and tests become more and more
>>> difficult to use. As tests become more difficult to use, they're used
>>> less and the problem gets worse.
>>>
>>> Note, I made no effort to separate suite .vs. individual reports
>>> here.....
>>>
>>> Erick
>>>
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.lucene.index.TestIndexWriterDeleteByQuery
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.lucene.store.TestSleepingLockWrapper
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyFieldFacetCloudTest
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyFieldFacetExtrasCloudTest
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyQueryFacetCloudTest
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.client.solrj.TestLBHttpSolrClient
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.cloud.TestSolrCloudWithSecureImpersonation
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.cloud.autoscaling.AutoAddReplicasPlanActionTest
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.core.AlternateDirectoryTest
>>> FAILED:  junit.framework.TestSuite.org.apache.solr.core.TestLazyCores
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.handler.component.DistributedFacetPivotSmallAdvancedTest
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.ltr.TestSelectiveWeightCreation
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.ltr.store.rest.TestModelManager
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.rest.schema.analysis.TestManagedSynonymFilterFactory
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.search.join.BlockJoinFacetDistribTest
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.security.TestAuthorizationFramework
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.update.processor.TestOpenNLPExtractNamedEntitiesUpdateProcessorFactory
>>> FAILED:  org.apache.lucene.index.TestStressNRT.test
>>> FAILED:  org.apache.solr.cloud.AddReplicaTest.test
>>> FAILED:  org.apache.solr.cloud.DeleteShardTest.test
>>> FAILED:  org.apache.solr.cloud.PeerSyncReplicationTest.test
>>> FAILED:  org.apache.solr.cloud.ReplaceNodeNoTargetTest.test
>>> FAILED:  org.apache.solr.cloud.TestUtilizeNode.test
>>> FAILED:
>>> org.apache.solr.cloud.api.collections.CollectionsAPIDistributedZkTest.testCollectionsAPI
>>> FAILED:
>>> org.apache.solr.cloud.api.collections.ShardSplitTest.testSplitAfterFailedSplit
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.AutoAddReplicasIntegrationTest.testSimple
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.ComputePlanActionTest.testNodeWithMultipleReplicasLost
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.HdfsAutoAddReplicasIntegrationTest.testSimple
>>> FAILED:  org.apache.solr.cloud.autoscaling.SystemLogListenerTest.test
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testEventQueue
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testMetricTrigger
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testSearchRate
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.sim.TestLargeCluster.testSearchRate
>>> FAILED:
>>> org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication
>>> FAILED:
>>> org.apache.solr.handler.admin.AutoscalingHistoryHandlerTest.testHistory
>>> FAILED:
>>> org.apache.solr.rest.schema.analysis.TestManagedSynonymFilterFactory.testCanHandleDecodingAndEncodingForSynonyms
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Dawid Weiss <da...@gmail.com>.
It's a recurring theme, huh, Erick? :)

http://markmail.org/message/7eykbuyyaxbxn364

I agree with your opinion and I have expressed it more than once -- a
test that has been failing for a long while and cannot be identified or
fixed is a candidate for removal. The noise in Solr tests has
increased to a degree that I stopped looking (a long time ago), unless
somebody explicitly pings me about something. I tried to fix some of
those tests, but it's beyond my capabilities in many cases (and my
time budget in others).

I also recall some folks had a different take on the subject; see Mark
Miller's opinion in the thread above, for example (there were other
threads too, but I can't find them now).

Dawid


On Wed, Feb 21, 2018 at 5:36 PM, Tommaso Teofili
<to...@gmail.com> wrote:
> +1, agree with Adrien, thanks for bringing this up Erick!
>
>
>
> Il giorno mer 21 feb 2018 alle ore 17:15 Adrien Grand <jp...@gmail.com> ha
> scritto:
>>
>> Thanks for bringing this up Erick. I agree with you we should silence
>> those frequent failures. Like you said, the side-effects of not silencing
>> them are even worse. I'll add that these flaky tests also make releasing
>> harder, it took me three runs last time (Lucene/Solr 7.2) for the release
>> build to succeed because of failed tests.
>>
>> Le mer. 21 févr. 2018 à 16:52, Erick Erickson <er...@gmail.com> a
>> écrit :
>>>
>>> There's an elephant in the room, and it's that failing tests are being
>>> ignored. Mind you, Solr and Lucene are progressing at a furious pace
>>> with lots of great functionality being added. That said, we're
>>> building up a considerable "technical debt" when it comes to testing.
>>>
>>> And I should say up front that major new functionality is expected to
>>> take a while to shake out (e.g. autoscaling, streaming, V2 API etc.),
>>> and noise from tests of new functionality is expected while things
>>> bake.
>>>
>>> Below is a list of tests that have failed at least once since just
>>> last night. This has been getting worse as time passes, the broken
>>> window problem. Some e-mails have 10 failing tests (+/-) so unless I
>>> go through each and every one I don't know whether something I've done
>>> is causing a problem or not.
>>>
>>> I'm as guilty of letting things slide as anyone else, there's been a
>>> long-standing issue with TestLazyCores that I work on sporadically for
>>> instance that's _probably_ "something in the test framework"....
>>>
>>> Several folks have spent some time digging into test failures and
>>> identifying at least some of the causes, kudos to them. It seems
>>> they're voices crying out in the wilderness though.
>>>
>>> There is so much noise at this point that tests are becoming
>>> irrelevant. I'm trying to work on SOLR-10809 for instance, where
>>> there's a pretty good possibility that I'll close at least one thing
>>> that shouldn't be closed. So I ran the full suite 10 times and
>>> gathered all the failures. Now I have to try to separate the failures
>>> caused by that JIRA from the ones that aren't related to it so I beast
>>> each of the failing tests 100 times against master. If I get a failure
>>> on master too for a particular test, I'll assume it's "not my problem"
>>> and drive on.
>>>
>>> I freely acknowledge that this is poor practice. It's driven by
>>> frustration and the desire to make progress. While it's poor practice,
>>> it's not as bad as only looking at tests that I _think_ are related or
>>> ignoring all tests failures I can't instantly recognize as "my fault".
>>>
>>> So what's our stance on this? Mark Miller had a terrific program at
>>> one point allowing categorization of tests that failed at a glance,
>>> but it hasn't been updated in a while.  Steve Rowe is working on the
>>> problem too. Hoss and Cassandra have both added to the efforts as
>>> well. And I'm sure I'm leaving out others.
>>>
>>> Then there's the @Ignore and @BadApple annotations....
>>>
>>> So, as a community, are we going to devote some energy to this? Or
>>> shall we just start ignoring all of the frequently failing tests?
>>> Frankly we'd be farther ahead at this point marking failing tests that
>>> aren't getting any work with @Ignore or @BadApple and getting
>>> compulsive about not letting any _new_ tests fail than continuing our
>>> current path. I don't _like_ this option mind you, but it's better
>>> than letting these accumulate forever and tests become more and more
>>> difficult to use. As tests become more difficult to use, they're used
>>> less and the problem gets worse.
>>>
>>> Note, I made no effort to separate suite .vs. individual reports
>>> here.....
>>>
>>> Erick
>>>
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.lucene.index.TestIndexWriterDeleteByQuery
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.lucene.store.TestSleepingLockWrapper
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyFieldFacetCloudTest
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyFieldFacetExtrasCloudTest
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyQueryFacetCloudTest
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.client.solrj.TestLBHttpSolrClient
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.cloud.TestSolrCloudWithSecureImpersonation
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.cloud.autoscaling.AutoAddReplicasPlanActionTest
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.core.AlternateDirectoryTest
>>> FAILED:  junit.framework.TestSuite.org.apache.solr.core.TestLazyCores
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.handler.component.DistributedFacetPivotSmallAdvancedTest
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.ltr.TestSelectiveWeightCreation
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.ltr.store.rest.TestModelManager
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.rest.schema.analysis.TestManagedSynonymFilterFactory
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.search.join.BlockJoinFacetDistribTest
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.security.TestAuthorizationFramework
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.solr.update.processor.TestOpenNLPExtractNamedEntitiesUpdateProcessorFactory
>>> FAILED:  org.apache.lucene.index.TestStressNRT.test
>>> FAILED:  org.apache.solr.cloud.AddReplicaTest.test
>>> FAILED:  org.apache.solr.cloud.DeleteShardTest.test
>>> FAILED:  org.apache.solr.cloud.PeerSyncReplicationTest.test
>>> FAILED:  org.apache.solr.cloud.ReplaceNodeNoTargetTest.test
>>> FAILED:  org.apache.solr.cloud.TestUtilizeNode.test
>>> FAILED:
>>> org.apache.solr.cloud.api.collections.CollectionsAPIDistributedZkTest.testCollectionsAPI
>>> FAILED:
>>> org.apache.solr.cloud.api.collections.ShardSplitTest.testSplitAfterFailedSplit
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.AutoAddReplicasIntegrationTest.testSimple
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.ComputePlanActionTest.testNodeWithMultipleReplicasLost
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.HdfsAutoAddReplicasIntegrationTest.testSimple
>>> FAILED:  org.apache.solr.cloud.autoscaling.SystemLogListenerTest.test
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testEventQueue
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testMetricTrigger
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testSearchRate
>>> FAILED:
>>> org.apache.solr.cloud.autoscaling.sim.TestLargeCluster.testSearchRate
>>> FAILED:
>>> org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication
>>> FAILED:
>>> org.apache.solr.handler.admin.AutoscalingHistoryHandlerTest.testHistory
>>> FAILED:
>>> org.apache.solr.rest.schema.analysis.TestManagedSynonymFilterFactory.testCanHandleDecodingAndEncodingForSynonyms
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Test failures are out of control......

Posted by Tommaso Teofili <to...@gmail.com>.
+1, agree with Adrien, thanks for bringing this up Erick!



Il giorno mer 21 feb 2018 alle ore 17:15 Adrien Grand <jp...@gmail.com>
ha scritto:

> Thanks for bringing this up Erick. I agree with you we should silence
> those frequent failures. Like you said, the side-effects of not silencing
> them are even worse. I'll add that these flaky tests also make releasing
> harder, it took me three runs last time (Lucene/Solr 7.2) for the release
> build to succeed because of failed tests.
>
> Le mer. 21 févr. 2018 à 16:52, Erick Erickson <er...@gmail.com> a
> écrit :
>
>> There's an elephant in the room, and it's that failing tests are being
>> ignored. Mind you, Solr and Lucene are progressing at a furious pace
>> with lots of great functionality being added. That said, we're
>> building up a considerable "technical debt" when it comes to testing.
>>
>> And I should say up front that major new functionality is expected to
>> take a while to shake out (e.g. autoscaling, streaming, V2 API etc.),
>> and noise from tests of new functionality is expected while things
>> bake.
>>
>> Below is a list of tests that have failed at least once since just
>> last night. This has been getting worse as time passes, the broken
>> window problem. Some e-mails have 10 failing tests (+/-) so unless I
>> go through each and every one I don't know whether something I've done
>> is causing a problem or not.
>>
>> I'm as guilty of letting things slide as anyone else, there's been a
>> long-standing issue with TestLazyCores that I work on sporadically for
>> instance that's _probably_ "something in the test framework"....
>>
>> Several folks have spent some time digging into test failures and
>> identifying at least some of the causes, kudos to them. It seems
>> they're voices crying out in the wilderness though.
>>
>> There is so much noise at this point that tests are becoming
>> irrelevant. I'm trying to work on SOLR-10809 for instance, where
>> there's a pretty good possibility that I'll close at least one thing
>> that shouldn't be closed. So I ran the full suite 10 times and
>> gathered all the failures. Now I have to try to separate the failures
>> caused by that JIRA from the ones that aren't related to it so I beast
>> each of the failing tests 100 times against master. If I get a failure
>> on master too for a particular test, I'll assume it's "not my problem"
>> and drive on.
>>
>> I freely acknowledge that this is poor practice. It's driven by
>> frustration and the desire to make progress. While it's poor practice,
>> it's not as bad as only looking at tests that I _think_ are related or
>> ignoring all tests failures I can't instantly recognize as "my fault".
>>
>> So what's our stance on this? Mark Miller had a terrific program at
>> one point allowing categorization of tests that failed at a glance,
>> but it hasn't been updated in a while.  Steve Rowe is working on the
>> problem too. Hoss and Cassandra have both added to the efforts as
>> well. And I'm sure I'm leaving out others.
>>
>> Then there's the @Ignore and @BadApple annotations....
>>
>> So, as a community, are we going to devote some energy to this? Or
>> shall we just start ignoring all of the frequently failing tests?
>> Frankly we'd be farther ahead at this point marking failing tests that
>> aren't getting any work with @Ignore or @BadApple and getting
>> compulsive about not letting any _new_ tests fail than continuing our
>> current path. I don't _like_ this option mind you, but it's better
>> than letting these accumulate forever and tests become more and more
>> difficult to use. As tests become more difficult to use, they're used
>> less and the problem gets worse.
>>
>> Note, I made no effort to separate suite .vs. individual reports here.....
>>
>> Erick
>>
>> FAILED:  junit.framework.TestSuite.org
>> .apache.lucene.index.TestBagOfPositions
>> FAILED:  junit.framework.TestSuite.org
>> .apache.lucene.index.TestIndexWriterDeleteByQuery
>> FAILED:  junit.framework.TestSuite.org
>> .apache.lucene.store.TestSleepingLockWrapper
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.analytics.legacy.facet.LegacyFieldFacetCloudTest
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.analytics.legacy.facet.LegacyFieldFacetExtrasCloudTest
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.analytics.legacy.facet.LegacyQueryFacetCloudTest
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.client.solrj.TestLBHttpSolrClient
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.cloud.TestSolrCloudWithSecureImpersonation
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.cloud.autoscaling.AutoAddReplicasPlanActionTest
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.core.AlternateDirectoryTest
>> FAILED:  junit.framework.TestSuite.org.apache.solr.core.TestLazyCores
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.handler.component.DistributedFacetPivotSmallAdvancedTest
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.ltr.TestSelectiveWeightCreation
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.ltr.store.rest.TestModelManager
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.rest.schema.analysis.TestManagedSynonymFilterFactory
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.search.join.BlockJoinFacetDistribTest
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.security.TestAuthorizationFramework
>> FAILED:  junit.framework.TestSuite.org
>> .apache.solr.update.processor.TestOpenNLPExtractNamedEntitiesUpdateProcessorFactory
>> FAILED:  org.apache.lucene.index.TestStressNRT.test
>> FAILED:  org.apache.solr.cloud.AddReplicaTest.test
>> FAILED:  org.apache.solr.cloud.DeleteShardTest.test
>> FAILED:  org.apache.solr.cloud.PeerSyncReplicationTest.test
>> FAILED:  org.apache.solr.cloud.ReplaceNodeNoTargetTest.test
>> FAILED:  org.apache.solr.cloud.TestUtilizeNode.test
>> FAILED:
>> org.apache.solr.cloud.api.collections.CollectionsAPIDistributedZkTest.testCollectionsAPI
>> FAILED:
>> org.apache.solr.cloud.api.collections.ShardSplitTest.testSplitAfterFailedSplit
>> FAILED:
>> org.apache.solr.cloud.autoscaling.AutoAddReplicasIntegrationTest.testSimple
>> FAILED:
>> org.apache.solr.cloud.autoscaling.ComputePlanActionTest.testNodeWithMultipleReplicasLost
>> FAILED:
>> org.apache.solr.cloud.autoscaling.HdfsAutoAddReplicasIntegrationTest.testSimple
>> FAILED:  org.apache.solr.cloud.autoscaling.SystemLogListenerTest.test
>> FAILED:
>> org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testEventQueue
>> FAILED:
>> org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testMetricTrigger
>> FAILED:
>> org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testSearchRate
>> FAILED:
>> org.apache.solr.cloud.autoscaling.sim.TestLargeCluster.testSearchRate
>> FAILED:
>> org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication
>> FAILED:
>> org.apache.solr.handler.admin.AutoscalingHistoryHandlerTest.testHistory
>> FAILED:
>> org.apache.solr.rest.schema.analysis.TestManagedSynonymFilterFactory.testCanHandleDecodingAndEncodingForSynonyms
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>

Re: Test failures are out of control......

Posted by Adrien Grand <jp...@gmail.com>.
Thanks for bringing this up Erick. I agree with you we should silence those
frequent failures. Like you said, the side-effects of not silencing them
are even worse. I'll add that these flaky tests also make releasing harder,
it took me three runs last time (Lucene/Solr 7.2) for the release build to
succeed because of failed tests.
