Posted to dev@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2018/07/30 17:52:10 UTC

BadApple report. Seems like I'm wasting my time.

Is anybody paying the least attention to this or should I just stop bothering?

I'd hoped to get to a point where we could get at least semi-stable
and start whittling away at the backlog. But with an additional 63
tests to BadApple (a little fudging here because of some issues with
counting suite-level tests vs. individual tests) it doesn't seem like
we're going in the right direction at all.

Unless there's some value here, defined by people stepping up and at
least looking (and once a week is not asking too much) at the names of
the tests I'm going to BadApple to see if they ring any bells, I'll
stop wasting my time.

There are currently 100 BadApple tests. That number will increase by a
hefty percentage _this week alone_.

I suppose I'll just be the latest example of tilting at this windmill.

Erick



**Annotated tests/suites that didn't fail in the last 4 weeks.


  **Annotations will be removed from the following tests because they
haven't failed in the last 4 rollups.

  **Methods: 12
   AddReplicaTest.test
   DeleteReplicaTest.deleteReplicaFromClusterState
   LeaderVoteWaitTimeoutTest.testMostInSyncReplicasCanWinElection
   MaxSizeAutoCommitTest
   OverseerRolesTest.testOverseerRole
   PeerSyncReplicationTest.test
   RecoveryZkTest.test
   RollingRestartTest.test
   TestCloudConsistency.testOutOfSyncReplicasCannotBecomeLeader
   TestCloudPivotFacet.test
   TestLargeCluster.testSearchRate
   TestPullReplicaErrorHandling.throws

  **Suites: 0


********Failures in Hoss' reports for the last 4 rollups.

All tests that failed 4 weeks running will be BadApple'd unless there
are objections.
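
(For anyone who hasn't looked at one of these commits: "BadApple'd" just
means the test method or suite gets the test framework's @BadApple
annotation so normal runs skip it. A minimal sketch of what that looks
like -- the class name and JIRA id below are placeholders, not entries
from this list, and the default handling of the tests.badapples property
is as I understand it:

    // Hedged sketch only: SomeFlakyTest and SOLR-NNNNN are placeholders.
    // Tests carrying @BadApple are skipped in runs where
    // -Dtests.badapples=false (my understanding of the current default).
    import org.apache.lucene.util.LuceneTestCase;
    import org.junit.Test;

    public class SomeFlakyTest extends LuceneTestCase {

      @Test
      @LuceneTestCase.BadApple(bugUrl = "https://issues.apache.org/jira/browse/SOLR-NNNNN")
      public void testSometimesFails() {
        // the flaky assertions stay in place; only the annotation is added
      }
    }

The test code itself doesn't change; the annotation just keeps the noise
out of the default runs until someone can dig in.)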

Failures in the last 4 reports:
   Report   Pct     runs    fails           test
     0123   0.9     1689     27      AutoAddReplicasIntegrationTest.testSimple
     0123   1.1     1698     36      CdcrBootstrapTest.testConvertClusterToCdcrAndBootstrap
     0123   0.9     1430     30      ChaosMonkeyNothingIsSafeTest(suite)
     0123   0.4     1453     16      ChaosMonkeyNothingIsSafeTest.test
     0123   0.4     1726     26      CloudSolrClientTest.preferLocalShardsTest
     0123   0.7     1726     64      CloudSolrClientTest.preferReplicaTypesTest
     0123   1.9     1682     34      CollectionsAPIAsyncDistributedZkTest.testAsyncIdRaceCondition
     0123   0.9     1676     14      CollectionsAPIDistributedZkTest.testCollectionsAPI
     0123   1.1     1717     13      ComputePlanActionTest.testNodeLost
     0123   0.2     1721     13      ComputePlanActionTest.testNodeWithMultipleReplicasLost
     0123   0.4     1684     30      DistributedMLTComponentTest.test
     0123   0.4     1707      9      DocValuesNotIndexedTest.testGroupingDVOnly
     0123   0.4     1642      9      FullSolrCloudDistribCmdsTest.test
     0123   0.9     1663     49      GraphExpressionTest(suite)
     0123   0.4     1693     21      GraphExpressionTest.testShortestPathStream
     0123   1.8     1660     41      GraphTest(suite)
     0123   0.9     1686     20      GraphTest.testShortestPathStream
     0123  72.7       88     70      HdfsChaosMonkeySafeLeaderTest(suite)
     0123   1.3     1622     32      HttpSolrCallGetCoreTest(suite)
     0123  16.6     1992    223      InfixSuggestersTest.testShutdownDuringBuild
     0123   0.4     1661     12      LargeVolumeJettyTest(suite)
     0123   0.4     1685     12      LargeVolumeJettyTest.testMultiThreaded
     0123   0.9     1661      9      LeaderElectionIntegrationTest.testSimpleSliceLeaderElection
     0123   3.2     1695     55      MetricTriggerIntegrationTest.testMetricTrigger
     0123   3.9     1624     84      MoveReplicaHDFSTest.testFailedMove
     0123   4.2     1716     71      ScheduledTriggerIntegrationTest.testScheduledTrigger
     0123   2.2     1626     37      SchemaApiFailureTest(suite)
     0123  38.1      422    164      ShardSplitTest.test
     0123   9.2      329     26      ShardSplitTest.testSplitMixedReplicaTypes
     0123   0.7     1701     24      SolrCloudReportersTest.testDefaultPlugins
     0123   0.9     1701     36      SolrCloudReportersTest.testExplicitConfiguration
     0123   1.9     1683     25      SolrJmxReporterCloudTest.testJmxReporter
     0123  11.8      568     12      StreamDecoratorTest.testClassifyStream
     0123  10.5     1133     60      StreamDecoratorTest.testExecutorStream
     0123   2.6     1133     16      StreamDecoratorTest.testParallelComplementStream
     0123   5.3     1134     15      StreamDecoratorTest.testParallelDaemonCommitStream
     0123  10.5     1134     66      StreamDecoratorTest.testParallelExecutorStream
     0123   7.9     1133     24      StreamDecoratorTest.testParallelFetchStream
     0123   7.9     1133     39      StreamDecoratorTest.testParallelHavingStream
     0123   7.9     1133     25      StreamDecoratorTest.testParallelMergeStream
     0123  17.6      568     24      StreamDecoratorTest.testParallelPriorityStream
     0123   5.3     1133     19      StreamDecoratorTest.testParallelReducerStream
     0123   2.6     1131     10      StreamDecoratorTest.testParallelUniqueStream
     0123   2.6     1133     13      StreamDecoratorTest.testParallelUpdateStream
     0123   1.1      880     22      StreamExpressionTest.testParallelTopicStream
     0123  11.8      129     10      StressHdfsTest(suite)
     0123   0.2     1644      4      TestCloudConsistency.testOutOfSyncReplicasCannotBecomeLeaderAfterRestart
     0123   0.7     1533      6      TestDelegationWithHadoopAuth.testDelegationTokenRenew
     0123   0.4     1666     29      TestDistribIDF.testMultiCollectionQuery
     0123   1.5     1474     18      TestDistributedSearch.test
     0123   8.3      130     11      TestGenericDistributedQueue.testDistributedQueue
     0123   0.5     1249     41      TestHdfsCloudBackupRestore.test
     0123   2.5     1433     25      TestHdfsUpdateLog(suite)
     0123   3.9     1681     40      TestInPlaceUpdatesDistrib.test
     0123   1.8     1660     23      TestLTROnSolrCloud(suite)
     0123   0.9     1679     12      TestLTROnSolrCloud.testSimpleQuery
     0123   1.1     1639     58      TestLocalFSCloudBackupRestore.test
     0123  20.3     1963    394      TestMiniSolrCloudClusterSSL.testSslWithCheckPeerName
     0123   7.5     1635    103      TestSQLHandler(suite)
     0123   7.5     1751    109      TestSQLHandler.doTest
     0123   0.2     1648      8      TestTlogReplica(suite)
     0123   8.3      178     85      TestTlogReplica.testCreateDelete
     0123  12.3      868    102      TestTriggerIntegration.testNodeLostTriggerRestoreState

Re: BadApple report. Seems like I'm wasting my time.

Posted by Erick Erickson <er...@gmail.com>.
Steve:

Ok, InfixSuggestersTest.testShutdownDuringBuild is in my "Do not annotate" list.


Re: BadApple report. Seems like I'm wasting my time.

Posted by Steve Rowe <sa...@gmail.com>.
Hi Erick,

I think it’s valuable to continue the BadApple process as you’re currently running it.  I’m guessing most people will not engage, but some will, myself included (though I don’t claim to read the list every week).

I’m working on fixing InfixSuggestersTest.testShutdownDuringBuild (SOLR-12606), so please don’t BadApple it.

Thanks,

--
Steve
www.lucidworks.com


Re: BadApple report. Seems like I'm wasting my time.

Posted by Mark Miller <ma...@gmail.com>.
I still think it’s a mistake to try and use all the Jenkins results to
drive ignoring tests. It needs to be an objective measure in a good env.

We also should not be ignoring tests en masse without individual
consideration. Critical test coverage should be treated differently than
any random test, especially when stability is sometimes simple to achieve
for that test.

A decade+ of history says it's unlikely you'll get much consistent help
digging out of a huge test-ignore hell.

Beasting in a known good environment, plus a few very interested parties,
is the only path out of this if you ask me. We need to get clean in a known
good env and then automate beasting defense, using Jenkins to find issues
in other environments.
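
(For readers who haven't met the term: "beasting" means running one test
over and over in a tight loop until it trips; the build has tooling for
this, if I remember right. The little JUnit-level stand-in below is only
a hedged sketch of the idea, with made-up class and argument names:

    // Hedged stand-in for "beasting": run one suite N times, count failures.
    // The real build tooling does much more; this just shows the idea.
    import org.junit.runner.JUnitCore;
    import org.junit.runner.Result;

    public class MiniBeast {
      public static void main(String[] args) throws Exception {
        Class<?> suite = Class.forName(args[0]); // e.g. the flaky test class
        int iters = Integer.parseInt(args[1]);   // how many runs to attempt
        int failures = 0;
        for (int i = 0; i < iters; i++) {
          Result result = JUnitCore.runClasses(suite);
          if (!result.wasSuccessful()) failures++;
        }
        System.out.printf("%s: %d/%d runs failed%n", suite.getName(), failures, iters);
      }
    }

A test that fails, say, 5 times in 100 runs in a known good env is flaky
on its own merits; the same failure rate only on a loaded Jenkins box
points at the environment instead.)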

Unfortunately, not something I can help out with in the short term anymore.

Mark

Re: BadApple report. Seems like I'm wasting my time.

Posted by Erick Erickson <er...@gmail.com>.
Alexandre:

Feel free! What I'm struggling with is not that someone checked in
some code that all of a sudden started breaking things. Rather, it's
that a test that's been working perfectly will fail once, then won't
reproducibly fail again, and does _not_ appear to be related to recent
code changes.

In fact, that's the crux of the matter: it's difficult/impossible to
tell at a glance, when a test fails, whether it is or is not related to
a recent code change.

Erick


Re: BadApple report. Seems like I'm wasting my time.

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Just a completely random thought about something I don't have deep
knowledge of (still learning my way around Solr tests).

Is this something that Machine Learning could help with? The Github
repo/history is a fantastic source of learning on who worked on which
file, how often, etc. We certainly should be able to get some 'most
significant developer' stats out of that.

Regards,
   Alex.


Re: BadApple report. Seems like I'm wasting my time.

Posted by Erick Erickson <er...@gmail.com>.
Shawn:

Trouble is, there were 945 tests that failed at least once in the last
4 weeks. And the trend is all over the map on a weekly basis.

e-mail-2018-06-11.txt: There were 989 unannotated tests that failed
e-mail-2018-06-18.txt: There were 689 unannotated tests that failed
e-mail-2018-06-25.txt: There were 555 unannotated tests that failed
e-mail-2018-07-02.txt: There were 723 unannotated tests that failed
e-mail-2018-07-09.txt: There were 793 unannotated tests that failed
e-mail-2018-07-16.txt: There were 809 unannotated tests that failed
e-mail-2018-07-23.txt: There were 953 unannotated tests that failed
e-mail-2018-07-30.txt: There were 945 unannotated tests that failed

I'm BadApple'ing tests that fail every week for the last 4 weeks, on
the theory that those are not temporary issues (hey, we all commit
code that breaks something, then have to figure out why and fix it).
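
(The triage rule itself is nothing fancy -- essentially the intersection
of the failing-test sets from the last four weekly rollups. A minimal
sketch; the one-test-name-per-line file layout is an assumption for
illustration, not the real report format:

    // Hedged sketch of the 4-week triage: a test is a BadApple candidate
    // only if it appears in every one of the last four rollup reports.
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Collections;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class BadAppleTriage {
      public static Set<String> candidates(List<Path> lastFourRollups) throws Exception {
        Set<String> survivors = null;
        for (Path rollup : lastFourRollups) {
          Set<String> failedThisWeek = new HashSet<>(Files.readAllLines(rollup));
          if (survivors == null) {
            survivors = failedThisWeek;          // seed with the first week
          } else {
            survivors.retainAll(failedThisWeek); // keep only repeat offenders
          }
        }
        return survivors == null ? Collections.emptySet() : survivors;
      }
    }

Anything that survives all four intersections is, by that rule, not a
one-off.)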

I also have the feeling that somewhere, somehow, our test framework is
making some assumptions that are invalid. Or too strict. Or too fast.
Or there's some fundamental issue with some of our classes. Or... The
number of sporadic issues where the Object Tracker spits stuff out, for
instance, screams that some assumption we're making, either in the code
or in the test framework, is flawed.

What I don't know is how to make visible progress. It's discouraging
to fix something and then next week have more tests fail for unrelated
reasons.

Visibility is the issue to me. We have no good way of saying "these
tests _just started failing_ for a reason". As a quick experiment, I
extended the triage to 10 weeks (no attempt to ascertain if these
tests even existed 10 weeks ago). Here are the tests that have _only_
failed in the last week, not the previous 9. BadApple'ing anything
that's only failed once seems overkill.

Although the test that failed 77 times does just stand out....

week     pct        runs  fails            test
0            0.2      460      1      CloudSolrClientTest.testVersionsAreReturned
0            0.2      466      1      ComputePlanActionTest.testSelectedCollections
0            0.2      464      1      ConfusionMatrixGeneratorTest.testGetConfusionMatrixWithBM25NB
0            8.1       37      3      IndexSizeTriggerTest(suite)
0            0.2      454      1      MBeansHandlerTest.testAddedMBeanDiff
0            0.2      454      1      MBeansHandlerTest.testDiff
0            0.2      455      1      MetricTriggerTest.test
0            0.2      455      1      MetricsHandlerTest.test
0            0.2      455      1      MetricsHandlerTest.testKeyMetrics
0            0.2      453      1      RequestHandlersTest.testInitCount
0            0.2      453      1      RequestHandlersTest.testStatistics
0            0.2      453      1      ScheduledTriggerIntegrationTest(suite)
0            0.2      451      1      SearchRateTriggerTest.testWaitForElapsed
0            0.2      425      1      SoftAutoCommitTest.testSoftCommitWithinAndHardCommitMaxTimeRapidAdds
0           14.7      525     77      StreamExpressionTest.testSignificantTermsStream
0            0.2      454      1      TestBadConfig(suite)
0            0.2      465      1      TestBlockJoin.testMultiChildQueriesOfDiffParentLevels
0            0.6      462      3      TestCloudCollectionsListeners.testCollectionDeletion
0            0.2      456      1      TestInfoStreamLogging(suite)
0            0.2      456      1      TestLazyCores.testLazySearch
0            0.2      473      1      TestLucene70DocValuesFormat.testSortedSetAroundBlockSize
0           15.4       26      4      TestMockDirectoryWrapper.testThreadSafetyInListAll
0            0.2      454      1      TestNodeLostTrigger.testTrigger
0            0.2      453      1      TestRecovery.stressLogReplay
0            0.2      505      1      TestReplicationHandler.testRateLimitedReplication
0            0.2      425      1      TestSolrCloudWithSecureImpersonation.testForwarding
0            0.9      461      4      TestSolrDeletionPolicy1.testNumCommitsConfigured
0            0.2      454      1      TestSystemIdResolver(suite)
0            0.2      451      1      TestV2Request.testCloudSolrClient
0            0.2      451      1      TestV2Request.testHttpSolrClient
0            9.1       77      7      TestWithCollection.testDeleteWithCollection
0            3.9       77      3      TestWithCollection.testMoveReplicaWithCollection

So I don't know what I'm going to do here; we'll see if I get more
optimistic when the fog lifts.

Erick


Re: BadApple report. Seems like I'm wasting my time.

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/30/2018 11:52 AM, Erick Erickson wrote:
> Is anybody paying the least attention to this or should I just stop bothering?

The job you're doing is thankless.  That's the nature of the work.  I'd 
love to have the time to really help you out. If only my employer didn't 
expect me to spend so much time *working*!

> I'd hoped to get to a point where we could get at least semi-stable
> and start whittling away at the backlog. But with an additional 63
> tests to BadApple (a little fudging here because of some issues with
> counting suite-level tests .vs. individual test) it doesn't seem like
> we're going in the right direction at all.
>
> Unless there's some value here, defined by people stepping up and at
> least looking (and once a week is not asking too much) at the names of
> the tests I'm going to BadApple to see if they ring any bells, I'll
> stop wasting my time.

Here's a crazy thought, which might be something you already 
considered:  Try to figure out which tests pass consistently and 
BadApple *all the rest* of the Solr tests.  If there are any Lucene 
tests that fail with some regularity, BadApple those too.

There are probably disadvantages to this approach, but here are the 
advantages I can think of:  1) The noise stops quickly. 2) Future heroic 
efforts will result in measurable progress -- to quote you, "whittling 
away at the backlog."

Thank you a million times over for all the care and effort you've put 
into this.

Shawn



Re: BadApple report. Seems like I'm wasting my time.

Posted by David Smiley <da...@gmail.com>.
I was thinking of the challenge with sporadic/random failures the other day
and what would help.  I think more and smarter notifications of failures
could help a lot.

(A) Using Git history, a Jenkins plugin could send an email to anyone who
touched the failing test in the last 4 weeks.  If that list is empty then
choose the most recent person.  This notification does not go to the dev
list.  Rationale: People who most recently maintained the test in some way
are likely to want to help keep it passing.
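
A rough sketch of what (A) might look like under the hood -- just "git
log" on the test's path; the helper class here is hypothetical, not an
existing plugin:

    // Hedged sketch of idea (A): who touched this test file recently?
    // RecentTestAuthors is a hypothetical helper, not an existing plugin.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.LinkedHashSet;
    import java.util.Set;

    public class RecentTestAuthors {
      public static Set<String> authorEmailsSince(String testPath, String since)
          throws Exception {
        Process git = new ProcessBuilder(
                "git", "log", "--since=" + since, "--format=%ae", "--", testPath)
            .redirectErrorStream(true)
            .start();
        Set<String> emails = new LinkedHashSet<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(git.getInputStream()))) {
          String line;
          while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) emails.add(line.trim());
          }
        }
        git.waitFor();
        return emails; // empty => fall back to the most recent committer
      }
    }

Calling it with something like a test's repo path and "4 weeks ago"
would give the plugin its notification list.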

(B) (At fucit.org?) if a test has not failed in the 4 weeks prior, then
notify the dev list with an email about just this test (in subject).  If
"many" tests fail in a build, then those failures don't count for this
tracking.  Rationale:  Any active developer ought to take notice as this
may be caused by one of their commits.  Note: if "many" tests fail in a
build, then it's likely a reproducible recently-committed change with a
wide blast radius that is going to be fixed soon and which will already be
reported by standard Jenkins notifications.

These are just some ideas.  I looked for a Jenkins plugin that did (A) but
found none.  It seems most build setups including ours aren't oriented
around longitudinal tracking of individual tests, and are instead just
overall pass/fail tracking of the entire suite.  Hoss (& Mark?) have helped
track tests longitudinally but it's a separate system that one must
manually look at; it's not integrated with Jenkins nor with notifications.

~ David

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: BadApple report. Seems like I'm wasting my time.

Posted by Dawid Weiss <da...@gmail.com>.
Hi Erick,

> Is anybody paying the least attention to this or should I just stop bothering?

I think your effort is invaluable, although if it's not backed by actions
to fix those bugs it's pointless. I'm paying attention to the Lucene part.
As for Solr tests, I admit I gave up hope a long while ago. I can't get
Solr tests to pass on my machine anymore, no matter how many runs I try.
Yes, this means I commit stuff back if I can run precommit and Lucene
tests only -- it is terrible, but a fact.

> But with an additional 63 tests to BadApple [...]

Exactly. I don't see this situation getting any better, even with all
your (and other people's) work put into fixing them. I don't have any
ideas or solution for this, I'm afraid.

Dawid
