Posted to dev@geode.apache.org by Anthony Baker <ab...@pivotal.io> on 2016/05/02 20:25:50 UTC

Re: Next steps with flickering tests

I have results from 10 runs of all the tests excluding @FlakyTest.  These are the only failures:

ubuntu@ip-172-31-44-240:~$ grep FAILED incubator-geode/nohup.out | grep gemfire
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.cache30.DistributedAckPersistentRegionCCEDUnitTest > testTombstones FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationHA FAILED

Anthony

> On Apr 27, 2016, at 7:22 PM, Kirk Lund <kl...@pivotal.io> wrote:
> 
> We currently have over 10,000 tests but only about 147 are annotated with
> FlakyTest. It probably wouldn't cause precheckin to take much longer. My
> main argument for separating the FlakyTests into their own Jenkins build
> job is to get the main build job 100% green while we know the FlakyTest
> build job might "flicker".
> 
> -Kirk
> 
> 
> On Tue, Apr 26, 2016 at 1:58 PM, Udo Kohlmeyer <uk...@pivotal.io>
> wrote:
> 
>> Depending on the number of "flaky" tests, this should not increase the
>> time too much.
>> I foresee these "flaky" tests to be few and far between. Over time I
>> imagine this would be a last resort if we cannot fix the test or even
>> improve the test harness to have a clean test space for each test.
>> 
>> --Udo
>> 
>> 
>> On 27/04/2016 6:42 am, Jens Deppe wrote:
>> 
>>> By running the Flakes with forkEvery 1, won't it extend precheckin by a
>>> fair bit? I'd prefer to see two separate builds running.
>>> 
>>> On Tue, Apr 26, 2016 at 11:53 AM, Kirk Lund <kl...@pivotal.io> wrote:
>>> 
>>>> I'm in favor of running the FlakyTests together at the end of precheckin
>>>> using forkEvery 1 on them too.
>>>>
>>>> What about running two nightly builds? One that runs all the non-flaky
>>>> UnitTests, IntegrationTests and DistributedTests, plus another nightly
>>>> build that runs only FlakyTests? We can run Jenkins jobs on our local
>>>> machines that separate FlakyTests out into their own job too, but I'd like
>>>> to see the main nightly build go to 100% green (if that's even possible
>>>> without encountering many more flickering tests).
>>>> 
>>>> -Kirk
>>>> 
>>>> 
>>>> On Tue, Apr 26, 2016 at 11:02 AM, Dan Smith <ds...@pivotal.io> wrote:
>>>> 
>>>> +1 for separating these out and running them with forkEvery 1.
>>>>> 
>>>>> I think they should probably still run as part of precheckin and the
>>>>> nightly builds though. We don't want this to turn into essentially
>>>>> disabling and ignoring these tests.
>>>>> 
>>>>> -Dan
>>>>> 
>>>>> On Tue, Apr 26, 2016 at 10:28 AM, Kirk Lund <kl...@pivotal.io> wrote:
>>>>> 
>>>>>> Also, I don't think there's much value continuing to use the "CI" label.
>>>>>> If a test fails in Jenkins, then run the test to see if it fails
>>>>>> consistently. If it doesn't, it's flaky. The developer looking at it
>>>>>> should try to determine the cause of it failing (e.g., "it uses thread
>>>>>> sleeps or random ports with BindExceptions or has short timeouts with
>>>>>> probable GC pause") and include that info when adding the FlakyTest
>>>>>> annotation and filing a Jira bug with the Flaky label. If the test fails
>>>>>> consistently, then file a Jira bug without the Flaky label.
>>>>>> 
>>>>>> -Kirk
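
As an illustration of the annotation workflow described above, a minimal sketch
of a category-annotated test method (the test class and method names here are
made up, and the category's package path is an assumption rather than a copy of
the Geode source):

    import org.junit.Test;
    import org.junit.experimental.categories.Category;
    // JUnit category marker; the package path below is assumed for illustration.
    import com.gemstone.gemfire.test.junit.categories.FlakyTest;

    public class SomeRegionDUnitTest {

      // Marks only this method as flaky; the matching JIRA (with the Flaky label)
      // would record the suspected cause, e.g. a thread sleep or hard-coded port.
      @Category(FlakyTest.class)
      @Test
      public void putEventuallyReachesAllMembers() {
        // ... test body elided ...
      }
    }
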
>>>>>> 
>>>>>> 
>>>>>> On Tue, Apr 26, 2016 at 10:24 AM, Kirk Lund <kl...@pivotal.io> wrote:
>>>>>> 
>>>>>>> There are quite a few test classes that have multiple test methods which
>>>>>>> are annotated with the FlakyTest category.
>>>>>>>
>>>>>>> More thoughts:
>>>>>>>
>>>>>>> In general, I think that if any given test fails intermittently then it
>>>>>>> is a FlakyTest. A good test should either pass or fail consistently.
>>>>>>> After annotating a test method with FlakyTest, the developer should then
>>>>>>> add the Flaky label to the corresponding Jira ticket. What we then do
>>>>>>> with the Jira tickets (i.e., fix them) is probably more important than
>>>>>>> deciding if a test is flaky or not.
>>>>>>>
>>>>>>> Rather than try to come up with some flaky process for determining if a
>>>>>>> given test is flaky (e.g., "does it have thread sleeps?"), it would be
>>>>>>> better to have a wiki page that has examples of flakiness and how to fix
>>>>>>> them ("if the test has thread sleeps, then switch to using Awaitility
>>>>>>> and do this...").
>>>>>>> 
>>>>>>> -Kirk
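
To make the "switch to using Awaitility" suggestion above concrete, here is a
minimal sketch of replacing a thread sleep with a polled condition. The class,
map, and key names are made up; the import shown is the pre-3.0 com.jayway
package in use around this time (newer releases use org.awaitility):

    import static com.jayway.awaitility.Awaitility.await;

    import java.util.concurrent.Callable;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.TimeUnit;
    import org.junit.Test;

    public class AwaitInsteadOfSleepTest {
      private final ConcurrentHashMap<String, String> region = new ConcurrentHashMap<>();

      @Test
      public void seesAsyncUpdateWithoutSleeping() {
        // Simulate asynchronous activity that eventually writes the entry.
        new Thread(() -> region.put("key", "value")).start();

        // Instead of Thread.sleep(5000) followed by an assertion, poll until the
        // condition holds (or a generous timeout expires and the test fails).
        Callable<Boolean> entryPresent = () -> region.containsKey("key");
        await().atMost(30, TimeUnit.SECONDS).until(entryPresent);
      }
    }
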
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Apr 25, 2016 at 10:51 PM, Anthony Baker <ab...@pivotal.io> wrote:
>>>>>>>
>>>>>> Thanks Kirk!
>>>>>>>> 
>>>>>>>> ~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . | grep -v Binary | wc -l | xargs echo "Flake factor:"
>>>>>>>> Flake factor: 136
>>>>>>>> 
>>>>>>>> Anthony
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Apr 25, 2016, at 9:45 PM, William Markito <wm...@pivotal.io> wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Are we also planning to automate the additional build task somehow?
>>>>>>>>>
>>>>>>>>> I'd also suggest creating a wiki page with some stats (like how many
>>>>>>>>> FlakyTests we currently have) and the idea behind this effort so we can
>>>>>>>>> keep track and see how it's evolving over time.
>>>>>>>>> 
>>>>>>>>> On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <kl...@pivotal.io> wrote:
>>>>>>>>>
>>>>>>>>>> After completing GEODE-1233, all currently known flickering tests are
>>>>>>>>>> now annotated with our FlakyTest JUnit Category.
>>>>>>>>>>
>>>>>>>>>> In an effort to divide our build up into multiple build pipelines that
>>>>>>>>>> are sequential and dependable, we could consider excluding FlakyTests
>>>>>>>>>> from the primary integrationTest and distributedTest tasks. An
>>>>>>>>>> additional build task would then execute all of the FlakyTests
>>>>>>>>>> separately. This would hopefully help us get to a point where we can
>>>>>>>>>> depend on our primary testing tasks staying green 100% of the time. We
>>>>>>>>>> would then prioritize fixing the FlakyTests and, one by one, removing
>>>>>>>>>> the FlakyTest category from them.
>>>>>>>>>>
>>>>>>>>>> I would also suggest that we execute the FlakyTests with "forkEvery 1"
>>>>>>>>>> to give each test a clean JVM or set of DistributedTest JVMs. That
>>>>>>>>>> would hopefully decrease the chance of a GC pause or test pollution
>>>>>>>>>> causing flickering failures.
>>>>>>>>>> 
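
A rough sketch of what that exclude/forkEvery split could look like in a Gradle
build script. The task name, the category's fully qualified name, and the fact
that the primary task is simply called "test" are assumptions for illustration,
not the project's actual configuration; a real build would also wire the flaky
task to the test classpath:

    // Keep the primary test task free of flaky tests.
    test {
      useJUnit {
        excludeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
      }
    }

    // Run the flaky tests separately, with a fresh JVM for each test class.
    task flakyTest(type: Test) {
      useJUnit {
        includeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
      }
      forkEvery 1
    }
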
>>>>>>>>>> Having reviewed lots of test code and failure stacks, I believe that
>>>>>>>>>> the primary causes of FlakyTests are timing sensitivity (thread sleeps,
>>>>>>>>>> nothing that waits for async activity, or timeouts and sleeps that are
>>>>>>>>>> insufficient on a busy CPU, under heavy I/O, or during a GC pause) and
>>>>>>>>>> random ports via AvailablePort (instead of using zero to get an
>>>>>>>>>> ephemeral port).
>>>>>>>>>>
>>>>>>>>>> Opinions or ideas? Hate it? Love it?
>>>>>>>>>>
>>>>>>>>>> -Kirk
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> 
>>>>>>>>> ~/William
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>> 
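
On the AvailablePort point quoted above, a small self-contained sketch of the
"bind to zero" alternative. Plain java.net is used here purely for illustration;
the actual fix would be applied wherever the test or server picks its port:

    import java.io.IOException;
    import java.net.ServerSocket;

    public class EphemeralPortExample {
      public static void main(String[] args) throws IOException {
        // Asking for a "free" port up front (the AvailablePort pattern) is racy:
        // another process can bind it before the test does, causing BindException.
        // Binding to port 0 lets the OS assign an ephemeral port atomically; the
        // test then reads back the actual port and passes it to whatever needs it.
        try (ServerSocket socket = new ServerSocket(0)) {
          int port = socket.getLocalPort();
          System.out.println("listening on ephemeral port " + port);
        }
      }
    }
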


Re: Next steps with flickering tests

Posted by Kirk Lund <kl...@pivotal.io>.
Looks like those tickets were filed since GEODE-1233 was completed.

-Kirk


On Mon, May 2, 2016 at 1:42 PM, Dan Smith <ds...@pivotal.io> wrote:

> testMultipleCacheServer *is* annotated as a flaky test. Maybe you aren't
> actually excluding anything?
>
> I'm surprised testTombstones is not annotated with flaky test. We have at
> least 3 bugs all related to this method that are still open - GEODE-1285,
> GEODE-1332, GEODE-1287.
>
> -Dan

Re: Next steps with flickering tests

Posted by Dan Smith <ds...@pivotal.io>.
testMultipleCacheServer *is* annotated as a flaky test. Maybe you aren't
actually excluding anything?

I'm surprised testTombstones is not annotated with flaky test. We have at
least 3 bugs all related to this method that are still open - GEODE-1285,
GEODE-1332, GEODE-1287.

-Dan



