Posted to dev@geode.apache.org by Anthony Baker <ab...@pivotal.io> on 2016/05/02 20:25:50 UTC
Re: Next steps with flickering tests
I have results from 10 runs of all the tests excluding @FlakyTest. These are the only failures:
ubuntu@ip-172-31-44-240:~$ grep FAILED incubator-geode/nohup.out | grep gemfire
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.cache30.DistributedAckPersistentRegionCCEDUnitTest > testTombstones FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationHA FAILED
Anthony
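The exclusion Anthony ran, and the split proposed later in the thread, can be wired up in Gradle roughly as below. This is a minimal sketch: the task layout and the FlakyTest category's fully qualified name are assumed here, not copied from the actual Geode build scripts.

```groovy
// Sketch only: the category FQCN and task names are assumptions,
// not taken from the real Geode build.
test {
  useJUnit {
    // main suite runs without tests tagged @Category(FlakyTest.class)
    excludeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
  }
}

task flakyTest(type: Test) {
  useJUnit {
    // separate task that runs only the flaky tests
    includeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
  }
  forkEvery 1  // fresh JVM per test class, as proposed in the thread
}
```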
> On Apr 27, 2016, at 7:22 PM, Kirk Lund <kl...@pivotal.io> wrote:
>
> We currently have over 10,000 tests but only about 147 are annotated with
> FlakyTest. It probably wouldn't cause precheckin to take much longer. My
> main argument for separating the FlakyTests into their own Jenkins build
> job is to get the main build job 100% green while we know the FlakyTest
> build job might "flicker".
>
> -Kirk
>
>
> On Tue, Apr 26, 2016 at 1:58 PM, Udo Kohlmeyer <uk...@pivotal.io>
> wrote:
>
>> Depending on the number of "flaky" tests, this should not increase the
>> time too much.
>> I foresee these "flaky" tests being few and far between. Over time I
>> imagine this would be a last resort if we cannot fix the test or even
>> improve the test harness to have a clean test space for each test.
>>
>> --Udo
>>
>>
>> On 27/04/2016 6:42 am, Jens Deppe wrote:
>>
>>> Won't running the Flakes with forkEvery 1 extend precheckin by a fair
>>> bit? I'd prefer to see two separate builds running.
>>>
>>> On Tue, Apr 26, 2016 at 11:53 AM, Kirk Lund <kl...@pivotal.io> wrote:
>>>
>>>> I'm in favor of running the FlakyTests together at the end of precheckin,
>>>> using forkEvery 1 on them too.
>>>>
>>>> What about running two nightly builds? One that runs all the non-flaky
>>>> UnitTests, IntegrationTests, and DistributedTests, plus another nightly
>>>> build that runs only FlakyTests? We can run Jenkins jobs on our local
>>>> machines that separate FlakyTests out into their own job too, but I'd like
>>>> to see the main nightly build go to 100% green (if that's even possible
>>>> without encountering many more flickering tests).
>>>>
>>>> -Kirk
>>>>
>>>>
>>>> On Tue, Apr 26, 2016 at 11:02 AM, Dan Smith <ds...@pivotal.io> wrote:
>>>>
>>>>> +1 for separating these out and running them with forkEvery 1.
>>>>>
>>>>> I think they should probably still run as part of precheckin and the
>>>>> nightly builds though. We don't want this to turn into essentially
>>>>> disabling and ignoring these tests.
>>>>>
>>>>> -Dan
>>>>>
>>>>> On Tue, Apr 26, 2016 at 10:28 AM, Kirk Lund <kl...@pivotal.io> wrote:
>>>>>
>>>>>> Also, I don't think there's much value in continuing to use the "CI"
>>>>>> label. If a test fails in Jenkins, then run the test to see if it fails
>>>>>> consistently. If it doesn't, it's flaky. The developer looking at it
>>>>>> should try to determine the cause of the failure (i.e., "it uses thread
>>>>>> sleeps or random ports with BindExceptions or has short timeouts with
>>>>>> probable GC pause") and include that info when adding the FlakyTest
>>>>>> annotation and filing a Jira bug with the Flaky label. If the test fails
>>>>>> consistently, then file a Jira bug without the Flaky label.
>>>>>>
>>>>>> -Kirk
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 26, 2016 at 10:24 AM, Kirk Lund <kl...@pivotal.io> wrote:
>>>>>>
>>>>>>> There are quite a few test classes that have multiple test methods
>>>>>>> which are annotated with the FlakyTest category.
>>>>>>>
>>>>>>> More thoughts:
>>>>>>>
>>>>>>> In general, I think that if any given test fails intermittently then it
>>>>>>> is a FlakyTest. A good test should either pass or fail consistently.
>>>>>>> After annotating a test method with FlakyTest, the developer should then
>>>>>>> add the Flaky label to the corresponding Jira ticket. What we then do
>>>>>>> with the Jira tickets (i.e., fix them) is probably more important than
>>>>>>> deciding if a test is flaky or not.
>>>>>>>
>>>>>>> Rather than trying to come up with some flaky process for determining if
>>>>>>> a given test is flaky (i.e., "does it have thread sleeps?"), it would be
>>>>>>> better to have a wiki page that has examples of flakiness and how to fix
>>>>>>> them ("if the test has thread sleeps, then switch to using Awaitility
>>>>>>> and do this...").
>>>>>>>
>>>>>>> -Kirk
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Apr 25, 2016 at 10:51 PM, Anthony Baker <ab...@pivotal.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Kirk!
>>>>>>>>
>>>>>>>> ~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . | grep
>>>>>>>> -v Binary | wc -l | xargs echo "Flake factor:"
>>>>>>>> Flake factor: 136
>>>>>>>> Anthony
>>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 25, 2016, at 9:45 PM, William Markito <wm...@pivotal.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Are we also planning to automate the additional build task somehow?
>>>>>>>>>
>>>>>>>>> I'd also suggest creating a wiki page with some stats (like how many
>>>>>>>>> FlakyTests we currently have) and the idea behind this effort so we
>>>>>>>>> can keep track and see how it's evolving over time.
>>>>>>>>>
>>>>>>>>> On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <kl...@pivotal.io>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> After completing GEODE-1233, all currently known flickering tests
>>>>>>>>>> are now annotated with our FlakyTest JUnit Category.
>>>>>>>>>>
>>>>>>>>>> In an effort to divide our build up into multiple build pipelines
>>>>>>>>>> that are sequential and dependable, we could consider excluding
>>>>>>>>>> FlakyTests from the primary integrationTest and distributedTest
>>>>>>>>>> tasks. An additional build task would then execute all of the
>>>>>>>>>> FlakyTests separately. This would hopefully help us get to a point
>>>>>>>>>> where we can depend on our primary testing tasks staying green 100%
>>>>>>>>>> of the time. We would then prioritize fixing the FlakyTests and,
>>>>>>>>>> one by one, removing the FlakyTest category from them.
>>>>>>>>>>
>>>>>>>>>> I would also suggest that we execute the FlakyTests with "forkEvery
>>>>>>>>>> 1" to give each test a clean JVM or set of DistributedTest JVMs.
>>>>>>>>>> That would hopefully decrease the chance of a GC pause or test
>>>>>>>>>> pollution causing flickering failures.
>>>>>>>>>>
>>>>>>>>>> Having reviewed lots of test code and failure stacks, I believe that
>>>>>>>>>> the primary causes of FlakyTests are timing sensitivity (thread
>>>>>>>>>> sleeps, nothing that waits for async activity, or timeouts and
>>>>>>>>>> sleeps that are insufficient on a busy CPU, under I/O load, or
>>>>>>>>>> during a GC pause) and random ports via AvailablePort (instead of
>>>>>>>>>> using zero for an ephemeral port).
>>>>>>>>>>
>>>>>>>>>> Opinions or ideas? Hate it? Love it?
>>>>>>>>>>
>>>>>>>>>> -Kirk
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> ~/William
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>
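Kirk's suggested fix for thread sleeps, switching to Awaitility, amounts to polling a condition until it holds, up to a generous deadline, instead of sleeping for a fixed time. Here is a JDK-only sketch of that idea; the class and names are invented for illustration (Awaitility itself exposes this as await().atMost(...).until(...)):

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Minimal sketch of the Awaitility pattern using only the JDK:
// poll a condition until it holds or a deadline passes, rather
// than sleeping a fixed amount and hoping the async work finished.
public class AwaitSketch {
  static void awaitTrue(BooleanSupplier condition, long timeoutMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() > deadline) {
        throw new AssertionError("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(50); // short poll interval, not a blind multi-second sleep
    }
  }

  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    // condition becomes true after ~100 ms; the wait returns shortly after
    awaitTrue(() -> System.currentTimeMillis() - start > 100, 5000);
    System.out.println("condition met");
  }
}
```

The generous timeout only matters on a slow or loaded machine; on a healthy run the test finishes as soon as the condition holds.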
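The AvailablePort problem mentioned at the end of the quoted thread is a check-then-bind race: another process can claim the port between the availability check and the actual bind, producing a BindException. Binding to port 0 sidesteps the race by letting the OS pick a free port, as in this small sketch:

```java
import java.net.ServerSocket;

// Sketch of the "use zero for an ephemeral port" suggestion: bind to
// port 0 so the OS assigns a free port atomically, then ask the socket
// which port it actually got.
public class EphemeralPortExample {
  public static void main(String[] args) throws Exception {
    try (ServerSocket server = new ServerSocket(0)) { // 0 = ephemeral port
      int port = server.getLocalPort();               // the port the OS chose
      System.out.println(port > 0);
    }
  }
}
```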
Re: Next steps with flickering tests
Posted by Kirk Lund <kl...@pivotal.io>.
Looks like those tickets were filed since GEODE-1233 was completed.
-Kirk
On Mon, May 2, 2016 at 1:42 PM, Dan Smith <ds...@pivotal.io> wrote:
> testMultipleCacheServer *is* annotated as a flaky test. Maybe you aren't
> actually excluding anything?
>
> I'm surprised testTombstones is not annotated as a flaky test. We have at
> least 3 bugs all related to this method that are still open: GEODE-1285,
> GEODE-1332, and GEODE-1287.
>
> -Dan
Re: Next steps with flickering tests
Posted by Dan Smith <ds...@pivotal.io>.
testMultipleCacheServer *is* annotated as a flaky test. Maybe you aren't
actually excluding anything?
I'm surprised testTombstones is not annotated as a flaky test. We have at
least 3 bugs all related to this method that are still open: GEODE-1285,
GEODE-1332, and GEODE-1287.
-Dan