Posted to dev@geode.apache.org by Kirk Lund <kl...@pivotal.io> on 2016/04/26 03:54:10 UTC

Next steps with flickering tests

After completing GEODE-1233, all currently known flickering tests are now
annotated with our FlakyTest JUnit Category.
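
For anyone not familiar with the category mechanism, the annotation looks
roughly like this (a minimal sketch; the class, method, and import path
below are illustrative only):

    import org.junit.Test;
    import org.junit.experimental.categories.Category;

    // Geode's marker interface used as the JUnit category for flaky tests.
    import com.gemstone.gemfire.test.junit.categories.FlakyTest;

    public class ExampleRegionDUnitTest {

      // Tagging only the flaky method lets the rest of the class keep
      // running in the primary integrationTest/distributedTest tasks.
      @Category(FlakyTest.class)
      @Test
      public void testAsyncEventDelivery() {
        // test body that has been observed to fail intermittently
      }
    }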

In an effort to divide our build up into multiple build pipelines that are
sequential and dependable, we could consider excluding FlakyTests from the
primary integrationTest and distributedTest tasks. An additional build task
would then execute all of the FlakyTests separately. This would hopefully
help us get to a point where we can depend on our primary testing tasks
staying green 100% of the time. We would then prioritize fixing the
FlakyTests and one by one removing the FlakyTest category from them.

I would also suggest that we execute the FlakyTests with "forkEvery 1" to
give each test a clean JVM or set of DistributedTest JVMs. That would
hopefully decrease the chance of a GC pause or test pollution causing
flickering failures.

Having reviewed lots of test code and failure stacks, I believe that the
primary causes of FlakyTests are timing sensitivity (thread sleeps, code
that never waits for async activity, and timeouts or sleeps that are too
short on a busy CPU, under heavy I/O, or during a GC pause) and random
ports obtained via AvailablePort (instead of using zero to get an
ephemeral port).
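
For example, a sleep-based wait versus a polling wait (a minimal sketch
only; Awaitility is one option, "region" and the key are placeholders,
and the Awaitility package name depends on the version we pull in):

    import static org.junit.Assert.assertNotNull;
    import java.util.concurrent.TimeUnit;
    import com.jayway.awaitility.Awaitility;  // org.awaitility.Awaitility in newer releases

    // Fragile: assumes the async event always arrives within 5 seconds,
    // which a GC pause or a busy CI host can easily violate.
    Thread.sleep(5000);
    assertNotNull(region.get("key"));

    // Sturdier: poll until the condition holds, with a generous upper bound.
    Awaitility.await().atMost(30, TimeUnit.SECONDS)
        .until(() -> region.get("key") != null);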

Opinions or ideas? Hate it? Love it?

-Kirk

Re: Next steps with flickering tests

Posted by Kirk Lund <kl...@pivotal.io>.
That would probably work for any test that doesn't require @FixMethodOrder,
but I think we only have half a dozen of those and I believe those are
IntegrationTests (none of them DistributedTests).

-Kirk


On Mon, Apr 25, 2016 at 7:19 PM, Udo Kohlmeyer <uk...@pivotal.io>
wrote:

> +1 on forkEvery 1.
>
> Maybe the DUnit VM + ProcessManager could even be tuned and renew VMs
> after every test.

Re: Next steps with flickering tests

Posted by Udo Kohlmeyer <uk...@pivotal.io>.
+1 on forkEvery 1.

Maybe the DUnit VM + ProcessManager could even be tuned and renew VMs 
after every test.



Re: Next steps with flickering tests

Posted by Jianxia Chen <jc...@pivotal.io>.
+1


Re: Next steps with flickering tests

Posted by Kenneth Howe <kh...@pivotal.io>.
+1 

Additional thoughts - if it’s necessary to run FlakyTests with “forkEvery 1” for tests to pass consistently, then they're still flaky tests. This may be useful when analyzing intermittent test failures, but I don’t see it as a good thing for an additional CI build task.

Ken

> On Apr 26, 2016, at 10:28 AM, Kirk Lund <kl...@pivotal.io> wrote:
> 
> Also, I don't think there's much value continuing to use the "CI" label. If
> a test fails in Jenkins, then run the test to see if it fails consistently.
> If it doesn't, it's flaky. The developer looking at it should try to
> determine the cause of it failing (ie, "it uses thread sleeps or random
> ports with BindExceptions or has short timeouts with probable GC pause")
> and include that info when adding the FlakyTest annotation and filing a
> Jira bug with the Flaky label. If the test fails consistently, then file a
> Jira bug without the Flaky label.
> 
> -Kirk
> 
> 
> On Tue, Apr 26, 2016 at 10:24 AM, Kirk Lund <kl...@pivotal.io> wrote:
> 
>> There are quite a few test classes that have multiple test methods which
>> are annotated with the FlakyTest category.
>> 
>> More thoughts:
>> 
>> In general, I think that if any given test fails intermittently then it is
>> a FlakyTest. A good test should either pass or fail consistently. After
>> annotating a test method with FlakyTest, the developer should then add the
>> Flaky label to the corresponding Jira ticket. What we then do with the Jira
>> tickets (ie, fix them) is probably more important than deciding if a test
>> is flaky or not.
>> 
>> Rather than try to come up with some flaky process for determining if a
>> given test is flaky (ie, "does it have thread sleeps?"), it would be better
>> to have a wiki page that has examples of flakiness and how to fix them ("if
>> the test has thread sleeps, then switch to using Awaitility and do
>> this...").
>> 
>> -Kirk
>> 
>> 
>> On Mon, Apr 25, 2016 at 10:51 PM, Anthony Baker <ab...@pivotal.io> wrote:
>> 
>>> Thanks Kirk!
>>> 
>>> ~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . | grep -v
>>> Binary | wc -l | xargs echo "Flake factor:"
>>> Flake factor: 136
>>> 
>>> Anthony
>>> 
>>> 
>>>> On Apr 25, 2016, at 9:45 PM, William Markito <wm...@pivotal.io> wrote:
>>>> 
>>>> +1
>>>> 
>>>> Are we also planning to automate the additional build task somehow?
>>>> 
>>>> I'd also suggest creating a wiki page with some stats (like how many
>>>> FlakyTests we currently have) and the idea behind this effort so we can
>>>> keep track and see how it's evolving over time.
>>>> 


Re: Next steps with flickering tests

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.
>> If we then want the test to create clients that connect to that server we'll need a
>> mechanism to find out what the in-use port is.
We should do this mass-change to avoid bind-exception failures. Similarly for the JMX port. I can see that many tests are using this problematic pattern.
For a single-locator test we could probably also add port "0" functionality. Let us know if we want to add this.





  

Re: Next steps with flickering tests

Posted by Kirk Lund <kl...@pivotal.io>.
Yes, but the commands for checking status will then always report zero, the
mbeans will report zero, and that is only for the server port. If we then
want the test to create clients that connect to that server we'll need a
mechanism to find out what the in-use port is.

We have a similar problem with locator (and other components). The locator
will fail if you try to pass in zero for a locator port.

So we'll need to alter more code to make sure we can use zero and that
everything that then reports what the port is will fetch the actual port
in use rather than using zero. But yes, what you're describing is exactly
the solution I think we need to move towards for random ports and
BindException failures.

-Kirk


On Fri, Apr 29, 2016 at 4:10 PM, Hitesh Khamesra <
hiteshk25@yahoo.com.invalid> wrote:

> Kirk,
> have we considered the following pattern change?
> from -->
>     int port = AvailablePort.getRandomAvailablePort(AvailablePort.SOCKET);
>     server.setPort(port);
> to -->
>     server.setPort(0);
>     int port = server.getPort();
> This should take care of the "bind address" issue.

Re: Next steps with flickering tests

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.
Kirk,
have we considered the following pattern change?
from -->
    int port = AvailablePort.getRandomAvailablePort(AvailablePort.SOCKET);
    server.setPort(port);
to -->
    server.setPort(0);
    int port = server.getPort();
This should take care of the "bind address" issue.
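
A minimal sketch of that pattern against the CacheServer API (assuming an
existing Cache instance named "cache"; exception handling omitted, and the
pre-rename com.gemstone package names are used for illustration):

    import com.gemstone.gemfire.cache.Cache;
    import com.gemstone.gemfire.cache.server.CacheServer;

    // Let the OS pick an ephemeral port instead of probing with AvailablePort.
    CacheServer server = cache.addCacheServer();
    server.setPort(0);
    server.start();  // binds to a free port chosen by the OS (throws IOException)

    // Read back the port that was actually bound, for clients and assertions.
    int port = server.getPort();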

      From: Kirk Lund <kl...@pivotal.io>
 To: geode <de...@geode.incubator.apache.org> 
 Sent: Wednesday, April 27, 2016 7:22 PM
 Subject: Re: Next steps with flickering tests
   
We currently have over 10,000 tests but only about 147 are annotated with
FlakyTest. It probably wouldn't cause precheckin to take much longer. My
main argument for separating the FlakyTests into their own Jenkins build
job is to get the main build job 100% green while we know the FlakyTest
build job might "flicker".

-Kirk


On Tue, Apr 26, 2016 at 1:58 PM, Udo Kohlmeyer <uk...@pivotal.io>
wrote:

> Depending on the amount of "flaky" tests, this should not increase the
> time too much.
> I foresee these "flaky" tests to be few and far between. Over time I
> imagine this would be a last resort if we cannot fix the test or even
> improve the test harness to have a clean test space for each test.
>
> --Udo
>
>
> On 27/04/2016 6:42 am, Jens Deppe wrote:
>
>> By running the Flakes with forkEvery 1, won't it extend precheckin by a
>> fair bit? I'd prefer to see two separate builds running.
>>
>> On Tue, Apr 26, 2016 at 11:53 AM, Kirk Lund <kl...@pivotal.io> wrote:
>>
>> I'm in favor of running the FlakyTests together at the end of precheckin
>>> using forkEvery 1 on them too.
>>>
>>> What about running two nightly builds? One that runs all the non-flaky
>>> UnitTests, IntegrationTests and DistributedTests. Plus another nightly
>>> build that runs only FlakyTests? We can run Jenkins jobs on our local
>>> machines that separates FlakyTests out into its own job too, but I'd like
>>> to see the main nightly build go to 100% green (if that's even possible
>>> without encountering many more flickering tests).
>>>
>>> -Kirk
>>>
>>>
>>> On Tue, Apr 26, 2016 at 11:02 AM, Dan Smith <ds...@pivotal.io> wrote:
>>>
>>> +1 for separating these out and running them with forkEvery 1.
>>>>
>>>> I think they should probably still run as part of precheckin and the
>>>> nightly builds though. We don't want this to turn into essentially
>>>> disabling and ignoring these tests.
>>>>
>>>> -Dan
>>>>
>>>> On Tue, Apr 26, 2016 at 10:28 AM, Kirk Lund <kl...@pivotal.io> wrote:
>>>>
>>>>> Also, I don't think there's much value continuing to use the "CI"
>>>>>
>>>> label.
>>>
>>>> If
>>>>
>>>>> a test fails in Jenkins, then run the test to see if it fails
>>>>>
>>>> consistently.
>>>>
>>>>> If it doesn't, it's flaky. The developer looking at it should try to
>>>>> determine the cause of it failing (ie, "it uses thread sleeps or random
>>>>> ports with BindExceptions or has short timeouts with probable GC
>>>>>
>>>> pause")
>>>
>>>> and include that info when adding the FlakyTest annotation and filing a
>>>>> Jira bug with the Flaky label. If the test fails consistently, then
>>>>>
>>>> file
>>>
>>>> a
>>>>
>>>>> Jira bug without the Flaky label.
>>>>>
>>>>> -Kirk
>>>>>
>>>>>
>>>>> On Tue, Apr 26, 2016 at 10:24 AM, Kirk Lund <kl...@pivotal.io> wrote:
>>>>>
>>>>> There are quite a few test classes that have multiple test methods
>>>>>>
>>>>> which
>>>
>>>> are annotated with the FlakyTest category.
>>>>>>
>>>>>> More thoughts:
>>>>>>
>>>>>> In general, I think that if any given test fails intermittently then
>>>>>>
>>>>> it
>>>
>>>> is
>>>>
>>>>> a FlakyTest. A good test should either pass or fail consistently.
>>>>>>
>>>>> After
>>>
>>>> annotating a test method with FlakyTest, the developer should then add
>>>>>>
>>>>> the
>>>>
>>>>> Flaky label to corresponding Jira ticket. What we then do with the
>>>>>>
>>>>> Jira
>>>
>>>> tickets (ie, fix them) is probably more important than deciding if a
>>>>>>
>>>>> test
>>>>
>>>>> is flaky or not.
>>>>>>
>>>>>> Rather than try to come up with some flaky process for determining if
>>>>>>
>>>>> a
>>>
>>>> given test is flaky (ie, "does it have thread sleeps?"), it would be
>>>>>>
>>>>> better
>>>>
>>>>> to have a wiki page that has examples of flakiness and how to fix them
>>>>>>
>>>>> ("if
>>>>
>>>>> the test has thread sleeps, then switch to using Awaitility and do
>>>>>> this...").
>>>>>>
>>>>>> -Kirk
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 25, 2016 at 10:51 PM, Anthony Baker <ab...@pivotal.io>
>>>>>>
>>>>> wrote:
>>>>
>>>>> Thanks Kirk!
>>>>>>>
>>>>>>> ~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . | grep
>>>>>>>
>>>>>> -v
>>>>
>>>>> Binary | wc -l | xargs echo "Flake factor:"
>>>>>>> Flake factor: 136
>>>>>>>
>>>>>>> Anthony
>>>>>>>
>>>>>>>
>>>>>>> On Apr 25, 2016, at 9:45 PM, William Markito <wm...@pivotal.io>
>>>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> Are we also planning to automate the additional build task somehow
>>>>>>>>
>>>>>>> ?
>>>
>>>> I'd also suggest creating a wiki page with some stats (like how
>>>>>>>>
>>>>>>> many
>>>
>>>> FlakyTests we currently have) and the idea behind this effort so we
>>>>>>>>
>>>>>>> can
>>>>
>>>>> keep track and see how it's evolving over time.
>>>>>>>>
>>>>>>>> On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <kl...@pivotal.io>
>>>>>>>>
>>>>>>> wrote:
>>>
>>>> After completing GEODE-1233, all currently known flickering tests
>>>>>>>>>
>>>>>>>> are
>>>>
>>>>> now
>>>>>>>
>>>>>>>> annotated with our FlakyTest JUnit Category.
>>>>>>>>>
>>>>>>>>> In an effort to divide our build up into multiple build pipelines
>>>>>>>>>
>>>>>>>> that
>>>>
>>>>> are
>>>>>>>
>>>>>>>> sequential and dependable, we could consider excluding FlakyTests
>>>>>>>>>
>>>>>>>> from
>>>>
>>>>> the
>>>>>>>
>>>>>>>> primary integrationTest and distributedTest tasks. An additional
>>>>>>>>>
>>>>>>>> build
>>>>
>>>>> task
>>>>>>>
>>>>>>>> would then execute all of the FlakyTests separately. This would
>>>>>>>>>
>>>>>>>> hopefully
>>>>>>>
>>>>>>>> help us get to a point where we can depend on our primary testing
>>>>>>>>>
>>>>>>>> tasks
>>>>
>>>>> staying green 100% of the time. We would then prioritize fixing
>>>>>>>>>
>>>>>>>> the
>>>
>>>> FlakyTests and one by one removing the FlakyTest category from
>>>>>>>>>
>>>>>>>> them.
>>>
>>>> I would also suggest that we execute the FlakyTests with
>>>>>>>>>
>>>>>>>> "forkEvery
>>>
>>>> 1"
>>>>
>>>>> to
>>>>>>>
>>>>>>>> give each test a clean JVM or set of DistributedTest JVMs. That
>>>>>>>>>
>>>>>>>> would
>>>>
>>>>> hopefully decrease the chance of a GC pause or test pollution
>>>>>>>>>
>>>>>>>> causing
>>>>
>>>>> flickering failures.
>>>>>>>>>
>>>>>>>>> Having reviewed lots of test code and failure stacks, I believe
>>>>>>>>>
>>>>>>>> that
>>>
>>>> the
>>>>>>>
>>>>>>>> primary causes of FlakyTests are timing sensitivity (thread sleeps
>>>>>>>>>
>>>>>>>> or
>>>>
>>>>> nothing that waits for async activity, timeouts or sleeps that are
>>>>>>>>> insufficient on busy CPU or I/O or during due GC pause) and random
>>>>>>>>>
>>>>>>>> ports
>>>>>>>
>>>>>>>> via AvailablePort (instead of using zero for ephemeral port).
>>>>>>>>>
>>>>>>>>> Opinions or ideas? Hate it? Love it?
>>>>>>>>>
>>>>>>>>> -Kirk
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> ~/William
>>>>>>>>
>>>>>>>
>>>>>>>
>


  

Re: Next steps with flickering tests

Posted by Kirk Lund <kl...@pivotal.io>.
Looks like those tickets were filed since GEODE-1233 was completed.

-Kirk


On Mon, May 2, 2016 at 1:42 PM, Dan Smith <ds...@pivotal.io> wrote:

> testMultipleCacheServer *is* annotated as a flaky test. Maybe you aren't
> actually excluding anything?
>
> I'm surprised testTombstones is not annotated as a flaky test. We have at
> least 3 bugs all related to this method that are still open - GEODE-1285,
> GEODE-1332, GEODE-1287.
>
> -Dan

Re: Next steps with flickering tests

Posted by Dan Smith <ds...@pivotal.io>.
testMultipleCacheServer *is* annotated as a flaky test. Maybe you aren't
actually excluding anything?

I'm surprised testTombstones is not annotated as a flaky test. We have at
least 3 bugs all related to this method that are still open - GEODE-1285,
GEODE-1332, GEODE-1287.

-Dan





Re: Next steps with flickering tests

Posted by Anthony Baker <ab...@pivotal.io>.
I have results from 10 runs of all the tests excluding @FlakyTest.  These are the only failures:

ubuntu@ip-172-31-44-240:~$ grep FAILED incubator-geode/nohup.out | grep gemfire
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.cache30.DistributedAckPersistentRegionCCEDUnitTest > testTombstones FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.CacheClientNotifierDUnitTest > testMultipleCacheServer FAILED
com.gemstone.gemfire.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationHA FAILED

Anthony

> On Apr 27, 2016, at 7:22 PM, Kirk Lund <kl...@pivotal.io> wrote:
> 
> We currently have over 10,000 tests but only about 147 are annotated with
> FlakyTest. It probably wouldn't cause precheckin to take much longer. My
> main argument for separating the FlakyTests into their own Jenkins build
> job is to get the main build job 100% green while we know the FlakyTest
> build job might "flicker".
> 
> -Kirk
> 
> 
> On Tue, Apr 26, 2016 at 1:58 PM, Udo Kohlmeyer <uk...@pivotal.io>
> wrote:
> 
>> Depending on the amount of "flaky" tests, this should not increase the
>> time too much.
>> I forsee these "flaky" tests to be few and far in between. Over time I
>> imagine this would be a last resort if we cannot fix the test or even
>> improve the test harness to have a clean test space for each test.
>> 
>> --Udo
>> 
>> 
>> On 27/04/2016 6:42 am, Jens Deppe wrote:
>> 
>>> By running the Flakes with forkEvery 1 won't it extend precheckin by a
>>> fair
>>> bit? I'd prefer to see two separate builds running.
>>> 
>>> On Tue, Apr 26, 2016 at 11:53 AM, Kirk Lund <kl...@pivotal.io> wrote:
>>> 
>>> I'm in favor of running the FlakyTests together at the end of precheckin
>>>> using forkEvery 1 on them too.
>>>> 
>>>> What about running two nightly builds? One that runs all the non-flaky
>>>> UnitTests, IntegrationTests and DistributedTests. Plus another nightly
>>>> build that runs only FlakyTests? We can run Jenkins jobs on our local
>>>> machines that separates FlakyTests out into its own job too, but I'd like
>>>> to see the main nightly build go to 100% green (if that's even possible
>>>> without encounter many more flickering tests).
>>>> 
>>>> -Kirk
>>>> 
>>>> 
>>>> On Tue, Apr 26, 2016 at 11:02 AM, Dan Smith <ds...@pivotal.io> wrote:
>>>> 
>>>> +1 for separating these out and running them with forkEvery 1.
>>>>> 
>>>>> I think they should probably still run as part of precheckin and the
>>>>> nightly builds though. We don't want this to turn into essentially
>>>>> disabling and ignoring these tests.
>>>>> 
>>>>> -Dan
>>>>> 


Re: Next steps with flickering tests

Posted by Kirk Lund <kl...@pivotal.io>.
We currently have over 10,000 tests but only about 147 are annotated with
FlakyTest. It probably wouldn't cause precheckin to take much longer. My
main argument for separating the FlakyTests into their own Jenkins build
job is to get the main build job 100% green while we know the FlakyTest
build job might "flicker".

-Kirk


On Tue, Apr 26, 2016 at 1:58 PM, Udo Kohlmeyer <uk...@pivotal.io>
wrote:

> Depending on the amount of "flaky" tests, this should not increase the
> time too much.
> I forsee these "flaky" tests to be few and far in between. Over time I
> imagine this would be a last resort if we cannot fix the test or even
> improve the test harness to have a clean test space for each test.
>
> --Udo
>
>
> On 27/04/2016 6:42 am, Jens Deppe wrote:
>
>> By running the Flakes with forkEvery 1 won't it extend precheckin by a
>> fair
>> bit? I'd prefer to see two separate builds running.
>>
>> On Tue, Apr 26, 2016 at 11:53 AM, Kirk Lund <kl...@pivotal.io> wrote:
>>
>> I'm in favor of running the FlakyTests together at the end of precheckin
>>> using forkEvery 1 on them too.
>>>
>>> What about running two nightly builds? One that runs all the non-flaky
>>> UnitTests, IntegrationTests and DistributedTests. Plus another nightly
>>> build that runs only FlakyTests? We can run Jenkins jobs on our local
>>> machines that separates FlakyTests out into its own job too, but I'd like
>>> to see the main nightly build go to 100% green (if that's even possible
>>> without encounter many more flickering tests).
>>>
>>> -Kirk
>>>
>>>
>>> On Tue, Apr 26, 2016 at 11:02 AM, Dan Smith <ds...@pivotal.io> wrote:
>>>
>>> +1 for separating these out and running them with forkEvery 1.
>>>>
>>>> I think they should probably still run as part of precheckin and the
>>>> nightly builds though. We don't want this to turn into essentially
>>>> disabling and ignoring these tests.
>>>>
>>>> -Dan
>>>>

Re: Next steps with flickering tests

Posted by Udo Kohlmeyer <uk...@pivotal.io>.
Depending on the number of "flaky" tests, this should not increase the
time too much.
I foresee these "flaky" tests being few and far between. Over time I
imagine this would be a last resort if we cannot fix the test or even
improve the test harness to have a clean test space for each test.

--Udo

On 27/04/2016 6:42 am, Jens Deppe wrote:
> By running the Flakes with forkEvery 1 won't it extend precheckin by a fair
> bit? I'd prefer to see two separate builds running.
>
> On Tue, Apr 26, 2016 at 11:53 AM, Kirk Lund <kl...@pivotal.io> wrote:
>
>> I'm in favor of running the FlakyTests together at the end of precheckin
>> using forkEvery 1 on them too.
>>
>> What about running two nightly builds? One that runs all the non-flaky
>> UnitTests, IntegrationTests and DistributedTests. Plus another nightly
>> build that runs only FlakyTests? We can run Jenkins jobs on our local
>> machines that separates FlakyTests out into its own job too, but I'd like
>> to see the main nightly build go to 100% green (if that's even possible
>> without encounter many more flickering tests).
>>
>> -Kirk
>>
>>
>> On Tue, Apr 26, 2016 at 11:02 AM, Dan Smith <ds...@pivotal.io> wrote:
>>
>>> +1 for separating these out and running them with forkEvery 1.
>>>
>>> I think they should probably still run as part of precheckin and the
>>> nightly builds though. We don't want this to turn into essentially
>>> disabling and ignoring these tests.
>>>
>>> -Dan
>>>


Re: Next steps with flickering tests

Posted by Jens Deppe <jd...@pivotal.io>.
Won't running the Flakes with forkEvery 1 extend precheckin by a fair
bit? I'd prefer to see two separate builds running.

On Tue, Apr 26, 2016 at 11:53 AM, Kirk Lund <kl...@pivotal.io> wrote:

> I'm in favor of running the FlakyTests together at the end of precheckin
> using forkEvery 1 on them too.
>
> What about running two nightly builds? One that runs all the non-flaky
> UnitTests, IntegrationTests and DistributedTests. Plus another nightly
> build that runs only FlakyTests? We can run Jenkins jobs on our local
> machines that separates FlakyTests out into its own job too, but I'd like
> to see the main nightly build go to 100% green (if that's even possible
> without encounter many more flickering tests).
>
> -Kirk
>
>
> On Tue, Apr 26, 2016 at 11:02 AM, Dan Smith <ds...@pivotal.io> wrote:
>
> > +1 for separating these out and running them with forkEvery 1.
> >
> > I think they should probably still run as part of precheckin and the
> > nightly builds though. We don't want this to turn into essentially
> > disabling and ignoring these tests.
> >
> > -Dan
> >

Re: Next steps with flickering tests

Posted by Kirk Lund <kl...@pivotal.io>.
I'm in favor of running the FlakyTests together at the end of precheckin
using forkEvery 1 on them too.

What about running two nightly builds? One that runs all the non-flaky
UnitTests, IntegrationTests and DistributedTests, plus another nightly
build that runs only FlakyTests. We can run Jenkins jobs on our local
machines that separate FlakyTests out into their own job too, but I'd like
to see the main nightly build go to 100% green (if that's even possible
without encountering many more flickering tests).
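
A rough sketch of how that split might look in the Gradle build (the task
name, the category's fully qualified name, and the wiring here are
assumptions for illustration, not an actual patch):

    // hypothetical task that runs only tests tagged @Category(FlakyTest.class)
    task flakyTest(type: Test) {
        // classpath and test class dirs would mirror the existing test tasks
        useJUnit {
            includeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
        }
        forkEvery 1   // fresh JVM per test class, to limit GC pauses and test pollution
    }

    // while the primary tasks would exclude the category, e.g.:
    integrationTest {
        useJUnit {
            excludeCategories 'com.gemstone.gemfire.test.junit.categories.FlakyTest'
        }
    }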

-Kirk


On Tue, Apr 26, 2016 at 11:02 AM, Dan Smith <ds...@pivotal.io> wrote:

> +1 for separating these out and running them with forkEvery 1.
>
> I think they should probably still run as part of precheckin and the
> nightly builds though. We don't want this to turn into essentially
> disabling and ignoring these tests.
>
> -Dan
>

Re: Next steps with flickering tests

Posted by Anthony Baker <ab...@pivotal.io>.
I think it would be interesting to see results from N runs without flaky tests.

If we get 100% success from those runs, then I see value in splitting out Jenkins jobs.  It gives a known good state that should never break.  We can then whittle down the Flakies over time.

Anthony

> On Apr 26, 2016, at 11:02 AM, Dan Smith <ds...@pivotal.io> wrote:
> 
> I think they should probably still run as part of precheckin and the
> nightly builds though. We don't want this to turn into essentially
> disabling and ignoring these tests.


Re: Next steps with flickering tests

Posted by Dan Smith <ds...@pivotal.io>.
+1 for separating these out and running them with forkEvery 1.

I think they should probably still run as part of precheckin and the
nightly builds though. We don't want this to turn into essentially
disabling and ignoring these tests.

-Dan

On Tue, Apr 26, 2016 at 10:28 AM, Kirk Lund <kl...@pivotal.io> wrote:
> Also, I don't think there's much value continuing to use the "CI" label. If
> a test fails in Jenkins, then run the test to see if it fails consistently.
> If it doesn't, it's flaky. The developer looking at it should try to
> determine the cause of it failing (ie, "it uses thread sleeps or random
> ports with BindExceptions or has short timeouts with probable GC pause")
> and include that info when adding the FlakyTest annotation and filing a
> Jira bug with the Flaky label. If the test fails consistently, then file a
> Jira bug without the Flaky label.
>
> -Kirk
>
>
> On Tue, Apr 26, 2016 at 10:24 AM, Kirk Lund <kl...@pivotal.io> wrote:
>
>> There are quite a few test classes that have multiple test methods which
>> are annotated with the FlakyTest category.
>>
>> More thoughts:
>>
>> In general, I think that if any given test fails intermittently then it is
>> a FlakyTest. A good test should either pass or fail consistently. After
>> annotating a test method with FlakyTest, the developer should then add the
>> Flaky label to corresponding Jira ticket. What we then do with the Jira
>> tickets (ie, fix them) is probably more important than deciding if a test
>> is flaky or not.
>>
>> Rather than try to come up with some flaky process for determining if a
>> given test is flaky (ie, "does it have thread sleeps?"), it would be better
>> to have a wiki page that has examples of flakiness and how to fix them ("if
>> the test has thread sleeps, then switch to using Awaitility and do
>> this...").
>>
>> -Kirk
>>
>>
>> On Mon, Apr 25, 2016 at 10:51 PM, Anthony Baker <ab...@pivotal.io> wrote:
>>
>>> Thanks Kirk!
>>>
>>> ~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . | grep -v
>>> Binary | wc -l | xargs echo "Flake factor:"
>>> Flake factor: 136
>>>
>>> Anthony
>>>
>>>
>>> > On Apr 25, 2016, at 9:45 PM, William Markito <wm...@pivotal.io>
>>> wrote:
>>> >
>>> > +1
>>> >
>>> > Are we also planning to automate the additional build task somehow ?
>>> >
>>> > I'd also suggest creating a wiki page with some stats (like how many
>>> > FlakyTests we currently have) and the idea behind this effort so we can
>>> > keep track and see how it's evolving over time.
>>> >
>>> > On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <kl...@pivotal.io> wrote:
>>> >
>>> >> After completing GEODE-1233, all currently known flickering tests are
>>> now
>>> >> annotated with our FlakyTest JUnit Category.
>>> >>
>>> >> In an effort to divide our build up into multiple build pipelines that
>>> are
>>> >> sequential and dependable, we could consider excluding FlakyTests from
>>> the
>>> >> primary integrationTest and distributedTest tasks. An additional build
>>> task
>>> >> would then execute all of the FlakyTests separately. This would
>>> hopefully
>>> >> help us get to a point where we can depend on our primary testing tasks
>>> >> staying green 100% of the time. We would then prioritize fixing the
>>> >> FlakyTests and one by one removing the FlakyTest category from them.
>>> >>
>>> >> I would also suggest that we execute the FlakyTests with "forkEvery 1"
>>> to
>>> >> give each test a clean JVM or set of DistributedTest JVMs. That would
>>> >> hopefully decrease the chance of a GC pause or test pollution causing
>>> >> flickering failures.
>>> >>
>>> >> Having reviewed lots of test code and failure stacks, I believe that
>>> the
>>> >> primary causes of FlakyTests are timing sensitivity (thread sleeps or
>>> >> nothing that waits for async activity, timeouts or sleeps that are
>>> >> insufficient on busy CPU or I/O or during due GC pause) and random
>>> ports
>>> >> via AvailablePort (instead of using zero for ephemeral port).
>>> >>
>>> >> Opinions or ideas? Hate it? Love it?
>>> >>
>>> >> -Kirk
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> > ~/William
>>>
>>>
>>

Re: Next steps with flickering tests

Posted by Kirk Lund <kl...@pivotal.io>.
Also, I don't think there's much value continuing to use the "CI" label. If
a test fails in Jenkins, then run the test to see if it fails consistently.
If it doesn't, it's flaky. The developer looking at it should try to
determine the cause of the failure (e.g., "it uses thread sleeps or random
ports with BindExceptions, or has short timeouts with a probable GC pause")
and include that info when adding the FlakyTest annotation and filing a
Jira bug with the Flaky label. If the test fails consistently, then file a
Jira bug without the Flaky label.
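
For the "random ports with BindExceptions" case, the usual fix is to stop
pre-picking a "free" port and instead bind to port 0 so the OS assigns an
ephemeral port. A minimal sketch in plain java.net (not our AvailablePort
helper), just to illustrate the idea:

    import java.net.ServerSocket;

    public class EphemeralPortExample {
        public static void main(String[] args) throws Exception {
            // Port 0 asks the OS for any free ephemeral port, so two tests can
            // never race for the same pre-picked port and hit a BindException.
            try (ServerSocket socket = new ServerSocket(0)) {
                int port = socket.getLocalPort(); // hand this to whatever the test starts
                System.out.println("listening on ephemeral port " + port);
            }
        }
    }

Where an API lets the test pass 0 and then query the port it actually bound,
the same idea applies directly.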

-Kirk


On Tue, Apr 26, 2016 at 10:24 AM, Kirk Lund <kl...@pivotal.io> wrote:

> There are quite a few test classes that have multiple test methods which
> are annotated with the FlakyTest category.
>
> More thoughts:
>
> In general, I think that if any given test fails intermittently then it is
> a FlakyTest. A good test should either pass or fail consistently. After
> annotating a test method with FlakyTest, the developer should then add the
> Flaky label to corresponding Jira ticket. What we then do with the Jira
> tickets (ie, fix them) is probably more important than deciding if a test
> is flaky or not.
>
> Rather than try to come up with some flaky process for determining if a
> given test is flaky (ie, "does it have thread sleeps?"), it would be better
> to have a wiki page that has examples of flakiness and how to fix them ("if
> the test has thread sleeps, then switch to using Awaitility and do
> this...").
>
> -Kirk
>
>
> On Mon, Apr 25, 2016 at 10:51 PM, Anthony Baker <ab...@pivotal.io> wrote:
>
>> Thanks Kirk!
>>
>> ~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . | grep -v
>> Binary | wc -l | xargs echo "Flake factor:"
>> Flake factor: 136
>>
>> Anthony
>>
>>
>> > On Apr 25, 2016, at 9:45 PM, William Markito <wm...@pivotal.io>
>> wrote:
>> >
>> > +1
>> >
>> > Are we also planning to automate the additional build task somehow ?
>> >
>> > I'd also suggest creating a wiki page with some stats (like how many
>> > FlakyTests we currently have) and the idea behind this effort so we can
>> > keep track and see how it's evolving over time.
>> >
>> > On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <kl...@pivotal.io> wrote:
>> >
>> >> After completing GEODE-1233, all currently known flickering tests are
>> now
>> >> annotated with our FlakyTest JUnit Category.
>> >>
>> >> In an effort to divide our build up into multiple build pipelines that
>> are
>> >> sequential and dependable, we could consider excluding FlakyTests from
>> the
>> >> primary integrationTest and distributedTest tasks. An additional build
>> task
>> >> would then execute all of the FlakyTests separately. This would
>> hopefully
>> >> help us get to a point where we can depend on our primary testing tasks
>> >> staying green 100% of the time. We would then prioritize fixing the
>> >> FlakyTests and one by one removing the FlakyTest category from them.
>> >>
>> >> I would also suggest that we execute the FlakyTests with "forkEvery 1"
>> to
>> >> give each test a clean JVM or set of DistributedTest JVMs. That would
>> >> hopefully decrease the chance of a GC pause or test pollution causing
>> >> flickering failures.
>> >>
>> >> Having reviewed lots of test code and failure stacks, I believe that
>> the
>> >> primary causes of FlakyTests are timing sensitivity (thread sleeps or
>> >> nothing that waits for async activity, timeouts or sleeps that are
>> >> insufficient on busy CPU or I/O or during due GC pause) and random
>> ports
>> >> via AvailablePort (instead of using zero for ephemeral port).
>> >>
>> >> Opinions or ideas? Hate it? Love it?
>> >>
>> >> -Kirk
>> >>
>> >
>> >
>> >
>> > --
>> >
>> > ~/William
>>
>>
>

Re: Next steps with flickering tests

Posted by Kirk Lund <kl...@pivotal.io>.
There are quite a few test classes that have multiple test methods which
are annotated with the FlakyTest category.

More thoughts:

In general, I think that if any given test fails intermittently then it is
a FlakyTest. A good test should either pass or fail consistently. After
annotating a test method with FlakyTest, the developer should then add the
Flaky label to the corresponding Jira ticket. What we then do with the Jira
tickets (i.e., fix them) is probably more important than deciding whether a
test is flaky or not.
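
For reference, the tagging itself is just the JUnit category on the test
method; a minimal sketch (the test class name and Jira number are made up,
and the category's package is from memory, so double-check it):

    import org.junit.Test;
    import org.junit.experimental.categories.Category;
    import com.gemstone.gemfire.test.junit.categories.FlakyTest;

    public class SomeRegionDUnitTest { // hypothetical test class

        @Category(FlakyTest.class) // GEODE-XXXX: fails intermittently; ticket carries the Flaky label
        @Test
        public void testSomethingTimingSensitive() {
            // ...
        }
    }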

Rather than try to come up with some flaky process for determining if a
given test is flaky (e.g., "does it have thread sleeps?"), it would be better
to have a wiki page that has examples of flakiness and how to fix them ("if
the test has thread sleeps, then switch to using Awaitility and do
this...").

-Kirk


On Mon, Apr 25, 2016 at 10:51 PM, Anthony Baker <ab...@pivotal.io> wrote:

> Thanks Kirk!
>
> ~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . | grep -v
> Binary | wc -l | xargs echo "Flake factor:"
> Flake factor: 136
>
> Anthony
>
>
> > On Apr 25, 2016, at 9:45 PM, William Markito <wm...@pivotal.io>
> wrote:
> >
> > +1
> >
> > Are we also planning to automate the additional build task somehow ?
> >
> > I'd also suggest creating a wiki page with some stats (like how many
> > FlakyTests we currently have) and the idea behind this effort so we can
> > keep track and see how it's evolving over time.
> >
> > On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <kl...@pivotal.io> wrote:
> >
> >> After completing GEODE-1233, all currently known flickering tests are
> now
> >> annotated with our FlakyTest JUnit Category.
> >>
> >> In an effort to divide our build up into multiple build pipelines that
> are
> >> sequential and dependable, we could consider excluding FlakyTests from
> the
> >> primary integrationTest and distributedTest tasks. An additional build
> task
> >> would then execute all of the FlakyTests separately. This would
> hopefully
> >> help us get to a point where we can depend on our primary testing tasks
> >> staying green 100% of the time. We would then prioritize fixing the
> >> FlakyTests and one by one removing the FlakyTest category from them.
> >>
> >> I would also suggest that we execute the FlakyTests with "forkEvery 1"
> to
> >> give each test a clean JVM or set of DistributedTest JVMs. That would
> >> hopefully decrease the chance of a GC pause or test pollution causing
> >> flickering failures.
> >>
> >> Having reviewed lots of test code and failure stacks, I believe that the
> >> primary causes of FlakyTests are timing sensitivity (thread sleeps or
> >> nothing that waits for async activity, timeouts or sleeps that are
> >> insufficient on busy CPU or I/O or during due GC pause) and random ports
> >> via AvailablePort (instead of using zero for ephemeral port).
> >>
> >> Opinions or ideas? Hate it? Love it?
> >>
> >> -Kirk
> >>
> >
> >
> >
> > --
> >
> > ~/William
>
>

Re: Next steps with flickering tests

Posted by Anthony Baker <ab...@pivotal.io>.
Thanks Kirk!

~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . | grep -v Binary | wc -l | xargs echo "Flake factor:"
Flake factor: 136

Anthony


> On Apr 25, 2016, at 9:45 PM, William Markito <wm...@pivotal.io> wrote:
> 
> +1
> 
> Are we also planning to automate the additional build task somehow ?
> 
> I'd also suggest creating a wiki page with some stats (like how many
> FlakyTests we currently have) and the idea behind this effort so we can
> keep track and see how it's evolving over time.
> 
> On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <kl...@pivotal.io> wrote:
> 
>> After completing GEODE-1233, all currently known flickering tests are now
>> annotated with our FlakyTest JUnit Category.
>> 
>> In an effort to divide our build up into multiple build pipelines that are
>> sequential and dependable, we could consider excluding FlakyTests from the
>> primary integrationTest and distributedTest tasks. An additional build task
>> would then execute all of the FlakyTests separately. This would hopefully
>> help us get to a point where we can depend on our primary testing tasks
>> staying green 100% of the time. We would then prioritize fixing the
>> FlakyTests and one by one removing the FlakyTest category from them.
>> 
>> I would also suggest that we execute the FlakyTests with "forkEvery 1" to
>> give each test a clean JVM or set of DistributedTest JVMs. That would
>> hopefully decrease the chance of a GC pause or test pollution causing
>> flickering failures.
>> 
>> Having reviewed lots of test code and failure stacks, I believe that the
>> primary causes of FlakyTests are timing sensitivity (thread sleeps or
>> nothing that waits for async activity, timeouts or sleeps that are
>> insufficient on busy CPU or I/O or during due GC pause) and random ports
>> via AvailablePort (instead of using zero for ephemeral port).
>> 
>> Opinions or ideas? Hate it? Love it?
>> 
>> -Kirk
>> 
> 
> 
> 
> --
> 
> ~/William


Re: Next steps with flickering tests

Posted by William Markito <wm...@pivotal.io>.
+1

Are we also planning to automate the additional build task somehow?

I'd also suggest creating a wiki page with some stats (like how many
FlakyTests we currently have) and the idea behind this effort so we can
keep track and see how it's evolving over time.

On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <kl...@pivotal.io> wrote:

> After completing GEODE-1233, all currently known flickering tests are now
> annotated with our FlakyTest JUnit Category.
>
> In an effort to divide our build up into multiple build pipelines that are
> sequential and dependable, we could consider excluding FlakyTests from the
> primary integrationTest and distributedTest tasks. An additional build task
> would then execute all of the FlakyTests separately. This would hopefully
> help us get to a point where we can depend on our primary testing tasks
> staying green 100% of the time. We would then prioritize fixing the
> FlakyTests and one by one removing the FlakyTest category from them.
>
> I would also suggest that we execute the FlakyTests with "forkEvery 1" to
> give each test a clean JVM or set of DistributedTest JVMs. That would
> hopefully decrease the chance of a GC pause or test pollution causing
> flickering failures.
>
> Having reviewed lots of test code and failure stacks, I believe that the
> primary causes of FlakyTests are timing sensitivity (thread sleeps or
> nothing that waits for async activity, timeouts or sleeps that are
> insufficient on busy CPU or I/O or during due GC pause) and random ports
> via AvailablePort (instead of using zero for ephemeral port).
>
> Opinions or ideas? Hate it? Love it?
>
> -Kirk
>



-- 

~/William

Re: Next steps with flickering tests

Posted by Michael Stolz <ms...@pivotal.io>.
I love it. Isolate the bad stuff and even evaluate its value in the big
scheme of things. Fix the important ones. Maybe rethink the rest.

--
Mike Stolz
Principal Engineer - Gemfire Product Manager
Mobile: 631-835-4771
On Apr 25, 2016 6:54 PM, "Kirk Lund" <kl...@pivotal.io> wrote:

> After completing GEODE-1233, all currently known flickering tests are now
> annotated with our FlakyTest JUnit Category.
>
> In an effort to divide our build up into multiple build pipelines that are
> sequential and dependable, we could consider excluding FlakyTests from the
> primary integrationTest and distributedTest tasks. An additional build task
> would then execute all of the FlakyTests separately. This would hopefully
> help us get to a point where we can depend on our primary testing tasks
> staying green 100% of the time. We would then prioritize fixing the
> FlakyTests and one by one removing the FlakyTest category from them.
>
> I would also suggest that we execute the FlakyTests with "forkEvery 1" to
> give each test a clean JVM or set of DistributedTest JVMs. That would
> hopefully decrease the chance of a GC pause or test pollution causing
> flickering failures.
>
> Having reviewed lots of test code and failure stacks, I believe that the
> primary causes of FlakyTests are timing sensitivity (thread sleeps or
> nothing that waits for async activity, timeouts or sleeps that are
> insufficient on busy CPU or I/O or during due GC pause) and random ports
> via AvailablePort (instead of using zero for ephemeral port).
>
> Opinions or ideas? Hate it? Love it?
>
> -Kirk
>