Posted to dev@cassandra.apache.org by Ekaterina Dimitrova <ek...@datastax.com> on 2020/05/27 21:13:18 UTC

[DISCUSSION] Flaky tests

Dear all,
I spent some time these days looking into the Release Lifecycle document.
As we keep saying that we are approaching Beta based on the Jira board, I
was curious what the exact borderline for cutting it is.

Looking at all the latest reports (thanks to everyone who has been working
on them; I think having an overview of what's going on is always a good
thing), I have the feeling that the main thing preventing us from cutting
beta at the moment is flaky tests. According to the lifecycle document:

   - "No flaky tests - All tests (Unit Tests and DTests) should pass
   consistently. A failing test, upon analyzing the root cause of failure,
   may be “ignored in exceptional cases”, if appropriate, for the release,
   after discussion in the dev mailing list."

Now the related questions that popped into my mind:
- "ignored in exceptional cases" - examples?
- No flaky tests according to Jenkins or CircleCI? Also, some people run
the free tier, while others take advantage of premium CircleCI. Which
should be the reference framework?
- Furthermore, flaky tests at what frequency? (This is a tricky question,
I know)

In different conversations with colleagues from the C* community I got the
impression that the canonical suite (in this case Jenkins) might be the
right direction to follow.

To be clear, I always check any failures seen in any environment and I
truly believe they are worth checking. I am not advocating skipping
anything!  But I also feel that, in many cases, CircleCI provides input
worth tracking that is less likely to point to product flakes. Am I
right? In addition, different people use different CircleCI configs and
see different output. Not to mention flaky tests on a Mac running with
two cores... Yes, this is sometimes the only way to reproduce some of the
reported test issues...

So my idea was to suggest that we start tracking a specific Jenkins
report. Anything reported outside of it would also be checked, but could
potentially be left for Beta in case we don't feel it shows a product
defect. One more thing to consider is that the big Test epic is primarily
happening in beta.

Curious to hear what the community thinks about this topic. People
probably also have additional thoughts based on experience from previous
releases. How did these things work in the past? Any lessons learned?
What is our "plan Beta"?

Ekaterina Dimitrova
e. ekaterina.dimitrova@datastax.com
w. www.datastax.com

Re: [DISCUSSION] Flaky tests

Posted by Joshua McKenzie <jm...@apache.org>.
Agree re: 15299.

This thread is about pushing out flaky tests and what we define as that
cohort, as I understand it.

On Thu, May 28, 2020 at 7:59 AM Ekaterina Dimitrova <e....@gmail.com>
wrote:

> CASSANDRA-15299  - All interface-related still open tickets are blockers.
> My point was that they are already just a few, looking into Jira. So except
> them, flaky tests are really a thing that requires attention.
>
> Also, I agree with Mick that it’s good to have a plan and opened Jira
> tickets earlier than later.

Re: [DISCUSSION] Flaky tests

Posted by Ekaterina Dimitrova <e....@gmail.com>.
CASSANDRA-15299 - all interface-related tickets that are still open are
blockers. My point was that, looking at Jira, there are already only a
few of them. So apart from those, flaky tests are really the thing that
requires attention.

Also, I agree with Mick that it’s good to have a plan and to open Jira
tickets sooner rather than later.

On Thu, 28 May 2020 at 5:27, Sam Tunnicliffe <sa...@beobal.com> wrote:

> > I have the feeling that the thing that prevents us primarily from
> cutting beta at the moment is flaky tests
>
> CASSANDRA-15299 is still in progress and I think we have to consider it a
> blocker, given that beta "should be interface-stable, so that consumers do
> not have to incur any code changes on their end, as the release progresses
> from Alpha through EOL."

Re: [DISCUSSION] Flaky tests

Posted by Sam Tunnicliffe <sa...@beobal.com>.
> I have the feeling that the thing that prevents us primarily from
cutting beta at the moment is flaky tests

CASSANDRA-15299 is still in progress and I think we have to consider it a blocker, given that beta "should be interface-stable, so that consumers do not have to incur any code changes on their end, as the release progresses from Alpha through EOL."



Re: [DISCUSSION] Flaky tests

Posted by Andrés de la Peña <a....@gmail.com>.
I don't know how hard it would be to do, but I think it would be great to
have a CI job that repeatedly runs a specific test, so we can check
whether it's flaky. We could run it systematically for new tests, or for
existing tests related to what we are modifying.

Using this CI job to run a suspicious test a few dozen times is probably
going to be quicker, safer and cheaper than re-running the entire suite.

That's not going to absolutely prove that a test is not flaky in every
environment, but it would probably reduce the risk of introducing new
flaky tests.
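
For illustration, here is a minimal sketch of what such a repeat-run
check could look like as a JUnit 4 rule (RepeatRule is a hypothetical
name, not an existing Cassandra utility); a CI job could drive the same
idea by invoking a single test class in a loop:

    import org.junit.rules.TestRule;
    import org.junit.runner.Description;
    import org.junit.runners.model.Statement;

    // Hypothetical helper: re-runs the wrapped test body N times and
    // fails on the first failure, as a cheap flakiness check for a
    // suspect test before committing it.
    public class RepeatRule implements TestRule
    {
        private final int times;

        public RepeatRule(int times)
        {
            this.times = times;
        }

        @Override
        public Statement apply(Statement base, Description description)
        {
            return new Statement()
            {
                @Override
                public void evaluate() throws Throwable
                {
                    for (int i = 0; i < times; i++)
                        base.evaluate(); // any single failure fails the run
                }
            };
        }
    }

    // Usage in the suspect test class:
    //     @Rule
    //     public RepeatRule repeat = new RepeatRule(50);

This would not catch environment-specific flakes (e.g. the two-core Mac
case mentioned earlier in the thread), but it would give a quick signal
before commit.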


Re: [DISCUSSION] Flaky tests

Posted by David Capwell <dc...@apple.com.INVALID>.
> - No flaky tests according to Jenkins or CircleCI? Also, some people run
> the free tier, others take advantage of premium CircleCI. What should be
> the framework?

It would be good to have a common understanding of this; my current mental model is

1) Jenkins
2) CircleCI free tier unit tests (including in-jvm dtests)
3) CircleCI paid tier Python dtests

> - "ignored in exceptional cases" - examples?


I personally don’t classify a test as flaky if the CI environment is at fault; a simple example could be a bad disk causing tests to fail.  In this case, actions should be taken to fix the CI environment, but if the tests pass in another environment I am fine moving on and not blocking a release.

> I got the impression that canonical suite (in this case Jenkins) might be the right direction to follow.


I agree that Jenkins must be a source of input, but I don’t think it should be the only one at this moment; currently CircleCI produces more builds of Cassandra than Jenkins, so ignoring test failures there creates a more unstable environment for development and hides the fact that Jenkins will also see the issue.  There are also gaps in Jenkins coverage, which hide things such as the lack of Java 11 support and the fact that tests fail more often on Java 11.

> But also, sometimes I feel in many cases CircleCI could provide input worth tracking but less likely to be product flakes


Since CircleCI runs more builds than Jenkins, we are more likely to see flaky tests there than in Jenkins.

> Not to mention flaky tests on Mac running with two cores... Yes, this is sometimes the only way to reproduce some of the reported tests' issues...


I am not aware of anyone opening JIRAs based off this, only of using this method to reproduce issues found in CI.  I started using this method to help quickly reproduce race condition bugs found in CI, such as nodetool reporting repairs as successful when they had actually failed, and one case you are working on where a preview repair conflicts with a non-committed IR participant even though we reported the commit to users (both cases are valid bugs found in CI).

> So my idea was to suggest to start tracking an exact Jenkins report maybe


Better visibility is great!  Mick has been setting up Slack/email notifications, but maybe a summary in the 4.0 report would help enhance visibility for everyone?

> checked but potentially to be able to leave it for Beta in case we don't feel it shows a product defect

Based off https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle, flaky tests block beta releases, so fixing them needs to happen before then.  What do you mean by “leave it for Beta”? Right now we label flaky tests alpha but don’t block alpha releases on them; given this I don’t follow this statement, could you explain more?

>> At least for me, what I learned in the past is we'd drive to a green test
>> board and immediately transition it as a milestone, so flaky tests would
>> reappear like a disappointing game of whack-a-mole. They seem frustratingly
>> ever-present.

The way I read the document, all definitions/expectations from previous phases hold true for later stages. Right now the document says we cannot cut beta1 until flaky tests are resolved, but this would also apply to beta2+, rc+, etc.; the way I internalize this is that from beta1 onwards flaky tests are not allowed, so we don’t immediately transition away from this.

One trend I have noticed in Cassandra is a lack of trust in tests, caused by the fact that unrelated failing builds are common; what then happens is that the author/reviewer ignores the new failing test, writes it off as a flaky test, commits, and causes more tests to fail. Since testing can be skipped pre-commit and failing tests can be ignored, this puts us in a state where new regressions pop up after commit; having flaky tests as a guard against release acts as a forcing function to stay stable for as long as possible.

>> Default posture to label fix version as beta

Can you explain what you mean by this? Currently we don’t block alpha releases on flaky tests even though they are marked alpha. Are you proposing we don’t block beta releases on flaky tests, or are you suggesting we label them beta to better match the doc and keep them as beta release blockers?

>>> Also, I agree with Mick that it’s good to have a plan and opened Jira tickets earlier than later.

+1



Re: [DISCUSSION] Flaky tests

Posted by Joshua McKenzie <jm...@apache.org>.
Good point, Jordan, re: a flaky test either implying API instability or
blocking the ability to beta test.



Re: [DISCUSSION] Flaky tests

Posted by Jordan West <jw...@apache.org>.
> On Wed, May 27, 2020 at 5:13 PM Ekaterina Dimitrova <
> ekaterina.dimitrova@datastax.com> wrote:

> - No flaky tests according to Jenkins or CircleCI? Also, some people run
> > the free tier, others take advantage of premium CircleCI. What should be
> > the framework?


While I agree that we should use the Apache infrastructure as the canonical
infrastructure, failures in both (or any) environment matter when it comes
to flaky tests.

On Wed, May 27, 2020 at 5:23 PM Joshua McKenzie <jm...@apache.org>
wrote:

>
> At least for me, what I learned in the past is we'd drive to a green test
> board and immediately transition it as a milestone, so flaky tests would
> reappear like a disappointing game of whack-a-mole. They seem frustratingly
> ever-present.
>
>
Agreed. Having multiple successive green runs would be a better bar than
one on a single platform imo.


> I'd personally advocate for us taking the following stance on flaky tests
> from this point in the cycle forward:
>
>    - Default posture to label fix version as beta
>    - *excepting* on case-by-case basis, if flake could imply product defect
>    that would greatly impair beta testing we leave alpha
>

I would be in favor of tightening this further to flakes that imply
interface changes or major defects (e.g. corruption, data loss, etc). To do
so would require evaluation of the flaky test, however, which I think is in
sync with our "start in alpha and make exceptions to move to beta". The
difference would be that we better define and widen what flaky tests can be
punted to beta and my guess is we could already evaluate all outstanding
flaky test tickets by that bar.

Jordan

Re: [DISCUSSION] Flaky tests

Posted by Benjamin Lerer <be...@datastax.com>.
>
> Most of these were fixed in CASSANDRA-15622
> But the remaining failures are from the use of
> `FBUtilities.getLocalAddress()` and `InetAddress.getLocalHost()`. It
> affects ci-cassandra because the agents need their public ip so the
> master can reach them.
>
> Some help with how best to fix these would be appreciated.
>

I can find some time to look into that. :-)





Re: [DISCUSSION] Flaky tests

Posted by Mick Semb Wever <mc...@apache.org>.
> > So my idea was to suggest to start tracking an exact Jenkins report maybe?
>
> Basing our point of view on the canonical test runs on apache infra makes
> sense to me, assuming that infra is behaving these days. :) Pretty sure
> Mick got that in working order.


It's definitely closing in. Running on donated hosted hardware around
the world has its own challenges, and there are some implementation
details and history in the Jenkins build setup I'm still uncovering, and
stuff that's waiting on other things (e.g. containerisation).  But the
main branches look good.  That said, it's also a platform that we are
capable of breaking ourselves, now that we have control over the master.


>    - Hard, no compromise position on "we don't RC until all flakes are dead"


I like this, especially if we are good at entering flaky tests into
Jira early, as opposed to entering them all at the last minute and
dashing hopes of the RC.


> > In different conversations with colleagues from the C* community I got the
> > impression that canonical suite (in this case Jenkins) might be the right
> > direction to follow.
> >
> > To be clear, I am always checking any failures seen in any environment and
> > I truly believe that they are worth it to be checked. Not advocating to
> > skip anything!  But also, sometimes I feel in many cases CircleCI could
> > provide input worth tracking but less likely to be product flakes. Am I
> > right? In addition, different people use different CircleCI config and see
> > different output. Not to mention flaky tests on Mac running with two
> > cores... Yes, this is sometimes the only way to reproduce some of the
> > reported tests' issues...


One of the predominant unit test failures in Jenkins that is not seen in
CircleCI is the…

    UnknownHostException: ip-X-X-X-X: ip-X-X-X-X: Name or service not known

See https://ci-cassandra.apache.org/job/Cassandra-trunk/150/testReport/(root)/_init_/org_apache_cassandra_locator_ReplicaCollectionTest/

Most of these were fixed in CASSANDRA-15622, but the remaining failures
come from the use of `FBUtilities.getLocalAddress()` and
`InetAddress.getLocalHost()`. They affect ci-cassandra because the
agents need their public IP so the master can reach them.

Some help with how best to fix these would be appreciated.
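
To make the failure mode concrete, here is a minimal sketch (illustration
only, not a proposed fix; whether falling back to loopback would even be
acceptable for these tests is exactly the open question):

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    public final class LocalAddressExample
    {
        public static InetAddress localAddressOrLoopback()
        {
            try
            {
                // Throws UnknownHostException when the machine's hostname
                // (e.g. "ip-X-X-X-X" on the ci-cassandra agents) has no
                // entry in /etc/hosts or DNS, which is what fails these
                // tests before their bodies even run.
                return InetAddress.getLocalHost();
            }
            catch (UnknownHostException e)
            {
                // Possible fallback for tests that only need *an* address;
                // not suitable if the code really needs the public IP.
                return InetAddress.getLoopbackAddress();
            }
        }

        public static void main(String[] args)
        {
            System.out.println(localAddressOrLoopback());
        }
    }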



Re: [DISCUSSION] Flaky tests

Posted by Joshua McKenzie <jm...@apache.org>.
>
> So my idea was to suggest to start tracking an exact Jenkins report maybe?

Basing our point of view on the canonical test runs on apache infra makes
sense to me, assuming that infra is behaving these days. :) Pretty sure
Mick got that in working order.

At least for me, what I learned in the past is we'd drive to a green test
board and immediately transition it as a milestone, so flaky tests would
reappear like a disappointing game of whack-a-mole. They seem frustratingly
ever-present.

I'd personally advocate for us taking the following stance on flaky tests
from this point in the cycle forward:

   - Default posture to label fix version as beta
   - *excepting* on case-by-case basis, if flake could imply product defect
   that would greatly impair beta testing we leave alpha
   - Take current flakes and go fixver beta
   - Hard, no compromise position on "we don't RC until all flakes are dead"
   - Use Jenkins as canonical source of truth for "is beta ready" cutoff

I'm personally balancing the risk of flaky tests confounding beta work
against my perceived value of being able to widely signal beta's
availability and encourage widespread user testing. I believe the value in
the latter justifies the risk of the former (I currently perceive that risk
as minimal; I could be wrong). I am also weighting the risk of "test
failures persist to or past RC" at 0. That's a hill I'll die on.

