Posted to dev@cassandra.apache.org by Joshua McKenzie <jm...@apache.org> on 2021/09/09 15:31:59 UTC

Keeping on top of test failures

(Taking #cassandra-dev slack chat to here)

For context, we have a long history of flaky test failures building up and
getting burned down, but we don't really have a workflow or discipline for
keeping a clean snapshot of where we are or for staying at some kind of
steady state. We have thousands of tests executing in a wide variety of
environments, so some flakiness is to be expected, but I'd argue it needs to
be actively managed so we don't get into the kind of situation we did with
4.0 again.

I threw together a couple of JIRA queries that paint a pretty navigable
picture IMO:

Total JIRA for test failures:
https://issues.apache.org/jira/issues/?filter=12350869&jql=project%20%3D%20Cassandra%20AND%20resolution%20%3D%20unresolved%20AND%20(summary%20~%20flaky%20OR%20summary%20~%20test%20OR%20component%20%3D%20%22Test%2Funit%22)%20AND%20type%20%3D%20bug%20AND%20issuekey%20not%20in%20(CASSANDRA-16010%2C%20CASSANDRA-16024%2C%20CASSANDRA-16022%2C%20CASSANDRA-16021%2C%20CASSANDRA-16025%2C%20CASSANDRA-16023)%20AND%20summary%20!~%20hardening%20ORDER%20BY%20cf%5B12313825%5D%20ASC
(sorry for the URL) - 112 failures
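
For anyone who would rather not read the percent-encoded link, the JQL can be recovered with Python's standard library. This is only a sketch for decoding the URL above; the names `URL` and `decode_jql` are illustrative, and nothing here talks to JIRA.

```python
from urllib.parse import parse_qs, urlsplit

# The first link from the message above, split across lines for readability.
URL = (
    "https://issues.apache.org/jira/issues/?filter=12350869&jql="
    "project%20%3D%20Cassandra%20AND%20resolution%20%3D%20unresolved%20AND%20"
    "(summary%20~%20flaky%20OR%20summary%20~%20test%20OR%20"
    "component%20%3D%20%22Test%2Funit%22)%20AND%20type%20%3D%20bug%20AND%20"
    "issuekey%20not%20in%20(CASSANDRA-16010%2C%20CASSANDRA-16024%2C%20"
    "CASSANDRA-16022%2C%20CASSANDRA-16021%2C%20CASSANDRA-16025%2C%20"
    "CASSANDRA-16023)%20AND%20summary%20!~%20hardening%20"
    "ORDER%20BY%20cf%5B12313825%5D%20ASC"
)


def decode_jql(url: str) -> str:
    """Return the percent-decoded value of the URL's `jql` query parameter."""
    params = parse_qs(urlsplit(url).query)
    return params["jql"][0]


if __name__ == "__main__":
    print(decode_jql(URL))
```

Decoded, the query reads: project = Cassandra AND resolution = unresolved AND (summary ~ flaky OR summary ~ test OR component = "Test/unit") AND type = bug AND issuekey not in (CASSANDRA-16010, CASSANDRA-16024, CASSANDRA-16022, CASSANDRA-16021, CASSANDRA-16025, CASSANDRA-16023) AND summary !~ hardening ORDER BY cf[12313825] ASC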

# of failures more recent than 6 months:
https://issues.apache.org/jira/issues/?filter=12350869
10 failures.

In the interest of tidying this up and staying on top of it going forward,
I propose the following:
1. We close as Won't Fix all test failure tickets created 6 or more months
ago (we had a big push for 4.0 and a lot of this JIRA content is stale)
2. We switch the "Bug Category" for the 10 more recent ones to "Correctness -
Test Failure"
3. We document a "canonical" workflow around test failures that links to a
saved JIRA filter query that includes:
4. When you're working on something and you see a test failure that isn't
related to your patch, check that filter, see if the test name is there,
and if not, create a new ticket with that Bug Category
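
The check in step 4 could be scripted along these lines. This is a minimal sketch, not project tooling: `needs_new_ticket` and the sample summaries are hypothetical, and it assumes the open ticket summaries have already been pulled out of the saved filter by whatever means.

```python
def needs_new_ticket(test_name: str, open_summaries: list[str]) -> bool:
    """Return True when no open ticket summary mentions the failing test.

    Matching is a case-insensitive substring check, which is an assumption
    about how test names show up in ticket summaries.
    """
    needle = test_name.casefold()
    return not any(needle in summary.casefold() for summary in open_summaries)


if __name__ == "__main__":
    # Hypothetical summaries standing in for the saved filter's results.
    summaries = [
        "Flaky test: ExampleReadTest.testTimeout",
        "Test failure: ExampleRepairTest.testIncremental",
    ]
    for name in ("ExampleReadTest.testTimeout", "ExampleWriteTest.testBatch"):
        tracked = not needs_new_ticket(name, summaries)
        print(f"{name}: {'already tracked' if tracked else 'file a new ticket'}")
```

A substring match will miss tickets that name the test differently, so a miss is a prompt to search by hand before filing, not proof that the failure is new.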

In theory this should give us a single source of truth for documented test
failures as well as an entry point for new contributors.

Thoughts?

~Josh

Re: Keeping on top of test failures

Posted by Ekaterina Dimitrova <e....@gmail.com>.

Same here, +1, thank you!

On Thu, 9 Sep 2021 at 11:35, Benjamin Lerer <bl...@apache.org> wrote:

> Thanks for the proposal Josh.
> It sounds good to me.

Re: Keeping on top of test failures

Posted by Benjamin Lerer <bl...@apache.org>.

Thanks for the proposal Josh.
It sounds good to me.

Re: Keeping on top of test failures

Posted by Joshua McKenzie <jm...@apache.org>.

Closed these out in bulk with a comment (liking that Auto Closed resolution).
It looks like I managed not to accidentally email everyone on each update,
and I'll be looking to get the process onto the website soon.

~Josh

On Sat, Sep 11, 2021 at 2:52 AM Berenguer Blasi <be...@gmail.com>
wrote:

> +100 to closing anything that old after the big 4.0 push

Re: Keeping on top of test failures

Posted by Berenguer Blasi <be...@gmail.com>.

+100 to closing anything that old after the big 4.0 push

On 10/9/21 18:21, Joshua McKenzie wrote:
> Thanks for the feedback everyone. Drafting site changes now and I'll pull
> the trigger on JIRA probably Monday; give people the weekend to chew on
> this.
>
> If I open up the window to 52 weeks, we still only have 13 of the test
> failure tickets being created in that window. Figure it's probably safe to
> close out year old flaky failure tickets.
>
> ~Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: Keeping on top of test failures

Posted by Joshua McKenzie <jm...@apache.org>.

Thanks for the feedback everyone. Drafting site changes now and I'll pull
the trigger on JIRA probably Monday; give people the weekend to chew on
this.

If I open up the window to 52 weeks, we still only have 13 test failure
tickets created in that window, so I figure it's probably safe to close out
year-old flaky failure tickets.
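
Widening the window like this amounts to changing one relative-date clause in the query. The sketch below builds such a link in Python. `BASE_JQL`, `windowed_url`, and the `created >= -Nw` clause are assumptions for illustration: the thread does not show the exact clause behind the saved filter, and the base conditions are borrowed from the first link in the original proposal.

```python
from urllib.parse import quote, urlencode

# Base conditions borrowed from the first link in the original proposal;
# the issuekey exclusions and ORDER BY are left out to keep the sketch short.
BASE_JQL = (
    "project = Cassandra AND resolution = unresolved AND "
    '(summary ~ flaky OR summary ~ test OR component = "Test/unit") '
    "AND type = bug AND summary !~ hardening"
)


def windowed_url(weeks: int) -> str:
    """Build an issue-search URL limited to tickets created in the last `weeks` weeks.

    Assumes JQL's relative-date syntax (for example `created >= -52w`).
    """
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    jql = f"{BASE_JQL} AND created >= -{weeks}w"
    # quote_via=quote encodes spaces as %20, matching the style of the original link.
    query = urlencode({"jql": jql}, quote_via=quote)
    return "https://issues.apache.org/jira/issues/?" + query


if __name__ == "__main__":
    print(windowed_url(26))  # roughly six months
    print(windowed_url(52))  # the one-year window mentioned above
```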

~Josh

On Thu, Sep 9, 2021 at 5:01 PM David Capwell <dc...@apple.com.invalid>
wrote:

> +1

Re: Keeping on top of test failures

Posted by David Capwell <dc...@apple.com.INVALID>.

+1

> On Sep 9, 2021, at 10:27 AM, Mick Semb Wever <mc...@apache.org> wrote:
> 
> +1, much appreciated.

Re: Keeping on top of test failures

Posted by Mick Semb Wever <mc...@apache.org>.

+1, much appreciated.


On 2021/09/09 16:03:31, Andrés de la Peña <a....@gmail.com> wrote: 
> +1, thanks for the proposal.

Re: Keeping on top of test failures

Posted by Andrés de la Peña <a....@gmail.com>.

+1, thanks for the proposal.

On Thu, 9 Sept 2021 at 16:45, Brandon Williams <dr...@gmail.com> wrote:

> +1

Re: Keeping on top of test failures

Posted by Brandon Williams <dr...@gmail.com>.

+1
