Posted to dev@cassandra.apache.org by Jeff Jirsa <jj...@gmail.com> on 2018/02/15 18:10:33 UTC

Release votes

Moving this to its own thread:

We’ve declared this a requirement multiple times and then we occasionally get a critical issue and have to decide whether it’s worth the delay. I assume Jason’s earlier -1 on attempt 1 was an enforcement of that earlier stated goal. 

It’s up to the PMC. We’ve said in the past that we don’t release without green tests. The PMC gets to vote and enforce it. If you don’t vote yes without seeing the test results, that enforces it. 

-- 
Jeff Jirsa


> On Feb 15, 2018, at 9:49 AM, Josh McKenzie <jm...@apache.org> wrote:
> 
> What would it take for us to get green utest/dtests as a blocking part of
> the release process? i.e. "for any given SHA, here's a link to the tests
> that passed" in the release vote email?
> 
> That being said, +1.
> 
>> On Wed, Feb 14, 2018 at 4:33 PM, Nate McCall <zz...@gmail.com> wrote:
>> 
>> +1
>> 
>> On Thu, Feb 15, 2018 at 9:40 AM, Michael Shuler <mi...@pbandjelly.org>
>> wrote:
>>> I propose the following artifacts for release as 3.0.16.
>>> 
>>> sha1: 890f319142ddd3cf2692ff45ff28e71001365e96
>>> Git:
>>> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.16-tentative
>>> Artifacts:
>>> https://repository.apache.org/content/repositories/orgapachecassandra-1157/org/apache/cassandra/apache-cassandra/3.0.16/
>>> Staging repository:
>>> https://repository.apache.org/content/repositories/orgapachecassandra-1157/
>>> 
>>> Debian and RPM packages are available here:
>>> http://people.apache.org/~mshuler
>>> 
>>> *** This release addresses an important fix for CASSANDRA-14092 ***
>>>    "Max ttl of 20 years will overflow localDeletionTime"
>>>    https://issues.apache.org/jira/browse/CASSANDRA-14092
>>> 
>>> The vote will be open for 72 hours (longer if needed).
>>> 
>>> [1]: (CHANGES.txt) https://goo.gl/rLj59Z
>>> [2]: (NEWS.txt) https://goo.gl/EkrT4G
>>> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Release votes

Posted by Josh McKenzie <jm...@apache.org>.
It stands out to me that Google a) does pre-submit test runs, b) has a
dedicated team to identify and work with flaky tests, c) has some kind of
quarantining, and d) doesn't report a failure for a test already marked as
flaky unless it fails 3x in a row. While we obviously can't pursue b) since
we're volunteer OSS, and a) isn't really tractable with the current build
infra, c) and d) seem like they could be tractable.

We've talked previously about quarantining flaky failures w/an annotation
and then slowly re-integrating them as we fixed them. At the time, the
general concern was that we could mislabel a genuine defect as merely
flaky, and/or that quarantined tests would remain in a flaky state into
perpetuity. In retrospect, I've come around to thinking that using a flaky
annotation for quarantining, treating those failures as LHF (an easy way
for people to get involved in the project), and reducing the noise on new
work would be worth the risk of some test atrophy if we collectively fail
to have discipline.
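
To make that concrete, here's roughly the shape the quarantine hook could
take with JUnit 4 (which our unit tests use). This is only a sketch: the
@Flaky annotation, the rule name, and the system property are made up for
illustration and aren't something we have in-tree today.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

import org.junit.AssumptionViolatedException;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

public class QuarantineSketch
{
    /** Marks a test as known-flaky; the JIRA keeps it from being forgotten. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Flaky
    {
        String jira();
    }

    /** Skips @Flaky tests unless -Dtest.run.flaky=true is set. */
    public static class QuarantineRule implements TestRule
    {
        public Statement apply(Statement base, Description description)
        {
            final Flaky flaky = description.getAnnotation(Flaky.class);
            if (flaky == null || Boolean.getBoolean("test.run.flaky"))
                return base;

            return new Statement()
            {
                public void evaluate()
                {
                    // Surfaces as "skipped" rather than "failed", so quarantined
                    // tests stop adding noise but stay visible in the reports.
                    throw new AssumptionViolatedException("quarantined: " + flaky.jira());
                }
            };
        }
    }

    @Rule
    public QuarantineRule quarantine = new QuarantineRule();

    @Flaky(jira = "CASSANDRA-XXXXX") // placeholder ticket for the flaky behavior
    @Test
    public void timingSensitiveTest()
    {
        // test body elided
    }
}

Re-integrating a quarantined test would then just mean removing the
annotation once its JIRA is resolved.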

In general it seems like a multi-pronged approach is the only realistic way
to keep this problem under control.

On Fri, Feb 16, 2018 at 8:12 AM, Jason Brown <ja...@gmail.com> wrote:

> Hi,
>
> I'm ecstatic others are now running the tests and, more importantly, that
> we're having the conversation.
>
> I've become convinced we cannot always have 100% green tests. I am reminded
> of this [1] blog post from Google when thinking about flaky tests.
> The TL;DR is "flakiness happens", to the tune of about 1.5% of all tests
> across Google.
>
> I am in no way advocating that we simply turn a blind eye to broken or
> flaky tests, or shrug our shoulders
> and rubber stamp a vote, but instead to accept it when it reasonably
> applies.
> To achieve this, we might need to have discussion at vote/release time (if
> not sooner) to triage flaky tests, but I see that as a good thing.
>
> Thanks,
>
> -Jason
>
> [1]
> https://testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html

Re: Release votes

Posted by Jason Brown <ja...@gmail.com>.
Hi,

I'm ecstatic others are now running the tests and, more importantly, that
we're having the conversation.

I've become convinced we cannot always have 100% green tests. I am reminded
of this [1] blog post from Google when thinking about flaky tests.
The TL;DR is "flakiness happens", to the tune of about 1.5% of all tests
across Google.

I am in no way advocating that we simply turn a blind eye to broken or
flaky tests, or shrug our shoulders and rubber-stamp a vote, but rather
that we accept some flakiness when it reasonably applies.
To achieve this, we might need to have a discussion at vote/release time
(if not sooner) to triage flaky tests, but I see that as a good thing.

Thanks,

-Jason

[1]
https://testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html



On Fri, Feb 16, 2018 at 12:47 AM, Dinesh Joshi <
dinesh.joshi@yahoo.com.invalid> wrote:

> I'm new to this project and here are my two cents.
> If there are tests that are constantly failing or flaky and you have had
> releases despite their failures, then they're not useful and can be
> disabled. They can always be reenabled if they are in fact valuable. Having
> 100% blue dashboard is not idealistic IMHO. Hardware failures are harder
> but they can be addressed too.
> I could pitch in to fix the noisy tests or just help in other ways to get
> the dashboard to blue.
> Dinesh

Re: Release votes

Posted by Dinesh Joshi <di...@yahoo.com.INVALID>.
I'm new to this project and here are my two cents.
If there are tests that are constantly failing or flaky and you have had releases despite their failures, then they're not useful and can be disabled. They can always be re-enabled if they are in fact valuable. Having a 100% blue dashboard is not an unrealistic goal, IMHO. Hardware failures are harder, but they can be addressed too.
I could pitch in to fix the noisy tests or just help in other ways to get the dashboard to blue.
Dinesh

RE: Release votes

Posted by Kenneth Brotman <ke...@yahoo.com.INVALID>.
I'd like to help as well.  For me the issue is that I have only my time to contribute.  The resources to test Cassandra extensively are beyond those of most individuals, including me, aren't they?  If resources are made available and I can help, count me in.

Also, perhaps having a standard reference like Slender Cassandra (18 nodes total, two regions, three AZs total, six nodes per AZ) would help.  I'll have it done very soon.

Kenneth Brotman



Re: Release votes

Posted by Ariel Weisberg <ar...@weisberg.ws>.
Hi,

I created https://issues.apache.org/jira/browse/CASSANDRA-14241 for this issue. You are right: there is a solid chunk of tests failing on Apache infrastructure that don't fail on CircleCI. I'll find someone to get it done.

I think that fix-before-commit is only going to happen if we go all the way and route every single commit through testing infrastructure that runs all the tests multiple times and refuses to merge commits unless the tests pass somewhat consistently. Short of that, flaky (and hard-failing) tests are going to keep creeping in (and even then, to some extent). That's not feasible without much better infrastructure available to everyone, and it's not a short-term thing right now, I think. I mean, maybe we move forward with it on the Apache infrastructure we have.

I'm not sure flaky infrastructure is what is acutely hurting us, although we do have infrastructure that exposes unreliable tests; maybe that's just a matter of framing.

Dealing with flaky tests generally devolves into picking victim(s) via some process. Blocking releases on failing tests is a way of picking the people who want the next release as the victims. Blocking commits on flaky tests is a way of making the people who want to merge stuff the victims. Doing nothing makes the victims some random subset of volunteers who fix the tests, all developers who run the tests, and, to a certain extent, end users. Excluding tests and running tests multiple times picks the end users of releases as the victims.

RE multi-pronged: we are currently using a flaky annotation that reruns tests, we have skipped tests with JIRAs, and we are re-running tests right now if they fail for certain classes of reasons. So we are already down that road. I think it's fine, but we need a backpressure mechanism because we can't keep accruing this kind of thing forever.
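
For reference, the rerun idea is roughly the following shape. This is a sketch assuming JUnit 4; the RetryRule name and the attempt count are illustrative, not our actual implementation.

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

public class RetrySketch
{
    /** Re-runs a test up to maxAttempts times; only the final failure is reported. */
    public static class RetryRule implements TestRule
    {
        private final int maxAttempts;

        public RetryRule(int maxAttempts)
        {
            this.maxAttempts = maxAttempts;
        }

        public Statement apply(final Statement base, final Description description)
        {
            return new Statement()
            {
                public void evaluate() throws Throwable
                {
                    Throwable last = null;
                    for (int attempt = 1; attempt <= maxAttempts; attempt++)
                    {
                        try
                        {
                            base.evaluate();
                            return; // passed on this attempt, so report success
                        }
                        catch (Throwable t)
                        {
                            last = t;
                            System.err.println(description + " failed attempt "
                                               + attempt + "/" + maxAttempts);
                        }
                    }
                    throw last; // failed every attempt: a real (or very flaky) failure
                }
            };
        }
    }

    // Mirrors the "fails 3x in a row" policy mentioned earlier in the thread.
    @Rule
    public RetryRule retry = new RetryRule(3);

    @Test
    public void occasionallyFlakyTest()
    {
        // test body elided
    }
}

The downside is that reruns hide low-probability flakiness from everyone except whoever reads the logs, which is exactly why we need the backpressure mechanism.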

In my mind, processes for keeping the tests passing need to provide two functions: pick victim(s) (task management) and create backpressure (slow new development to match the defect rate). It seems possible to create backpressure by blocking releases, but that largely fails to pick victims. Many people running C* are so far behind they aren't waiting on that next release, or they are accustomed to running a private fork and backporting. When we were able to block commits via informal process I think it helped, but an informal process has limitations.

I think blocking commits via automation is going to spread the load out most evenly and make it a priority for everyone in the contributor base. We have 16 Apache nodes to work with, which I think would handle our current commit load. We can fine-tune the criteria for blocking commits as we go.
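
As a strawman, the gate itself can start very small: a pre-merge job step that parses the JUnit XML reports and refuses the merge on any failure. The report path and the zero-failure threshold below are assumptions for illustration; the threshold is the criterion we'd tune over time.

import java.io.File;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;

public class PreMergeGate
{
    public static void main(String[] args) throws Exception
    {
        // Directory of JUnit-style XML reports; the path is illustrative.
        File reportDir = new File(args.length > 0 ? args[0] : "build/test/output");
        File[] reports = reportDir.listFiles((dir, name) -> name.endsWith(".xml"));
        if (reports == null || reports.length == 0)
        {
            System.err.println("No test reports found in " + reportDir);
            System.exit(2); // treat "no results" as a failed gate, not a pass
            return;
        }

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        int failures = 0;
        for (File report : reports)
        {
            // Each report root looks like <testsuite failures="N" errors="M" ...>
            Element suite = factory.newDocumentBuilder().parse(report).getDocumentElement();
            failures += parse(suite.getAttribute("failures"));
            failures += parse(suite.getAttribute("errors"));
        }

        System.out.println(reports.length + " suites, " + failures + " failures/errors");
        // Zero tolerance to start; this is the knob we'd fine-tune as we go.
        System.exit(failures == 0 ? 0 : 1);
    }

    private static int parse(String attribute)
    {
        return attribute == null || attribute.isEmpty() ? 0 : Integer.parseInt(attribute);
    }
}

A nonzero exit from a step like that is what the merge job would key off of.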

I don't have an answer for how we apply backpressure to the use of flaky annotations and test re-runs. Maybe it's a czar saying no commits until we reach some goal, on a regular cadence (every 3 months). Maybe we vote on it periodically. Czars can be really effective in moving the herd, but the czar does need to be able to wield something to motivate some set of contributors to do the work. It's not so much about preventing the commits as it is signaling unambiguously that this is what we are working on now, and if you aren't, you are working on the wrong thing. It ends up being quite depressing, though, when you work through significant amounts of tech debt all at once. It hurts less when you have a lot of people working on it.

Ariel

On Thu, Feb 15, 2018, at 6:48 PM, kurt greaves wrote:
> It seems there has been a bit of a slip in testing as of recently, mostly
> due to the fact that there's no canonical testing environment that isn't
> flaky. We probably need to come up with some ideas and a plan on how we're
> going to do testing in the future, and how we're going to make testing
> accessible for all contributors. I think this is the only way we're really
> going to change behaviour. Having an incredibly tedious process and then
> being aggressive about it only leads to resentment and workarounds.
> 
> I'm completely unsure of where dtests are at since the conversion to
> pytest, and there's a lot of failing dtests on the ASF jenkins jobs (which
> appear to be running pytest). As there's currently not a lot of visibility
> into what people are doing with CircleCI for this it's hard to say if
> things are better over there. I'd like to help here if anyone wants to fill
> me in.


Re: Release votes

Posted by kurt greaves <ku...@instaclustr.com>.
It seems there has been a bit of a slip in testing recently, mostly due to
the fact that there's no canonical testing environment that isn't flaky.
We probably need to come up with some ideas and a plan for how we're going
to do testing in the future, and how we're going to make testing accessible
to all contributors. I think this is the only way we're really going to
change behaviour. Having an incredibly tedious process and then being
aggressive about it only leads to resentment and workarounds.

I'm completely unsure of where dtests are at since the conversion to
pytest, and there are a lot of failing dtests on the ASF Jenkins jobs
(which appear to be running pytest). As there's currently not a lot of
visibility into what people are doing with CircleCI for this, it's hard to
say if things are better over there. I'd like to help here if anyone wants
to fill me in.


Re: Release votes

Posted by Josh McKenzie <jm...@apache.org>.
>
> We’ve said in the past that we don’t release without green tests. The PMC
> gets to vote and enforce it. If you don’t vote yes without seeing the test
> results, that enforces it.

I think this is noble and ideal in theory. In practice, the tests take long
enough, hardware infra has proven flaky enough, and the tests *themselves*
are flaky enough, that there's been a consistent low level of test-failure
noise that makes separating signal from noise in this context very time
consuming. Reference 3.11-test-all, for example, re: noise:
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-3.11-test-all/test/?width=1024&height=768

Having spearheaded burning test failures down to 0 multiple times and
watched them regress over time, my gut intuition is that we should have one
person as our Source of Truth, with a less-flaky source for release-vetting
CI (dedicated hardware, a Circle account, etc.) we can use as a reference
to vote on release SHAs.

> We’ve declared this a requirement multiple times

Declaring things != changed behavior, and thus != changed culture. The
culture on this project is one of having a constant low level of test-
failure noise in our CI as a product of our working processes. Unless we
change those (actually block releases w/out a green board, actually
aggressively block merges w/ any failing tests, aggressively and
retroactively track down test failures on a daily basis and RCA them), the
situation won't improve. Given that this is a volunteer organization /
project, that kind of daily time investment is a big ask.
