You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@cassandra.apache.org by Joshua McKenzie <jm...@apache.org> on 2021/10/30 12:59:57 UTC

[DISCUSS] Releasable trunk and quality

We as a project have gone back and forth on the topic of quality and the
notion of a releasable trunk for quite a few years. If people are
interested, I'd like to rekindle this discussion a bit and see if we're
happy with where we are as a project or if we think there's steps we should
take to change the quality bar going forward. The following questions have
been rattling around for me for awhile:

1. How do we define what "releasable trunk" means? All reviewed by M
committers? Passing N% of tests? Passing all tests plus some other metrics
(manual testing, raising the number of reviewers, test coverage, usage in
dev or QA environments, etc)? Something else entirely?

2. With a definition settled upon in #1, what steps, if any, do we need to
take to get from where we are to having *and keeping* that releasable
trunk? Anything to codify there?

3. What are the benefits of having a releasable trunk as defined here? What
are the costs? Is it worth pursuing? What are the alternatives (for
instance: a freeze before a release + stabilization focus by the community
i.e. 4.0 push or the tock in tick-tock)?

Given the large volumes of work coming down the pike with CEP's, this seems
like a good time to at least check in on this topic as a community.

Full disclosure: running face-first into 60+ failing tests on trunk when
going through the commit process for denylisting this week brought this
topic back up for me (reminds me of when I went to merge CDC back in 3.6
and those test failures riled me up... I sense a pattern ;))

Looking forward to hearing what people think.

~Josh

Re: [DISCUSS] Releasable trunk and quality

Posted by Ekaterina Dimitrova <e....@gmail.com>.

Ok seems I was wrong and messed up the mails in my mailbox. Please ignore
my previous email

On Mon, 6 Dec 2021 at 18:01, Ekaterina Dimitrova <e....@gmail.com>
wrote:

>
> I think the script discussion is on a different thread and attached
> document which I am also about to address soon :-)
>
> On Mon, 6 Dec 2021 at 17:59, benedict@apache.org <be...@apache.org>
> wrote:
>
>> Is there a reason we discounted modifying the merge strategy?
>>
>> I’m just a little wary of relying on scripts for consistency of behaviour
>> here. Environments differ, and it would be far preferable for consistency
>> of behaviour to rely on shared infrastructure if possible. I would probably
>> be against mandating these scripts, at least.
>>
>> From: Joshua McKenzie <jm...@apache.org>
>> Date: Monday, 6 December 2021 at 22:20
>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>> Subject: Re: [DISCUSS] Releasable trunk and quality
>> As I work through the scripting on this, I don't know if we've documented
>> or clarified the following (don't see it here:
>> https://cassandra.apache.org/_/development/testing.html):
>>
>> Pre-commit test suites:
>> * Which JDK's?
>> * When to include all python tests or do JVM only (if ever)?
>> * When to run upgrade tests?
>> * What to do if a test is also failing on the reference root (i.e. trunk,
>> cassandra-4.0, etc)?
>> * What to do if a test fails intermittently?
>>
>> I'll also update the above linked documentation once we hammer this out
>> and
>> try and bake it into the scripting flow as much as possible as well. Goal
>> is to make it easy to do the right thing and hard to do the wrong thing,
>> and to have these things written down rather than have it be tribal
>> knowledge that varies a lot across the project.
>>
>> ~Josh
>>
>> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jm...@apache.org>
>> wrote:
>>
>> > After some offline collab, here's where this thread has landed on a
>> > proposal to change our processes to incrementally improve our processes
>> and
>> > hopefully stabilize the state of CI longer term:
>> >
>> > Link:
>> >
>> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
>> >
>> > Hopefully the mail server doesn't butcher formatting; if it does, hit up
>> > the gdoc and leave comments there as should be open to all.
>> >
>> > Phase 1:
>> > Document merge criteria; update circle jobs to have a simple pre-merge
>> job
>> > (one for each JDK profile)
>> >      * Donate, document, and formalize usage of circleci-enable.py in
>> ASF
>> > repo (need new commit scripts / dev tooling section?)
>> >         * rewrites circle config jobs to simple clear flow
>> >         * ability to toggle between "run on push" or "click to run"
>> >         * Variety of other functionality; see below
>> > Document (site, help, README.md) and automate via scripting the
>> > relationship / dev / release process around:
>> >     * In-jvm dtest
>> >     * dtest
>> >     * ccm
>> > Integrate and document usage of script to build CI repeat test runs
>> >     * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
>> >     * Document “Do this if you add or change tests”
>> > Introduce “Build Lead” role
>> >     * Weekly rotation; volunteer
>> >     * 1: Make sure JIRAs exist for test failures
>> >     * 2: Attempt to triage new test failures to root cause and assign
>> out
>> >     * 3: Coordinate and drive to green board on trunk
>> > Change and automate process for *trunk only* patches:
>> >     * Block on green CI (from merge criteria in CI above; potentially
>> > stricter definition of "clean" for trunk CI)
>> >     * Consider using github PR’s to merge (TODO: determine how to handle
>> > circle + CHANGES; see below)
>> > Automate process for *multi-branch* merges
>> >     * Harden / contribute / document dcapwell script (has one which does
>> > the following):
>> >         * rebases your branch to the latest (if on 3.0 then rebase
>> against
>> > cassandra-3.0)
>> >         * check compiles
>> >         * removes all changes to .circle (can opt-out for circleci
>> patches)
>> >         * removes all changes to CHANGES.txt and leverages JIRA for the
>> > content
>> >         * checks code still compiles
>> >         * changes circle to run ci
>> >         * push to a temp branch in git and run CI (circle + Jenkins)
>> >             * when all branches are clean (waiting step is manual)
>> >             * TODO: Define “clean”
>> >                 * No new test failures compared to reference?
>> >                 * Or no test failures at all?
>> >             * merge changes into the actual branches
>> >             * merge up changes; rewriting diff
>> >             * push --atomic
>> >
>> > Transition to phase 2 when:
>> >     * All items from phase 1 are complete
>> >     * Test boards for supported branches are green
>> >
>> > Phase 2:
>> > * Add Harry to recurring run against trunk
>> > * Add Harry to release pipeline
>> > * Suite of perf tests against trunk recurring
>> >
>> >
>> >
>> > On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jm...@apache.org>
>> > wrote:
>> >
>> >> Sorry for not catching that Benedict, you're absolutely right. So long
>> as
>> >> we're using merge commits between branches I don't think auto-merging
>> via
>> >> train or blocking on green CI are options via the tooling, and
>> multi-branch
>> >> reverts will be something we should document very clearly should we
>> even
>> >> choose to go that route (a lot of room to make mistakes there).
>> >>
>> >> It may not be a huge issue as we can expect the more disruptive changes
>> >> (i.e. potentially destabilizing) to be happening on trunk only, so
>> perhaps
>> >> we can get away with slightly different workflows or policies based on
>> >> whether you're doing a multi-branch bugfix or a feature on trunk. Bears
>> >> thinking more deeply about.
>> >>
>> >> I'd also be game for revisiting our merge strategy. I don't see much
>> >> difference in labor between merging between branches vs. preparing
>> separate
>> >> patches for an individual developer, however I'm sure there's
>> maintenance
>> >> and integration implications there I'm not thinking of right now.
>> >>
>> >> On Wed, Nov 17, 2021 at 12:03 PM benedict@apache.org <
>> benedict@apache.org>
>> >> wrote:
>> >>
>> >>> I raised this before, but to highlight it again: how do these
>> approaches
>> >>> interface with our merge strategy?
>> >>>
>> >>> We might have to rebase several dependent merge commits and want to
>> >>> merge them atomically. So far as I know these tools don’t work
>> >>> fantastically in this scenario, but if I’m wrong that’s fantastic. If
>> not,
>> >>> given how important these things are, should we consider revisiting
>> our
>> >>> merge strategy?
>> >>>
>> >>> From: Joshua McKenzie <jm...@apache.org>
>> >>> Date: Wednesday, 17 November 2021 at 16:39
>> >>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>> >>> Subject: Re: [DISCUSS] Releasable trunk and quality
>> >>> Thanks for the feedback and insight Henrik; it's valuable to hear how
>> >>> other
>> >>> large complex infra projects have tackled this problem set.
>> >>>
>> >>> To attempt to summarize, what I got from your email:
>> >>> [Phase one]
>> >>> 1) Build Barons: rotation where there's always someone active tying
>> >>> failures to changes and adding those failures to our ticketing system
>> >>> 2) Best effort process of "test breakers" being assigned tickets to
>> fix
>> >>> the
>> >>> things their work broke
>> >>> 3) Moving to a culture where we regularly revert commits that break
>> tests
>> >>> 4) Running tests before we merge changes
>> >>>
>> >>> [Phase two]
>> >>> 1) Suite of performance tests on a regular cadence against trunk
>> >>> (w/hunter
>> >>> or otherwise)
>> >>> 2) Integration w/ github merge-train pipelines
>> >>>
>> >>> That cover the highlights? I agree with these points as useful places
>> for
>> >>> us to invest in as a project and I'll work on getting this into a gdoc
>> >>> for
>> >>> us to align on and discuss further this week.
>> >>>
>> >>> ~Josh
>> >>>
>> >>>
>> >>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <
>> henrik.ingo@datastax.com>
>> >>> wrote:
>> >>>
>> >>> > There's an old joke: How many people read Slashdot? The answer is 5.
>> >>> The
>> >>> > rest of us just write comments without reading... In that spirit, I
>> >>> wanted
>> >>> > to share some thoughts in response to your question, even if I know
>> >>> some of
>> >>> > it will have been said in this thread already :-)
>> >>> >
>> >>> > Basically, I just want to share what has worked well in my past
>> >>> projects...
>> >>> >
>> >>> > Visualization: Now that we have Butler running, we can already see a
>> >>> > decline in failing tests for 4.0 and trunk! This shows that
>> >>> contributors
>> >>> > want to do the right thing, we just need the right tools and
>> processes
>> >>> to
>> >>> > achieve success.
>> >>> >
>> >>> > Process: I'm confident we will soon be back to seeing 0 failures for
>> >>> 4.0
>> >>> > and trunk. However, keeping that state requires constant vigilance!
>> At
>> >>> > Mongodb we had a role called Build Baron (aka Build Cop, etc...).
>> This
>> >>> is a
>> >>> > weekly rotating role where the person who is the Build Baron will at
>> >>> least
>> >>> > once per day go through all of the Butler dashboards to catch new
>> >>> > regressions early. We have used the same process also at Datastax to
>> >>> guard
>> >>> > our downstream fork of Cassandra 4.0. It's the responsibility of the
>> >>> Build
>> >>> > Baron to
>> >>> >  - file a jira ticket for new failures
>> >>> >  - determine which commit is responsible for introducing the
>> >>> regression.
>> >>> > Sometimes this is obvious, sometimes this requires "bisecting" by
>> >>> running
>> >>> > more builds e.g. between two nightly builds.
>> >>> >  - assign the jira ticket to the author of the commit that
>> introduced
>> >>> the
>> >>> > regression
>> >>> >
>> >>> > Given that Cassandra is a community that includes part time and
>> >>> volunteer
>> >>> > developers, we may want to try some variation of this, such as
>> pairing
>> >>> 2
>> >>> > build barons each week?
>> >>> >
>> >>> > Reverting: A policy that the commit causing the regression is
>> >>> automatically
>> >>> > reverted can be scary. It takes courage to be the junior test
>> engineer
>> >>> who
>> >>> > reverts yesterday's commit from the founder and CTO, just to give an
>> >>> > example... Yet this is the most efficient way to keep the build
>> green.
>> >>> And
>> >>> > it turns out it's not that much additional work for the original
>> >>> author to
>> >>> > fix the issue and then re-merge the patch.
>> >>> >
>> >>> > Merge-train: For any project with more than 1 commit per day, it
>> will
>> >>> > inevitably happen that you need to rebase a PR before merging, and
>> >>> even if
>> >>> > it passed all tests before, after rebase it won't. In the downstream
>> >>> > Cassandra fork previously mentioned, we have tried to enable a
>> github
>> >>> rule
>> >>> > which requires a) that all tests passed before merging, and b) the
>> PR
>> >>> is
>> >>> > against the head of the branch merged into, and c) the tests were
>> run
>> >>> after
>> >>> > such rebase. Unfortunately this leads to infinite loops where a
>> large
>> >>> PR
>> >>> > may never be able to commit because it has to be rebased again and
>> >>> again
>> >>> > when smaller PRs can merge faster. The solution to this problem is
>> to
>> >>> have
>> >>> > an automated process for the rebase-test-merge cycle. Gitlab
>> supports
>> >>> such
>> >>> > a feature and calls it merge-trean:
>> >>> > https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
>> >>> >
>> >>> > The merge-train can be considered an advanced feature and we can
>> >>> return to
>> >>> > it later. The other points should be sufficient to keep a reasonably
>> >>> green
>> >>> > trunk.
>> >>> >
>> >>> > I guess the major area where we can improve daily test coverage
>> would
>> >>> be
>> >>> > performance tests. To that end we recently open sourced a nice tool
>> >>> that
>> >>> > can algorithmically detects performance regressions in a timeseries
>> >>> history
>> >>> > of benchmark results: https://github.com/datastax-labs/hunter Just
>> >>> like
>> >>> > with correctness testing it's my experience that catching
>> regressions
>> >>> the
>> >>> > day they happened is much better than trying to do it at beta or rc
>> >>> time.
>> >>> >
>> >>> > Piotr also blogged about Hunter when it was released:
>> >>> >
>> >>> >
>> >>>
>> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
>> >>> >
>> >>> > henrik
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <
>> jmckenzie@apache.org>
>> >>> > wrote:
>> >>> >
>> >>> > > We as a project have gone back and forth on the topic of quality
>> and
>> >>> the
>> >>> > > notion of a releasable trunk for quite a few years. If people are
>> >>> > > interested, I'd like to rekindle this discussion a bit and see if
>> >>> we're
>> >>> > > happy with where we are as a project or if we think there's steps
>> we
>> >>> > should
>> >>> > > take to change the quality bar going forward. The following
>> questions
>> >>> > have
>> >>> > > been rattling around for me for awhile:
>> >>> > >
>> >>> > > 1. How do we define what "releasable trunk" means? All reviewed
>> by M
>> >>> > > committers? Passing N% of tests? Passing all tests plus some other
>> >>> > metrics
>> >>> > > (manual testing, raising the number of reviewers, test coverage,
>> >>> usage in
>> >>> > > dev or QA environments, etc)? Something else entirely?
>> >>> > >
>> >>> > > 2. With a definition settled upon in #1, what steps, if any, do we
>> >>> need
>> >>> > to
>> >>> > > take to get from where we are to having *and keeping* that
>> releasable
>> >>> > > trunk? Anything to codify there?
>> >>> > >
>> >>> > > 3. What are the benefits of having a releasable trunk as defined
>> >>> here?
>> >>> > What
>> >>> > > are the costs? Is it worth pursuing? What are the alternatives
>> (for
>> >>> > > instance: a freeze before a release + stabilization focus by the
>> >>> > community
>> >>> > > i.e. 4.0 push or the tock in tick-tock)?
>> >>> > >
>> >>> > > Given the large volumes of work coming down the pike with CEP's,
>> this
>> >>> > seems
>> >>> > > like a good time to at least check in on this topic as a
>> community.
>> >>> > >
>> >>> > > Full disclosure: running face-first into 60+ failing tests on
>> trunk
>> >>> when
>> >>> > > going through the commit process for denylisting this week brought
>> >>> this
>> >>> > > topic back up for me (reminds me of when I went to merge CDC back
>> in
>> >>> 3.6
>> >>> > > and those test failures riled me up... I sense a pattern ;))
>> >>> > >
>> >>> > > Looking forward to hearing what people think.
>> >>> > >
>> >>> > > ~Josh
>> >>> > >
>> >>> >
>> >>> >
>> >>> > --
>> >>> >
>> >>> > Henrik Ingo
>> >>> >
>> >>> > +358 40 569 7354 <358405697354>
>> >>> >
>> >>> > [image: Visit us online.] <https://www.datastax.com/>  [image:
>> Visit
>> >>> us on
>> >>> > Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
>> >>> YouTube.]
>> >>> > <
>> >>> >
>> >>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=
>> >>> > >
>> >>> >   [image: Visit my LinkedIn profile.] <
>> >>> https://www.linkedin.com/in/heingo/
>> >>> > >
>> >>> >
>> >>>
>> >>
>>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Ekaterina Dimitrova <e....@gmail.com>.

I think the script discussion is on a different thread and attached
document which I am also about to address soon :-)

On Mon, 6 Dec 2021 at 17:59, benedict@apache.org <be...@apache.org>
wrote:

> Is there a reason we discounted modifying the merge strategy?
>
> I’m just a little wary of relying on scripts for consistency of behaviour
> here. Environments differ, and it would be far preferable for consistency
> of behaviour to rely on shared infrastructure if possible. I would probably
> be against mandating these scripts, at least.
>
> From: Joshua McKenzie <jm...@apache.org>
> Date: Monday, 6 December 2021 at 22:20
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] Releasable trunk and quality
> As I work through the scripting on this, I don't know if we've documented
> or clarified the following (don't see it here:
> https://cassandra.apache.org/_/development/testing.html):
>
> Pre-commit test suites:
> * Which JDK's?
> * When to include all python tests or do JVM only (if ever)?
> * When to run upgrade tests?
> * What to do if a test is also failing on the reference root (i.e. trunk,
> cassandra-4.0, etc)?
> * What to do if a test fails intermittently?
>
> I'll also update the above linked documentation once we hammer this out and
> try and bake it into the scripting flow as much as possible as well. Goal
> is to make it easy to do the right thing and hard to do the wrong thing,
> and to have these things written down rather than have it be tribal
> knowledge that varies a lot across the project.
>
> ~Josh
>
> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jm...@apache.org>
> wrote:
>
> > After some offline collab, here's where this thread has landed on a
> > proposal to change our processes to incrementally improve our processes
> and
> > hopefully stabilize the state of CI longer term:
> >
> > Link:
> >
> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
> >
> > Hopefully the mail server doesn't butcher formatting; if it does, hit up
> > the gdoc and leave comments there as should be open to all.
> >
> > Phase 1:
> > Document merge criteria; update circle jobs to have a simple pre-merge
> job
> > (one for each JDK profile)
> >      * Donate, document, and formalize usage of circleci-enable.py in ASF
> > repo (need new commit scripts / dev tooling section?)
> >         * rewrites circle config jobs to simple clear flow
> >         * ability to toggle between "run on push" or "click to run"
> >         * Variety of other functionality; see below
> > Document (site, help, README.md) and automate via scripting the
> > relationship / dev / release process around:
> >     * In-jvm dtest
> >     * dtest
> >     * ccm
> > Integrate and document usage of script to build CI repeat test runs
> >     * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
> >     * Document “Do this if you add or change tests”
> > Introduce “Build Lead” role
> >     * Weekly rotation; volunteer
> >     * 1: Make sure JIRAs exist for test failures
> >     * 2: Attempt to triage new test failures to root cause and assign out
> >     * 3: Coordinate and drive to green board on trunk
> > Change and automate process for *trunk only* patches:
> >     * Block on green CI (from merge criteria in CI above; potentially
> > stricter definition of "clean" for trunk CI)
> >     * Consider using github PR’s to merge (TODO: determine how to handle
> > circle + CHANGES; see below)
> > Automate process for *multi-branch* merges
> >     * Harden / contribute / document dcapwell script (has one which does
> > the following):
> >         * rebases your branch to the latest (if on 3.0 then rebase
> against
> > cassandra-3.0)
> >         * check compiles
> >         * removes all changes to .circle (can opt-out for circleci
> patches)
> >         * removes all changes to CHANGES.txt and leverages JIRA for the
> > content
> >         * checks code still compiles
> >         * changes circle to run ci
> >         * push to a temp branch in git and run CI (circle + Jenkins)
> >             * when all branches are clean (waiting step is manual)
> >             * TODO: Define “clean”
> >                 * No new test failures compared to reference?
> >                 * Or no test failures at all?
> >             * merge changes into the actual branches
> >             * merge up changes; rewriting diff
> >             * push --atomic
> >
> > Transition to phase 2 when:
> >     * All items from phase 1 are complete
> >     * Test boards for supported branches are green
> >
> > Phase 2:
> > * Add Harry to recurring run against trunk
> > * Add Harry to release pipeline
> > * Suite of perf tests against trunk recurring
> >
> >
> >
> > On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jm...@apache.org>
> > wrote:
> >
> >> Sorry for not catching that Benedict, you're absolutely right. So long
> as
> >> we're using merge commits between branches I don't think auto-merging
> via
> >> train or blocking on green CI are options via the tooling, and
> multi-branch
> >> reverts will be something we should document very clearly should we even
> >> choose to go that route (a lot of room to make mistakes there).
> >>
> >> It may not be a huge issue as we can expect the more disruptive changes
> >> (i.e. potentially destabilizing) to be happening on trunk only, so
> perhaps
> >> we can get away with slightly different workflows or policies based on
> >> whether you're doing a multi-branch bugfix or a feature on trunk. Bears
> >> thinking more deeply about.
> >>
> >> I'd also be game for revisiting our merge strategy. I don't see much
> >> difference in labor between merging between branches vs. preparing
> separate
> >> patches for an individual developer, however I'm sure there's
> maintenance
> >> and integration implications there I'm not thinking of right now.
> >>
> >> On Wed, Nov 17, 2021 at 12:03 PM benedict@apache.org <
> benedict@apache.org>
> >> wrote:
> >>
> >>> I raised this before, but to highlight it again: how do these
> approaches
> >>> interface with our merge strategy?
> >>>
> >>> We might have to rebase several dependent merge commits and want to
> >>> merge them atomically. So far as I know these tools don’t work
> >>> fantastically in this scenario, but if I’m wrong that’s fantastic. If
> not,
> >>> given how important these things are, should we consider revisiting our
> >>> merge strategy?
> >>>
> >>> From: Joshua McKenzie <jm...@apache.org>
> >>> Date: Wednesday, 17 November 2021 at 16:39
> >>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> >>> Subject: Re: [DISCUSS] Releasable trunk and quality
> >>> Thanks for the feedback and insight Henrik; it's valuable to hear how
> >>> other
> >>> large complex infra projects have tackled this problem set.
> >>>
> >>> To attempt to summarize, what I got from your email:
> >>> [Phase one]
> >>> 1) Build Barons: rotation where there's always someone active tying
> >>> failures to changes and adding those failures to our ticketing system
> >>> 2) Best effort process of "test breakers" being assigned tickets to fix
> >>> the
> >>> things their work broke
> >>> 3) Moving to a culture where we regularly revert commits that break
> tests
> >>> 4) Running tests before we merge changes
> >>>
> >>> [Phase two]
> >>> 1) Suite of performance tests on a regular cadence against trunk
> >>> (w/hunter
> >>> or otherwise)
> >>> 2) Integration w/ github merge-train pipelines
> >>>
> >>> That cover the highlights? I agree with these points as useful places
> for
> >>> us to invest in as a project and I'll work on getting this into a gdoc
> >>> for
> >>> us to align on and discuss further this week.
> >>>
> >>> ~Josh
> >>>
> >>>
> >>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <henrik.ingo@datastax.com
> >
> >>> wrote:
> >>>
> >>> > There's an old joke: How many people read Slashdot? The answer is 5.
> >>> The
> >>> > rest of us just write comments without reading... In that spirit, I
> >>> wanted
> >>> > to share some thoughts in response to your question, even if I know
> >>> some of
> >>> > it will have been said in this thread already :-)
> >>> >
> >>> > Basically, I just want to share what has worked well in my past
> >>> projects...
> >>> >
> >>> > Visualization: Now that we have Butler running, we can already see a
> >>> > decline in failing tests for 4.0 and trunk! This shows that
> >>> contributors
> >>> > want to do the right thing, we just need the right tools and
> processes
> >>> to
> >>> > achieve success.
> >>> >
> >>> > Process: I'm confident we will soon be back to seeing 0 failures for
> >>> 4.0
> >>> > and trunk. However, keeping that state requires constant vigilance!
> At
> >>> > Mongodb we had a role called Build Baron (aka Build Cop, etc...).
> This
> >>> is a
> >>> > weekly rotating role where the person who is the Build Baron will at
> >>> least
> >>> > once per day go through all of the Butler dashboards to catch new
> >>> > regressions early. We have used the same process also at Datastax to
> >>> guard
> >>> > our downstream fork of Cassandra 4.0. It's the responsibility of the
> >>> Build
> >>> > Baron to
> >>> >  - file a jira ticket for new failures
> >>> >  - determine which commit is responsible for introducing the
> >>> regression.
> >>> > Sometimes this is obvious, sometimes this requires "bisecting" by
> >>> running
> >>> > more builds e.g. between two nightly builds.
> >>> >  - assign the jira ticket to the author of the commit that introduced
> >>> the
> >>> > regression
> >>> >
> >>> > Given that Cassandra is a community that includes part time and
> >>> volunteer
> >>> > developers, we may want to try some variation of this, such as
> pairing
> >>> 2
> >>> > build barons each week?
> >>> >
> >>> > Reverting: A policy that the commit causing the regression is
> >>> automatically
> >>> > reverted can be scary. It takes courage to be the junior test
> engineer
> >>> who
> >>> > reverts yesterday's commit from the founder and CTO, just to give an
> >>> > example... Yet this is the most efficient way to keep the build
> green.
> >>> And
> >>> > it turns out it's not that much additional work for the original
> >>> author to
> >>> > fix the issue and then re-merge the patch.
> >>> >
> >>> > Merge-train: For any project with more than 1 commit per day, it will
> >>> > inevitably happen that you need to rebase a PR before merging, and
> >>> even if
> >>> > it passed all tests before, after rebase it won't. In the downstream
> >>> > Cassandra fork previously mentioned, we have tried to enable a github
> >>> rule
> >>> > which requires a) that all tests passed before merging, and b) the PR
> >>> is
> >>> > against the head of the branch merged into, and c) the tests were run
> >>> after
> >>> > such rebase. Unfortunately this leads to infinite loops where a large
> >>> PR
> >>> > may never be able to commit because it has to be rebased again and
> >>> again
> >>> > when smaller PRs can merge faster. The solution to this problem is to
> >>> have
> >>> > an automated process for the rebase-test-merge cycle. Gitlab supports
> >>> such
> >>> > a feature and calls it merge-trean:
> >>> > https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
> >>> >
> >>> > The merge-train can be considered an advanced feature and we can
> >>> return to
> >>> > it later. The other points should be sufficient to keep a reasonably
> >>> green
> >>> > trunk.
> >>> >
> >>> > I guess the major area where we can improve daily test coverage would
> >>> be
> >>> > performance tests. To that end we recently open sourced a nice tool
> >>> that
> >>> > can algorithmically detects performance regressions in a timeseries
> >>> history
> >>> > of benchmark results: https://github.com/datastax-labs/hunter Just
> >>> like
> >>> > with correctness testing it's my experience that catching regressions
> >>> the
> >>> > day they happened is much better than trying to do it at beta or rc
> >>> time.
> >>> >
> >>> > Piotr also blogged about Hunter when it was released:
> >>> >
> >>> >
> >>>
> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
> >>> >
> >>> > henrik
> >>> >
> >>> >
> >>> >
> >>> > On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <
> jmckenzie@apache.org>
> >>> > wrote:
> >>> >
> >>> > > We as a project have gone back and forth on the topic of quality
> and
> >>> the
> >>> > > notion of a releasable trunk for quite a few years. If people are
> >>> > > interested, I'd like to rekindle this discussion a bit and see if
> >>> we're
> >>> > > happy with where we are as a project or if we think there's steps
> we
> >>> > should
> >>> > > take to change the quality bar going forward. The following
> questions
> >>> > have
> >>> > > been rattling around for me for awhile:
> >>> > >
> >>> > > 1. How do we define what "releasable trunk" means? All reviewed by
> M
> >>> > > committers? Passing N% of tests? Passing all tests plus some other
> >>> > metrics
> >>> > > (manual testing, raising the number of reviewers, test coverage,
> >>> usage in
> >>> > > dev or QA environments, etc)? Something else entirely?
> >>> > >
> >>> > > 2. With a definition settled upon in #1, what steps, if any, do we
> >>> need
> >>> > to
> >>> > > take to get from where we are to having *and keeping* that
> releasable
> >>> > > trunk? Anything to codify there?
> >>> > >
> >>> > > 3. What are the benefits of having a releasable trunk as defined
> >>> here?
> >>> > What
> >>> > > are the costs? Is it worth pursuing? What are the alternatives (for
> >>> > > instance: a freeze before a release + stabilization focus by the
> >>> > community
> >>> > > i.e. 4.0 push or the tock in tick-tock)?
> >>> > >
> >>> > > Given the large volumes of work coming down the pike with CEP's,
> this
> >>> > seems
> >>> > > like a good time to at least check in on this topic as a community.
> >>> > >
> >>> > > Full disclosure: running face-first into 60+ failing tests on trunk
> >>> when
> >>> > > going through the commit process for denylisting this week brought
> >>> this
> >>> > > topic back up for me (reminds me of when I went to merge CDC back
> in
> >>> 3.6
> >>> > > and those test failures riled me up... I sense a pattern ;))
> >>> > >
> >>> > > Looking forward to hearing what people think.
> >>> > >
> >>> > > ~Josh
> >>> > >
> >>> >
> >>> >
> >>> > --
> >>> >
> >>> > Henrik Ingo
> >>> >
> >>> > +358 40 569 7354 <358405697354>
> >>> >
> >>> > [image: Visit us online.] <https://www.datastax.com/>  [image: Visit
> >>> us on
> >>> > Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
> >>> YouTube.]
> >>> > <
> >>> >
> >>>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=
> >>> > >
> >>> >   [image: Visit my LinkedIn profile.] <
> >>> https://www.linkedin.com/in/heingo/
> >>> > >
> >>> >
> >>>
> >>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.

Is there a reason we discounted modifying the merge strategy?

I’m just a little wary of relying on scripts for consistency of behaviour here. Environments differ, and it would be far preferable for consistency of behaviour to rely on shared infrastructure if possible. I would probably be against mandating these scripts, at least.

From: Joshua McKenzie <jm...@apache.org>
Date: Monday, 6 December 2021 at 22:20
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Releasable trunk and quality
As I work through the scripting on this, I don't know if we've documented
or clarified the following (don't see it here:
https://cassandra.apache.org/_/development/testing.html):

Pre-commit test suites:
* Which JDK's?
* When to include all python tests or do JVM only (if ever)?
* When to run upgrade tests?
* What to do if a test is also failing on the reference root (i.e. trunk,
cassandra-4.0, etc)?
* What to do if a test fails intermittently?

I'll also update the above linked documentation once we hammer this out and
try and bake it into the scripting flow as much as possible as well. Goal
is to make it easy to do the right thing and hard to do the wrong thing,
and to have these things written down rather than have it be tribal
knowledge that varies a lot across the project.

~Josh

On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jm...@apache.org> wrote:

> After some offline collab, here's where this thread has landed on a
> proposal to change our processes to incrementally improve our processes and
> hopefully stabilize the state of CI longer term:
>
> Link:
> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
>
> Hopefully the mail server doesn't butcher formatting; if it does, hit up
> the gdoc and leave comments there as should be open to all.
>
> Phase 1:
> Document merge criteria; update circle jobs to have a simple pre-merge job
> (one for each JDK profile)
>      * Donate, document, and formalize usage of circleci-enable.py in ASF
> repo (need new commit scripts / dev tooling section?)
>         * rewrites circle config jobs to simple clear flow
>         * ability to toggle between "run on push" or "click to run"
>         * Variety of other functionality; see below
> Document (site, help, README.md) and automate via scripting the
> relationship / dev / release process around:
>     * In-jvm dtest
>     * dtest
>     * ccm
> Integrate and document usage of script to build CI repeat test runs
>     * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
>     * Document “Do this if you add or change tests”
> Introduce “Build Lead” role
>     * Weekly rotation; volunteer
>     * 1: Make sure JIRAs exist for test failures
>     * 2: Attempt to triage new test failures to root cause and assign out
>     * 3: Coordinate and drive to green board on trunk
> Change and automate process for *trunk only* patches:
>     * Block on green CI (from merge criteria in CI above; potentially
> stricter definition of "clean" for trunk CI)
>     * Consider using github PR’s to merge (TODO: determine how to handle
> circle + CHANGES; see below)
> Automate process for *multi-branch* merges
>     * Harden / contribute / document dcapwell script (has one which does
> the following):
>         * rebases your branch to the latest (if on 3.0 then rebase against
> cassandra-3.0)
>         * check compiles
>         * removes all changes to .circle (can opt-out for circleci patches)
>         * removes all changes to CHANGES.txt and leverages JIRA for the
> content
>         * checks code still compiles
>         * changes circle to run ci
>         * push to a temp branch in git and run CI (circle + Jenkins)
>             * when all branches are clean (waiting step is manual)
>             * TODO: Define “clean”
>                 * No new test failures compared to reference?
>                 * Or no test failures at all?
>             * merge changes into the actual branches
>             * merge up changes; rewriting diff
>             * push --atomic
>
> Transition to phase 2 when:
>     * All items from phase 1 are complete
>     * Test boards for supported branches are green
>
> Phase 2:
> * Add Harry to recurring run against trunk
> * Add Harry to release pipeline
> * Suite of perf tests against trunk recurring
>
>
>
> On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jm...@apache.org>
> wrote:
>
>> Sorry for not catching that Benedict, you're absolutely right. So long as
>> we're using merge commits between branches I don't think auto-merging via
>> train or blocking on green CI are options via the tooling, and multi-branch
>> reverts will be something we should document very clearly should we even
>> choose to go that route (a lot of room to make mistakes there).
>>
>> It may not be a huge issue as we can expect the more disruptive changes
>> (i.e. potentially destabilizing) to be happening on trunk only, so perhaps
>> we can get away with slightly different workflows or policies based on
>> whether you're doing a multi-branch bugfix or a feature on trunk. Bears
>> thinking more deeply about.
>>
>> I'd also be game for revisiting our merge strategy. I don't see much
>> difference in labor between merging between branches vs. preparing separate
>> patches for an individual developer, however I'm sure there's maintenance
>> and integration implications there I'm not thinking of right now.
>>
>> On Wed, Nov 17, 2021 at 12:03 PM benedict@apache.org <be...@apache.org>
>> wrote:
>>
>>> I raised this before, but to highlight it again: how do these approaches
>>> interface with our merge strategy?
>>>
>>> We might have to rebase several dependent merge commits and want to
>>> merge them atomically. So far as I know these tools don’t work
>>> fantastically in this scenario, but if I’m wrong that’s fantastic. If not,
>>> given how important these things are, should we consider revisiting our
>>> merge strategy?
>>>
>>> From: Joshua McKenzie <jm...@apache.org>
>>> Date: Wednesday, 17 November 2021 at 16:39
>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] Releasable trunk and quality
>>> Thanks for the feedback and insight Henrik; it's valuable to hear how
>>> other
>>> large complex infra projects have tackled this problem set.
>>>
>>> To attempt to summarize, what I got from your email:
>>> [Phase one]
>>> 1) Build Barons: rotation where there's always someone active tying
>>> failures to changes and adding those failures to our ticketing system
>>> 2) Best effort process of "test breakers" being assigned tickets to fix
>>> the
>>> things their work broke
>>> 3) Moving to a culture where we regularly revert commits that break tests
>>> 4) Running tests before we merge changes
>>>
>>> [Phase two]
>>> 1) Suite of performance tests on a regular cadence against trunk
>>> (w/hunter
>>> or otherwise)
>>> 2) Integration w/ github merge-train pipelines
>>>
>>> That cover the highlights? I agree with these points as useful places for
>>> us to invest in as a project and I'll work on getting this into a gdoc
>>> for
>>> us to align on and discuss further this week.
>>>
>>> ~Josh
>>>
>>>
>>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <he...@datastax.com>
>>> wrote:
>>>
>>> > There's an old joke: How many people read Slashdot? The answer is 5.
>>> The
>>> > rest of us just write comments without reading... In that spirit, I
>>> wanted
>>> > to share some thoughts in response to your question, even if I know
>>> some of
>>> > it will have been said in this thread already :-)
>>> >
>>> > Basically, I just want to share what has worked well in my past
>>> projects...
>>> >
>>> > Visualization: Now that we have Butler running, we can already see a
>>> > decline in failing tests for 4.0 and trunk! This shows that
>>> contributors
>>> > want to do the right thing, we just need the right tools and processes
>>> to
>>> > achieve success.
>>> >
>>> > Process: I'm confident we will soon be back to seeing 0 failures for
>>> 4.0
>>> > and trunk. However, keeping that state requires constant vigilance! At
>>> > Mongodb we had a role called Build Baron (aka Build Cop, etc...). This
>>> is a
>>> > weekly rotating role where the person who is the Build Baron will at
>>> least
>>> > once per day go through all of the Butler dashboards to catch new
>>> > regressions early. We have used the same process also at Datastax to
>>> guard
>>> > our downstream fork of Cassandra 4.0. It's the responsibility of the
>>> Build
>>> > Baron to
>>> >  - file a jira ticket for new failures
>>> >  - determine which commit is responsible for introducing the
>>> regression.
>>> > Sometimes this is obvious, sometimes this requires "bisecting" by
>>> running
>>> > more builds e.g. between two nightly builds.
>>> >  - assign the jira ticket to the author of the commit that introduced
>>> the
>>> > regression
>>> >
>>> > Given that Cassandra is a community that includes part time and
>>> volunteer
>>> > developers, we may want to try some variation of this, such as pairing
>>> 2
>>> > build barons each week?
>>> >
>>> > Reverting: A policy that the commit causing the regression is
>>> automatically
>>> > reverted can be scary. It takes courage to be the junior test engineer
>>> who
>>> > reverts yesterday's commit from the founder and CTO, just to give an
>>> > example... Yet this is the most efficient way to keep the build green.
>>> And
>>> > it turns out it's not that much additional work for the original
>>> author to
>>> > fix the issue and then re-merge the patch.
>>> >
>>> > Merge-train: For any project with more than 1 commit per day, it will
>>> > inevitably happen that you need to rebase a PR before merging, and
>>> even if
>>> > it passed all tests before, after rebase it won't. In the downstream
>>> > Cassandra fork previously mentioned, we have tried to enable a github
>>> rule
>>> > which requires a) that all tests passed before merging, and b) the PR
>>> is
>>> > against the head of the branch merged into, and c) the tests were run
>>> after
>>> > such rebase. Unfortunately this leads to infinite loops where a large
>>> PR
>>> > may never be able to commit because it has to be rebased again and
>>> again
>>> > when smaller PRs can merge faster. The solution to this problem is to
>>> have
>>> > an automated process for the rebase-test-merge cycle. Gitlab supports
>>> such
>>> > a feature and calls it merge-trean:
>>> > https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
>>> >
>>> > The merge-train can be considered an advanced feature and we can
>>> return to
>>> > it later. The other points should be sufficient to keep a reasonably
>>> green
>>> > trunk.
>>> >
>>> > I guess the major area where we can improve daily test coverage would
>>> be
>>> > performance tests. To that end we recently open sourced a nice tool
>>> that
>>> > can algorithmically detects performance regressions in a timeseries
>>> history
>>> > of benchmark results: https://github.com/datastax-labs/hunter Just
>>> like
>>> > with correctness testing it's my experience that catching regressions
>>> the
>>> > day they happened is much better than trying to do it at beta or rc
>>> time.
>>> >
>>> > Piotr also blogged about Hunter when it was released:
>>> >
>>> >
>>> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
>>> >
>>> > henrik
>>> >
>>> >
>>> >
>>> > On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jm...@apache.org>
>>> > wrote:
>>> >
>>> > > We as a project have gone back and forth on the topic of quality and
>>> the
>>> > > notion of a releasable trunk for quite a few years. If people are
>>> > > interested, I'd like to rekindle this discussion a bit and see if
>>> we're
>>> > > happy with where we are as a project or if we think there's steps we
>>> > should
>>> > > take to change the quality bar going forward. The following questions
>>> > have
>>> > > been rattling around for me for awhile:
>>> > >
>>> > > 1. How do we define what "releasable trunk" means? All reviewed by M
>>> > > committers? Passing N% of tests? Passing all tests plus some other
>>> > metrics
>>> > > (manual testing, raising the number of reviewers, test coverage,
>>> usage in
>>> > > dev or QA environments, etc)? Something else entirely?
>>> > >
>>> > > 2. With a definition settled upon in #1, what steps, if any, do we
>>> need
>>> > to
>>> > > take to get from where we are to having *and keeping* that releasable
>>> > > trunk? Anything to codify there?
>>> > >
>>> > > 3. What are the benefits of having a releasable trunk as defined
>>> here?
>>> > What
>>> > > are the costs? Is it worth pursuing? What are the alternatives (for
>>> > > instance: a freeze before a release + stabilization focus by the
>>> > community
>>> > > i.e. 4.0 push or the tock in tick-tock)?
>>> > >
>>> > > Given the large volumes of work coming down the pike with CEP's, this
>>> > seems
>>> > > like a good time to at least check in on this topic as a community.
>>> > >
>>> > > Full disclosure: running face-first into 60+ failing tests on trunk
>>> when
>>> > > going through the commit process for denylisting this week brought
>>> this
>>> > > topic back up for me (reminds me of when I went to merge CDC back in
>>> 3.6
>>> > > and those test failures riled me up... I sense a pattern ;))
>>> > >
>>> > > Looking forward to hearing what people think.
>>> > >
>>> > > ~Josh
>>> > >
>>> >
>>> >
>>> > --
>>> >
>>> > Henrik Ingo
>>> >
>>> > +358 40 569 7354 <358405697354>
>>> >
>>> > [image: Visit us online.] <https://www.datastax.com/>  [image: Visit
>>> us on
>>> > Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
>>> YouTube.]
>>> > <
>>> >
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=
>>> > >
>>> >   [image: Visit my LinkedIn profile.] <
>>> https://www.linkedin.com/in/heingo/
>>> > >
>>> >
>>>
>>

Re: [DISCUSS] Releasable trunk and quality

Posted by Ekaterina Dimitrova <e....@gmail.com>.

+1 on Mick’s suggestion (nb)


On Thu, 9 Dec 2021 at 14:46, Mick Semb Wever <mc...@apache.org> wrote:

> >
> > So let me pose the question here to the list: is there anyone who would
> > like to advocate for the current merge strategy (apply to oldest LTS,
> merge
> > up, often -s ours w/new patch applied + amend) instead of "apply to trunk
> > and cherry-pick back to LTS"?
>
>
>
> I'm in favour of the current merge strategy.
> I find it cleaner that work is found associated to one sha on the hardest
> branch, and we treat (or should be) CI holistically across branches.
> I can appreciate that github makes some things easier, but I suspect it
> will make other things messier and that there will be other consequences.
>
> My understanding was that we would first introduce such github fancies on
> only commits that are intended for trunk. I am in favour of taking that
> approach, changing our merge strategy can happen later, once we have ironed
> out how the github/CI/stable-trunk is working best for us. I think this
> would also help us understand more about the impacts of changing the merge
> strategy.
>
> I was also under the impression that we are now aiming to be committing
> less to the release branches. That means changing the merge strategy is of
> less importance (and that there is benefit to keeping it as-is). Certainly
> the commits on past branches seems very low at the moment, considering many
> users are on 4.0, and this is a trend we want to continue.
>
> My opinion, let's take this in two steps (try stuff on just trunk first)…
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

>
> limiting this feature to trunk _only_ patches? If so, that seems rather
> weak.

It's definitely a weaker guarantee. On the plus side, if we're doing bugfix
only to all released branches and limiting improvements and new features to
trunk, in theory those will be the more disruptive patches that are more
likely to break CI. Possibly.

On Thu, Dec 9, 2021 at 5:35 PM benedict@apache.org <be...@apache.org>
wrote:

> Does this work with trunk patches that involve other branches as well? I’d
> imagine we have the same problem. Or are we proposing limiting this feature
> to trunk _only_ patches? If so, that seems rather weak.
>
> From: Brandon Williams <dr...@gmail.com>
> Date: Thursday, 9 December 2021 at 20:25
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] Releasable trunk and quality
> +1 to trying trunk first.
>
> On Thu, Dec 9, 2021 at 1:52 PM Mick Semb Wever <mc...@apache.org> wrote:
> >
> > >
> > > So let me pose the question here to the list: is there anyone who would
> > > like to advocate for the current merge strategy (apply to oldest LTS,
> merge
> > > up, often -s ours w/new patch applied + amend) instead of "apply to
> trunk
> > > and cherry-pick back to LTS"?
> >
> >
> >
> > I'm in favour of the current merge strategy.
> > I find it cleaner that work is found associated to one sha on the hardest
> > branch, and we treat (or should be) CI holistically across branches.
> > I can appreciate that github makes some things easier, but I suspect it
> > will make other things messier and that there will be other consequences.
> >
> > My understanding was that we would first introduce such github fancies on
> > only commits that are intended for trunk. I am in favour of taking that
> > approach, changing our merge strategy can happen later, once we have
> ironed
> > out how the github/CI/stable-trunk is working best for us. I think this
> > would also help us understand more about the impacts of changing the
> merge
> > strategy.
> >
> > I was also under the impression that we are now aiming to be committing
> > less to the release branches. That means changing the merge strategy is
> of
> > less importance (and that there is benefit to keeping it as-is).
> Certainly
> > the commits on past branches seems very low at the moment, considering
> many
> > users are on 4.0, and this is a trend we want to continue.
> >
> > My opinion, let's take this in two steps (try stuff on just trunk first)…
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.

Does this work with trunk patches that involve other branches as well? I’d imagine we have the same problem. Or are we proposing limiting this feature to trunk _only_ patches? If so, that seems rather weak.

From: Brandon Williams <dr...@gmail.com>
Date: Thursday, 9 December 2021 at 20:25
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Releasable trunk and quality
+1 to trying trunk first.

On Thu, Dec 9, 2021 at 1:52 PM Mick Semb Wever <mc...@apache.org> wrote:
>
> >
> > So let me pose the question here to the list: is there anyone who would
> > like to advocate for the current merge strategy (apply to oldest LTS, merge
> > up, often -s ours w/new patch applied + amend) instead of "apply to trunk
> > and cherry-pick back to LTS"?
>
>
>
> I'm in favour of the current merge strategy.
> I find it cleaner that work is found associated to one sha on the hardest
> branch, and we treat (or should be) CI holistically across branches.
> I can appreciate that github makes some things easier, but I suspect it
> will make other things messier and that there will be other consequences.
>
> My understanding was that we would first introduce such github fancies on
> only commits that are intended for trunk. I am in favour of taking that
> approach, changing our merge strategy can happen later, once we have ironed
> out how the github/CI/stable-trunk is working best for us. I think this
> would also help us understand more about the impacts of changing the merge
> strategy.
>
> I was also under the impression that we are now aiming to be committing
> less to the release branches. That means changing the merge strategy is of
> less importance (and that there is benefit to keeping it as-is). Certainly
> the commits on past branches seems very low at the moment, considering many
> users are on 4.0, and this is a trend we want to continue.
>
> My opinion, let's take this in two steps (try stuff on just trunk first)…

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

>
> If you see a merge commit in the history, isn't it normal to presume that
> it will contain the additional change for that branch for the parent commit
> getting merged in?

I've been noodling on this a bit; I think it's a question of degree. When I
see a merge commit my intuitive expectation is "it takes the context of
application of diff over there and moves it over here", not "it claims to
have taken the context from over there but actually only takes it
symbolically (-s ours), and all the changes are manually applied by a
separate motion here".

Applying hand-crafted patches to each branch w/out "merge -s ours" is more
_honest_ in the case where we can't merge between branches. I think the
status quo is largely fine if you have a trivial change and a merge commit
between branches, but I also find trivial patch application uninteresting
and simple enough that either merge strategy is fine. All this said, I am
*definitively* not advocating we have different merge strategies based on
the size of a patch and whether it necessitates the -s ours, more trying to
explore the nuance behind my reaction to the status quo.


On Thu, Jan 6, 2022 at 7:28 AM benedict@apache.org <be...@apache.org>
wrote:

> So, one advantage of merge commits is that review of each branch is
> potentially easier, as the merge commit helps direct the reviewer’s
> attention. However, in my experience most of the focus during review is to
> the main branch anyway. Having separate tickets to track backports, and
> permitting them to occur out of band, could improve the quality of review.
> We can also likely synthesise the merge commits for the purpose of review
> using something like
>
>
>
> git checkout patch-4.0~1
>
> git checkout -b patch-4.0-review
>
> git merge -s ours patch-4.1~1
>
> git merge --no-commit patch-4.1
>
> git checkout patch-4.0 .
>
> git commit
>
>
>
>
>
> *From: *benedict@apache.org <be...@apache.org>
> *Date: *Wednesday, 5 January 2022 at 21:07
> *To: *Mick Semb Wever <mc...@apache.org>
> *Cc: *dev <de...@cassandra.apache.org>
> *Subject: *Re: [DISCUSS] Releasable trunk and quality
>
>
>
> > If you see a merge commit in the history, isn't it normal to presume
> that it will contain the additional change for that branch for the parent
> commit getting merged in?
>
>
>
> Sure, but it is exceptionally non-trivial to treat the work as a single
> diff in any standard UX. In practice it becomes 3 or 4 diffs, none of which
> tell the whole story (and all of which bleed legacy details of earlier
> branches). This is a genuine significant pain point when doing archaeology,
> something I and others do quite frequently, and a large part of why I want
> to see them gone.
>
>
>
> > Folk forget to pull, rebase, then go to push and realise one of their
> patches on a branch needs rebasing and rework. That rework may make them
> reconsider the patch going to the other branches too.
>
>
>
> Conversely, this is exceptionally painful when maintaining branches and
> forks, and I can attest to the pain of maintaining these branches so they
> may be committed atomically to having wasted literal person-weeks in my
> time on the project. I do not recall experiencing a significant benefit in
> return.
>
>
>
> > do i have to start text searching the git history
>
>
>
> Yes. This is so simple as to be a non-issue - surely you must search git
> log semi-regularly? It is a frequent part of the job of developing against
> the project in my experience.
>
>
>
> > Developing patch on hardest branch first, then working on each softer
> branch. I don't know how important this is, but I have found it a useful
> practice that encourages smaller, more precise patches overall.
>
>
>
> I don’t think this strategy determines which branch is first developed
> against. However, if it did, it would seem to me to be a clear mark against
> the current system, which incentivises fully developing against the oldest
> version before forward-porting its entirety. Developing primarily against
> the most recent branch incentivises back-porting more minimal versions of
> the work, once the scope of the work is fully understood.
>
>
>
>
>
>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.

So, one advantage of merge commits is that review of each branch is potentially easier, as the merge commit helps direct the reviewer’s attention. However, in my experience most of the focus during review is to the main branch anyway. Having separate tickets to track backports, and permitting them to occur out of band, could improve the quality of review. We can also likely synthesise the merge commits for the purpose of review using something like

git checkout patch-4.0~1
git checkout -b patch-4.0-review
git merge -s ours patch-4.1~1
git merge --no-commit patch-4.1
git checkout patch-4.0 .
git commit


From: benedict@apache.org <be...@apache.org>
Date: Wednesday, 5 January 2022 at 21:07
To: Mick Semb Wever <mc...@apache.org>
Cc: dev <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Releasable trunk and quality

> If you see a merge commit in the history, isn't it normal to presume that it will contain the additional change for that branch for the parent commit getting merged in?

Sure, but it is exceptionally non-trivial to treat the work as a single diff in any standard UX. In practice it becomes 3 or 4 diffs, none of which tell the whole story (and all of which bleed legacy details of earlier branches). This is a genuine significant pain point when doing archaeology, something I and others do quite frequently, and a large part of why I want to see them gone.

> Folk forget to pull, rebase, then go to push and realise one of their patches on a branch needs rebasing and rework. That rework may make them reconsider the patch going to the other branches too.

Conversely, this is exceptionally painful when maintaining branches and forks, and I can attest to the pain of maintaining these branches so they may be committed atomically to having wasted literal person-weeks in my time on the project. I do not recall experiencing a significant benefit in return.

> do i have to start text searching the git history

Yes. This is so simple as to be a non-issue - surely you must search git log semi-regularly? It is a frequent part of the job of developing against the project in my experience.

> Developing patch on hardest branch first, then working on each softer branch. I don't know how important this is, but I have found it a useful practice that encourages smaller, more precise patches overall.

I don’t think this strategy determines which branch is first developed against. However, if it did, it would seem to me to be a clear mark against the current system, which incentivises fully developing against the oldest version before forward-porting its entirety. Developing primarily against the most recent branch incentivises back-porting more minimal versions of the work, once the scope of the work is fully understood.

Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.

> If you see a merge commit in the history, isn't it normal to presume that it will contain the additional change for that branch for the parent commit getting merged in?

Sure, but it is exceptionally non-trivial to treat the work as a single diff in any standard UX. In practice it becomes 3 or 4 diffs, none of which tell the whole story (and all of which bleed legacy details of earlier branches). This is a genuine significant pain point when doing archaeology, something I and others do quite frequently, and a large part of why I want to see them gone.

> Folk forget to pull, rebase, then go to push and realise one of their patches on a branch needs rebasing and rework. That rework may make them reconsider the patch going to the other branches too.

Conversely, this is exceptionally painful when maintaining branches and forks, and I can attest to the pain of maintaining these branches so they may be committed atomically to having wasted literal person-weeks in my time on the project. I do not recall experiencing a significant benefit in return.

> do i have to start text searching the git history

Yes. This is so simple as to be a non-issue - surely you must search git log semi-regularly? It is a frequent part of the job of developing against the project in my experience.

> Developing patch on hardest branch first, then working on each softer branch. I don't know how important this is, but I have found it a useful practice that encourages smaller, more precise patches overall.

I don’t think this strategy determines which branch is first developed against. However, if it did, it would seem to me to be a clear mark against the current system, which incentivises fully developing against the oldest version before forward-porting its entirety. Developing primarily against the most recent branch incentivises back-porting more minimal versions of the work, once the scope of the work is fully understood.

Re: [DISCUSS] Releasable trunk and quality

Posted by Henrik Ingo <he...@datastax.com>.

On Wed, Jan 5, 2022 at 10:38 PM Mick Semb Wever <mc...@apache.org> wrote:

> +(?) Is the devil we know
>>
>
> + Bi-directional relationship between patches showing which branches it
> was applied to (and how).  From the original commit or any of the merge
> commits I can see which branches, and where the original commit, was
> applied. (See the mongo example from Henrik, how do I see which other
> branches the trunk commit was committed to? do i have to start text
> searching the git history or going through the ticket system :-(
>

Just to answer the question, I'm obviously not that impacted by how
Cassandra commits happen myself...

Maybe a thing I wouldn't copy from MongoDB is that their commit messages
are often one liners and yes you need to look up the jira ticket to read
what the commit does. A feature of this approach is that the jira ticket
can be edited later. But personally I always hated that the commit itself
didn't contain a description.

Other than that, yes I would grep or otherwise search for the ticket id or
githash, which must be included in the cherry picked commit. It never
occurred to me someone wouldn't like this. Note that the stable branches
eventually only get a handful of patches per month, so even just reading
the git log gives a nice overview. If anything the "merge branch X into Y"
always confuse me as I have no idea what that does.

As for verifying that each branch was patched, if you adhere to committing
first to trunk, then in descending order to each stable branch, you can
just check the oldest branch to verify the chain. Not everyone followed
this style, but I would always append the cherry pick message, so that the
last commit (to the oldest stable branch) would contain the chain of
githashes all the way to trunk.

henrik

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>

Re: [DISCUSS] Releasable trunk and quality

Posted by Mick Semb Wever <mc...@apache.org>.

> - hides changes inside the merge commit
>


This is what all merge commits do. If you see a merge commit in the
history, isn't it normal to presume that it will contain the additional
change for that branch for the parent commit getting merged in?



> - is exposed to race w/other committers across multiple branches requiring
> --atomic
>


This is a positive, IMHO.
Folk forget to pull, rebase, then go to push and realise one of their
patches on a branch needs rebasing and rework. That rework may make them
reconsider the patch going to the other branches too.



> +(?) Is the devil we know
>

+ Bi-directional relationship between patches showing which branches it was
applied to (and how).  From the original commit or any of the merge commits
I can see which branches, and where the original commit, was applied. (See
the mongo example from Henrik, how do I see which other branches the trunk
commit was committed to? do i have to start text searching the git history
or going through the ticket system :-(

+ Developing patch on hardest branch first, then working on each softer
branch. I don't know how important this is, but I have found it a useful
practice that encourages smaller, more precise patches overall.


Agree we want a defined end goal. And all for experimenting and testing
simpler/cleaner approaches.




>

Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.

I think simple, consistent, reliable and unavoidable are *the* killer features for QA. All features (give or take) of the industry standard approach of using CI hooks to gate PR merge.

From: Joshua McKenzie <jm...@apache.org>
Date: Wednesday, 5 January 2022 at 14:53
To: dev <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Releasable trunk and quality
A wise man once said "Simple is a feature" ;)

Our current process (commit oldest, merge up or merge -s ours w/ --amend):
- is confusing for new contributors to understand
- hides changes inside the merge commit
- masks future ability to see things with git attribute on commits
- is exposed to race w/other committers across multiple branches requiring --atomic
- is non-automatable requiring human intervention and prone to error
- prevents us from using industry standard tooling and workflows around CI thus contributing to CI degrading over time
+ Helps enforce that we don't forget to apply something to all branches
+(?) Is the devil we know

That's a lot of negatives for a very fixable single positive and some FUD.

On Tue, Jan 4, 2022 at 7:01 PM benedict@apache.org<ma...@apache.org> <be...@apache.org>> wrote:
To answer your point, I don’t have anything ideologically against a temporary divergence in treatment, but we should have a clear unified endpoint we are aiming for.

I would hate for this discussion to end without a clear answer about what that endpoint should be, though - even if we don’t get there immediately.

I personally dislike the idea of relying on scripts to enforce this, at least in the long run, as there is no uniformity of environment, so no uniformity of process, and when things go wrong due to diverging systems we’re creating additional work for people (and CI is headache enough when it goes wrong).

From: benedict@apache.org<ma...@apache.org> <be...@apache.org>>
Date: Tuesday, 4 January 2022 at 23:52
To: David Capwell <dc...@apple.com>>, Joshua McKenzie <jm...@apache.org>>
Cc: Henrik Ingo <he...@datastax.com>>, dev <de...@cassandra.apache.org>>
Subject: Re: [DISCUSS] Releasable trunk and quality
That all sounds terribly complicated to me.

My view is that we should switch to the branch strategy outlined by Henrik (I happen to prefer it anyway) and move to GitHub integrations to control merge for each branch independently. Simples.

From: David Capwell <dc...@apple.com>>
Date: Tuesday, 4 January 2022 at 23:33
To: Joshua McKenzie <jm...@apache.org>>
Cc: Henrik Ingo <he...@datastax.com>>, dev <de...@cassandra.apache.org>>
Subject: Re: [DISCUSS] Releasable trunk and quality
The more I think on it, the more I am anyway strongly -1 on having some bifurcated commit process. We should decide on a uniform commit process for the whole project, for all patches, whatever that may be.

Making the process stable and handle all the random things we need to handle takes a lot of time, for that reason I strongly feel we should start with trunk only and look to expand to other branches and/or handle multi-branch commits.  I agree that each branch should NOT have a different process, but feel its ok if we are evolving what the process should be.

About the merge commit thing, we can automate (think Josh wants to OSS my script) the current process so this isn’t a blocker for automation; the thing I hate about it is that I have not found any tool able to understand our history, so it forces me to go to CLI to figure out how the merge actually changed things (only the smallest version can be displayed properly), I am 100% in favor of removing, but don’t think its a dependency on automating our merge process.

On Jan 4, 2022, at 11:58 AM, Joshua McKenzie <jm...@apache.org>> wrote:

I put together a draft confluence wiki page (login required) for the Build Lead role covering what we discussed in the thread here. Link: https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=199527692&draftShareId=96dfa1ef-d927-427a-bff8-0cf711c790c9&

The only potentially controversial thing in there is text under what to do with a consistent test failure introduced by a diff to trunk: "If consistent, git revert the SHA that introduced the failure, re-open the original JIRA ticket, and leave a note for the original assignee about the breakage they introduced".

This would apply only to patches to trunk that introduce consistent failures to a test clearly attributable to that patch.

I am deferring on the topic of merge strategy as there's a lot of progress we can make without considering that more controversial topic yet.

On Tue, Dec 21, 2021 at 9:02 AM Henrik Ingo <he...@datastax.com>> wrote:
FWIW, I thought I could link to an example MongoDB commit:

https://github.com/mongodb/mongo/commit/dec388494b652488259072cf61fd987af3fa8470

* Fixes start from trunk or whatever is the highest version that includes the bug
* It is then cherry picked to each stable version that needs to fix. Above link is an example of such a cherry pick. The original sha is referenced in the commit message.
* I found that it makes sense to always cherry pick from the immediate higher version, since if you had to make some changes to the previous commit, they probably need to be in the next one as well.
* There are no merge commits. Everything is always cherry picked or rebased to the top of a branch.
* Since this was mentioned, MongoDB indeed tracks the cherry picking process explicitly: The original SERVER ticket is closed when fix is committed to trunk branch. However, new BACKPORT tickets are created and linked to the SERVER ticket, one per stable version that will need a cherry-pick. This way backporting the fix is never forgotten, as the team can just track open BACKPRT tickets and work on them to close them.

henrik

On Tue, Dec 14, 2021 at 8:53 PM Joshua McKenzie <jm...@apache.org>> wrote:
>
> I like a change originating from just one commit, and having tracking
> visible across the branches. This gives you immediate information about
> where and how the change was applied without having to go to the jira
> ticket (and relying on it being accurate)

I have the exact opposite experience right now (though this may be a
shortcoming of my env / workflow). When I'm showing annotations in intellij
and I see walls of merge commits as commit messages and have to bounce over
to a terminal or open the git panel to figure out what actual commit on a
different branch contains the minimal commit message pointing to the JIRA
to go to the PR and actually finally find out _why_ we did a thing, then
dig around to see if we changed the impl inside a merge commit SHA from the
original base impl...

Well, that is not my favorite.  :D

All ears on if there's a cleaner way to do the archaeology here.

On Tue, Dec 14, 2021 at 1:34 PM Stefan Miklosovic <
stefan.miklosovic@instaclustr.com<ma...@instaclustr.com>> wrote:

> Does somebody else use the git workflow we do as of now in Apache
> universe? Are not we quite unique? While I do share the same opinion
> Mick has in his last response, I also see the disadvantage in having
> the commit history polluted by merges. I am genuinely curious if there
> is any other Apache project out there doing things same we do (or did
> in the past) and who changed that in one way or the other, plus
> reasons behind it.
>
> On Tue, 14 Dec 2021 at 19:27, Mick Semb Wever <mc...@apache.org>> wrote:
> >
> > >
> > >
> > > >   Merge commits aren’t that useful
> > > >
> > > I keep coming back to this. Arguably the only benefit they offer now is
> > > procedurally forcing us to not miss a bugfix on a branch, but given how
> > > much we amend many things presently anyway that dilutes that benefit.
> > >
> >
> >
> > Doesn't this come down to how you read git history, and for example
> > appreciating a change-centric view over branch isolated development?
> > I like a change originating from just one commit, and having tracking
> > visible across the branches. This gives you immediate information about
> > where and how the change was applied without having to go to the jira
> > ticket (and relying on it being accurate). Connecting commits on
> different
> > branches that are developed separately (no merge tracking) is more
> > complicated. So yeah, I see value in those merge commits. I'm not against
> > trying something new, just would appreciate a bit more exposure to it
> > before making a project wide change. Hence, let's not rush it and just
> > start first with trunk.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org<ma...@cassandra.apache.org>
> For additional commands, e-mail: dev-help@cassandra.apache.org<ma...@cassandra.apache.org>
>
>

--
Henrik Ingo
+358 40 569 7354<tel:358405697354>
[Visit us online.]<https://www.datastax.com/>  [Visit us on Twitter.] <https://twitter.com/DataStaxEng>   [Visit us on YouTube.] <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>   [Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

 > A wise man once said "Simple is a feature" ;)
For those unfamiliar, this was a reference to a sentiment Jonathan has
expressed over the years which I strongly agree with
https://issues.apache.org/jira/browse/CASSANDRA-6809?focusedCommentId=14102901&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14102901


On Wed, Jan 5, 2022 at 9:52 AM Joshua McKenzie <jm...@apache.org> wrote:

> A wise man once said "Simple is a feature" ;)
>
> Our current process (commit oldest, merge up or merge -s ours w/ --amend):
> - is confusing for new contributors to understand
> - hides changes inside the merge commit
> - masks future ability to see things with git attribute on commits
> - is exposed to race w/other committers across multiple branches requiring
> --atomic
> - is non-automatable requiring human intervention and prone to error
> - prevents us from using industry standard tooling and workflows around CI
> thus contributing to CI degrading over time
> + Helps enforce that we don't forget to apply something to all branches
> +(?) Is the devil we know
>
> That's a lot of negatives for a very fixable single positive and some FUD.
>
> On Tue, Jan 4, 2022 at 7:01 PM benedict@apache.org <be...@apache.org>
> wrote:
>
>> To answer your point, I don’t have anything ideologically against a
>> temporary divergence in treatment, but we should have a clear unified
>> endpoint we are aiming for.
>>
>>
>>
>> I would hate for this discussion to end without a clear answer about what
>> that endpoint should be, though - even if we don’t get there immediately.
>>
>>
>>
>> I personally dislike the idea of relying on scripts to enforce this, at
>> least in the long run, as there is no uniformity of environment, so no
>> uniformity of process, and when things go wrong due to diverging systems
>> we’re creating additional work for people (and CI is headache enough when
>> it goes wrong).
>>
>>
>>
>>
>>
>> *From: *benedict@apache.org <be...@apache.org>
>> *Date: *Tuesday, 4 January 2022 at 23:52
>> *To: *David Capwell <dc...@apple.com>, Joshua McKenzie <
>> jmckenzie@apache.org>
>> *Cc: *Henrik Ingo <he...@datastax.com>, dev <
>> dev@cassandra.apache.org>
>> *Subject: *Re: [DISCUSS] Releasable trunk and quality
>>
>> That all sounds terribly complicated to me.
>>
>>
>>
>> My view is that we should switch to the branch strategy outlined by
>> Henrik (I happen to prefer it anyway) and move to GitHub integrations to
>> control merge for each branch independently. Simples.
>>
>>
>>
>>
>>
>> *From: *David Capwell <dc...@apple.com>
>> *Date: *Tuesday, 4 January 2022 at 23:33
>> *To: *Joshua McKenzie <jm...@apache.org>
>> *Cc: *Henrik Ingo <he...@datastax.com>, dev <
>> dev@cassandra.apache.org>
>> *Subject: *Re: [DISCUSS] Releasable trunk and quality
>>
>> The more I think on it, the more I am anyway strongly -1 on having some
>> bifurcated commit process. We should decide on a uniform commit process for
>> the whole project, for all patches, whatever that may be.
>>
>>
>>
>> Making the process stable and handle all the random things we need to
>> handle takes a lot of time, for that reason I strongly feel we should start
>> with trunk only and look to expand to other branches and/or handle
>> multi-branch commits.  I agree that each branch should NOT have a different
>> process, but feel its ok if we are evolving what the process should be.
>>
>>
>>
>> About the merge commit thing, we can automate (think Josh wants to OSS my
>> script) the current process so this isn’t a blocker for automation; the
>> thing I hate about it is that I have not found any tool able to understand
>> our history, so it forces me to go to CLI to figure out how the merge
>> actually changed things (only the smallest version can be displayed
>> properly), I am 100% in favor of removing, but don’t think its a dependency
>> on automating our merge process.
>>
>>
>>
>>
>>
>>
>> On Jan 4, 2022, at 11:58 AM, Joshua McKenzie <jm...@apache.org>
>> wrote:
>>
>>
>>
>> I put together a draft confluence wiki page (login required) for the
>> Build Lead role covering what we discussed in the thread here. Link:
>> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=199527692&draftShareId=96dfa1ef-d927-427a-bff8-0cf711c790c9&
>>
>>
>>
>> The only potentially controversial thing in there is text under what to
>> do with a consistent test failure introduced by a diff to trunk: "If
>> consistent, *git revert* *the SHA that introduced the failure*, re-open
>> the original JIRA ticket, and leave a note for the original assignee about
>> the breakage they introduced".
>>
>>
>>
>> This would apply only to patches to trunk that introduce consistent
>> failures to a test clearly attributable to that patch.
>>
>>
>>
>> I am deferring on the topic of merge strategy as there's a lot of
>> progress we can make without considering that more controversial topic yet.
>>
>>
>>
>> On Tue, Dec 21, 2021 at 9:02 AM Henrik Ingo <he...@datastax.com>
>> wrote:
>>
>> FWIW, I thought I could link to an example MongoDB commit:
>>
>>
>>
>>
>> https://github.com/mongodb/mongo/commit/dec388494b652488259072cf61fd987af3fa8470
>>
>>
>>
>> * Fixes start from trunk or whatever is the highest version that includes
>> the bug
>>
>> * It is then cherry picked to each stable version that needs to fix.
>> Above link is an example of such a cherry pick. The original sha is
>> referenced in the commit message.
>>
>> * I found that it makes sense to always cherry pick from the immediate
>> higher version, since if you had to make some changes to the previous
>> commit, they probably need to be in the next one as well.
>>
>> * There are no merge commits. Everything is always cherry picked or
>> rebased to the top of a branch.
>>
>> * Since this was mentioned, MongoDB indeed tracks the cherry picking
>> process explicitly: The original SERVER ticket is closed when fix is
>> committed to trunk branch. However, new BACKPORT tickets are created and
>> linked to the SERVER ticket, one per stable version that will need a
>> cherry-pick. This way backporting the fix is never forgotten, as the team
>> can just track open BACKPRT tickets and work on them to close them.
>>
>>
>>
>> henrik
>>
>>
>>
>> On Tue, Dec 14, 2021 at 8:53 PM Joshua McKenzie <jm...@apache.org>
>> wrote:
>>
>> >
>> > I like a change originating from just one commit, and having tracking
>> > visible across the branches. This gives you immediate information about
>> > where and how the change was applied without having to go to the jira
>> > ticket (and relying on it being accurate)
>>
>> I have the exact opposite experience right now (though this may be a
>> shortcoming of my env / workflow). When I'm showing annotations in
>> intellij
>> and I see walls of merge commits as commit messages and have to bounce
>> over
>> to a terminal or open the git panel to figure out what actual commit on a
>> different branch contains the minimal commit message pointing to the JIRA
>> to go to the PR and actually finally find out _why_ we did a thing, then
>> dig around to see if we changed the impl inside a merge commit SHA from
>> the
>> original base impl...
>>
>> Well, that is not my favorite.  :D
>>
>> All ears on if there's a cleaner way to do the archaeology here.
>>
>>
>> On Tue, Dec 14, 2021 at 1:34 PM Stefan Miklosovic <
>> stefan.miklosovic@instaclustr.com> wrote:
>>
>> > Does somebody else use the git workflow we do as of now in Apache
>> > universe? Are not we quite unique? While I do share the same opinion
>> > Mick has in his last response, I also see the disadvantage in having
>> > the commit history polluted by merges. I am genuinely curious if there
>> > is any other Apache project out there doing things same we do (or did
>> > in the past) and who changed that in one way or the other, plus
>> > reasons behind it.
>> >
>> > On Tue, 14 Dec 2021 at 19:27, Mick Semb Wever <mc...@apache.org> wrote:
>> > >
>> > > >
>> > > >
>> > > > >   Merge commits aren’t that useful
>> > > > >
>> > > > I keep coming back to this. Arguably the only benefit they offer
>> now is
>> > > > procedurally forcing us to not miss a bugfix on a branch, but given
>> how
>> > > > much we amend many things presently anyway that dilutes that
>> benefit.
>> > > >
>> > >
>> > >
>> > > Doesn't this come down to how you read git history, and for example
>> > > appreciating a change-centric view over branch isolated development?
>> > > I like a change originating from just one commit, and having tracking
>> > > visible across the branches. This gives you immediate information
>> about
>> > > where and how the change was applied without having to go to the jira
>> > > ticket (and relying on it being accurate). Connecting commits on
>> > different
>> > > branches that are developed separately (no merge tracking) is more
>> > > complicated. So yeah, I see value in those merge commits. I'm not
>> against
>> > > trying something new, just would appreciate a bit more exposure to it
>> > > before making a project wide change. Hence, let's not rush it and just
>> > > start first with trunk.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> > For additional commands, e-mail: dev-help@cassandra.apache.org
>> >
>> >
>>
>>
>>
>>
>> --
>>
>> Henrik Ingo
>>
>> +358 40 569 7354 <358405697354>
>>
>> [image: Visit us online.] <https://www.datastax.com/>  [image: Visit us
>> on Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
>> YouTube.]
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
>>   [image: Visit my LinkedIn profile.]
>> <https://www.linkedin.com/in/heingo/>
>>
>>
>>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

A wise man once said "Simple is a feature" ;)

Our current process (commit oldest, merge up or merge -s ours w/ --amend):
- is confusing for new contributors to understand
- hides changes inside the merge commit
- masks future ability to see things with git attribute on commits
- is exposed to race w/other committers across multiple branches requiring
--atomic
- is non-automatable requiring human intervention and prone to error
- prevents us from using industry standard tooling and workflows around CI
thus contributing to CI degrading over time
+ Helps enforce that we don't forget to apply something to all branches
+(?) Is the devil we know

That's a lot of negatives for a very fixable single positive and some FUD.

On Tue, Jan 4, 2022 at 7:01 PM benedict@apache.org <be...@apache.org>
wrote:

> To answer your point, I don’t have anything ideologically against a
> temporary divergence in treatment, but we should have a clear unified
> endpoint we are aiming for.
>
>
>
> I would hate for this discussion to end without a clear answer about what
> that endpoint should be, though - even if we don’t get there immediately.
>
>
>
> I personally dislike the idea of relying on scripts to enforce this, at
> least in the long run, as there is no uniformity of environment, so no
> uniformity of process, and when things go wrong due to diverging systems
> we’re creating additional work for people (and CI is headache enough when
> it goes wrong).
>
>
>
>
>
> *From: *benedict@apache.org <be...@apache.org>
> *Date: *Tuesday, 4 January 2022 at 23:52
> *To: *David Capwell <dc...@apple.com>, Joshua McKenzie <
> jmckenzie@apache.org>
> *Cc: *Henrik Ingo <he...@datastax.com>, dev <
> dev@cassandra.apache.org>
> *Subject: *Re: [DISCUSS] Releasable trunk and quality
>
> That all sounds terribly complicated to me.
>
>
>
> My view is that we should switch to the branch strategy outlined by Henrik
> (I happen to prefer it anyway) and move to GitHub integrations to control
> merge for each branch independently. Simples.
>
>
>
>
>
> *From: *David Capwell <dc...@apple.com>
> *Date: *Tuesday, 4 January 2022 at 23:33
> *To: *Joshua McKenzie <jm...@apache.org>
> *Cc: *Henrik Ingo <he...@datastax.com>, dev <
> dev@cassandra.apache.org>
> *Subject: *Re: [DISCUSS] Releasable trunk and quality
>
> The more I think on it, the more I am anyway strongly -1 on having some
> bifurcated commit process. We should decide on a uniform commit process for
> the whole project, for all patches, whatever that may be.
>
>
>
> Making the process stable and handle all the random things we need to
> handle takes a lot of time, for that reason I strongly feel we should start
> with trunk only and look to expand to other branches and/or handle
> multi-branch commits.  I agree that each branch should NOT have a different
> process, but feel its ok if we are evolving what the process should be.
>
>
>
> About the merge commit thing, we can automate (think Josh wants to OSS my
> script) the current process so this isn’t a blocker for automation; the
> thing I hate about it is that I have not found any tool able to understand
> our history, so it forces me to go to CLI to figure out how the merge
> actually changed things (only the smallest version can be displayed
> properly), I am 100% in favor of removing, but don’t think its a dependency
> on automating our merge process.
>
>
>
>
>
>
> On Jan 4, 2022, at 11:58 AM, Joshua McKenzie <jm...@apache.org> wrote:
>
>
>
> I put together a draft confluence wiki page (login required) for the Build
> Lead role covering what we discussed in the thread here. Link:
> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=199527692&draftShareId=96dfa1ef-d927-427a-bff8-0cf711c790c9&
>
>
>
> The only potentially controversial thing in there is text under what to do
> with a consistent test failure introduced by a diff to trunk: "If
> consistent, *git revert* *the SHA that introduced the failure*, re-open
> the original JIRA ticket, and leave a note for the original assignee about
> the breakage they introduced".
>
>
>
> This would apply only to patches to trunk that introduce consistent
> failures to a test clearly attributable to that patch.
>
>
>
> I am deferring on the topic of merge strategy as there's a lot of progress
> we can make without considering that more controversial topic yet.
>
>
>
> On Tue, Dec 21, 2021 at 9:02 AM Henrik Ingo <he...@datastax.com>
> wrote:
>
> FWIW, I thought I could link to an example MongoDB commit:
>
>
>
>
> https://github.com/mongodb/mongo/commit/dec388494b652488259072cf61fd987af3fa8470
>
>
>
> * Fixes start from trunk or whatever is the highest version that includes
> the bug
>
> * It is then cherry picked to each stable version that needs to fix. Above
> link is an example of such a cherry pick. The original sha is referenced in
> the commit message.
>
> * I found that it makes sense to always cherry pick from the immediate
> higher version, since if you had to make some changes to the previous
> commit, they probably need to be in the next one as well.
>
> * There are no merge commits. Everything is always cherry picked or
> rebased to the top of a branch.
>
> * Since this was mentioned, MongoDB indeed tracks the cherry picking
> process explicitly: The original SERVER ticket is closed when fix is
> committed to trunk branch. However, new BACKPORT tickets are created and
> linked to the SERVER ticket, one per stable version that will need a
> cherry-pick. This way backporting the fix is never forgotten, as the team
> can just track open BACKPRT tickets and work on them to close them.
>
>
>
> henrik
>
>
>
> On Tue, Dec 14, 2021 at 8:53 PM Joshua McKenzie <jm...@apache.org>
> wrote:
>
> >
> > I like a change originating from just one commit, and having tracking
> > visible across the branches. This gives you immediate information about
> > where and how the change was applied without having to go to the jira
> > ticket (and relying on it being accurate)
>
> I have the exact opposite experience right now (though this may be a
> shortcoming of my env / workflow). When I'm showing annotations in intellij
> and I see walls of merge commits as commit messages and have to bounce over
> to a terminal or open the git panel to figure out what actual commit on a
> different branch contains the minimal commit message pointing to the JIRA
> to go to the PR and actually finally find out _why_ we did a thing, then
> dig around to see if we changed the impl inside a merge commit SHA from the
> original base impl...
>
> Well, that is not my favorite.  :D
>
> All ears on if there's a cleaner way to do the archaeology here.
>
>
> On Tue, Dec 14, 2021 at 1:34 PM Stefan Miklosovic <
> stefan.miklosovic@instaclustr.com> wrote:
>
> > Does somebody else use the git workflow we do as of now in Apache
> > universe? Are not we quite unique? While I do share the same opinion
> > Mick has in his last response, I also see the disadvantage in having
> > the commit history polluted by merges. I am genuinely curious if there
> > is any other Apache project out there doing things same we do (or did
> > in the past) and who changed that in one way or the other, plus
> > reasons behind it.
> >
> > On Tue, 14 Dec 2021 at 19:27, Mick Semb Wever <mc...@apache.org> wrote:
> > >
> > > >
> > > >
> > > > >   Merge commits aren’t that useful
> > > > >
> > > > I keep coming back to this. Arguably the only benefit they offer now
> is
> > > > procedurally forcing us to not miss a bugfix on a branch, but given
> how
> > > > much we amend many things presently anyway that dilutes that benefit.
> > > >
> > >
> > >
> > > Doesn't this come down to how you read git history, and for example
> > > appreciating a change-centric view over branch isolated development?
> > > I like a change originating from just one commit, and having tracking
> > > visible across the branches. This gives you immediate information about
> > > where and how the change was applied without having to go to the jira
> > > ticket (and relying on it being accurate). Connecting commits on
> > different
> > > branches that are developed separately (no merge tracking) is more
> > > complicated. So yeah, I see value in those merge commits. I'm not
> against
> > > trying something new, just would appreciate a bit more exposure to it
> > > before making a project wide change. Hence, let's not rush it and just
> > > start first with trunk.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
> >
>
>
>
>
> --
>
> Henrik Ingo
>
> +358 40 569 7354 <358405697354>
>
> [image: Visit us online.] <https://www.datastax.com/>  [image: Visit us
> on Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
> YouTube.]
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
>   [image: Visit my LinkedIn profile.]
> <https://www.linkedin.com/in/heingo/>
>
>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.

To answer your point, I don’t have anything ideologically against a temporary divergence in treatment, but we should have a clear unified endpoint we are aiming for.

I would hate for this discussion to end without a clear answer about what that endpoint should be, though - even if we don’t get there immediately.

I personally dislike the idea of relying on scripts to enforce this, at least in the long run, as there is no uniformity of environment, so no uniformity of process, and when things go wrong due to diverging systems we’re creating additional work for people (and CI is headache enough when it goes wrong).

From: benedict@apache.org <be...@apache.org>
Date: Tuesday, 4 January 2022 at 23:52
To: David Capwell <dc...@apple.com>, Joshua McKenzie <jm...@apache.org>
Cc: Henrik Ingo <he...@datastax.com>, dev <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Releasable trunk and quality
That all sounds terribly complicated to me.

My view is that we should switch to the branch strategy outlined by Henrik (I happen to prefer it anyway) and move to GitHub integrations to control merge for each branch independently. Simples.

From: David Capwell <dc...@apple.com>
Date: Tuesday, 4 January 2022 at 23:33
To: Joshua McKenzie <jm...@apache.org>
Cc: Henrik Ingo <he...@datastax.com>, dev <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Releasable trunk and quality
The more I think on it, the more I am anyway strongly -1 on having some bifurcated commit process. We should decide on a uniform commit process for the whole project, for all patches, whatever that may be.

Making the process stable and handle all the random things we need to handle takes a lot of time, for that reason I strongly feel we should start with trunk only and look to expand to other branches and/or handle multi-branch commits.  I agree that each branch should NOT have a different process, but feel its ok if we are evolving what the process should be.

About the merge commit thing, we can automate (think Josh wants to OSS my script) the current process so this isn’t a blocker for automation; the thing I hate about it is that I have not found any tool able to understand our history, so it forces me to go to CLI to figure out how the merge actually changed things (only the smallest version can be displayed properly), I am 100% in favor of removing, but don’t think its a dependency on automating our merge process.

On Jan 4, 2022, at 11:58 AM, Joshua McKenzie <jm...@apache.org>> wrote:

I put together a draft confluence wiki page (login required) for the Build Lead role covering what we discussed in the thread here. Link: https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=199527692&draftShareId=96dfa1ef-d927-427a-bff8-0cf711c790c9&

The only potentially controversial thing in there is text under what to do with a consistent test failure introduced by a diff to trunk: "If consistent, git revert the SHA that introduced the failure, re-open the original JIRA ticket, and leave a note for the original assignee about the breakage they introduced".

This would apply only to patches to trunk that introduce consistent failures to a test clearly attributable to that patch.

I am deferring on the topic of merge strategy as there's a lot of progress we can make without considering that more controversial topic yet.

On Tue, Dec 21, 2021 at 9:02 AM Henrik Ingo <he...@datastax.com>> wrote:
FWIW, I thought I could link to an example MongoDB commit:

https://github.com/mongodb/mongo/commit/dec388494b652488259072cf61fd987af3fa8470

* Fixes start from trunk or whatever is the highest version that includes the bug
* It is then cherry picked to each stable version that needs to fix. Above link is an example of such a cherry pick. The original sha is referenced in the commit message.
* I found that it makes sense to always cherry pick from the immediate higher version, since if you had to make some changes to the previous commit, they probably need to be in the next one as well.
* There are no merge commits. Everything is always cherry picked or rebased to the top of a branch.
* Since this was mentioned, MongoDB indeed tracks the cherry picking process explicitly: The original SERVER ticket is closed when fix is committed to trunk branch. However, new BACKPORT tickets are created and linked to the SERVER ticket, one per stable version that will need a cherry-pick. This way backporting the fix is never forgotten, as the team can just track open BACKPRT tickets and work on them to close them.

henrik

On Tue, Dec 14, 2021 at 8:53 PM Joshua McKenzie <jm...@apache.org>> wrote:
>
> I like a change originating from just one commit, and having tracking
> visible across the branches. This gives you immediate information about
> where and how the change was applied without having to go to the jira
> ticket (and relying on it being accurate)

I have the exact opposite experience right now (though this may be a
shortcoming of my env / workflow). When I'm showing annotations in intellij
and I see walls of merge commits as commit messages and have to bounce over
to a terminal or open the git panel to figure out what actual commit on a
different branch contains the minimal commit message pointing to the JIRA
to go to the PR and actually finally find out _why_ we did a thing, then
dig around to see if we changed the impl inside a merge commit SHA from the
original base impl...

Well, that is not my favorite.  :D

All ears on if there's a cleaner way to do the archaeology here.

On Tue, Dec 14, 2021 at 1:34 PM Stefan Miklosovic <
stefan.miklosovic@instaclustr.com<ma...@instaclustr.com>> wrote:

> Does somebody else use the git workflow we do as of now in Apache
> universe? Are not we quite unique? While I do share the same opinion
> Mick has in his last response, I also see the disadvantage in having
> the commit history polluted by merges. I am genuinely curious if there
> is any other Apache project out there doing things same we do (or did
> in the past) and who changed that in one way or the other, plus
> reasons behind it.
>
> On Tue, 14 Dec 2021 at 19:27, Mick Semb Wever <mc...@apache.org>> wrote:
> >
> > >
> > >
> > > >   Merge commits aren’t that useful
> > > >
> > > I keep coming back to this. Arguably the only benefit they offer now is
> > > procedurally forcing us to not miss a bugfix on a branch, but given how
> > > much we amend many things presently anyway that dilutes that benefit.
> > >
> >
> >
> > Doesn't this come down to how you read git history, and for example
> > appreciating a change-centric view over branch isolated development?
> > I like a change originating from just one commit, and having tracking
> > visible across the branches. This gives you immediate information about
> > where and how the change was applied without having to go to the jira
> > ticket (and relying on it being accurate). Connecting commits on
> different
> > branches that are developed separately (no merge tracking) is more
> > complicated. So yeah, I see value in those merge commits. I'm not against
> > trying something new, just would appreciate a bit more exposure to it
> > before making a project wide change. Hence, let's not rush it and just
> > start first with trunk.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org<ma...@cassandra.apache.org>
> For additional commands, e-mail: dev-help@cassandra.apache.org<ma...@cassandra.apache.org>
>
>

--
Henrik Ingo
+358 40 569 7354<tel:358405697354>
[Visit us online.]<https://www.datastax.com/>  [Visit us on Twitter.] <https://twitter.com/DataStaxEng>   [Visit us on YouTube.] <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>   [Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>

Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.

That all sounds terribly complicated to me.

My view is that we should switch to the branch strategy outlined by Henrik (I happen to prefer it anyway) and move to GitHub integrations to control merge for each branch independently. Simples.

From: David Capwell <dc...@apple.com>
Date: Tuesday, 4 January 2022 at 23:33
To: Joshua McKenzie <jm...@apache.org>
Cc: Henrik Ingo <he...@datastax.com>, dev <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Releasable trunk and quality
The more I think on it, the more I am anyway strongly -1 on having some bifurcated commit process. We should decide on a uniform commit process for the whole project, for all patches, whatever that may be.

Making the process stable and handle all the random things we need to handle takes a lot of time, for that reason I strongly feel we should start with trunk only and look to expand to other branches and/or handle multi-branch commits.  I agree that each branch should NOT have a different process, but feel its ok if we are evolving what the process should be.

About the merge commit thing, we can automate (think Josh wants to OSS my script) the current process so this isn’t a blocker for automation; the thing I hate about it is that I have not found any tool able to understand our history, so it forces me to go to CLI to figure out how the merge actually changed things (only the smallest version can be displayed properly), I am 100% in favor of removing, but don’t think its a dependency on automating our merge process.

On Jan 4, 2022, at 11:58 AM, Joshua McKenzie <jm...@apache.org>> wrote:

I put together a draft confluence wiki page (login required) for the Build Lead role covering what we discussed in the thread here. Link: https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=199527692&draftShareId=96dfa1ef-d927-427a-bff8-0cf711c790c9&

The only potentially controversial thing in there is text under what to do with a consistent test failure introduced by a diff to trunk: "If consistent, git revert the SHA that introduced the failure, re-open the original JIRA ticket, and leave a note for the original assignee about the breakage they introduced".

This would apply only to patches to trunk that introduce consistent failures to a test clearly attributable to that patch.

I am deferring on the topic of merge strategy as there's a lot of progress we can make without considering that more controversial topic yet.

On Tue, Dec 21, 2021 at 9:02 AM Henrik Ingo <he...@datastax.com>> wrote:
FWIW, I thought I could link to an example MongoDB commit:

https://github.com/mongodb/mongo/commit/dec388494b652488259072cf61fd987af3fa8470

* Fixes start from trunk or whatever is the highest version that includes the bug
* It is then cherry picked to each stable version that needs to fix. Above link is an example of such a cherry pick. The original sha is referenced in the commit message.
* I found that it makes sense to always cherry pick from the immediate higher version, since if you had to make some changes to the previous commit, they probably need to be in the next one as well.
* There are no merge commits. Everything is always cherry picked or rebased to the top of a branch.
* Since this was mentioned, MongoDB indeed tracks the cherry picking process explicitly: The original SERVER ticket is closed when fix is committed to trunk branch. However, new BACKPORT tickets are created and linked to the SERVER ticket, one per stable version that will need a cherry-pick. This way backporting the fix is never forgotten, as the team can just track open BACKPRT tickets and work on them to close them.

henrik

On Tue, Dec 14, 2021 at 8:53 PM Joshua McKenzie <jm...@apache.org>> wrote:
>
> I like a change originating from just one commit, and having tracking
> visible across the branches. This gives you immediate information about
> where and how the change was applied without having to go to the jira
> ticket (and relying on it being accurate)

I have the exact opposite experience right now (though this may be a
shortcoming of my env / workflow). When I'm showing annotations in intellij
and I see walls of merge commits as commit messages and have to bounce over
to a terminal or open the git panel to figure out what actual commit on a
different branch contains the minimal commit message pointing to the JIRA
to go to the PR and actually finally find out _why_ we did a thing, then
dig around to see if we changed the impl inside a merge commit SHA from the
original base impl...

Well, that is not my favorite.  :D

All ears on if there's a cleaner way to do the archaeology here.

On Tue, Dec 14, 2021 at 1:34 PM Stefan Miklosovic <
stefan.miklosovic@instaclustr.com<ma...@instaclustr.com>> wrote:

> Does somebody else use the git workflow we do as of now in Apache
> universe? Are not we quite unique? While I do share the same opinion
> Mick has in his last response, I also see the disadvantage in having
> the commit history polluted by merges. I am genuinely curious if there
> is any other Apache project out there doing things same we do (or did
> in the past) and who changed that in one way or the other, plus
> reasons behind it.
>
> On Tue, 14 Dec 2021 at 19:27, Mick Semb Wever <mc...@apache.org>> wrote:
> >
> > >
> > >
> > > >   Merge commits aren’t that useful
> > > >
> > > I keep coming back to this. Arguably the only benefit they offer now is
> > > procedurally forcing us to not miss a bugfix on a branch, but given how
> > > much we amend many things presently anyway that dilutes that benefit.
> > >
> >
> >
> > Doesn't this come down to how you read git history, and for example
> > appreciating a change-centric view over branch isolated development?
> > I like a change originating from just one commit, and having tracking
> > visible across the branches. This gives you immediate information about
> > where and how the change was applied without having to go to the jira
> > ticket (and relying on it being accurate). Connecting commits on
> different
> > branches that are developed separately (no merge tracking) is more
> > complicated. So yeah, I see value in those merge commits. I'm not against
> > trying something new, just would appreciate a bit more exposure to it
> > before making a project wide change. Hence, let's not rush it and just
> > start first with trunk.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org<ma...@cassandra.apache.org>
> For additional commands, e-mail: dev-help@cassandra.apache.org<ma...@cassandra.apache.org>
>
>

--
Henrik Ingo
+358 40 569 7354<tel:358405697354>
[Visit us online.]<https://www.datastax.com/>  [Visit us on Twitter.] <https://twitter.com/DataStaxEng>   [Visit us on YouTube.] <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>   [Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>

Re: [DISCUSS] Releasable trunk and quality

Posted by David Capwell <dc...@apple.com>.

> The more I think on it, the more I am anyway strongly -1 on having some bifurcated commit process. We should decide on a uniform commit process for the whole project, for all patches, whatever that may be.

Making the process stable and handle all the random things we need to handle takes a lot of time, for that reason I strongly feel we should start with trunk only and look to expand to other branches and/or handle multi-branch commits.  I agree that each branch should NOT have a different process, but feel its ok if we are evolving what the process should be.

About the merge commit thing, we can automate (think Josh wants to OSS my script) the current process so this isn’t a blocker for automation; the thing I hate about it is that I have not found any tool able to understand our history, so it forces me to go to CLI to figure out how the merge actually changed things (only the smallest version can be displayed properly), I am 100% in favor of removing, but don’t think its a dependency on automating our merge process.


> On Jan 4, 2022, at 11:58 AM, Joshua McKenzie <jm...@apache.org> wrote:
> 
> I put together a draft confluence wiki page (login required) for the Build Lead role covering what we discussed in the thread here. Link: https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=199527692&draftShareId=96dfa1ef-d927-427a-bff8-0cf711c790c9& <https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=199527692&draftShareId=96dfa1ef-d927-427a-bff8-0cf711c790c9&>
> 
> The only potentially controversial thing in there is text under what to do with a consistent test failure introduced by a diff to trunk: "If consistent, git revert the SHA that introduced the failure, re-open the original JIRA ticket, and leave a note for the original assignee about the breakage they introduced".
> 
> This would apply only to patches to trunk that introduce consistent failures to a test clearly attributable to that patch.
> 
> I am deferring on the topic of merge strategy as there's a lot of progress we can make without considering that more controversial topic yet.
> 
> On Tue, Dec 21, 2021 at 9:02 AM Henrik Ingo <henrik.ingo@datastax.com <ma...@datastax.com>> wrote:
> FWIW, I thought I could link to an example MongoDB commit:
> 
> https://github.com/mongodb/mongo/commit/dec388494b652488259072cf61fd987af3fa8470 <https://github.com/mongodb/mongo/commit/dec388494b652488259072cf61fd987af3fa8470>
> 
> * Fixes start from trunk or whatever is the highest version that includes the bug
> * It is then cherry picked to each stable version that needs to fix. Above link is an example of such a cherry pick. The original sha is referenced in the commit message.
> * I found that it makes sense to always cherry pick from the immediate higher version, since if you had to make some changes to the previous commit, they probably need to be in the next one as well.
> * There are no merge commits. Everything is always cherry picked or rebased to the top of a branch.
> * Since this was mentioned, MongoDB indeed tracks the cherry picking process explicitly: The original SERVER ticket is closed when fix is committed to trunk branch. However, new BACKPORT tickets are created and linked to the SERVER ticket, one per stable version that will need a cherry-pick. This way backporting the fix is never forgotten, as the team can just track open BACKPRT tickets and work on them to close them.
> 
> henrik
> 
> On Tue, Dec 14, 2021 at 8:53 PM Joshua McKenzie <jmckenzie@apache.org <ma...@apache.org>> wrote:
> >
> > I like a change originating from just one commit, and having tracking
> > visible across the branches. This gives you immediate information about
> > where and how the change was applied without having to go to the jira
> > ticket (and relying on it being accurate)
> 
> I have the exact opposite experience right now (though this may be a
> shortcoming of my env / workflow). When I'm showing annotations in intellij
> and I see walls of merge commits as commit messages and have to bounce over
> to a terminal or open the git panel to figure out what actual commit on a
> different branch contains the minimal commit message pointing to the JIRA
> to go to the PR and actually finally find out _why_ we did a thing, then
> dig around to see if we changed the impl inside a merge commit SHA from the
> original base impl...
> 
> Well, that is not my favorite.  :D
> 
> All ears on if there's a cleaner way to do the archaeology here.
> 
> 
> On Tue, Dec 14, 2021 at 1:34 PM Stefan Miklosovic <
> stefan.miklosovic@instaclustr.com <ma...@instaclustr.com>> wrote:
> 
> > Does somebody else use the git workflow we do as of now in Apache
> > universe? Are not we quite unique? While I do share the same opinion
> > Mick has in his last response, I also see the disadvantage in having
> > the commit history polluted by merges. I am genuinely curious if there
> > is any other Apache project out there doing things same we do (or did
> > in the past) and who changed that in one way or the other, plus
> > reasons behind it.
> >
> > On Tue, 14 Dec 2021 at 19:27, Mick Semb Wever <mck@apache.org <ma...@apache.org>> wrote:
> > >
> > > >
> > > >
> > > > >   Merge commits aren’t that useful
> > > > >
> > > > I keep coming back to this. Arguably the only benefit they offer now is
> > > > procedurally forcing us to not miss a bugfix on a branch, but given how
> > > > much we amend many things presently anyway that dilutes that benefit.
> > > >
> > >
> > >
> > > Doesn't this come down to how you read git history, and for example
> > > appreciating a change-centric view over branch isolated development?
> > > I like a change originating from just one commit, and having tracking
> > > visible across the branches. This gives you immediate information about
> > > where and how the change was applied without having to go to the jira
> > > ticket (and relying on it being accurate). Connecting commits on
> > different
> > > branches that are developed separately (no merge tracking) is more
> > > complicated. So yeah, I see value in those merge commits. I'm not against
> > > trying something new, just would appreciate a bit more exposure to it
> > > before making a project wide change. Hence, let's not rush it and just
> > > start first with trunk.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
> > For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
> >
> >
> 
> 
> -- 
> Henrik Ingo
> +358 40 569 7354 <tel:358405697354>
>  <https://www.datastax.com/>   <https://twitter.com/DataStaxEng>   <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>   <https://www.linkedin.com/in/heingo/>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

I put together a draft confluence wiki page (login required) for the Build
Lead role covering what we discussed in the thread here. Link:
https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=199527692&draftShareId=96dfa1ef-d927-427a-bff8-0cf711c790c9&

The only potentially controversial thing in there is text under what to do
with a consistent test failure introduced by a diff to trunk: "If
consistent, *git revert the SHA that introduced the failure*, re-open the
original JIRA ticket, and leave a note for the original assignee about the
breakage they introduced".

This would apply only to patches to trunk that introduce consistent
failures to a test clearly attributable to that patch.

I am deferring on the topic of merge strategy as there's a lot of progress
we can make without considering that more controversial topic yet.

On Tue, Dec 21, 2021 at 9:02 AM Henrik Ingo <he...@datastax.com>
wrote:

> FWIW, I thought I could link to an example MongoDB commit:
>
>
> https://github.com/mongodb/mongo/commit/dec388494b652488259072cf61fd987af3fa8470
>
> * Fixes start from trunk or whatever is the highest version that includes
> the bug
> * It is then cherry picked to each stable version that needs to fix. Above
> link is an example of such a cherry pick. The original sha is referenced in
> the commit message.
> * I found that it makes sense to always cherry pick from the immediate
> higher version, since if you had to make some changes to the previous
> commit, they probably need to be in the next one as well.
> * There are no merge commits. Everything is always cherry picked or
> rebased to the top of a branch.
> * Since this was mentioned, MongoDB indeed tracks the cherry picking
> process explicitly: The original SERVER ticket is closed when fix is
> committed to trunk branch. However, new BACKPORT tickets are created and
> linked to the SERVER ticket, one per stable version that will need a
> cherry-pick. This way backporting the fix is never forgotten, as the team
> can just track open BACKPRT tickets and work on them to close them.
>
> henrik
>
> On Tue, Dec 14, 2021 at 8:53 PM Joshua McKenzie <jm...@apache.org>
> wrote:
>
>> >
>> > I like a change originating from just one commit, and having tracking
>> > visible across the branches. This gives you immediate information about
>> > where and how the change was applied without having to go to the jira
>> > ticket (and relying on it being accurate)
>>
>> I have the exact opposite experience right now (though this may be a
>> shortcoming of my env / workflow). When I'm showing annotations in
>> intellij
>> and I see walls of merge commits as commit messages and have to bounce
>> over
>> to a terminal or open the git panel to figure out what actual commit on a
>> different branch contains the minimal commit message pointing to the JIRA
>> to go to the PR and actually finally find out _why_ we did a thing, then
>> dig around to see if we changed the impl inside a merge commit SHA from
>> the
>> original base impl...
>>
>> Well, that is not my favorite.  :D
>>
>> All ears on if there's a cleaner way to do the archaeology here.
>>
>>
>> On Tue, Dec 14, 2021 at 1:34 PM Stefan Miklosovic <
>> stefan.miklosovic@instaclustr.com> wrote:
>>
>> > Does somebody else use the git workflow we do as of now in Apache
>> > universe? Are not we quite unique? While I do share the same opinion
>> > Mick has in his last response, I also see the disadvantage in having
>> > the commit history polluted by merges. I am genuinely curious if there
>> > is any other Apache project out there doing things same we do (or did
>> > in the past) and who changed that in one way or the other, plus
>> > reasons behind it.
>> >
>> > On Tue, 14 Dec 2021 at 19:27, Mick Semb Wever <mc...@apache.org> wrote:
>> > >
>> > > >
>> > > >
>> > > > >   Merge commits aren’t that useful
>> > > > >
>> > > > I keep coming back to this. Arguably the only benefit they offer
>> now is
>> > > > procedurally forcing us to not miss a bugfix on a branch, but given
>> how
>> > > > much we amend many things presently anyway that dilutes that
>> benefit.
>> > > >
>> > >
>> > >
>> > > Doesn't this come down to how you read git history, and for example
>> > > appreciating a change-centric view over branch isolated development?
>> > > I like a change originating from just one commit, and having tracking
>> > > visible across the branches. This gives you immediate information
>> about
>> > > where and how the change was applied without having to go to the jira
>> > > ticket (and relying on it being accurate). Connecting commits on
>> > different
>> > > branches that are developed separately (no merge tracking) is more
>> > > complicated. So yeah, I see value in those merge commits. I'm not
>> against
>> > > trying something new, just would appreciate a bit more exposure to it
>> > > before making a project wide change. Hence, let's not rush it and just
>> > > start first with trunk.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> > For additional commands, e-mail: dev-help@cassandra.apache.org
>> >
>> >
>>
>
>
> --
>
> Henrik Ingo
>
> +358 40 569 7354 <358405697354>
>
> [image: Visit us online.] <https://www.datastax.com/>  [image: Visit us
> on Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
> YouTube.]
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
>   [image: Visit my LinkedIn profile.]
> <https://www.linkedin.com/in/heingo/>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Henrik Ingo <he...@datastax.com>.

FWIW, I thought I could link to an example MongoDB commit:

https://github.com/mongodb/mongo/commit/dec388494b652488259072cf61fd987af3fa8470

* Fixes start from trunk or whatever is the highest version that includes
the bug
* It is then cherry picked to each stable version that needs to fix. Above
link is an example of such a cherry pick. The original sha is referenced in
the commit message.
* I found that it makes sense to always cherry pick from the immediate
higher version, since if you had to make some changes to the previous
commit, they probably need to be in the next one as well.
* There are no merge commits. Everything is always cherry picked or rebased
to the top of a branch.
* Since this was mentioned, MongoDB indeed tracks the cherry picking
process explicitly: The original SERVER ticket is closed when fix is
committed to trunk branch. However, new BACKPORT tickets are created and
linked to the SERVER ticket, one per stable version that will need a
cherry-pick. This way backporting the fix is never forgotten, as the team
can just track open BACKPRT tickets and work on them to close them.

henrik

On Tue, Dec 14, 2021 at 8:53 PM Joshua McKenzie <jm...@apache.org>
wrote:

> >
> > I like a change originating from just one commit, and having tracking
> > visible across the branches. This gives you immediate information about
> > where and how the change was applied without having to go to the jira
> > ticket (and relying on it being accurate)
>
> I have the exact opposite experience right now (though this may be a
> shortcoming of my env / workflow). When I'm showing annotations in intellij
> and I see walls of merge commits as commit messages and have to bounce over
> to a terminal or open the git panel to figure out what actual commit on a
> different branch contains the minimal commit message pointing to the JIRA
> to go to the PR and actually finally find out _why_ we did a thing, then
> dig around to see if we changed the impl inside a merge commit SHA from the
> original base impl...
>
> Well, that is not my favorite.  :D
>
> All ears on if there's a cleaner way to do the archaeology here.
>
>
> On Tue, Dec 14, 2021 at 1:34 PM Stefan Miklosovic <
> stefan.miklosovic@instaclustr.com> wrote:
>
> > Does somebody else use the git workflow we do as of now in Apache
> > universe? Are not we quite unique? While I do share the same opinion
> > Mick has in his last response, I also see the disadvantage in having
> > the commit history polluted by merges. I am genuinely curious if there
> > is any other Apache project out there doing things same we do (or did
> > in the past) and who changed that in one way or the other, plus
> > reasons behind it.
> >
> > On Tue, 14 Dec 2021 at 19:27, Mick Semb Wever <mc...@apache.org> wrote:
> > >
> > > >
> > > >
> > > > >   Merge commits aren’t that useful
> > > > >
> > > > I keep coming back to this. Arguably the only benefit they offer now
> is
> > > > procedurally forcing us to not miss a bugfix on a branch, but given
> how
> > > > much we amend many things presently anyway that dilutes that benefit.
> > > >
> > >
> > >
> > > Doesn't this come down to how you read git history, and for example
> > > appreciating a change-centric view over branch isolated development?
> > > I like a change originating from just one commit, and having tracking
> > > visible across the branches. This gives you immediate information about
> > > where and how the change was applied without having to go to the jira
> > > ticket (and relying on it being accurate). Connecting commits on
> > different
> > > branches that are developed separately (no merge tracking) is more
> > > complicated. So yeah, I see value in those merge commits. I'm not
> against
> > > trying something new, just would appreciate a bit more exposure to it
> > > before making a project wide change. Hence, let's not rush it and just
> > > start first with trunk.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
> >
>


-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>

Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.

Yeah, I positively dislike merge commits, both from a patch preparation perspective and when trying to piece together a class’ history. It can actively obfuscate the impact to the branch being looked at, as well as make it much harder to skim the git log.

I’d vote to modify our merge strategy irregardless of the benefits to CI.

The more I think on it, the more I am anyway strongly -1 on having some bifurcated commit process. We should decide on a uniform commit process for the whole project, for all patches, whatever that may be.

From: Joshua McKenzie <jm...@apache.org>
Date: Tuesday, 14 December 2021 at 18:53
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Releasable trunk and quality
>
> I like a change originating from just one commit, and having tracking
> visible across the branches. This gives you immediate information about
> where and how the change was applied without having to go to the jira
> ticket (and relying on it being accurate)

I have the exact opposite experience right now (though this may be a
shortcoming of my env / workflow). When I'm showing annotations in intellij
and I see walls of merge commits as commit messages and have to bounce over
to a terminal or open the git panel to figure out what actual commit on a
different branch contains the minimal commit message pointing to the JIRA
to go to the PR and actually finally find out _why_ we did a thing, then
dig around to see if we changed the impl inside a merge commit SHA from the
original base impl...

Well, that is not my favorite.  :D

All ears on if there's a cleaner way to do the archaeology here.

On Tue, Dec 14, 2021 at 1:34 PM Stefan Miklosovic <
stefan.miklosovic@instaclustr.com> wrote:

> Does somebody else use the git workflow we do as of now in Apache
> universe? Are not we quite unique? While I do share the same opinion
> Mick has in his last response, I also see the disadvantage in having
> the commit history polluted by merges. I am genuinely curious if there
> is any other Apache project out there doing things same we do (or did
> in the past) and who changed that in one way or the other, plus
> reasons behind it.
>
> On Tue, 14 Dec 2021 at 19:27, Mick Semb Wever <mc...@apache.org> wrote:
> >
> > >
> > >
> > > >   Merge commits aren’t that useful
> > > >
> > > I keep coming back to this. Arguably the only benefit they offer now is
> > > procedurally forcing us to not miss a bugfix on a branch, but given how
> > > much we amend many things presently anyway that dilutes that benefit.
> > >
> >
> >
> > Doesn't this come down to how you read git history, and for example
> > appreciating a change-centric view over branch isolated development?
> > I like a change originating from just one commit, and having tracking
> > visible across the branches. This gives you immediate information about
> > where and how the change was applied without having to go to the jira
> > ticket (and relying on it being accurate). Connecting commits on
> different
> > branches that are developed separately (no merge tracking) is more
> > complicated. So yeah, I see value in those merge commits. I'm not against
> > trying something new, just would appreciate a bit more exposure to it
> > before making a project wide change. Hence, let's not rush it and just
> > start first with trunk.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

>
> I like a change originating from just one commit, and having tracking
> visible across the branches. This gives you immediate information about
> where and how the change was applied without having to go to the jira
> ticket (and relying on it being accurate)

I have the exact opposite experience right now (though this may be a
shortcoming of my env / workflow). When I'm showing annotations in intellij
and I see walls of merge commits as commit messages and have to bounce over
to a terminal or open the git panel to figure out what actual commit on a
different branch contains the minimal commit message pointing to the JIRA
to go to the PR and actually finally find out _why_ we did a thing, then
dig around to see if we changed the impl inside a merge commit SHA from the
original base impl...

Well, that is not my favorite.  :D

All ears on if there's a cleaner way to do the archaeology here.


On Tue, Dec 14, 2021 at 1:34 PM Stefan Miklosovic <
stefan.miklosovic@instaclustr.com> wrote:

> Does somebody else use the git workflow we do as of now in Apache
> universe? Are not we quite unique? While I do share the same opinion
> Mick has in his last response, I also see the disadvantage in having
> the commit history polluted by merges. I am genuinely curious if there
> is any other Apache project out there doing things same we do (or did
> in the past) and who changed that in one way or the other, plus
> reasons behind it.
>
> On Tue, 14 Dec 2021 at 19:27, Mick Semb Wever <mc...@apache.org> wrote:
> >
> > >
> > >
> > > >   Merge commits aren’t that useful
> > > >
> > > I keep coming back to this. Arguably the only benefit they offer now is
> > > procedurally forcing us to not miss a bugfix on a branch, but given how
> > > much we amend many things presently anyway that dilutes that benefit.
> > >
> >
> >
> > Doesn't this come down to how you read git history, and for example
> > appreciating a change-centric view over branch isolated development?
> > I like a change originating from just one commit, and having tracking
> > visible across the branches. This gives you immediate information about
> > where and how the change was applied without having to go to the jira
> > ticket (and relying on it being accurate). Connecting commits on
> different
> > branches that are developed separately (no merge tracking) is more
> > complicated. So yeah, I see value in those merge commits. I'm not against
> > trying something new, just would appreciate a bit more exposure to it
> > before making a project wide change. Hence, let's not rush it and just
> > start first with trunk.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Stefan Miklosovic <st...@instaclustr.com>.

Does somebody else use the git workflow we do as of now in Apache
universe? Are not we quite unique? While I do share the same opinion
Mick has in his last response, I also see the disadvantage in having
the commit history polluted by merges. I am genuinely curious if there
is any other Apache project out there doing things same we do (or did
in the past) and who changed that in one way or the other, plus
reasons behind it.

On Tue, 14 Dec 2021 at 19:27, Mick Semb Wever <mc...@apache.org> wrote:
>
> >
> >
> > >   Merge commits aren’t that useful
> > >
> > I keep coming back to this. Arguably the only benefit they offer now is
> > procedurally forcing us to not miss a bugfix on a branch, but given how
> > much we amend many things presently anyway that dilutes that benefit.
> >
>
>
> Doesn't this come down to how you read git history, and for example
> appreciating a change-centric view over branch isolated development?
> I like a change originating from just one commit, and having tracking
> visible across the branches. This gives you immediate information about
> where and how the change was applied without having to go to the jira
> ticket (and relying on it being accurate). Connecting commits on different
> branches that are developed separately (no merge tracking) is more
> complicated. So yeah, I see value in those merge commits. I'm not against
> trying something new, just would appreciate a bit more exposure to it
> before making a project wide change. Hence, let's not rush it and just
> start first with trunk.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by Mick Semb Wever <mc...@apache.org>.

>
>
> >   Merge commits aren’t that useful
> >
> I keep coming back to this. Arguably the only benefit they offer now is
> procedurally forcing us to not miss a bugfix on a branch, but given how
> much we amend many things presently anyway that dilutes that benefit.
>


Doesn't this come down to how you read git history, and for example
appreciating a change-centric view over branch isolated development?
I like a change originating from just one commit, and having tracking
visible across the branches. This gives you immediate information about
where and how the change was applied without having to go to the jira
ticket (and relying on it being accurate). Connecting commits on different
branches that are developed separately (no merge tracking) is more
complicated. So yeah, I see value in those merge commits. I'm not against
trying something new, just would appreciate a bit more exposure to it
before making a project wide change. Hence, let's not rush it and just
start first with trunk.

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

>
>   Merge commits aren’t that useful
>
I keep coming back to this. Arguably the only benefit they offer now is
procedurally forcing us to not miss a bugfix on a branch, but given how
much we amend many things presently anyway that dilutes that benefit.

Having 1/3rd of your commit history be noise and/or things masking changes
does more harm than good IMO.


On Mon, Dec 13, 2021 at 9:51 AM benedict@apache.org <be...@apache.org>
wrote:

> > It makes sense that such bug tickets are incentivised to be minimal, and
> if there is a smarter way (improvement) in trunk that is a separate
> follow-up ticket and patch
>
> Are you proposing separating the work entirely, so that we don’t merge up
> to trunk at all, or do a no-op merge? Often things are done differently in
> trunk (and intervening branches) for a combination of reasons, including
> that the landscape has changed so that the earlier approach is
> inapplicable. Either way, what you are proposing sounds like introducing
> unnecessary additional work?
>
> > and that we have a more concise git history (one ~third the merge
> commits).
>
> Don’t we get a more concise history with the cherry-pick approach, since
> we don’t have any of the merge commits from each prior branch? Today, a
> merge commit from 2.2 will accumulate four merge commits on the way to
> trunk.
>
> My view:
>
>   *   Merge commits aren’t that useful
>   *   It is a bad idea to have a different CI pipeline for multi-version
> development
>   *   It is particularly not worth countenancing solely to retain the
> limited utility of merge commits
>
>
>
> From: Mick Semb Wever <mc...@apache.org>
> Date: Sunday, 12 December 2021 at 11:47
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] Releasable trunk and quality
> > I find it cleaner that work is found associated to one sha on the hardest
> > > branch, and we treat (or should be) CI holistically across branches.
> >
> > If we -s ours and amend merge commits on things that straddle stuff like
> > 8099, MS rewrite, Simulator, guardrails, etc, then we have multiple SHA
> > where the impl for a thing took place and N of them (N being count of
> newer
> > branches than oldest where it applies) where they're hidden inside a
> merge
> > commit right?
> >
>
>
> That patches can be significantly different across branches is inavertible.
> One original commit versus individual commits on each branch is a trade-off
> between cleaner git history and github fancies with more direct commits.
> Taking it further, patches to release branches should be as minimal as
> possible. It makes sense that such bug tickets are incentivised to be
> minimal, and if there is a smarter way (improvement) in trunk that is a
> separate follow-up ticket and patch.
>
> So… I am willing to say (for now) that I like it that merge shas have the
> connection to the original singular shas on the hardest branch, and that we
> have a more concise git history (one ~third the merge commits).
>
>
>
> > Also, nothing's keeping us from treating CI holistically and pushing
> > --atomic across multiple branches even if we don't have merge commits.
> > That's just a procedural question we could agree on and adhere to.
> >
>
>
> Sure, but atomic is not the same: it's manual adherence and there's no
> history/record of it.
>

Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.

> It makes sense that such bug tickets are incentivised to be minimal, and if there is a smarter way (improvement) in trunk that is a separate follow-up ticket and patch

Are you proposing separating the work entirely, so that we don’t merge up to trunk at all, or do a no-op merge? Often things are done differently in trunk (and intervening branches) for a combination of reasons, including that the landscape has changed so that the earlier approach is inapplicable. Either way, what you are proposing sounds like introducing unnecessary additional work?

> and that we have a more concise git history (one ~third the merge commits).

Don’t we get a more concise history with the cherry-pick approach, since we don’t have any of the merge commits from each prior branch? Today, a merge commit from 2.2 will accumulate four merge commits on the way to trunk.

My view:

  *   Merge commits aren’t that useful
  *   It is a bad idea to have a different CI pipeline for multi-version development
  *   It is particularly not worth countenancing solely to retain the limited utility of merge commits



From: Mick Semb Wever <mc...@apache.org>
Date: Sunday, 12 December 2021 at 11:47
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Releasable trunk and quality
> I find it cleaner that work is found associated to one sha on the hardest
> > branch, and we treat (or should be) CI holistically across branches.
>
> If we -s ours and amend merge commits on things that straddle stuff like
> 8099, MS rewrite, Simulator, guardrails, etc, then we have multiple SHA
> where the impl for a thing took place and N of them (N being count of newer
> branches than oldest where it applies) where they're hidden inside a merge
> commit right?
>


That patches can be significantly different across branches is inavertible.
One original commit versus individual commits on each branch is a trade-off
between cleaner git history and github fancies with more direct commits.
Taking it further, patches to release branches should be as minimal as
possible. It makes sense that such bug tickets are incentivised to be
minimal, and if there is a smarter way (improvement) in trunk that is a
separate follow-up ticket and patch.

So… I am willing to say (for now) that I like it that merge shas have the
connection to the original singular shas on the hardest branch, and that we
have a more concise git history (one ~third the merge commits).



> Also, nothing's keeping us from treating CI holistically and pushing
> --atomic across multiple branches even if we don't have merge commits.
> That's just a procedural question we could agree on and adhere to.
>


Sure, but atomic is not the same: it's manual adherence and there's no
history/record of it.

Re: [DISCUSS] Releasable trunk and quality

Posted by Mick Semb Wever <mc...@apache.org>.

> I find it cleaner that work is found associated to one sha on the hardest
> > branch, and we treat (or should be) CI holistically across branches.
>
> If we -s ours and amend merge commits on things that straddle stuff like
> 8099, MS rewrite, Simulator, guardrails, etc, then we have multiple SHA
> where the impl for a thing took place and N of them (N being count of newer
> branches than oldest where it applies) where they're hidden inside a merge
> commit right?
>


That patches can be significantly different across branches is inavertible.
One original commit versus individual commits on each branch is a trade-off
between cleaner git history and github fancies with more direct commits.
Taking it further, patches to release branches should be as minimal as
possible. It makes sense that such bug tickets are incentivised to be
minimal, and if there is a smarter way (improvement) in trunk that is a
separate follow-up ticket and patch.

So… I am willing to say (for now) that I like it that merge shas have the
connection to the original singular shas on the hardest branch, and that we
have a more concise git history (one ~third the merge commits).



> Also, nothing's keeping us from treating CI holistically and pushing
> --atomic across multiple branches even if we don't have merge commits.
> That's just a procedural question we could agree on and adhere to.
>


Sure, but atomic is not the same: it's manual adherence and there's no
history/record of it.

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

Good with all the above (reasonable arguments) except I don't understand:
>
> I find it cleaner that work is found associated to one sha on the hardest
> branch, and we treat (or should be) CI holistically across branches.

If we -s ours and amend merge commits on things that straddle stuff like
8099, MS rewrite, Simulator, guardrails, etc, then we have multiple SHA
where the impl for a thing took place and N of them (N being count of newer
branches than oldest where it applies) where they're hidden inside a merge
commit right?

Also, nothing's keeping us from treating CI holistically and pushing
--atomic across multiple branches even if we don't have merge commits.
That's just a procedural question we could agree on and adhere to.

On Thu, Dec 9, 2021 at 3:25 PM Brandon Williams <dr...@gmail.com> wrote:

> +1 to trying trunk first.
>
> On Thu, Dec 9, 2021 at 1:52 PM Mick Semb Wever <mc...@apache.org> wrote:
> >
> > >
> > > So let me pose the question here to the list: is there anyone who would
> > > like to advocate for the current merge strategy (apply to oldest LTS,
> merge
> > > up, often -s ours w/new patch applied + amend) instead of "apply to
> trunk
> > > and cherry-pick back to LTS"?
> >
> >
> >
> > I'm in favour of the current merge strategy.
> > I find it cleaner that work is found associated to one sha on the hardest
> > branch, and we treat (or should be) CI holistically across branches.
> > I can appreciate that github makes some things easier, but I suspect it
> > will make other things messier and that there will be other consequences.
> >
> > My understanding was that we would first introduce such github fancies on
> > only commits that are intended for trunk. I am in favour of taking that
> > approach, changing our merge strategy can happen later, once we have
> ironed
> > out how the github/CI/stable-trunk is working best for us. I think this
> > would also help us understand more about the impacts of changing the
> merge
> > strategy.
> >
> > I was also under the impression that we are now aiming to be committing
> > less to the release branches. That means changing the merge strategy is
> of
> > less importance (and that there is benefit to keeping it as-is).
> Certainly
> > the commits on past branches seems very low at the moment, considering
> many
> > users are on 4.0, and this is a trend we want to continue.
> >
> > My opinion, let's take this in two steps (try stuff on just trunk first)…
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Brandon Williams <dr...@gmail.com>.

+1 to trying trunk first.

On Thu, Dec 9, 2021 at 1:52 PM Mick Semb Wever <mc...@apache.org> wrote:
>
> >
> > So let me pose the question here to the list: is there anyone who would
> > like to advocate for the current merge strategy (apply to oldest LTS, merge
> > up, often -s ours w/new patch applied + amend) instead of "apply to trunk
> > and cherry-pick back to LTS"?
>
>
>
> I'm in favour of the current merge strategy.
> I find it cleaner that work is found associated to one sha on the hardest
> branch, and we treat (or should be) CI holistically across branches.
> I can appreciate that github makes some things easier, but I suspect it
> will make other things messier and that there will be other consequences.
>
> My understanding was that we would first introduce such github fancies on
> only commits that are intended for trunk. I am in favour of taking that
> approach, changing our merge strategy can happen later, once we have ironed
> out how the github/CI/stable-trunk is working best for us. I think this
> would also help us understand more about the impacts of changing the merge
> strategy.
>
> I was also under the impression that we are now aiming to be committing
> less to the release branches. That means changing the merge strategy is of
> less importance (and that there is benefit to keeping it as-is). Certainly
> the commits on past branches seems very low at the moment, considering many
> users are on 4.0, and this is a trend we want to continue.
>
> My opinion, let's take this in two steps (try stuff on just trunk first)…

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by Mick Semb Wever <mc...@apache.org>.

>
> So let me pose the question here to the list: is there anyone who would
> like to advocate for the current merge strategy (apply to oldest LTS, merge
> up, often -s ours w/new patch applied + amend) instead of "apply to trunk
> and cherry-pick back to LTS"?



I'm in favour of the current merge strategy.
I find it cleaner that work is found associated to one sha on the hardest
branch, and we treat (or should be) CI holistically across branches.
I can appreciate that github makes some things easier, but I suspect it
will make other things messier and that there will be other consequences.

My understanding was that we would first introduce such github fancies on
only commits that are intended for trunk. I am in favour of taking that
approach, changing our merge strategy can happen later, once we have ironed
out how the github/CI/stable-trunk is working best for us. I think this
would also help us understand more about the impacts of changing the merge
strategy.

I was also under the impression that we are now aiming to be committing
less to the release branches. That means changing the merge strategy is of
less importance (and that there is benefit to keeping it as-is). Certainly
the commits on past branches seems very low at the moment, considering many
users are on 4.0, and this is a trend we want to continue.

My opinion, let's take this in two steps (try stuff on just trunk first)…

Re: [DISCUSS] Releasable trunk and quality

Posted by Brandon Williams <dr...@gmail.com>.

On Tue, Dec 7, 2021 at 11:13 AM Joshua McKenzie <jm...@apache.org> wrote:
>
> I'd frame the reasoning differently: Our current merge strategy is
> vestigial and we can't rely on it in many, if not most, cases. Patches
> rarely merge cleanly across majors requiring -s ours w/amend or other
> changes per branch. This effectively clutters up our git history, hides
> multi-branch changes behind merge commits, makes in-IDE annotations less
> effective, and makes the barrier for reverting bad patches higher.

I suspect another strategy just moves this around, but perhaps for the better.

> On the positive side, it makes it much less likely we will forget to apply
> a bugfix patch on all branches, and it's the Devil we Know and the entire
> project understands and is relatively consistent with the current strategy.
>
> What other positives are there to the current merge strategy that I may not
> be thinking of?

I think that's the big one.  While I don't expect that to be an issue,
there's no way to know if it will become one.  That said, I'm game to
try it, but then, which CI is the source of truth?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

I'd frame the reasoning differently: Our current merge strategy is
vestigial and we can't rely on it in many, if not most, cases. Patches
rarely merge cleanly across majors requiring -s ours w/amend or other
changes per branch. This effectively clutters up our git history, hides
multi-branch changes behind merge commits, makes in-IDE annotations less
effective, and makes the barrier for reverting bad patches higher. It also
just so happens to make it effectively impossible to use github actions to
block merge on green CI.

On the positive side, it makes it much less likely we will forget to apply
a bugfix patch on all branches, and it's the Devil we Know and the entire
project understands and is relatively consistent with the current strategy.

What other positives are there to the current merge strategy that I may not
be thinking of?

~Josh

On Tue, Dec 7, 2021 at 10:35 AM Brandon Williams <dr...@gmail.com> wrote:

> On Tue, Dec 7, 2021 at 8:18 AM Joshua McKenzie <jm...@apache.org>
> wrote:
> > So let me pose the question here to the list: is there anyone who would
> > like to advocate for the current merge strategy (apply to oldest LTS,
> merge
> > up, often -s ours w/new patch applied + amend) instead of "apply to trunk
> > and cherry-pick back to LTS"? If we make this change we'll be able to
> > integrate w/github actions and block merge on green CI + integrate git
> > revert into our processes.
>
> Changing the merge strategy can have deep and possibly unforeseen
> consequences, if the only reasoning is "because github needs it to do
> X" then that reasoning doesn't seem sound enough to me.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Brandon Williams <dr...@gmail.com>.

On Tue, Dec 7, 2021 at 8:18 AM Joshua McKenzie <jm...@apache.org> wrote:
> So let me pose the question here to the list: is there anyone who would
> like to advocate for the current merge strategy (apply to oldest LTS, merge
> up, often -s ours w/new patch applied + amend) instead of "apply to trunk
> and cherry-pick back to LTS"? If we make this change we'll be able to
> integrate w/github actions and block merge on green CI + integrate git
> revert into our processes.

Changing the merge strategy can have deep and possibly unforeseen
consequences, if the only reasoning is "because github needs it to do
X" then that reasoning doesn't seem sound enough to me.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

>
> the need for some external pressure to maintain build quality, and the
> best solution proposed (to my mind) was the use of GitHub actions to
> integrate with various CI services to refuse PRs that do not have a clean
> test run

Honestly, I agree 100% with this. I took the more conservative approach
(refine and standardize what we have + reduce friction) but I've long been
a believer in intentionally setting up incentives and disincentives to
shape behavior.

So let me pose the question here to the list: is there anyone who would
like to advocate for the current merge strategy (apply to oldest LTS, merge
up, often -s ours w/new patch applied + amend) instead of "apply to trunk
and cherry-pick back to LTS"? If we make this change we'll be able to
integrate w/github actions and block merge on green CI + integrate git
revert into our processes.

On Tue, Dec 7, 2021 at 9:08 AM benedict@apache.org <be...@apache.org>
wrote:

> > My personal opinion is we'd be well served to do trunk-based development
> with cherry-picks … to LTS release branches
>
> Agreed.
>
> > that's somewhat orthogonal … to the primary thing this discussion
> surfaced for me
>
> The primary outcome of the discussion for me was the need for some
> external pressure to maintain build quality, and the best solution proposed
> (to my mind) was the use of GitHub actions to integrate with various CI
> services to refuse PRs that do not have a clean test run. This doesn’t
> fully resolve flakiness, but it does provide 95%+ of the necessary pressure
> to maintain test quality, and a consistent way of determining that.
>
> This is how a lot of other projects maintain correctness, and I think how
> many forks of Cassandra are maintained outside of the project as well.
>
> From: Joshua McKenzie <jm...@apache.org>
> Date: Tuesday, 7 December 2021 at 13:08
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] Releasable trunk and quality
> >
> > it would be far preferable for consistency of behaviour to rely on shared
> > infrastructure if possible
> >
> For those of us using CircleCI, we can get a lot of the benefit by having a
> script that rewrites and cleans up circle profiles based on use-case; it's
> a shared / consistent environment and the scripting approach gives us
> flexibility to support different workflows with minimal friction (build and
> run every push vs. click to trigger for example).
>
> Is there a reason we discounted modifying the merge strategy?
>
> I took a stab at enumerating some of the current "best in class" I could
> find here:
>
> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.9b52fp49pp3y
> .
> My personal opinion is we'd be well served to do trunk-based development
> with cherry-picks (and by that I mean basically re-applying) bugfixes back
> to LTS release branches (or perhaps doing bugfix on oldest LTS and applying
> up, tomato tomahto), doing away with merge commits, and using git revert
> more liberally when a commit breaks CI or introduces instability into it.
>
> All that said, that's somewhat orthogonal (or perhaps complementary) to the
> primary thing this discussion surfaced for me, which is that we don't have
> standardization or guidance across what tests, on what JDK's, with what
> config, etc that we run before commits today. My thinking is to get some
> clarity for everyone on that front, reduce friction to encourage that
> behavior, and then visit the merge strategy discussion independently after
> that.
>
> ~Josh
>
>
>
> On Tue, Dec 7, 2021 at 1:08 AM Berenguer Blasi <be...@gmail.com>
> wrote:
>
> > +1. I would add a 'post-commit' step: check the jenkins CI run for your
> > merge and see if sthg broke regardless.
> >
> > On 6/12/21 23:51, Ekaterina Dimitrova wrote:
> > > Hi Josh,
> > > All good questions, thank you for raising this topic.
> > > To the best of my knowledge, we don't have those documented but I will
> > put
> > > notes on what tribal knowledge I know about and I personally follow :-)
> > >
> > >  Pre-commit test suites: * Which JDK's?  - both are officially
> supported
> > so
> > > both.
> > >
> > > * When to include all python tests or do JVM only (if ever)? - if I
> test
> > > only a test fix probably
> > >
> > >  * When to run upgrade tests? - I haven't heard any definitive
> guideline.
> > > Preferably every time but if there is a tiny change I guess it can be
> > > decided for them to be skipped. I would advocate to do more than less.
> > >
> > > * What to do if a test is also failing on the reference root (i.e.
> trunk,
> > > cassandra-4.0, etc)? - check if a ticket exists already, if not - open
> > one
> > > at least, even if I don't plan to work on it at least to acknowledge
> > > the issue and add any info I know about. If we know who broke it, ping
> > the
> > > author to check it.
> > >
> > > * What to do if a test fails intermittently? - Open a ticket. During
> > > investigation - Use the CircleCI jobs for running tests in a loop to
> find
> > > when it fails or to verify the test was fixed. (This is already in my
> > draft
> > > CircleCI document, not yet released as it was pending on the documents
> > > migration.)
> > >
> > > Hope that helps.
> > >
> > > ~Ekaterina
> > >
> > > On Mon, 6 Dec 2021 at 17:20, Joshua McKenzie <jm...@apache.org>
> > wrote:
> > >
> > >> As I work through the scripting on this, I don't know if we've
> > documented
> > >> or clarified the following (don't see it here:
> > >> https://cassandra.apache.org/_/development/testing.html):
> > >>
> > >> Pre-commit test suites:
> > >> * Which JDK's?
> > >> * When to include all python tests or do JVM only (if ever)?
> > >> * When to run upgrade tests?
> > >> * What to do if a test is also failing on the reference root (i.e.
> > trunk,
> > >> cassandra-4.0, etc)?
> > >> * What to do if a test fails intermittently?
> > >>
> > >> I'll also update the above linked documentation once we hammer this
> out
> > and
> > >> try and bake it into the scripting flow as much as possible as well.
> > Goal
> > >> is to make it easy to do the right thing and hard to do the wrong
> thing,
> > >> and to have these things written down rather than have it be tribal
> > >> knowledge that varies a lot across the project.
> > >>
> > >> ~Josh
> > >>
> > >> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jm...@apache.org>
> > >> wrote:
> > >>
> > >>> After some offline collab, here's where this thread has landed on a
> > >>> proposal to change our processes to incrementally improve our
> processes
> > >> and
> > >>> hopefully stabilize the state of CI longer term:
> > >>>
> > >>> Link:
> > >>>
> > >>
> >
> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
> > >>> Hopefully the mail server doesn't butcher formatting; if it does, hit
> > up
> > >>> the gdoc and leave comments there as should be open to all.
> > >>>
> > >>> Phase 1:
> > >>> Document merge criteria; update circle jobs to have a simple
> pre-merge
> > >> job
> > >>> (one for each JDK profile)
> > >>>      * Donate, document, and formalize usage of circleci-enable.py in
> > ASF
> > >>> repo (need new commit scripts / dev tooling section?)
> > >>>         * rewrites circle config jobs to simple clear flow
> > >>>         * ability to toggle between "run on push" or "click to run"
> > >>>         * Variety of other functionality; see below
> > >>> Document (site, help, README.md) and automate via scripting the
> > >>> relationship / dev / release process around:
> > >>>     * In-jvm dtest
> > >>>     * dtest
> > >>>     * ccm
> > >>> Integrate and document usage of script to build CI repeat test runs
> > >>>     * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
> > >>>     * Document “Do this if you add or change tests”
> > >>> Introduce “Build Lead” role
> > >>>     * Weekly rotation; volunteer
> > >>>     * 1: Make sure JIRAs exist for test failures
> > >>>     * 2: Attempt to triage new test failures to root cause and assign
> > out
> > >>>     * 3: Coordinate and drive to green board on trunk
> > >>> Change and automate process for *trunk only* patches:
> > >>>     * Block on green CI (from merge criteria in CI above; potentially
> > >>> stricter definition of "clean" for trunk CI)
> > >>>     * Consider using github PR’s to merge (TODO: determine how to
> > handle
> > >>> circle + CHANGES; see below)
> > >>> Automate process for *multi-branch* merges
> > >>>     * Harden / contribute / document dcapwell script (has one which
> > does
> > >>> the following):
> > >>>         * rebases your branch to the latest (if on 3.0 then rebase
> > >> against
> > >>> cassandra-3.0)
> > >>>         * check compiles
> > >>>         * removes all changes to .circle (can opt-out for circleci
> > >> patches)
> > >>>         * removes all changes to CHANGES.txt and leverages JIRA for
> the
> > >>> content
> > >>>         * checks code still compiles
> > >>>         * changes circle to run ci
> > >>>         * push to a temp branch in git and run CI (circle + Jenkins)
> > >>>             * when all branches are clean (waiting step is manual)
> > >>>             * TODO: Define “clean”
> > >>>                 * No new test failures compared to reference?
> > >>>                 * Or no test failures at all?
> > >>>             * merge changes into the actual branches
> > >>>             * merge up changes; rewriting diff
> > >>>             * push --atomic
> > >>>
> > >>> Transition to phase 2 when:
> > >>>     * All items from phase 1 are complete
> > >>>     * Test boards for supported branches are green
> > >>>
> > >>> Phase 2:
> > >>> * Add Harry to recurring run against trunk
> > >>> * Add Harry to release pipeline
> > >>> * Suite of perf tests against trunk recurring
> > >>>
> > >>>
> > >>>
> > >>> On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <
> jmckenzie@apache.org>
> > >>> wrote:
> > >>>
> > >>>> Sorry for not catching that Benedict, you're absolutely right. So
> long
> > >> as
> > >>>> we're using merge commits between branches I don't think
> auto-merging
> > >> via
> > >>>> train or blocking on green CI are options via the tooling, and
> > >> multi-branch
> > >>>> reverts will be something we should document very clearly should we
> > even
> > >>>> choose to go that route (a lot of room to make mistakes there).
> > >>>>
> > >>>> It may not be a huge issue as we can expect the more disruptive
> > changes
> > >>>> (i.e. potentially destabilizing) to be happening on trunk only, so
> > >> perhaps
> > >>>> we can get away with slightly different workflows or policies based
> on
> > >>>> whether you're doing a multi-branch bugfix or a feature on trunk.
> > Bears
> > >>>> thinking more deeply about.
> > >>>>
> > >>>> I'd also be game for revisiting our merge strategy. I don't see much
> > >>>> difference in labor between merging between branches vs. preparing
> > >> separate
> > >>>> patches for an individual developer, however I'm sure there's
> > >> maintenance
> > >>>> and integration implications there I'm not thinking of right now.
> > >>>>
> > >>>> On Wed, Nov 17, 2021 at 12:03 PM benedict@apache.org <
> > >> benedict@apache.org>
> > >>>> wrote:
> > >>>>
> > >>>>> I raised this before, but to highlight it again: how do these
> > >> approaches
> > >>>>> interface with our merge strategy?
> > >>>>>
> > >>>>> We might have to rebase several dependent merge commits and want to
> > >>>>> merge them atomically. So far as I know these tools don’t work
> > >>>>> fantastically in this scenario, but if I’m wrong that’s fantastic.
> If
> > >> not,
> > >>>>> given how important these things are, should we consider revisiting
> > our
> > >>>>> merge strategy?
> > >>>>>
> > >>>>> From: Joshua McKenzie <jm...@apache.org>
> > >>>>> Date: Wednesday, 17 November 2021 at 16:39
> > >>>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> > >>>>> Subject: Re: [DISCUSS] Releasable trunk and quality
> > >>>>> Thanks for the feedback and insight Henrik; it's valuable to hear
> how
> > >>>>> other
> > >>>>> large complex infra projects have tackled this problem set.
> > >>>>>
> > >>>>> To attempt to summarize, what I got from your email:
> > >>>>> [Phase one]
> > >>>>> 1) Build Barons: rotation where there's always someone active tying
> > >>>>> failures to changes and adding those failures to our ticketing
> system
> > >>>>> 2) Best effort process of "test breakers" being assigned tickets to
> > fix
> > >>>>> the
> > >>>>> things their work broke
> > >>>>> 3) Moving to a culture where we regularly revert commits that break
> > >> tests
> > >>>>> 4) Running tests before we merge changes
> > >>>>>
> > >>>>> [Phase two]
> > >>>>> 1) Suite of performance tests on a regular cadence against trunk
> > >>>>> (w/hunter
> > >>>>> or otherwise)
> > >>>>> 2) Integration w/ github merge-train pipelines
> > >>>>>
> > >>>>> That cover the highlights? I agree with these points as useful
> places
> > >> for
> > >>>>> us to invest in as a project and I'll work on getting this into a
> > gdoc
> > >>>>> for
> > >>>>> us to align on and discuss further this week.
> > >>>>>
> > >>>>> ~Josh
> > >>>>>
> > >>>>>
> > >>>>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <
> > henrik.ingo@datastax.com
> > >>>>> wrote:
> > >>>>>
> > >>>>>> There's an old joke: How many people read Slashdot? The answer is
> 5.
> > >>>>> The
> > >>>>>> rest of us just write comments without reading... In that spirit,
> I
> > >>>>> wanted
> > >>>>>> to share some thoughts in response to your question, even if I
> know
> > >>>>> some of
> > >>>>>> it will have been said in this thread already :-)
> > >>>>>>
> > >>>>>> Basically, I just want to share what has worked well in my past
> > >>>>> projects...
> > >>>>>> Visualization: Now that we have Butler running, we can already
> see a
> > >>>>>> decline in failing tests for 4.0 and trunk! This shows that
> > >>>>> contributors
> > >>>>>> want to do the right thing, we just need the right tools and
> > >> processes
> > >>>>> to
> > >>>>>> achieve success.
> > >>>>>>
> > >>>>>> Process: I'm confident we will soon be back to seeing 0 failures
> for
> > >>>>> 4.0
> > >>>>>> and trunk. However, keeping that state requires constant
> vigilance!
> > >> At
> > >>>>>> Mongodb we had a role called Build Baron (aka Build Cop, etc...).
> > >> This
> > >>>>> is a
> > >>>>>> weekly rotating role where the person who is the Build Baron will
> at
> > >>>>> least
> > >>>>>> once per day go through all of the Butler dashboards to catch new
> > >>>>>> regressions early. We have used the same process also at Datastax
> to
> > >>>>> guard
> > >>>>>> our downstream fork of Cassandra 4.0. It's the responsibility of
> the
> > >>>>> Build
> > >>>>>> Baron to
> > >>>>>>  - file a jira ticket for new failures
> > >>>>>>  - determine which commit is responsible for introducing the
> > >>>>> regression.
> > >>>>>> Sometimes this is obvious, sometimes this requires "bisecting" by
> > >>>>> running
> > >>>>>> more builds e.g. between two nightly builds.
> > >>>>>>  - assign the jira ticket to the author of the commit that
> > introduced
> > >>>>> the
> > >>>>>> regression
> > >>>>>>
> > >>>>>> Given that Cassandra is a community that includes part time and
> > >>>>> volunteer
> > >>>>>> developers, we may want to try some variation of this, such as
> > >> pairing
> > >>>>> 2
> > >>>>>> build barons each week?
> > >>>>>>
> > >>>>>> Reverting: A policy that the commit causing the regression is
> > >>>>> automatically
> > >>>>>> reverted can be scary. It takes courage to be the junior test
> > >> engineer
> > >>>>> who
> > >>>>>> reverts yesterday's commit from the founder and CTO, just to give
> an
> > >>>>>> example... Yet this is the most efficient way to keep the build
> > >> green.
> > >>>>> And
> > >>>>>> it turns out it's not that much additional work for the original
> > >>>>> author to
> > >>>>>> fix the issue and then re-merge the patch.
> > >>>>>>
> > >>>>>> Merge-train: For any project with more than 1 commit per day, it
> > will
> > >>>>>> inevitably happen that you need to rebase a PR before merging, and
> > >>>>> even if
> > >>>>>> it passed all tests before, after rebase it won't. In the
> downstream
> > >>>>>> Cassandra fork previously mentioned, we have tried to enable a
> > github
> > >>>>> rule
> > >>>>>> which requires a) that all tests passed before merging, and b) the
> > PR
> > >>>>> is
> > >>>>>> against the head of the branch merged into, and c) the tests were
> > run
> > >>>>> after
> > >>>>>> such rebase. Unfortunately this leads to infinite loops where a
> > large
> > >>>>> PR
> > >>>>>> may never be able to commit because it has to be rebased again and
> > >>>>> again
> > >>>>>> when smaller PRs can merge faster. The solution to this problem is
> > to
> > >>>>> have
> > >>>>>> an automated process for the rebase-test-merge cycle. Gitlab
> > supports
> > >>>>> such
> > >>>>>> a feature and calls it merge-trean:
> > >>>>>> https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
> > >>>>>>
> > >>>>>> The merge-train can be considered an advanced feature and we can
> > >>>>> return to
> > >>>>>> it later. The other points should be sufficient to keep a
> reasonably
> > >>>>> green
> > >>>>>> trunk.
> > >>>>>>
> > >>>>>> I guess the major area where we can improve daily test coverage
> > would
> > >>>>> be
> > >>>>>> performance tests. To that end we recently open sourced a nice
> tool
> > >>>>> that
> > >>>>>> can algorithmically detects performance regressions in a
> timeseries
> > >>>>> history
> > >>>>>> of benchmark results: https://github.com/datastax-labs/hunter
> Just
> > >>>>> like
> > >>>>>> with correctness testing it's my experience that catching
> > regressions
> > >>>>> the
> > >>>>>> day they happened is much better than trying to do it at beta or
> rc
> > >>>>> time.
> > >>>>>> Piotr also blogged about Hunter when it was released:
> > >>>>>>
> > >>>>>>
> > >>
> >
> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
> > >>>>>> henrik
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <
> > >> jmckenzie@apache.org>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> We as a project have gone back and forth on the topic of quality
> > >> and
> > >>>>> the
> > >>>>>>> notion of a releasable trunk for quite a few years. If people are
> > >>>>>>> interested, I'd like to rekindle this discussion a bit and see if
> > >>>>> we're
> > >>>>>>> happy with where we are as a project or if we think there's steps
> > >> we
> > >>>>>> should
> > >>>>>>> take to change the quality bar going forward. The following
> > >> questions
> > >>>>>> have
> > >>>>>>> been rattling around for me for awhile:
> > >>>>>>>
> > >>>>>>> 1. How do we define what "releasable trunk" means? All reviewed
> by
> > >> M
> > >>>>>>> committers? Passing N% of tests? Passing all tests plus some
> other
> > >>>>>> metrics
> > >>>>>>> (manual testing, raising the number of reviewers, test coverage,
> > >>>>> usage in
> > >>>>>>> dev or QA environments, etc)? Something else entirely?
> > >>>>>>>
> > >>>>>>> 2. With a definition settled upon in #1, what steps, if any, do
> we
> > >>>>> need
> > >>>>>> to
> > >>>>>>> take to get from where we are to having *and keeping* that
> > >> releasable
> > >>>>>>> trunk? Anything to codify there?
> > >>>>>>>
> > >>>>>>> 3. What are the benefits of having a releasable trunk as defined
> > >>>>> here?
> > >>>>>> What
> > >>>>>>> are the costs? Is it worth pursuing? What are the alternatives
> (for
> > >>>>>>> instance: a freeze before a release + stabilization focus by the
> > >>>>>> community
> > >>>>>>> i.e. 4.0 push or the tock in tick-tock)?
> > >>>>>>>
> > >>>>>>> Given the large volumes of work coming down the pike with CEP's,
> > >> this
> > >>>>>> seems
> > >>>>>>> like a good time to at least check in on this topic as a
> community.
> > >>>>>>>
> > >>>>>>> Full disclosure: running face-first into 60+ failing tests on
> trunk
> > >>>>> when
> > >>>>>>> going through the commit process for denylisting this week
> brought
> > >>>>> this
> > >>>>>>> topic back up for me (reminds me of when I went to merge CDC back
> > >> in
> > >>>>> 3.6
> > >>>>>>> and those test failures riled me up... I sense a pattern ;))
> > >>>>>>>
> > >>>>>>> Looking forward to hearing what people think.
> > >>>>>>>
> > >>>>>>> ~Josh
> > >>>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>>
> > >>>>>> Henrik Ingo
> > >>>>>>
> > >>>>>> +358 40 569 7354 <358405697354>
> > >>>>>>
> > >>>>>> [image: Visit us online.] <https://www.datastax.com/>  [image:
> > Visit
> > >>>>> us on
> > >>>>>> Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
> > >>>>> YouTube.]
> > >>>>>> <
> > >>>>>>
> > >>
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=
> > >>>>>>   [image: Visit my LinkedIn profile.] <
> > >>>>> https://www.linkedin.com/in/heingo/
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
> >
>

Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.

> My personal opinion is we'd be well served to do trunk-based development
with cherry-picks … to LTS release branches

Agreed.

> that's somewhat orthogonal … to the primary thing this discussion surfaced for me

The primary outcome of the discussion for me was the need for some external pressure to maintain build quality, and the best solution proposed (to my mind) was the use of GitHub actions to integrate with various CI services to refuse PRs that do not have a clean test run. This doesn’t fully resolve flakiness, but it does provide 95%+ of the necessary pressure to maintain test quality, and a consistent way of determining that.

This is how a lot of other projects maintain correctness, and I think how many forks of Cassandra are maintained outside of the project as well.

From: Joshua McKenzie <jm...@apache.org>
Date: Tuesday, 7 December 2021 at 13:08
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Releasable trunk and quality
>
> it would be far preferable for consistency of behaviour to rely on shared
> infrastructure if possible
>
For those of us using CircleCI, we can get a lot of the benefit by having a
script that rewrites and cleans up circle profiles based on use-case; it's
a shared / consistent environment and the scripting approach gives us
flexibility to support different workflows with minimal friction (build and
run every push vs. click to trigger for example).

Is there a reason we discounted modifying the merge strategy?

I took a stab at enumerating some of the current "best in class" I could
find here:
https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.9b52fp49pp3y.
My personal opinion is we'd be well served to do trunk-based development
with cherry-picks (and by that I mean basically re-applying) bugfixes back
to LTS release branches (or perhaps doing bugfix on oldest LTS and applying
up, tomato tomahto), doing away with merge commits, and using git revert
more liberally when a commit breaks CI or introduces instability into it.

All that said, that's somewhat orthogonal (or perhaps complementary) to the
primary thing this discussion surfaced for me, which is that we don't have
standardization or guidance across what tests, on what JDK's, with what
config, etc that we run before commits today. My thinking is to get some
clarity for everyone on that front, reduce friction to encourage that
behavior, and then visit the merge strategy discussion independently after
that.

~Josh



On Tue, Dec 7, 2021 at 1:08 AM Berenguer Blasi <be...@gmail.com>
wrote:

> +1. I would add a 'post-commit' step: check the jenkins CI run for your
> merge and see if sthg broke regardless.
>
> On 6/12/21 23:51, Ekaterina Dimitrova wrote:
> > Hi Josh,
> > All good questions, thank you for raising this topic.
> > To the best of my knowledge, we don't have those documented but I will
> put
> > notes on what tribal knowledge I know about and I personally follow :-)
> >
> >  Pre-commit test suites: * Which JDK's?  - both are officially supported
> so
> > both.
> >
> > * When to include all python tests or do JVM only (if ever)? - if I test
> > only a test fix probably
> >
> >  * When to run upgrade tests? - I haven't heard any definitive guideline.
> > Preferably every time but if there is a tiny change I guess it can be
> > decided for them to be skipped. I would advocate to do more than less.
> >
> > * What to do if a test is also failing on the reference root (i.e. trunk,
> > cassandra-4.0, etc)? - check if a ticket exists already, if not - open
> one
> > at least, even if I don't plan to work on it at least to acknowledge
> > the issue and add any info I know about. If we know who broke it, ping
> the
> > author to check it.
> >
> > * What to do if a test fails intermittently? - Open a ticket. During
> > investigation - Use the CircleCI jobs for running tests in a loop to find
> > when it fails or to verify the test was fixed. (This is already in my
> draft
> > CircleCI document, not yet released as it was pending on the documents
> > migration.)
> >
> > Hope that helps.
> >
> > ~Ekaterina
> >
> > On Mon, 6 Dec 2021 at 17:20, Joshua McKenzie <jm...@apache.org>
> wrote:
> >
> >> As I work through the scripting on this, I don't know if we've
> documented
> >> or clarified the following (don't see it here:
> >> https://cassandra.apache.org/_/development/testing.html):
> >>
> >> Pre-commit test suites:
> >> * Which JDK's?
> >> * When to include all python tests or do JVM only (if ever)?
> >> * When to run upgrade tests?
> >> * What to do if a test is also failing on the reference root (i.e.
> trunk,
> >> cassandra-4.0, etc)?
> >> * What to do if a test fails intermittently?
> >>
> >> I'll also update the above linked documentation once we hammer this out
> and
> >> try and bake it into the scripting flow as much as possible as well.
> Goal
> >> is to make it easy to do the right thing and hard to do the wrong thing,
> >> and to have these things written down rather than have it be tribal
> >> knowledge that varies a lot across the project.
> >>
> >> ~Josh
> >>
> >> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jm...@apache.org>
> >> wrote:
> >>
> >>> After some offline collab, here's where this thread has landed on a
> >>> proposal to change our processes to incrementally improve our processes
> >> and
> >>> hopefully stabilize the state of CI longer term:
> >>>
> >>> Link:
> >>>
> >>
> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
> >>> Hopefully the mail server doesn't butcher formatting; if it does, hit
> up
> >>> the gdoc and leave comments there as should be open to all.
> >>>
> >>> Phase 1:
> >>> Document merge criteria; update circle jobs to have a simple pre-merge
> >> job
> >>> (one for each JDK profile)
> >>>      * Donate, document, and formalize usage of circleci-enable.py in
> ASF
> >>> repo (need new commit scripts / dev tooling section?)
> >>>         * rewrites circle config jobs to simple clear flow
> >>>         * ability to toggle between "run on push" or "click to run"
> >>>         * Variety of other functionality; see below
> >>> Document (site, help, README.md) and automate via scripting the
> >>> relationship / dev / release process around:
> >>>     * In-jvm dtest
> >>>     * dtest
> >>>     * ccm
> >>> Integrate and document usage of script to build CI repeat test runs
> >>>     * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
> >>>     * Document “Do this if you add or change tests”
> >>> Introduce “Build Lead” role
> >>>     * Weekly rotation; volunteer
> >>>     * 1: Make sure JIRAs exist for test failures
> >>>     * 2: Attempt to triage new test failures to root cause and assign
> out
> >>>     * 3: Coordinate and drive to green board on trunk
> >>> Change and automate process for *trunk only* patches:
> >>>     * Block on green CI (from merge criteria in CI above; potentially
> >>> stricter definition of "clean" for trunk CI)
> >>>     * Consider using github PR’s to merge (TODO: determine how to
> handle
> >>> circle + CHANGES; see below)
> >>> Automate process for *multi-branch* merges
> >>>     * Harden / contribute / document dcapwell script (has one which
> does
> >>> the following):
> >>>         * rebases your branch to the latest (if on 3.0 then rebase
> >> against
> >>> cassandra-3.0)
> >>>         * check compiles
> >>>         * removes all changes to .circle (can opt-out for circleci
> >> patches)
> >>>         * removes all changes to CHANGES.txt and leverages JIRA for the
> >>> content
> >>>         * checks code still compiles
> >>>         * changes circle to run ci
> >>>         * push to a temp branch in git and run CI (circle + Jenkins)
> >>>             * when all branches are clean (waiting step is manual)
> >>>             * TODO: Define “clean”
> >>>                 * No new test failures compared to reference?
> >>>                 * Or no test failures at all?
> >>>             * merge changes into the actual branches
> >>>             * merge up changes; rewriting diff
> >>>             * push --atomic
> >>>
> >>> Transition to phase 2 when:
> >>>     * All items from phase 1 are complete
> >>>     * Test boards for supported branches are green
> >>>
> >>> Phase 2:
> >>> * Add Harry to recurring run against trunk
> >>> * Add Harry to release pipeline
> >>> * Suite of perf tests against trunk recurring
> >>>
> >>>
> >>>
> >>> On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jm...@apache.org>
> >>> wrote:
> >>>
> >>>> Sorry for not catching that Benedict, you're absolutely right. So long
> >> as
> >>>> we're using merge commits between branches I don't think auto-merging
> >> via
> >>>> train or blocking on green CI are options via the tooling, and
> >> multi-branch
> >>>> reverts will be something we should document very clearly should we
> even
> >>>> choose to go that route (a lot of room to make mistakes there).
> >>>>
> >>>> It may not be a huge issue as we can expect the more disruptive
> changes
> >>>> (i.e. potentially destabilizing) to be happening on trunk only, so
> >> perhaps
> >>>> we can get away with slightly different workflows or policies based on
> >>>> whether you're doing a multi-branch bugfix or a feature on trunk.
> Bears
> >>>> thinking more deeply about.
> >>>>
> >>>> I'd also be game for revisiting our merge strategy. I don't see much
> >>>> difference in labor between merging between branches vs. preparing
> >> separate
> >>>> patches for an individual developer, however I'm sure there's
> >> maintenance
> >>>> and integration implications there I'm not thinking of right now.
> >>>>
> >>>> On Wed, Nov 17, 2021 at 12:03 PM benedict@apache.org <
> >> benedict@apache.org>
> >>>> wrote:
> >>>>
> >>>>> I raised this before, but to highlight it again: how do these
> >> approaches
> >>>>> interface with our merge strategy?
> >>>>>
> >>>>> We might have to rebase several dependent merge commits and want to
> >>>>> merge them atomically. So far as I know these tools don’t work
> >>>>> fantastically in this scenario, but if I’m wrong that’s fantastic. If
> >> not,
> >>>>> given how important these things are, should we consider revisiting
> our
> >>>>> merge strategy?
> >>>>>
> >>>>> From: Joshua McKenzie <jm...@apache.org>
> >>>>> Date: Wednesday, 17 November 2021 at 16:39
> >>>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> >>>>> Subject: Re: [DISCUSS] Releasable trunk and quality
> >>>>> Thanks for the feedback and insight Henrik; it's valuable to hear how
> >>>>> other
> >>>>> large complex infra projects have tackled this problem set.
> >>>>>
> >>>>> To attempt to summarize, what I got from your email:
> >>>>> [Phase one]
> >>>>> 1) Build Barons: rotation where there's always someone active tying
> >>>>> failures to changes and adding those failures to our ticketing system
> >>>>> 2) Best effort process of "test breakers" being assigned tickets to
> fix
> >>>>> the
> >>>>> things their work broke
> >>>>> 3) Moving to a culture where we regularly revert commits that break
> >> tests
> >>>>> 4) Running tests before we merge changes
> >>>>>
> >>>>> [Phase two]
> >>>>> 1) Suite of performance tests on a regular cadence against trunk
> >>>>> (w/hunter
> >>>>> or otherwise)
> >>>>> 2) Integration w/ github merge-train pipelines
> >>>>>
> >>>>> That cover the highlights? I agree with these points as useful places
> >> for
> >>>>> us to invest in as a project and I'll work on getting this into a
> gdoc
> >>>>> for
> >>>>> us to align on and discuss further this week.
> >>>>>
> >>>>> ~Josh
> >>>>>
> >>>>>
> >>>>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <
> henrik.ingo@datastax.com
> >>>>> wrote:
> >>>>>
> >>>>>> There's an old joke: How many people read Slashdot? The answer is 5.
> >>>>> The
> >>>>>> rest of us just write comments without reading... In that spirit, I
> >>>>> wanted
> >>>>>> to share some thoughts in response to your question, even if I know
> >>>>> some of
> >>>>>> it will have been said in this thread already :-)
> >>>>>>
> >>>>>> Basically, I just want to share what has worked well in my past
> >>>>> projects...
> >>>>>> Visualization: Now that we have Butler running, we can already see a
> >>>>>> decline in failing tests for 4.0 and trunk! This shows that
> >>>>> contributors
> >>>>>> want to do the right thing, we just need the right tools and
> >> processes
> >>>>> to
> >>>>>> achieve success.
> >>>>>>
> >>>>>> Process: I'm confident we will soon be back to seeing 0 failures for
> >>>>> 4.0
> >>>>>> and trunk. However, keeping that state requires constant vigilance!
> >> At
> >>>>>> Mongodb we had a role called Build Baron (aka Build Cop, etc...).
> >> This
> >>>>> is a
> >>>>>> weekly rotating role where the person who is the Build Baron will at
> >>>>> least
> >>>>>> once per day go through all of the Butler dashboards to catch new
> >>>>>> regressions early. We have used the same process also at Datastax to
> >>>>> guard
> >>>>>> our downstream fork of Cassandra 4.0. It's the responsibility of the
> >>>>> Build
> >>>>>> Baron to
> >>>>>>  - file a jira ticket for new failures
> >>>>>>  - determine which commit is responsible for introducing the
> >>>>> regression.
> >>>>>> Sometimes this is obvious, sometimes this requires "bisecting" by
> >>>>> running
> >>>>>> more builds e.g. between two nightly builds.
> >>>>>>  - assign the jira ticket to the author of the commit that
> introduced
> >>>>> the
> >>>>>> regression
> >>>>>>
> >>>>>> Given that Cassandra is a community that includes part time and
> >>>>> volunteer
> >>>>>> developers, we may want to try some variation of this, such as
> >> pairing
> >>>>> 2
> >>>>>> build barons each week?
> >>>>>>
> >>>>>> Reverting: A policy that the commit causing the regression is
> >>>>> automatically
> >>>>>> reverted can be scary. It takes courage to be the junior test
> >> engineer
> >>>>> who
> >>>>>> reverts yesterday's commit from the founder and CTO, just to give an
> >>>>>> example... Yet this is the most efficient way to keep the build
> >> green.
> >>>>> And
> >>>>>> it turns out it's not that much additional work for the original
> >>>>> author to
> >>>>>> fix the issue and then re-merge the patch.
> >>>>>>
> >>>>>> Merge-train: For any project with more than 1 commit per day, it
> will
> >>>>>> inevitably happen that you need to rebase a PR before merging, and
> >>>>> even if
> >>>>>> it passed all tests before, after rebase it won't. In the downstream
> >>>>>> Cassandra fork previously mentioned, we have tried to enable a
> github
> >>>>> rule
> >>>>>> which requires a) that all tests passed before merging, and b) the
> PR
> >>>>> is
> >>>>>> against the head of the branch merged into, and c) the tests were
> run
> >>>>> after
> >>>>>> such rebase. Unfortunately this leads to infinite loops where a
> large
> >>>>> PR
> >>>>>> may never be able to commit because it has to be rebased again and
> >>>>> again
> >>>>>> when smaller PRs can merge faster. The solution to this problem is
> to
> >>>>> have
> >>>>>> an automated process for the rebase-test-merge cycle. Gitlab
> supports
> >>>>> such
> >>>>>> a feature and calls it merge-trean:
> >>>>>> https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
> >>>>>>
> >>>>>> The merge-train can be considered an advanced feature and we can
> >>>>> return to
> >>>>>> it later. The other points should be sufficient to keep a reasonably
> >>>>> green
> >>>>>> trunk.
> >>>>>>
> >>>>>> I guess the major area where we can improve daily test coverage
> would
> >>>>> be
> >>>>>> performance tests. To that end we recently open sourced a nice tool
> >>>>> that
> >>>>>> can algorithmically detects performance regressions in a timeseries
> >>>>> history
> >>>>>> of benchmark results: https://github.com/datastax-labs/hunter Just
> >>>>> like
> >>>>>> with correctness testing it's my experience that catching
> regressions
> >>>>> the
> >>>>>> day they happened is much better than trying to do it at beta or rc
> >>>>> time.
> >>>>>> Piotr also blogged about Hunter when it was released:
> >>>>>>
> >>>>>>
> >>
> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
> >>>>>> henrik
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <
> >> jmckenzie@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> We as a project have gone back and forth on the topic of quality
> >> and
> >>>>> the
> >>>>>>> notion of a releasable trunk for quite a few years. If people are
> >>>>>>> interested, I'd like to rekindle this discussion a bit and see if
> >>>>> we're
> >>>>>>> happy with where we are as a project or if we think there's steps
> >> we
> >>>>>> should
> >>>>>>> take to change the quality bar going forward. The following
> >> questions
> >>>>>> have
> >>>>>>> been rattling around for me for awhile:
> >>>>>>>
> >>>>>>> 1. How do we define what "releasable trunk" means? All reviewed by
> >> M
> >>>>>>> committers? Passing N% of tests? Passing all tests plus some other
> >>>>>> metrics
> >>>>>>> (manual testing, raising the number of reviewers, test coverage,
> >>>>> usage in
> >>>>>>> dev or QA environments, etc)? Something else entirely?
> >>>>>>>
> >>>>>>> 2. With a definition settled upon in #1, what steps, if any, do we
> >>>>> need
> >>>>>> to
> >>>>>>> take to get from where we are to having *and keeping* that
> >> releasable
> >>>>>>> trunk? Anything to codify there?
> >>>>>>>
> >>>>>>> 3. What are the benefits of having a releasable trunk as defined
> >>>>> here?
> >>>>>> What
> >>>>>>> are the costs? Is it worth pursuing? What are the alternatives (for
> >>>>>>> instance: a freeze before a release + stabilization focus by the
> >>>>>> community
> >>>>>>> i.e. 4.0 push or the tock in tick-tock)?
> >>>>>>>
> >>>>>>> Given the large volumes of work coming down the pike with CEP's,
> >> this
> >>>>>> seems
> >>>>>>> like a good time to at least check in on this topic as a community.
> >>>>>>>
> >>>>>>> Full disclosure: running face-first into 60+ failing tests on trunk
> >>>>> when
> >>>>>>> going through the commit process for denylisting this week brought
> >>>>> this
> >>>>>>> topic back up for me (reminds me of when I went to merge CDC back
> >> in
> >>>>> 3.6
> >>>>>>> and those test failures riled me up... I sense a pattern ;))
> >>>>>>>
> >>>>>>> Looking forward to hearing what people think.
> >>>>>>>
> >>>>>>> ~Josh
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> Henrik Ingo
> >>>>>>
> >>>>>> +358 40 569 7354 <358405697354>
> >>>>>>
> >>>>>> [image: Visit us online.] <https://www.datastax.com/>  [image:
> Visit
> >>>>> us on
> >>>>>> Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
> >>>>> YouTube.]
> >>>>>> <
> >>>>>>
> >>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=
> >>>>>>   [image: Visit my LinkedIn profile.] <
> >>>>> https://www.linkedin.com/in/heingo/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

>
> it would be far preferable for consistency of behaviour to rely on shared
> infrastructure if possible
>
For those of us using CircleCI, we can get a lot of the benefit by having a
script that rewrites and cleans up circle profiles based on use-case; it's
a shared / consistent environment and the scripting approach gives us
flexibility to support different workflows with minimal friction (build and
run every push vs. click to trigger for example).

Is there a reason we discounted modifying the merge strategy?

I took a stab at enumerating some of the current "best in class" I could
find here:
https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.9b52fp49pp3y.
My personal opinion is we'd be well served to do trunk-based development
with cherry-picks (and by that I mean basically re-applying) bugfixes back
to LTS release branches (or perhaps doing bugfix on oldest LTS and applying
up, tomato tomahto), doing away with merge commits, and using git revert
more liberally when a commit breaks CI or introduces instability into it.

All that said, that's somewhat orthogonal (or perhaps complementary) to the
primary thing this discussion surfaced for me, which is that we don't have
standardization or guidance across what tests, on what JDK's, with what
config, etc that we run before commits today. My thinking is to get some
clarity for everyone on that front, reduce friction to encourage that
behavior, and then visit the merge strategy discussion independently after
that.

~Josh



On Tue, Dec 7, 2021 at 1:08 AM Berenguer Blasi <be...@gmail.com>
wrote:

> +1. I would add a 'post-commit' step: check the jenkins CI run for your
> merge and see if sthg broke regardless.
>
> On 6/12/21 23:51, Ekaterina Dimitrova wrote:
> > Hi Josh,
> > All good questions, thank you for raising this topic.
> > To the best of my knowledge, we don't have those documented but I will
> put
> > notes on what tribal knowledge I know about and I personally follow :-)
> >
> >  Pre-commit test suites: * Which JDK's?  - both are officially supported
> so
> > both.
> >
> > * When to include all python tests or do JVM only (if ever)? - if I test
> > only a test fix probably
> >
> >  * When to run upgrade tests? - I haven't heard any definitive guideline.
> > Preferably every time but if there is a tiny change I guess it can be
> > decided for them to be skipped. I would advocate to do more than less.
> >
> > * What to do if a test is also failing on the reference root (i.e. trunk,
> > cassandra-4.0, etc)? - check if a ticket exists already, if not - open
> one
> > at least, even if I don't plan to work on it at least to acknowledge
> > the issue and add any info I know about. If we know who broke it, ping
> the
> > author to check it.
> >
> > * What to do if a test fails intermittently? - Open a ticket. During
> > investigation - Use the CircleCI jobs for running tests in a loop to find
> > when it fails or to verify the test was fixed. (This is already in my
> draft
> > CircleCI document, not yet released as it was pending on the documents
> > migration.)
> >
> > Hope that helps.
> >
> > ~Ekaterina
> >
> > On Mon, 6 Dec 2021 at 17:20, Joshua McKenzie <jm...@apache.org>
> wrote:
> >
> >> As I work through the scripting on this, I don't know if we've
> documented
> >> or clarified the following (don't see it here:
> >> https://cassandra.apache.org/_/development/testing.html):
> >>
> >> Pre-commit test suites:
> >> * Which JDK's?
> >> * When to include all python tests or do JVM only (if ever)?
> >> * When to run upgrade tests?
> >> * What to do if a test is also failing on the reference root (i.e.
> trunk,
> >> cassandra-4.0, etc)?
> >> * What to do if a test fails intermittently?
> >>
> >> I'll also update the above linked documentation once we hammer this out
> and
> >> try and bake it into the scripting flow as much as possible as well.
> Goal
> >> is to make it easy to do the right thing and hard to do the wrong thing,
> >> and to have these things written down rather than have it be tribal
> >> knowledge that varies a lot across the project.
> >>
> >> ~Josh
> >>
> >> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jm...@apache.org>
> >> wrote:
> >>
> >>> After some offline collab, here's where this thread has landed on a
> >>> proposal to change our processes to incrementally improve our processes
> >> and
> >>> hopefully stabilize the state of CI longer term:
> >>>
> >>> Link:
> >>>
> >>
> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
> >>> Hopefully the mail server doesn't butcher formatting; if it does, hit
> up
> >>> the gdoc and leave comments there as should be open to all.
> >>>
> >>> Phase 1:
> >>> Document merge criteria; update circle jobs to have a simple pre-merge
> >> job
> >>> (one for each JDK profile)
> >>>      * Donate, document, and formalize usage of circleci-enable.py in
> ASF
> >>> repo (need new commit scripts / dev tooling section?)
> >>>         * rewrites circle config jobs to simple clear flow
> >>>         * ability to toggle between "run on push" or "click to run"
> >>>         * Variety of other functionality; see below
> >>> Document (site, help, README.md) and automate via scripting the
> >>> relationship / dev / release process around:
> >>>     * In-jvm dtest
> >>>     * dtest
> >>>     * ccm
> >>> Integrate and document usage of script to build CI repeat test runs
> >>>     * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
> >>>     * Document “Do this if you add or change tests”
> >>> Introduce “Build Lead” role
> >>>     * Weekly rotation; volunteer
> >>>     * 1: Make sure JIRAs exist for test failures
> >>>     * 2: Attempt to triage new test failures to root cause and assign
> out
> >>>     * 3: Coordinate and drive to green board on trunk
> >>> Change and automate process for *trunk only* patches:
> >>>     * Block on green CI (from merge criteria in CI above; potentially
> >>> stricter definition of "clean" for trunk CI)
> >>>     * Consider using github PR’s to merge (TODO: determine how to
> handle
> >>> circle + CHANGES; see below)
> >>> Automate process for *multi-branch* merges
> >>>     * Harden / contribute / document dcapwell script (has one which
> does
> >>> the following):
> >>>         * rebases your branch to the latest (if on 3.0 then rebase
> >> against
> >>> cassandra-3.0)
> >>>         * check compiles
> >>>         * removes all changes to .circle (can opt-out for circleci
> >> patches)
> >>>         * removes all changes to CHANGES.txt and leverages JIRA for the
> >>> content
> >>>         * checks code still compiles
> >>>         * changes circle to run ci
> >>>         * push to a temp branch in git and run CI (circle + Jenkins)
> >>>             * when all branches are clean (waiting step is manual)
> >>>             * TODO: Define “clean”
> >>>                 * No new test failures compared to reference?
> >>>                 * Or no test failures at all?
> >>>             * merge changes into the actual branches
> >>>             * merge up changes; rewriting diff
> >>>             * push --atomic
> >>>
> >>> Transition to phase 2 when:
> >>>     * All items from phase 1 are complete
> >>>     * Test boards for supported branches are green
> >>>
> >>> Phase 2:
> >>> * Add Harry to recurring run against trunk
> >>> * Add Harry to release pipeline
> >>> * Suite of perf tests against trunk recurring
> >>>
> >>>
> >>>
> >>> On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jm...@apache.org>
> >>> wrote:
> >>>
> >>>> Sorry for not catching that Benedict, you're absolutely right. So long
> >> as
> >>>> we're using merge commits between branches I don't think auto-merging
> >> via
> >>>> train or blocking on green CI are options via the tooling, and
> >> multi-branch
> >>>> reverts will be something we should document very clearly should we
> even
> >>>> choose to go that route (a lot of room to make mistakes there).
> >>>>
> >>>> It may not be a huge issue as we can expect the more disruptive
> changes
> >>>> (i.e. potentially destabilizing) to be happening on trunk only, so
> >> perhaps
> >>>> we can get away with slightly different workflows or policies based on
> >>>> whether you're doing a multi-branch bugfix or a feature on trunk.
> Bears
> >>>> thinking more deeply about.
> >>>>
> >>>> I'd also be game for revisiting our merge strategy. I don't see much
> >>>> difference in labor between merging between branches vs. preparing
> >> separate
> >>>> patches for an individual developer, however I'm sure there's
> >> maintenance
> >>>> and integration implications there I'm not thinking of right now.
> >>>>
> >>>> On Wed, Nov 17, 2021 at 12:03 PM benedict@apache.org <
> >> benedict@apache.org>
> >>>> wrote:
> >>>>
> >>>>> I raised this before, but to highlight it again: how do these
> >> approaches
> >>>>> interface with our merge strategy?
> >>>>>
> >>>>> We might have to rebase several dependent merge commits and want to
> >>>>> merge them atomically. So far as I know these tools don’t work
> >>>>> fantastically in this scenario, but if I’m wrong that’s fantastic. If
> >> not,
> >>>>> given how important these things are, should we consider revisiting
> our
> >>>>> merge strategy?
> >>>>>
> >>>>> From: Joshua McKenzie <jm...@apache.org>
> >>>>> Date: Wednesday, 17 November 2021 at 16:39
> >>>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> >>>>> Subject: Re: [DISCUSS] Releasable trunk and quality
> >>>>> Thanks for the feedback and insight Henrik; it's valuable to hear how
> >>>>> other
> >>>>> large complex infra projects have tackled this problem set.
> >>>>>
> >>>>> To attempt to summarize, what I got from your email:
> >>>>> [Phase one]
> >>>>> 1) Build Barons: rotation where there's always someone active tying
> >>>>> failures to changes and adding those failures to our ticketing system
> >>>>> 2) Best effort process of "test breakers" being assigned tickets to
> fix
> >>>>> the
> >>>>> things their work broke
> >>>>> 3) Moving to a culture where we regularly revert commits that break
> >> tests
> >>>>> 4) Running tests before we merge changes
> >>>>>
> >>>>> [Phase two]
> >>>>> 1) Suite of performance tests on a regular cadence against trunk
> >>>>> (w/hunter
> >>>>> or otherwise)
> >>>>> 2) Integration w/ github merge-train pipelines
> >>>>>
> >>>>> That cover the highlights? I agree with these points as useful places
> >> for
> >>>>> us to invest in as a project and I'll work on getting this into a
> gdoc
> >>>>> for
> >>>>> us to align on and discuss further this week.
> >>>>>
> >>>>> ~Josh
> >>>>>
> >>>>>
> >>>>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <
> henrik.ingo@datastax.com
> >>>>> wrote:
> >>>>>
> >>>>>> There's an old joke: How many people read Slashdot? The answer is 5.
> >>>>> The
> >>>>>> rest of us just write comments without reading... In that spirit, I
> >>>>> wanted
> >>>>>> to share some thoughts in response to your question, even if I know
> >>>>> some of
> >>>>>> it will have been said in this thread already :-)
> >>>>>>
> >>>>>> Basically, I just want to share what has worked well in my past
> >>>>> projects...
> >>>>>> Visualization: Now that we have Butler running, we can already see a
> >>>>>> decline in failing tests for 4.0 and trunk! This shows that
> >>>>> contributors
> >>>>>> want to do the right thing, we just need the right tools and
> >> processes
> >>>>> to
> >>>>>> achieve success.
> >>>>>>
> >>>>>> Process: I'm confident we will soon be back to seeing 0 failures for
> >>>>> 4.0
> >>>>>> and trunk. However, keeping that state requires constant vigilance!
> >> At
> >>>>>> Mongodb we had a role called Build Baron (aka Build Cop, etc...).
> >> This
> >>>>> is a
> >>>>>> weekly rotating role where the person who is the Build Baron will at
> >>>>> least
> >>>>>> once per day go through all of the Butler dashboards to catch new
> >>>>>> regressions early. We have used the same process also at Datastax to
> >>>>> guard
> >>>>>> our downstream fork of Cassandra 4.0. It's the responsibility of the
> >>>>> Build
> >>>>>> Baron to
> >>>>>>  - file a jira ticket for new failures
> >>>>>>  - determine which commit is responsible for introducing the
> >>>>> regression.
> >>>>>> Sometimes this is obvious, sometimes this requires "bisecting" by
> >>>>> running
> >>>>>> more builds e.g. between two nightly builds.
> >>>>>>  - assign the jira ticket to the author of the commit that
> introduced
> >>>>> the
> >>>>>> regression
> >>>>>>
> >>>>>> Given that Cassandra is a community that includes part time and
> >>>>> volunteer
> >>>>>> developers, we may want to try some variation of this, such as
> >> pairing
> >>>>> 2
> >>>>>> build barons each week?
> >>>>>>
> >>>>>> Reverting: A policy that the commit causing the regression is
> >>>>> automatically
> >>>>>> reverted can be scary. It takes courage to be the junior test
> >> engineer
> >>>>> who
> >>>>>> reverts yesterday's commit from the founder and CTO, just to give an
> >>>>>> example... Yet this is the most efficient way to keep the build
> >> green.
> >>>>> And
> >>>>>> it turns out it's not that much additional work for the original
> >>>>> author to
> >>>>>> fix the issue and then re-merge the patch.
> >>>>>>
> >>>>>> Merge-train: For any project with more than 1 commit per day, it
> will
> >>>>>> inevitably happen that you need to rebase a PR before merging, and
> >>>>> even if
> >>>>>> it passed all tests before, after rebase it won't. In the downstream
> >>>>>> Cassandra fork previously mentioned, we have tried to enable a
> github
> >>>>> rule
> >>>>>> which requires a) that all tests passed before merging, and b) the
> PR
> >>>>> is
> >>>>>> against the head of the branch merged into, and c) the tests were
> run
> >>>>> after
> >>>>>> such rebase. Unfortunately this leads to infinite loops where a
> large
> >>>>> PR
> >>>>>> may never be able to commit because it has to be rebased again and
> >>>>> again
> >>>>>> when smaller PRs can merge faster. The solution to this problem is
> to
> >>>>> have
> >>>>>> an automated process for the rebase-test-merge cycle. Gitlab
> supports
> >>>>> such
> >>>>>> a feature and calls it merge-trean:
> >>>>>> https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
> >>>>>>
> >>>>>> The merge-train can be considered an advanced feature and we can
> >>>>> return to
> >>>>>> it later. The other points should be sufficient to keep a reasonably
> >>>>> green
> >>>>>> trunk.
> >>>>>>
> >>>>>> I guess the major area where we can improve daily test coverage
> would
> >>>>> be
> >>>>>> performance tests. To that end we recently open sourced a nice tool
> >>>>> that
> >>>>>> can algorithmically detects performance regressions in a timeseries
> >>>>> history
> >>>>>> of benchmark results: https://github.com/datastax-labs/hunter Just
> >>>>> like
> >>>>>> with correctness testing it's my experience that catching
> regressions
> >>>>> the
> >>>>>> day they happened is much better than trying to do it at beta or rc
> >>>>> time.
> >>>>>> Piotr also blogged about Hunter when it was released:
> >>>>>>
> >>>>>>
> >>
> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
> >>>>>> henrik
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <
> >> jmckenzie@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> We as a project have gone back and forth on the topic of quality
> >> and
> >>>>> the
> >>>>>>> notion of a releasable trunk for quite a few years. If people are
> >>>>>>> interested, I'd like to rekindle this discussion a bit and see if
> >>>>> we're
> >>>>>>> happy with where we are as a project or if we think there's steps
> >> we
> >>>>>> should
> >>>>>>> take to change the quality bar going forward. The following
> >> questions
> >>>>>> have
> >>>>>>> been rattling around for me for awhile:
> >>>>>>>
> >>>>>>> 1. How do we define what "releasable trunk" means? All reviewed by
> >> M
> >>>>>>> committers? Passing N% of tests? Passing all tests plus some other
> >>>>>> metrics
> >>>>>>> (manual testing, raising the number of reviewers, test coverage,
> >>>>> usage in
> >>>>>>> dev or QA environments, etc)? Something else entirely?
> >>>>>>>
> >>>>>>> 2. With a definition settled upon in #1, what steps, if any, do we
> >>>>> need
> >>>>>> to
> >>>>>>> take to get from where we are to having *and keeping* that
> >> releasable
> >>>>>>> trunk? Anything to codify there?
> >>>>>>>
> >>>>>>> 3. What are the benefits of having a releasable trunk as defined
> >>>>> here?
> >>>>>> What
> >>>>>>> are the costs? Is it worth pursuing? What are the alternatives (for
> >>>>>>> instance: a freeze before a release + stabilization focus by the
> >>>>>> community
> >>>>>>> i.e. 4.0 push or the tock in tick-tock)?
> >>>>>>>
> >>>>>>> Given the large volumes of work coming down the pike with CEP's,
> >> this
> >>>>>> seems
> >>>>>>> like a good time to at least check in on this topic as a community.
> >>>>>>>
> >>>>>>> Full disclosure: running face-first into 60+ failing tests on trunk
> >>>>> when
> >>>>>>> going through the commit process for denylisting this week brought
> >>>>> this
> >>>>>>> topic back up for me (reminds me of when I went to merge CDC back
> >> in
> >>>>> 3.6
> >>>>>>> and those test failures riled me up... I sense a pattern ;))
> >>>>>>>
> >>>>>>> Looking forward to hearing what people think.
> >>>>>>>
> >>>>>>> ~Josh
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> Henrik Ingo
> >>>>>>
> >>>>>> +358 40 569 7354 <358405697354>
> >>>>>>
> >>>>>> [image: Visit us online.] <https://www.datastax.com/>  [image:
> Visit
> >>>>> us on
> >>>>>> Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
> >>>>> YouTube.]
> >>>>>> <
> >>>>>>
> >>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=
> >>>>>>   [image: Visit my LinkedIn profile.] <
> >>>>> https://www.linkedin.com/in/heingo/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Berenguer Blasi <be...@gmail.com>.

+1. I would add a 'post-commit' step: check the jenkins CI run for your
merge and see if sthg broke regardless.

On 6/12/21 23:51, Ekaterina Dimitrova wrote:
> Hi Josh,
> All good questions, thank you for raising this topic.
> To the best of my knowledge, we don't have those documented but I will put
> notes on what tribal knowledge I know about and I personally follow :-)
>
>  Pre-commit test suites: * Which JDK's?  - both are officially supported so
> both.
>
> * When to include all python tests or do JVM only (if ever)? - if I test
> only a test fix probably
>
>  * When to run upgrade tests? - I haven't heard any definitive guideline.
> Preferably every time but if there is a tiny change I guess it can be
> decided for them to be skipped. I would advocate to do more than less.
>
> * What to do if a test is also failing on the reference root (i.e. trunk,
> cassandra-4.0, etc)? - check if a ticket exists already, if not - open one
> at least, even if I don't plan to work on it at least to acknowledge
> the issue and add any info I know about. If we know who broke it, ping the
> author to check it.
>
> * What to do if a test fails intermittently? - Open a ticket. During
> investigation - Use the CircleCI jobs for running tests in a loop to find
> when it fails or to verify the test was fixed. (This is already in my draft
> CircleCI document, not yet released as it was pending on the documents
> migration.)
>
> Hope that helps.
>
> ~Ekaterina
>
> On Mon, 6 Dec 2021 at 17:20, Joshua McKenzie <jm...@apache.org> wrote:
>
>> As I work through the scripting on this, I don't know if we've documented
>> or clarified the following (don't see it here:
>> https://cassandra.apache.org/_/development/testing.html):
>>
>> Pre-commit test suites:
>> * Which JDK's?
>> * When to include all python tests or do JVM only (if ever)?
>> * When to run upgrade tests?
>> * What to do if a test is also failing on the reference root (i.e. trunk,
>> cassandra-4.0, etc)?
>> * What to do if a test fails intermittently?
>>
>> I'll also update the above linked documentation once we hammer this out and
>> try and bake it into the scripting flow as much as possible as well. Goal
>> is to make it easy to do the right thing and hard to do the wrong thing,
>> and to have these things written down rather than have it be tribal
>> knowledge that varies a lot across the project.
>>
>> ~Josh
>>
>> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jm...@apache.org>
>> wrote:
>>
>>> After some offline collab, here's where this thread has landed on a
>>> proposal to change our processes to incrementally improve our processes
>> and
>>> hopefully stabilize the state of CI longer term:
>>>
>>> Link:
>>>
>> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
>>> Hopefully the mail server doesn't butcher formatting; if it does, hit up
>>> the gdoc and leave comments there as should be open to all.
>>>
>>> Phase 1:
>>> Document merge criteria; update circle jobs to have a simple pre-merge
>> job
>>> (one for each JDK profile)
>>>      * Donate, document, and formalize usage of circleci-enable.py in ASF
>>> repo (need new commit scripts / dev tooling section?)
>>>         * rewrites circle config jobs to simple clear flow
>>>         * ability to toggle between "run on push" or "click to run"
>>>         * Variety of other functionality; see below
>>> Document (site, help, README.md) and automate via scripting the
>>> relationship / dev / release process around:
>>>     * In-jvm dtest
>>>     * dtest
>>>     * ccm
>>> Integrate and document usage of script to build CI repeat test runs
>>>     * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
>>>     * Document “Do this if you add or change tests”
>>> Introduce “Build Lead” role
>>>     * Weekly rotation; volunteer
>>>     * 1: Make sure JIRAs exist for test failures
>>>     * 2: Attempt to triage new test failures to root cause and assign out
>>>     * 3: Coordinate and drive to green board on trunk
>>> Change and automate process for *trunk only* patches:
>>>     * Block on green CI (from merge criteria in CI above; potentially
>>> stricter definition of "clean" for trunk CI)
>>>     * Consider using github PR’s to merge (TODO: determine how to handle
>>> circle + CHANGES; see below)
>>> Automate process for *multi-branch* merges
>>>     * Harden / contribute / document dcapwell script (has one which does
>>> the following):
>>>         * rebases your branch to the latest (if on 3.0 then rebase
>> against
>>> cassandra-3.0)
>>>         * check compiles
>>>         * removes all changes to .circle (can opt-out for circleci
>> patches)
>>>         * removes all changes to CHANGES.txt and leverages JIRA for the
>>> content
>>>         * checks code still compiles
>>>         * changes circle to run ci
>>>         * push to a temp branch in git and run CI (circle + Jenkins)
>>>             * when all branches are clean (waiting step is manual)
>>>             * TODO: Define “clean”
>>>                 * No new test failures compared to reference?
>>>                 * Or no test failures at all?
>>>             * merge changes into the actual branches
>>>             * merge up changes; rewriting diff
>>>             * push --atomic
>>>
>>> Transition to phase 2 when:
>>>     * All items from phase 1 are complete
>>>     * Test boards for supported branches are green
>>>
>>> Phase 2:
>>> * Add Harry to recurring run against trunk
>>> * Add Harry to release pipeline
>>> * Suite of perf tests against trunk recurring
>>>
>>>
>>>
>>> On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jm...@apache.org>
>>> wrote:
>>>
>>>> Sorry for not catching that Benedict, you're absolutely right. So long
>> as
>>>> we're using merge commits between branches I don't think auto-merging
>> via
>>>> train or blocking on green CI are options via the tooling, and
>> multi-branch
>>>> reverts will be something we should document very clearly should we even
>>>> choose to go that route (a lot of room to make mistakes there).
>>>>
>>>> It may not be a huge issue as we can expect the more disruptive changes
>>>> (i.e. potentially destabilizing) to be happening on trunk only, so
>> perhaps
>>>> we can get away with slightly different workflows or policies based on
>>>> whether you're doing a multi-branch bugfix or a feature on trunk. Bears
>>>> thinking more deeply about.
>>>>
>>>> I'd also be game for revisiting our merge strategy. I don't see much
>>>> difference in labor between merging between branches vs. preparing
>> separate
>>>> patches for an individual developer, however I'm sure there's
>> maintenance
>>>> and integration implications there I'm not thinking of right now.
>>>>
>>>> On Wed, Nov 17, 2021 at 12:03 PM benedict@apache.org <
>> benedict@apache.org>
>>>> wrote:
>>>>
>>>>> I raised this before, but to highlight it again: how do these
>> approaches
>>>>> interface with our merge strategy?
>>>>>
>>>>> We might have to rebase several dependent merge commits and want to
>>>>> merge them atomically. So far as I know these tools don’t work
>>>>> fantastically in this scenario, but if I’m wrong that’s fantastic. If
>> not,
>>>>> given how important these things are, should we consider revisiting our
>>>>> merge strategy?
>>>>>
>>>>> From: Joshua McKenzie <jm...@apache.org>
>>>>> Date: Wednesday, 17 November 2021 at 16:39
>>>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>>>>> Subject: Re: [DISCUSS] Releasable trunk and quality
>>>>> Thanks for the feedback and insight Henrik; it's valuable to hear how
>>>>> other
>>>>> large complex infra projects have tackled this problem set.
>>>>>
>>>>> To attempt to summarize, what I got from your email:
>>>>> [Phase one]
>>>>> 1) Build Barons: rotation where there's always someone active tying
>>>>> failures to changes and adding those failures to our ticketing system
>>>>> 2) Best effort process of "test breakers" being assigned tickets to fix
>>>>> the
>>>>> things their work broke
>>>>> 3) Moving to a culture where we regularly revert commits that break
>> tests
>>>>> 4) Running tests before we merge changes
>>>>>
>>>>> [Phase two]
>>>>> 1) Suite of performance tests on a regular cadence against trunk
>>>>> (w/hunter
>>>>> or otherwise)
>>>>> 2) Integration w/ github merge-train pipelines
>>>>>
>>>>> That cover the highlights? I agree with these points as useful places
>> for
>>>>> us to invest in as a project and I'll work on getting this into a gdoc
>>>>> for
>>>>> us to align on and discuss further this week.
>>>>>
>>>>> ~Josh
>>>>>
>>>>>
>>>>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <henrik.ingo@datastax.com
>>>>> wrote:
>>>>>
>>>>>> There's an old joke: How many people read Slashdot? The answer is 5.
>>>>> The
>>>>>> rest of us just write comments without reading... In that spirit, I
>>>>> wanted
>>>>>> to share some thoughts in response to your question, even if I know
>>>>> some of
>>>>>> it will have been said in this thread already :-)
>>>>>>
>>>>>> Basically, I just want to share what has worked well in my past
>>>>> projects...
>>>>>> Visualization: Now that we have Butler running, we can already see a
>>>>>> decline in failing tests for 4.0 and trunk! This shows that
>>>>> contributors
>>>>>> want to do the right thing, we just need the right tools and
>> processes
>>>>> to
>>>>>> achieve success.
>>>>>>
>>>>>> Process: I'm confident we will soon be back to seeing 0 failures for
>>>>> 4.0
>>>>>> and trunk. However, keeping that state requires constant vigilance!
>> At
>>>>>> Mongodb we had a role called Build Baron (aka Build Cop, etc...).
>> This
>>>>> is a
>>>>>> weekly rotating role where the person who is the Build Baron will at
>>>>> least
>>>>>> once per day go through all of the Butler dashboards to catch new
>>>>>> regressions early. We have used the same process also at Datastax to
>>>>> guard
>>>>>> our downstream fork of Cassandra 4.0. It's the responsibility of the
>>>>> Build
>>>>>> Baron to
>>>>>>  - file a jira ticket for new failures
>>>>>>  - determine which commit is responsible for introducing the
>>>>> regression.
>>>>>> Sometimes this is obvious, sometimes this requires "bisecting" by
>>>>> running
>>>>>> more builds e.g. between two nightly builds.
>>>>>>  - assign the jira ticket to the author of the commit that introduced
>>>>> the
>>>>>> regression
>>>>>>
>>>>>> Given that Cassandra is a community that includes part time and
>>>>> volunteer
>>>>>> developers, we may want to try some variation of this, such as
>> pairing
>>>>> 2
>>>>>> build barons each week?
>>>>>>
>>>>>> Reverting: A policy that the commit causing the regression is
>>>>> automatically
>>>>>> reverted can be scary. It takes courage to be the junior test
>> engineer
>>>>> who
>>>>>> reverts yesterday's commit from the founder and CTO, just to give an
>>>>>> example... Yet this is the most efficient way to keep the build
>> green.
>>>>> And
>>>>>> it turns out it's not that much additional work for the original
>>>>> author to
>>>>>> fix the issue and then re-merge the patch.
>>>>>>
>>>>>> Merge-train: For any project with more than 1 commit per day, it will
>>>>>> inevitably happen that you need to rebase a PR before merging, and
>>>>> even if
>>>>>> it passed all tests before, after rebase it won't. In the downstream
>>>>>> Cassandra fork previously mentioned, we have tried to enable a github
>>>>> rule
>>>>>> which requires a) that all tests passed before merging, and b) the PR
>>>>> is
>>>>>> against the head of the branch merged into, and c) the tests were run
>>>>> after
>>>>>> such rebase. Unfortunately this leads to infinite loops where a large
>>>>> PR
>>>>>> may never be able to commit because it has to be rebased again and
>>>>> again
>>>>>> when smaller PRs can merge faster. The solution to this problem is to
>>>>> have
>>>>>> an automated process for the rebase-test-merge cycle. Gitlab supports
>>>>> such
>>>>>> a feature and calls it merge-trean:
>>>>>> https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
>>>>>>
>>>>>> The merge-train can be considered an advanced feature and we can
>>>>> return to
>>>>>> it later. The other points should be sufficient to keep a reasonably
>>>>> green
>>>>>> trunk.
>>>>>>
>>>>>> I guess the major area where we can improve daily test coverage would
>>>>> be
>>>>>> performance tests. To that end we recently open sourced a nice tool
>>>>> that
>>>>>> can algorithmically detects performance regressions in a timeseries
>>>>> history
>>>>>> of benchmark results: https://github.com/datastax-labs/hunter Just
>>>>> like
>>>>>> with correctness testing it's my experience that catching regressions
>>>>> the
>>>>>> day they happened is much better than trying to do it at beta or rc
>>>>> time.
>>>>>> Piotr also blogged about Hunter when it was released:
>>>>>>
>>>>>>
>> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
>>>>>> henrik
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <
>> jmckenzie@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> We as a project have gone back and forth on the topic of quality
>> and
>>>>> the
>>>>>>> notion of a releasable trunk for quite a few years. If people are
>>>>>>> interested, I'd like to rekindle this discussion a bit and see if
>>>>> we're
>>>>>>> happy with where we are as a project or if we think there's steps
>> we
>>>>>> should
>>>>>>> take to change the quality bar going forward. The following
>> questions
>>>>>> have
>>>>>>> been rattling around for me for awhile:
>>>>>>>
>>>>>>> 1. How do we define what "releasable trunk" means? All reviewed by
>> M
>>>>>>> committers? Passing N% of tests? Passing all tests plus some other
>>>>>> metrics
>>>>>>> (manual testing, raising the number of reviewers, test coverage,
>>>>> usage in
>>>>>>> dev or QA environments, etc)? Something else entirely?
>>>>>>>
>>>>>>> 2. With a definition settled upon in #1, what steps, if any, do we
>>>>> need
>>>>>> to
>>>>>>> take to get from where we are to having *and keeping* that
>> releasable
>>>>>>> trunk? Anything to codify there?
>>>>>>>
>>>>>>> 3. What are the benefits of having a releasable trunk as defined
>>>>> here?
>>>>>> What
>>>>>>> are the costs? Is it worth pursuing? What are the alternatives (for
>>>>>>> instance: a freeze before a release + stabilization focus by the
>>>>>> community
>>>>>>> i.e. 4.0 push or the tock in tick-tock)?
>>>>>>>
>>>>>>> Given the large volumes of work coming down the pike with CEP's,
>> this
>>>>>> seems
>>>>>>> like a good time to at least check in on this topic as a community.
>>>>>>>
>>>>>>> Full disclosure: running face-first into 60+ failing tests on trunk
>>>>> when
>>>>>>> going through the commit process for denylisting this week brought
>>>>> this
>>>>>>> topic back up for me (reminds me of when I went to merge CDC back
>> in
>>>>> 3.6
>>>>>>> and those test failures riled me up... I sense a pattern ;))
>>>>>>>
>>>>>>> Looking forward to hearing what people think.
>>>>>>>
>>>>>>> ~Josh
>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Henrik Ingo
>>>>>>
>>>>>> +358 40 569 7354 <358405697354>
>>>>>>
>>>>>> [image: Visit us online.] <https://www.datastax.com/>  [image: Visit
>>>>> us on
>>>>>> Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
>>>>> YouTube.]
>>>>>> <
>>>>>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=
>>>>>>   [image: Visit my LinkedIn profile.] <
>>>>> https://www.linkedin.com/in/heingo/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by Ekaterina Dimitrova <e....@gmail.com>.

Hi Josh,
All good questions, thank you for raising this topic.
To the best of my knowledge, we don't have those documented but I will put
notes on what tribal knowledge I know about and I personally follow :-)

 Pre-commit test suites: * Which JDK's?  - both are officially supported so
both.

* When to include all python tests or do JVM only (if ever)? - if I test
only a test fix probably

 * When to run upgrade tests? - I haven't heard any definitive guideline.
Preferably every time but if there is a tiny change I guess it can be
decided for them to be skipped. I would advocate to do more than less.

* What to do if a test is also failing on the reference root (i.e. trunk,
cassandra-4.0, etc)? - check if a ticket exists already, if not - open one
at least, even if I don't plan to work on it at least to acknowledge
the issue and add any info I know about. If we know who broke it, ping the
author to check it.

* What to do if a test fails intermittently? - Open a ticket. During
investigation - Use the CircleCI jobs for running tests in a loop to find
when it fails or to verify the test was fixed. (This is already in my draft
CircleCI document, not yet released as it was pending on the documents
migration.)

Hope that helps.

~Ekaterina

On Mon, 6 Dec 2021 at 17:20, Joshua McKenzie <jm...@apache.org> wrote:

> As I work through the scripting on this, I don't know if we've documented
> or clarified the following (don't see it here:
> https://cassandra.apache.org/_/development/testing.html):
>
> Pre-commit test suites:
> * Which JDK's?
> * When to include all python tests or do JVM only (if ever)?
> * When to run upgrade tests?
> * What to do if a test is also failing on the reference root (i.e. trunk,
> cassandra-4.0, etc)?
> * What to do if a test fails intermittently?
>
> I'll also update the above linked documentation once we hammer this out and
> try and bake it into the scripting flow as much as possible as well. Goal
> is to make it easy to do the right thing and hard to do the wrong thing,
> and to have these things written down rather than have it be tribal
> knowledge that varies a lot across the project.
>
> ~Josh
>
> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jm...@apache.org>
> wrote:
>
> > After some offline collab, here's where this thread has landed on a
> > proposal to change our processes to incrementally improve our processes
> and
> > hopefully stabilize the state of CI longer term:
> >
> > Link:
> >
> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
> >
> > Hopefully the mail server doesn't butcher formatting; if it does, hit up
> > the gdoc and leave comments there as should be open to all.
> >
> > Phase 1:
> > Document merge criteria; update circle jobs to have a simple pre-merge
> job
> > (one for each JDK profile)
> >      * Donate, document, and formalize usage of circleci-enable.py in ASF
> > repo (need new commit scripts / dev tooling section?)
> >         * rewrites circle config jobs to simple clear flow
> >         * ability to toggle between "run on push" or "click to run"
> >         * Variety of other functionality; see below
> > Document (site, help, README.md) and automate via scripting the
> > relationship / dev / release process around:
> >     * In-jvm dtest
> >     * dtest
> >     * ccm
> > Integrate and document usage of script to build CI repeat test runs
> >     * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
> >     * Document “Do this if you add or change tests”
> > Introduce “Build Lead” role
> >     * Weekly rotation; volunteer
> >     * 1: Make sure JIRAs exist for test failures
> >     * 2: Attempt to triage new test failures to root cause and assign out
> >     * 3: Coordinate and drive to green board on trunk
> > Change and automate process for *trunk only* patches:
> >     * Block on green CI (from merge criteria in CI above; potentially
> > stricter definition of "clean" for trunk CI)
> >     * Consider using github PR’s to merge (TODO: determine how to handle
> > circle + CHANGES; see below)
> > Automate process for *multi-branch* merges
> >     * Harden / contribute / document dcapwell script (has one which does
> > the following):
> >         * rebases your branch to the latest (if on 3.0 then rebase
> against
> > cassandra-3.0)
> >         * check compiles
> >         * removes all changes to .circle (can opt-out for circleci
> patches)
> >         * removes all changes to CHANGES.txt and leverages JIRA for the
> > content
> >         * checks code still compiles
> >         * changes circle to run ci
> >         * push to a temp branch in git and run CI (circle + Jenkins)
> >             * when all branches are clean (waiting step is manual)
> >             * TODO: Define “clean”
> >                 * No new test failures compared to reference?
> >                 * Or no test failures at all?
> >             * merge changes into the actual branches
> >             * merge up changes; rewriting diff
> >             * push --atomic
> >
> > Transition to phase 2 when:
> >     * All items from phase 1 are complete
> >     * Test boards for supported branches are green
> >
> > Phase 2:
> > * Add Harry to recurring run against trunk
> > * Add Harry to release pipeline
> > * Suite of perf tests against trunk recurring
> >
> >
> >
> > On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jm...@apache.org>
> > wrote:
> >
> >> Sorry for not catching that Benedict, you're absolutely right. So long
> as
> >> we're using merge commits between branches I don't think auto-merging
> via
> >> train or blocking on green CI are options via the tooling, and
> multi-branch
> >> reverts will be something we should document very clearly should we even
> >> choose to go that route (a lot of room to make mistakes there).
> >>
> >> It may not be a huge issue as we can expect the more disruptive changes
> >> (i.e. potentially destabilizing) to be happening on trunk only, so
> perhaps
> >> we can get away with slightly different workflows or policies based on
> >> whether you're doing a multi-branch bugfix or a feature on trunk. Bears
> >> thinking more deeply about.
> >>
> >> I'd also be game for revisiting our merge strategy. I don't see much
> >> difference in labor between merging between branches vs. preparing
> separate
> >> patches for an individual developer, however I'm sure there's
> maintenance
> >> and integration implications there I'm not thinking of right now.
> >>
> >> On Wed, Nov 17, 2021 at 12:03 PM benedict@apache.org <
> benedict@apache.org>
> >> wrote:
> >>
> >>> I raised this before, but to highlight it again: how do these
> approaches
> >>> interface with our merge strategy?
> >>>
> >>> We might have to rebase several dependent merge commits and want to
> >>> merge them atomically. So far as I know these tools don’t work
> >>> fantastically in this scenario, but if I’m wrong that’s fantastic. If
> not,
> >>> given how important these things are, should we consider revisiting our
> >>> merge strategy?
> >>>
> >>> From: Joshua McKenzie <jm...@apache.org>
> >>> Date: Wednesday, 17 November 2021 at 16:39
> >>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> >>> Subject: Re: [DISCUSS] Releasable trunk and quality
> >>> Thanks for the feedback and insight Henrik; it's valuable to hear how
> >>> other
> >>> large complex infra projects have tackled this problem set.
> >>>
> >>> To attempt to summarize, what I got from your email:
> >>> [Phase one]
> >>> 1) Build Barons: rotation where there's always someone active tying
> >>> failures to changes and adding those failures to our ticketing system
> >>> 2) Best effort process of "test breakers" being assigned tickets to fix
> >>> the
> >>> things their work broke
> >>> 3) Moving to a culture where we regularly revert commits that break
> tests
> >>> 4) Running tests before we merge changes
> >>>
> >>> [Phase two]
> >>> 1) Suite of performance tests on a regular cadence against trunk
> >>> (w/hunter
> >>> or otherwise)
> >>> 2) Integration w/ github merge-train pipelines
> >>>
> >>> That cover the highlights? I agree with these points as useful places
> for
> >>> us to invest in as a project and I'll work on getting this into a gdoc
> >>> for
> >>> us to align on and discuss further this week.
> >>>
> >>> ~Josh
> >>>
> >>>
> >>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <henrik.ingo@datastax.com
> >
> >>> wrote:
> >>>
> >>> > There's an old joke: How many people read Slashdot? The answer is 5.
> >>> The
> >>> > rest of us just write comments without reading... In that spirit, I
> >>> wanted
> >>> > to share some thoughts in response to your question, even if I know
> >>> some of
> >>> > it will have been said in this thread already :-)
> >>> >
> >>> > Basically, I just want to share what has worked well in my past
> >>> projects...
> >>> >
> >>> > Visualization: Now that we have Butler running, we can already see a
> >>> > decline in failing tests for 4.0 and trunk! This shows that
> >>> contributors
> >>> > want to do the right thing, we just need the right tools and
> processes
> >>> to
> >>> > achieve success.
> >>> >
> >>> > Process: I'm confident we will soon be back to seeing 0 failures for
> >>> 4.0
> >>> > and trunk. However, keeping that state requires constant vigilance!
> At
> >>> > Mongodb we had a role called Build Baron (aka Build Cop, etc...).
> This
> >>> is a
> >>> > weekly rotating role where the person who is the Build Baron will at
> >>> least
> >>> > once per day go through all of the Butler dashboards to catch new
> >>> > regressions early. We have used the same process also at Datastax to
> >>> guard
> >>> > our downstream fork of Cassandra 4.0. It's the responsibility of the
> >>> Build
> >>> > Baron to
> >>> >  - file a jira ticket for new failures
> >>> >  - determine which commit is responsible for introducing the
> >>> regression.
> >>> > Sometimes this is obvious, sometimes this requires "bisecting" by
> >>> running
> >>> > more builds e.g. between two nightly builds.
> >>> >  - assign the jira ticket to the author of the commit that introduced
> >>> the
> >>> > regression
> >>> >
> >>> > Given that Cassandra is a community that includes part time and
> >>> volunteer
> >>> > developers, we may want to try some variation of this, such as
> pairing
> >>> 2
> >>> > build barons each week?
> >>> >
> >>> > Reverting: A policy that the commit causing the regression is
> >>> automatically
> >>> > reverted can be scary. It takes courage to be the junior test
> engineer
> >>> who
> >>> > reverts yesterday's commit from the founder and CTO, just to give an
> >>> > example... Yet this is the most efficient way to keep the build
> green.
> >>> And
> >>> > it turns out it's not that much additional work for the original
> >>> author to
> >>> > fix the issue and then re-merge the patch.
> >>> >
> >>> > Merge-train: For any project with more than 1 commit per day, it will
> >>> > inevitably happen that you need to rebase a PR before merging, and
> >>> even if
> >>> > it passed all tests before, after rebase it won't. In the downstream
> >>> > Cassandra fork previously mentioned, we have tried to enable a github
> >>> rule
> >>> > which requires a) that all tests passed before merging, and b) the PR
> >>> is
> >>> > against the head of the branch merged into, and c) the tests were run
> >>> after
> >>> > such rebase. Unfortunately this leads to infinite loops where a large
> >>> PR
> >>> > may never be able to commit because it has to be rebased again and
> >>> again
> >>> > when smaller PRs can merge faster. The solution to this problem is to
> >>> have
> >>> > an automated process for the rebase-test-merge cycle. Gitlab supports
> >>> such
> >>> > a feature and calls it merge-trean:
> >>> > https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
> >>> >
> >>> > The merge-train can be considered an advanced feature and we can
> >>> return to
> >>> > it later. The other points should be sufficient to keep a reasonably
> >>> green
> >>> > trunk.
> >>> >
> >>> > I guess the major area where we can improve daily test coverage would
> >>> be
> >>> > performance tests. To that end we recently open sourced a nice tool
> >>> that
> >>> > can algorithmically detects performance regressions in a timeseries
> >>> history
> >>> > of benchmark results: https://github.com/datastax-labs/hunter Just
> >>> like
> >>> > with correctness testing it's my experience that catching regressions
> >>> the
> >>> > day they happened is much better than trying to do it at beta or rc
> >>> time.
> >>> >
> >>> > Piotr also blogged about Hunter when it was released:
> >>> >
> >>> >
> >>>
> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
> >>> >
> >>> > henrik
> >>> >
> >>> >
> >>> >
> >>> > On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <
> jmckenzie@apache.org>
> >>> > wrote:
> >>> >
> >>> > > We as a project have gone back and forth on the topic of quality
> and
> >>> the
> >>> > > notion of a releasable trunk for quite a few years. If people are
> >>> > > interested, I'd like to rekindle this discussion a bit and see if
> >>> we're
> >>> > > happy with where we are as a project or if we think there's steps
> we
> >>> > should
> >>> > > take to change the quality bar going forward. The following
> questions
> >>> > have
> >>> > > been rattling around for me for awhile:
> >>> > >
> >>> > > 1. How do we define what "releasable trunk" means? All reviewed by
> M
> >>> > > committers? Passing N% of tests? Passing all tests plus some other
> >>> > metrics
> >>> > > (manual testing, raising the number of reviewers, test coverage,
> >>> usage in
> >>> > > dev or QA environments, etc)? Something else entirely?
> >>> > >
> >>> > > 2. With a definition settled upon in #1, what steps, if any, do we
> >>> need
> >>> > to
> >>> > > take to get from where we are to having *and keeping* that
> releasable
> >>> > > trunk? Anything to codify there?
> >>> > >
> >>> > > 3. What are the benefits of having a releasable trunk as defined
> >>> here?
> >>> > What
> >>> > > are the costs? Is it worth pursuing? What are the alternatives (for
> >>> > > instance: a freeze before a release + stabilization focus by the
> >>> > community
> >>> > > i.e. 4.0 push or the tock in tick-tock)?
> >>> > >
> >>> > > Given the large volumes of work coming down the pike with CEP's,
> this
> >>> > seems
> >>> > > like a good time to at least check in on this topic as a community.
> >>> > >
> >>> > > Full disclosure: running face-first into 60+ failing tests on trunk
> >>> when
> >>> > > going through the commit process for denylisting this week brought
> >>> this
> >>> > > topic back up for me (reminds me of when I went to merge CDC back
> in
> >>> 3.6
> >>> > > and those test failures riled me up... I sense a pattern ;))
> >>> > >
> >>> > > Looking forward to hearing what people think.
> >>> > >
> >>> > > ~Josh
> >>> > >
> >>> >
> >>> >
> >>> > --
> >>> >
> >>> > Henrik Ingo
> >>> >
> >>> > +358 40 569 7354 <358405697354>
> >>> >
> >>> > [image: Visit us online.] <https://www.datastax.com/>  [image: Visit
> >>> us on
> >>> > Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
> >>> YouTube.]
> >>> > <
> >>> >
> >>>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=
> >>> > >
> >>> >   [image: Visit my LinkedIn profile.] <
> >>> https://www.linkedin.com/in/heingo/
> >>> > >
> >>> >
> >>>
> >>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

As I work through the scripting on this, I don't know if we've documented
or clarified the following (don't see it here:
https://cassandra.apache.org/_/development/testing.html):

Pre-commit test suites:
* Which JDK's?
* When to include all python tests or do JVM only (if ever)?
* When to run upgrade tests?
* What to do if a test is also failing on the reference root (i.e. trunk,
cassandra-4.0, etc)?
* What to do if a test fails intermittently?

I'll also update the above linked documentation once we hammer this out and
try and bake it into the scripting flow as much as possible as well. Goal
is to make it easy to do the right thing and hard to do the wrong thing,
and to have these things written down rather than have it be tribal
knowledge that varies a lot across the project.

~Josh

On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jm...@apache.org> wrote:

> After some offline collab, here's where this thread has landed on a
> proposal to change our processes to incrementally improve our processes and
> hopefully stabilize the state of CI longer term:
>
> Link:
> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
>
> Hopefully the mail server doesn't butcher formatting; if it does, hit up
> the gdoc and leave comments there as should be open to all.
>
> Phase 1:
> Document merge criteria; update circle jobs to have a simple pre-merge job
> (one for each JDK profile)
>      * Donate, document, and formalize usage of circleci-enable.py in ASF
> repo (need new commit scripts / dev tooling section?)
>         * rewrites circle config jobs to simple clear flow
>         * ability to toggle between "run on push" or "click to run"
>         * Variety of other functionality; see below
> Document (site, help, README.md) and automate via scripting the
> relationship / dev / release process around:
>     * In-jvm dtest
>     * dtest
>     * ccm
> Integrate and document usage of script to build CI repeat test runs
>     * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
>     * Document “Do this if you add or change tests”
> Introduce “Build Lead” role
>     * Weekly rotation; volunteer
>     * 1: Make sure JIRAs exist for test failures
>     * 2: Attempt to triage new test failures to root cause and assign out
>     * 3: Coordinate and drive to green board on trunk
> Change and automate process for *trunk only* patches:
>     * Block on green CI (from merge criteria in CI above; potentially
> stricter definition of "clean" for trunk CI)
>     * Consider using github PR’s to merge (TODO: determine how to handle
> circle + CHANGES; see below)
> Automate process for *multi-branch* merges
>     * Harden / contribute / document dcapwell script (has one which does
> the following):
>         * rebases your branch to the latest (if on 3.0 then rebase against
> cassandra-3.0)
>         * check compiles
>         * removes all changes to .circle (can opt-out for circleci patches)
>         * removes all changes to CHANGES.txt and leverages JIRA for the
> content
>         * checks code still compiles
>         * changes circle to run ci
>         * push to a temp branch in git and run CI (circle + Jenkins)
>             * when all branches are clean (waiting step is manual)
>             * TODO: Define “clean”
>                 * No new test failures compared to reference?
>                 * Or no test failures at all?
>             * merge changes into the actual branches
>             * merge up changes; rewriting diff
>             * push --atomic
>
> Transition to phase 2 when:
>     * All items from phase 1 are complete
>     * Test boards for supported branches are green
>
> Phase 2:
> * Add Harry to recurring run against trunk
> * Add Harry to release pipeline
> * Suite of perf tests against trunk recurring
>
>
>
> On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jm...@apache.org>
> wrote:
>
>> Sorry for not catching that Benedict, you're absolutely right. So long as
>> we're using merge commits between branches I don't think auto-merging via
>> train or blocking on green CI are options via the tooling, and multi-branch
>> reverts will be something we should document very clearly should we even
>> choose to go that route (a lot of room to make mistakes there).
>>
>> It may not be a huge issue as we can expect the more disruptive changes
>> (i.e. potentially destabilizing) to be happening on trunk only, so perhaps
>> we can get away with slightly different workflows or policies based on
>> whether you're doing a multi-branch bugfix or a feature on trunk. Bears
>> thinking more deeply about.
>>
>> I'd also be game for revisiting our merge strategy. I don't see much
>> difference in labor between merging between branches vs. preparing separate
>> patches for an individual developer, however I'm sure there's maintenance
>> and integration implications there I'm not thinking of right now.
>>
>> On Wed, Nov 17, 2021 at 12:03 PM benedict@apache.org <be...@apache.org>
>> wrote:
>>
>>> I raised this before, but to highlight it again: how do these approaches
>>> interface with our merge strategy?
>>>
>>> We might have to rebase several dependent merge commits and want to
>>> merge them atomically. So far as I know these tools don’t work
>>> fantastically in this scenario, but if I’m wrong that’s fantastic. If not,
>>> given how important these things are, should we consider revisiting our
>>> merge strategy?
>>>
>>> From: Joshua McKenzie <jm...@apache.org>
>>> Date: Wednesday, 17 November 2021 at 16:39
>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] Releasable trunk and quality
>>> Thanks for the feedback and insight Henrik; it's valuable to hear how
>>> other
>>> large complex infra projects have tackled this problem set.
>>>
>>> To attempt to summarize, what I got from your email:
>>> [Phase one]
>>> 1) Build Barons: rotation where there's always someone active tying
>>> failures to changes and adding those failures to our ticketing system
>>> 2) Best effort process of "test breakers" being assigned tickets to fix
>>> the
>>> things their work broke
>>> 3) Moving to a culture where we regularly revert commits that break tests
>>> 4) Running tests before we merge changes
>>>
>>> [Phase two]
>>> 1) Suite of performance tests on a regular cadence against trunk
>>> (w/hunter
>>> or otherwise)
>>> 2) Integration w/ github merge-train pipelines
>>>
>>> That cover the highlights? I agree with these points as useful places for
>>> us to invest in as a project and I'll work on getting this into a gdoc
>>> for
>>> us to align on and discuss further this week.
>>>
>>> ~Josh
>>>
>>>
>>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <he...@datastax.com>
>>> wrote:
>>>
>>> > There's an old joke: How many people read Slashdot? The answer is 5.
>>> The
>>> > rest of us just write comments without reading... In that spirit, I
>>> wanted
>>> > to share some thoughts in response to your question, even if I know
>>> some of
>>> > it will have been said in this thread already :-)
>>> >
>>> > Basically, I just want to share what has worked well in my past
>>> projects...
>>> >
>>> > Visualization: Now that we have Butler running, we can already see a
>>> > decline in failing tests for 4.0 and trunk! This shows that
>>> contributors
>>> > want to do the right thing, we just need the right tools and processes
>>> to
>>> > achieve success.
>>> >
>>> > Process: I'm confident we will soon be back to seeing 0 failures for
>>> 4.0
>>> > and trunk. However, keeping that state requires constant vigilance! At
>>> > Mongodb we had a role called Build Baron (aka Build Cop, etc...). This
>>> is a
>>> > weekly rotating role where the person who is the Build Baron will at
>>> least
>>> > once per day go through all of the Butler dashboards to catch new
>>> > regressions early. We have used the same process also at Datastax to
>>> guard
>>> > our downstream fork of Cassandra 4.0. It's the responsibility of the
>>> Build
>>> > Baron to
>>> >  - file a jira ticket for new failures
>>> >  - determine which commit is responsible for introducing the
>>> regression.
>>> > Sometimes this is obvious, sometimes this requires "bisecting" by
>>> running
>>> > more builds e.g. between two nightly builds.
>>> >  - assign the jira ticket to the author of the commit that introduced
>>> the
>>> > regression
>>> >
>>> > Given that Cassandra is a community that includes part time and
>>> volunteer
>>> > developers, we may want to try some variation of this, such as pairing
>>> 2
>>> > build barons each week?
>>> >
>>> > Reverting: A policy that the commit causing the regression is
>>> automatically
>>> > reverted can be scary. It takes courage to be the junior test engineer
>>> who
>>> > reverts yesterday's commit from the founder and CTO, just to give an
>>> > example... Yet this is the most efficient way to keep the build green.
>>> And
>>> > it turns out it's not that much additional work for the original
>>> author to
>>> > fix the issue and then re-merge the patch.
>>> >
>>> > Merge-train: For any project with more than 1 commit per day, it will
>>> > inevitably happen that you need to rebase a PR before merging, and
>>> even if
>>> > it passed all tests before, after rebase it won't. In the downstream
>>> > Cassandra fork previously mentioned, we have tried to enable a github
>>> rule
>>> > which requires a) that all tests passed before merging, and b) the PR
>>> is
>>> > against the head of the branch merged into, and c) the tests were run
>>> after
>>> > such rebase. Unfortunately this leads to infinite loops where a large
>>> PR
>>> > may never be able to commit because it has to be rebased again and
>>> again
>>> > when smaller PRs can merge faster. The solution to this problem is to
>>> have
>>> > an automated process for the rebase-test-merge cycle. Gitlab supports
>>> such
>>> > a feature and calls it merge-trean:
>>> > https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
>>> >
>>> > The merge-train can be considered an advanced feature and we can
>>> return to
>>> > it later. The other points should be sufficient to keep a reasonably
>>> green
>>> > trunk.
>>> >
>>> > I guess the major area where we can improve daily test coverage would
>>> be
>>> > performance tests. To that end we recently open sourced a nice tool
>>> that
>>> > can algorithmically detects performance regressions in a timeseries
>>> history
>>> > of benchmark results: https://github.com/datastax-labs/hunter Just
>>> like
>>> > with correctness testing it's my experience that catching regressions
>>> the
>>> > day they happened is much better than trying to do it at beta or rc
>>> time.
>>> >
>>> > Piotr also blogged about Hunter when it was released:
>>> >
>>> >
>>> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
>>> >
>>> > henrik
>>> >
>>> >
>>> >
>>> > On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jm...@apache.org>
>>> > wrote:
>>> >
>>> > > We as a project have gone back and forth on the topic of quality and
>>> the
>>> > > notion of a releasable trunk for quite a few years. If people are
>>> > > interested, I'd like to rekindle this discussion a bit and see if
>>> we're
>>> > > happy with where we are as a project or if we think there's steps we
>>> > should
>>> > > take to change the quality bar going forward. The following questions
>>> > have
>>> > > been rattling around for me for awhile:
>>> > >
>>> > > 1. How do we define what "releasable trunk" means? All reviewed by M
>>> > > committers? Passing N% of tests? Passing all tests plus some other
>>> > metrics
>>> > > (manual testing, raising the number of reviewers, test coverage,
>>> usage in
>>> > > dev or QA environments, etc)? Something else entirely?
>>> > >
>>> > > 2. With a definition settled upon in #1, what steps, if any, do we
>>> need
>>> > to
>>> > > take to get from where we are to having *and keeping* that releasable
>>> > > trunk? Anything to codify there?
>>> > >
>>> > > 3. What are the benefits of having a releasable trunk as defined
>>> here?
>>> > What
>>> > > are the costs? Is it worth pursuing? What are the alternatives (for
>>> > > instance: a freeze before a release + stabilization focus by the
>>> > community
>>> > > i.e. 4.0 push or the tock in tick-tock)?
>>> > >
>>> > > Given the large volumes of work coming down the pike with CEP's, this
>>> > seems
>>> > > like a good time to at least check in on this topic as a community.
>>> > >
>>> > > Full disclosure: running face-first into 60+ failing tests on trunk
>>> when
>>> > > going through the commit process for denylisting this week brought
>>> this
>>> > > topic back up for me (reminds me of when I went to merge CDC back in
>>> 3.6
>>> > > and those test failures riled me up... I sense a pattern ;))
>>> > >
>>> > > Looking forward to hearing what people think.
>>> > >
>>> > > ~Josh
>>> > >
>>> >
>>> >
>>> > --
>>> >
>>> > Henrik Ingo
>>> >
>>> > +358 40 569 7354 <358405697354>
>>> >
>>> > [image: Visit us online.] <https://www.datastax.com/>  [image: Visit
>>> us on
>>> > Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
>>> YouTube.]
>>> > <
>>> >
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=
>>> > >
>>> >   [image: Visit my LinkedIn profile.] <
>>> https://www.linkedin.com/in/heingo/
>>> > >
>>> >
>>>
>>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

After some offline collab, here's where this thread has landed on a
proposal to change our processes to incrementally improve our processes and
hopefully stabilize the state of CI longer term:

Link:
https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4

Hopefully the mail server doesn't butcher formatting; if it does, hit up
the gdoc and leave comments there as should be open to all.

Phase 1:
Document merge criteria; update circle jobs to have a simple pre-merge job
(one for each JDK profile)
     * Donate, document, and formalize usage of circleci-enable.py in ASF
repo (need new commit scripts / dev tooling section?)
        * rewrites circle config jobs to simple clear flow
        * ability to toggle between "run on push" or "click to run"
        * Variety of other functionality; see below
Document (site, help, README.md) and automate via scripting the
relationship / dev / release process around:
    * In-jvm dtest
    * dtest
    * ccm
Integrate and document usage of script to build CI repeat test runs
    * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
    * Document “Do this if you add or change tests”
Introduce “Build Lead” role
    * Weekly rotation; volunteer
    * 1: Make sure JIRAs exist for test failures
    * 2: Attempt to triage new test failures to root cause and assign out
    * 3: Coordinate and drive to green board on trunk
Change and automate process for *trunk only* patches:
    * Block on green CI (from merge criteria in CI above; potentially
stricter definition of "clean" for trunk CI)
    * Consider using github PR’s to merge (TODO: determine how to handle
circle + CHANGES; see below)
Automate process for *multi-branch* merges
    * Harden / contribute / document dcapwell script (has one which does
the following):
        * rebases your branch to the latest (if on 3.0 then rebase against
cassandra-3.0)
        * check compiles
        * removes all changes to .circle (can opt-out for circleci patches)
        * removes all changes to CHANGES.txt and leverages JIRA for the
content
        * checks code still compiles
        * changes circle to run ci
        * push to a temp branch in git and run CI (circle + Jenkins)
            * when all branches are clean (waiting step is manual)
            * TODO: Define “clean”
                * No new test failures compared to reference?
                * Or no test failures at all?
            * merge changes into the actual branches
            * merge up changes; rewriting diff
            * push --atomic

Transition to phase 2 when:
    * All items from phase 1 are complete
    * Test boards for supported branches are green

Phase 2:
* Add Harry to recurring run against trunk
* Add Harry to release pipeline
* Suite of perf tests against trunk recurring



On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jm...@apache.org>
wrote:

> Sorry for not catching that Benedict, you're absolutely right. So long as
> we're using merge commits between branches I don't think auto-merging via
> train or blocking on green CI are options via the tooling, and multi-branch
> reverts will be something we should document very clearly should we even
> choose to go that route (a lot of room to make mistakes there).
>
> It may not be a huge issue as we can expect the more disruptive changes
> (i.e. potentially destabilizing) to be happening on trunk only, so perhaps
> we can get away with slightly different workflows or policies based on
> whether you're doing a multi-branch bugfix or a feature on trunk. Bears
> thinking more deeply about.
>
> I'd also be game for revisiting our merge strategy. I don't see much
> difference in labor between merging between branches vs. preparing separate
> patches for an individual developer, however I'm sure there's maintenance
> and integration implications there I'm not thinking of right now.
>
> On Wed, Nov 17, 2021 at 12:03 PM benedict@apache.org <be...@apache.org>
> wrote:
>
>> I raised this before, but to highlight it again: how do these approaches
>> interface with our merge strategy?
>>
>> We might have to rebase several dependent merge commits and want to merge
>> them atomically. So far as I know these tools don’t work fantastically in
>> this scenario, but if I’m wrong that’s fantastic. If not, given how
>> important these things are, should we consider revisiting our merge
>> strategy?
>>
>> From: Joshua McKenzie <jm...@apache.org>
>> Date: Wednesday, 17 November 2021 at 16:39
>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>> Subject: Re: [DISCUSS] Releasable trunk and quality
>> Thanks for the feedback and insight Henrik; it's valuable to hear how
>> other
>> large complex infra projects have tackled this problem set.
>>
>> To attempt to summarize, what I got from your email:
>> [Phase one]
>> 1) Build Barons: rotation where there's always someone active tying
>> failures to changes and adding those failures to our ticketing system
>> 2) Best effort process of "test breakers" being assigned tickets to fix
>> the
>> things their work broke
>> 3) Moving to a culture where we regularly revert commits that break tests
>> 4) Running tests before we merge changes
>>
>> [Phase two]
>> 1) Suite of performance tests on a regular cadence against trunk (w/hunter
>> or otherwise)
>> 2) Integration w/ github merge-train pipelines
>>
>> That cover the highlights? I agree with these points as useful places for
>> us to invest in as a project and I'll work on getting this into a gdoc for
>> us to align on and discuss further this week.
>>
>> ~Josh
>>
>>
>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <he...@datastax.com>
>> wrote:
>>
>> > There's an old joke: How many people read Slashdot? The answer is 5. The
>> > rest of us just write comments without reading... In that spirit, I
>> wanted
>> > to share some thoughts in response to your question, even if I know
>> some of
>> > it will have been said in this thread already :-)
>> >
>> > Basically, I just want to share what has worked well in my past
>> projects...
>> >
>> > Visualization: Now that we have Butler running, we can already see a
>> > decline in failing tests for 4.0 and trunk! This shows that contributors
>> > want to do the right thing, we just need the right tools and processes
>> to
>> > achieve success.
>> >
>> > Process: I'm confident we will soon be back to seeing 0 failures for 4.0
>> > and trunk. However, keeping that state requires constant vigilance! At
>> > Mongodb we had a role called Build Baron (aka Build Cop, etc...). This
>> is a
>> > weekly rotating role where the person who is the Build Baron will at
>> least
>> > once per day go through all of the Butler dashboards to catch new
>> > regressions early. We have used the same process also at Datastax to
>> guard
>> > our downstream fork of Cassandra 4.0. It's the responsibility of the
>> Build
>> > Baron to
>> >  - file a jira ticket for new failures
>> >  - determine which commit is responsible for introducing the regression.
>> > Sometimes this is obvious, sometimes this requires "bisecting" by
>> running
>> > more builds e.g. between two nightly builds.
>> >  - assign the jira ticket to the author of the commit that introduced
>> the
>> > regression
>> >
>> > Given that Cassandra is a community that includes part time and
>> volunteer
>> > developers, we may want to try some variation of this, such as pairing 2
>> > build barons each week?
>> >
>> > Reverting: A policy that the commit causing the regression is
>> automatically
>> > reverted can be scary. It takes courage to be the junior test engineer
>> who
>> > reverts yesterday's commit from the founder and CTO, just to give an
>> > example... Yet this is the most efficient way to keep the build green.
>> And
>> > it turns out it's not that much additional work for the original author
>> to
>> > fix the issue and then re-merge the patch.
>> >
>> > Merge-train: For any project with more than 1 commit per day, it will
>> > inevitably happen that you need to rebase a PR before merging, and even
>> if
>> > it passed all tests before, after rebase it won't. In the downstream
>> > Cassandra fork previously mentioned, we have tried to enable a github
>> rule
>> > which requires a) that all tests passed before merging, and b) the PR is
>> > against the head of the branch merged into, and c) the tests were run
>> after
>> > such rebase. Unfortunately this leads to infinite loops where a large PR
>> > may never be able to commit because it has to be rebased again and again
>> > when smaller PRs can merge faster. The solution to this problem is to
>> have
>> > an automated process for the rebase-test-merge cycle. Gitlab supports
>> such
>> > a feature and calls it merge-trean:
>> > https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
>> >
>> > The merge-train can be considered an advanced feature and we can return
>> to
>> > it later. The other points should be sufficient to keep a reasonably
>> green
>> > trunk.
>> >
>> > I guess the major area where we can improve daily test coverage would be
>> > performance tests. To that end we recently open sourced a nice tool that
>> > can algorithmically detects performance regressions in a timeseries
>> history
>> > of benchmark results: https://github.com/datastax-labs/hunter Just like
>> > with correctness testing it's my experience that catching regressions
>> the
>> > day they happened is much better than trying to do it at beta or rc
>> time.
>> >
>> > Piotr also blogged about Hunter when it was released:
>> >
>> >
>> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
>> >
>> > henrik
>> >
>> >
>> >
>> > On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jm...@apache.org>
>> > wrote:
>> >
>> > > We as a project have gone back and forth on the topic of quality and
>> the
>> > > notion of a releasable trunk for quite a few years. If people are
>> > > interested, I'd like to rekindle this discussion a bit and see if
>> we're
>> > > happy with where we are as a project or if we think there's steps we
>> > should
>> > > take to change the quality bar going forward. The following questions
>> > have
>> > > been rattling around for me for awhile:
>> > >
>> > > 1. How do we define what "releasable trunk" means? All reviewed by M
>> > > committers? Passing N% of tests? Passing all tests plus some other
>> > metrics
>> > > (manual testing, raising the number of reviewers, test coverage,
>> usage in
>> > > dev or QA environments, etc)? Something else entirely?
>> > >
>> > > 2. With a definition settled upon in #1, what steps, if any, do we
>> need
>> > to
>> > > take to get from where we are to having *and keeping* that releasable
>> > > trunk? Anything to codify there?
>> > >
>> > > 3. What are the benefits of having a releasable trunk as defined here?
>> > What
>> > > are the costs? Is it worth pursuing? What are the alternatives (for
>> > > instance: a freeze before a release + stabilization focus by the
>> > community
>> > > i.e. 4.0 push or the tock in tick-tock)?
>> > >
>> > > Given the large volumes of work coming down the pike with CEP's, this
>> > seems
>> > > like a good time to at least check in on this topic as a community.
>> > >
>> > > Full disclosure: running face-first into 60+ failing tests on trunk
>> when
>> > > going through the commit process for denylisting this week brought
>> this
>> > > topic back up for me (reminds me of when I went to merge CDC back in
>> 3.6
>> > > and those test failures riled me up... I sense a pattern ;))
>> > >
>> > > Looking forward to hearing what people think.
>> > >
>> > > ~Josh
>> > >
>> >
>> >
>> > --
>> >
>> > Henrik Ingo
>> >
>> > +358 40 569 7354 <358405697354>
>> >
>> > [image: Visit us online.] <https://www.datastax.com/>  [image: Visit
>> us on
>> > Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
>> YouTube.]
>> > <
>> >
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=
>> > >
>> >   [image: Visit my LinkedIn profile.] <
>> https://www.linkedin.com/in/heingo/
>> > >
>> >
>>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

Sorry for not catching that Benedict, you're absolutely right. So long as
we're using merge commits between branches I don't think auto-merging via
train or blocking on green CI are options via the tooling, and multi-branch
reverts will be something we should document very clearly should we even
choose to go that route (a lot of room to make mistakes there).

It may not be a huge issue as we can expect the more disruptive changes
(i.e. potentially destabilizing) to be happening on trunk only, so perhaps
we can get away with slightly different workflows or policies based on
whether you're doing a multi-branch bugfix or a feature on trunk. Bears
thinking more deeply about.

I'd also be game for revisiting our merge strategy. I don't see much
difference in labor between merging between branches vs. preparing separate
patches for an individual developer, however I'm sure there's maintenance
and integration implications there I'm not thinking of right now.

On Wed, Nov 17, 2021 at 12:03 PM benedict@apache.org <be...@apache.org>
wrote:

> I raised this before, but to highlight it again: how do these approaches
> interface with our merge strategy?
>
> We might have to rebase several dependent merge commits and want to merge
> them atomically. So far as I know these tools don’t work fantastically in
> this scenario, but if I’m wrong that’s fantastic. If not, given how
> important these things are, should we consider revisiting our merge
> strategy?
>
> From: Joshua McKenzie <jm...@apache.org>
> Date: Wednesday, 17 November 2021 at 16:39
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] Releasable trunk and quality
> Thanks for the feedback and insight Henrik; it's valuable to hear how other
> large complex infra projects have tackled this problem set.
>
> To attempt to summarize, what I got from your email:
> [Phase one]
> 1) Build Barons: rotation where there's always someone active tying
> failures to changes and adding those failures to our ticketing system
> 2) Best effort process of "test breakers" being assigned tickets to fix the
> things their work broke
> 3) Moving to a culture where we regularly revert commits that break tests
> 4) Running tests before we merge changes
>
> [Phase two]
> 1) Suite of performance tests on a regular cadence against trunk (w/hunter
> or otherwise)
> 2) Integration w/ github merge-train pipelines
>
> That cover the highlights? I agree with these points as useful places for
> us to invest in as a project and I'll work on getting this into a gdoc for
> us to align on and discuss further this week.
>
> ~Josh
>
>
> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <he...@datastax.com>
> wrote:
>
> > There's an old joke: How many people read Slashdot? The answer is 5. The
> > rest of us just write comments without reading... In that spirit, I
> wanted
> > to share some thoughts in response to your question, even if I know some
> of
> > it will have been said in this thread already :-)
> >
> > Basically, I just want to share what has worked well in my past
> projects...
> >
> > Visualization: Now that we have Butler running, we can already see a
> > decline in failing tests for 4.0 and trunk! This shows that contributors
> > want to do the right thing, we just need the right tools and processes to
> > achieve success.
> >
> > Process: I'm confident we will soon be back to seeing 0 failures for 4.0
> > and trunk. However, keeping that state requires constant vigilance! At
> > Mongodb we had a role called Build Baron (aka Build Cop, etc...). This
> is a
> > weekly rotating role where the person who is the Build Baron will at
> least
> > once per day go through all of the Butler dashboards to catch new
> > regressions early. We have used the same process also at Datastax to
> guard
> > our downstream fork of Cassandra 4.0. It's the responsibility of the
> Build
> > Baron to
> >  - file a jira ticket for new failures
> >  - determine which commit is responsible for introducing the regression.
> > Sometimes this is obvious, sometimes this requires "bisecting" by running
> > more builds e.g. between two nightly builds.
> >  - assign the jira ticket to the author of the commit that introduced the
> > regression
> >
> > Given that Cassandra is a community that includes part time and volunteer
> > developers, we may want to try some variation of this, such as pairing 2
> > build barons each week?
> >
> > Reverting: A policy that the commit causing the regression is
> automatically
> > reverted can be scary. It takes courage to be the junior test engineer
> who
> > reverts yesterday's commit from the founder and CTO, just to give an
> > example... Yet this is the most efficient way to keep the build green.
> And
> > it turns out it's not that much additional work for the original author
> to
> > fix the issue and then re-merge the patch.
> >
> > Merge-train: For any project with more than 1 commit per day, it will
> > inevitably happen that you need to rebase a PR before merging, and even
> if
> > it passed all tests before, after rebase it won't. In the downstream
> > Cassandra fork previously mentioned, we have tried to enable a github
> rule
> > which requires a) that all tests passed before merging, and b) the PR is
> > against the head of the branch merged into, and c) the tests were run
> after
> > such rebase. Unfortunately this leads to infinite loops where a large PR
> > may never be able to commit because it has to be rebased again and again
> > when smaller PRs can merge faster. The solution to this problem is to
> have
> > an automated process for the rebase-test-merge cycle. Gitlab supports
> such
> > a feature and calls it merge-trean:
> > https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
> >
> > The merge-train can be considered an advanced feature and we can return
> to
> > it later. The other points should be sufficient to keep a reasonably
> green
> > trunk.
> >
> > I guess the major area where we can improve daily test coverage would be
> > performance tests. To that end we recently open sourced a nice tool that
> > can algorithmically detects performance regressions in a timeseries
> history
> > of benchmark results: https://github.com/datastax-labs/hunter Just like
> > with correctness testing it's my experience that catching regressions the
> > day they happened is much better than trying to do it at beta or rc time.
> >
> > Piotr also blogged about Hunter when it was released:
> >
> >
> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
> >
> > henrik
> >
> >
> >
> > On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jm...@apache.org>
> > wrote:
> >
> > > We as a project have gone back and forth on the topic of quality and
> the
> > > notion of a releasable trunk for quite a few years. If people are
> > > interested, I'd like to rekindle this discussion a bit and see if we're
> > > happy with where we are as a project or if we think there's steps we
> > should
> > > take to change the quality bar going forward. The following questions
> > have
> > > been rattling around for me for awhile:
> > >
> > > 1. How do we define what "releasable trunk" means? All reviewed by M
> > > committers? Passing N% of tests? Passing all tests plus some other
> > metrics
> > > (manual testing, raising the number of reviewers, test coverage, usage
> in
> > > dev or QA environments, etc)? Something else entirely?
> > >
> > > 2. With a definition settled upon in #1, what steps, if any, do we need
> > to
> > > take to get from where we are to having *and keeping* that releasable
> > > trunk? Anything to codify there?
> > >
> > > 3. What are the benefits of having a releasable trunk as defined here?
> > What
> > > are the costs? Is it worth pursuing? What are the alternatives (for
> > > instance: a freeze before a release + stabilization focus by the
> > community
> > > i.e. 4.0 push or the tock in tick-tock)?
> > >
> > > Given the large volumes of work coming down the pike with CEP's, this
> > seems
> > > like a good time to at least check in on this topic as a community.
> > >
> > > Full disclosure: running face-first into 60+ failing tests on trunk
> when
> > > going through the commit process for denylisting this week brought this
> > > topic back up for me (reminds me of when I went to merge CDC back in
> 3.6
> > > and those test failures riled me up... I sense a pattern ;))
> > >
> > > Looking forward to hearing what people think.
> > >
> > > ~Josh
> > >
> >
> >
> > --
> >
> > Henrik Ingo
> >
> > +358 40 569 7354 <358405697354>
> >
> > [image: Visit us online.] <https://www.datastax.com/>  [image: Visit us
> on
> > Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on
> YouTube.]
> > <
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=
> > >
> >   [image: Visit my LinkedIn profile.] <
> https://www.linkedin.com/in/heingo/
> > >
> >
>

Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.

I raised this before, but to highlight it again: how do these approaches interface with our merge strategy?

We might have to rebase several dependent merge commits and want to merge them atomically. So far as I know these tools don’t work fantastically in this scenario, but if I’m wrong that’s fantastic. If not, given how important these things are, should we consider revisiting our merge strategy?

From: Joshua McKenzie <jm...@apache.org>
Date: Wednesday, 17 November 2021 at 16:39
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Releasable trunk and quality
Thanks for the feedback and insight Henrik; it's valuable to hear how other
large complex infra projects have tackled this problem set.

To attempt to summarize, what I got from your email:
[Phase one]
1) Build Barons: rotation where there's always someone active tying
failures to changes and adding those failures to our ticketing system
2) Best effort process of "test breakers" being assigned tickets to fix the
things their work broke
3) Moving to a culture where we regularly revert commits that break tests
4) Running tests before we merge changes

[Phase two]
1) Suite of performance tests on a regular cadence against trunk (w/hunter
or otherwise)
2) Integration w/ github merge-train pipelines

That cover the highlights? I agree with these points as useful places for
us to invest in as a project and I'll work on getting this into a gdoc for
us to align on and discuss further this week.

~Josh


On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <he...@datastax.com>
wrote:

> There's an old joke: How many people read Slashdot? The answer is 5. The
> rest of us just write comments without reading... In that spirit, I wanted
> to share some thoughts in response to your question, even if I know some of
> it will have been said in this thread already :-)
>
> Basically, I just want to share what has worked well in my past projects...
>
> Visualization: Now that we have Butler running, we can already see a
> decline in failing tests for 4.0 and trunk! This shows that contributors
> want to do the right thing, we just need the right tools and processes to
> achieve success.
>
> Process: I'm confident we will soon be back to seeing 0 failures for 4.0
> and trunk. However, keeping that state requires constant vigilance! At
> Mongodb we had a role called Build Baron (aka Build Cop, etc...). This is a
> weekly rotating role where the person who is the Build Baron will at least
> once per day go through all of the Butler dashboards to catch new
> regressions early. We have used the same process also at Datastax to guard
> our downstream fork of Cassandra 4.0. It's the responsibility of the Build
> Baron to
>  - file a jira ticket for new failures
>  - determine which commit is responsible for introducing the regression.
> Sometimes this is obvious, sometimes this requires "bisecting" by running
> more builds e.g. between two nightly builds.
>  - assign the jira ticket to the author of the commit that introduced the
> regression
>
> Given that Cassandra is a community that includes part time and volunteer
> developers, we may want to try some variation of this, such as pairing 2
> build barons each week?
>
> Reverting: A policy that the commit causing the regression is automatically
> reverted can be scary. It takes courage to be the junior test engineer who
> reverts yesterday's commit from the founder and CTO, just to give an
> example... Yet this is the most efficient way to keep the build green. And
> it turns out it's not that much additional work for the original author to
> fix the issue and then re-merge the patch.
>
> Merge-train: For any project with more than 1 commit per day, it will
> inevitably happen that you need to rebase a PR before merging, and even if
> it passed all tests before, after rebase it won't. In the downstream
> Cassandra fork previously mentioned, we have tried to enable a github rule
> which requires a) that all tests passed before merging, and b) the PR is
> against the head of the branch merged into, and c) the tests were run after
> such rebase. Unfortunately this leads to infinite loops where a large PR
> may never be able to commit because it has to be rebased again and again
> when smaller PRs can merge faster. The solution to this problem is to have
> an automated process for the rebase-test-merge cycle. Gitlab supports such
> a feature and calls it merge-trean:
> https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
>
> The merge-train can be considered an advanced feature and we can return to
> it later. The other points should be sufficient to keep a reasonably green
> trunk.
>
> I guess the major area where we can improve daily test coverage would be
> performance tests. To that end we recently open sourced a nice tool that
> can algorithmically detects performance regressions in a timeseries history
> of benchmark results: https://github.com/datastax-labs/hunter Just like
> with correctness testing it's my experience that catching regressions the
> day they happened is much better than trying to do it at beta or rc time.
>
> Piotr also blogged about Hunter when it was released:
>
> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
>
> henrik
>
>
>
> On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jm...@apache.org>
> wrote:
>
> > We as a project have gone back and forth on the topic of quality and the
> > notion of a releasable trunk for quite a few years. If people are
> > interested, I'd like to rekindle this discussion a bit and see if we're
> > happy with where we are as a project or if we think there's steps we
> should
> > take to change the quality bar going forward. The following questions
> have
> > been rattling around for me for awhile:
> >
> > 1. How do we define what "releasable trunk" means? All reviewed by M
> > committers? Passing N% of tests? Passing all tests plus some other
> metrics
> > (manual testing, raising the number of reviewers, test coverage, usage in
> > dev or QA environments, etc)? Something else entirely?
> >
> > 2. With a definition settled upon in #1, what steps, if any, do we need
> to
> > take to get from where we are to having *and keeping* that releasable
> > trunk? Anything to codify there?
> >
> > 3. What are the benefits of having a releasable trunk as defined here?
> What
> > are the costs? Is it worth pursuing? What are the alternatives (for
> > instance: a freeze before a release + stabilization focus by the
> community
> > i.e. 4.0 push or the tock in tick-tock)?
> >
> > Given the large volumes of work coming down the pike with CEP's, this
> seems
> > like a good time to at least check in on this topic as a community.
> >
> > Full disclosure: running face-first into 60+ failing tests on trunk when
> > going through the commit process for denylisting this week brought this
> > topic back up for me (reminds me of when I went to merge CDC back in 3.6
> > and those test failures riled me up... I sense a pattern ;))
> >
> > Looking forward to hearing what people think.
> >
> > ~Josh
> >
>
>
> --
>
> Henrik Ingo
>
> +358 40 569 7354 <358405697354>
>
> [image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
> Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
> <
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=
> >
>   [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/
> >
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

Thanks for the feedback and insight Henrik; it's valuable to hear how other
large complex infra projects have tackled this problem set.

To attempt to summarize, what I got from your email:
[Phase one]
1) Build Barons: rotation where there's always someone active tying
failures to changes and adding those failures to our ticketing system
2) Best effort process of "test breakers" being assigned tickets to fix the
things their work broke
3) Moving to a culture where we regularly revert commits that break tests
4) Running tests before we merge changes

[Phase two]
1) Suite of performance tests on a regular cadence against trunk (w/hunter
or otherwise)
2) Integration w/ github merge-train pipelines

That cover the highlights? I agree with these points as useful places for
us to invest in as a project and I'll work on getting this into a gdoc for
us to align on and discuss further this week.

~Josh


On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <he...@datastax.com>
wrote:

> There's an old joke: How many people read Slashdot? The answer is 5. The
> rest of us just write comments without reading... In that spirit, I wanted
> to share some thoughts in response to your question, even if I know some of
> it will have been said in this thread already :-)
>
> Basically, I just want to share what has worked well in my past projects...
>
> Visualization: Now that we have Butler running, we can already see a
> decline in failing tests for 4.0 and trunk! This shows that contributors
> want to do the right thing, we just need the right tools and processes to
> achieve success.
>
> Process: I'm confident we will soon be back to seeing 0 failures for 4.0
> and trunk. However, keeping that state requires constant vigilance! At
> Mongodb we had a role called Build Baron (aka Build Cop, etc...). This is a
> weekly rotating role where the person who is the Build Baron will at least
> once per day go through all of the Butler dashboards to catch new
> regressions early. We have used the same process also at Datastax to guard
> our downstream fork of Cassandra 4.0. It's the responsibility of the Build
> Baron to
>  - file a jira ticket for new failures
>  - determine which commit is responsible for introducing the regression.
> Sometimes this is obvious, sometimes this requires "bisecting" by running
> more builds e.g. between two nightly builds.
>  - assign the jira ticket to the author of the commit that introduced the
> regression
>
> Given that Cassandra is a community that includes part time and volunteer
> developers, we may want to try some variation of this, such as pairing 2
> build barons each week?
>
> Reverting: A policy that the commit causing the regression is automatically
> reverted can be scary. It takes courage to be the junior test engineer who
> reverts yesterday's commit from the founder and CTO, just to give an
> example... Yet this is the most efficient way to keep the build green. And
> it turns out it's not that much additional work for the original author to
> fix the issue and then re-merge the patch.
>
> Merge-train: For any project with more than 1 commit per day, it will
> inevitably happen that you need to rebase a PR before merging, and even if
> it passed all tests before, after rebase it won't. In the downstream
> Cassandra fork previously mentioned, we have tried to enable a github rule
> which requires a) that all tests passed before merging, and b) the PR is
> against the head of the branch merged into, and c) the tests were run after
> such rebase. Unfortunately this leads to infinite loops where a large PR
> may never be able to commit because it has to be rebased again and again
> when smaller PRs can merge faster. The solution to this problem is to have
> an automated process for the rebase-test-merge cycle. Gitlab supports such
> a feature and calls it merge-trean:
> https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
>
> The merge-train can be considered an advanced feature and we can return to
> it later. The other points should be sufficient to keep a reasonably green
> trunk.
>
> I guess the major area where we can improve daily test coverage would be
> performance tests. To that end we recently open sourced a nice tool that
> can algorithmically detects performance regressions in a timeseries history
> of benchmark results: https://github.com/datastax-labs/hunter Just like
> with correctness testing it's my experience that catching regressions the
> day they happened is much better than trying to do it at beta or rc time.
>
> Piotr also blogged about Hunter when it was released:
>
> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
>
> henrik
>
>
>
> On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jm...@apache.org>
> wrote:
>
> > We as a project have gone back and forth on the topic of quality and the
> > notion of a releasable trunk for quite a few years. If people are
> > interested, I'd like to rekindle this discussion a bit and see if we're
> > happy with where we are as a project or if we think there's steps we
> should
> > take to change the quality bar going forward. The following questions
> have
> > been rattling around for me for awhile:
> >
> > 1. How do we define what "releasable trunk" means? All reviewed by M
> > committers? Passing N% of tests? Passing all tests plus some other
> metrics
> > (manual testing, raising the number of reviewers, test coverage, usage in
> > dev or QA environments, etc)? Something else entirely?
> >
> > 2. With a definition settled upon in #1, what steps, if any, do we need
> to
> > take to get from where we are to having *and keeping* that releasable
> > trunk? Anything to codify there?
> >
> > 3. What are the benefits of having a releasable trunk as defined here?
> What
> > are the costs? Is it worth pursuing? What are the alternatives (for
> > instance: a freeze before a release + stabilization focus by the
> community
> > i.e. 4.0 push or the tock in tick-tock)?
> >
> > Given the large volumes of work coming down the pike with CEP's, this
> seems
> > like a good time to at least check in on this topic as a community.
> >
> > Full disclosure: running face-first into 60+ failing tests on trunk when
> > going through the commit process for denylisting this week brought this
> > topic back up for me (reminds me of when I went to merge CDC back in 3.6
> > and those test failures riled me up... I sense a pattern ;))
> >
> > Looking forward to hearing what people think.
> >
> > ~Josh
> >
>
>
> --
>
> Henrik Ingo
>
> +358 40 569 7354 <358405697354>
>
> [image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
> Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
> <
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=
> >
>   [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/
> >
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Henrik Ingo <he...@datastax.com>.

There's an old joke: How many people read Slashdot? The answer is 5. The
rest of us just write comments without reading... In that spirit, I wanted
to share some thoughts in response to your question, even if I know some of
it will have been said in this thread already :-)

Basically, I just want to share what has worked well in my past projects...

Visualization: Now that we have Butler running, we can already see a
decline in failing tests for 4.0 and trunk! This shows that contributors
want to do the right thing, we just need the right tools and processes to
achieve success.

Process: I'm confident we will soon be back to seeing 0 failures for 4.0
and trunk. However, keeping that state requires constant vigilance! At
Mongodb we had a role called Build Baron (aka Build Cop, etc...). This is a
weekly rotating role where the person who is the Build Baron will at least
once per day go through all of the Butler dashboards to catch new
regressions early. We have used the same process also at Datastax to guard
our downstream fork of Cassandra 4.0. It's the responsibility of the Build
Baron to
 - file a jira ticket for new failures
 - determine which commit is responsible for introducing the regression.
Sometimes this is obvious, sometimes this requires "bisecting" by running
more builds e.g. between two nightly builds.
 - assign the jira ticket to the author of the commit that introduced the
regression

Given that Cassandra is a community that includes part time and volunteer
developers, we may want to try some variation of this, such as pairing 2
build barons each week?

Reverting: A policy that the commit causing the regression is automatically
reverted can be scary. It takes courage to be the junior test engineer who
reverts yesterday's commit from the founder and CTO, just to give an
example... Yet this is the most efficient way to keep the build green. And
it turns out it's not that much additional work for the original author to
fix the issue and then re-merge the patch.

Merge-train: For any project with more than 1 commit per day, it will
inevitably happen that you need to rebase a PR before merging, and even if
it passed all tests before, after rebase it won't. In the downstream
Cassandra fork previously mentioned, we have tried to enable a github rule
which requires a) that all tests passed before merging, and b) the PR is
against the head of the branch merged into, and c) the tests were run after
such rebase. Unfortunately this leads to infinite loops where a large PR
may never be able to commit because it has to be rebased again and again
when smaller PRs can merge faster. The solution to this problem is to have
an automated process for the rebase-test-merge cycle. Gitlab supports such
a feature and calls it merge-trean:
https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html

The merge-train can be considered an advanced feature and we can return to
it later. The other points should be sufficient to keep a reasonably green
trunk.

I guess the major area where we can improve daily test coverage would be
performance tests. To that end we recently open sourced a nice tool that
can algorithmically detects performance regressions in a timeseries history
of benchmark results: https://github.com/datastax-labs/hunter Just like
with correctness testing it's my experience that catching regressions the
day they happened is much better than trying to do it at beta or rc time.

Piotr also blogged about Hunter when it was released:
https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4

henrik

On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jm...@apache.org>
wrote:

> We as a project have gone back and forth on the topic of quality and the
> notion of a releasable trunk for quite a few years. If people are
> interested, I'd like to rekindle this discussion a bit and see if we're
> happy with where we are as a project or if we think there's steps we should
> take to change the quality bar going forward. The following questions have
> been rattling around for me for awhile:
>
> 1. How do we define what "releasable trunk" means? All reviewed by M
> committers? Passing N% of tests? Passing all tests plus some other metrics
> (manual testing, raising the number of reviewers, test coverage, usage in
> dev or QA environments, etc)? Something else entirely?
>
> 2. With a definition settled upon in #1, what steps, if any, do we need to
> take to get from where we are to having *and keeping* that releasable
> trunk? Anything to codify there?
>
> 3. What are the benefits of having a releasable trunk as defined here? What
> are the costs? Is it worth pursuing? What are the alternatives (for
> instance: a freeze before a release + stabilization focus by the community
> i.e. 4.0 push or the tock in tick-tock)?
>
> Given the large volumes of work coming down the pike with CEP's, this seems
> like a good time to at least check in on this topic as a community.
>
> Full disclosure: running face-first into 60+ failing tests on trunk when
> going through the commit process for denylisting this week brought this
> topic back up for me (reminds me of when I went to merge CDC back in 3.6
> and those test failures riled me up... I sense a pattern ;))
>
> Looking forward to hearing what people think.
>
> ~Josh
>

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

>
> It'd be great to
> expand this, but it's been somewhat difficult to do, since last time a
> bootstrap test was attempted, it has immediately uncovered enough issues to
> keep us busy fixing them for quite some time. Maybe it's about time to try
> that again.

I'm going to go with a "yes please". :)

On Wed, Nov 3, 2021 at 9:27 AM Oleksandr Petrov <ol...@gmail.com>
wrote:

> I'll merge 16262 and the Harry blog-post that accompanies it shortly.
> Having 16262 merged will significantly reduce the amount of resistance one
> has to overcome in order to write a fuzz test. But this, of course, only
> covers short/small/unit-test-like tests.
>
> For longer running tests, I guess for now we will have to rely on folks
> (hopefully) running long fuzz tests and reporting issues. But eventually
> it'd be great to have enough automation around it so that anyone could do
> that and where test results are public.
>
> In regard to long-running tests, currently with Harry we can run three
> kinds of long-running tests:
> 1. Stress-like concurrent write workload, followed by periods of quiescence
> and then validation
> 2. Writes with injected faults, followed by repair and validation
> 3. Stress-like concurrent read/write workload with fault injection without
> validation, for finding rare edge conditions / triggering possible
> exceptions
>
> Which means that quorum read and write paths (for all kinds of schemas,
> including all possible kinds of read and write queries), compactions,
> repairs, read-repairs and hints are covered fairly well. However things
> like bootstrap and other kinds of range movements aren't. It'd be great to
> expand this, but it's been somewhat difficult to do, since last time a
> bootstrap test was attempted, it has immediately uncovered enough issues to
> keep us busy fixing them for quite some time. Maybe it's about time to try
> that again.
>
> For short tests, you can think of Harry as a tool to save you time and
> allow focusing on higher-level test meaning rather than creating schema and
> coming up with specific values to insert/select.
>
> Thanks
> --Alex
>
>
>
> On Tue, Nov 2, 2021 at 5:30 PM Ekaterina Dimitrova <e....@gmail.com>
> wrote:
>
> > Did I hear my name? 😁
> > Sorry Josh, you are wrong :-) 2 out of 30 in two months were real bugs
> > discovered by pflaky tests and one of them was very hard to hit. So
> 6-7%. I
> > think that report I sent back then didn’t come through so the topic was
> > cleared in a follow up mail by Benjamin; with a lot of sweat but we kept
> to
> > the promised 4.0 standard.
> >
> > Now back to this topic:
> > - green CI without enough test coverage is nothing more than green CI
> > unfortunately to me.  I know this is an elephant but I won’t sleep well
> > tonight if I don’t mention it.
> > - I believe the looping of tests mentioned by Berenguer can help for
> > verifying no new weird flakiness is introduced by new tests added. And of
> > course it helps a lot during fixing flaky tests, I think that’s clear.
> >
> >  I think that it would be great if each such test
> > > > (or
> > > > > test group) was guaranteed to have a ticket and some preliminary
> > > analysis
> > > > > was done to confirm it is just a test problem before releasing the
> > new
> > > > > version
> >
> > Probably not bad idea. Preliminary analysis. But we need to get into the
> > cadence of regular checking our CI; divide and conquer on regular basis
> > between all of us. Not to mention it is way easier to follow up recently
> > introduced issues with the people who worked on stuff then trying to find
> > out what happened a year ago in a rush before a release. I agree it is
> not
> > about the number but what stays behind it.
> >
> > Requiring all tests to run pre every merge, easily we can add this in
> > circle but there are many people who don’t have access to high resources
> so
> > again they won’t be able to run absolutely everything. At the end
> > everything is up to the diligence of the reviewers/committers. Plus
> > official CI is Jenkins and we know there are different infra related
> > failures in the different CIs. Not an easy topic, indeed. I support
> running
> > all tests, just having in mind all the related issues/complications.
> >
> > I would say in my mind upgrade tests are particularly important to be
> green
> > before a release, too.
> >
> > Seems to me we have the tools, but now it is time to organize the rhythm
> in
> > an efficient manner.
> >
> > Best regards,
> > Ekaterina
> >
> >
> > On Tue, 2 Nov 2021 at 11:06, Joshua McKenzie <jm...@apache.org>
> wrote:
> >
> > > To your point Jacek, I believe in the run up to 4.0 Ekaterina did some
> > > analysis and something like 18% (correct me if I'm wrong here) of the
> > test
> > > failures we were considering "flaky tests" were actual product defects
> in
> > > the database. With that in mind, we should be uncomfortable cutting a
> > > release if we have 6 test failures since there's every likelihood one
> of
> > > them is a surfaced bug.
> > >
> > > ensuring our best practices are followed for every merge
> > >
> > > I totally agree but I also don't think we have this codified (unless
> I'm
> > > just completely missing something - very possible! ;)) Seems like we
> have
> > > different circle configs, different sets of jobs being run, Harry /
> > Hunter
> > > (maybe?) / ?? run on some but not all commits and/or all branches,
> > > manual performance testing on specific releases but nothing surfaced
> > > formally to the project as a reproducible suite like we used to have
> > years
> > > ago (primitive though it was at the time with what it covered).
> > >
> > > If we *don't* have this clarified right now, I think there's
> significant
> > > value in enumerating and at least documenting what our agreed upon best
> > > practices are so we can start holding ourselves and each other
> > accountable
> > > to that bar. Given some of the incredible but sweeping work coming down
> > the
> > > pike, this strikes me as a thing we need to be proactive and vigilant
> > about
> > > so as not to regress.
> > >
> > > ~Josh
> > >
> > > On Tue, Nov 2, 2021 at 3:49 AM Jacek Lewandowski <
> > > lewandowski.jacek@gmail.com> wrote:
> > >
> > > > >
> > > > > we already have a way to confirm flakiness on circle by running the
> > > test
> > > > > repeatedly N times. Like 100 or 500. That has proven to work very
> > well
> > > > > so far, at least for me. #collaborating #justfyi
> > > > >
> > > >
> > > > It does not prove that it is the test flakiness. It still can be a
> bug
> > in
> > > > the code which occurs intermittently under some rare conditions
> > > >
> > > >
> > > > - - -- --- ----- -------- -------------
> > > > Jacek Lewandowski
> > > >
> > > >
> > > > On Tue, Nov 2, 2021 at 7:46 AM Berenguer Blasi <
> > berenguerblasi@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > we already have a way to confirm flakiness on circle by running the
> > > test
> > > > > repeatedly N times. Like 100 or 500. That has proven to work very
> > well
> > > > > so far, at least for me. #collaborating #justfyi
> > > > >
> > > > > On the 60+ failures it is not as bad as it looks. Let me explain. I
> > > have
> > > > > been tracking failures in 4.0 and trunk daily, it's grown as a
> habit
> > in
> > > > > me after the 4.0 push. And 4.0 and trunk were hovering around <10
> > > > > failures solidly (you can check jenkins ci graphs). The random
> bisect
> > > or
> > > > > fix was needed leaving behind 3 or 4 tests that have defeated
> > already 2
> > > > > or 3 committers, so the really tough guys. I am reasonably
> convinced
> > > > > once the 60+ failures fix merges we'll be back to the <10 failures
> > with
> > > > > relative little effort.
> > > > >
> > > > > So we're just in the middle of a 'fix' but overall we shouldn't be
> as
> > > > > bad as it looks now as we've been quite good at keeping CI
> green-ish
> > > imo.
> > > > >
> > > > > Also +1 to releasable branches, which whatever we settle it means
> it
> > is
> > > > > not a wall of failures, bc of reasons explained like the hidden
> costs
> > > etc
> > > > >
> > > > > My 2cts.
> > > > >
> > > > > On 2/11/21 6:07, Jacek Lewandowski wrote:
> > > > > >> I don’t think means guaranteeing there are no failing tests
> > (though
> > > > > >> ideally this would also happen), but about ensuring our best
> > > practices
> > > > > are
> > > > > >> followed for every merge. 4.0 took so long to release because of
> > the
> > > > > amount
> > > > > >> of hidden work that was created by merging work that didn’t meet
> > the
> > > > > >> standard for release.
> > > > > >>
> > > > > > Tests are sometimes considered flaky because they fail
> > intermittently
> > > > but
> > > > > > it may not be related to the insufficiently consistent test
> > > > > implementation
> > > > > > and can reveal some real problem in the production code. I saw
> that
> > > in
> > > > > > various codebases and I think that it would be great if each such
> > > test
> > > > > (or
> > > > > > test group) was guaranteed to have a ticket and some preliminary
> > > > analysis
> > > > > > was done to confirm it is just a test problem before releasing
> the
> > > new
> > > > > > version
> > > > > >
> > > > > > Historically we have also had significant pressure to backport
> > > features
> > > > > to
> > > > > >> earlier versions due to the cost and risk of upgrading. If we
> > > maintain
> > > > > >> broader version compatibility for upgrade, and reduce the risk
> of
> > > > > adopting
> > > > > >> newer versions, then this pressure is also reduced
> significantly.
> > > > Though
> > > > > >> perhaps we will stick to our guns here anyway, as there seems to
> > be
> > > > > renewed
> > > > > >> pressure to limit work in GA releases to bug fixes exclusively.
> It
> > > > > remains
> > > > > >> to be seen if this holds.
> > > > > >
> > > > > > Are there any precise requirements for supported upgrade and
> > > downgrade
> > > > > > paths?
> > > > > >
> > > > > > Thanks
> > > > > > - - -- --- ----- -------- -------------
> > > > > > Jacek Lewandowski
> > > > > >
> > > > > >
> > > > > > On Sat, Oct 30, 2021 at 4:07 PM benedict@apache.org <
> > > > benedict@apache.org
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > >>> How do we define what "releasable trunk" means?
> > > > > >> For me, the major criteria is ensuring that work is not merged
> > that
> > > is
> > > > > >> known to require follow-up work, or could reasonably have been
> > known
> > > > to
> > > > > >> require follow-up work if better QA practices had been followed.
> > > > > >>
> > > > > >> So, a big part of this is ensuring we continue to exceed our
> > targets
> > > > for
> > > > > >> improved QA. For me this means trying to weave tools like Harry
> > and
> > > > the
> > > > > >> Simulator into our development workflow early on, but we’ll see
> > how
> > > > well
> > > > > >> these tools gain broader adoption. This also means focus in
> > general
> > > on
> > > > > >> possible negative effects of a change.
> > > > > >>
> > > > > >> I think we could do with producing guidance documentation for
> how
> > to
> > > > > >> approach QA, where we can record our best practices and evolve
> > them
> > > as
> > > > > we
> > > > > >> discover flaws or pitfalls, either for ergonomics or for bug
> > > > discovery.
> > > > > >>
> > > > > >>> What are the benefits of having a releasable trunk as defined
> > here?
> > > > > >> If we want to have any hope of meeting reasonable release
> cadences
> > > > _and_
> > > > > >> the high project quality we expect today, then I think a
> > ~shippable
> > > > > trunk
> > > > > >> policy is an absolute necessity.
> > > > > >>
> > > > > >> I don’t think means guaranteeing there are no failing tests
> > (though
> > > > > >> ideally this would also happen), but about ensuring our best
> > > practices
> > > > > are
> > > > > >> followed for every merge. 4.0 took so long to release because of
> > the
> > > > > amount
> > > > > >> of hidden work that was created by merging work that didn’t meet
> > the
> > > > > >> standard for release.
> > > > > >>
> > > > > >> Historically we have also had significant pressure to backport
> > > > features
> > > > > to
> > > > > >> earlier versions due to the cost and risk of upgrading. If we
> > > maintain
> > > > > >> broader version compatibility for upgrade, and reduce the risk
> of
> > > > > adopting
> > > > > >> newer versions, then this pressure is also reduced
> significantly.
> > > > Though
> > > > > >> perhaps we will stick to our guns here anyway, as there seems to
> > be
> > > > > renewed
> > > > > >> pressure to limit work in GA releases to bug fixes exclusively.
> It
> > > > > remains
> > > > > >> to be seen if this holds.
> > > > > >>
> > > > > >>> What are the costs?
> > > > > >> I think the costs are quite low, perhaps even negative. Hidden
> > work
> > > > > >> produced by merges that break things can be much more costly
> than
> > > > > getting
> > > > > >> the work right first time, as attribution is much more
> > challenging.
> > > > > >>
> > > > > >> One cost that is created, however, is for version compatibility
> as
> > > we
> > > > > >> cannot say “well, this is a minor version bump so we don’t need
> to
> > > > > support
> > > > > >> downgrade”. But I think we should be investing in this anyway
> for
> > > > > operator
> > > > > >> simplicity and confidence, so I actually see this as a benefit
> as
> > > > well.
> > > > > >>
> > > > > >>> Full disclosure: running face-first into 60+ failing tests on
> > trunk
> > > > > >> I have to apologise here. CircleCI did not uncover these
> problems,
> > > > > >> apparently due to some way it resolves dependencies, and so I am
> > > > > >> responsible for a significant number of these and have been
> quite
> > > sick
> > > > > >> since.
> > > > > >>
> > > > > >> I think a push to eliminate flaky tests will probably help here
> in
> > > > > future,
> > > > > >> though, and perhaps the project needs to have some (low)
> threshold
> > > of
> > > > > flaky
> > > > > >> or failing tests at which point we block merges to force a
> > > correction.
> > > > > >>
> > > > > >>
> > > > > >> From: Joshua McKenzie <jm...@apache.org>
> > > > > >> Date: Saturday, 30 October 2021 at 14:00
> > > > > >> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> > > > > >> Subject: [DISCUSS] Releasable trunk and quality
> > > > > >> We as a project have gone back and forth on the topic of quality
> > and
> > > > the
> > > > > >> notion of a releasable trunk for quite a few years. If people
> are
> > > > > >> interested, I'd like to rekindle this discussion a bit and see
> if
> > > > we're
> > > > > >> happy with where we are as a project or if we think there's
> steps
> > we
> > > > > should
> > > > > >> take to change the quality bar going forward. The following
> > > questions
> > > > > have
> > > > > >> been rattling around for me for awhile:
> > > > > >>
> > > > > >> 1. How do we define what "releasable trunk" means? All reviewed
> > by M
> > > > > >> committers? Passing N% of tests? Passing all tests plus some
> other
> > > > > metrics
> > > > > >> (manual testing, raising the number of reviewers, test coverage,
> > > usage
> > > > > in
> > > > > >> dev or QA environments, etc)? Something else entirely?
> > > > > >>
> > > > > >> 2. With a definition settled upon in #1, what steps, if any, do
> we
> > > > need
> > > > > to
> > > > > >> take to get from where we are to having *and keeping* that
> > > releasable
> > > > > >> trunk? Anything to codify there?
> > > > > >>
> > > > > >> 3. What are the benefits of having a releasable trunk as defined
> > > here?
> > > > > What
> > > > > >> are the costs? Is it worth pursuing? What are the alternatives
> > (for
> > > > > >> instance: a freeze before a release + stabilization focus by the
> > > > > community
> > > > > >> i.e. 4.0 push or the tock in tick-tock)?
> > > > > >>
> > > > > >> Given the large volumes of work coming down the pike with CEP's,
> > > this
> > > > > seems
> > > > > >> like a good time to at least check in on this topic as a
> > community.
> > > > > >>
> > > > > >> Full disclosure: running face-first into 60+ failing tests on
> > trunk
> > > > when
> > > > > >> going through the commit process for denylisting this week
> brought
> > > > this
> > > > > >> topic back up for me (reminds me of when I went to merge CDC
> back
> > in
> > > > 3.6
> > > > > >> and those test failures riled me up... I sense a pattern ;))
> > > > > >>
> > > > > >> Looking forward to hearing what people think.
> > > > > >>
> > > > > >> ~Josh
> > > > > >>
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> alex p
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Oleksandr Petrov <ol...@gmail.com>.

I'll merge 16262 and the Harry blog-post that accompanies it shortly.
Having 16262 merged will significantly reduce the amount of resistance one
has to overcome in order to write a fuzz test. But this, of course, only
covers short/small/unit-test-like tests.

For longer running tests, I guess for now we will have to rely on folks
(hopefully) running long fuzz tests and reporting issues. But eventually
it'd be great to have enough automation around it so that anyone could do
that and where test results are public.

In regard to long-running tests, currently with Harry we can run three
kinds of long-running tests:
1. Stress-like concurrent write workload, followed by periods of quiescence
and then validation
2. Writes with injected faults, followed by repair and validation
3. Stress-like concurrent read/write workload with fault injection without
validation, for finding rare edge conditions / triggering possible
exceptions

Which means that quorum read and write paths (for all kinds of schemas,
including all possible kinds of read and write queries), compactions,
repairs, read-repairs and hints are covered fairly well. However things
like bootstrap and other kinds of range movements aren't. It'd be great to
expand this, but it's been somewhat difficult to do, since last time a
bootstrap test was attempted, it has immediately uncovered enough issues to
keep us busy fixing them for quite some time. Maybe it's about time to try
that again.

For short tests, you can think of Harry as a tool to save you time and
allow focusing on higher-level test meaning rather than creating schema and
coming up with specific values to insert/select.

Thanks
--Alex



On Tue, Nov 2, 2021 at 5:30 PM Ekaterina Dimitrova <e....@gmail.com>
wrote:

> Did I hear my name? 😁
> Sorry Josh, you are wrong :-) 2 out of 30 in two months were real bugs
> discovered by pflaky tests and one of them was very hard to hit. So 6-7%. I
> think that report I sent back then didn’t come through so the topic was
> cleared in a follow up mail by Benjamin; with a lot of sweat but we kept to
> the promised 4.0 standard.
>
> Now back to this topic:
> - green CI without enough test coverage is nothing more than green CI
> unfortunately to me.  I know this is an elephant but I won’t sleep well
> tonight if I don’t mention it.
> - I believe the looping of tests mentioned by Berenguer can help for
> verifying no new weird flakiness is introduced by new tests added. And of
> course it helps a lot during fixing flaky tests, I think that’s clear.
>
>  I think that it would be great if each such test
> > > (or
> > > > test group) was guaranteed to have a ticket and some preliminary
> > analysis
> > > > was done to confirm it is just a test problem before releasing the
> new
> > > > version
>
> Probably not bad idea. Preliminary analysis. But we need to get into the
> cadence of regular checking our CI; divide and conquer on regular basis
> between all of us. Not to mention it is way easier to follow up recently
> introduced issues with the people who worked on stuff then trying to find
> out what happened a year ago in a rush before a release. I agree it is not
> about the number but what stays behind it.
>
> Requiring all tests to run pre every merge, easily we can add this in
> circle but there are many people who don’t have access to high resources so
> again they won’t be able to run absolutely everything. At the end
> everything is up to the diligence of the reviewers/committers. Plus
> official CI is Jenkins and we know there are different infra related
> failures in the different CIs. Not an easy topic, indeed. I support running
> all tests, just having in mind all the related issues/complications.
>
> I would say in my mind upgrade tests are particularly important to be green
> before a release, too.
>
> Seems to me we have the tools, but now it is time to organize the rhythm in
> an efficient manner.
>
> Best regards,
> Ekaterina
>
>
> On Tue, 2 Nov 2021 at 11:06, Joshua McKenzie <jm...@apache.org> wrote:
>
> > To your point Jacek, I believe in the run up to 4.0 Ekaterina did some
> > analysis and something like 18% (correct me if I'm wrong here) of the
> test
> > failures we were considering "flaky tests" were actual product defects in
> > the database. With that in mind, we should be uncomfortable cutting a
> > release if we have 6 test failures since there's every likelihood one of
> > them is a surfaced bug.
> >
> > ensuring our best practices are followed for every merge
> >
> > I totally agree but I also don't think we have this codified (unless I'm
> > just completely missing something - very possible! ;)) Seems like we have
> > different circle configs, different sets of jobs being run, Harry /
> Hunter
> > (maybe?) / ?? run on some but not all commits and/or all branches,
> > manual performance testing on specific releases but nothing surfaced
> > formally to the project as a reproducible suite like we used to have
> years
> > ago (primitive though it was at the time with what it covered).
> >
> > If we *don't* have this clarified right now, I think there's significant
> > value in enumerating and at least documenting what our agreed upon best
> > practices are so we can start holding ourselves and each other
> accountable
> > to that bar. Given some of the incredible but sweeping work coming down
> the
> > pike, this strikes me as a thing we need to be proactive and vigilant
> about
> > so as not to regress.
> >
> > ~Josh
> >
> > On Tue, Nov 2, 2021 at 3:49 AM Jacek Lewandowski <
> > lewandowski.jacek@gmail.com> wrote:
> >
> > > >
> > > > we already have a way to confirm flakiness on circle by running the
> > test
> > > > repeatedly N times. Like 100 or 500. That has proven to work very
> well
> > > > so far, at least for me. #collaborating #justfyi
> > > >
> > >
> > > It does not prove that it is the test flakiness. It still can be a bug
> in
> > > the code which occurs intermittently under some rare conditions
> > >
> > >
> > > - - -- --- ----- -------- -------------
> > > Jacek Lewandowski
> > >
> > >
> > > On Tue, Nov 2, 2021 at 7:46 AM Berenguer Blasi <
> berenguerblasi@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > we already have a way to confirm flakiness on circle by running the
> > test
> > > > repeatedly N times. Like 100 or 500. That has proven to work very
> well
> > > > so far, at least for me. #collaborating #justfyi
> > > >
> > > > On the 60+ failures it is not as bad as it looks. Let me explain. I
> > have
> > > > been tracking failures in 4.0 and trunk daily, it's grown as a habit
> in
> > > > me after the 4.0 push. And 4.0 and trunk were hovering around <10
> > > > failures solidly (you can check jenkins ci graphs). The random bisect
> > or
> > > > fix was needed leaving behind 3 or 4 tests that have defeated
> already 2
> > > > or 3 committers, so the really tough guys. I am reasonably convinced
> > > > once the 60+ failures fix merges we'll be back to the <10 failures
> with
> > > > relative little effort.
> > > >
> > > > So we're just in the middle of a 'fix' but overall we shouldn't be as
> > > > bad as it looks now as we've been quite good at keeping CI green-ish
> > imo.
> > > >
> > > > Also +1 to releasable branches, which whatever we settle it means it
> is
> > > > not a wall of failures, bc of reasons explained like the hidden costs
> > etc
> > > >
> > > > My 2cts.
> > > >
> > > > On 2/11/21 6:07, Jacek Lewandowski wrote:
> > > > >> I don’t think means guaranteeing there are no failing tests
> (though
> > > > >> ideally this would also happen), but about ensuring our best
> > practices
> > > > are
> > > > >> followed for every merge. 4.0 took so long to release because of
> the
> > > > amount
> > > > >> of hidden work that was created by merging work that didn’t meet
> the
> > > > >> standard for release.
> > > > >>
> > > > > Tests are sometimes considered flaky because they fail
> intermittently
> > > but
> > > > > it may not be related to the insufficiently consistent test
> > > > implementation
> > > > > and can reveal some real problem in the production code. I saw that
> > in
> > > > > various codebases and I think that it would be great if each such
> > test
> > > > (or
> > > > > test group) was guaranteed to have a ticket and some preliminary
> > > analysis
> > > > > was done to confirm it is just a test problem before releasing the
> > new
> > > > > version
> > > > >
> > > > > Historically we have also had significant pressure to backport
> > features
> > > > to
> > > > >> earlier versions due to the cost and risk of upgrading. If we
> > maintain
> > > > >> broader version compatibility for upgrade, and reduce the risk of
> > > > adopting
> > > > >> newer versions, then this pressure is also reduced significantly.
> > > Though
> > > > >> perhaps we will stick to our guns here anyway, as there seems to
> be
> > > > renewed
> > > > >> pressure to limit work in GA releases to bug fixes exclusively. It
> > > > remains
> > > > >> to be seen if this holds.
> > > > >
> > > > > Are there any precise requirements for supported upgrade and
> > downgrade
> > > > > paths?
> > > > >
> > > > > Thanks
> > > > > - - -- --- ----- -------- -------------
> > > > > Jacek Lewandowski
> > > > >
> > > > >
> > > > > On Sat, Oct 30, 2021 at 4:07 PM benedict@apache.org <
> > > benedict@apache.org
> > > > >
> > > > > wrote:
> > > > >
> > > > >>> How do we define what "releasable trunk" means?
> > > > >> For me, the major criteria is ensuring that work is not merged
> that
> > is
> > > > >> known to require follow-up work, or could reasonably have been
> known
> > > to
> > > > >> require follow-up work if better QA practices had been followed.
> > > > >>
> > > > >> So, a big part of this is ensuring we continue to exceed our
> targets
> > > for
> > > > >> improved QA. For me this means trying to weave tools like Harry
> and
> > > the
> > > > >> Simulator into our development workflow early on, but we’ll see
> how
> > > well
> > > > >> these tools gain broader adoption. This also means focus in
> general
> > on
> > > > >> possible negative effects of a change.
> > > > >>
> > > > >> I think we could do with producing guidance documentation for how
> to
> > > > >> approach QA, where we can record our best practices and evolve
> them
> > as
> > > > we
> > > > >> discover flaws or pitfalls, either for ergonomics or for bug
> > > discovery.
> > > > >>
> > > > >>> What are the benefits of having a releasable trunk as defined
> here?
> > > > >> If we want to have any hope of meeting reasonable release cadences
> > > _and_
> > > > >> the high project quality we expect today, then I think a
> ~shippable
> > > > trunk
> > > > >> policy is an absolute necessity.
> > > > >>
> > > > >> I don’t think means guaranteeing there are no failing tests
> (though
> > > > >> ideally this would also happen), but about ensuring our best
> > practices
> > > > are
> > > > >> followed for every merge. 4.0 took so long to release because of
> the
> > > > amount
> > > > >> of hidden work that was created by merging work that didn’t meet
> the
> > > > >> standard for release.
> > > > >>
> > > > >> Historically we have also had significant pressure to backport
> > > features
> > > > to
> > > > >> earlier versions due to the cost and risk of upgrading. If we
> > maintain
> > > > >> broader version compatibility for upgrade, and reduce the risk of
> > > > adopting
> > > > >> newer versions, then this pressure is also reduced significantly.
> > > Though
> > > > >> perhaps we will stick to our guns here anyway, as there seems to
> be
> > > > renewed
> > > > >> pressure to limit work in GA releases to bug fixes exclusively. It
> > > > remains
> > > > >> to be seen if this holds.
> > > > >>
> > > > >>> What are the costs?
> > > > >> I think the costs are quite low, perhaps even negative. Hidden
> work
> > > > >> produced by merges that break things can be much more costly than
> > > > getting
> > > > >> the work right first time, as attribution is much more
> challenging.
> > > > >>
> > > > >> One cost that is created, however, is for version compatibility as
> > we
> > > > >> cannot say “well, this is a minor version bump so we don’t need to
> > > > support
> > > > >> downgrade”. But I think we should be investing in this anyway for
> > > > operator
> > > > >> simplicity and confidence, so I actually see this as a benefit as
> > > well.
> > > > >>
> > > > >>> Full disclosure: running face-first into 60+ failing tests on
> trunk
> > > > >> I have to apologise here. CircleCI did not uncover these problems,
> > > > >> apparently due to some way it resolves dependencies, and so I am
> > > > >> responsible for a significant number of these and have been quite
> > sick
> > > > >> since.
> > > > >>
> > > > >> I think a push to eliminate flaky tests will probably help here in
> > > > future,
> > > > >> though, and perhaps the project needs to have some (low) threshold
> > of
> > > > flaky
> > > > >> or failing tests at which point we block merges to force a
> > correction.
> > > > >>
> > > > >>
> > > > >> From: Joshua McKenzie <jm...@apache.org>
> > > > >> Date: Saturday, 30 October 2021 at 14:00
> > > > >> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> > > > >> Subject: [DISCUSS] Releasable trunk and quality
> > > > >> We as a project have gone back and forth on the topic of quality
> and
> > > the
> > > > >> notion of a releasable trunk for quite a few years. If people are
> > > > >> interested, I'd like to rekindle this discussion a bit and see if
> > > we're
> > > > >> happy with where we are as a project or if we think there's steps
> we
> > > > should
> > > > >> take to change the quality bar going forward. The following
> > questions
> > > > have
> > > > >> been rattling around for me for awhile:
> > > > >>
> > > > >> 1. How do we define what "releasable trunk" means? All reviewed
> by M
> > > > >> committers? Passing N% of tests? Passing all tests plus some other
> > > > metrics
> > > > >> (manual testing, raising the number of reviewers, test coverage,
> > usage
> > > > in
> > > > >> dev or QA environments, etc)? Something else entirely?
> > > > >>
> > > > >> 2. With a definition settled upon in #1, what steps, if any, do we
> > > need
> > > > to
> > > > >> take to get from where we are to having *and keeping* that
> > releasable
> > > > >> trunk? Anything to codify there?
> > > > >>
> > > > >> 3. What are the benefits of having a releasable trunk as defined
> > here?
> > > > What
> > > > >> are the costs? Is it worth pursuing? What are the alternatives
> (for
> > > > >> instance: a freeze before a release + stabilization focus by the
> > > > community
> > > > >> i.e. 4.0 push or the tock in tick-tock)?
> > > > >>
> > > > >> Given the large volumes of work coming down the pike with CEP's,
> > this
> > > > seems
> > > > >> like a good time to at least check in on this topic as a
> community.
> > > > >>
> > > > >> Full disclosure: running face-first into 60+ failing tests on
> trunk
> > > when
> > > > >> going through the commit process for denylisting this week brought
> > > this
> > > > >> topic back up for me (reminds me of when I went to merge CDC back
> in
> > > 3.6
> > > > >> and those test failures riled me up... I sense a pattern ;))
> > > > >>
> > > > >> Looking forward to hearing what people think.
> > > > >>
> > > > >> ~Josh
> > > > >>
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > > >
> > > >
> > >
> >
>


-- 
alex p

Re: [DISCUSS] Releasable trunk and quality

Posted by Ekaterina Dimitrova <e....@gmail.com>.

Did I hear my name? 😁
Sorry Josh, you are wrong :-) 2 out of 30 in two months were real bugs
discovered by pflaky tests and one of them was very hard to hit. So 6-7%. I
think that report I sent back then didn’t come through so the topic was
cleared in a follow up mail by Benjamin; with a lot of sweat but we kept to
the promised 4.0 standard.

Now back to this topic:
- green CI without enough test coverage is nothing more than green CI
unfortunately to me.  I know this is an elephant but I won’t sleep well
tonight if I don’t mention it.
- I believe the looping of tests mentioned by Berenguer can help for
verifying no new weird flakiness is introduced by new tests added. And of
course it helps a lot during fixing flaky tests, I think that’s clear.

 I think that it would be great if each such test
> > (or
> > > test group) was guaranteed to have a ticket and some preliminary
> analysis
> > > was done to confirm it is just a test problem before releasing the new
> > > version

Probably not bad idea. Preliminary analysis. But we need to get into the
cadence of regular checking our CI; divide and conquer on regular basis
between all of us. Not to mention it is way easier to follow up recently
introduced issues with the people who worked on stuff then trying to find
out what happened a year ago in a rush before a release. I agree it is not
about the number but what stays behind it.

Requiring all tests to run pre every merge, easily we can add this in
circle but there are many people who don’t have access to high resources so
again they won’t be able to run absolutely everything. At the end
everything is up to the diligence of the reviewers/committers. Plus
official CI is Jenkins and we know there are different infra related
failures in the different CIs. Not an easy topic, indeed. I support running
all tests, just having in mind all the related issues/complications.

I would say in my mind upgrade tests are particularly important to be green
before a release, too.

Seems to me we have the tools, but now it is time to organize the rhythm in
an efficient manner.

Best regards,
Ekaterina


On Tue, 2 Nov 2021 at 11:06, Joshua McKenzie <jm...@apache.org> wrote:

> To your point Jacek, I believe in the run up to 4.0 Ekaterina did some
> analysis and something like 18% (correct me if I'm wrong here) of the test
> failures we were considering "flaky tests" were actual product defects in
> the database. With that in mind, we should be uncomfortable cutting a
> release if we have 6 test failures since there's every likelihood one of
> them is a surfaced bug.
>
> ensuring our best practices are followed for every merge
>
> I totally agree but I also don't think we have this codified (unless I'm
> just completely missing something - very possible! ;)) Seems like we have
> different circle configs, different sets of jobs being run, Harry / Hunter
> (maybe?) / ?? run on some but not all commits and/or all branches,
> manual performance testing on specific releases but nothing surfaced
> formally to the project as a reproducible suite like we used to have years
> ago (primitive though it was at the time with what it covered).
>
> If we *don't* have this clarified right now, I think there's significant
> value in enumerating and at least documenting what our agreed upon best
> practices are so we can start holding ourselves and each other accountable
> to that bar. Given some of the incredible but sweeping work coming down the
> pike, this strikes me as a thing we need to be proactive and vigilant about
> so as not to regress.
>
> ~Josh
>
> On Tue, Nov 2, 2021 at 3:49 AM Jacek Lewandowski <
> lewandowski.jacek@gmail.com> wrote:
>
> > >
> > > we already have a way to confirm flakiness on circle by running the
> test
> > > repeatedly N times. Like 100 or 500. That has proven to work very well
> > > so far, at least for me. #collaborating #justfyi
> > >
> >
> > It does not prove that it is the test flakiness. It still can be a bug in
> > the code which occurs intermittently under some rare conditions
> >
> >
> > - - -- --- ----- -------- -------------
> > Jacek Lewandowski
> >
> >
> > On Tue, Nov 2, 2021 at 7:46 AM Berenguer Blasi <berenguerblasi@gmail.com
> >
> > wrote:
> >
> > > Hi,
> > >
> > > we already have a way to confirm flakiness on circle by running the
> test
> > > repeatedly N times. Like 100 or 500. That has proven to work very well
> > > so far, at least for me. #collaborating #justfyi
> > >
> > > On the 60+ failures it is not as bad as it looks. Let me explain. I
> have
> > > been tracking failures in 4.0 and trunk daily, it's grown as a habit in
> > > me after the 4.0 push. And 4.0 and trunk were hovering around <10
> > > failures solidly (you can check jenkins ci graphs). The random bisect
> or
> > > fix was needed leaving behind 3 or 4 tests that have defeated already 2
> > > or 3 committers, so the really tough guys. I am reasonably convinced
> > > once the 60+ failures fix merges we'll be back to the <10 failures with
> > > relative little effort.
> > >
> > > So we're just in the middle of a 'fix' but overall we shouldn't be as
> > > bad as it looks now as we've been quite good at keeping CI green-ish
> imo.
> > >
> > > Also +1 to releasable branches, which whatever we settle it means it is
> > > not a wall of failures, bc of reasons explained like the hidden costs
> etc
> > >
> > > My 2cts.
> > >
> > > On 2/11/21 6:07, Jacek Lewandowski wrote:
> > > >> I don’t think means guaranteeing there are no failing tests (though
> > > >> ideally this would also happen), but about ensuring our best
> practices
> > > are
> > > >> followed for every merge. 4.0 took so long to release because of the
> > > amount
> > > >> of hidden work that was created by merging work that didn’t meet the
> > > >> standard for release.
> > > >>
> > > > Tests are sometimes considered flaky because they fail intermittently
> > but
> > > > it may not be related to the insufficiently consistent test
> > > implementation
> > > > and can reveal some real problem in the production code. I saw that
> in
> > > > various codebases and I think that it would be great if each such
> test
> > > (or
> > > > test group) was guaranteed to have a ticket and some preliminary
> > analysis
> > > > was done to confirm it is just a test problem before releasing the
> new
> > > > version
> > > >
> > > > Historically we have also had significant pressure to backport
> features
> > > to
> > > >> earlier versions due to the cost and risk of upgrading. If we
> maintain
> > > >> broader version compatibility for upgrade, and reduce the risk of
> > > adopting
> > > >> newer versions, then this pressure is also reduced significantly.
> > Though
> > > >> perhaps we will stick to our guns here anyway, as there seems to be
> > > renewed
> > > >> pressure to limit work in GA releases to bug fixes exclusively. It
> > > remains
> > > >> to be seen if this holds.
> > > >
> > > > Are there any precise requirements for supported upgrade and
> downgrade
> > > > paths?
> > > >
> > > > Thanks
> > > > - - -- --- ----- -------- -------------
> > > > Jacek Lewandowski
> > > >
> > > >
> > > > On Sat, Oct 30, 2021 at 4:07 PM benedict@apache.org <
> > benedict@apache.org
> > > >
> > > > wrote:
> > > >
> > > >>> How do we define what "releasable trunk" means?
> > > >> For me, the major criteria is ensuring that work is not merged that
> is
> > > >> known to require follow-up work, or could reasonably have been known
> > to
> > > >> require follow-up work if better QA practices had been followed.
> > > >>
> > > >> So, a big part of this is ensuring we continue to exceed our targets
> > for
> > > >> improved QA. For me this means trying to weave tools like Harry and
> > the
> > > >> Simulator into our development workflow early on, but we’ll see how
> > well
> > > >> these tools gain broader adoption. This also means focus in general
> on
> > > >> possible negative effects of a change.
> > > >>
> > > >> I think we could do with producing guidance documentation for how to
> > > >> approach QA, where we can record our best practices and evolve them
> as
> > > we
> > > >> discover flaws or pitfalls, either for ergonomics or for bug
> > discovery.
> > > >>
> > > >>> What are the benefits of having a releasable trunk as defined here?
> > > >> If we want to have any hope of meeting reasonable release cadences
> > _and_
> > > >> the high project quality we expect today, then I think a ~shippable
> > > trunk
> > > >> policy is an absolute necessity.
> > > >>
> > > >> I don’t think means guaranteeing there are no failing tests (though
> > > >> ideally this would also happen), but about ensuring our best
> practices
> > > are
> > > >> followed for every merge. 4.0 took so long to release because of the
> > > amount
> > > >> of hidden work that was created by merging work that didn’t meet the
> > > >> standard for release.
> > > >>
> > > >> Historically we have also had significant pressure to backport
> > features
> > > to
> > > >> earlier versions due to the cost and risk of upgrading. If we
> maintain
> > > >> broader version compatibility for upgrade, and reduce the risk of
> > > adopting
> > > >> newer versions, then this pressure is also reduced significantly.
> > Though
> > > >> perhaps we will stick to our guns here anyway, as there seems to be
> > > renewed
> > > >> pressure to limit work in GA releases to bug fixes exclusively. It
> > > remains
> > > >> to be seen if this holds.
> > > >>
> > > >>> What are the costs?
> > > >> I think the costs are quite low, perhaps even negative. Hidden work
> > > >> produced by merges that break things can be much more costly than
> > > getting
> > > >> the work right first time, as attribution is much more challenging.
> > > >>
> > > >> One cost that is created, however, is for version compatibility as
> we
> > > >> cannot say “well, this is a minor version bump so we don’t need to
> > > support
> > > >> downgrade”. But I think we should be investing in this anyway for
> > > operator
> > > >> simplicity and confidence, so I actually see this as a benefit as
> > well.
> > > >>
> > > >>> Full disclosure: running face-first into 60+ failing tests on trunk
> > > >> I have to apologise here. CircleCI did not uncover these problems,
> > > >> apparently due to some way it resolves dependencies, and so I am
> > > >> responsible for a significant number of these and have been quite
> sick
> > > >> since.
> > > >>
> > > >> I think a push to eliminate flaky tests will probably help here in
> > > future,
> > > >> though, and perhaps the project needs to have some (low) threshold
> of
> > > flaky
> > > >> or failing tests at which point we block merges to force a
> correction.
> > > >>
> > > >>
> > > >> From: Joshua McKenzie <jm...@apache.org>
> > > >> Date: Saturday, 30 October 2021 at 14:00
> > > >> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> > > >> Subject: [DISCUSS] Releasable trunk and quality
> > > >> We as a project have gone back and forth on the topic of quality and
> > the
> > > >> notion of a releasable trunk for quite a few years. If people are
> > > >> interested, I'd like to rekindle this discussion a bit and see if
> > we're
> > > >> happy with where we are as a project or if we think there's steps we
> > > should
> > > >> take to change the quality bar going forward. The following
> questions
> > > have
> > > >> been rattling around for me for awhile:
> > > >>
> > > >> 1. How do we define what "releasable trunk" means? All reviewed by M
> > > >> committers? Passing N% of tests? Passing all tests plus some other
> > > metrics
> > > >> (manual testing, raising the number of reviewers, test coverage,
> usage
> > > in
> > > >> dev or QA environments, etc)? Something else entirely?
> > > >>
> > > >> 2. With a definition settled upon in #1, what steps, if any, do we
> > need
> > > to
> > > >> take to get from where we are to having *and keeping* that
> releasable
> > > >> trunk? Anything to codify there?
> > > >>
> > > >> 3. What are the benefits of having a releasable trunk as defined
> here?
> > > What
> > > >> are the costs? Is it worth pursuing? What are the alternatives (for
> > > >> instance: a freeze before a release + stabilization focus by the
> > > community
> > > >> i.e. 4.0 push or the tock in tick-tock)?
> > > >>
> > > >> Given the large volumes of work coming down the pike with CEP's,
> this
> > > seems
> > > >> like a good time to at least check in on this topic as a community.
> > > >>
> > > >> Full disclosure: running face-first into 60+ failing tests on trunk
> > when
> > > >> going through the commit process for denylisting this week brought
> > this
> > > >> topic back up for me (reminds me of when I went to merge CDC back in
> > 3.6
> > > >> and those test failures riled me up... I sense a pattern ;))
> > > >>
> > > >> Looking forward to hearing what people think.
> > > >>
> > > >> ~Josh
> > > >>
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > >
> > >
> >
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

To your point Jacek, I believe in the run up to 4.0 Ekaterina did some
analysis and something like 18% (correct me if I'm wrong here) of the test
failures we were considering "flaky tests" were actual product defects in
the database. With that in mind, we should be uncomfortable cutting a
release if we have 6 test failures since there's every likelihood one of
them is a surfaced bug.

ensuring our best practices are followed for every merge

I totally agree but I also don't think we have this codified (unless I'm
just completely missing something - very possible! ;)) Seems like we have
different circle configs, different sets of jobs being run, Harry / Hunter
(maybe?) / ?? run on some but not all commits and/or all branches,
manual performance testing on specific releases but nothing surfaced
formally to the project as a reproducible suite like we used to have years
ago (primitive though it was at the time with what it covered).

If we *don't* have this clarified right now, I think there's significant
value in enumerating and at least documenting what our agreed upon best
practices are so we can start holding ourselves and each other accountable
to that bar. Given some of the incredible but sweeping work coming down the
pike, this strikes me as a thing we need to be proactive and vigilant about
so as not to regress.

~Josh

On Tue, Nov 2, 2021 at 3:49 AM Jacek Lewandowski <
lewandowski.jacek@gmail.com> wrote:

> >
> > we already have a way to confirm flakiness on circle by running the test
> > repeatedly N times. Like 100 or 500. That has proven to work very well
> > so far, at least for me. #collaborating #justfyi
> >
>
> It does not prove that it is the test flakiness. It still can be a bug in
> the code which occurs intermittently under some rare conditions
>
>
> - - -- --- ----- -------- -------------
> Jacek Lewandowski
>
>
> On Tue, Nov 2, 2021 at 7:46 AM Berenguer Blasi <be...@gmail.com>
> wrote:
>
> > Hi,
> >
> > we already have a way to confirm flakiness on circle by running the test
> > repeatedly N times. Like 100 or 500. That has proven to work very well
> > so far, at least for me. #collaborating #justfyi
> >
> > On the 60+ failures it is not as bad as it looks. Let me explain. I have
> > been tracking failures in 4.0 and trunk daily, it's grown as a habit in
> > me after the 4.0 push. And 4.0 and trunk were hovering around <10
> > failures solidly (you can check jenkins ci graphs). The random bisect or
> > fix was needed leaving behind 3 or 4 tests that have defeated already 2
> > or 3 committers, so the really tough guys. I am reasonably convinced
> > once the 60+ failures fix merges we'll be back to the <10 failures with
> > relative little effort.
> >
> > So we're just in the middle of a 'fix' but overall we shouldn't be as
> > bad as it looks now as we've been quite good at keeping CI green-ish imo.
> >
> > Also +1 to releasable branches, which whatever we settle it means it is
> > not a wall of failures, bc of reasons explained like the hidden costs etc
> >
> > My 2cts.
> >
> > On 2/11/21 6:07, Jacek Lewandowski wrote:
> > >> I don’t think means guaranteeing there are no failing tests (though
> > >> ideally this would also happen), but about ensuring our best practices
> > are
> > >> followed for every merge. 4.0 took so long to release because of the
> > amount
> > >> of hidden work that was created by merging work that didn’t meet the
> > >> standard for release.
> > >>
> > > Tests are sometimes considered flaky because they fail intermittently
> but
> > > it may not be related to the insufficiently consistent test
> > implementation
> > > and can reveal some real problem in the production code. I saw that in
> > > various codebases and I think that it would be great if each such test
> > (or
> > > test group) was guaranteed to have a ticket and some preliminary
> analysis
> > > was done to confirm it is just a test problem before releasing the new
> > > version
> > >
> > > Historically we have also had significant pressure to backport features
> > to
> > >> earlier versions due to the cost and risk of upgrading. If we maintain
> > >> broader version compatibility for upgrade, and reduce the risk of
> > adopting
> > >> newer versions, then this pressure is also reduced significantly.
> Though
> > >> perhaps we will stick to our guns here anyway, as there seems to be
> > renewed
> > >> pressure to limit work in GA releases to bug fixes exclusively. It
> > remains
> > >> to be seen if this holds.
> > >
> > > Are there any precise requirements for supported upgrade and downgrade
> > > paths?
> > >
> > > Thanks
> > > - - -- --- ----- -------- -------------
> > > Jacek Lewandowski
> > >
> > >
> > > On Sat, Oct 30, 2021 at 4:07 PM benedict@apache.org <
> benedict@apache.org
> > >
> > > wrote:
> > >
> > >>> How do we define what "releasable trunk" means?
> > >> For me, the major criteria is ensuring that work is not merged that is
> > >> known to require follow-up work, or could reasonably have been known
> to
> > >> require follow-up work if better QA practices had been followed.
> > >>
> > >> So, a big part of this is ensuring we continue to exceed our targets
> for
> > >> improved QA. For me this means trying to weave tools like Harry and
> the
> > >> Simulator into our development workflow early on, but we’ll see how
> well
> > >> these tools gain broader adoption. This also means focus in general on
> > >> possible negative effects of a change.
> > >>
> > >> I think we could do with producing guidance documentation for how to
> > >> approach QA, where we can record our best practices and evolve them as
> > we
> > >> discover flaws or pitfalls, either for ergonomics or for bug
> discovery.
> > >>
> > >>> What are the benefits of having a releasable trunk as defined here?
> > >> If we want to have any hope of meeting reasonable release cadences
> _and_
> > >> the high project quality we expect today, then I think a ~shippable
> > trunk
> > >> policy is an absolute necessity.
> > >>
> > >> I don’t think means guaranteeing there are no failing tests (though
> > >> ideally this would also happen), but about ensuring our best practices
> > are
> > >> followed for every merge. 4.0 took so long to release because of the
> > amount
> > >> of hidden work that was created by merging work that didn’t meet the
> > >> standard for release.
> > >>
> > >> Historically we have also had significant pressure to backport
> features
> > to
> > >> earlier versions due to the cost and risk of upgrading. If we maintain
> > >> broader version compatibility for upgrade, and reduce the risk of
> > adopting
> > >> newer versions, then this pressure is also reduced significantly.
> Though
> > >> perhaps we will stick to our guns here anyway, as there seems to be
> > renewed
> > >> pressure to limit work in GA releases to bug fixes exclusively. It
> > remains
> > >> to be seen if this holds.
> > >>
> > >>> What are the costs?
> > >> I think the costs are quite low, perhaps even negative. Hidden work
> > >> produced by merges that break things can be much more costly than
> > getting
> > >> the work right first time, as attribution is much more challenging.
> > >>
> > >> One cost that is created, however, is for version compatibility as we
> > >> cannot say “well, this is a minor version bump so we don’t need to
> > support
> > >> downgrade”. But I think we should be investing in this anyway for
> > operator
> > >> simplicity and confidence, so I actually see this as a benefit as
> well.
> > >>
> > >>> Full disclosure: running face-first into 60+ failing tests on trunk
> > >> I have to apologise here. CircleCI did not uncover these problems,
> > >> apparently due to some way it resolves dependencies, and so I am
> > >> responsible for a significant number of these and have been quite sick
> > >> since.
> > >>
> > >> I think a push to eliminate flaky tests will probably help here in
> > future,
> > >> though, and perhaps the project needs to have some (low) threshold of
> > flaky
> > >> or failing tests at which point we block merges to force a correction.
> > >>
> > >>
> > >> From: Joshua McKenzie <jm...@apache.org>
> > >> Date: Saturday, 30 October 2021 at 14:00
> > >> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> > >> Subject: [DISCUSS] Releasable trunk and quality
> > >> We as a project have gone back and forth on the topic of quality and
> the
> > >> notion of a releasable trunk for quite a few years. If people are
> > >> interested, I'd like to rekindle this discussion a bit and see if
> we're
> > >> happy with where we are as a project or if we think there's steps we
> > should
> > >> take to change the quality bar going forward. The following questions
> > have
> > >> been rattling around for me for awhile:
> > >>
> > >> 1. How do we define what "releasable trunk" means? All reviewed by M
> > >> committers? Passing N% of tests? Passing all tests plus some other
> > metrics
> > >> (manual testing, raising the number of reviewers, test coverage, usage
> > in
> > >> dev or QA environments, etc)? Something else entirely?
> > >>
> > >> 2. With a definition settled upon in #1, what steps, if any, do we
> need
> > to
> > >> take to get from where we are to having *and keeping* that releasable
> > >> trunk? Anything to codify there?
> > >>
> > >> 3. What are the benefits of having a releasable trunk as defined here?
> > What
> > >> are the costs? Is it worth pursuing? What are the alternatives (for
> > >> instance: a freeze before a release + stabilization focus by the
> > community
> > >> i.e. 4.0 push or the tock in tick-tock)?
> > >>
> > >> Given the large volumes of work coming down the pike with CEP's, this
> > seems
> > >> like a good time to at least check in on this topic as a community.
> > >>
> > >> Full disclosure: running face-first into 60+ failing tests on trunk
> when
> > >> going through the commit process for denylisting this week brought
> this
> > >> topic back up for me (reminds me of when I went to merge CDC back in
> 3.6
> > >> and those test failures riled me up... I sense a pattern ;))
> > >>
> > >> Looking forward to hearing what people think.
> > >>
> > >> ~Josh
> > >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
> >
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Jacek Lewandowski <le...@gmail.com>.

>
> we already have a way to confirm flakiness on circle by running the test
> repeatedly N times. Like 100 or 500. That has proven to work very well
> so far, at least for me. #collaborating #justfyi
>

It does not prove that it is the test flakiness. It still can be a bug in
the code which occurs intermittently under some rare conditions


- - -- --- ----- -------- -------------
Jacek Lewandowski


On Tue, Nov 2, 2021 at 7:46 AM Berenguer Blasi <be...@gmail.com>
wrote:

> Hi,
>
> we already have a way to confirm flakiness on circle by running the test
> repeatedly N times. Like 100 or 500. That has proven to work very well
> so far, at least for me. #collaborating #justfyi
>
> On the 60+ failures it is not as bad as it looks. Let me explain. I have
> been tracking failures in 4.0 and trunk daily, it's grown as a habit in
> me after the 4.0 push. And 4.0 and trunk were hovering around <10
> failures solidly (you can check jenkins ci graphs). The random bisect or
> fix was needed leaving behind 3 or 4 tests that have defeated already 2
> or 3 committers, so the really tough guys. I am reasonably convinced
> once the 60+ failures fix merges we'll be back to the <10 failures with
> relative little effort.
>
> So we're just in the middle of a 'fix' but overall we shouldn't be as
> bad as it looks now as we've been quite good at keeping CI green-ish imo.
>
> Also +1 to releasable branches, which whatever we settle it means it is
> not a wall of failures, bc of reasons explained like the hidden costs etc
>
> My 2cts.
>
> On 2/11/21 6:07, Jacek Lewandowski wrote:
> >> I don’t think means guaranteeing there are no failing tests (though
> >> ideally this would also happen), but about ensuring our best practices
> are
> >> followed for every merge. 4.0 took so long to release because of the
> amount
> >> of hidden work that was created by merging work that didn’t meet the
> >> standard for release.
> >>
> > Tests are sometimes considered flaky because they fail intermittently but
> > it may not be related to the insufficiently consistent test
> implementation
> > and can reveal some real problem in the production code. I saw that in
> > various codebases and I think that it would be great if each such test
> (or
> > test group) was guaranteed to have a ticket and some preliminary analysis
> > was done to confirm it is just a test problem before releasing the new
> > version
> >
> > Historically we have also had significant pressure to backport features
> to
> >> earlier versions due to the cost and risk of upgrading. If we maintain
> >> broader version compatibility for upgrade, and reduce the risk of
> adopting
> >> newer versions, then this pressure is also reduced significantly. Though
> >> perhaps we will stick to our guns here anyway, as there seems to be
> renewed
> >> pressure to limit work in GA releases to bug fixes exclusively. It
> remains
> >> to be seen if this holds.
> >
> > Are there any precise requirements for supported upgrade and downgrade
> > paths?
> >
> > Thanks
> > - - -- --- ----- -------- -------------
> > Jacek Lewandowski
> >
> >
> > On Sat, Oct 30, 2021 at 4:07 PM benedict@apache.org <benedict@apache.org
> >
> > wrote:
> >
> >>> How do we define what "releasable trunk" means?
> >> For me, the major criteria is ensuring that work is not merged that is
> >> known to require follow-up work, or could reasonably have been known to
> >> require follow-up work if better QA practices had been followed.
> >>
> >> So, a big part of this is ensuring we continue to exceed our targets for
> >> improved QA. For me this means trying to weave tools like Harry and the
> >> Simulator into our development workflow early on, but we’ll see how well
> >> these tools gain broader adoption. This also means focus in general on
> >> possible negative effects of a change.
> >>
> >> I think we could do with producing guidance documentation for how to
> >> approach QA, where we can record our best practices and evolve them as
> we
> >> discover flaws or pitfalls, either for ergonomics or for bug discovery.
> >>
> >>> What are the benefits of having a releasable trunk as defined here?
> >> If we want to have any hope of meeting reasonable release cadences _and_
> >> the high project quality we expect today, then I think a ~shippable
> trunk
> >> policy is an absolute necessity.
> >>
> >> I don’t think means guaranteeing there are no failing tests (though
> >> ideally this would also happen), but about ensuring our best practices
> are
> >> followed for every merge. 4.0 took so long to release because of the
> amount
> >> of hidden work that was created by merging work that didn’t meet the
> >> standard for release.
> >>
> >> Historically we have also had significant pressure to backport features
> to
> >> earlier versions due to the cost and risk of upgrading. If we maintain
> >> broader version compatibility for upgrade, and reduce the risk of
> adopting
> >> newer versions, then this pressure is also reduced significantly. Though
> >> perhaps we will stick to our guns here anyway, as there seems to be
> renewed
> >> pressure to limit work in GA releases to bug fixes exclusively. It
> remains
> >> to be seen if this holds.
> >>
> >>> What are the costs?
> >> I think the costs are quite low, perhaps even negative. Hidden work
> >> produced by merges that break things can be much more costly than
> getting
> >> the work right first time, as attribution is much more challenging.
> >>
> >> One cost that is created, however, is for version compatibility as we
> >> cannot say “well, this is a minor version bump so we don’t need to
> support
> >> downgrade”. But I think we should be investing in this anyway for
> operator
> >> simplicity and confidence, so I actually see this as a benefit as well.
> >>
> >>> Full disclosure: running face-first into 60+ failing tests on trunk
> >> I have to apologise here. CircleCI did not uncover these problems,
> >> apparently due to some way it resolves dependencies, and so I am
> >> responsible for a significant number of these and have been quite sick
> >> since.
> >>
> >> I think a push to eliminate flaky tests will probably help here in
> future,
> >> though, and perhaps the project needs to have some (low) threshold of
> flaky
> >> or failing tests at which point we block merges to force a correction.
> >>
> >>
> >> From: Joshua McKenzie <jm...@apache.org>
> >> Date: Saturday, 30 October 2021 at 14:00
> >> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> >> Subject: [DISCUSS] Releasable trunk and quality
> >> We as a project have gone back and forth on the topic of quality and the
> >> notion of a releasable trunk for quite a few years. If people are
> >> interested, I'd like to rekindle this discussion a bit and see if we're
> >> happy with where we are as a project or if we think there's steps we
> should
> >> take to change the quality bar going forward. The following questions
> have
> >> been rattling around for me for awhile:
> >>
> >> 1. How do we define what "releasable trunk" means? All reviewed by M
> >> committers? Passing N% of tests? Passing all tests plus some other
> metrics
> >> (manual testing, raising the number of reviewers, test coverage, usage
> in
> >> dev or QA environments, etc)? Something else entirely?
> >>
> >> 2. With a definition settled upon in #1, what steps, if any, do we need
> to
> >> take to get from where we are to having *and keeping* that releasable
> >> trunk? Anything to codify there?
> >>
> >> 3. What are the benefits of having a releasable trunk as defined here?
> What
> >> are the costs? Is it worth pursuing? What are the alternatives (for
> >> instance: a freeze before a release + stabilization focus by the
> community
> >> i.e. 4.0 push or the tock in tick-tock)?
> >>
> >> Given the large volumes of work coming down the pike with CEP's, this
> seems
> >> like a good time to at least check in on this topic as a community.
> >>
> >> Full disclosure: running face-first into 60+ failing tests on trunk when
> >> going through the commit process for denylisting this week brought this
> >> topic back up for me (reminds me of when I went to merge CDC back in 3.6
> >> and those test failures riled me up... I sense a pattern ;))
> >>
> >> Looking forward to hearing what people think.
> >>
> >> ~Josh
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Berenguer Blasi <be...@gmail.com>.

Hi,

we already have a way to confirm flakiness on circle by running the test
repeatedly N times. Like 100 or 500. That has proven to work very well
so far, at least for me. #collaborating #justfyi

On the 60+ failures it is not as bad as it looks. Let me explain. I have
been tracking failures in 4.0 and trunk daily, it's grown as a habit in
me after the 4.0 push. And 4.0 and trunk were hovering around <10
failures solidly (you can check jenkins ci graphs). The random bisect or
fix was needed leaving behind 3 or 4 tests that have defeated already 2
or 3 committers, so the really tough guys. I am reasonably convinced
once the 60+ failures fix merges we'll be back to the <10 failures with
relative little effort.

So we're just in the middle of a 'fix' but overall we shouldn't be as
bad as it looks now as we've been quite good at keeping CI green-ish imo.

Also +1 to releasable branches, which whatever we settle it means it is
not a wall of failures, bc of reasons explained like the hidden costs etc

My 2cts.

On 2/11/21 6:07, Jacek Lewandowski wrote:
>> I don’t think means guaranteeing there are no failing tests (though
>> ideally this would also happen), but about ensuring our best practices are
>> followed for every merge. 4.0 took so long to release because of the amount
>> of hidden work that was created by merging work that didn’t meet the
>> standard for release.
>>
> Tests are sometimes considered flaky because they fail intermittently but
> it may not be related to the insufficiently consistent test implementation
> and can reveal some real problem in the production code. I saw that in
> various codebases and I think that it would be great if each such test (or
> test group) was guaranteed to have a ticket and some preliminary analysis
> was done to confirm it is just a test problem before releasing the new
> version
>
> Historically we have also had significant pressure to backport features to
>> earlier versions due to the cost and risk of upgrading. If we maintain
>> broader version compatibility for upgrade, and reduce the risk of adopting
>> newer versions, then this pressure is also reduced significantly. Though
>> perhaps we will stick to our guns here anyway, as there seems to be renewed
>> pressure to limit work in GA releases to bug fixes exclusively. It remains
>> to be seen if this holds.
>
> Are there any precise requirements for supported upgrade and downgrade
> paths?
>
> Thanks
> - - -- --- ----- -------- -------------
> Jacek Lewandowski
>
>
> On Sat, Oct 30, 2021 at 4:07 PM benedict@apache.org <be...@apache.org>
> wrote:
>
>>> How do we define what "releasable trunk" means?
>> For me, the major criteria is ensuring that work is not merged that is
>> known to require follow-up work, or could reasonably have been known to
>> require follow-up work if better QA practices had been followed.
>>
>> So, a big part of this is ensuring we continue to exceed our targets for
>> improved QA. For me this means trying to weave tools like Harry and the
>> Simulator into our development workflow early on, but we’ll see how well
>> these tools gain broader adoption. This also means focus in general on
>> possible negative effects of a change.
>>
>> I think we could do with producing guidance documentation for how to
>> approach QA, where we can record our best practices and evolve them as we
>> discover flaws or pitfalls, either for ergonomics or for bug discovery.
>>
>>> What are the benefits of having a releasable trunk as defined here?
>> If we want to have any hope of meeting reasonable release cadences _and_
>> the high project quality we expect today, then I think a ~shippable trunk
>> policy is an absolute necessity.
>>
>> I don’t think means guaranteeing there are no failing tests (though
>> ideally this would also happen), but about ensuring our best practices are
>> followed for every merge. 4.0 took so long to release because of the amount
>> of hidden work that was created by merging work that didn’t meet the
>> standard for release.
>>
>> Historically we have also had significant pressure to backport features to
>> earlier versions due to the cost and risk of upgrading. If we maintain
>> broader version compatibility for upgrade, and reduce the risk of adopting
>> newer versions, then this pressure is also reduced significantly. Though
>> perhaps we will stick to our guns here anyway, as there seems to be renewed
>> pressure to limit work in GA releases to bug fixes exclusively. It remains
>> to be seen if this holds.
>>
>>> What are the costs?
>> I think the costs are quite low, perhaps even negative. Hidden work
>> produced by merges that break things can be much more costly than getting
>> the work right first time, as attribution is much more challenging.
>>
>> One cost that is created, however, is for version compatibility as we
>> cannot say “well, this is a minor version bump so we don’t need to support
>> downgrade”. But I think we should be investing in this anyway for operator
>> simplicity and confidence, so I actually see this as a benefit as well.
>>
>>> Full disclosure: running face-first into 60+ failing tests on trunk
>> I have to apologise here. CircleCI did not uncover these problems,
>> apparently due to some way it resolves dependencies, and so I am
>> responsible for a significant number of these and have been quite sick
>> since.
>>
>> I think a push to eliminate flaky tests will probably help here in future,
>> though, and perhaps the project needs to have some (low) threshold of flaky
>> or failing tests at which point we block merges to force a correction.
>>
>>
>> From: Joshua McKenzie <jm...@apache.org>
>> Date: Saturday, 30 October 2021 at 14:00
>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>> Subject: [DISCUSS] Releasable trunk and quality
>> We as a project have gone back and forth on the topic of quality and the
>> notion of a releasable trunk for quite a few years. If people are
>> interested, I'd like to rekindle this discussion a bit and see if we're
>> happy with where we are as a project or if we think there's steps we should
>> take to change the quality bar going forward. The following questions have
>> been rattling around for me for awhile:
>>
>> 1. How do we define what "releasable trunk" means? All reviewed by M
>> committers? Passing N% of tests? Passing all tests plus some other metrics
>> (manual testing, raising the number of reviewers, test coverage, usage in
>> dev or QA environments, etc)? Something else entirely?
>>
>> 2. With a definition settled upon in #1, what steps, if any, do we need to
>> take to get from where we are to having *and keeping* that releasable
>> trunk? Anything to codify there?
>>
>> 3. What are the benefits of having a releasable trunk as defined here? What
>> are the costs? Is it worth pursuing? What are the alternatives (for
>> instance: a freeze before a release + stabilization focus by the community
>> i.e. 4.0 push or the tock in tick-tock)?
>>
>> Given the large volumes of work coming down the pike with CEP's, this seems
>> like a good time to at least check in on this topic as a community.
>>
>> Full disclosure: running face-first into 60+ failing tests on trunk when
>> going through the commit process for denylisting this week brought this
>> topic back up for me (reminds me of when I went to merge CDC back in 3.6
>> and those test failures riled me up... I sense a pattern ;))
>>
>> Looking forward to hearing what people think.
>>
>> ~Josh
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by Jacek Lewandowski <le...@gmail.com>.

>
> I don’t think means guaranteeing there are no failing tests (though
> ideally this would also happen), but about ensuring our best practices are
> followed for every merge. 4.0 took so long to release because of the amount
> of hidden work that was created by merging work that didn’t meet the
> standard for release.
>

Tests are sometimes considered flaky because they fail intermittently but
it may not be related to the insufficiently consistent test implementation
and can reveal some real problem in the production code. I saw that in
various codebases and I think that it would be great if each such test (or
test group) was guaranteed to have a ticket and some preliminary analysis
was done to confirm it is just a test problem before releasing the new
version

Historically we have also had significant pressure to backport features to
> earlier versions due to the cost and risk of upgrading. If we maintain
> broader version compatibility for upgrade, and reduce the risk of adopting
> newer versions, then this pressure is also reduced significantly. Though
> perhaps we will stick to our guns here anyway, as there seems to be renewed
> pressure to limit work in GA releases to bug fixes exclusively. It remains
> to be seen if this holds.


Are there any precise requirements for supported upgrade and downgrade
paths?

Thanks
- - -- --- ----- -------- -------------
Jacek Lewandowski


On Sat, Oct 30, 2021 at 4:07 PM benedict@apache.org <be...@apache.org>
wrote:

> > How do we define what "releasable trunk" means?
>
> For me, the major criteria is ensuring that work is not merged that is
> known to require follow-up work, or could reasonably have been known to
> require follow-up work if better QA practices had been followed.
>
> So, a big part of this is ensuring we continue to exceed our targets for
> improved QA. For me this means trying to weave tools like Harry and the
> Simulator into our development workflow early on, but we’ll see how well
> these tools gain broader adoption. This also means focus in general on
> possible negative effects of a change.
>
> I think we could do with producing guidance documentation for how to
> approach QA, where we can record our best practices and evolve them as we
> discover flaws or pitfalls, either for ergonomics or for bug discovery.
>
> > What are the benefits of having a releasable trunk as defined here?
>
> If we want to have any hope of meeting reasonable release cadences _and_
> the high project quality we expect today, then I think a ~shippable trunk
> policy is an absolute necessity.
>
> I don’t think means guaranteeing there are no failing tests (though
> ideally this would also happen), but about ensuring our best practices are
> followed for every merge. 4.0 took so long to release because of the amount
> of hidden work that was created by merging work that didn’t meet the
> standard for release.
>
> Historically we have also had significant pressure to backport features to
> earlier versions due to the cost and risk of upgrading. If we maintain
> broader version compatibility for upgrade, and reduce the risk of adopting
> newer versions, then this pressure is also reduced significantly. Though
> perhaps we will stick to our guns here anyway, as there seems to be renewed
> pressure to limit work in GA releases to bug fixes exclusively. It remains
> to be seen if this holds.
>
> > What are the costs?
>
> I think the costs are quite low, perhaps even negative. Hidden work
> produced by merges that break things can be much more costly than getting
> the work right first time, as attribution is much more challenging.
>
> One cost that is created, however, is for version compatibility as we
> cannot say “well, this is a minor version bump so we don’t need to support
> downgrade”. But I think we should be investing in this anyway for operator
> simplicity and confidence, so I actually see this as a benefit as well.
>
> > Full disclosure: running face-first into 60+ failing tests on trunk
>
> I have to apologise here. CircleCI did not uncover these problems,
> apparently due to some way it resolves dependencies, and so I am
> responsible for a significant number of these and have been quite sick
> since.
>
> I think a push to eliminate flaky tests will probably help here in future,
> though, and perhaps the project needs to have some (low) threshold of flaky
> or failing tests at which point we block merges to force a correction.
>
>
> From: Joshua McKenzie <jm...@apache.org>
> Date: Saturday, 30 October 2021 at 14:00
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: [DISCUSS] Releasable trunk and quality
> We as a project have gone back and forth on the topic of quality and the
> notion of a releasable trunk for quite a few years. If people are
> interested, I'd like to rekindle this discussion a bit and see if we're
> happy with where we are as a project or if we think there's steps we should
> take to change the quality bar going forward. The following questions have
> been rattling around for me for awhile:
>
> 1. How do we define what "releasable trunk" means? All reviewed by M
> committers? Passing N% of tests? Passing all tests plus some other metrics
> (manual testing, raising the number of reviewers, test coverage, usage in
> dev or QA environments, etc)? Something else entirely?
>
> 2. With a definition settled upon in #1, what steps, if any, do we need to
> take to get from where we are to having *and keeping* that releasable
> trunk? Anything to codify there?
>
> 3. What are the benefits of having a releasable trunk as defined here? What
> are the costs? Is it worth pursuing? What are the alternatives (for
> instance: a freeze before a release + stabilization focus by the community
> i.e. 4.0 push or the tock in tick-tock)?
>
> Given the large volumes of work coming down the pike with CEP's, this seems
> like a good time to at least check in on this topic as a community.
>
> Full disclosure: running face-first into 60+ failing tests on trunk when
> going through the commit process for denylisting this week brought this
> topic back up for me (reminds me of when I went to merge CDC back in 3.6
> and those test failures riled me up... I sense a pattern ;))
>
> Looking forward to hearing what people think.
>
> ~Josh
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Ekaterina Dimitrova <e....@gmail.com>.

Thank you Josh.

“I think it would be helpful if we always ran the repeated test jobs at
CircleCI when we add a new test or modify an existing one. Running those
jobs, when applicable, could be a requirement before committing. This
wouldn't help us when the changes affect many different tests or we are not
able to identify the tests affected by our changes, but I think it could
have prevented many of the recently fixed flakies.“

I think I would love also to see the verification with running new tests in
a loop before adding them to the code happening more often. It was
mentioned by a few of us in this discussion as a good method we already use
successfully so I just wanted to mention it again so it doesn’t slip out of
the list. :-)

Happy weekend everyone!

Best regards,
Ekaterina


On Fri, 5 Nov 2021 at 11:30, Joshua McKenzie <jm...@apache.org> wrote:

> To checkpoint this conversation and keep it going, the ideas I see
> in-thread (light editorializing by me):
> 1. Blocking PR merge on CI being green (viable for single branch commits,
> less so for multiple)
> 2. A change in our expected culture of "if you see something, fix
> something" when it comes to test failures on a branch (requires stable
> green test board to be viable)
> 3. Clearer merge criteria and potentially updates to circle config for
> committers in terms of "which test suites need to be run" (notably,
> including upgrade tests)
> 4. Integration of model and property based fuzz testing into the release
> qualification pipeline at least
> 5. Improvements in project dependency management, most notably in-jvm dtest
> API's, and the release process around that
>
> So a) Am I missing anything, and b) Am I getting anything wrong in the
> summary above?
>
> On Thu, Nov 4, 2021 at 9:01 AM Andrés de la Peña <ad...@apache.org>
> wrote:
>
> > Hi all,
> >
> > we already have a way to confirm flakiness on circle by running the test
> > > repeatedly N times. Like 100 or 500. That has proven to work very well
> > > so far, at least for me. #collaborating #justfyi
> >
> >
> > I think it would be helpful if we always ran the repeated test jobs at
> > CircleCI when we add a new test or modify an existing one. Running those
> > jobs, when applicable, could be a requirement before committing. This
> > wouldn't help us when the changes affect many different tests or we are
> not
> > able to identify the tests affected by our changes, but I think it could
> > have prevented many of the recently fixed flakies.
> >
> >
> > On Thu, 4 Nov 2021 at 12:24, Joshua McKenzie <jm...@apache.org>
> wrote:
> >
> > > >
> > > > we noticed CI going from a
> > > > steady 3-ish failures to many and it's getting fixed. So we're moving
> > in
> > > > the right direction imo.
> > > >
> > > An observation about this: there's tooling and technology widely in use
> > to
> > > help prevent ever getting into this state (to Benedict's point:
> blocking
> > > merge on CI failure, or nightly tests and reverting regression commits,
> > > etc). I think there's significant time and energy savings for us in
> using
> > > automation to be proactive about the quality of our test boards rather
> > than
> > > reactive.
> > >
> > > I 100% agree that it's heartening to see that the quality of the
> codebase
> > > is improving as is the discipline / attentiveness of our collective
> > > culture. That said, I believe we still have a pretty fragile system
> when
> > it
> > > comes to test failure accumulation.
> > >
> > > On Thu, Nov 4, 2021 at 2:46 AM Berenguer Blasi <
> berenguerblasi@gmail.com
> > >
> > > wrote:
> > >
> > > > I agree with David. CI has been pretty reliable besides the random
> > > > jenkins going down or timeout. The same 3 or 4 tests were the only
> > flaky
> > > > ones in jenkins and Circle was very green. I bisected a couple
> failures
> > > > to legit code errors, David is fixing some more, others have as well,
> > etc
> > > >
> > > > It is good news imo as we're just getting to learn our CI post 4.0 is
> > > > reliable and we need to start treating it as so and paying attention
> to
> > > > it's reports. Not perfect but reliable enough it would have prevented
> > > > those bugs getting merged.
> > > >
> > > > In fact we're having this conversation bc we noticed CI going from a
> > > > steady 3-ish failures to many and it's getting fixed. So we're moving
> > in
> > > > the right direction imo.
> > > >
> > > > On 3/11/21 19:25, David Capwell wrote:
> > > > >> It’s hard to gate commit on a clean CI run when there’s flaky
> tests
> > > > > I agree, this is also why so much effort was done in 4.0 release to
> > > > remove as much as possible.  Just over 1 month ago we were not really
> > > > having a flaky test issue (outside of the sporadic timeout issues; my
> > > > circle ci runs were green constantly), and now the “flaky tests” I
> see
> > > are
> > > > all actual bugs (been root causing 2 out of the 3 I reported) and
> some
> > > (not
> > > > all) of the flakyness was triggered by recent changes in the past
> > month.
> > > > >
> > > > > Right now people do not believe the failing test is caused by their
> > > > patch and attribute to flakiness, which then causes the builds to
> start
> > > > being flaky, which then leads to a different author coming to fix the
> > > > issue; this behavior is what I would love to see go away.  If we
> find a
> > > > flaky test, we should do the following
> > > > >
> > > > > 1) has it already been reported and who is working to fix?  Can we
> > > block
> > > > this patch on the test being fixed?  Flaky tests due to timing issues
> > > > normally are resolved very quickly, real bugs take longer.
> > > > > 2) if not reported, why?  If you are the first to see this issue
> than
> > > > good chance the patch caused the issue so should root cause.  If you
> > are
> > > > not the first to see it, why did others not report it (we tend to be
> > good
> > > > about this, even to the point Brandon has to mark the new tickets as
> > > dups…)?
> > > > >
> > > > > I have committed when there were flakiness, and I have caused
> > > flakiness;
> > > > not saying I am perfect or that I do the above, just saying that if
> we
> > > all
> > > > moved to the above model we could start relying on CI.  The biggest
> > > impact
> > > > to our stability is people actually root causing flaky tests.
> > > > >
> > > > >>  I think we're going to need a system that
> > > > >> understands the difference between success, failure, and timeouts
> > > > >
> > > > > I am curious how this system can know that the timeout is not an
> > actual
> > > > failure.  There was a bug in 4.0 with time serialization in message,
> > > which
> > > > would cause the message to get dropped; this presented itself as a
> > > timeout
> > > > if I remember properly (Jon Meredith or Yifan Cai fixed this bug I
> > > believe).
> > > > >
> > > > >> On Nov 3, 2021, at 10:56 AM, Brandon Williams <dr...@gmail.com>
> > > wrote:
> > > > >>
> > > > >> On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <
> > > > benedict@apache.org> wrote:
> > > > >>> The largest number of test failures turn out (as pointed out by
> > > David)
> > > > to be due to how arcane it was to trigger the full test suite.
> > Hopefully
> > > we
> > > > can get on top of that, but I think a significant remaining issue is
> a
> > > lack
> > > > of trust in the output of CI. It’s hard to gate commit on a clean CI
> > run
> > > > when there’s flaky tests, and it doesn’t take much to misattribute
> one
> > > > failing test to the existing flakiness (I tend to compare to a run of
> > the
> > > > trunk baseline for comparison, but this is burdensome and still error
> > > > prone). The more flaky tests there are the more likely this is.
> > > > >>>
> > > > >>> This is in my opinion the real cost of flaky tests, and it’s
> > probably
> > > > worth trying to crack down on them hard if we can. It’s possible the
> > > > Simulator may help here, when I finally finish it up, as we can port
> > > flaky
> > > > tests to run with the Simulator and the failing seed can then be
> > explored
> > > > deterministically (all being well).
> > > > >> I totally agree that the lack of trust is a driving problem here,
> > even
> > > > >> in knowing which CI system to rely on. When Jenkins broke but
> Circle
> > > > >> was fine, we all assumed it was a problem with Jenkins, right up
> > until
> > > > >> Circle also broke.
> > > > >>
> > > > >> In testing a distributed system like this I think we're always
> going
> > > > >> to have failures, even on non-flaky tests, simply because the
> > > > >> underlying infrastructure is variable with transient failures of
> its
> > > > >> own (the network is reliable!)  We can fix the flakies where the
> > fault
> > > > >> is in the code (and we've done this to many already) but to get
> more
> > > > >> trustworthy output, I think we're going to need a system that
> > > > >> understands the difference between success, failure, and timeouts,
> > and
> > > > >> in the latter case knows how to at least mark them differently.
> > > > >> Simulator may help, as do the in-jvm dtests, but there is
> ultimately
> > > > >> no way to cover everything without doing some things the hard,
> more
> > > > >> realistic way where sometimes shit happens, marring the
> > almost-perfect
> > > > >> runs with noisy doubt, which then has to be sifted through to
> > > > >> determine if there was a real issue.
> > > > >>
> > > > >>
> > ---------------------------------------------------------------------
> > > > >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > > >> For additional commands, e-mail: dev-help@cassandra.apache.org
> > > > >>
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

To checkpoint this conversation and keep it going, the ideas I see
in-thread (light editorializing by me):
1. Blocking PR merge on CI being green (viable for single branch commits,
less so for multiple)
2. A change in our expected culture of "if you see something, fix
something" when it comes to test failures on a branch (requires stable
green test board to be viable)
3. Clearer merge criteria and potentially updates to circle config for
committers in terms of "which test suites need to be run" (notably,
including upgrade tests)
4. Integration of model and property based fuzz testing into the release
qualification pipeline at least
5. Improvements in project dependency management, most notably in-jvm dtest
API's, and the release process around that

So a) Am I missing anything, and b) Am I getting anything wrong in the
summary above?

On Thu, Nov 4, 2021 at 9:01 AM Andrés de la Peña <ad...@apache.org>
wrote:

> Hi all,
>
> we already have a way to confirm flakiness on circle by running the test
> > repeatedly N times. Like 100 or 500. That has proven to work very well
> > so far, at least for me. #collaborating #justfyi
>
>
> I think it would be helpful if we always ran the repeated test jobs at
> CircleCI when we add a new test or modify an existing one. Running those
> jobs, when applicable, could be a requirement before committing. This
> wouldn't help us when the changes affect many different tests or we are not
> able to identify the tests affected by our changes, but I think it could
> have prevented many of the recently fixed flakies.
>
>
> On Thu, 4 Nov 2021 at 12:24, Joshua McKenzie <jm...@apache.org> wrote:
>
> > >
> > > we noticed CI going from a
> > > steady 3-ish failures to many and it's getting fixed. So we're moving
> in
> > > the right direction imo.
> > >
> > An observation about this: there's tooling and technology widely in use
> to
> > help prevent ever getting into this state (to Benedict's point: blocking
> > merge on CI failure, or nightly tests and reverting regression commits,
> > etc). I think there's significant time and energy savings for us in using
> > automation to be proactive about the quality of our test boards rather
> than
> > reactive.
> >
> > I 100% agree that it's heartening to see that the quality of the codebase
> > is improving as is the discipline / attentiveness of our collective
> > culture. That said, I believe we still have a pretty fragile system when
> it
> > comes to test failure accumulation.
> >
> > On Thu, Nov 4, 2021 at 2:46 AM Berenguer Blasi <berenguerblasi@gmail.com
> >
> > wrote:
> >
> > > I agree with David. CI has been pretty reliable besides the random
> > > jenkins going down or timeout. The same 3 or 4 tests were the only
> flaky
> > > ones in jenkins and Circle was very green. I bisected a couple failures
> > > to legit code errors, David is fixing some more, others have as well,
> etc
> > >
> > > It is good news imo as we're just getting to learn our CI post 4.0 is
> > > reliable and we need to start treating it as so and paying attention to
> > > it's reports. Not perfect but reliable enough it would have prevented
> > > those bugs getting merged.
> > >
> > > In fact we're having this conversation bc we noticed CI going from a
> > > steady 3-ish failures to many and it's getting fixed. So we're moving
> in
> > > the right direction imo.
> > >
> > > On 3/11/21 19:25, David Capwell wrote:
> > > >> It’s hard to gate commit on a clean CI run when there’s flaky tests
> > > > I agree, this is also why so much effort was done in 4.0 release to
> > > remove as much as possible.  Just over 1 month ago we were not really
> > > having a flaky test issue (outside of the sporadic timeout issues; my
> > > circle ci runs were green constantly), and now the “flaky tests” I see
> > are
> > > all actual bugs (been root causing 2 out of the 3 I reported) and some
> > (not
> > > all) of the flakyness was triggered by recent changes in the past
> month.
> > > >
> > > > Right now people do not believe the failing test is caused by their
> > > patch and attribute to flakiness, which then causes the builds to start
> > > being flaky, which then leads to a different author coming to fix the
> > > issue; this behavior is what I would love to see go away.  If we find a
> > > flaky test, we should do the following
> > > >
> > > > 1) has it already been reported and who is working to fix?  Can we
> > block
> > > this patch on the test being fixed?  Flaky tests due to timing issues
> > > normally are resolved very quickly, real bugs take longer.
> > > > 2) if not reported, why?  If you are the first to see this issue than
> > > good chance the patch caused the issue so should root cause.  If you
> are
> > > not the first to see it, why did others not report it (we tend to be
> good
> > > about this, even to the point Brandon has to mark the new tickets as
> > dups…)?
> > > >
> > > > I have committed when there were flakiness, and I have caused
> > flakiness;
> > > not saying I am perfect or that I do the above, just saying that if we
> > all
> > > moved to the above model we could start relying on CI.  The biggest
> > impact
> > > to our stability is people actually root causing flaky tests.
> > > >
> > > >>  I think we're going to need a system that
> > > >> understands the difference between success, failure, and timeouts
> > > >
> > > > I am curious how this system can know that the timeout is not an
> actual
> > > failure.  There was a bug in 4.0 with time serialization in message,
> > which
> > > would cause the message to get dropped; this presented itself as a
> > timeout
> > > if I remember properly (Jon Meredith or Yifan Cai fixed this bug I
> > believe).
> > > >
> > > >> On Nov 3, 2021, at 10:56 AM, Brandon Williams <dr...@gmail.com>
> > wrote:
> > > >>
> > > >> On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <
> > > benedict@apache.org> wrote:
> > > >>> The largest number of test failures turn out (as pointed out by
> > David)
> > > to be due to how arcane it was to trigger the full test suite.
> Hopefully
> > we
> > > can get on top of that, but I think a significant remaining issue is a
> > lack
> > > of trust in the output of CI. It’s hard to gate commit on a clean CI
> run
> > > when there’s flaky tests, and it doesn’t take much to misattribute one
> > > failing test to the existing flakiness (I tend to compare to a run of
> the
> > > trunk baseline for comparison, but this is burdensome and still error
> > > prone). The more flaky tests there are the more likely this is.
> > > >>>
> > > >>> This is in my opinion the real cost of flaky tests, and it’s
> probably
> > > worth trying to crack down on them hard if we can. It’s possible the
> > > Simulator may help here, when I finally finish it up, as we can port
> > flaky
> > > tests to run with the Simulator and the failing seed can then be
> explored
> > > deterministically (all being well).
> > > >> I totally agree that the lack of trust is a driving problem here,
> even
> > > >> in knowing which CI system to rely on. When Jenkins broke but Circle
> > > >> was fine, we all assumed it was a problem with Jenkins, right up
> until
> > > >> Circle also broke.
> > > >>
> > > >> In testing a distributed system like this I think we're always going
> > > >> to have failures, even on non-flaky tests, simply because the
> > > >> underlying infrastructure is variable with transient failures of its
> > > >> own (the network is reliable!)  We can fix the flakies where the
> fault
> > > >> is in the code (and we've done this to many already) but to get more
> > > >> trustworthy output, I think we're going to need a system that
> > > >> understands the difference between success, failure, and timeouts,
> and
> > > >> in the latter case knows how to at least mark them differently.
> > > >> Simulator may help, as do the in-jvm dtests, but there is ultimately
> > > >> no way to cover everything without doing some things the hard, more
> > > >> realistic way where sometimes shit happens, marring the
> almost-perfect
> > > >> runs with noisy doubt, which then has to be sifted through to
> > > >> determine if there was a real issue.
> > > >>
> > > >>
> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > >> For additional commands, e-mail: dev-help@cassandra.apache.org
> > > >>
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > >
> > >
> >
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Andrés de la Peña <ad...@apache.org>.

Hi all,

we already have a way to confirm flakiness on circle by running the test
> repeatedly N times. Like 100 or 500. That has proven to work very well
> so far, at least for me. #collaborating #justfyi


I think it would be helpful if we always ran the repeated test jobs at
CircleCI when we add a new test or modify an existing one. Running those
jobs, when applicable, could be a requirement before committing. This
wouldn't help us when the changes affect many different tests or we are not
able to identify the tests affected by our changes, but I think it could
have prevented many of the recently fixed flakies.


On Thu, 4 Nov 2021 at 12:24, Joshua McKenzie <jm...@apache.org> wrote:

> >
> > we noticed CI going from a
> > steady 3-ish failures to many and it's getting fixed. So we're moving in
> > the right direction imo.
> >
> An observation about this: there's tooling and technology widely in use to
> help prevent ever getting into this state (to Benedict's point: blocking
> merge on CI failure, or nightly tests and reverting regression commits,
> etc). I think there's significant time and energy savings for us in using
> automation to be proactive about the quality of our test boards rather than
> reactive.
>
> I 100% agree that it's heartening to see that the quality of the codebase
> is improving as is the discipline / attentiveness of our collective
> culture. That said, I believe we still have a pretty fragile system when it
> comes to test failure accumulation.
>
> On Thu, Nov 4, 2021 at 2:46 AM Berenguer Blasi <be...@gmail.com>
> wrote:
>
> > I agree with David. CI has been pretty reliable besides the random
> > jenkins going down or timeout. The same 3 or 4 tests were the only flaky
> > ones in jenkins and Circle was very green. I bisected a couple failures
> > to legit code errors, David is fixing some more, others have as well, etc
> >
> > It is good news imo as we're just getting to learn our CI post 4.0 is
> > reliable and we need to start treating it as so and paying attention to
> > it's reports. Not perfect but reliable enough it would have prevented
> > those bugs getting merged.
> >
> > In fact we're having this conversation bc we noticed CI going from a
> > steady 3-ish failures to many and it's getting fixed. So we're moving in
> > the right direction imo.
> >
> > On 3/11/21 19:25, David Capwell wrote:
> > >> It’s hard to gate commit on a clean CI run when there’s flaky tests
> > > I agree, this is also why so much effort was done in 4.0 release to
> > remove as much as possible.  Just over 1 month ago we were not really
> > having a flaky test issue (outside of the sporadic timeout issues; my
> > circle ci runs were green constantly), and now the “flaky tests” I see
> are
> > all actual bugs (been root causing 2 out of the 3 I reported) and some
> (not
> > all) of the flakyness was triggered by recent changes in the past month.
> > >
> > > Right now people do not believe the failing test is caused by their
> > patch and attribute to flakiness, which then causes the builds to start
> > being flaky, which then leads to a different author coming to fix the
> > issue; this behavior is what I would love to see go away.  If we find a
> > flaky test, we should do the following
> > >
> > > 1) has it already been reported and who is working to fix?  Can we
> block
> > this patch on the test being fixed?  Flaky tests due to timing issues
> > normally are resolved very quickly, real bugs take longer.
> > > 2) if not reported, why?  If you are the first to see this issue than
> > good chance the patch caused the issue so should root cause.  If you are
> > not the first to see it, why did others not report it (we tend to be good
> > about this, even to the point Brandon has to mark the new tickets as
> dups…)?
> > >
> > > I have committed when there were flakiness, and I have caused
> flakiness;
> > not saying I am perfect or that I do the above, just saying that if we
> all
> > moved to the above model we could start relying on CI.  The biggest
> impact
> > to our stability is people actually root causing flaky tests.
> > >
> > >>  I think we're going to need a system that
> > >> understands the difference between success, failure, and timeouts
> > >
> > > I am curious how this system can know that the timeout is not an actual
> > failure.  There was a bug in 4.0 with time serialization in message,
> which
> > would cause the message to get dropped; this presented itself as a
> timeout
> > if I remember properly (Jon Meredith or Yifan Cai fixed this bug I
> believe).
> > >
> > >> On Nov 3, 2021, at 10:56 AM, Brandon Williams <dr...@gmail.com>
> wrote:
> > >>
> > >> On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <
> > benedict@apache.org> wrote:
> > >>> The largest number of test failures turn out (as pointed out by
> David)
> > to be due to how arcane it was to trigger the full test suite. Hopefully
> we
> > can get on top of that, but I think a significant remaining issue is a
> lack
> > of trust in the output of CI. It’s hard to gate commit on a clean CI run
> > when there’s flaky tests, and it doesn’t take much to misattribute one
> > failing test to the existing flakiness (I tend to compare to a run of the
> > trunk baseline for comparison, but this is burdensome and still error
> > prone). The more flaky tests there are the more likely this is.
> > >>>
> > >>> This is in my opinion the real cost of flaky tests, and it’s probably
> > worth trying to crack down on them hard if we can. It’s possible the
> > Simulator may help here, when I finally finish it up, as we can port
> flaky
> > tests to run with the Simulator and the failing seed can then be explored
> > deterministically (all being well).
> > >> I totally agree that the lack of trust is a driving problem here, even
> > >> in knowing which CI system to rely on. When Jenkins broke but Circle
> > >> was fine, we all assumed it was a problem with Jenkins, right up until
> > >> Circle also broke.
> > >>
> > >> In testing a distributed system like this I think we're always going
> > >> to have failures, even on non-flaky tests, simply because the
> > >> underlying infrastructure is variable with transient failures of its
> > >> own (the network is reliable!)  We can fix the flakies where the fault
> > >> is in the code (and we've done this to many already) but to get more
> > >> trustworthy output, I think we're going to need a system that
> > >> understands the difference between success, failure, and timeouts, and
> > >> in the latter case knows how to at least mark them differently.
> > >> Simulator may help, as do the in-jvm dtests, but there is ultimately
> > >> no way to cover everything without doing some things the hard, more
> > >> realistic way where sometimes shit happens, marring the almost-perfect
> > >> runs with noisy doubt, which then has to be sifted through to
> > >> determine if there was a real issue.
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > >> For additional commands, e-mail: dev-help@cassandra.apache.org
> > >>
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
> >
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.

>
> we noticed CI going from a
> steady 3-ish failures to many and it's getting fixed. So we're moving in
> the right direction imo.
>
An observation about this: there's tooling and technology widely in use to
help prevent ever getting into this state (to Benedict's point: blocking
merge on CI failure, or nightly tests and reverting regression commits,
etc). I think there's significant time and energy savings for us in using
automation to be proactive about the quality of our test boards rather than
reactive.

I 100% agree that it's heartening to see that the quality of the codebase
is improving as is the discipline / attentiveness of our collective
culture. That said, I believe we still have a pretty fragile system when it
comes to test failure accumulation.

On Thu, Nov 4, 2021 at 2:46 AM Berenguer Blasi <be...@gmail.com>
wrote:

> I agree with David. CI has been pretty reliable besides the random
> jenkins going down or timeout. The same 3 or 4 tests were the only flaky
> ones in jenkins and Circle was very green. I bisected a couple failures
> to legit code errors, David is fixing some more, others have as well, etc
>
> It is good news imo as we're just getting to learn our CI post 4.0 is
> reliable and we need to start treating it as so and paying attention to
> it's reports. Not perfect but reliable enough it would have prevented
> those bugs getting merged.
>
> In fact we're having this conversation bc we noticed CI going from a
> steady 3-ish failures to many and it's getting fixed. So we're moving in
> the right direction imo.
>
> On 3/11/21 19:25, David Capwell wrote:
> >> It’s hard to gate commit on a clean CI run when there’s flaky tests
> > I agree, this is also why so much effort was done in 4.0 release to
> remove as much as possible.  Just over 1 month ago we were not really
> having a flaky test issue (outside of the sporadic timeout issues; my
> circle ci runs were green constantly), and now the “flaky tests” I see are
> all actual bugs (been root causing 2 out of the 3 I reported) and some (not
> all) of the flakyness was triggered by recent changes in the past month.
> >
> > Right now people do not believe the failing test is caused by their
> patch and attribute to flakiness, which then causes the builds to start
> being flaky, which then leads to a different author coming to fix the
> issue; this behavior is what I would love to see go away.  If we find a
> flaky test, we should do the following
> >
> > 1) has it already been reported and who is working to fix?  Can we block
> this patch on the test being fixed?  Flaky tests due to timing issues
> normally are resolved very quickly, real bugs take longer.
> > 2) if not reported, why?  If you are the first to see this issue than
> good chance the patch caused the issue so should root cause.  If you are
> not the first to see it, why did others not report it (we tend to be good
> about this, even to the point Brandon has to mark the new tickets as dups…)?
> >
> > I have committed when there were flakiness, and I have caused flakiness;
> not saying I am perfect or that I do the above, just saying that if we all
> moved to the above model we could start relying on CI.  The biggest impact
> to our stability is people actually root causing flaky tests.
> >
> >>  I think we're going to need a system that
> >> understands the difference between success, failure, and timeouts
> >
> > I am curious how this system can know that the timeout is not an actual
> failure.  There was a bug in 4.0 with time serialization in message, which
> would cause the message to get dropped; this presented itself as a timeout
> if I remember properly (Jon Meredith or Yifan Cai fixed this bug I believe).
> >
> >> On Nov 3, 2021, at 10:56 AM, Brandon Williams <dr...@gmail.com> wrote:
> >>
> >> On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <
> benedict@apache.org> wrote:
> >>> The largest number of test failures turn out (as pointed out by David)
> to be due to how arcane it was to trigger the full test suite. Hopefully we
> can get on top of that, but I think a significant remaining issue is a lack
> of trust in the output of CI. It’s hard to gate commit on a clean CI run
> when there’s flaky tests, and it doesn’t take much to misattribute one
> failing test to the existing flakiness (I tend to compare to a run of the
> trunk baseline for comparison, but this is burdensome and still error
> prone). The more flaky tests there are the more likely this is.
> >>>
> >>> This is in my opinion the real cost of flaky tests, and it’s probably
> worth trying to crack down on them hard if we can. It’s possible the
> Simulator may help here, when I finally finish it up, as we can port flaky
> tests to run with the Simulator and the failing seed can then be explored
> deterministically (all being well).
> >> I totally agree that the lack of trust is a driving problem here, even
> >> in knowing which CI system to rely on. When Jenkins broke but Circle
> >> was fine, we all assumed it was a problem with Jenkins, right up until
> >> Circle also broke.
> >>
> >> In testing a distributed system like this I think we're always going
> >> to have failures, even on non-flaky tests, simply because the
> >> underlying infrastructure is variable with transient failures of its
> >> own (the network is reliable!)  We can fix the flakies where the fault
> >> is in the code (and we've done this to many already) but to get more
> >> trustworthy output, I think we're going to need a system that
> >> understands the difference between success, failure, and timeouts, and
> >> in the latter case knows how to at least mark them differently.
> >> Simulator may help, as do the in-jvm dtests, but there is ultimately
> >> no way to cover everything without doing some things the hard, more
> >> realistic way where sometimes shit happens, marring the almost-perfect
> >> runs with noisy doubt, which then has to be sifted through to
> >> determine if there was a real issue.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Berenguer Blasi <be...@gmail.com>.

I agree with David. CI has been pretty reliable besides the random
jenkins going down or timeout. The same 3 or 4 tests were the only flaky
ones in jenkins and Circle was very green. I bisected a couple failures
to legit code errors, David is fixing some more, others have as well, etc

It is good news imo as we're just getting to learn our CI post 4.0 is
reliable and we need to start treating it as so and paying attention to
it's reports. Not perfect but reliable enough it would have prevented
those bugs getting merged.

In fact we're having this conversation bc we noticed CI going from a
steady 3-ish failures to many and it's getting fixed. So we're moving in
the right direction imo.

On 3/11/21 19:25, David Capwell wrote:
>> It’s hard to gate commit on a clean CI run when there’s flaky tests
> I agree, this is also why so much effort was done in 4.0 release to remove as much as possible.  Just over 1 month ago we were not really having a flaky test issue (outside of the sporadic timeout issues; my circle ci runs were green constantly), and now the “flaky tests” I see are all actual bugs (been root causing 2 out of the 3 I reported) and some (not all) of the flakyness was triggered by recent changes in the past month.
>
> Right now people do not believe the failing test is caused by their patch and attribute to flakiness, which then causes the builds to start being flaky, which then leads to a different author coming to fix the issue; this behavior is what I would love to see go away.  If we find a flaky test, we should do the following
>
> 1) has it already been reported and who is working to fix?  Can we block this patch on the test being fixed?  Flaky tests due to timing issues normally are resolved very quickly, real bugs take longer.
> 2) if not reported, why?  If you are the first to see this issue than good chance the patch caused the issue so should root cause.  If you are not the first to see it, why did others not report it (we tend to be good about this, even to the point Brandon has to mark the new tickets as dups…)?
>
> I have committed when there were flakiness, and I have caused flakiness; not saying I am perfect or that I do the above, just saying that if we all moved to the above model we could start relying on CI.  The biggest impact to our stability is people actually root causing flaky tests.
>
>>  I think we're going to need a system that
>> understands the difference between success, failure, and timeouts
>
> I am curious how this system can know that the timeout is not an actual failure.  There was a bug in 4.0 with time serialization in message, which would cause the message to get dropped; this presented itself as a timeout if I remember properly (Jon Meredith or Yifan Cai fixed this bug I believe).
>
>> On Nov 3, 2021, at 10:56 AM, Brandon Williams <dr...@gmail.com> wrote:
>>
>> On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <be...@apache.org> wrote:
>>> The largest number of test failures turn out (as pointed out by David) to be due to how arcane it was to trigger the full test suite. Hopefully we can get on top of that, but I think a significant remaining issue is a lack of trust in the output of CI. It’s hard to gate commit on a clean CI run when there’s flaky tests, and it doesn’t take much to misattribute one failing test to the existing flakiness (I tend to compare to a run of the trunk baseline for comparison, but this is burdensome and still error prone). The more flaky tests there are the more likely this is.
>>>
>>> This is in my opinion the real cost of flaky tests, and it’s probably worth trying to crack down on them hard if we can. It’s possible the Simulator may help here, when I finally finish it up, as we can port flaky tests to run with the Simulator and the failing seed can then be explored deterministically (all being well).
>> I totally agree that the lack of trust is a driving problem here, even
>> in knowing which CI system to rely on. When Jenkins broke but Circle
>> was fine, we all assumed it was a problem with Jenkins, right up until
>> Circle also broke.
>>
>> In testing a distributed system like this I think we're always going
>> to have failures, even on non-flaky tests, simply because the
>> underlying infrastructure is variable with transient failures of its
>> own (the network is reliable!)  We can fix the flakies where the fault
>> is in the code (and we've done this to many already) but to get more
>> trustworthy output, I think we're going to need a system that
>> understands the difference between success, failure, and timeouts, and
>> in the latter case knows how to at least mark them differently.
>> Simulator may help, as do the in-jvm dtests, but there is ultimately
>> no way to cover everything without doing some things the hard, more
>> realistic way where sometimes shit happens, marring the almost-perfect
>> runs with noisy doubt, which then has to be sifted through to
>> determine if there was a real issue.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by Brandon Williams <dr...@gmail.com>.

On Wed, Nov 3, 2021 at 1:26 PM David Capwell <dc...@apple.com.invalid> wrote:

> >  I think we're going to need a system that
> > understands the difference between success, failure, and timeouts
>
>
> I am curious how this system can know that the timeout is not an actual failure.  There was a bug in 4.0 with time serialization in message, which would cause the message to get dropped; this presented itself as a timeout if I remember properly (Jon Meredith or Yifan Cai fixed this bug I believe).

I don't think it needs to understand the cause of the timeout, just be
able to differentiate.  Of course some bugs present as timeouts so an
eye will need to be kept on that, but test history can make that
simple.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by David Capwell <dc...@apple.com.INVALID>.

> It’s hard to gate commit on a clean CI run when there’s flaky tests

I agree, this is also why so much effort was done in 4.0 release to remove as much as possible.  Just over 1 month ago we were not really having a flaky test issue (outside of the sporadic timeout issues; my circle ci runs were green constantly), and now the “flaky tests” I see are all actual bugs (been root causing 2 out of the 3 I reported) and some (not all) of the flakyness was triggered by recent changes in the past month.

Right now people do not believe the failing test is caused by their patch and attribute to flakiness, which then causes the builds to start being flaky, which then leads to a different author coming to fix the issue; this behavior is what I would love to see go away.  If we find a flaky test, we should do the following

1) has it already been reported and who is working to fix?  Can we block this patch on the test being fixed?  Flaky tests due to timing issues normally are resolved very quickly, real bugs take longer.
2) if not reported, why?  If you are the first to see this issue than good chance the patch caused the issue so should root cause.  If you are not the first to see it, why did others not report it (we tend to be good about this, even to the point Brandon has to mark the new tickets as dups…)?

I have committed when there were flakiness, and I have caused flakiness; not saying I am perfect or that I do the above, just saying that if we all moved to the above model we could start relying on CI.  The biggest impact to our stability is people actually root causing flaky tests.

>  I think we're going to need a system that
> understands the difference between success, failure, and timeouts

I am curious how this system can know that the timeout is not an actual failure.  There was a bug in 4.0 with time serialization in message, which would cause the message to get dropped; this presented itself as a timeout if I remember properly (Jon Meredith or Yifan Cai fixed this bug I believe).

> On Nov 3, 2021, at 10:56 AM, Brandon Williams <dr...@gmail.com> wrote:
> 
> On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <be...@apache.org> wrote:
>> 
>> The largest number of test failures turn out (as pointed out by David) to be due to how arcane it was to trigger the full test suite. Hopefully we can get on top of that, but I think a significant remaining issue is a lack of trust in the output of CI. It’s hard to gate commit on a clean CI run when there’s flaky tests, and it doesn’t take much to misattribute one failing test to the existing flakiness (I tend to compare to a run of the trunk baseline for comparison, but this is burdensome and still error prone). The more flaky tests there are the more likely this is.
>> 
>> This is in my opinion the real cost of flaky tests, and it’s probably worth trying to crack down on them hard if we can. It’s possible the Simulator may help here, when I finally finish it up, as we can port flaky tests to run with the Simulator and the failing seed can then be explored deterministically (all being well).
> 
> I totally agree that the lack of trust is a driving problem here, even
> in knowing which CI system to rely on. When Jenkins broke but Circle
> was fine, we all assumed it was a problem with Jenkins, right up until
> Circle also broke.
> 
> In testing a distributed system like this I think we're always going
> to have failures, even on non-flaky tests, simply because the
> underlying infrastructure is variable with transient failures of its
> own (the network is reliable!)  We can fix the flakies where the fault
> is in the code (and we've done this to many already) but to get more
> trustworthy output, I think we're going to need a system that
> understands the difference between success, failure, and timeouts, and
> in the latter case knows how to at least mark them differently.
> Simulator may help, as do the in-jvm dtests, but there is ultimately
> no way to cover everything without doing some things the hard, more
> realistic way where sometimes shit happens, marring the almost-perfect
> runs with noisy doubt, which then has to be sifted through to
> determine if there was a real issue.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by Brandon Williams <dr...@gmail.com>.

On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <be...@apache.org> wrote:
>
> The largest number of test failures turn out (as pointed out by David) to be due to how arcane it was to trigger the full test suite. Hopefully we can get on top of that, but I think a significant remaining issue is a lack of trust in the output of CI. It’s hard to gate commit on a clean CI run when there’s flaky tests, and it doesn’t take much to misattribute one failing test to the existing flakiness (I tend to compare to a run of the trunk baseline for comparison, but this is burdensome and still error prone). The more flaky tests there are the more likely this is.
>
> This is in my opinion the real cost of flaky tests, and it’s probably worth trying to crack down on them hard if we can. It’s possible the Simulator may help here, when I finally finish it up, as we can port flaky tests to run with the Simulator and the failing seed can then be explored deterministically (all being well).

I totally agree that the lack of trust is a driving problem here, even
in knowing which CI system to rely on. When Jenkins broke but Circle
was fine, we all assumed it was a problem with Jenkins, right up until
Circle also broke.

In testing a distributed system like this I think we're always going
to have failures, even on non-flaky tests, simply because the
underlying infrastructure is variable with transient failures of its
own (the network is reliable!)  We can fix the flakies where the fault
is in the code (and we've done this to many already) but to get more
trustworthy output, I think we're going to need a system that
understands the difference between success, failure, and timeouts, and
in the latter case knows how to at least mark them differently.
Simulator may help, as do the in-jvm dtests, but there is ultimately
no way to cover everything without doing some things the hard, more
realistic way where sometimes shit happens, marring the almost-perfect
runs with noisy doubt, which then has to be sifted through to
determine if there was a real issue.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.

The largest number of test failures turn out (as pointed out by David) to be due to how arcane it was to trigger the full test suite. Hopefully we can get on top of that, but I think a significant remaining issue is a lack of trust in the output of CI. It’s hard to gate commit on a clean CI run when there’s flaky tests, and it doesn’t take much to misattribute one failing test to the existing flakiness (I tend to compare to a run of the trunk baseline for comparison, but this is burdensome and still error prone). The more flaky tests there are the more likely this is.

This is in my opinion the real cost of flaky tests, and it’s probably worth trying to crack down on them hard if we can. It’s possible the Simulator may help here, when I finally finish it up, as we can port flaky tests to run with the Simulator and the failing seed can then be explored deterministically (all being well).

From: Brandon Williams <dr...@gmail.com>
Date: Wednesday, 3 November 2021 at 17:07
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Releasable trunk and quality
On Mon, Nov 1, 2021 at 5:03 PM David Capwell <dc...@apple.com.invalid> wrote:
>
> > How do we define what "releasable trunk" means?
>
> One thing I would love is for us to adopt a “run all tests needed to release before commit” mentality, and to link a successful run in JIRA when closing (we talked about this once in slack).  If we look at CircleCI we currently do not run all the tests needed to sign off; below are the tests disabled in the “pre-commit” workflows (see https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml#L381):

A good first step toward this would be for us to treat our binding +1s
more judiciously, and not grant any without at least a pre-commit CI
run linked in the ticket.  You don't have to look very hard to find a
lot of these today (I know I'm guilty), and it's possible we wouldn't
have the current CI mess now if we had been a little bit more
diligent.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by Brandon Williams <dr...@gmail.com>.

On Mon, Nov 1, 2021 at 5:03 PM David Capwell <dc...@apple.com.invalid> wrote:
>
> > How do we define what "releasable trunk" means?
>
> One thing I would love is for us to adopt a “run all tests needed to release before commit” mentality, and to link a successful run in JIRA when closing (we talked about this once in slack).  If we look at CircleCI we currently do not run all the tests needed to sign off; below are the tests disabled in the “pre-commit” workflows (see https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml#L381):

A good first step toward this would be for us to treat our binding +1s
more judiciously, and not grant any without at least a pre-commit CI
run linked in the ticket.  You don't have to look very hard to find a
lot of these today (I know I'm guilty), and it's possible we wouldn't
have the current CI mess now if we had been a little bit more
diligent.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by David Capwell <dc...@apple.com.INVALID>.

> I have to apologise here. CircleCI did not uncover these problems, apparently due to some way it resolves dependencies,

I double checked your CircleCI run for the trunk branch, and the problem doesn’t have to do with “resolves dependencies”, the problem lies with our CI being too complex and doesn’t natively support multi-branch commits.

Right now you need to opt-in to 2 builds to run the single jvm-dtest upgrade test build (missed in your CI); this should not be opt-in (see my previous comment about this), and it really shouldn’t be 2 approvals for a single build…
Enabling “upgrade tests” does not run all the upgrade tests… you need to approve 2 other builds to run the full set of upgrade tests (see problem above).  I see in the build you ran the upgrade tests, which only touches the python-dtest upgrade tests
Lastly, you need to hack the circleci configuration to support multi-branch CI, if you do not it will run against w/e is already committed to 2.2, 3.0, 3.11, and 4.0.  Multi-branch commits are very normal for our project, but doing CI properly in these cases is way too hard (you can not do multi-branch tests in Jenkins https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-test/build <https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-test/build>; no support to run against your other branches).

> On Nov 1, 2021, at 3:03 PM, David Capwell <dc...@apple.com.INVALID> wrote:
> 
>> How do we define what "releasable trunk" means?
> 
> One thing I would love is for us to adopt a “run all tests needed to release before commit” mentality, and to link a successful run in JIRA when closing (we talked about this once in slack).  If we look at CircleCI we currently do not run all the tests needed to sign off; below are the tests disabled in the “pre-commit” workflows (see https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml#L381):
> 
> start_utests_long
> start_utests_compression
> start_utests_stress
> start_utests_fqltool
> start_utests_system_keyspace_directory
> start_jvm_upgrade_dtest
> start_upgrade_tests
> 
> Given the configuration right now we have to opt-in to upgrade tests, but we can’t release if those are broken (same for compression/fqltool/cdc (not covered in circle)).
> 
>> On Oct 30, 2021, at 6:24 AM, benedict@apache.org wrote:
>> 
>>> How do we define what "releasable trunk" means?
>> 
>> For me, the major criteria is ensuring that work is not merged that is known to require follow-up work, or could reasonably have been known to require follow-up work if better QA practices had been followed.
>> 
>> So, a big part of this is ensuring we continue to exceed our targets for improved QA. For me this means trying to weave tools like Harry and the Simulator into our development workflow early on, but we’ll see how well these tools gain broader adoption. This also means focus in general on possible negative effects of a change.
>> 
>> I think we could do with producing guidance documentation for how to approach QA, where we can record our best practices and evolve them as we discover flaws or pitfalls, either for ergonomics or for bug discovery.
>> 
>>> What are the benefits of having a releasable trunk as defined here?
>> 
>> If we want to have any hope of meeting reasonable release cadences _and_ the high project quality we expect today, then I think a ~shippable trunk policy is an absolute necessity.
>> 
>> I don’t think means guaranteeing there are no failing tests (though ideally this would also happen), but about ensuring our best practices are followed for every merge. 4.0 took so long to release because of the amount of hidden work that was created by merging work that didn’t meet the standard for release.
>> 
>> Historically we have also had significant pressure to backport features to earlier versions due to the cost and risk of upgrading. If we maintain broader version compatibility for upgrade, and reduce the risk of adopting newer versions, then this pressure is also reduced significantly. Though perhaps we will stick to our guns here anyway, as there seems to be renewed pressure to limit work in GA releases to bug fixes exclusively. It remains to be seen if this holds.
>> 
>>> What are the costs?
>> 
>> I think the costs are quite low, perhaps even negative. Hidden work produced by merges that break things can be much more costly than getting the work right first time, as attribution is much more challenging.
>> 
>> One cost that is created, however, is for version compatibility as we cannot say “well, this is a minor version bump so we don’t need to support downgrade”. But I think we should be investing in this anyway for operator simplicity and confidence, so I actually see this as a benefit as well.
>> 
>>> Full disclosure: running face-first into 60+ failing tests on trunk
>> 
>> I have to apologise here. CircleCI did not uncover these problems, apparently due to some way it resolves dependencies, and so I am responsible for a significant number of these and have been quite sick since.
>> 
>> I think a push to eliminate flaky tests will probably help here in future, though, and perhaps the project needs to have some (low) threshold of flaky or failing tests at which point we block merges to force a correction.
>> 
>> 
>> From: Joshua McKenzie <jm...@apache.org>
>> Date: Saturday, 30 October 2021 at 14:00
>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>> Subject: [DISCUSS] Releasable trunk and quality
>> We as a project have gone back and forth on the topic of quality and the
>> notion of a releasable trunk for quite a few years. If people are
>> interested, I'd like to rekindle this discussion a bit and see if we're
>> happy with where we are as a project or if we think there's steps we should
>> take to change the quality bar going forward. The following questions have
>> been rattling around for me for awhile:
>> 
>> 1. How do we define what "releasable trunk" means? All reviewed by M
>> committers? Passing N% of tests? Passing all tests plus some other metrics
>> (manual testing, raising the number of reviewers, test coverage, usage in
>> dev or QA environments, etc)? Something else entirely?
>> 
>> 2. With a definition settled upon in #1, what steps, if any, do we need to
>> take to get from where we are to having *and keeping* that releasable
>> trunk? Anything to codify there?
>> 
>> 3. What are the benefits of having a releasable trunk as defined here? What
>> are the costs? Is it worth pursuing? What are the alternatives (for
>> instance: a freeze before a release + stabilization focus by the community
>> i.e. 4.0 push or the tock in tick-tock)?
>> 
>> Given the large volumes of work coming down the pike with CEP's, this seems
>> like a good time to at least check in on this topic as a community.
>> 
>> Full disclosure: running face-first into 60+ failing tests on trunk when
>> going through the commit process for denylisting this week brought this
>> topic back up for me (reminds me of when I went to merge CDC back in 3.6
>> and those test failures riled me up... I sense a pattern ;))
>> 
>> Looking forward to hearing what people think.
>> 
>> ~Josh
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

Re: [DISCUSS] Releasable trunk and quality

Posted by David Capwell <dc...@apple.com.INVALID>.

> How do we define what "releasable trunk" means?

One thing I would love is for us to adopt a “run all tests needed to release before commit” mentality, and to link a successful run in JIRA when closing (we talked about this once in slack).  If we look at CircleCI we currently do not run all the tests needed to sign off; below are the tests disabled in the “pre-commit” workflows (see https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml#L381):

start_utests_long
start_utests_compression
start_utests_stress
start_utests_fqltool
start_utests_system_keyspace_directory
start_jvm_upgrade_dtest
start_upgrade_tests

Given the configuration right now we have to opt-in to upgrade tests, but we can’t release if those are broken (same for compression/fqltool/cdc (not covered in circle)).

> On Oct 30, 2021, at 6:24 AM, benedict@apache.org wrote:
> 
>> How do we define what "releasable trunk" means?
> 
> For me, the major criteria is ensuring that work is not merged that is known to require follow-up work, or could reasonably have been known to require follow-up work if better QA practices had been followed.
> 
> So, a big part of this is ensuring we continue to exceed our targets for improved QA. For me this means trying to weave tools like Harry and the Simulator into our development workflow early on, but we’ll see how well these tools gain broader adoption. This also means focus in general on possible negative effects of a change.
> 
> I think we could do with producing guidance documentation for how to approach QA, where we can record our best practices and evolve them as we discover flaws or pitfalls, either for ergonomics or for bug discovery.
> 
>> What are the benefits of having a releasable trunk as defined here?
> 
> If we want to have any hope of meeting reasonable release cadences _and_ the high project quality we expect today, then I think a ~shippable trunk policy is an absolute necessity.
> 
> I don’t think means guaranteeing there are no failing tests (though ideally this would also happen), but about ensuring our best practices are followed for every merge. 4.0 took so long to release because of the amount of hidden work that was created by merging work that didn’t meet the standard for release.
> 
> Historically we have also had significant pressure to backport features to earlier versions due to the cost and risk of upgrading. If we maintain broader version compatibility for upgrade, and reduce the risk of adopting newer versions, then this pressure is also reduced significantly. Though perhaps we will stick to our guns here anyway, as there seems to be renewed pressure to limit work in GA releases to bug fixes exclusively. It remains to be seen if this holds.
> 
>> What are the costs?
> 
> I think the costs are quite low, perhaps even negative. Hidden work produced by merges that break things can be much more costly than getting the work right first time, as attribution is much more challenging.
> 
> One cost that is created, however, is for version compatibility as we cannot say “well, this is a minor version bump so we don’t need to support downgrade”. But I think we should be investing in this anyway for operator simplicity and confidence, so I actually see this as a benefit as well.
> 
>> Full disclosure: running face-first into 60+ failing tests on trunk
> 
> I have to apologise here. CircleCI did not uncover these problems, apparently due to some way it resolves dependencies, and so I am responsible for a significant number of these and have been quite sick since.
> 
> I think a push to eliminate flaky tests will probably help here in future, though, and perhaps the project needs to have some (low) threshold of flaky or failing tests at which point we block merges to force a correction.
> 
> 
> From: Joshua McKenzie <jm...@apache.org>
> Date: Saturday, 30 October 2021 at 14:00
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: [DISCUSS] Releasable trunk and quality
> We as a project have gone back and forth on the topic of quality and the
> notion of a releasable trunk for quite a few years. If people are
> interested, I'd like to rekindle this discussion a bit and see if we're
> happy with where we are as a project or if we think there's steps we should
> take to change the quality bar going forward. The following questions have
> been rattling around for me for awhile:
> 
> 1. How do we define what "releasable trunk" means? All reviewed by M
> committers? Passing N% of tests? Passing all tests plus some other metrics
> (manual testing, raising the number of reviewers, test coverage, usage in
> dev or QA environments, etc)? Something else entirely?
> 
> 2. With a definition settled upon in #1, what steps, if any, do we need to
> take to get from where we are to having *and keeping* that releasable
> trunk? Anything to codify there?
> 
> 3. What are the benefits of having a releasable trunk as defined here? What
> are the costs? Is it worth pursuing? What are the alternatives (for
> instance: a freeze before a release + stabilization focus by the community
> i.e. 4.0 push or the tock in tick-tock)?
> 
> Given the large volumes of work coming down the pike with CEP's, this seems
> like a good time to at least check in on this topic as a community.
> 
> Full disclosure: running face-first into 60+ failing tests on trunk when
> going through the commit process for denylisting this week brought this
> topic back up for me (reminds me of when I went to merge CDC back in 3.6
> and those test failures riled me up... I sense a pattern ;))
> 
> Looking forward to hearing what people think.
> 
> ~Josh


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.

> How do we define what "releasable trunk" means?

For me, the major criteria is ensuring that work is not merged that is known to require follow-up work, or could reasonably have been known to require follow-up work if better QA practices had been followed.

So, a big part of this is ensuring we continue to exceed our targets for improved QA. For me this means trying to weave tools like Harry and the Simulator into our development workflow early on, but we’ll see how well these tools gain broader adoption. This also means focus in general on possible negative effects of a change.

I think we could do with producing guidance documentation for how to approach QA, where we can record our best practices and evolve them as we discover flaws or pitfalls, either for ergonomics or for bug discovery.

> What are the benefits of having a releasable trunk as defined here?

If we want to have any hope of meeting reasonable release cadences _and_ the high project quality we expect today, then I think a ~shippable trunk policy is an absolute necessity.

I don’t think means guaranteeing there are no failing tests (though ideally this would also happen), but about ensuring our best practices are followed for every merge. 4.0 took so long to release because of the amount of hidden work that was created by merging work that didn’t meet the standard for release.

Historically we have also had significant pressure to backport features to earlier versions due to the cost and risk of upgrading. If we maintain broader version compatibility for upgrade, and reduce the risk of adopting newer versions, then this pressure is also reduced significantly. Though perhaps we will stick to our guns here anyway, as there seems to be renewed pressure to limit work in GA releases to bug fixes exclusively. It remains to be seen if this holds.

> What are the costs?

I think the costs are quite low, perhaps even negative. Hidden work produced by merges that break things can be much more costly than getting the work right first time, as attribution is much more challenging.

One cost that is created, however, is for version compatibility as we cannot say “well, this is a minor version bump so we don’t need to support downgrade”. But I think we should be investing in this anyway for operator simplicity and confidence, so I actually see this as a benefit as well.

> Full disclosure: running face-first into 60+ failing tests on trunk

I have to apologise here. CircleCI did not uncover these problems, apparently due to some way it resolves dependencies, and so I am responsible for a significant number of these and have been quite sick since.

I think a push to eliminate flaky tests will probably help here in future, though, and perhaps the project needs to have some (low) threshold of flaky or failing tests at which point we block merges to force a correction.


From: Joshua McKenzie <jm...@apache.org>
Date: Saturday, 30 October 2021 at 14:00
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: [DISCUSS] Releasable trunk and quality
We as a project have gone back and forth on the topic of quality and the
notion of a releasable trunk for quite a few years. If people are
interested, I'd like to rekindle this discussion a bit and see if we're
happy with where we are as a project or if we think there's steps we should
take to change the quality bar going forward. The following questions have
been rattling around for me for awhile:

1. How do we define what "releasable trunk" means? All reviewed by M
committers? Passing N% of tests? Passing all tests plus some other metrics
(manual testing, raising the number of reviewers, test coverage, usage in
dev or QA environments, etc)? Something else entirely?

2. With a definition settled upon in #1, what steps, if any, do we need to
take to get from where we are to having *and keeping* that releasable
trunk? Anything to codify there?

3. What are the benefits of having a releasable trunk as defined here? What
are the costs? Is it worth pursuing? What are the alternatives (for
instance: a freeze before a release + stabilization focus by the community
i.e. 4.0 push or the tock in tick-tock)?

Given the large volumes of work coming down the pike with CEP's, this seems
like a good time to at least check in on this topic as a community.

Full disclosure: running face-first into 60+ failing tests on trunk when
going through the commit process for denylisting this week brought this
topic back up for me (reminds me of when I went to merge CDC back in 3.6
and those test failures riled me up... I sense a pattern ;))

Looking forward to hearing what people think.

~Josh