Posted to dev@cassandra.apache.org by David Capwell <dc...@apple.com.INVALID> on 2021/11/01 22:03:15 UTC

Re: [DISCUSS] Releasable trunk and quality

> How do we define what "releasable trunk" means?

One thing I would love is for us to adopt a “run all tests needed to release before commit” mentality, and to link a successful run in JIRA when closing (we talked about this once in slack).  If we look at CircleCI we currently do not run all the tests needed to sign off; below are the tests disabled in the “pre-commit” workflows (see https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml#L381):

start_utests_long
start_utests_compression
start_utests_stress
start_utests_fqltool
start_utests_system_keyspace_directory
start_jvm_upgrade_dtest
start_upgrade_tests

Given the configuration right now we have to opt-in to upgrade tests, but we can’t release if those are broken (same for compression/fqltool/cdc (not covered in circle)).
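
For illustration, here is a minimal sketch of how one could surface which workflow jobs are gated behind a manual approval step, assuming PyYAML is available and that the opt-in jobs are modelled as "type: approval" entries in the generated CircleCI config (the path and details below are illustrative and may differ per branch):

    # Sketch: list CircleCI workflow jobs hidden behind manual approval gates.
    # Assumes PyYAML and a generated config at .circleci/config.yml (illustrative path).
    import yaml

    with open(".circleci/config.yml") as f:
        config = yaml.safe_load(f)

    for wf_name, wf in config.get("workflows", {}).items():
        if not isinstance(wf, dict):
            continue  # the workflows block also carries a "version" key
        jobs = [e for e in wf.get("jobs", []) if isinstance(e, dict)]
        approvals = {name for e in jobs for name, cfg in e.items()
                     if (cfg or {}).get("type") == "approval"}
        gated = sorted(name for e in jobs for name, cfg in e.items()
                       if name not in approvals
                       and set((cfg or {}).get("requires", [])) & approvals)
        if approvals:
            print(f"{wf_name}: approval gates {sorted(approvals)} block {gated}")

Anything this prints for the pre-commit workflow is effectively opt-in today, which is the gap described above.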

> On Oct 30, 2021, at 6:24 AM, benedict@apache.org wrote:
> 
>> How do we define what "releasable trunk" means?
> 
> For me, the major criteria is ensuring that work is not merged that is known to require follow-up work, or could reasonably have been known to require follow-up work if better QA practices had been followed.
> 
> So, a big part of this is ensuring we continue to exceed our targets for improved QA. For me this means trying to weave tools like Harry and the Simulator into our development workflow early on, but we’ll see how well these tools gain broader adoption. This also means focus in general on possible negative effects of a change.
> 
> I think we could do with producing guidance documentation for how to approach QA, where we can record our best practices and evolve them as we discover flaws or pitfalls, either for ergonomics or for bug discovery.
> 
>> What are the benefits of having a releasable trunk as defined here?
> 
> If we want to have any hope of meeting reasonable release cadences _and_ the high project quality we expect today, then I think a ~shippable trunk policy is an absolute necessity.
> 
> I don’t think this means guaranteeing there are no failing tests (though ideally this would also happen), but rather ensuring our best practices are followed for every merge. 4.0 took so long to release because of the amount of hidden work that was created by merging work that didn’t meet the standard for release.
> 
> Historically we have also had significant pressure to backport features to earlier versions due to the cost and risk of upgrading. If we maintain broader version compatibility for upgrade, and reduce the risk of adopting newer versions, then this pressure is also reduced significantly. Though perhaps we will stick to our guns here anyway, as there seems to be renewed pressure to limit work in GA releases to bug fixes exclusively. It remains to be seen if this holds.
> 
>> What are the costs?
> 
> I think the costs are quite low, perhaps even negative. Hidden work produced by merges that break things can be much more costly than getting the work right first time, as attribution is much more challenging.
> 
> One cost that is created, however, is for version compatibility as we cannot say “well, this is a minor version bump so we don’t need to support downgrade”. But I think we should be investing in this anyway for operator simplicity and confidence, so I actually see this as a benefit as well.
> 
>> Full disclosure: running face-first into 60+ failing tests on trunk
> 
> I have to apologise here. CircleCI did not uncover these problems, apparently due to some way it resolves dependencies, and so I am responsible for a significant number of these and have been quite sick since.
> 
> I think a push to eliminate flaky tests will probably help here in future, though, and perhaps the project needs to have some (low) threshold of flaky or failing tests at which point we block merges to force a correction.
> 
> 
> From: Joshua McKenzie <jm...@apache.org>
> Date: Saturday, 30 October 2021 at 14:00
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: [DISCUSS] Releasable trunk and quality
> We as a project have gone back and forth on the topic of quality and the
> notion of a releasable trunk for quite a few years. If people are
> interested, I'd like to rekindle this discussion a bit and see if we're
> happy with where we are as a project or if we think there's steps we should
> take to change the quality bar going forward. The following questions have
> been rattling around for me for awhile:
> 
> 1. How do we define what "releasable trunk" means? All reviewed by M
> committers? Passing N% of tests? Passing all tests plus some other metrics
> (manual testing, raising the number of reviewers, test coverage, usage in
> dev or QA environments, etc)? Something else entirely?
> 
> 2. With a definition settled upon in #1, what steps, if any, do we need to
> take to get from where we are to having *and keeping* that releasable
> trunk? Anything to codify there?
> 
> 3. What are the benefits of having a releasable trunk as defined here? What
> are the costs? Is it worth pursuing? What are the alternatives (for
> instance: a freeze before a release + stabilization focus by the community
> i.e. 4.0 push or the tock in tick-tock)?
> 
> Given the large volumes of work coming down the pike with CEP's, this seems
> like a good time to at least check in on this topic as a community.
> 
> Full disclosure: running face-first into 60+ failing tests on trunk when
> going through the commit process for denylisting this week brought this
> topic back up for me (reminds me of when I went to merge CDC back in 3.6
> and those test failures riled me up... I sense a pattern ;))
> 
> Looking forward to hearing what people think.
> 
> ~Josh


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: [DISCUSS] Releasable trunk and quality

Posted by Ekaterina Dimitrova <e....@gmail.com>.
Thank you Josh.

“I think it would be helpful if we always ran the repeated test jobs at
CircleCI when we add a new test or modify an existing one. Running those
jobs, when applicable, could be a requirement before committing. This
wouldn't help us when the changes affect many different tests or we are not
able to identify the tests affected by our changes, but I think it could
have prevented many of the recently fixed flakies.”

I would also love to see verification by running new tests in a loop,
before adding them to the code, happen more often. A few of us mentioned it
in this discussion as a good method we already use successfully, so I just
wanted to raise it again so it doesn’t slip off the list. :-)
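
As a rough illustration of the kind of loop meant here, a sketch that runs a
single test class repeatedly on a local checkout; the "ant testsome" target
and the class name are assumptions/placeholders, so adapt the command to
whatever your branch actually exposes (the repeated-run jobs at CircleCI
cover the same idea):

    # Sketch: run one test class N times locally to check for flakiness before committing.
    # The ant target and the default test class are placeholders; adapt to your branch.
    import subprocess
    import sys

    test_class = sys.argv[1] if len(sys.argv) > 1 else "org.apache.cassandra.SomePlaceholderTest"
    runs = int(sys.argv[2]) if len(sys.argv) > 2 else 100

    failures = 0
    for i in range(runs):
        result = subprocess.run(["ant", "testsome", f"-Dtest.name={test_class}"],
                                capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
            print(f"run {i + 1}/{runs}: FAILED")
    print(f"{failures}/{runs} runs failed")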

Happy weekend everyone!

Best regards,
Ekaterina


On Fri, 5 Nov 2021 at 11:30, Joshua McKenzie <jm...@apache.org> wrote:

> To checkpoint this conversation and keep it going, the ideas I see
> in-thread (light editorializing by me):
> 1. Blocking PR merge on CI being green (viable for single branch commits,
> less so for multiple)
> 2. A change in our expected culture of "if you see something, fix
> something" when it comes to test failures on a branch (requires stable
> green test board to be viable)
> 3. Clearer merge criteria and potentially updates to circle config for
> committers in terms of "which test suites need to be run" (notably,
> including upgrade tests)
> 4. Integration of model and property based fuzz testing into the release
> qualification pipeline at least
> 5. Improvements in project dependency management, most notably in-jvm dtest
> API's, and the release process around that
>
> So a) Am I missing anything, and b) Am I getting anything wrong in the
> summary above?
>
> On Thu, Nov 4, 2021 at 9:01 AM Andrés de la Peña <ad...@apache.org>
> wrote:
>
> > Hi all,
> >
> > we already have a way to confirm flakiness on circle by running the test
> > > repeatedly N times. Like 100 or 500. That has proven to work very well
> > > so far, at least for me. #collaborating #justfyi
> >
> >
> > I think it would be helpful if we always ran the repeated test jobs at
> > CircleCI when we add a new test or modify an existing one. Running those
> > jobs, when applicable, could be a requirement before committing. This
> > wouldn't help us when the changes affect many different tests or we are
> not
> > able to identify the tests affected by our changes, but I think it could
> > have prevented many of the recently fixed flakies.
> >
> >
> > On Thu, 4 Nov 2021 at 12:24, Joshua McKenzie <jm...@apache.org>
> wrote:
> >
> > > >
> > > > we noticed CI going from a
> > > > steady 3-ish failures to many and it's getting fixed. So we're moving
> > in
> > > > the right direction imo.
> > > >
> > > An observation about this: there's tooling and technology widely in use
> > to
> > > help prevent ever getting into this state (to Benedict's point:
> blocking
> > > merge on CI failure, or nightly tests and reverting regression commits,
> > > etc). I think there's significant time and energy savings for us in
> using
> > > automation to be proactive about the quality of our test boards rather
> > than
> > > reactive.
> > >
> > > I 100% agree that it's heartening to see that the quality of the
> codebase
> > > is improving as is the discipline / attentiveness of our collective
> > > culture. That said, I believe we still have a pretty fragile system
> when
> > it
> > > comes to test failure accumulation.
> > >
> > > On Thu, Nov 4, 2021 at 2:46 AM Berenguer Blasi <
> berenguerblasi@gmail.com
> > >
> > > wrote:
> > >
> > > > I agree with David. CI has been pretty reliable besides the random
> > > > jenkins going down or timeout. The same 3 or 4 tests were the only
> > flaky
> > > > ones in jenkins and Circle was very green. I bisected a couple
> failures
> > > > to legit code errors, David is fixing some more, others have as well,
> > etc
> > > >
> > > > It is good news imo as we're just getting to learn our CI post 4.0 is
> > > > reliable and we need to start treating it as such and paying attention
> to
> > > > its reports. Not perfect but reliable enough it would have prevented
> > > > those bugs getting merged.
> > > >
> > > > In fact we're having this conversation bc we noticed CI going from a
> > > > steady 3-ish failures to many and it's getting fixed. So we're moving
> > in
> > > > the right direction imo.
> > > >
> > > > On 3/11/21 19:25, David Capwell wrote:
> > > > >> It’s hard to gate commit on a clean CI run when there’s flaky
> tests
> > > > > I agree, this is also why so much effort was done in 4.0 release to
> > > > remove as much as possible.  Just over 1 month ago we were not really
> > > > having a flaky test issue (outside of the sporadic timeout issues; my
> > > > circle ci runs were green constantly), and now the “flaky tests” I
> see
> > > are
> > > > all actual bugs (been root causing 2 out of the 3 I reported) and
> some
> > > (not
> > > > all) of the flakiness was triggered by recent changes in the past
> > month.
> > > > >
> > > > > Right now people do not believe the failing test is caused by their
> > > > patch and attribute to flakiness, which then causes the builds to
> start
> > > > being flaky, which then leads to a different author coming to fix the
> > > > issue; this behavior is what I would love to see go away.  If we
> find a
> > > > flaky test, we should do the following
> > > > >
> > > > > 1) has it already been reported and who is working to fix?  Can we
> > > block
> > > > this patch on the test being fixed?  Flaky tests due to timing issues
> > > > normally are resolved very quickly, real bugs take longer.
> > > > > 2) if not reported, why?  If you are the first to see this issue
> then
> > > > there’s a good chance the patch caused the issue, so you should root cause it.  If you
> > are
> > > > not the first to see it, why did others not report it (we tend to be
> > good
> > > > about this, even to the point Brandon has to mark the new tickets as
> > > dups…)?
> > > > >
> > > > > I have committed when there was flakiness, and I have caused
> > > flakiness;
> > > > not saying I am perfect or that I do the above, just saying that if
> we
> > > all
> > > > moved to the above model we could start relying on CI.  The biggest
> > > impact
> > > > to our stability is people actually root causing flaky tests.
> > > > >
> > > > >>  I think we're going to need a system that
> > > > >> understands the difference between success, failure, and timeouts
> > > > >
> > > > > I am curious how this system can know that the timeout is not an
> > actual
> > > > failure.  There was a bug in 4.0 with time serialization in message,
> > > which
> > > > would cause the message to get dropped; this presented itself as a
> > > timeout
> > > > if I remember properly (Jon Meredith or Yifan Cai fixed this bug I
> > > believe).
> > > > >
> > > > >> On Nov 3, 2021, at 10:56 AM, Brandon Williams <dr...@gmail.com>
> > > wrote:
> > > > >>
> > > > >> On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <
> > > > benedict@apache.org> wrote:
> > > > >>> The largest number of test failures turn out (as pointed out by
> > > David)
> > > > to be due to how arcane it was to trigger the full test suite.
> > Hopefully
> > > we
> > > > can get on top of that, but I think a significant remaining issue is
> a
> > > lack
> > > > of trust in the output of CI. It’s hard to gate commit on a clean CI
> > run
> > > > when there’s flaky tests, and it doesn’t take much to misattribute
> one
> > > > failing test to the existing flakiness (I tend to compare to a run of
> > the
> > > > trunk baseline for comparison, but this is burdensome and still error
> > > > prone). The more flaky tests there are the more likely this is.
> > > > >>>
> > > > >>> This is in my opinion the real cost of flaky tests, and it’s
> > probably
> > > > worth trying to crack down on them hard if we can. It’s possible the
> > > > Simulator may help here, when I finally finish it up, as we can port
> > > flaky
> > > > tests to run with the Simulator and the failing seed can then be
> > explored
> > > > deterministically (all being well).
> > > > >> I totally agree that the lack of trust is a driving problem here,
> > even
> > > > >> in knowing which CI system to rely on. When Jenkins broke but
> Circle
> > > > >> was fine, we all assumed it was a problem with Jenkins, right up
> > until
> > > > >> Circle also broke.
> > > > >>
> > > > >> In testing a distributed system like this I think we're always
> going
> > > > >> to have failures, even on non-flaky tests, simply because the
> > > > >> underlying infrastructure is variable with transient failures of
> its
> > > > >> own (the network is reliable!)  We can fix the flakies where the
> > fault
> > > > >> is in the code (and we've done this to many already) but to get
> more
> > > > >> trustworthy output, I think we're going to need a system that
> > > > >> understands the difference between success, failure, and timeouts,
> > and
> > > > >> in the latter case knows how to at least mark them differently.
> > > > >> Simulator may help, as do the in-jvm dtests, but there is
> ultimately
> > > > >> no way to cover everything without doing some things the hard,
> more
> > > > >> realistic way where sometimes shit happens, marring the
> > almost-perfect
> > > > >> runs with noisy doubt, which then has to be sifted through to
> > > > >> determine if there was a real issue.
> > > > >>
> > > > >>
> > ---------------------------------------------------------------------
> > > > >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > > >> For additional commands, e-mail: dev-help@cassandra.apache.org
> > > > >>
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.
To checkpoint this conversation and keep it going, the ideas I see
in-thread (light editorializing by me):
1. Blocking PR merge on CI being green (viable for single branch commits,
less so for multiple)
2. A change in our expected culture of "if you see something, fix
something" when it comes to test failures on a branch (requires stable
green test board to be viable)
3. Clearer merge criteria and potentially updates to circle config for
committers in terms of "which test suites need to be run" (notably,
including upgrade tests)
4. Integration of model and property based fuzz testing into the release
qualification pipeline at least
5. Improvements in project dependency management, most notably in-jvm dtest
API's, and the release process around that

So a) Am I missing anything, and b) Am I getting anything wrong in the
summary above?

On Thu, Nov 4, 2021 at 9:01 AM Andrés de la Peña <ad...@apache.org>
wrote:

> Hi all,
>
> we already have a way to confirm flakiness on circle by running the test
> > repeatedly N times. Like 100 or 500. That has proven to work very well
> > so far, at least for me. #collaborating #justfyi
>
>
> I think it would be helpful if we always ran the repeated test jobs at
> CircleCI when we add a new test or modify an existing one. Running those
> jobs, when applicable, could be a requirement before committing. This
> wouldn't help us when the changes affect many different tests or we are not
> able to identify the tests affected by our changes, but I think it could
> have prevented many of the recently fixed flakies.
>
>
> On Thu, 4 Nov 2021 at 12:24, Joshua McKenzie <jm...@apache.org> wrote:
>
> > >
> > > we noticed CI going from a
> > > steady 3-ish failures to many and it's getting fixed. So we're moving
> in
> > > the right direction imo.
> > >
> > An observation about this: there's tooling and technology widely in use
> to
> > help prevent ever getting into this state (to Benedict's point: blocking
> > merge on CI failure, or nightly tests and reverting regression commits,
> > etc). I think there's significant time and energy savings for us in using
> > automation to be proactive about the quality of our test boards rather
> than
> > reactive.
> >
> > I 100% agree that it's heartening to see that the quality of the codebase
> > is improving as is the discipline / attentiveness of our collective
> > culture. That said, I believe we still have a pretty fragile system when
> it
> > comes to test failure accumulation.
> >
> > On Thu, Nov 4, 2021 at 2:46 AM Berenguer Blasi <berenguerblasi@gmail.com
> >
> > wrote:
> >
> > > I agree with David. CI has been pretty reliable besides the random
> > > jenkins going down or timeout. The same 3 or 4 tests were the only
> flaky
> > > ones in jenkins and Circle was very green. I bisected a couple failures
> > > to legit code errors, David is fixing some more, others have as well,
> etc
> > >
> > > It is good news imo as we're just getting to learn our CI post 4.0 is
> > > reliable and we need to start treating it as such and paying attention to
> > > its reports. Not perfect but reliable enough it would have prevented
> > > those bugs getting merged.
> > >
> > > In fact we're having this conversation bc we noticed CI going from a
> > > steady 3-ish failures to many and it's getting fixed. So we're moving
> in
> > > the right direction imo.
> > >
> > > On 3/11/21 19:25, David Capwell wrote:
> > > >> It’s hard to gate commit on a clean CI run when there’s flaky tests
> > > > I agree, this is also why so much effort was done in 4.0 release to
> > > remove as much as possible.  Just over 1 month ago we were not really
> > > having a flaky test issue (outside of the sporadic timeout issues; my
> > > circle ci runs were green constantly), and now the “flaky tests” I see
> > are
> > > all actual bugs (been root causing 2 out of the 3 I reported) and some
> > (not
> > > all) of the flakiness was triggered by recent changes in the past
> month.
> > > >
> > > > Right now people do not believe the failing test is caused by their
> > > patch and attribute to flakiness, which then causes the builds to start
> > > being flaky, which then leads to a different author coming to fix the
> > > issue; this behavior is what I would love to see go away.  If we find a
> > > flaky test, we should do the following
> > > >
> > > > 1) has it already been reported and who is working to fix?  Can we
> > block
> > > this patch on the test being fixed?  Flaky tests due to timing issues
> > > normally are resolved very quickly, real bugs take longer.
> > > > 2) if not reported, why?  If you are the first to see this issue, then
> > > there’s a good chance the patch caused the issue, so you should root cause it.  If you
> are
> > > not the first to see it, why did others not report it (we tend to be
> good
> > > about this, even to the point Brandon has to mark the new tickets as
> > dups…)?
> > > >
> > > > I have committed when there was flakiness, and I have caused
> > flakiness;
> > > not saying I am perfect or that I do the above, just saying that if we
> > all
> > > moved to the above model we could start relying on CI.  The biggest
> > impact
> > > to our stability is people actually root causing flaky tests.
> > > >
> > > >>  I think we're going to need a system that
> > > >> understands the difference between success, failure, and timeouts
> > > >
> > > > I am curious how this system can know that the timeout is not an
> actual
> > > failure.  There was a bug in 4.0 with time serialization in message,
> > which
> > > would cause the message to get dropped; this presented itself as a
> > timeout
> > > if I remember properly (Jon Meredith or Yifan Cai fixed this bug I
> > believe).
> > > >
> > > >> On Nov 3, 2021, at 10:56 AM, Brandon Williams <dr...@gmail.com>
> > wrote:
> > > >>
> > > >> On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <
> > > benedict@apache.org> wrote:
> > > >>> The largest number of test failures turn out (as pointed out by
> > David)
> > > to be due to how arcane it was to trigger the full test suite.
> Hopefully
> > we
> > > can get on top of that, but I think a significant remaining issue is a
> > lack
> > > of trust in the output of CI. It’s hard to gate commit on a clean CI
> run
> > > when there’s flaky tests, and it doesn’t take much to misattribute one
> > > failing test to the existing flakiness (I tend to compare to a run of
> the
> > > trunk baseline for comparison, but this is burdensome and still error
> > > prone). The more flaky tests there are the more likely this is.
> > > >>>
> > > >>> This is in my opinion the real cost of flaky tests, and it’s
> probably
> > > worth trying to crack down on them hard if we can. It’s possible the
> > > Simulator may help here, when I finally finish it up, as we can port
> > flaky
> > > tests to run with the Simulator and the failing seed can then be
> explored
> > > deterministically (all being well).
> > > >> I totally agree that the lack of trust is a driving problem here,
> even
> > > >> in knowing which CI system to rely on. When Jenkins broke but Circle
> > > >> was fine, we all assumed it was a problem with Jenkins, right up
> until
> > > >> Circle also broke.
> > > >>
> > > >> In testing a distributed system like this I think we're always going
> > > >> to have failures, even on non-flaky tests, simply because the
> > > >> underlying infrastructure is variable with transient failures of its
> > > >> own (the network is reliable!)  We can fix the flakies where the
> fault
> > > >> is in the code (and we've done this to many already) but to get more
> > > >> trustworthy output, I think we're going to need a system that
> > > >> understands the difference between success, failure, and timeouts,
> and
> > > >> in the latter case knows how to at least mark them differently.
> > > >> Simulator may help, as do the in-jvm dtests, but there is ultimately
> > > >> no way to cover everything without doing some things the hard, more
> > > >> realistic way where sometimes shit happens, marring the
> almost-perfect
> > > >> runs with noisy doubt, which then has to be sifted through to
> > > >> determine if there was a real issue.
> > > >>
> > > >>
> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > >> For additional commands, e-mail: dev-help@cassandra.apache.org
> > > >>
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > >
> > >
> >
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Andrés de la Peña <ad...@apache.org>.
Hi all,

we already have a way to confirm flakiness on circle by running the test
> repeatedly N times. Like 100 or 500. That has proven to work very well
> so far, at least for me. #collaborating #justfyi


I think it would be helpful if we always ran the repeated test jobs at
CircleCI when we add a new test or modify an existing one. Running those
jobs, when applicable, could be a requirement before committing. This
wouldn't help us when the changes affect many different tests or we are not
able to identify the tests affected by our changes, but I think it could
have prevented many of the recently fixed flakies.


On Thu, 4 Nov 2021 at 12:24, Joshua McKenzie <jm...@apache.org> wrote:

> >
> > we noticed CI going from a
> > steady 3-ish failures to many and it's getting fixed. So we're moving in
> > the right direction imo.
> >
> An observation about this: there's tooling and technology widely in use to
> help prevent ever getting into this state (to Benedict's point: blocking
> merge on CI failure, or nightly tests and reverting regression commits,
> etc). I think there's significant time and energy savings for us in using
> automation to be proactive about the quality of our test boards rather than
> reactive.
>
> I 100% agree that it's heartening to see that the quality of the codebase
> is improving as is the discipline / attentiveness of our collective
> culture. That said, I believe we still have a pretty fragile system when it
> comes to test failure accumulation.
>
> On Thu, Nov 4, 2021 at 2:46 AM Berenguer Blasi <be...@gmail.com>
> wrote:
>
> > I agree with David. CI has been pretty reliable besides the random
> > jenkins going down or timeout. The same 3 or 4 tests were the only flaky
> > ones in jenkins and Circle was very green. I bisected a couple failures
> > to legit code errors, David is fixing some more, others have as well, etc
> >
> > It is good news imo as we're just getting to learn our CI post 4.0 is
> > reliable and we need to start treating it as such and paying attention to
> > its reports. Not perfect but reliable enough it would have prevented
> > those bugs getting merged.
> >
> > In fact we're having this conversation bc we noticed CI going from a
> > steady 3-ish failures to many and it's getting fixed. So we're moving in
> > the right direction imo.
> >
> > On 3/11/21 19:25, David Capwell wrote:
> > >> It’s hard to gate commit on a clean CI run when there’s flaky tests
> > > I agree, this is also why so much effort was done in 4.0 release to
> > remove as much as possible.  Just over 1 month ago we were not really
> > having a flaky test issue (outside of the sporadic timeout issues; my
> > circle ci runs were green constantly), and now the “flaky tests” I see
> are
> > all actual bugs (been root causing 2 out of the 3 I reported) and some
> (not
> > all) of the flakiness was triggered by recent changes in the past month.
> > >
> > > Right now people do not believe the failing test is caused by their
> > patch and attribute to flakiness, which then causes the builds to start
> > being flaky, which then leads to a different author coming to fix the
> > issue; this behavior is what I would love to see go away.  If we find a
> > flaky test, we should do the following
> > >
> > > 1) has it already been reported and who is working to fix?  Can we
> block
> > this patch on the test being fixed?  Flaky tests due to timing issues
> > normally are resolved very quickly, real bugs take longer.
> > > 2) if not reported, why?  If you are the first to see this issue, then
> > there’s a good chance the patch caused the issue, so you should root cause it.  If you are
> > not the first to see it, why did others not report it (we tend to be good
> > about this, even to the point Brandon has to mark the new tickets as
> dups…)?
> > >
> > > I have committed when there was flakiness, and I have caused
> flakiness;
> > not saying I am perfect or that I do the above, just saying that if we
> all
> > moved to the above model we could start relying on CI.  The biggest
> impact
> > to our stability is people actually root causing flaky tests.
> > >
> > >>  I think we're going to need a system that
> > >> understands the difference between success, failure, and timeouts
> > >
> > > I am curious how this system can know that the timeout is not an actual
> > failure.  There was a bug in 4.0 with time serialization in message,
> which
> > would cause the message to get dropped; this presented itself as a
> timeout
> > if I remember properly (Jon Meredith or Yifan Cai fixed this bug I
> believe).
> > >
> > >> On Nov 3, 2021, at 10:56 AM, Brandon Williams <dr...@gmail.com>
> wrote:
> > >>
> > >> On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <
> > benedict@apache.org> wrote:
> > >>> The largest number of test failures turn out (as pointed out by
> David)
> > to be due to how arcane it was to trigger the full test suite. Hopefully
> we
> > can get on top of that, but I think a significant remaining issue is a
> lack
> > of trust in the output of CI. It’s hard to gate commit on a clean CI run
> > when there’s flaky tests, and it doesn’t take much to misattribute one
> > failing test to the existing flakiness (I tend to compare to a run of the
> > trunk baseline for comparison, but this is burdensome and still error
> > prone). The more flaky tests there are the more likely this is.
> > >>>
> > >>> This is in my opinion the real cost of flaky tests, and it’s probably
> > worth trying to crack down on them hard if we can. It’s possible the
> > Simulator may help here, when I finally finish it up, as we can port
> flaky
> > tests to run with the Simulator and the failing seed can then be explored
> > deterministically (all being well).
> > >> I totally agree that the lack of trust is a driving problem here, even
> > >> in knowing which CI system to rely on. When Jenkins broke but Circle
> > >> was fine, we all assumed it was a problem with Jenkins, right up until
> > >> Circle also broke.
> > >>
> > >> In testing a distributed system like this I think we're always going
> > >> to have failures, even on non-flaky tests, simply because the
> > >> underlying infrastructure is variable with transient failures of its
> > >> own (the network is reliable!)  We can fix the flakies where the fault
> > >> is in the code (and we've done this to many already) but to get more
> > >> trustworthy output, I think we're going to need a system that
> > >> understands the difference between success, failure, and timeouts, and
> > >> in the latter case knows how to at least mark them differently.
> > >> Simulator may help, as do the in-jvm dtests, but there is ultimately
> > >> no way to cover everything without doing some things the hard, more
> > >> realistic way where sometimes shit happens, marring the almost-perfect
> > >> runs with noisy doubt, which then has to be sifted through to
> > >> determine if there was a real issue.
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > >> For additional commands, e-mail: dev-help@cassandra.apache.org
> > >>
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
> >
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Joshua McKenzie <jm...@apache.org>.
>
> we noticed CI going from a
> steady 3-ish failures to many and it's getting fixed. So we're moving in
> the right direction imo.
>
An observation about this: there's tooling and technology widely in use to
help prevent ever getting into this state (to Benedict's point: blocking
merge on CI failure, or nightly tests and reverting regression commits,
etc). I think there's significant time and energy savings for us in using
automation to be proactive about the quality of our test boards rather than
reactive.

I 100% agree that it's heartening to see that the quality of the codebase
is improving as is the discipline / attentiveness of our collective
culture. That said, I believe we still have a pretty fragile system when it
comes to test failure accumulation.

On Thu, Nov 4, 2021 at 2:46 AM Berenguer Blasi <be...@gmail.com>
wrote:

> I agree with David. CI has been pretty reliable besides the random
> jenkins going down or timeout. The same 3 or 4 tests were the only flaky
> ones in jenkins and Circle was very green. I bisected a couple failures
> to legit code errors, David is fixing some more, others have as well, etc
>
> It is good news imo as we're just getting to learn our CI post 4.0 is
> reliable and we need to start treating it as such and paying attention to
> its reports. Not perfect but reliable enough it would have prevented
> those bugs getting merged.
>
> In fact we're having this conversation bc we noticed CI going from a
> steady 3-ish failures to many and it's getting fixed. So we're moving in
> the right direction imo.
>
> On 3/11/21 19:25, David Capwell wrote:
> >> It’s hard to gate commit on a clean CI run when there’s flaky tests
> > I agree, this is also why so much effort was done in 4.0 release to
> remove as much as possible.  Just over 1 month ago we were not really
> having a flaky test issue (outside of the sporadic timeout issues; my
> circle ci runs were green constantly), and now the “flaky tests” I see are
> all actual bugs (been root causing 2 out of the 3 I reported) and some (not
> all) of the flakiness was triggered by recent changes in the past month.
> >
> > Right now people do not believe the failing test is caused by their
> patch and attribute to flakiness, which then causes the builds to start
> being flaky, which then leads to a different author coming to fix the
> issue; this behavior is what I would love to see go away.  If we find a
> flaky test, we should do the following
> >
> > 1) has it already been reported and who is working to fix?  Can we block
> this patch on the test being fixed?  Flaky tests due to timing issues
> normally are resolved very quickly, real bugs take longer.
> > 2) if not reported, why?  If you are the first to see this issue, then
> there’s a good chance the patch caused the issue, so you should root cause it.  If you are
> not the first to see it, why did others not report it (we tend to be good
> about this, even to the point Brandon has to mark the new tickets as dups…)?
> >
> > I have committed when there was flakiness, and I have caused flakiness;
> not saying I am perfect or that I do the above, just saying that if we all
> moved to the above model we could start relying on CI.  The biggest impact
> to our stability is people actually root causing flaky tests.
> >
> >>  I think we're going to need a system that
> >> understands the difference between success, failure, and timeouts
> >
> > I am curious how this system can know that the timeout is not an actual
> failure.  There was a bug in 4.0 with time serialization in message, which
> would cause the message to get dropped; this presented itself as a timeout
> if I remember properly (Jon Meredith or Yifan Cai fixed this bug I believe).
> >
> >> On Nov 3, 2021, at 10:56 AM, Brandon Williams <dr...@gmail.com> wrote:
> >>
> >> On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <
> benedict@apache.org> wrote:
> >>> The largest number of test failures turn out (as pointed out by David)
> to be due to how arcane it was to trigger the full test suite. Hopefully we
> can get on top of that, but I think a significant remaining issue is a lack
> of trust in the output of CI. It’s hard to gate commit on a clean CI run
> when there’s flaky tests, and it doesn’t take much to misattribute one
> failing test to the existing flakiness (I tend to compare to a run of the
> trunk baseline for comparison, but this is burdensome and still error
> prone). The more flaky tests there are the more likely this is.
> >>>
> >>> This is in my opinion the real cost of flaky tests, and it’s probably
> worth trying to crack down on them hard if we can. It’s possible the
> Simulator may help here, when I finally finish it up, as we can port flaky
> tests to run with the Simulator and the failing seed can then be explored
> deterministically (all being well).
> >> I totally agree that the lack of trust is a driving problem here, even
> >> in knowing which CI system to rely on. When Jenkins broke but Circle
> >> was fine, we all assumed it was a problem with Jenkins, right up until
> >> Circle also broke.
> >>
> >> In testing a distributed system like this I think we're always going
> >> to have failures, even on non-flaky tests, simply because the
> >> underlying infrastructure is variable with transient failures of its
> >> own (the network is reliable!)  We can fix the flakies where the fault
> >> is in the code (and we've done this to many already) but to get more
> >> trustworthy output, I think we're going to need a system that
> >> understands the difference between success, failure, and timeouts, and
> >> in the latter case knows how to at least mark them differently.
> >> Simulator may help, as do the in-jvm dtests, but there is ultimately
> >> no way to cover everything without doing some things the hard, more
> >> realistic way where sometimes shit happens, marring the almost-perfect
> >> runs with noisy doubt, which then has to be sifted through to
> >> determine if there was a real issue.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: [DISCUSS] Releasable trunk and quality

Posted by Berenguer Blasi <be...@gmail.com>.
I agree with David. CI has been pretty reliable besides the random
jenkins going down or timeout. The same 3 or 4 tests were the only flaky
ones in jenkins and Circle was very green. I bisected a couple failures
to legit code errors, David is fixing some more, others have as well, etc

It is good news imo as we're just getting to learn our CI post 4.0 is
reliable and we need to start treating it as such and paying attention to
its reports. Not perfect but reliable enough it would have prevented
those bugs getting merged.

In fact we're having this conversation bc we noticed CI going from a
steady 3-ish failures to many and it's getting fixed. So we're moving in
the right direction imo.

On 3/11/21 19:25, David Capwell wrote:
>> It’s hard to gate commit on a clean CI run when there’s flaky tests
> I agree, this is also why so much effort was done in 4.0 release to remove as much as possible.  Just over 1 month ago we were not really having a flaky test issue (outside of the sporadic timeout issues; my circle ci runs were green constantly), and now the “flaky tests” I see are all actual bugs (been root causing 2 out of the 3 I reported) and some (not all) of the flakiness was triggered by recent changes in the past month.
>
> Right now people do not believe the failing test is caused by their patch and attribute to flakiness, which then causes the builds to start being flaky, which then leads to a different author coming to fix the issue; this behavior is what I would love to see go away.  If we find a flaky test, we should do the following
>
> 1) has it already been reported and who is working to fix?  Can we block this patch on the test being fixed?  Flaky tests due to timing issues normally are resolved very quickly, real bugs take longer.
> 2) if not reported, why?  If you are the first to see this issue, then there’s a good chance the patch caused the issue, so you should root cause it.  If you are not the first to see it, why did others not report it (we tend to be good about this, even to the point Brandon has to mark the new tickets as dups…)?
>
> I have committed when there was flakiness, and I have caused flakiness; not saying I am perfect or that I do the above, just saying that if we all moved to the above model we could start relying on CI.  The biggest impact to our stability is people actually root causing flaky tests.
>
>>  I think we're going to need a system that
>> understands the difference between success, failure, and timeouts
>
> I am curious how this system can know that the timeout is not an actual failure.  There was a bug in 4.0 with time serialization in message, which would cause the message to get dropped; this presented itself as a timeout if I remember properly (Jon Meredith or Yifan Cai fixed this bug I believe).
>
>> On Nov 3, 2021, at 10:56 AM, Brandon Williams <dr...@gmail.com> wrote:
>>
>> On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <be...@apache.org> wrote:
>>> The largest number of test failures turn out (as pointed out by David) to be due to how arcane it was to trigger the full test suite. Hopefully we can get on top of that, but I think a significant remaining issue is a lack of trust in the output of CI. It’s hard to gate commit on a clean CI run when there’s flaky tests, and it doesn’t take much to misattribute one failing test to the existing flakiness (I tend to compare to a run of the trunk baseline for comparison, but this is burdensome and still error prone). The more flaky tests there are the more likely this is.
>>>
>>> This is in my opinion the real cost of flaky tests, and it’s probably worth trying to crack down on them hard if we can. It’s possible the Simulator may help here, when I finally finish it up, as we can port flaky tests to run with the Simulator and the failing seed can then be explored deterministically (all being well).
>> I totally agree that the lack of trust is a driving problem here, even
>> in knowing which CI system to rely on. When Jenkins broke but Circle
>> was fine, we all assumed it was a problem with Jenkins, right up until
>> Circle also broke.
>>
>> In testing a distributed system like this I think we're always going
>> to have failures, even on non-flaky tests, simply because the
>> underlying infrastructure is variable with transient failures of its
>> own (the network is reliable!)  We can fix the flakies where the fault
>> is in the code (and we've done this to many already) but to get more
>> trustworthy output, I think we're going to need a system that
>> understands the difference between success, failure, and timeouts, and
>> in the latter case knows how to at least mark them differently.
>> Simulator may help, as do the in-jvm dtests, but there is ultimately
>> no way to cover everything without doing some things the hard, more
>> realistic way where sometimes shit happens, marring the almost-perfect
>> runs with noisy doubt, which then has to be sifted through to
>> determine if there was a real issue.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: [DISCUSS] Releasable trunk and quality

Posted by Brandon Williams <dr...@gmail.com>.
On Wed, Nov 3, 2021 at 1:26 PM David Capwell <dc...@apple.com.invalid> wrote:

> >  I think we're going to need a system that
> > understands the difference between success, failure, and timeouts
>
>
> I am curious how this system can know that the timeout is not an actual failure.  There was a bug in 4.0 with time serialization in message, which would cause the message to get dropped; this presented itself as a timeout if I remember properly (Jon Meredith or Yifan Cai fixed this bug I believe).

I don't think it needs to understand the cause of the timeout, just be
able to differentiate.  Of course some bugs present as timeouts so an
eye will need to be kept on that, but test history can make that
simple.
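
As a sketch of that differentiation (assuming standard JUnit XML reports; the
report path is illustrative), a summary step could bucket timeouts separately
and feed them into per-test history rather than counting them as plain
failures:

    # Sketch: separate timeouts from other failures when summarising JUnit XML reports.
    import glob
    import xml.etree.ElementTree as ET

    counts = {"passed": 0, "failed": 0, "timeout": 0}
    for path in glob.glob("build/test/output/TEST-*.xml"):   # illustrative report path
        for case in ET.parse(path).getroot().iter("testcase"):
            problems = case.findall("failure") + case.findall("error")
            if not problems:
                counts["passed"] += 1
            elif any("timeout" in ((p.get("message") or "") + (p.get("type") or "")).lower()
                     for p in problems):
                counts["timeout"] += 1   # mark differently and check against test history
            else:
                counts["failed"] += 1
    print(counts)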

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: [DISCUSS] Releasable trunk and quality

Posted by David Capwell <dc...@apple.com.INVALID>.
> It’s hard to gate commit on a clean CI run when there’s flaky tests

I agree, this is also why so much effort was done in 4.0 release to remove as much as possible.  Just over 1 month ago we were not really having a flaky test issue (outside of the sporadic timeout issues; my circle ci runs were green constantly), and now the “flaky tests” I see are all actual bugs (been root causing 2 out of the 3 I reported) and some (not all) of the flakiness was triggered by recent changes in the past month.

Right now people do not believe the failing test is caused by their patch and attribute to flakiness, which then causes the builds to start being flaky, which then leads to a different author coming to fix the issue; this behavior is what I would love to see go away.  If we find a flaky test, we should do the following

1) has it already been reported and who is working to fix?  Can we block this patch on the test being fixed?  Flaky tests due to timing issues normally are resolved very quickly, real bugs take longer.
2) if not reported, why?  If you are the first to see this issue, then there’s a good chance the patch caused the issue, so you should root cause it.  If you are not the first to see it, why did others not report it (we tend to be good about this, even to the point Brandon has to mark the new tickets as dups…)?

I have committed when there was flakiness, and I have caused flakiness; not saying I am perfect or that I do the above, just saying that if we all moved to the above model we could start relying on CI.  The biggest impact to our stability is people actually root causing flaky tests.
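
For the timing-issue case specifically, the usual fix is to poll for the condition rather than sleep for a fixed interval. A minimal sketch; the wait_until helper and the node_is_up call are illustrative, not an existing API:

    # Sketch: poll for a condition instead of a fixed sleep, a common fix for timing flakiness.
    import time

    def wait_until(condition, timeout=30.0, interval=0.1):
        """Poll condition() until it returns True or the timeout elapses."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if condition():
                return True
            time.sleep(interval)
        return False

    # instead of: time.sleep(5); assert node_is_up(node)      # fixed sleep, flaky under load
    # prefer:     assert wait_until(lambda: node_is_up(node), timeout=60)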

>  I think we're going to need a system that
> understands the difference between success, failure, and timeouts


I am curious how this system can know that the timeout is not an actual failure.  There was a bug in 4.0 with time serialization in message, which would cause the message to get dropped; this presented itself as a timeout if I remember properly (Jon Meredith or Yifan Cai fixed this bug I believe).

> On Nov 3, 2021, at 10:56 AM, Brandon Williams <dr...@gmail.com> wrote:
> 
> On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <be...@apache.org> wrote:
>> 
>> The largest number of test failures turn out (as pointed out by David) to be due to how arcane it was to trigger the full test suite. Hopefully we can get on top of that, but I think a significant remaining issue is a lack of trust in the output of CI. It’s hard to gate commit on a clean CI run when there’s flaky tests, and it doesn’t take much to misattribute one failing test to the existing flakiness (I tend to compare to a run of the trunk baseline for comparison, but this is burdensome and still error prone). The more flaky tests there are the more likely this is.
>> 
>> This is in my opinion the real cost of flaky tests, and it’s probably worth trying to crack down on them hard if we can. It’s possible the Simulator may help here, when I finally finish it up, as we can port flaky tests to run with the Simulator and the failing seed can then be explored deterministically (all being well).
> 
> I totally agree that the lack of trust is a driving problem here, even
> in knowing which CI system to rely on. When Jenkins broke but Circle
> was fine, we all assumed it was a problem with Jenkins, right up until
> Circle also broke.
> 
> In testing a distributed system like this I think we're always going
> to have failures, even on non-flaky tests, simply because the
> underlying infrastructure is variable with transient failures of its
> own (the network is reliable!)  We can fix the flakies where the fault
> is in the code (and we've done this to many already) but to get more
> trustworthy output, I think we're going to need a system that
> understands the difference between success, failure, and timeouts, and
> in the latter case knows how to at least mark them differently.
> Simulator may help, as do the in-jvm dtests, but there is ultimately
> no way to cover everything without doing some things the hard, more
> realistic way where sometimes shit happens, marring the almost-perfect
> runs with noisy doubt, which then has to be sifted through to
> determine if there was a real issue.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: [DISCUSS] Releasable trunk and quality

Posted by Brandon Williams <dr...@gmail.com>.
On Wed, Nov 3, 2021 at 12:35 PM benedict@apache.org <be...@apache.org> wrote:
>
> The largest number of test failures turn out (as pointed out by David) to be due to how arcane it was to trigger the full test suite. Hopefully we can get on top of that, but I think a significant remaining issue is a lack of trust in the output of CI. It’s hard to gate commit on a clean CI run when there’s flaky tests, and it doesn’t take much to misattribute one failing test to the existing flakiness (I tend to compare to a run of the trunk baseline for comparison, but this is burdensome and still error prone). The more flaky tests there are the more likely this is.
>
> This is in my opinion the real cost of flaky tests, and it’s probably worth trying to crack down on them hard if we can. It’s possible the Simulator may help here, when I finally finish it up, as we can port flaky tests to run with the Simulator and the failing seed can then be explored deterministically (all being well).

I totally agree that the lack of trust is a driving problem here, even
in knowing which CI system to rely on. When Jenkins broke but Circle
was fine, we all assumed it was a problem with Jenkins, right up until
Circle also broke.

In testing a distributed system like this I think we're always going
to have failures, even on non-flaky tests, simply because the
underlying infrastructure is variable with transient failures of its
own (the network is reliable!)  We can fix the flakies where the fault
is in the code (and we've done this to many already) but to get more
trustworthy output, I think we're going to need a system that
understands the difference between success, failure, and timeouts, and
in the latter case knows how to at least mark them differently.
Simulator may help, as do the in-jvm dtests, but there is ultimately
no way to cover everything without doing some things the hard, more
realistic way where sometimes shit happens, marring the almost-perfect
runs with noisy doubt, which then has to be sifted through to
determine if there was a real issue.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: [DISCUSS] Releasable trunk and quality

Posted by "benedict@apache.org" <be...@apache.org>.
The largest number of test failures turn out (as pointed out by David) to be due to how arcane it was to trigger the full test suite. Hopefully we can get on top of that, but I think a significant remaining issue is a lack of trust in the output of CI. It’s hard to gate commit on a clean CI run when there’s flaky tests, and it doesn’t take much to misattribute one failing test to the existing flakiness (I tend to compare to a run of the trunk baseline for comparison, but this is burdensome and still error prone). The more flaky tests there are the more likely this is.

This is in my opinion the real cost of flaky tests, and it’s probably worth trying to crack down on them hard if we can. It’s possible the Simulator may help here, when I finally finish it up, as we can port flaky tests to run with the Simulator and the failing seed can then be explored deterministically (all being well).

From: Brandon Williams <dr...@gmail.com>
Date: Wednesday, 3 November 2021 at 17:07
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Releasable trunk and quality
On Mon, Nov 1, 2021 at 5:03 PM David Capwell <dc...@apple.com.invalid> wrote:
>
> > How do we define what "releasable trunk" means?
>
> One thing I would love is for us to adopt a “run all tests needed to release before commit” mentality, and to link a successful run in JIRA when closing (we talked about this once in slack).  If we look at CircleCI we currently do not run all the tests needed to sign off; below are the tests disabled in the “pre-commit” workflows (see https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml#L381):

A good first step toward this would be for us to treat our binding +1s
more judiciously, and not grant any without at least a pre-commit CI
run linked in the ticket.  You don't have to look very hard to find a
lot of these today (I know I'm guilty), and it's possible we wouldn't
have the current CI mess now if we had been a little bit more
diligent.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Releasable trunk and quality

Posted by Brandon Williams <dr...@gmail.com>.
On Mon, Nov 1, 2021 at 5:03 PM David Capwell <dc...@apple.com.invalid> wrote:
>
> > How do we define what "releasable trunk" means?
>
> One thing I would love is for us to adopt a “run all tests needed to release before commit” mentality, and to link a successful run in JIRA when closing (we talked about this once in slack).  If we look at CircleCI we currently do not run all the tests needed to sign off; below are the tests disabled in the “pre-commit” workflows (see https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml#L381):

A good first step toward this would be for us to treat our binding +1s
more judiciously, and not grant any without at least a pre-commit CI
run linked in the ticket.  You don't have to look very hard to find a
lot of these today (I know I'm guilty), and it's possible we wouldn't
have the current CI mess now if we had been a little bit more
diligent.



Re: [DISCUSS] Releasable trunk and quality

Posted by David Capwell <dc...@apple.com.INVALID>.
> I have to apologise here. CircleCI did not uncover these problems, apparently due to some way it resolves dependencies,

I double-checked your CircleCI run for the trunk branch, and the problem doesn’t have to do with how it “resolves dependencies”; the problem is that our CI is too complex and doesn’t natively support multi-branch commits. Three issues stand out:

1. Right now you need to opt in to 2 builds to run the single jvm-dtest upgrade test build (missed in your CI); this should not be opt-in (see my previous comment about this), and it really shouldn’t take 2 approvals for a single build… (a sketch of how these approval gates look in a config follows below)
2. Enabling “upgrade tests” does not run all the upgrade tests… you need to approve 2 other builds to run the full set of upgrade tests (see the problem above).  I see that your build ran the upgrade tests, but that only touches the python-dtest upgrade tests.
3. Lastly, you need to hack the CircleCI configuration to support multi-branch CI; if you do not, it will run against whatever is already committed to 2.2, 3.0, 3.11, and 4.0.  Multi-branch commits are very normal for our project, but doing CI properly in these cases is way too hard (you cannot do multi-branch tests in Jenkins https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-test/build; there is no support for running against your other branches).
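
To make the opt-in mechanics concrete, here is a hypothetical, simplified sketch of how an approval gate works in a CircleCI 2.1 workflow (illustrative only, not our actual config-2_1.yml; the job body and image are placeholders):

version: 2.1
jobs:
  upgrade_dtests:
    docker:
      - image: cimg/openjdk:11.0                # placeholder image
    steps:
      - checkout
      - run: echo "run the jvm-dtest upgrade tests here"   # placeholder step
workflows:
  pre-commit:
    jobs:
      - start_upgrade_tests:        # approval job: everything that requires it
          type: approval            # waits until someone clicks approve in the UI
      - upgrade_dtests:
          requires:
            - start_upgrade_tests   # dropping this edge would make the job run by default

Removing the approval job (or the requires edge on it) is all it takes to turn an opt-in build into one that runs on every pre-commit workflow.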

> On Nov 1, 2021, at 3:03 PM, David Capwell <dc...@apple.com.INVALID> wrote:
> 
>> How do we define what "releasable trunk" means?
> 
> One thing I would love is for us to adopt a “run all tests needed to release before commit” mentality, and to link a successful run in JIRA when closing (we talked about this once in slack).  If we look at CircleCI we currently do not run all the tests needed to sign off; below are the tests disabled in the “pre-commit” workflows (see https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml#L381):
> 
> start_utests_long
> start_utests_compression
> start_utests_stress
> start_utests_fqltool
> start_utests_system_keyspace_directory
> start_jvm_upgrade_dtest
> start_upgrade_tests
> 
> Given the configuration right now we have to opt-in to upgrade tests, but we can’t release if those are broken (same for compression/fqltool/cdc (not covered in circle)).
> 
>> On Oct 30, 2021, at 6:24 AM, benedict@apache.org wrote:
>> 
>>> How do we define what "releasable trunk" means?
>> 
>> For me, the major criteria is ensuring that work is not merged that is known to require follow-up work, or could reasonably have been known to require follow-up work if better QA practices had been followed.
>> 
>> So, a big part of this is ensuring we continue to exceed our targets for improved QA. For me this means trying to weave tools like Harry and the Simulator into our development workflow early on, but we’ll see how well these tools gain broader adoption. This also means focus in general on possible negative effects of a change.
>> 
>> I think we could do with producing guidance documentation for how to approach QA, where we can record our best practices and evolve them as we discover flaws or pitfalls, either for ergonomics or for bug discovery.
>> 
>>> What are the benefits of having a releasable trunk as defined here?
>> 
>> If we want to have any hope of meeting reasonable release cadences _and_ the high project quality we expect today, then I think a ~shippable trunk policy is an absolute necessity.
>> 
>> I don’t think means guaranteeing there are no failing tests (though ideally this would also happen), but about ensuring our best practices are followed for every merge. 4.0 took so long to release because of the amount of hidden work that was created by merging work that didn’t meet the standard for release.
>> 
>> Historically we have also had significant pressure to backport features to earlier versions due to the cost and risk of upgrading. If we maintain broader version compatibility for upgrade, and reduce the risk of adopting newer versions, then this pressure is also reduced significantly. Though perhaps we will stick to our guns here anyway, as there seems to be renewed pressure to limit work in GA releases to bug fixes exclusively. It remains to be seen if this holds.
>> 
>>> What are the costs?
>> 
>> I think the costs are quite low, perhaps even negative. Hidden work produced by merges that break things can be much more costly than getting the work right first time, as attribution is much more challenging.
>> 
>> One cost that is created, however, is for version compatibility as we cannot say “well, this is a minor version bump so we don’t need to support downgrade”. But I think we should be investing in this anyway for operator simplicity and confidence, so I actually see this as a benefit as well.
>> 
>>> Full disclosure: running face-first into 60+ failing tests on trunk
>> 
>> I have to apologise here. CircleCI did not uncover these problems, apparently due to some way it resolves dependencies, and so I am responsible for a significant number of these and have been quite sick since.
>> 
>> I think a push to eliminate flaky tests will probably help here in future, though, and perhaps the project needs to have some (low) threshold of flaky or failing tests at which point we block merges to force a correction.
>> 
>> 
>> From: Joshua McKenzie <jm...@apache.org>
>> Date: Saturday, 30 October 2021 at 14:00
>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>> Subject: [DISCUSS] Releasable trunk and quality
>> We as a project have gone back and forth on the topic of quality and the
>> notion of a releasable trunk for quite a few years. If people are
>> interested, I'd like to rekindle this discussion a bit and see if we're
>> happy with where we are as a project or if we think there's steps we should
>> take to change the quality bar going forward. The following questions have
>> been rattling around for me for awhile:
>> 
>> 1. How do we define what "releasable trunk" means? All reviewed by M
>> committers? Passing N% of tests? Passing all tests plus some other metrics
>> (manual testing, raising the number of reviewers, test coverage, usage in
>> dev or QA environments, etc)? Something else entirely?
>> 
>> 2. With a definition settled upon in #1, what steps, if any, do we need to
>> take to get from where we are to having *and keeping* that releasable
>> trunk? Anything to codify there?
>> 
>> 3. What are the benefits of having a releasable trunk as defined here? What
>> are the costs? Is it worth pursuing? What are the alternatives (for
>> instance: a freeze before a release + stabilization focus by the community
>> i.e. 4.0 push or the tock in tick-tock)?
>> 
>> Given the large volumes of work coming down the pike with CEP's, this seems
>> like a good time to at least check in on this topic as a community.
>> 
>> Full disclosure: running face-first into 60+ failing tests on trunk when
>> going through the commit process for denylisting this week brought this
>> topic back up for me (reminds me of when I went to merge CDC back in 3.6
>> and those test failures riled me up... I sense a pattern ;))
>> 
>> Looking forward to hearing what people think.
>> 
>> ~Josh
> 
> 