Posted to dev@cassandra.apache.org by Josh McKenzie <jm...@apache.org> on 2022/08/03 17:16:28 UTC

Cassandra project status update 2022-08-03

Greetings everyone! Let's check in on 4.1, see how we're doing:

https://butler.cassandra.apache.org/#/
We had 4 failures on our last run. We've gone back and forth a bit with the CASTest failure, a test introduced back in CASSANDRA-12126 and @Ignore'd; however, it showed some legitimate failures that should be addressed by Paxos V2. If anyone from that discussion (or someone with familiarity with the area) has the cycles to take assignee on the test failure ticket (CASSANDRA-17461) and responsibility for driving it to resolution, that would help clarify our efforts there. (https://issues.apache.org/jira/browse/CASSANDRA-17461)

Along with that, we saw a failure in TopPartitionsTest.testServiceTopPartitionsSingleTable (cdc) and TestBootstrap.test_simultaneous_bootstrap (offheap). Given both are specific configurations of tests that ran successfully to completion in other configurations, there's a reasonable chance they're flaky, be it from the logic of the test or the CI environment in which they're executing. Neither failure appears to have an active JIRA associated with it in Butler or on the kanban board (https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=496&quickFilter=2252), so we could use a volunteer here to both create those tickets and drive them.

We're close enough that we're ready to revisit how we want to treat the requirement for no flaky failures before we cut beta (https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle, "No flaky tests - All tests (Unit Tests and DTests) should pass consistently"). After seeing a couple of releases with this requirement (4.0 and now 4.1), I'm inclined to agree with the comment from Dinesh that we should revise this requirement formally if we're going to effectively release with flaky tests anyway; best to be honest with ourselves and acknowledge it's not proving to be a forcing function for changing behavior. If this email doesn't see much traction on this topic I'll hit up the dev list with a DISCUSS thread on it.

The kanban for 4.1 blockers shows us 13 tickets: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455. Most of them are assigned and many are in progress; however, we have 3 unassigned if anyone wants to pick those up: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455&quickFilter=2160


[New Contributors Getting Started]
One of the three issues on the 4.1 blocker list or either of the 2 failing tests listed above would be a great place to focus your attention!

Nuts and bolts / env / etc: here's an explanation of various types of contribution: https://cassandra.apache.org/_/community.html#how-to-contribute
An overview of the C* architecture: https://cassandra.apache.org/doc/latest/cassandra/architecture/overview.html
And here's our getting started contributing guide: https://cassandra.apache.org/_/development/index.html
We hang out in #cassandra-dev on https://the-asf.slack.com, and you can ping the @cassandra_mentors alias to reach 13 of us who have volunteered to mentor new contributors on the project. Looking forward to seeing you there.


[Dev list Digest]
https://lists.apache.org/list?dev@cassandra.apache.org:lte=2w:

The challenge of our eclectic usage of NULL strikes again with CEP-15. Avi opened up a ticket about this with https://issues.apache.org/jira/browse/CASSANDRA-17762. Caleb's working on the CQL support for multi-partition transactions on https://issues.apache.org/jira/browse/CASSANDRA-17719 where the general sentiment seems to be "let's go with a SQL-congruent syntax".

Discussion about the potential benefits and downsides of a multi-threaded flushing CommitLog continues: https://lists.apache.org/thread/5j8ljtpdw3g0gyrx6m31gh1gjdkztclg. As this project is quite complex and has very different performance characteristics over time (in-memory only initially vs. long-term flushed to disk maintaining LSM trees), benchmarking features like this has proven difficult. Anyone with a perspective on the costs/benefits, or who's interested in balancing that complexity vs. functionality, feel free to chime in.

An interesting question about inclusivity or exclusivity of token ranges and API consistency came up thanks to https://issues.apache.org/jira/browse/CASSANDRA-17575. https://lists.apache.org/thread/4tm626ffnqlvt4cbmopdfpd8w6fpqscd. This link doesn't capture the entire thread for some reason; the most clarifying observation to me comes from Jeremiah about the current usage of tokens in the tool: "Reading the responses here and taking a step back, I think the current behavior of nodetool compact is probably the correct behavior. The main use case I can see for using nodetool compact is someone wants to take some sstable and compact it with all the overlapping sstables"

And last but not least, Claude Warren is looking for a reviewer on https://issues.apache.org/jira/browse/CASSANDRA-14218. Looks like Dinesh was flagged on that as reviewer a while ago.

[CI Trends]
https://butler.cassandra.apache.org/#/

The last three weeks show failure counts ticking up, but the reason is not too surprising:

3.0: 10 -> 14
3.11: 15 -> 17
4.0: 1 -> 6
4.1: 5 -> 4
trunk: 5 -> 7

On the 3.0-4.0 branches, this looks to be due to TestRepair failing (https://issues.apache.org/jira/browse/CASSANDRA-17701 and https://issues.apache.org/jira/browse/CASSANDRA-17702). Neither of those tickets has an assignee yet, so if anyone has the cycles or context to look into them, that'd be great.

4.1 failures are slowly but surely contracting.


[Release progress]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2175

4.1 beta:
We closed out 8 issues in the past couple of weeks: some test fixes, restarting on gossip-only nodes (CASSANDRA-17752), adding validation that the new config params are structured as we expect in 4.1 for JMX (CASSANDRA-17738), and cleaning up a straightforward doubling of the writePreparedStatement call in CASSANDRA-17764.

4.1 rc:
Test fix (CASSANDRA-17769)

Been a pretty quiet week on our older branches.

So to sum it up:
- CASTest failures blocking 4.1: https://issues.apache.org/jira/browse/CASSANDRA-17461, needs assignee
- Regression on some TestRepair: https://issues.apache.org/jira/browse/CASSANDRA-17701 and https://issues.apache.org/jira/browse/CASSANDRA-17702, needs assignee
- We should discuss whether we want to cut 4.1 w/known flaky tests in ASF CI or if we need to introduce more formal metrics around what "having no flakes" means (3, 5, 10 clean runs? Something else?)

Thanks as always everyone; see you on slack.

~Josh

Re: Cassandra project status update 2022-08-03

Posted by Andrés de la Peña <ad...@apache.org>.
>
> > I think if we want to do this, it should be extremely easy - by which I
> mean automatic, really. This shouldn’t be too tricky I think? We just need
> to produce a diff of new test classes and methods within existing classes.


Having a CircleCI job that automatically runs all new/modified tests would
be a great way to prevent most of the new flakies. We would still miss some
cases, like unmodified tests that turn flaky after changes to the tested
code, but I'd say those are less common.

> I can probably help out by putting together something to output @Test
> annotated methods within a source tree, if others are able to turn this
> into a part of the CircleCI pre-commit task (i.e. to pick the common
> ancestor with trunk, 4.1 etc, and run this task for each of the outputs)


I think we would need a bash/sh shell script taking a diff file and test
directory, and returning the file path and qualified class name of every
modified test class. I'd say we don't need the method names for Java tests
because quite often we see flaky tests that only fail when running their
entire class, so it's probably better to repeatedly run entire test classes
instead of particular methods.
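
Just to make the idea concrete, here's a rough sketch of that script in
Python rather than bash (a bash/awk version would do just as well). It
assumes a unified diff on stdin and the usual test/unit, test/distributed
and test/long layout where package names mirror directories; the names are
only placeholders:

    # Rough sketch: print "<path> <qualified class>" for every Java test class
    # touched by a unified diff read from stdin.
    import re
    import sys

    TEST_ROOTS = ("test/unit/", "test/distributed/", "test/long/")

    def modified_test_classes(diff_text):
        classes = {}
        for line in diff_text.splitlines():
            # e.g. "+++ b/test/unit/org/apache/cassandra/db/ReadCommandTest.java"
            m = re.match(r"\+\+\+ b/(.+\.java)$", line)
            if not m:
                continue
            path = m.group(1)
            for root in TEST_ROOTS:
                if path.startswith(root):
                    # test/unit/org/apache/cassandra/db/ReadCommandTest.java
                    #   -> org.apache.cassandra.db.ReadCommandTest
                    classes[path] = path[len(root):-len(".java")].replace("/", ".")
        return classes

    if __name__ == "__main__":
        for path, cls in sorted(modified_test_classes(sys.stdin.read()).items()):
            print(path, cls)

Something like "git diff $(git merge-base trunk HEAD) | python3 modified_tests.py"
would then give us the classes to repeat against the right common ancestor.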

We would also need a similar script for Python dtests. We would probably
want it to provide the full path of the modified tests (as
in cqlsh_tests/test_cqlsh.py::TestCqlshSmoke::test_create_index) because
those tests can be quite resource-intensive.
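
For the dtests, something along these lines could work (again only a
sketch, in Python): read the hunk headers from the diff to find the changed
line ranges per file, then walk each touched file to print just the test
methods that overlap those ranges, in the pytest node id format above:

    # Rough sketch: print pytest node ids (file::Class::method) for dtest methods
    # whose bodies overlap the changed line ranges in a unified diff on stdin.
    # Assumes it runs from the cassandra-dtest repo root so the files can be parsed.
    import ast
    import re
    import sys
    from collections import defaultdict

    HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")

    def changed_lines(diff_text):
        changes = defaultdict(set)
        current = None
        for line in diff_text.splitlines():
            m = re.match(r"^\+\+\+ b/(\S+\.py)$", line)
            if m:
                current = m.group(1)
                continue
            m = HUNK.match(line)
            if m and current:
                start, count = int(m.group(1)), int(m.group(2) or 1)
                changes[current].update(range(start, start + count))
        return changes

    def modified_node_ids(path, lines):
        tree = ast.parse(open(path, encoding="utf-8").read(), filename=path)
        for cls in (n for n in tree.body if isinstance(n, ast.ClassDef)):
            for fn in (n for n in cls.body if isinstance(n, ast.FunctionDef)):
                if fn.name.startswith("test_") and lines & set(range(fn.lineno, fn.end_lineno + 1)):
                    yield f"{path}::{cls.name}::{fn.name}"

    if __name__ == "__main__":
        for path, lines in changed_lines(sys.stdin.read()).items():
            for node_id in modified_node_ids(path, lines):
                print(node_id)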

I think once we have those scripts we could plug their output into the
CircleCI commands for repeating tests.

Putting all this together seems relatively involved, so it may take us some
time to get it ready. In the meantime, I think it's good practice to just
manually include any new/modified tests in the CircleCI config. Doing so
only requires passing a few additional options to the script that generates
the config, which doesn't seem to require too much effort.

On Wed, 10 Aug 2022 at 19:47, Brandon Williams <dr...@gmail.com> wrote:

> > Side note, Butler is reporting CASSANDRA-17348 as open (it's resolved as
> a duplicate).
>
> This is fixed.
>

Re: Cassandra project status update 2022-08-03

Posted by Brandon Williams <dr...@gmail.com>.
> Side note, Butler is reporting CASSANDRA-17348 as open (it's resolved as a duplicate).

This is fixed.

Re: Cassandra project status update 2022-08-03

Posted by Mick Semb Wever <mc...@apache.org>.
On Wed, 10 Aug 2022 at 17:54, Josh McKenzie <jm...@apache.org> wrote:

> “ We can start by putting the bar at a lower level and raise the level
> over time when most of the flakies that we hit are above that level.”
> My only concern is who will track that, and how.
>
> What's Butler's logic for flagging things flaky? Maybe a "flaky low" vs.
> "flaky high" distinction based on failure frequency (or some much better
> name I'm sure someone else will come up with) could make sense?
>


I'd be keen to see orders of magnitude, rather than arbitrary labels. Also
per CI system (having the data for basic correlation between systems will
be useful in other discussions and decisions).



> Then we could focus our efforts on the ones that are flagged as failing at
> whatever high water mark threshold we set.
>


Maybe obvious, but so long as there's a way to bypass this when a flaky is
identified as being a legit bug and/or in a critical component (even a 1:1M
flakiness in certain components can be disastrous).

Some other questions…
 - how to measure the flakiness
 - how to measure post-commit rates across both CI systems
 - where the flakiness labels(/orders-of-magnitude) should be recorded
 - how we label flakies as being legit/critical/blocking (currently you
often have to read through the comments)


Applying this manually to the remaining 4.1 blockers we have:
- CASSANDRA-17461 CASTest. 1:40 failures on circle. Looks to be about 1:2 on
ci-cassandra.
- CASSANDRA-17618 InternodeEncryptionEnforcementTest. 1:167 on circle. No
flakies in ci-cassandra.
- CASSANDRA-17804 AutoSnapshotTtlTest. Unknown flakiness in both CIs.
- CASSANDRA-17573 PaxosRepairTest. 1:20 on circle. No flakies in ci-cassandra.
- CASSANDRA-17658 KeyspaceMetricsTest. 1:20 on circle. No flakies in
ci-cassandra.

In addition to these, Butler lists a number of flakies against 4.1, but
these are not regressions in 4.1 and hence are not blockers. The jira board
is currently not blocking a 4.1-beta release on non-regression flakies.
This means our releases are not blocked on overall flakies, regardless of
whether there are more or fewer of them. How do we square this with our
recent stance of no releases unless green…? (This loops back to my "fewer
overall flakies than the previous release / campground-cleaner" suggestion.)

Side note, Butler is reporting CASSANDRA-17348 as open (it's resolved as a
duplicate).

Re: Cassandra project status update 2022-08-03

Posted by Josh McKenzie <jm...@apache.org>.
> “ We can start by putting the bar at a lower level and raise the level over time when most of the flakies that we hit are above that level.”
> My only concern is who will track that, and how.
What's Butler's logic for flagging things flaky? Maybe a "flaky low" vs. "flaky high" distinction based on failure frequency (or some much better name I'm sure someone else will come up with) could make sense? Then we could focus our efforts on the ones that are flagged as failing at whatever high water mark threshold we set.

It'd be trivial for me to update the script that parses test failure output for JIRA updates to flag things based on their failure frequency.
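
Something like this is the kind of bucketing I have in mind (just a sketch; the names and the high water mark are made up, and we'd want the counts per CI system):

    # Sketch: classify a test's flakiness per CI system by order of magnitude of
    # its observed failure rate. Thresholds and names here are placeholders.
    import math
    from dataclasses import dataclass

    @dataclass
    class TestHistory:
        name: str
        runs: int
        failures: int

    def classify(history, high_water_mark=1 / 100):
        """Return (label, order of magnitude), e.g. ("flaky-high", 2) for ~1:40."""
        if history.runs == 0 or history.failures == 0:
            return "stable", None
        rate = history.failures / history.runs
        magnitude = -math.floor(math.log10(rate))  # 1:40 -> 2, 1:1000 -> 3
        label = "flaky-high" if rate >= high_water_mark else "flaky-low"
        return label, magnitude

    if __name__ == "__main__":
        for h in (TestHistory("some_utest", 40, 1), TestHistory("some_dtest", 1000, 1)):
            print(h.name, *classify(h))  # some_utest flaky-high 2 / some_dtest flaky-low 3

The JIRA-updating script could then only flag tests that cross the high water mark as blockers, while flaky-low ones just get tracked.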

On Tue, Aug 9, 2022, at 5:24 PM, Ekaterina Dimitrova wrote:
> “ In my opinion, not all flakies are equal. Some fail every 10 runs, some fail 1 in 1000 runs.”
> Agreed, for everything that is not a new test/regression and is also not infra related.
> 
> “ We can start by putting the bar at a lower level and raise the level over time when most of the flakies that we hit are above that level.”
> My only concern is who will track that, and how.
> Also, a metric for non-infra issues, I guess.
> 
> “ At the same time we should make sure that we do not introduce new flakies. One simple approach that has been mentioned several times is to run the new tests added by a given patch in a loop using one of the CircleCI tasks. ”
> +1, I personally find this very valuable and more efficient than bisecting and getting back to work done in some cases months ago.
> 
> 
> “ We should also probably revert newly committed patches if we detect that they introduced flakies.”
> +1, not that I like my patches to be reverted but it seems like the fairest way to stick to our stated goals. But I think last time we talked about reverting, we discussed it only for trunk? Or do I remember it wrong?
> 
> 
> 
> On Tue, 9 Aug 2022 at 7:58, Benjamin Lerer <bl...@apache.org> wrote:
>> At this point it is clear that we will probably never be able to remove some level of flakiness from our tests. For me the questions are: 1) Where do we draw the line for a release? and 2) How do we maintain that line over time?
>> 
>> In my opinion, not all flakies are equal. Some fail every 10 runs, some fail 1 in 1000 runs. I would personally draw the line based on that metric. With the circleci tasks that Andres has added we can easily get that information for a given test.
>> We can start by putting the bar at a lower level and raise the level over time when most of the flakies that we hit are above that level.
>> 
>> That would allow us to minimize the risk of introducing flaky tests. We should also probably revert newly committed patches if we detect that they introduced flakies.
>> 
>> What do you think?
>> 
>> 
>> 
>> 
>> 
>> Le dim. 7 août 2022 à 12:24, Mick Semb Wever <mc...@apache.org> a écrit :
>>> 
>>> 
>>>> With that said, I guess we can just revise on a regular basis what exactly are the last flakes and not numbers which also change quickly up and down with the first change in the Infra. 
>>>> 
>>> 
>>> 
>>> +1, I am in favour of taking a pragmatic approach.
>>> 
>>> If flakies are identified and triaged enough that, with correlation from both CI systems, we are confident that no legit bugs are behind them, I'm in favour of going beta.
>>> 
>>> I still remain in favour of somehow incentivising reducing other flakies as well. Flakies that expose poor/limited CI infra, and/or tests that are not as resilient as they could be, are still noise that indirectly reduce our QA (and increase efforts to find and tackle those legit runtime problems). Interested in hearing input from others here that have been spending a lot of time on this front. 
>>> 
>>> Could it work if we say: all flakies must be ticketed, and test/infra related flakies do not block a beta release so long as there are fewer than the previous release? The intent here being pragmatic, but keeping us on a "keep the campground cleaner" trajectory… 

Re: Cassandra project status update 2022-08-03

Posted by Ekaterina Dimitrova <e....@gmail.com>.
“ In my opinion, not all flakies are equal. Some fail every 10 runs, some
fail 1 in 1000 runs.”
Agreed, for everything that is not a new test/regression and is also not
infra related.

“ We can start by putting the bar at a lower level and raise the level over
time when most of the flakies that we hit are above that level.”
My only concern is who will track that, and how.
Also, a metric for non-infra issues, I guess.

“ At the same time we should make sure that we do not introduce new
flakies. One simple approach that has been mentioned several times is to run
the new tests added by a given patch in a loop using one of the CircleCI
tasks. ”
+1, I personally find this very valuable and more efficient than bisecting
and getting back to work done in some cases months ago.


“ We should also probably revert newly committed patches if we detect that
they introduced flakies.”
+1, not that I like my patches to be reverted but it seems like the fairest
way to stick to our stated goals. But I think last time we talked about
reverting, we discussed it only for trunk? Or do I remember it wrong?



On Tue, 9 Aug 2022 at 7:58, Benjamin Lerer <bl...@apache.org> wrote:

> At this point it is clear that we will probably never be able to remove
> some level of flakiness from our tests. For me the questions are: 1) Where
> do we draw the line for a release? and 2) How do we maintain that line
> over time?
>
> In my opinion, not all flakies are equal. Some fail every 10 runs, some
> fail 1 in 1000 runs. I would personally draw the line based on that
> metric. With the circleci tasks that Andres has added we can easily get
> that information for a given test.
> We can start by putting the bar at a lower level and raise the level over
> time when most of the flakies that we hit are above that level.
>
> That would allow us to minimize the risk of introducing flaky tests. We
> should also probably revert newly committed patches if we detect that they
> introduced flakies.
>
> What do you think?
>
>
>
>
>
> Le dim. 7 août 2022 à 12:24, Mick Semb Wever <mc...@apache.org> a écrit :
>
>>
>>
>> With that said, I guess we can just revise on a regular basis what
>>> exactly are the last flakes and not numbers which also change quickly up
>>> and down with the first change in the Infra.
>>>
>>
>>
>> +1, I am in favour of taking a pragmatic approach.
>>
>> If flakies are identified and triaged enough that, with correlation from
>> both CI systems, we are confident that no legit bugs are behind them, I'm
>> in favour of going beta.
>>
>> I still remain in favour of somehow incentivising reducing other flakies
>> as well. Flakies that expose poor/limited CI infra, and/or tests that are
>> not as resilient as they could be, are still noise that indirectly reduce
>> our QA (and increase efforts to find and tackle those legit runtime
>> problems). Interested in hearing input from others here that have been
>> spending a lot of time on this front.
>>
>> Could it work if we say: all flakies must be ticketed, and test/infra
>> related flakies do not block a beta release so long as there are fewer than
>> the previous release? The intent here being pragmatic, but keeping us on a
>> "keep the campground cleaner" trajectory…
>>
>>

Re: Cassandra project status update 2022-08-03

Posted by Benjamin Lerer <bl...@apache.org>.
At this point it is clear that we will probably never be able to remove
some level of flakiness from our tests. For me the questions are: 1) Where
do we draw the line for a release? and 2) How do we maintain that line
over time?

In my opinion, not all flakies are equal. Some fail every 10 runs, some
fail 1 in 1000 runs. I would personally draw the line based on that
metric. With the circleci tasks that Andres has added we can easily get
that information for a given test.
We can start by putting the bar at a lower level and raise the level over
time when most of the flakies that we hit are above that level.

At the same time we should make sure that we do not introduce new flakies.
One simple approach that has been mentioned several times is to run the new
tests added by a given patch in a loop using one of the CircleCI tasks.
That would allow us to minimize the risk of introducing flaky tests. We
should also probably revert newly committed patches if we detect that they
introduced flakies.

What do you think?





Le dim. 7 août 2022 à 12:24, Mick Semb Wever <mc...@apache.org> a écrit :

>
>
> With that said, I guess we can just revise on a regular basis what exactly
>> are the last flakes and not numbers which also change quickly up and down
>> with the first change in the Infra.
>>
>
>
> +1, I am in favour of taking a pragmatic approach.
>
> If flakies are identified and triaged enough that, with correlation from
> both CI systems, we are confident that no legit bugs are behind them, I'm
> in favour of going beta.
>
> I still remain in favour of somehow incentivising reducing other flakies
> as well. Flakies that expose poor/limited CI infra, and/or tests that are
> not as resilient as they could be, are still noise that indirectly reduce
> our QA (and increase efforts to find and tackle those legit runtime
> problems). Interested in hearing input from others here that have been
> spending a lot of time on this front.
>
> Could it work if we say: all flakies must be ticketed, and test/infra
> related flakies do not block a beta release so long as there are fewer than
> the previous release? The intent here being pragmatic, but keeping us on a
> "keep the campground cleaner" trajectory…
>
>

Re: Cassandra project status update 2022-08-03

Posted by Mick Semb Wever <mc...@apache.org>.
With that said, I guess we can just revise on a regular basis what exactly
> are the last flakes and not numbers which also change quickly up and down
> with the first change in the Infra.
>


+1, I am in favour of taking a pragmatic approach.

If flakies are identified and triaged enough that, with correlation from
both CI systems, we are confident that no legit bugs are behind them, I'm
in favour of going beta.

I still remain in favour of somehow incentivising reducing other flakies as
well. Flakies that expose poor/limited CI infra, and/or tests that are not
as resilient as they could be, are still noise that indirectly reduce our
QA (and increase efforts to find and tackle those legit runtime problems).
Interested in hearing input from others here that have been spending a lot
of time on this front.

Could it work if we say: all flakies must be ticketed, and test/infra
related flakies do not block a beta release so long as there are fewer than
the previous release? The intent here being pragmatic, but keeping us on a
"keep the campground cleaner" trajectory…

Re: Cassandra project status update 2022-08-03

Posted by Ekaterina Dimitrova <e....@gmail.com>.
Re: CASSANDRA-17738 - the ticket was about any new properties which are
actually not of the new types. It had to guarantee that there is no
disconnect between updating the Settings Virtual Table after startup and
the JMX setters/getters. (In one of its “brother” tickets, the issues we
found have existed since 4.0.) I bring it up because we need to ensure that
configuration parameters updated through JMX also update the original
Config parameters if we want the Settings Virtual Table to be properly
updated after startup, and thus cut the confusion for users. This is
actually a stated goal for this virtual table, both in our docs and in the
original ticket. I'm raising the point again because, while we still have
both the virtual table and JMX, we need to be sure we provide consistent
information to our users. I will also put a note in the Config docs to
stress this and remind people. Probably when we add the update option for
the Settings Virtual Table in the next version we will need to think of a
better way to keep this in sync, or even start deprecating JMX, but for now
this is what we have in place and we need to maintain it.

Thank you Josh for the report, it is always valuable!

About flaky tests - in my personal opinion it is more about what
outstanding flaky tests we have than how many. We can have 3 which surface
legit bugs, or we can have 10 presenting only timeouts which are due to
environmental issues. These days I see Circle CI green all the time which
is really promising as many of our legit bugs were discovered there. With
that said, I guess we can just revise on a regular basis what exactly are
the last flakes and not numbers which also change quickly up and down with
the first change in the Infra.
