Posted to dev@cassandra.apache.org by Josh McKenzie <jm...@apache.org> on 2022/11/07 21:59:01 UTC

Cassandra project status update 2022-11-07

Oh good grief, it's been 26 days since I wrote one of these. My apologies! (Life happens - I can confirm that the terribly named "triple-demic" is real, folks.)

We've had a number of releases since the last status email. The current and latest supported GA cassandra releases across all branches are:

- cassandra 4: 4.0.7
- cassandra 3.11: 3.11.14
- cassandra 3.0: 3.0.28


[Needs Committers]
I'd like to first focus our attention on tickets that are flagged as "Needs Committer". Our project rules for Cassandra are that 2 committers need to sign off on a commit, so many times if an author or reviewer isn't yet a committer, these tickets can need external input to get into the codebase. The following URL is for a query to pull the Needs Committer tickets: https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20and%20resolution%20%3D%20unresolved%20and%20status%20%3D%20%22Needs%20Committer%22

CASSANDRA-17861, Update Python test framework from nose to pytest in CCM could use another committer on it: https://issues.apache.org/jira/browse/CASSANDRA-17861

CASSANDRA-17870, nodetool/rebuild: Add flag to exclude nodes from local datacenter could also use another committer on review: https://issues.apache.org/jira/browse/CASSANDRA-17870

CASSANDRA-15402, Make incremental backup configurable per keyspace and table looks like it has committer attention as per a recent comment so we're good there.

CASSANDRA-14930, decommission may cause timeout because messaging backlog is cleared: not sure why this one is marked as Needs Committer, actually, as it already has 2 reviewers. Might just need a status update.

Before we get to 4.1 status, I'd like to call out that Trie memtables were merged in CASSANDRA-17240. This is a large body of novel work (that Branimir presented on at ApacheCon for those of you lucky enough to attend) and it's great to see this land in the project; it's worth your time to pop open that diff and take a look around and see some of the new things being added to Cassandra. Notably, there's some great discussion about property-based testing going on in the comments which has sparked some offline discussion about how we can integrate exploratory fuzz testing in our primary CI pipeline; more to come on that front as discussions evolve.


[4.1 status]
Let's move on to 4.1 status. We're down to 2 tickets blocking rc, and I'm given to understand that the one in progress is close to having something to review, so on the "outstanding work" side we're in great shape: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484

That leaves us with the question: what do we do about CI? We've recently expanded our governance options as to what we consider validated and cleared for release: https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle. Specifically:

"Three consecutive green runs of circleci OR of ASF CI are required to cut RC"
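One way to read that rule, as a minimal sketch only: the function name and the run-history layout (a list of booleans per CI system, oldest first) are assumptions made for illustration and are not project tooling. It also assumes the three green runs must be the latest ones on a single system, not a mix of the two.

```python
from typing import Dict, List

REQUIRED_GREEN_RUNS = 3  # from the Release Lifecycle rule quoted above


def can_cut_rc(history: Dict[str, List[bool]]) -> bool:
    """Return True if any one CI system's latest runs are all green.

    `history` maps a CI system name to its run results, oldest first,
    where True means a fully green run. This layout is hypothetical.
    """
    for runs in history.values():
        latest = runs[-REQUIRED_GREEN_RUNS:]
        if len(latest) == REQUIRED_GREEN_RUNS and all(latest):
            return True
    return False


if __name__ == "__main__":
    print(can_cut_rc({
        "asf_ci": [True, False, True],
        "circleci": [False, True, True, True],
    }))
```

Here circleci's last three runs are green, so the check passes even though ASF CI had a recent failure.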

Our most recent run of 4.1 on ASF infra had 9 failures - https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1. This has been trending up a bit very recently from a low of 1 a bit over a week ago; the lion's share of the failures look to be environmental with timeouts.

With ASF CI having stragglers that are flaking lately, option 2 would be three consecutive green runs on circleci. However, for that to be representative we need some improvements to the test configuration in circle to bring it into parity with the ASF env, as tracked in CASSANDRA-17930 here: https://issues.apache.org/jira/browse/CASSANDRA-17930. As of a recent comment, Ekaterina's taking point on this and tracking that addition in CASSANDRA-18001: https://issues.apache.org/jira/browse/CASSANDRA-18001. Ekaterina - if there's anything other folks on the project can do to assist (including reviewing), please let us know.

We do also have a 3rd option that we discussed in slack: running tests on the ASF infra and then selectively multiplexing failures on circle. If a test fails on ASF CI but passes 500 times on circle, the general consensus was that this is sufficient for us to have confidence in the test. With the recent changes Andres introduced in CASSANDRA-17939, multiplexing multiple tests in circle has become very simple, and you can see instructions on generating the correct circle config using .circleci/generate.sh --help (look for the REPEATED_UTESTS=, REPEATED_JVM_DTESTS=, etc. options). This hybrid third approach (canonical run on ASF infra + multiplex failures on circle) gives us another outlet to get a validated release if necessary, albeit at the cost of more effort.
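The bookkeeping for that hybrid approach could be sketched roughly as follows. This is an illustration under assumptions, not part of the project's tooling: the function name, the test names, and the idea of tracking a per-test count of passing repeated runs are all hypothetical, and "passes 500 times" is read here as 500 passing repeated runs.

```python
from typing import Dict, Iterable, List

REQUIRED_PASSES = 500  # threshold mentioned in the slack discussion


def unresolved_failures(asf_failures: Iterable[str],
                        circle_passes: Dict[str, int]) -> List[str]:
    """Return ASF CI failures not yet cleared by repeated runs on circle.

    `circle_passes` maps a test name to its count of passing repeated
    runs on circle; a test missing from the map counts as zero.
    """
    return sorted(
        test for test in set(asf_failures)
        if circle_passes.get(test, 0) < REQUIRED_PASSES
    )


if __name__ == "__main__":
    # Hypothetical test names, for illustration only.
    failures = ["ExampleATest", "ExampleBTest", "ExampleCTest"]
    passes = {"ExampleATest": 500, "ExampleBTest": 120}
    print(unresolved_failures(failures, passes))
```

Anything still in the returned list would need more repeated runs on circle or an actual fix before the release could be considered validated under this approach.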

I'm working with some of the other contributors on ways we can evolve our canonical CI infrastructure as well as making that environment reproducible in order to get us a more stable environment in the ASF while also allowing contributors with access to private cloud hardware to run testing at higher parallelization levels; stay tuned for more detail on that in the coming weeks as well.

One last note I want to call out - immense amounts of energy from many contributors have gone into hardening our test infrastructure and improving our tests in the run-up to 4.1. 9 tests failing out of a total suite count of 49,698 tests (as of build 202 on 4.1) is a 99.98% pass rate. That said, we're infrastructure software powering many of the world's most critical applications so we're going to keep pushing until we hit green and keep it there.
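For anyone who wants to check the arithmetic behind that pass rate, a quick sketch using the two figures quoted above (the function name is just for illustration):

```python
def pass_rate(failed: int, total: int) -> float:
    """Percentage of tests that passed, given failures and suite size."""
    return (total - failed) / total * 100


if __name__ == "__main__":
    # 9 failures out of 49,698 tests, as of build 202 on 4.1.
    print(f"{pass_rate(9, 49_698):.2f}%")  # prints 99.98%
```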


[New Contributors Getting Started]
We have a new entrant for new contributors! So technically this has been around a while but I hadn't thought to promote it in these emails. We have an official management sidecar for Apache Cassandra as designed and delivered as part of CEP-1: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224. This is a smaller and less complex project than the Cassandra Storage engine and Query Coordination, so it might prove an attractive on-ramp for any of you who have thought about getting involved but were daunted by the database internals themselves.

Open JIRA issues for the sidecar can be found here: https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRASC%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20assignee%20DESC%2C%20priority%20DESC%2C%20updated%20DESC

And the project can be cloned from the github repo here: https://github.com/apache/cassandra-sidecar

On the Cassandra side, we've curated 24 "Starter Tickets" across our various releases that are unassigned right now - these are also good candidates if you're looking for something a little more bite-sized to get adjusted: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2160&quickFilter=2162. Likewise, documentation contributions and website contributions are generally good ways to get to know our project ecosystem, the commit process, and interact with some of the other contributors.

If you're feeling adventurous, there are quite a few tickets on the unassigned list on 4.0.x and 4.x that could be good candidates to take on: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2160. There are 46 unassigned issues in 4.0.x and 311 in 4.x so there are a lot of options to choose from there.

We hang out in #cassandra-dev on https://the-asf.slack.com and there's a @cassandra_mentors alias you can use to reach a bunch of us that have volunteered to help newcomers get situated. If you need an invite to the slack channel feel free to reply to just me on this email and I'll get you set up.

Here's a reference explaining the various types of contribution: https://cassandra.apache.org/_/community.html#how-to-contribute
An overview of the C* architecture: https://cassandra.apache.org/doc/latest/cassandra/architecture/overview.html
The getting started contributing guide: https://cassandra.apache.org/_/development/index.html


[Dev list Digest]
https://lists.apache.org/list?dev@cassandra.apache.org:lte=26d:
26 days is a lot of ground to cover here. :)

The thread on whether we were going to do 4.2 or 5.0 came to a close here: https://lists.apache.org/thread/ymj3737x25b7bbqv9lp27w5v1ftc83j9. Results are enshrined in CASSANDRA-17973: https://issues.apache.org/jira/browse/CASSANDRA-17973 (spoiler alert: we're going with 5.0)

We had a solid discussion about changes to improve circleci (https://lists.apache.org/thread/c7hp1wt06r14v1vpovjd5mzy62gdsxqh) that culminated in CASSANDRA-18007 being created: https://issues.apache.org/jira/browse/CASSANDRA-18007. 

Erick Ramirez provided a PR and proposal for a formal events page for our website: https://lists.apache.org/thread/hn1b8ymn5sq3w31dvrorroqm2q7yw82v, that can be seen here now that it's merged: https://cassandra.apache.org/_/events.html

Derek Chen-Becker had a general question about our usage of sh vs. bash: https://lists.apache.org/thread/dzn34v18rhgsxo9grlmxrvxnp0521hgz. The quick and dirty lazy consensus there seems to be "user-facing don't change from sh, dev-facing let's go bash".

Derek has a well thought out and articulated proposal about refactoring and cleaning up our CircleCI config to make use of some of the idiomatic features and parameterization available in the ecosystem: https://lists.apache.org/thread/mvql1p5y2j7so18427zcg4zxc9vzl7l3.

We've had some tests slip through the cracks historically as they didn't match the prescribed regex that picks up test file names; Stefan Miklosovic called this out on a thread here: https://lists.apache.org/thread/vhqprqcv070vmomoozyqdn75fvdd1oll. A couple of proposals have come up on the thread (that are ultimately complementary) - using Checkstyle to force a certain file format, and extending our logic during our build to look for non-abstract files in the test directory containing the @Test annotation. No real closure on this yet, and ultimately the person willing to do the work has the final say on it if nobody has any major concerns with an implementation, which is the case here.
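To make the second proposal concrete, here is a rough sketch of that kind of scan. It is not the project's build logic: the function name, the "Test.java" suffix standing in for the real file-name pattern, and the text-based heuristics for spotting @Test and abstract classes are all assumptions for illustration.

```python
import re
from pathlib import Path
from typing import List

# Heuristics only: a line starting with @Test, and an "abstract class" declaration.
TEST_ANNOTATION = re.compile(r"^\s*@Test\b", re.MULTILINE)
ABSTRACT_CLASS = re.compile(r"\babstract\s+class\b")


def find_unmatched_tests(test_root: Path, suffix: str = "Test.java") -> List[Path]:
    """List non-abstract Java files that use @Test but lack the name suffix."""
    unmatched = []
    for path in sorted(test_root.rglob("*.java")):
        if path.name.endswith(suffix):
            continue  # already picked up by the name-based pattern
        source = path.read_text(encoding="utf-8", errors="replace")
        if TEST_ANNOTATION.search(source) and not ABSTRACT_CLASS.search(source):
            unmatched.append(path)
    return unmatched
```

Called with the root of a test source directory, this returns the files a name-based pattern would silently skip. A real implementation would likely want a proper Java parser, since a concrete test class containing a nested abstract class would be missed by this heuristic.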

A few days ago David Capwell asked about places in our code where we haven't actually specified an encoding, meaning they've relied on the system-specified default: https://lists.apache.org/thread/sokxf46s7hyoxr9q4wm7dv3q2nm19nt3. I've personally read that email three times now and can't think of a useful response other than to back away slowly, so maybe one of you will see that here and chime in. :)
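As a generic illustration of why an unspecified encoding is a hazard (this is a Python sketch of the general problem, not Cassandra code, and the helper name is made up): the same bytes read back under a different charset yield different text, so code that leans on the platform default can behave differently from one machine to the next.

```python
def decode_with(data: bytes, charset: str) -> str:
    """Decode bytes using an explicitly named charset."""
    return data.decode(charset)


if __name__ == "__main__":
    written = "\u00e9".encode("utf-8")      # e-acute, written as UTF-8
    print(decode_with(written, "utf-8"))     # round-trips correctly
    print(decode_with(written, "latin-1"))   # same bytes, garbled text
```

Naming the charset at every read and write, as the helper forces its caller to do, removes the dependency on whatever default the host happens to have.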

And last but not least in this marathon catch-up, Ekaterina has put together a proposal for extending our code style regarding when we access JDK internals and when to hit the dev list for consensus on this thread: https://lists.apache.org/thread/ydgg308jl6sfcwg92kf6m7ylqqo089ho. Her proposal can be found here: https://github.com/ekaterinadimitrova2/cassandra-website/commit/4a9edc7e88fd9fc2c6aa8a84290b75b02cac03bf


[ASF CI Trends]
https://butler.cassandra.apache.org/#/

Here are the trends on our branches for the last 26 days:

3.0: 13 -> 10
3.11: 22 -> 11
4.0: 6 -> 2
4.1: 14 -> 9
trunk: 7 -> 21

We discussed 4.1 up above; 3.0 through 4.0 are trending in a good direction. Looks like quite a few of the trunk failures are from new messaging in logs on teardown that either doesn't have exceptions yet in the teardown parser, or from tests that haven't been updated to change logic to match new defaults on trunk. I'd advocate for all those things being fixed _before_ they get into trunk of course, but I'm also responsible for some of them so I will refrain from throwing stones from within this fine glass house I'm in.
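For a quick view of the net change per branch, a small sketch using the failure counts listed above (the dictionary layout and function name are just for illustration):

```python
from typing import Dict, Tuple

# (failures 26 days ago, failures now), taken from the list above.
TRENDS: Dict[str, Tuple[int, int]] = {
    "3.0": (13, 10),
    "3.11": (22, 11),
    "4.0": (6, 2),
    "4.1": (14, 9),
    "trunk": (7, 21),
}


def failure_deltas(trends: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Net change in failing tests per branch; negative means improvement."""
    return {branch: end - start for branch, (start, end) in trends.items()}


if __name__ == "__main__":
    for branch, delta in failure_deltas(TRENDS).items():
        print(f"{branch}: {delta:+d}")
```

Every release branch comes out negative (fewer failures), and trunk is the only one that went up.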


[Release progress]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2278

I know I'm behind when I have to create a custom quick filter to have the kanban show some strange number of days. So in the last 26 days we have:

4.1 rc / ga: 7 issues
- Fixing generate.sh behavior w/out options provided (CASSANDRA-17995)
- Fix race condition on repair snapshots (CASSANDRA-17955)
- Add --resolve-ip option on nodetool gossipinfo (CASSANDRA-17934)
- Automatically detect and repeat new or changed tests in circleci config (CASSANDRA-17939)
- Update What's New page for 4.1 and trunk (CASSANDRA-17976)
- Update Netbeans project file for dependency (CASSANDRA-18002)
- RPM installation on centos7 is broken (CASSANDRA-17765)

4.0.x: 5 issues
- CircleCI: j11_utest_fqltool fails to build (CASSANDRA-18020)
- CircleCI: Skip checkstyle in the ant-based repeated tests (CASSANDRA-18000)
- Fix CircleCI config for running python upgrade tests on 3.0 and 3.11 (CASSANDRA-17912)
- Update debian packages for bullseye (CASSANDRA-17871)
- CircleCI: Add jobs for running specialized unit tests with Java 11 (CASSANDRA-17987)

4.X / Next: 6 issues
- Round out cqlsh completion test coverage (CASSANDRA-16640)
- Log JVM arguments at in-JVM test class initialization (CASSANDRA-16664)
- nodetool bootstrap resume returns success even if there is an error during bootstrap (CASSANDRA-16491)
- Make resumable bootstrap feature optional (CASSANDRA-17679)
- Include GitSHA in nodetool version output (CASSANDRA-17753)
- CEP-19: Trie Memtable implementation (CASSANDRA-17240)

Phew! And this is why I should keep to the biweekly cadence; there's a lot going on these days. :)

~Josh

Re: Cassandra project status update 2022-11-07

Posted by Josh McKenzie <jm...@apache.org>.
Thanks Stefan; I added the 4.1-rc fixver to the ticket we already have. Could you do the same when you create a ticket for that other failure?

On Mon, Nov 7, 2022, at 5:29 PM, Miklosovic, Stefan wrote:
> Hi Josh,
> 
> thanks for the status.
> 
> I would like to raise awareness that as we fix CASSANDRA-17964, it will introduce two tests which will start to fail (because they were not executed as part of CI until now because of how they are named (not ending in *Test)).
> 
> I believe that these tests will need to be addressed and fixed before 4.1 is out.
> 
> My email describing that in more detail is here (1).
> 
> (1) https://lists.apache.org/thread/pl0q1krhgv0rvybp5jmdy3411hchy28l
> 
> Regards,
> 
> Stefan
> 

Re: Cassandra project status update 2022-11-07

Posted by "Miklosovic, Stefan" <St...@netapp.com>.
Hi Josh,

thanks for the status.

I would like to raise awareness that as we fix CASSANDRA-17964, it will introduce two tests which will start to fail (because they were not executed as part of CI until now because of how they are named (not ending in *Test)).

I believe that these tests will need to be addressed and fixed before 4.1 is out.

My email describing that in more detail is here (1).

(1) https://lists.apache.org/thread/pl0q1krhgv0rvybp5jmdy3411hchy28l

Regards,

Stefan

(1) https://lists.apache.org/thread/pl0q1krhgv0rvybp5jmdy3411hchy28l

________________________________________
From: Josh McKenzie <jm...@apache.org>
Sent: Monday, November 7, 2022 22:59
To: dev
Subject: Cassandra project status update 2022-11-07

NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.



Oh good grief, it's been 26 days since I wrote one of these. My apologies! (Life happens - I can confirm that the terribly named "triple-demic" is real folks)

We've had a number of releases since the last status email. The current and latest supported GA cassandra releases across all branches are:

- cassandra 4: 4.0.7
- cassandra 3.11: 3.11.14
- cassandra 3.0: 3.0.28


[Needs Committers]
I'd like to first focus our attention on tickets that are flagged as "Needs Committer". Our project rules for Cassandra are that 2 committers need to sign off on a commit, so many times if an author or reviewer isn't yet a committer, these tickets can need external input to get into the codebase. The following URL is for a query to pull the Needs Committer tickets: https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20and%20resolution%20%3D%20unresolved%20and%20status%20%3D%20%22Needs%20Committer%22

CASSANDRA-17861, Update Python test framework from nose to pytest in CCM could use another committer on it: https://issues.apache.org/jira/browse/CASSANDRA-17861

CASSANDRA-17870, nodetool/rebuild: Add flag to exclude nodes from local datacenter could also use another committer on review: https://issues.apache.org/jira/browse/CASSANDRA-17870

CASSANDRA-15402, Make incremental backup configurable per keyspace and table looks like it has committer attention as per a recent comment so we're good there.

CASSANDRA-14930, decommission may cause timeout because messaging backlog is cleared: not sure why this one is marked as Needs Committer actually as it has 2 as reviewer. Might just need a status update.

Before we get to 4.1 status, I'd like to call out that Trie memtables were merged in CASSANDRA-17240. This is a large body of novel work (that Branimir presented on at ApacheCon for those of you lucky enough to attend) and it's great to see this land in the project; it's worth your time to pop open that diff and take a look around and see some of the new things being added to Cassandra. Notably, there's some great discussion about property-based testing going on in the comments which has sparked some offline discussion about how we can integrate exploratory fuzz testing in our primary CI pipeline; more to come on that front as discussions evolve.


[4.1 status]
Let's move on to 4.1 status. We're down to 2 tickets blocking rc, and I'm given to understand that the one in progress is close to having something to review, so on the "outstanding work" side we're in great shape: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484

That leaves us with the question: what do we do about CI? We've recently expanded our governance options as to what we consider validated and cleared for release: https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle. Specifically:

"Three consecutive green runs of circleci OR of ASF CI are required to cut RC"

Our most recent run of 4.1 on ASF infra had 9 failures - https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1. This has been trending up a bit very recently from a low of 1 a bit over a week ago; the lion's share of the failures look to be environmental with timeouts.

With ASF CI having stragglers that are flaking lately, option 2 would be three consecutive green runs on circleci; however, in order for this to be representative we need some improvements to the test configuration in circle to get it into parity with the ASF env, as tracked in CASSANDRA-17930 here: https://issues.apache.org/jira/browse/CASSANDRA-17930. As of a recent comment, Ekaterina's taking point on this and tracking that addition in CASSANDRA-18001: https://issues.apache.org/jira/browse/CASSANDRA-18001. Ekaterina - if there's anything other folks on the project can do to assist (including reviewing) please let us know.

So we do have a 3rd option we discussed in slack: running tests on the ASF infra and then selectively multiplexing failures on circle. If a test fails on ASF CI but passes 500 times on circle, the general consensus was that that was sufficient for us to have confidence in the test. With the recent changes Andres introduced in CASSANDRA-17939, multiplexing multiple tests in circle has become very simple, and you can see instructions on generating the correct circle config using .circleci/generate.sh --help (look for the REPEATED_UTESTS=, REPEATED_JVM_DTESTS=, etc. options). This hybrid third approach (canonical run on ASF infra + multiplex failures on circle) gives us another outlet to get a validated release if necessary, albeit at the cost of more effort.
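
For a rough sense of what 500 clean runs tells us about a test, here's a back-of-the-envelope sketch. It assumes every run is independent with a fixed failure probability, which is an assumption and probably optimistic for environment-driven flakes. The 95% confidence level and the function name are arbitrary choices for illustration, not anything agreed on in slack.

```python
def max_flake_rate(clean_runs, confidence=0.95):
    """Largest per-run failure probability still consistent, at the given
    confidence, with seeing `clean_runs` passes in a row and no failures."""
    # Solve (1 - p) ** clean_runs = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_runs)


bound = max_flake_rate(500)
print(f"upper bound on flake rate after 500 clean runs: {bound:.2%}")
```

Under those assumptions, 500 consecutive passes bounds the per-run flake rate at roughly 0.6%, so a failure seen on ASF CI but not reproduced on circle is more likely environmental than a frequent bug in the test itself.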

I'm working with some of the other contributors on ways we can evolve our canonical CI infrastructure as well as making that environment reproducible in order to get us a more stable environment in the ASF while also allowing contributors with access to private cloud hardware to run testing at higher parallelization levels; stay tuned for more detail on that in the coming weeks as well.

One last note I want to call out - immense amounts of energy from many contributors have gone into hardening our test infrastructure and improving our tests in the run up to 4.1. 9 tests failing out of a total suite count of 49,698 tests (as of build 202 on 4.1) is a 99.98% pass rate. That said, we're infrastructure software powering many of the world's most critical applications, so we're going to keep pushing until we hit green and keep it there.
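
For anyone who wants to check the arithmetic, the pass rate works out as follows; both inputs are the figures quoted above for build 202 on 4.1.

```python
failures = 9
total_tests = 49_698

# Fraction of the suite that passed in this run.
pass_rate = (total_tests - failures) / total_tests
print(f"{pass_rate:.2%}")  # prints 99.98%
```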


[New Contributors Getting Started]
We have a new entrant for new contributors! Technically this has been around a while, but I hadn't thought to promote it in these emails. We have an official management sidecar for Apache Cassandra, designed and delivered as part of CEP-1: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224. This is a smaller and less complex project than the Cassandra Storage engine and Query Coordination, so it might prove an attractive on-ramp for any of you who have thought about getting involved but were daunted by the database internals themselves.

Open JIRA issues for the sidecar can be found here: https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRASC%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20assignee%20DESC%2C%20priority%20DESC%2C%20updated%20DESC

And the project can be cloned from the github repo here: https://github.com/apache/cassandra-sidecar

On the Cassandra side, we've curated 24 "Starter Tickets" across our various releases that are unassigned right now - these are also good candidates if you're looking for something a little more bite-sized to get adjusted: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2160&quickFilter=2162. Likewise, documentation contributions and website contributions are generally good ways to get to know our project ecosystem, the commit process, and interact with some of the other contributors.

If you're feeling adventurous, there are quite a few tickets on the unassigned list on 4.0.x and 4.x that could be good candidates to take on: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2160. There are 46 unassigned issues in 4.0.x and 311 in 4.x, so there are a lot of options to choose from there.

We hang out in #cassandra-dev on https://the-asf.slack.com and there's a @cassandra_mentors alias you can use to reach a bunch of us that have volunteered to help newcomers get situated. If you need an invite to the slack channel feel free to reply to just me on this email and I'll get you set up.

Here's a reference explaining the various types of contribution: https://cassandra.apache.org/_/community.html#how-to-contribute
An overview of the C* architecture: https://cassandra.apache.org/doc/latest/cassandra/architecture/overview.html
The getting started contributing guide: https://cassandra.apache.org/_/development/index.html


[Dev list Digest]
https://lists.apache.org/list?dev@cassandra.apache.org:lte=26d:
26 days is a lot of ground to cover here. :)

The thread on whether we were going to do 4.2 or 5.0 came to a close here: https://lists.apache.org/thread/ymj3737x25b7bbqv9lp27w5v1ftc83j9. Results are enshrined in CASSANDRA-17973: https://issues.apache.org/jira/browse/CASSANDRA-17973 (spoiler alert: we're going with 5.0)

We had a solid discussion about changes to improve circleci (https://lists.apache.org/thread/c7hp1wt06r14v1vpovjd5mzy62gdsxqh) that culminated in CASSANDRA-18007 being created: https://issues.apache.org/jira/browse/CASSANDRA-18007.

Erick Ramirez provided a PR and proposal for a formal events page for our website: https://lists.apache.org/thread/hn1b8ymn5sq3w31dvrorroqm2q7yw82v, that can be seen here now that it's merged: https://cassandra.apache.org/_/events.html

Derek Chen-Becker had a general question about our usage of sh vs. bash: https://lists.apache.org/thread/dzn34v18rhgsxo9grlmxrvxnp0521hgz. The quick and dirty lazy consensus there seems to be "user-facing don't change from sh, dev-facing let's go bash".

Derek has a well thought out and articulated proposal about refactoring and cleaning up our CircleCI config to make use of some of the idiomatic features and parameterization available in the ecosystem: https://lists.apache.org/thread/mvql1p5y2j7so18427zcg4zxc9vzl7l3.

We've had some tests slip through the cracks historically as they didn't match the prescribed regex that picks up test file names; Stefan Miklosovic called this out on a thread here: https://lists.apache.org/thread/vhqprqcv070vmomoozyqdn75fvdd1oll. There are a couple of proposals that have come up on the thread (that are ultimately complementary) - using Checkstyle to force a certain file format and extending our logic during our build to look for non-abstract files in the test directory containing the @Test annotation. No real closure on this yet, and ultimately the person willing to do the work has the final say on it if nobody has any major concerns with an implementation, which is the case here.
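
To make the second proposal concrete, here's a minimal sketch of that kind of scan. It is not the build logic proposed on the thread: the naming pattern below is a hypothetical stand-in for whatever regex the build actually uses, the function name is made up, and the text-based detection of @Test and "abstract class" is a deliberate simplification.

```python
import re
from pathlib import Path

# Hypothetical naming pattern; the real regex used by the build may differ.
TEST_NAME_PATTERN = re.compile(r".*Test\.java$")
TEST_ANNOTATION = re.compile(r"^\s*@Test\b", re.MULTILINE)
ABSTRACT_CLASS = re.compile(r"\babstract\s+class\b")


def find_missed_tests(test_root):
    """Return .java files that contain @Test, are not abstract, and do not
    match the naming pattern (so a name-based test runner would skip them)."""
    missed = []
    for path in sorted(Path(test_root).rglob("*.java")):
        source = path.read_text(encoding="utf-8")
        if not TEST_ANNOTATION.search(source):
            continue  # no test methods in this file
        if ABSTRACT_CLASS.search(source):
            continue  # abstract base classes are not run directly
        if not TEST_NAME_PATTERN.match(path.name):
            missed.append(path)
    return missed
```

Calling find_missed_tests on a test directory returns the files a name-based runner would silently skip, which could then fail the build or feed a report.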

A few days ago David Capwell asked about places in our code where we haven't actually specified an encoding, meaning they've relied on the system-specified default: https://lists.apache.org/thread/sokxf46s7hyoxr9q4wm7dv3q2nm19nt3. I've personally read that email three times now and can't think of a useful response other than to back away slowly, so maybe one of you will see that here and chime in. :)
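
For anyone wondering why an unspecified encoding matters, here's a tiny illustration. It's in Python rather than Java and isn't taken from the Cassandra codebase; it only shows the general failure mode, where the same bytes produce different text depending on which charset the reader happens to default to.

```python
text = "é"
data = text.encode("utf-8")  # two bytes: 0xC3 0xA9

# Decoding with the charset the bytes were written in round-trips cleanly.
as_utf8 = data.decode("utf-8")

# Decoding the same bytes as Latin-1 "succeeds" but yields different text.
as_latin1 = data.decode("latin-1")

print(as_utf8, as_latin1)  # prints: é Ã©
```

The mismatch doesn't raise an error, so this kind of bug tends to show up only on machines whose default differs from the one the code was developed on.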

And last but not least in this marathon catch-up, Ekaterina has put together a proposal for extending our code style regarding when we access JDK internals and when to hit the dev list for consensus on this thread: https://lists.apache.org/thread/ydgg308jl6sfcwg92kf6m7ylqqo089ho. Her proposal can be found here: https://github.com/ekaterinadimitrova2/cassandra-website/commit/4a9edc7e88fd9fc2c6aa8a84290b75b02cac03bf


[ASF CI Trends]
https://butler.cassandra.apache.org/#/

Here are our trends on our branches for the last 26 days:

3.0: 13 -> 10
3.11: 22 -> 11
4.0: 6 -> 2
4.1: 14 -> 9
trunk: 7 -> 21

We discussed 4.1 up above; 3.0 through 4.0 are trending in a good direction. Looks like quite a few of the trunk failures are from new messaging in logs on teardown that either doesn't have exceptions yet in the teardown parser, or from tests that haven't been updated to change logic to match new defaults on trunk. I'd advocate for all those things being fixed _before_ they get into trunk of course, but I'm also responsible for some of them so I will refrain from throwing stones from within this fine glass house I'm in.


[Release progress]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2278

I know I'm behind when I have to create a custom quick filter to have the kanban show some strange number of days. So in the last 26 days we have:

4.1 rc / ga: 7 issues
- Fixing generate.sh behavior w/out options provided (CASSANDRA-17995)
- Fix race condition on repair snapshots (CASSANDRA-17955)
- Add --resolve-ip option on nodetool gossipinfo (CASSANDRA-17934)
- Automatically detect and repeat new or changed tests in circleci config (CASSANDRA-17939)
- Update What's New page for 4.1 and trunk (CASSANDRA-17976)
- Update Netbeans project file for dependency (CASSANDRA-18002)
- RPM installation on centos7 is broken (CASSANDRA-17765)

4.0.x: 5 issues
- CircleCI: j11_utest_fqltool fails to build (CASSANDRA-18020)
- CircleCI: Skip checkstyle in the ant-based repeated tests (CASSANDRA-18000)
- Fix CircleCI config for running python upgrade tests on 3.0 and 3.11 (CASSANDRA-17912)
- Update debian packages for bullseye (CASSANDRA-17871)
- CircleCI: Add jobs for running specialized unit tests with Java 11 (CASSANDRA-17987)

4.X / Next: 6 issues
- Round out cqlsh completion test coverage (CASSANDRA-16640)
- Log JVM arguments at in-JVM test class initialization (CASSANDRA-16664)
- nodetool bootstrap resume returns success even if there is an error during bootstrap (CASSANDRA-16491)
- Make resumable bootstrap feature optional (CASSANDRA-17679)
- Include GitSHA in nodetool version output (CASSANDRA-17753)
- CEP-19: Trie Memtable implementation (CASSANDRA-17240)

Phew! And this is why I should keep to the biweekly cadence; there's a lot going on these days. :)

~Josh