Posted to commits@cassandra.apache.org by er...@apache.org on 2022/05/19 08:27:12 UTC

[cassandra-website] branch trunk updated (f4d31194 -> 5107aa9f)

This is an automated email from the ASF dual-hosted git repository.

erickramirezau pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-website.git


    from f4d31194 New releases 3.0.27, 3.11.13, and 4.0.4
     new b5bfe943 CASSANDRA-17639
     new 5107aa9f Apply suggestions from Erick's code review

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../the-path-to-green-ci-unsplash-hasan-almasi.jpg | Bin 0 -> 3404325 bytes
 site-content/source/modules/ROOT/pages/blog.adoc   |  25 +++++
 .../ROOT/pages/blog/The-Path-to-Green-CI.adoc      | 118 +++++++++++++++++++++
 3 files changed, 143 insertions(+)
 create mode 100644 site-content/source/modules/ROOT/images/blog/the-path-to-green-ci-unsplash-hasan-almasi.jpg
 create mode 100644 site-content/source/modules/ROOT/pages/blog/The-Path-to-Green-CI.adoc


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

[cassandra-website] 01/02: CASSANDRA-17639

Posted by er...@apache.org.

erickramirezau pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-website.git

commit b5bfe9433488ad696337f304b5008599fc1c29fe
Author: Diogenese Topper <di...@gmail.com>
AuthorDate: Wed May 18 20:48:27 2022 -0700

    CASSANDRA-17639
    
    patch by Josh McKenzie, Diogenese Topper; reviewed by -- for CASSANDRA-17639
    
    Co-authored by: Josh McKenzie
    Co-authored by: Diogenese Topper <di...@constantia.io>
---
 .../the-path-to-green-ci-unsplash-hasan-almasi.jpg | Bin 0 -> 3404325 bytes
 site-content/source/modules/ROOT/pages/blog.adoc   |  25 ++++
 .../ROOT/pages/blog/The-Path-to-Green-CI.adoc      | 133 +++++++++++++++++++++
 3 files changed, 158 insertions(+)

diff --git a/site-content/source/modules/ROOT/images/blog/the-path-to-green-ci-unsplash-hasan-almasi.jpg b/site-content/source/modules/ROOT/images/blog/the-path-to-green-ci-unsplash-hasan-almasi.jpg
new file mode 100644
index 00000000..0f07950b
Binary files /dev/null and b/site-content/source/modules/ROOT/images/blog/the-path-to-green-ci-unsplash-hasan-almasi.jpg differ
diff --git a/site-content/source/modules/ROOT/pages/blog.adoc b/site-content/source/modules/ROOT/pages/blog.adoc
index f7bc32a9..e707771a 100644
--- a/site-content/source/modules/ROOT/pages/blog.adoc
+++ b/site-content/source/modules/ROOT/pages/blog.adoc
@@ -8,6 +8,31 @@ NOTES FOR CONTENT CREATORS
 - Replace post tile, date, description and link to you post.
 ////
 
+//start card
+[openblock,card shadow relative test]
+----
+[openblock,card-header]
+------
+[discrete]
+=== The Path to Green CI
+[discrete]
+==== May 19, 2022
+------
+[openblock,card-content]
+------
+We reflect on our development journey, the work that goes into testing and pull out some numbers to demonstrate the level of testing that now goes into Apache Cassandra as we approach the GA of Cassandra 4.1.
+
+[openblock,card-btn card-btn--blog]
+--------
+
+[.btn.btn--alt]
+xref:blog/The-Path-to-Green-CI.adoc[Read More]
+--------
+
+------
+----
+//end card
+
 //start card
 [openblock,card shadow relative test]
 ----
diff --git a/site-content/source/modules/ROOT/pages/blog/The-Path-to-Green-CI.adoc b/site-content/source/modules/ROOT/pages/blog/The-Path-to-Green-CI.adoc
new file mode 100644
index 00000000..5d43e79a
--- /dev/null
+++ b/site-content/source/modules/ROOT/pages/blog/The-Path-to-Green-CI.adoc
@@ -0,0 +1,133 @@
+= The Path to Green CI
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: May, 19 2022
+:page-post-author: Josh McKenzie
+:description: Testing Apache Cassandra
+:keywords:
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@hasanalmasi[Hasan Almasi on Unsplash^]
+image::blog/the-path-to-green-ci-unsplash-hasan-almasi.jpg[the path to Green CI]
+
+As we approach the Cassandra 4.1 GA release, it’s a great time to stop and reflect on some of our development history, the past year since we released 4.0, and where we’re headed in the future. In this blog post, we’re going to limit our focus to testing Cassandra as that’s been a huge focus in the 4.0+ time frame.
+
+=== The Numbers
+
+But don’t just take my word for it - let’s start with some numbers! We’re going to use “lines of code” as a loose proxy for “where we’re spending our time”. This is definitely a fraught metric, but as _one_ way of viewing things it paints a pretty interesting picture, consistent with many of our intuitions.
+
+Below is a list for our past three major releases and how much raw code exists in the following:
+
+1. src/java: raw database code
+2. test/unit: unit testing code
+3. test/distributed: distributed tests in java code
+4. The cassandra-dtest repo .py test files: python distributed tests using https://github.com/riptano/ccm[ccm^]
+
+[cols=3*]
+|=======
+|||*% total*
+
+|*3.0 Total:*|*316,262*|100.00%
+
+|src|181,871|57.51%
+
+|junit|100,284|31.71%
+
+|jdtest|9,225|2.92%
+
+|pdtest|24,882|7.87%
+
+|All tests|134,391|42.49%
+|=======
+
+[cols=4*]
+|=====
+|||*% total*|*% change since 3.0*
+
+|*4.0 Total:*|495,566|100.00%|156.69%
+
+|src|261,825|52.83%|143.96%
+
+|junit|172,672|34.84%|172.18%
+
+|jdtest|21,769|4.39%|235.98%
+
+|pdtest|39,300|7.93%|157.95%
+
+|All tests|233,741|47.17%|
+|=======
+|=====
+
+[cols=4*]
+|=======
+|||*% total*|*% change since 4.0*
+
+|*4.1 Total:*|566,127|100.00%|114.24%
+
+|src|297,685|52.58%|113.70%
+
+|junit|197,231|34.84%|114.22%
+
+|jdtest|31,306|5.53%|143.81%
+
+|pdtest|39,905|7.05%|101.54%
+
+|All tests|268,442|47.42%|
+|=======
+
+The biggest thing that immediately jumps out to me: our in-jvm dtests have been growing at a very strong pace relative to the rest of the code-base. Immense effort has gone into not just authoring this testing _framework_, but also adding new tests to it. As a percentage of our total code, there was a significant jump from 3.0 to 4.0 of almost a 5% relative increase in total test code to the entire codebase.
+
+We can also infer that our new code addition to the database has _accelerated_ in the past year, as the delta from 4.0-4.1 represents a time frame of one calendar year and roughly 38k LoC net add vs. the 5.5 year gap between 3.0 and 4.0 with a net add of ~80k LoC. While the database also had a release line of 3.1-3.11 introducing new features and tests during this time window; for the sake of this analysis we’re only considering major traditional point releases.
+
+So what does all this mean for us working on and depending on the project? It means that with the release of 4.1, _we have 10% more code just *unit* testing the database than we had in the entire database in 3.0_. It means that new development on Cassandra is accelerating. It also means we’re constantly moving the goalposts on what’s required to keep https://ci-cassandra.apache.org/[Green (passing) CI (continuous integration)^].
+
+=== Keeping it Green
+
+We all know Cassandra is an incredibly complex piece of software. The power to scale up linearly to petabytes of data on hundreds of machines, with zero downtime, in a masterless single logical cluster, where machines can drop out and in, https://cassandra.apache.org/doc/latest/cassandra/operating/hints.html[hint], https://cassandra.apache.org/doc/latest/cassandra/operating/read_repair.html[heal], and https://cassandra.apache.org/doc/latest/cassandra/operating/repair.html[repair] simply  [...]
+
+One of our struggles over time has been the software and hardware complexity required to keep our testing infrastructure “clean”, or green, on an ongoing basis. Balancing runtime, resourcing, and cost with a system as complex as Cassandra is a fixed challenge to begin with and is only growing over time as we’ve seen above.
+
+We always drive down to stable 0 test failures at a GA release, however what we refer to as “flaky tests” sneak back into our suite over time. Let’s take a recent example, https://nightlies.apache.org/cassandra/ci-cassandra.apache.org/job/Cassandra-trunk/1112/[build run 1112^] on trunk (effectively Cassandra 4.1 pre-alpha).
+
+19 test failures! https://nightlies.apache.org/cassandra/ci-cassandra.apache.org/job/Cassandra-trunk/1112/testReport/[Out of an entire suite of 49,704 tests makes that a 99.96% pass rate^]. Nobody wants a database that works 99.96% of the time, however, and that’s assuming we have 100% test coverage of not just all our code but also all possible combinations of state, a problem so daunting some contributors are https://issues.apache.org/jira/browse/CASSANDRA-15348[pushing the bleeding ed [...]
+
+Burning down less than 20 flaky tests to get our release out between freeze and our goal for release, an eight-week window, is quite doable, so why not just continue to float along with a low number of test failures? Well, it gets more complicated when we look at _where_ we run our tests.
+
+=== Circle vs. Jenkins
+
+As https://cassandra.apache.org/_/development/testing.html[we outline in our contributor guide on testing], tests can both be run on https://ci-cassandra.apache.org/[Apache Jenkins infrastructure^] or on https://github.com/apache/cassandra/tree/cassandra-4.1/.circleci[CircleCI^]. The primary difference between these two systems are runtime, cost, and resources allocated to each individual test. While some contributors have access to paid CircleCI accounts that allow them to dedicate more [...]
+
+One challenge this introduces is tests that “flake” due to resource allocation differences. For instance, if you allocate a particularly intensive unit test to eight cores in a container with 16 gigs of RAM, you can expect a different runtime than allocating a container with two cores and eight gigs of RAM. Throw into the mix that all of us are doing development on different laptops, with different core counts, with different _architectures_, and you have a recipe for some pretty challen [...]
+
+https://cwiki.apache.org/confluence/x/1AorCQ[Currently we accept both a Green run on CircleCI and a Green run on DevBranch on jenkins as acceptable for committers to merge code^]. This introduces a gap for us as the Circle plan can allocate more resources to containers for running tests based on the plan you use, meaning a test could pass on Circle that subsequently fails on ASF Jenkins due to resourcing limitations.
+
+Another challenge we face is that it can be challenging to author tests in Cassandra that have deterministic results in the face of scheduling pressures. Given the long legacy of our project (which dates back to 2008), we have quite a bit of static state without existing stub implementations for testing, meaning many of our unit tests spin up state in other areas of the database, write files to disk, and otherwise mutate state in adjacent subsystems. https://github.com/apache/cassandra/b [...]
+
+=== Testing Complex Systems is Itself Complex
+
+The Cassandra testing ecosystem consists of a variety of different suites targeting different subsystems and operations in the database. From a high level, a look at the https://ci-cassandra.apache.org/job/Cassandra-trunk/[top level testing pipelines of the project^] shows standouts like testing with https://issues.apache.org/jira/browse/CASSANDRA-6809[compression^], with https://issues.apache.org/jira/browse/CASSANDRA-8844[change-data-capture^] enabled, during upgrades, both unit vs. di [...]
+
+Taking a quick look at the https://ci-cassandra.apache.org/job/Cassandra-trunk/1112/flowGraphTable/[runtime pipeline under the hood^], you can see the large distributed effort that it is to break down the different jobs across these different agents. https://github.com/apache/cassandra-builds/blob/trunk/jenkins-dsl/cassandra_job_dsl_seed.groovy[The code required to generate, distribute, build, collect logs from, teardown, and maintain^] all these jobs on these machines lives in the https [...]
+
+Throwing all this hardware and parallelization at our almost 50,000 tests takes our total test runtime *down to 4h 9m 4s*. A big shout-out to Mick Semb Wever, committer and PMC member on the project, who’s done a ton of work to get us this far with our CI infrastructure!
+
+We have a few ideas for ways to reduce the total processing burden of our tests; with this much compute required and this many tests, small percentages add up to big gains. Jacek Lewandowski is targeting some file operations and general speedup in https://issues.apache.org/jira/browse/CASSANDRA-17427[CASSANDRA-17427^], Berenguer Blasi is looking into potentially re-using dtest clusters in our python dtests to cut out unnecessary cluster startup and shutdown times in https://issues.apache [...]
+
+Lastly, we have a Jenkins to JIRA integration script drafted that would auto update tickets with the results of the CI runs on ASF Jenkins infrastructure with the results of their build in https://issues.apache.org/jira/browse/CASSANDRA-17277?focusedCommentId=17493385&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17493385[CASSANDRA-17277^]. This is necessary as we have two paths for code to get certified for inclusion (circle or ASF Jenkins) with the for [...]
+
+=== The Future of Testing in Cassandra
+
+As we head into the verification cycle for Cassandra 4.1 we’re going to be using https://cwiki.apache.org/confluence/x/tQzjBw[the same Release Lifecycle definitions^] we ratified back in 2019. Of note, we won’t transition from alpha to beta without green tests: _“No flaky tests - All tests (Unit Tests and DTests) should pass consistently. A failing test, upon analyzing the root cause of failure, may be “ignored in exceptional cases”, if appropriate, for the release, after discussion in t [...]
+ 
+So we’re going to drive back to a green test board as we do for each major release, but are we going to make an effort to stay there and if so, how?
+ 
+I’ve been working on this project since early 2014 (!), and this has always been a challenge for us. That said, after analyzing the numbers for this blog post and realizing _just how much_ we’re proportionally expanding our _testing_, I’m heartened by the progress we’re making; the proportion of flaky or failing tests is objectively falling over time. A total of 15 failing tests out of 50,000 is a lot less than 15 failing out of 25,000, or 12,500 for example, so we’re definitely moving i [...]
+ 
+If we take the value of having a green test board as self-evident (developer time, triaging, branch stability, feedback loops, etc), how can we stay there after the 4.1 release? The combination of a bot letting us know ASAP if our patch correlates with a new test failure should help, as will lowering the total runtime required between running our tests and merging them. Lastly, in January of 2022 we introduced a new https://cwiki.apache.org/confluence/x/DI3kCw[Build Lead^] role to shephe [...]
+ 
+We have a balanced tension between wanting to get code changes into the system rapidly for contributors fortunate enough to be able to use CircleCI while also providing for and encouraging usage of the freely available Apache Jenkins infrastructure, but we’re bridging the gap this naturally creates.
+ 
+Contributors around the globe are working hard to get Cassandra 4.1 GA soon and just like Cassandra 4.0 before it, we expect this to be the most stable, best performing version of Apache Cassandra we’ve ever released. You can download the test build of Cassandra 4.1 https://nightlies.apache.org/cassandra/cassandra-4.1/Cassandra-4.1-artifacts/23/Cassandra-4.1-artifacts/[here^] and test it out - let us know what you think!
+ 
+If you haven’t yet, come join the xref:community.adoc[Cassandra development community] and get involved in making the most scalable and available database in the world!
\ No newline at end of file



[cassandra-website] 02/02: Apply suggestions from Erick's code review

Posted by er...@apache.org.

erickramirezau pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-website.git

commit 5107aa9f2a5705d2c7666088b77924315a672112
Author: Erick Ramirez <er...@gmail.com>
AuthorDate: Thu May 19 18:25:05 2022 +1000

    Apply suggestions from Erick's code review
---
 site-content/source/modules/ROOT/pages/blog.adoc   |  2 +-
 .../ROOT/pages/blog/The-Path-to-Green-CI.adoc      | 53 ++++++++--------------
 2 files changed, 20 insertions(+), 35 deletions(-)

diff --git a/site-content/source/modules/ROOT/pages/blog.adoc b/site-content/source/modules/ROOT/pages/blog.adoc
index e707771a..34141efd 100644
--- a/site-content/source/modules/ROOT/pages/blog.adoc
+++ b/site-content/source/modules/ROOT/pages/blog.adoc
@@ -20,7 +20,7 @@ NOTES FOR CONTENT CREATORS
 ------
 [openblock,card-content]
 ------
-We reflect on our development journey, the work that goes into testing and pull out some numbers to demonstrate the level of testing that now goes into Apache Cassandra as we approach the GA of Cassandra 4.1.
+As we approach the GA of Cassandra 4.1, we reflect on our development journey and show some statistics to demonstrate the level of testing that now goes into the project.
 
 [openblock,card-btn card-btn--blog]
 --------
diff --git a/site-content/source/modules/ROOT/pages/blog/The-Path-to-Green-CI.adoc b/site-content/source/modules/ROOT/pages/blog/The-Path-to-Green-CI.adoc
index 5d43e79a..ce8c2453 100644
--- a/site-content/source/modules/ROOT/pages/blog/The-Path-to-Green-CI.adoc
+++ b/site-content/source/modules/ROOT/pages/blog/The-Path-to-Green-CI.adoc
@@ -15,64 +15,49 @@ As we approach the Cassandra 4.1 GA release, it’s a great time to stop and ref
 
 === The Numbers
 
-But don’t just take my word for it - let’s start with some numbers! We’re going to use “lines of code” as a loose proxy for “where we’re spending our time”. This is definitely a fraught metric, but as _one_ way of viewing things it paints a pretty interesting picture, consistent with many of our intuitions.
+But don’t just take my word for it - let’s start with some numbers! We’re going to use “lines of code” (LoC) as a loose proxy for “where we’re spending our time”. This is definitely a fraught metric, but as _one_ way of viewing things it paints a pretty interesting picture, consistent with many of our intuitions.
 
 Below is a list for our past three major releases and how much raw code exists in the following:
 
-1. src/java: raw database code
-2. test/unit: unit testing code
-3. test/distributed: distributed tests in java code
-4. The cassandra-dtest repo .py test files: python distributed tests using https://github.com/riptano/ccm[ccm^]
+1. `src/java`: raw database code
+2. `test/unit`: unit testing code
+3. `test/distributed`: distributed tests in java code
+4. The `cassandra-dtest` repo .py test files: python distributed tests using https://github.com/riptano/ccm[ccm^]
 
-[cols=3*]
+[%header,cols=3*]
 |=======
-|||*% total*
-
-|*3.0 Total:*|*316,262*|100.00%
-
+|&nbsp; |Lines |% total
 |src|181,871|57.51%
-
 |junit|100,284|31.71%
-
 |jdtest|9,225|2.92%
-
 |pdtest|24,882|7.87%
-
+|*3.0 Total*|*316,262*|100.00%
 |All tests|134,391|42.49%
 |=======
 
-[cols=4*]
-|=====
-|||*% total*|*% change since 3.0*
-
-|*4.0 Total:*|495,566|100.00%|156.69%
+{sp} +
 
+[%header,cols=4*]
+|=====
+|&nbsp; |Lines |% total |% change since 3.0
 |src|261,825|52.83%|143.96%
-
 |junit|172,672|34.84%|172.18%
-
 |jdtest|21,769|4.39%|235.98%
-
 |pdtest|39,300|7.93%|157.95%
-
+|*4.0 Total*|*495,566*|100.00%|156.69%
 |All tests|233,741|47.17%|
 |=======
-|=====
 
-[cols=4*]
+{sp} +
+
+[%header,cols=4*]
 |=======
 |||*% total*|*% change since 4.0*
-
-|*4.1 Total:*|566,127|100.00%|114.24%
-
 |src|297,685|52.58%|113.70%
-
 |junit|197,231|34.84%|114.22%
-
 |jdtest|31,306|5.53%|143.81%
-
 |pdtest|39,905|7.05%|101.54%
-
+|*4.1 Total*|*566,127*|100.00%|114.24%
 |All tests|268,442|47.42%|
 |=======
 
@@ -84,7 +69,7 @@ So what does all this mean for us working on and depending on the project? It me
 
 === Keeping it Green
 
-We all know Cassandra is an incredibly complex piece of software. The power to scale up linearly to petabytes of data on hundreds of machines, with zero downtime, in a masterless single logical cluster, where machines can drop out and in, https://cassandra.apache.org/doc/latest/cassandra/operating/hints.html[hint], https://cassandra.apache.org/doc/latest/cassandra/operating/read_repair.html[heal], and https://cassandra.apache.org/doc/latest/cassandra/operating/repair.html[repair] simply  [...]
+We all know Cassandra is an incredibly complex piece of software. The power to scale up linearly to petabytes of data on hundreds of machines, with zero downtime, in a masterless single logical cluster, where machines can drop out and in, link:/doc/latest/cassandra/operating/hints.adoc[hint], link:/doc/latest/cassandra/operating/read_repair.adoc[heal], and link:/doc/latest/cassandra/operating/repair.adoc[repair] simply cannot be implemented without a significant amount of code, infrastru [...]
 
 One of our struggles over time has been the software and hardware complexity required to keep our testing infrastructure “clean”, or green, on an ongoing basis. Balancing runtime, resourcing, and cost with a system as complex as Cassandra is a fixed challenge to begin with and is only growing over time as we’ve seen above.
 
@@ -96,7 +81,7 @@ Burning down less than 20 flaky tests to get our release out between freeze and
 
 === Circle vs. Jenkins
 
-As https://cassandra.apache.org/_/development/testing.html[we outline in our contributor guide on testing], tests can both be run on https://ci-cassandra.apache.org/[Apache Jenkins infrastructure^] or on https://github.com/apache/cassandra/tree/cassandra-4.1/.circleci[CircleCI^]. The primary difference between these two systems are runtime, cost, and resources allocated to each individual test. While some contributors have access to paid CircleCI accounts that allow them to dedicate more [...]
+As xref:/development/testing.adoc[we outline in our contributor guide on testing], tests can both be run on https://ci-cassandra.apache.org/[Apache Jenkins infrastructure^] or on https://github.com/apache/cassandra/tree/cassandra-4.1/.circleci[CircleCI^]. The primary difference between these two systems are runtime, cost, and resources allocated to each individual test. While some contributors have access to paid CircleCI accounts that allow them to dedicate more resources to their test  [...]
 
 One challenge this introduces is tests that “flake” due to resource allocation differences. For instance, if you allocate a particularly intensive unit test to eight cores in a container with 16 gigs of RAM, you can expect a different runtime than allocating a container with two cores and eight gigs of RAM. Throw into the mix that all of us are doing development on different laptops, with different core counts, with different _architectures_, and you have a recipe for some pretty challen [...]
 

