Posted to pr@cassandra.apache.org by GitBox <gi...@apache.org> on 2021/11/09 01:06:49 UTC

[GitHub] [cassandra-website] ErickRamirezDS commented on a change in pull request #80: Blog Post: Harry, an Open Source Fuzz Testing and Verification Tool for Apache Cassandra.

ErickRamirezDS commented on a change in pull request #80:
URL: https://github.com/apache/cassandra-website/pull/80#discussion_r745201959



##########
File path: site-content/source/modules/ROOT/pages/blog/Harry-an-Open-Source-Fuzz-Testing-and-Verification-Tool-for-Apache-Cassandra.adoc
##########
@@ -0,0 +1,523 @@
+= Harry, an Open Source Fuzz Testing and Verification Tool for Apache Cassandra
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: November 2, 2021
+:page-post-author: The Apache Cassandra Community
+:description: The Apache Cassandra Community
+:keywords: 
+
+Over the years of working on Apache Cassandra, while writing tests or
+trying to reproduce issues, I’ve found myself repeating the
+same procedure over and over again: creating a schema, writing loops
+that generate data, then either manually reconciling it to check the
+results or comparing the result set against some predetermined
+expected result. Not only is this approach tedious and time-consuming,
+but it also does not scale: if some set of operations works for one
+schema, there’s no way to know whether it will also work for any arbitrary
+schema, whether it will work if the operations are executed in a different
+order, or whether it will work if the operations themselves are slightly different.
+
+While preparing Apache Cassandra for 4.0 release, we’ve made extensive
+progress in how we test. The new in-tree in-JVM distributed test
+framework enables us to easily write tests that exercise coordinated
+query execution code paths while giving us flexibility and control
+that was previously offered only by CQLTester, a tool for exercising
+node-local query paths. Many subsystems were audited and covered with
+tests. Cassandra users tried the new version out in their clusters and
+reported their findings. All of these things are useful and important,
+but we still needed a tool that would give us the same or a higher
+degree of confidence for every commit, so that we could know that the
+database is working as expected not only for the exact set of
+operations exercised by unit and integration tests, but
+potentially for any use case and combination of operations under
+circumstances comparable to production.
+
+This all led us to develop Harry, a tool that combines the properties
+of stress- and integration-testing tools. Harry can
+generate data for an arbitrary schema, execute data modification
+queries against the cluster, track the progress of operation
+execution, and make sure that responses to read queries are correct.
+
+After reading this post, you will understand:
+
+* how Harry generates data
+* how Harry performs verification
+* which properties of the values generated by Harry make verification not only possible but also efficient
+
+The primary audience for this post is Cassandra contributors, so you
+will need to be familiar with Apache Cassandra and its tooling.
+
+== Fuzz testing 
+
+Since most bugs are reproduced by taking a sequence of actions
+that follows some pattern, we need to specify which actions can be used
+to lead to a given state. However, there’s a lot of flexibility regarding
+exactly which values are written to specific columns.
+
+For example, consider
+https://issues.apache.org/jira/browse/CASSANDRA-16453?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel[CASSANDRA-16453,window=_blank],

Review comment:
       Please clean up the link to just use `https://issues.apache.org/jira/browse/CASSANDRA-16453`. 🙂

##########
File path: site-content/source/modules/ROOT/pages/blog/Harry-an-Open-Source-Fuzz-Testing-and-Verification-Tool-for-Apache-Cassandra.adoc
##########
@@ -0,0 +1,523 @@
+= Harry, an Open Source Fuzz Testing and Verification Tool for Apache Cassandra
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: November 2, 2021
+:page-post-author: The Apache Cassandra Community

Review comment:
       Please update the author to `Alex Petrov`. 🍻

##########
File path: site-content/source/modules/ROOT/pages/blog.adoc
##########
@@ -1,762 +1,812 @@
-= Blog
-:page-layout: blog-landing
-:page-role: blog-landing
-
-////
-NOTES FOR CONTENT CREATORS
-- To add a new blog post, copy and paste the markup for one card below. Copy from '//start' to the next '//end'
-- Replace the post title, date, description and link to your post.
-////
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Apache Cassandra Changelog #10
-[discrete]
-==== October 5, 2021
-------
-[openblock,card-content]
-------
-Apache Cassandra 4.0.1 is released, and Aleksei Zotov becomes a committer. Discussions are underway for some key new feature proposals, including support for general-purpose transactions and Storage Attached Index (SAI). CEP-11, the pluggable memtable implementations proposal, has been approved, as has CEP-13 for a denylisting partitions feature.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Apache-Cassandra-Changelog-10-October-2021.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Reaper: Anti-entropy Repair Made Easy 
-[discrete]
-==== September 28, 2021
-------
-[openblock,card-content]
-------
-Originally designed by Spotify, Reaper is an open source tool written in Java to schedule and orchestrate repairs of Apache Cassandra clusters. It helps make repairs as safe and reliable as possible, and with the recent release of Apache Cassandra 4.0, that also includes incremental repairs.
-
-[openblock,card-btn card-btn--blog]
---------
-[.btn.btn--alt]
-xref:blog/Reaper-Anti-entropy-Repair-Made-Easy.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Join Cassandra at Apachecon 2021
-[discrete]
-==== September 20, 2021
-------
-[openblock,card-content]
-------
-Register to attend ApacheCon 2021 for a packed series of presentations on the new features in development for Apache Cassandra, along with best practices for CI & testing, and cutting-edge use cases. The BoF event at the end of the day includes a deep dive into Apache Cassandra 4.0 and cocktail-making.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Join-Cassandra-at-ApacheCon-2021.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Cassandra on Kubernetes: A Beginner's Guide 
-[discrete]
-==== September 4, 2021
-------
-[openblock,card-content]
-------
-Managing infrastructure has been standardizing around Kubernetes. Learn how the Apache Cassandra community has been developing solutions to simplify deployment and management of data with Cassandra operators and open source distributions for Kubernetes.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Cassandra-on-Kubernetes-A-Beginners-Guide.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Apache Cassandra Upgrade Advisory 
-[discrete]
-==== August 18, 2021
-------
-[openblock,card-content]
-------
-Users of Apache Cassandra 3.0.23, 3.0.24, 3.11.9 and 3.11.10 should upgrade due to the potential for data corruption during schema changes.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Upgrade-Advisory.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Apache Cassandra Changelog #9 
-[discrete]
-==== August 18, 2021
-------
-[openblock,card-content]
-------
-Release of 4.0 GA, 3.0.25, and 3.11.11; an upgrade advisory; and Jon Meredith becomes a committer.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Apache-Cassandra-Changelog-9-August-2021.adoc[Read More]
---------
-
-------
-----
-//end card
-
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Apache Cassandra 4.0 Overview 
-[discrete]
-==== August 18, 2021
-------
-[openblock,card-content]
-------
-Take a look at the full overview of the latest and greatest features of Apache Cassandra 4.0.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Apache-Cassandra-4.0-Overview.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Apache Cassandra 4.0 is Here 
-[discrete]
-==== July 27, 2021
-------
-[openblock,card-content]
-------
-On November 9th, 2015 the Apache Cassandra project released version 3.0 and, with it, a host of really big changes you would expect in a major version.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Apache-Cassandra-4.0-is-Here.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Apache Cassandra Changelog #8 
-[discrete]
-==== June 28, 2021
-------
-[openblock,card-content]
-------
-4.0-rc2 released, say hello to our Google Summer of Code intern and new community intro to Cassandra videos.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Apache-Cassandra-Changelog-8-June-2021.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Cassandra and Kubernetes: SIG Update #2 
-[discrete]
-==== June 9, 2021
-------
-[openblock,card-content]
-------
-The Cassandra Kubernetes SIG is excited to share that there has been coalescence around the Cass Operator project as the community-based operator.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Cassandra-and-Kubernetes-SIG-Update-2.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Apache Cassandra Changelog #7
-[discrete]
-==== May 31, 2021
-------
-[openblock,card-content]
-------
-Our monthly roundup of key activities and knowledge to keep the community informed.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Apache-Cassandra-Changelog-7-May-2021.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Speakers Announced for April 28 Cassandra 4.0 World Party
-[discrete]
-==== April 19, 2021
-------
-[openblock,card-content]
-------
-The list of speakers for Apache Cassandra's upcoming 4.0 World Party.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Speakers-Announced-for-April-28-Cassandra-4.0-World-Party.adoc[Read More]
---------
-
-------
-----
-//end card
-
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Apache Cassandra Changelog #6
-[discrete]
-==== April 12, 2021
-------
-[openblock,card-content]
-------
-Our monthly roundup of key activities and knowledge to keep the community informed.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Apache-Cassandra-Changelog-6-April-2021.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-=== Apache Cassandra World Party 2021
-[discrete]
-==== March 25, 2021
-------
-[openblock,card-content]
-------
-We are now one of the most important databases today and manage the biggest workloads in the world. Because of that, we want to gather the worldwide community to 
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/World-Party.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Join Apache Cassandra for Google Summer of Code 2021 
-[discrete]
-==== March 10, 2021
-------
-[openblock,card-content]
-------
-The ASF has been a GSoC mentor organization since the beginning. Apache Cassandra mentored a successful GSoC project in 2016 and we are participating again this year.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Join-Cassandra-GSoC-2021.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Apache Cassandra Changelog #5 
-[discrete]
-==== March 08, 2021
-------
-[openblock,card-content]
-------
-Our monthly roundup of key activities and knowledge to keep the community informed.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Apache-Cassandra-Changelog-5-March-2021.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Apache Cassandra Changelog #4 
-[discrete]
-==== February 11, 2021
-------
-[openblock,card-content]
-------
-Our monthly roundup of key activities and knowledge to keep the community informed.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Apache-Cassandra-Changelog-4-February-2021.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Apache Cassandra Changelog #3
-[discrete]
-==== January 19, 2021
-------
-[openblock,card-content]
-------
-Our monthly roundup of key activities and knowledge to keep the community informed.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Apache-Cassandra-Changelog-3-January-2021.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Apache Cassandra Changelog #2
-[discrete]
-==== December 01, 2020
-------
-[openblock,card-content]
-------
-Our monthly roundup of key activities and knowledge to keep the community informed.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Apache-Cassandra-Changelog-2-December-2020.adoc[Read More]
---------
-
-------
-----
-//end card
-
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Apache Cassandra Changelog #1
-[discrete]
-==== October 28, 2020
-------
-[openblock,card-content]
-------
-Introducing the first Cassandra Changelog blog! Our monthly roundup of key activities and knowledge to keep the community informed.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Apache-Cassandra-Changelog-1-October-2020.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Apache Cassandra Usage Report 2020
-[discrete]
-==== September 17, 2020
-------
-[openblock,card-content]
-------
-Apache Cassandra is the open source NoSQL database for mission critical data. Today the community announced findings from a comprehensive global survey of 901 practitioners on Cassandra usage. It’s the first of what will become an annual survey that provides a baseline understanding of who, how, and why organizations use Cassandra.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Apache-Cassandra-Usage-Report-2020.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Improving Apache Cassandra’s Front Door and Backpressure
-[discrete]
-==== September 03, 2020
-------
-[openblock,card-content]
-------
-As part of CASSANDRA-15013, we have improved Cassandra’s ability to handle high throughput workloads, while having enough safeguards in place to protect itself from potentially going out of memory. In order to better explain the change we have made, let us understand at a high level, on how an incoming request is processed by Cassandra before the fix, followed by what we changed, and the new relevant configuration knobs available.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Improving-Apache-Cassandras-Front-Door-and-Backpressure.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Cassandra and Kubernetes: SIG Update and Survey
-[discrete]
-==== August 14, 2020
-------
-[openblock,card-content]
-------
-Five operators for Apache Cassandra have been created that have made it easier to run containerized Cassandra on Kubernetes. Recently the major contributors to these operators came together to discuss the creation of a community-based operator with the intent of making one that makes it easy to run C* on K8s. One of the project’s organizational goals is that the end result will eventually become part of the Apache Software Foundation or the Apache Cassandra project.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Cassandra-and-Kubernetes-SIG-Update-and-Survey.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Introducing Apache Cassandra 4.0 Beta: Battle Tested From Day One
-[discrete]
-==== July 20, 2020
-------
-[openblock,card-content]
-------
-This is the most stable Apache Cassandra in history; you should start using Apache Cassandra 4.0 Beta today in your test and QA environments, head to the downloads site to get your hands on it. The Cassandra community is on a mission to deliver a 4.0 GA release that is ready to be deployed to production. You can guarantee this holds true by running your application workloads against the Beta release and contributing to the community’s validation effort to get Cassandra 4.0 to GA.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Introducing-Apache-Cassandra-4-Beta-Battle-Tested-From-Day-One.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Even Higher Availability with 5x Faster Streaming in Cassandra 4.0
-[discrete]
-==== April 09, 2019
-------
-[openblock,card-content]
-------
-Streaming is a process where nodes of a cluster exchange data in the form of SSTables. Streaming can kick in during many situations such as bootstrap, repair, rebuild, range movement, cluster expansion, etc. In this post, we discuss the massive performance improvements made to the streaming process in Apache Cassandra 4.0.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Even-Higher-Availability-with-5x-Faster-Streaming-in-Cassandra-4.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Introducing Transient Replication
-[discrete]
-==== December 03, 2018
-------
-[openblock,card-content]
-------
-Transient Replication is a new experimental feature soon to be available in 4.0. When enabled, it allows for the creation of keyspaces where replication factor can be specified as a number of copies (full replicas) and temporary copies (transient replicas). Transient replicas retain the data they replicate only long enough for it to be propagated to full replicas, via incremental repair, at which point the data is deleted. Writing to transient replicas can be avoided almost entirely if monotonic reads are not required because it is possible to achieve a quorum of acknowledged writes without them.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Introducing-Transient-Replication.adoc[Read More]
---------
-
-------
-----
-//end card
-
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Audit Logging in Apache Cassandra 4.0
-[discrete]
-==== October 29, 2018
-------
-[openblock,card-content]
-------
-Database audit logging is an industry standard tool for enterprises to capture critical data change events including what data changed and who triggered the event. These captured records can then be reviewed later to ensure compliance with regulatory, security and operational policies.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Audit-Logging-in-Apache-Cassandra-4.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Finding Bugs in Cassandra's Internals with Property-based Testing
-[discrete]
-==== October 17, 2018
-------
-[openblock,card-content]
-------
-As of September 1st, the Apache Cassandra community has shifted the focus of Cassandra 4.0 development from new feature work to testing, validation, and hardening, with the goal of releasing a stable 4.0 that every Cassandra user, from small deployments to large corporations, can deploy with confidence. There are several projects and methodologies that the community is undertaking to this end. One of these is the adoption of property-based testing, which was previously introduced here. This post will take a look at a specific use of this approach and how it found a bug in a new feature meant to ensure data integrity between the client and Cassandra.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Finding-Bugs-in-Cassandra\'s-Internals-with-Property-based-Testing.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Testing Apache Cassandra 4.0
-[discrete]
-==== August 21, 2018
-------
-[openblock,card-content]
-------
-With the goal of ensuring reliability and stability in Apache Cassandra 4.0, the project’s committers have voted to freeze new features on September 1 to concentrate on testing and validation before cutting a stable beta. Towards that goal, the community is investing in methodologies that can be performed at scale to exercise edge cases in the largest Cassandra clusters. The result, we hope, is to make Apache Cassandra 4.0 the best-tested and most reliable major release right out of the gate.
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Testing-Apache-Cassandra-4.adoc[Read More]
---------
-
-------
-----
-//end card
-
-//start card
-[openblock,card shadow relative test]
-----
-[openblock,card-header]
-------
-[discrete]
-===  Hardware-bound Zero Copy Streaming in Apache Cassandra 4.0
-[discrete]
-==== August 07, 2018
-------
-[openblock,card-content]
-------
-Streaming in Apache Cassandra powers host replacement, range movements, and cluster expansions. Streaming plays a crucial role in the cluster and as such its performance is key to not only the speed of the operations its used in but the cluster’s health generally. In Apache Cassandra 4.0, we have introduced an improved streaming implementation that reduces GC pressure and increases throughput several folds and are now limited, in some cases, only by the disk / network IO (See: CASSANDRA-14556).
-
-[openblock,card-btn card-btn--blog]
---------
-
-[.btn.btn--alt]
-xref:blog/Hardware-bound-Zero-Copy-Streaming-in-Apache-Cassandra-4.adoc[Read More]
---------
-
-------
-----
+= Blog
+:page-layout: blog-landing
+:page-role: blog-landing
+
+////
+NOTES FOR CONTENT CREATORS
+- To add a new blog post, copy and paste the markup for one card below. Copy from '//start' to the next '//end'
+- Replace the post title, date, description and link to your post.
+////
+
+//start card

Review comment:
       Please update your commit so only lines 17-34 are modified on `blog.adoc`. Thanks! 🍻

##########
File path: site-content/source/modules/ROOT/pages/blog/Harry-an-Open-Source-Fuzz-Testing-and-Verification-Tool-for-Apache-Cassandra.adoc
##########
@@ -0,0 +1,523 @@
+= Harry, an Open Source Fuzz Testing and Verification Tool for Apache Cassandra
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: November 2, 2021
+:page-post-author: The Apache Cassandra Community
+:description: The Apache Cassandra Community
+:keywords: 
+
+Over the years of working on Apache Cassandra, while writing tests or
+trying to reproduce issues, I’ve found myself repeating the
+same procedure over and over again: creating a schema, writing loops
+that generate data, then either manually reconciling it to check the
+results or comparing the result set against some predetermined
+expected result. Not only is this approach tedious and time-consuming,
+but it also does not scale: if some set of operations works for one
+schema, there’s no way to know whether it will also work for any arbitrary
+schema, whether it will work if the operations are executed in a different
+order, or whether it will work if the operations themselves are slightly different.
+
+While preparing Apache Cassandra for 4.0 release, we’ve made extensive
+progress in how we test. The new in-tree in-JVM distributed test
+framework enables us to easily write tests that exercise coordinated
+query execution code paths while giving us flexibility and control
+that was previously offered only by CQLTester, a tool for exercising
+node-local query paths. Many subsystems were audited and covered with
+tests. Cassandra users tried the new version out in their clusters and
+reported their findings. All of these things are useful and important,
+but we still needed a tool that would give us the same or a higher
+degree of confidence for every commit, so that we could know that the
+database is working as expected not only for the exact set of
+operations exercised by unit and integration tests, but
+potentially for any use case and combination of operations under
+circumstances comparable to production.
+
+This all led us to develop Harry, a tool that combines the properties
+of stress- and integration-testing tools. Harry can
+generate data for an arbitrary schema, execute data modification
+queries against the cluster, track the progress of operation
+execution, and make sure that responses to read queries are correct.
+
+After reading this post, you will understand:
+
+* how Harry generates data
+* how Harry performs verification
+* which properties of the values generated by Harry make verification not only possible but also efficient
+
+The primary audience for this post is Cassandra contributors, so you
+will need to be familiar with Apache Cassandra and its tooling.
+
+== Fuzz testing 
+
+Since most bugs are reproduced by taking a sequence of actions
+that follows some pattern, we need to specify which actions can be used
+to lead to a given state. However, there’s a lot of flexibility regarding
+exactly which values are written to specific columns.
+
+For example, consider
+https://issues.apache.org/jira/browse/CASSANDRA-16453?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel[CASSANDRA-16453,window=_blank],
+which was reproduced using Harry. The code to reproduce the issue with
+in-JVM dtests looks something like this:
+
+.Repro.java
+[source,java]
+----
+try (Cluster cluster = init(builder().withNodes(2).start()))
+{
+    cluster.schemaChange(withKeyspace("CREATE TABLE distributed_test_keyspace.table_0 (pk0 bigint,ck0 bigint,regular0 bigint,regular1 bigint,regular2 bigint, PRIMARY KEY (pk0, ck0)) WITH  CLUSTERING ORDER BY (ck0 ASC);"));
+    cluster.coordinator(1).execute("DELETE FROM distributed_test_keyspace.table_0 USING TIMESTAMP 1 WHERE pk0=1 AND ck0>2;", ConsistencyLevel.ALL);
+    cluster.get(2).executeInternal("DELETE FROM distributed_test_keyspace.table_0 USING TIMESTAMP 1 WHERE pk0=1;");
+    cluster.coordinator(1).execute("SELECT * FROM distributed_test_keyspace.table_0 WHERE pk0=1 AND ck0>=1 AND ck0<3;",
+                                   ConsistencyLevel.ALL, 1L, 1L, 3L);
+}
+----
+
+You can see that, at first glance, there are only three things
+that are important to reproduce the issue:
+
+1. The table has to have at least one clustering column
+2. Two actions are executed against the cluster: a range deletion, and a partition deletion
+3. Both operations have the same timestamp
+
+The rest of the details do not matter: size of the cluster, number of
+replicas, clustering order, consistency level with which operations
+are executed, types of clustering keys and values written, and so on.
+
+The simplest way to cover a case like this with a test is to hardcode
+the schema and then execute a partition deletion and a range deletion,
+hardcoding the values, much as we did above. This might work, but
+there’s still a chance that the proposed fix will not work for some
+other schema or some combination of values.
+
+To improve the situation, we can express the test in more abstract
+terms: instead of writing a repro using specific statements, we
+can use only the constraints we specified above:
+
+
+.HarryDsl.java
+[source,java]
+----
+test(new SchemaGenerators.Builder("harry")
+                         .partitionKeySpec(1, 5)
+                         .clusteringKeySpec(1, 5)
+                         .regularColumnSpec(1, 10)
+                         .generator(),
+     historyBuilder -> {
+         historyBuilder.nextPartition()
+                       .simultaneously()
+                       .randomOrder()
+                       .partitionDeletion()
+                       .rangeDeletion()
+                       .finish();
+     });
+----
+
+This spec can be used to generate clusters of different sizes,
+configured with different schemas, executing the given sequence of
+actions both in isolation and combined with other randomly generated
+ones, with failure injection. Best of all, this test will _not only_
+ensure that such a sequence of actions does not produce an exception,
+but _also_ ensure that the cluster will respond with correct results to
+_any_ allowed read query.
+
+== Generating data 
+
+Generating random values and sequences of actions and reconciling them
+during verification is in itself not a difficult task. Making this
+process time- and memory-efficient is what makes it more interesting.
+
+For space efficiency, the log of actions generated by Harry is not
+kept in memory or saved anywhere on disk, since any generated operation
+can be reproduced from its sequence number. In Harry, a sequence
+number consists of two parts: the logical timestamp (LTS), which has a
+1:1 mapping to a real-time timestamp, and the modification ID, which
+allows multiple uniquely identifiable operations for each
+logical timestamp. For the sake of simplicity, we’ll just say that
+each operation is represented by its sequence number / LTS.
+
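As an illustration of the idea only (the bit layout and names below are hypothetical, not Harry's actual encoding), a single `long` can carry both parts of a sequence number:

```java
// Hypothetical sketch: pack a logical timestamp (LTS) and a modification id
// into one long, so an operation is identified by a single number.
public class OpId {
    static final int MOD_BITS = 16; // assumed width for the modification id

    // Pack a logical timestamp and a modification id into one long.
    static long seqNo(long lts, int modificationId) {
        return (lts << MOD_BITS) | (modificationId & ((1 << MOD_BITS) - 1));
    }

    // Recover each part from the packed sequence number.
    static long lts(long seqNo) { return seqNo >>> MOD_BITS; }
    static int modificationId(long seqNo) { return (int) (seqNo & ((1 << MOD_BITS) - 1)); }

    public static void main(String[] args) {
        long op = seqNo(42L, 7);
        System.out.println(lts(op) + " " + modificationId(op)); // prints "42 7"
    }
}
```

Because the mapping is reversible, nothing beyond the packed number needs to be stored to identify an operation.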
+In the example above, the operation order is determined by the seed
+for the given run. Let’s say that partition deletion is executed
+first. To produce a `DELETE` statement from it, we now need to
+generate a partition key and get a timestamp. Similarly, to generate a
+range deletion, we will need a partition key, two clustering keys that
+will serve as lower and higher bounds for the range tombstone, and a
+timestamp.
+
+Using the sequence number and knowing the operation type, we can now
+produce _descriptors_ that are used as the compact internal
+representation of data in Harry. No matter how many parts it consists
+of, any partition key is represented by a single `long`. The same is
+true for the clustering keys: any clustering key, single-part or
+composite, is represented using a single `long` descriptor. If we were
+to generate an `INSERT` or `UPDATE` operation, each value for a
+regular or a static column would have its own descriptor since we
+would want to distinguish between two writes made by two different
+operations.
+
+To summarise, every operation has a sequence number, which determines
+everything that is required to fully reproduce this operation,
+including descriptors that we will later use to generate values
+themselves:
+
+* partition deletion only has a partition descriptor
+* range deletion has a partition descriptor and two clustering descriptors, specifying tombstone bounds
+* insert or update operation has a partition descriptor, a clustering descriptor, and a set of value descriptors, one for each regular and static column.
+
+Using descriptors rather than specific values for verification can be
+extremely useful for efficiency. Instead of comparing potentially
+large values, we could just compare two longs that uniquely identify
+them. This means we need a way to not _only_ generate a
+value from a descriptor but _also_ to compute the descriptor a
+value was generated from.
+
+In Harry, we call such a generator `Bijection<T>`: every bijection
+can _inflate_ a descriptor into a value of type `T` and _deflate_
+a value of type `T` back into the descriptor it was originally
+generated from.
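+
+The contract described above can be sketched as a small interface
+(illustrative; Harry’s actual interface may differ in naming and
+details):
+
+[source,java]
+----
+interface Bijection<T> {
+    T inflate(long descriptor); // descriptor -> value
+    long deflate(T value);      // value -> the descriptor it was generated from
+}
+----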
+
+== Validating results
+
+Applying a predetermined sequence of operations against a single
+partition produces some partition state. Knowing the execution status
+of each operation, we can deterministically compute the
+state of each node in the cluster and validate the results of
+any `SELECT` query.
+
+Since we can represent any operation as a sequence of descriptors, we
+know the order of operations (the timestamp determines it). We
+can assume we know the status of each operation (whether or not it has
+been executed against some node), and we can deterministically produce
+partition state for any given point in time. Partition state is
+nothing but a sorted map, where the key is a clustering descriptor,
+and the value is a row state. A row state, in this case, holds a value
+descriptor for each column, along with the logical timestamp at which
+each write was executed:
+
+.PartitionState.java
+[source,java]
+----
+public class PartitionState implements Iterable<RowState> {
+    long partitionDescriptor;
+    NavigableMap<Long, RowState> rowStates;
+}
+
+public static class RowState {
+    long[] valueDescriptors;
+    long[] logicalTimestamps;
+}
+----
+
+Similarly, since any value written to the database is generated using
+a bijection, we can produce the partition state from the result set by
+deflating every value returned by the database into the descriptor
+that it was generated from.
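+
+To make this concrete, here is a much-simplified sketch of a model
+applying a single-column write (Harry’s real model also tracks
+deletions, static columns, and per-node visibility; all names here are
+illustrative):
+
+[source,java]
+----
+import java.util.*;
+
+final class SimplifiedModel {
+    static final int COLUMNS = 3;
+
+    static final class RowState {
+        final long[] valueDescriptors = new long[COLUMNS];
+        final long[] logicalTimestamps = new long[COLUMNS];
+    }
+
+    // Sorted map from clustering descriptor to row state.
+    final NavigableMap<Long, RowState> rowStates = new TreeMap<>();
+
+    // Apply a single-column write; last-write-wins per column, like Cassandra
+    // reconciliation (timestamp ties are ignored in this sketch).
+    void applyWrite(long cd, int column, long vd, long lts) {
+        RowState row = rowStates.computeIfAbsent(cd, k -> new RowState());
+        if (lts > row.logicalTimestamps[column]) {
+            row.valueDescriptors[column] = vd;
+            row.logicalTimestamps[column] = lts;
+        }
+    }
+}
+----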
+
+== Generating Descriptors
+
+Reproducible operation sequences can be generated from a set of rules
+that determines what the sequence is going to look like. For example,
+we can specify probability distributions for each operation type or
+give operations relative weights, which can be turned into the
+distribution internally later. Configuration for an insert / update /
+delete workload in which an insert operation (100/251) is twice as
+probable as a row deletion (50/251) and one hundred times more
+probable than a partition deletion (1/251) would look
+like:
+
+----
+INSERT: 100
+UPDATE: 100
+DELETE_ROW: 50
+DELETE_PARTITION: 1
+----
+
+Since each operation is uniquely determined by its sequence number, we
+can deterministically compute its operation type by taking these
+probability distributions. One way to do this is by using the PCG
+random number generator, which has some useful properties we’re going
+to use for generating our pseudorandom values.
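+
+As a sketch (not Harry’s actual code), relative weights can be turned
+into a deterministic pick by mapping a pseudorandom roll onto
+cumulative weight ranges:
+
+[source,java]
+----
+final class OperationSelector {
+    enum OperationKind { INSERT, UPDATE, DELETE_ROW, DELETE_PARTITION }
+
+    static final long[] WEIGHTS = {100, 100, 50, 1}; // weights from the config above
+
+    // `roll` is a pseudorandom long derived from the operation’s sequence number.
+    static OperationKind operationKind(long roll) {
+        long r = Long.remainderUnsigned(roll, 251); // 251 = sum of all weights
+        for (OperationKind kind : OperationKind.values()) {
+            long w = WEIGHTS[kind.ordinal()];
+            if (r < w) return kind;
+            r -= w;
+        }
+        throw new AssertionError("unreachable");
+    }
+}
+----
+
+Given the same roll, the same operation kind is always picked, which
+preserves determinism.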
+
+If you’d like to learn more about the mathematical underpinnings of
+PCG, you should read this paper
+(https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf[https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf,window=_blank]). However,
+to be able to use PCG, it is not necessary to know any of the
+internals. We need a random number generator that will have the
+following properties:
+
+  * Long period: the sequence of numbers it produces does not repeat
+    frequently; ideally, the period should be 2^64 when generating a
+    random number from 64 bits of entropy.
+  * Stream selection: the ability to produce different random
+    sequences from the same seed, identified by some stream id.
+  * Addressability: any number produced by the generator can be
+    reproduced from the seed and its sequence number. Ideally, we’d
+    like to have methods such as `long randomNumber(long
+    sequenceNumber, long stream)` and `long sequenceNumber(long
+    randomNumber, long stream)`. In other words, we should be able to
+    determine the sequence number of the random number in the given
+    stream. Using this method, we can also determine `distance(long x,
+    long y)` : how many random numbers we should skip to get `y` after
+    seeing `x`.
+  * Walkability: the ability to produce the number immediately
+    following the given random number in the random sequence (`long
+    next(long randomNumber, long stream)`) or immediately preceding it
+    (`long prev(long randomNumber, long stream)`).
+
+You might have noticed that there are two ways to achieve the same
+thing: we can get a pseudorandom number from some number known to the
+system either by using `randomNumber(i, stream)` or by using `prev(i,
+stream)`. Both variants are valid, and both operations can be
+inverted. We have a slight preference for `prev`, since its
+inverse can be computed in constant time.
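+
+To make these requirements concrete, here is a minimal sketch of a
+walkable generator built on a plain LCG, using the 64-bit multiplier
+from the PCG paper (Harry’s actual RNG is more involved). `prev`
+inverts `next` because any odd multiplier has a modular inverse mod
+2^64, computable with a few Newton iterations:
+
+[source,java]
+----
+final class WalkableRng {
+    private static final long MULT = 6364136223846793005L; // PCG’s 64-bit multiplier
+    private static final long MULT_INV = inverse(MULT);
+
+    // Modular inverse of an odd number mod 2^64; each step doubles the correct bits.
+    static long inverse(long a) {
+        long x = a; // correct to 3 bits, since a * a == 1 (mod 8) for odd a
+        for (int i = 0; i < 5; i++)
+            x *= 2 - a * x;
+        return x;
+    }
+
+    // Each stream uses a distinct odd increment derived from the stream id.
+    static long next(long value, long stream) { return value * MULT + (stream | 1); }
+    static long prev(long value, long stream) { return (value - (stream | 1)) * MULT_INV; }
+}
+----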
+
+These properties allow us to reproduce partition state from just
+configuration (i.e., known distributions, schema, size of the
+partition, etc) and a seed:
+
+  * Partition descriptor for `N` th operation can be picked as `M` th
+    random number in the stream of partition descriptors, and the
+    relation between `N` and `M` is determined by the chosen pattern
+    for visiting partitions.
+  * Clustering descriptor for `N` th operation can be picked as `M` th
+    random number in the stream of clustering descriptors **for the
+    given partition**, where maximum `M` is determined by the maximum
+    partition size, so there can be no more than `max(M)` rows in any
+    generated partition.
+
+One of the simplest useful ways to represent a pattern for picking a
+descriptor from the sequence is a sliding window. The sliding
+window begins with a preset number of items in it and visits
+each item in the current window one or several times in a round-robin
+fashion. After this, it cycles one of the items out and adds a new one
+in its place.
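+
+A hypothetical position function for such a sliding window (parameter
+names and behaviour are illustrative, not Harry’s API) could look
+like:
+
+[source,java]
+----
+final class SlidingWindow {
+    // Position in the descriptor stream for the n-th operation, given a window
+    // of `windowSize` items where each item is visited `visitsPerItem` times.
+    static long positionFor(long n, int windowSize, int visitsPerItem) {
+        long perWindow = (long) windowSize * visitsPerItem;
+        long slid = n / perWindow;                    // items cycled out so far
+        long inWindow = (n % perWindow) % windowSize; // round-robin within the window
+        return slid + inWindow;
+    }
+}
+----
+
+With a window of size 2 and two visits per item, operations 0 through
+3 visit positions 0, 1, 0, 1; operation 4 slides the window, visiting
+positions 1, 2, and so on.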
+
+Once operation type, partition descriptor, and clustering descriptors
+are determined, all we have left to cover is how to generate value
+descriptors for `INSERT` and `UPDATE` operations. The value descriptor
+for a column is uniquely identified by its sequence number and is bound
+to the partition descriptor, clustering descriptor, and column.
+
+To summarise, all operations in Harry are deterministic and are
+represented using their descriptors. Descriptors can be computed
+hierarchically using the following rules:
+
+* Partition descriptor is picked from the _stream_ of partition descriptors. Its position in that stream is determined by some rule (for example, a sliding window):
+
+[source,java]
+----
+long pd = rng.randomNumber(positionFor(sequenceNumber), PARTITION_DESCRIPTOR_STREAM_ID);
+----
+
+* Clustering descriptor is picked from the _stream_ of clustering descriptors **for the given partition**.
+
+[source,java]
+----
+long cd = rng.prev(positionInPartition, pd);
+----
+
+* Value descriptor is picked from the _stream_ of descriptors identified by which partition, clustering, and column the value belongs to:
+
+[source,java]
+----
+long vd = rng.randomNumber(sequenceNumber, pd ^ cd ^ col);
+----
+
+== Inflation and Deflation
+
+We’ve mentioned before that one reason Harry state is so compact and
+can be validated so efficiently is because every value read from the
+database can be traced back to the descriptor it was generated
+from. To achieve this, we generate all values using order-preserving
+bijections. In other words, for any value generated from a descriptor,
+it should be possible to quickly find a descriptor this value was
+generated from, and two values generated from two distinct descriptors
+should sort the same as descriptors themselves.
+
+Implementing an order-preserving bijection for 64-bit longs is trivial
+and can be achieved by using an identity function. Essentially, any
+long descriptor _is_ the value it represents:
+
+[source,java]
+----
+long inflate(long descriptor) {
+  return descriptor;
+}
+
+long deflate(long value) {
+  return value; 
+}
+----
+
+There are many ways to make a bijection for strings. One of the ways
+to do it is to have a set of 256 short strings of the same length in a
+sorted array. When inflating a 64-bit long descriptor into the string,
+we’ll be iterating over these 64 bits, taking 8 bits (one byte) at a
+time, using the value of this byte as an index in an array of 256
+strings.
+
+[source,java]
+----
+String inflate(long descriptor) {
+    StringBuilder builder = new StringBuilder();
+
+    for (int i = 0; i < Long.BYTES; i++) {
+        int idx = getByte(descriptor, i); // i-th byte of the descriptor
+        builder.append(nibbles[idx]);     // nibbles: 256 sorted, equal-length strings
+    }
+    return builder.toString();
+}
+----
+
+One thing we should take into account here is that strings are
+compared byte-wise, while longs use signed comparison. To make sure
+generated strings have the same order as descriptors, we need to XOR
+the sign bit.
+
+Any two strings produced by this generator are unique, and the
+generator can produce at most 2^64 values, so to generate
+longer strings we do not even need larger nibbles: we can append
+random data of arbitrary length to the end of the string. This does
+not change the order, since the order is determined by the prefix
+generated from the nibbles, which is unique to each value.
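+
+Putting the pieces together, here is a self-contained sketch of such a
+string bijection, including the sign-bit flip and the inverse
+`deflate` (our illustration, using two-character hex strings as
+nibbles rather than Harry’s actual nibble set):
+
+[source,java]
+----
+import java.util.Arrays;
+
+final class StringBijection {
+    private final String[] nibbles = new String[256];
+
+    StringBijection() {
+        for (int i = 0; i < 256; i++)
+            nibbles[i] = String.format("%02x", i); // sorted, equal-length nibbles
+    }
+
+    // XOR the sign bit so that signed long order matches byte-wise string order.
+    private static long orderPreserving(long descriptor) { return descriptor ^ Long.MIN_VALUE; }
+
+    String inflate(long descriptor) {
+        long d = orderPreserving(descriptor);
+        StringBuilder builder = new StringBuilder();
+        for (int i = Long.BYTES - 1; i >= 0; i--) // most significant byte first
+            builder.append(nibbles[(int) ((d >>> (i * 8)) & 0xff)]);
+        return builder.toString();
+    }
+
+    long deflate(String value) {
+        long d = 0;
+        for (int i = 0; i < Long.BYTES; i++) {
+            String nibble = value.substring(i * 2, i * 2 + 2);
+            d = (d << 8) | Arrays.binarySearch(nibbles, nibble); // recover the byte
+        }
+        return orderPreserving(d); // XOR is its own inverse
+    }
+}
+----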
+
+Such simple bijections can represent data types used for regular and
+static columns. We’ve previously mentioned that partition and
+clustering keys are also represented using 64-bit
+descriptors. Partition and clustering keys are composite: they consist
+of multiple distinct parts. One way to implement bijection for a
+composite type is to “slice” 64 bits of entropy into smaller chunks,
+each chunk giving some entropy to generate a different part of the
+key. Each slice is then inflated using a bijection that corresponds to
+the part of the key it represents. To convert the value back to the
+descriptor, we must deflate each part of the key and then “stitch” the
+values back together into a 64-bit descriptor.
+
+To summarise, key generators are just bijections that can generate
+multiple values for a single 64-bit descriptor instead of one. A
+simplified and generalized version of such bijection may look
+something like this:
+

Review comment:
       For consistency, insert the line `[source,java]` here.

##########
File path: site-content/source/modules/ROOT/pages/blog/Harry-an-Open-Source-Fuzz-Testing-and-Verification-Tool-for-Apache-Cassandra.adoc
##########
@@ -0,0 +1,523 @@
+= Harry, an Open Source Fuzz Testing and Verification Tool for Apache Cassandra
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: November 2, 2021
+:page-post-author: The Apache Cassandra Community
+:description: The Apache Cassandra Community
+:keywords: 
+
+Over the years working on Apache Cassandra while writing tests or
+trying to reproduce the issues, I’ve always found myself repeating the
+same procedure over and over again: creating schema, writing loops
+generating data, then either manually reconciling it to check the
+results, or comparing the result set against some predetermined
+expected result. Not only is this approach tedious and time-consuming,
+but it also does not scale: if some set of operations work for one
+schema, there’s no way to know if it will also work for any arbitrary
+schema, whether it will work if operations are executed in a different
+order, or if operations themselves are slightly different.
+
+While preparing Apache Cassandra for 4.0 release, we’ve made extensive
+progress in how we test. The new in-tree in-JVM distributed test
+framework enables us to easily write tests that exercise coordinated
+query execution code paths while giving us flexibility and control
+that was previously offered only by CQLTester, a tool for exercising
+node-local query paths. Many subsystems were audited and covered with
+tests. Cassandra users tried the new version out in their clusters and
+reported their findings. All of these things are useful and important,
+but we still needed a tool that would give us the same or higher
+degree of confidence for every commit so that we could know that the
+database is working as expected, not only for the exact set of
+operations exercised by unit and integration tests, but
+potentially for any use-case and combination of operations under
+circumstances comparable to production.
+
+This all led us to develop Harry, a tool that combines properties
+of stress- and integration-testing tools: it can
+generate data for an arbitrary schema, execute data modification
+queries against the cluster, track the progress of operation
+execution, and make sure that responses to read queries are correct.
+
+After reading this post, you will understand:
+
+* how Harry generates the data
+* how Harry performs verification
+* which properties of the values generated by Harry make verification not only possible but also efficient.
+
+The primary audience for this post is Cassandra contributors, so you
+will need to be familiar with Apache Cassandra and its tooling.
+
+== Fuzz testing 
+
+Since most bugs are reproduced by taking a sequence of actions
+that follows some pattern, we need to specify which actions can
+lead to a given state. However, there’s a lot of flexibility regarding
+exactly which values are written to specific columns.
+
+For example, consider
+https://issues.apache.org/jira/browse/CASSANDRA-16453?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel[CASSANDRA-16453,window=_blank],
+which was reproduced using Harry. Code to reproduce the issue with
+in-JVM DTests looks something like this:
+
+.Repro.java
+[source,java]
+----
+try (Cluster cluster = init(builder().withNodes(2).start()))
+{
+    cluster.schemaChange(withKeyspace("CREATE TABLE distributed_test_keyspace.table_0 (pk0 bigint,ck0 bigint,regular0 bigint,regular1 bigint,regular2 bigint, PRIMARY KEY (pk0, ck0)) WITH  CLUSTERING ORDER BY (ck0 ASC);"));
+    cluster.coordinator(1).execute("DELETE FROM distributed_test_keyspace.table_0 USING TIMESTAMP 1 WHERE pk0=1 AND ck0>2;", ConsistencyLevel.ALL);
+    cluster.get(2).executeInternal("DELETE FROM distributed_test_keyspace.table_0 USING TIMESTAMP 1 WHERE pk0=1;");
+    cluster.coordinator(1).execute("SELECT * FROM distributed_test_keyspace.table_0 WHERE pk0=1 AND ck0>=1 AND ck0<3;",
+                                   ConsistencyLevel.ALL, 1L, 1L, 3L);
+}
+----
+
+You can see that, at first glance, there are only three things
+that are important to reproduce the issue:
+
+1. The table has to have at least one clustering column
+2. Two actions are executed against the cluster: a range deletion, and a partition deletion
+3. Both operations have the same timestamp
+
+The rest of the details do not matter: size of the cluster, number of
+replicas, clustering order, consistency level with which operations
+are executed, types of clustering keys and values written, and so on.
+
+The simplest way to cover a case like this with a test is to hardcode
+the schema and then execute a partition deletion and a range deletion
+hardcoding the values, much as we did above. This might work, but
+there’s still a chance that the proposed fix may not work for some
+other schema or some combination of values.
+
+To improve the situation, we can express the test in more abstract
+terms and, instead of writing a repro using specific statements, we
+can only use the constraints we’ve specified above:
+
+
+.HarryDsl.java
+[source,java]
+----
+test(new SchemaGenerators.Builder("harry")
+                         .partitionKeySpec(1, 5)
+                         .clusteringKeySpec(1, 5)
+                         .regularColumnSpec(1, 10)
+                         .generator(),
+     historyBuilder -> {
+         historyBuilder.nextPartition()
+                       .simultaneously()
+                       .randomOrder()
+                       .partitionDeletion()
+                       .rangeDeletion()
+                       .finish();
+     });
+----
+
+To summarise, key generators are just bijections that can generate
+multiple values for a single 64-bit descriptor instead of one. A
+simplified and generalized version of such bijection may look
+something like this:
+
+----
+Object[] inflate(long descriptor) {
+  long[] slices = slice(descriptor);
+  Object[] key = new Object[slices.length];
+  for (int i = 0; i < slices.length; i++) {
+     key[i] = children[i].inflate(slices[i]);
+  }
+  return key;
+}
+
+long deflate(Object[] value) {
+  long[] slices = new long[value.length];
+  for (int i = 0; i < value.length; i++) {
+     slices[i] = children[i].deflate(value[i]);
+  }
+  return stitch(slices);
+}
+----
+
+Values generated by key generators preserve the order of descriptors
+they were generated from, which allows efficiently checking the order
+of results, comparing clustering descriptors, and validating range
+deletions.
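+
+The `slice` and `stitch` helpers referenced above could, in a
+simplified form that gives every key part an equal share of the 64
+bits (Harry actually sizes slices according to the column types), look
+like:
+
+[source,java]
+----
+final class EntropyShare {
+    // Split 64 bits of entropy into `parts` equally sized chunks; when `parts`
+    // does not divide 64, the lowest remainder bits are dropped in this sketch.
+    static long[] slice(long descriptor, int parts) {
+        int bits = Long.SIZE / parts;
+        long mask = bits == Long.SIZE ? -1L : (1L << bits) - 1;
+        long[] slices = new long[parts];
+        for (int i = 0; i < parts; i++)
+            slices[i] = (descriptor >>> (Long.SIZE - bits * (i + 1))) & mask;
+        return slices;
+    }
+
+    // Inverse of slice: reassemble the chunks into a single descriptor.
+    static long stitch(long[] slices) {
+        int bits = Long.SIZE / slices.length;
+        long descriptor = 0;
+        for (long s : slices)
+            descriptor = (descriptor << bits) | s;
+        return descriptor;
+    }
+}
+----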
+
+== Putting it all together 
+
+In this post, we’ve learned how the various parts of Harry work,
+starting with how to reproduce a sequence of operations up to how the
+values are generated. Using this information, we can create a
+quiescent model checker that can validate the state of the database in
+the absence of in-flight operations, assuming we know the state of all
+operations before this moment.
+
+As we’ve discussed, Harry is working with reproducible histories of
+operations, where the following information identifies each operation:
+
+----
+class Operation {
+  long lts; // logical timestamp of the operation
+  long pd;  // partition descriptor, derived from LTS
+  long cd;  // clustering descriptor, derived from LTS and PD
+  long operationKind; // operation type, derived from LTS and PD
+}
+----
+
+Now, all we need to do is produce a sequence of operations. For example, each operation of the `INSERT` kind is going to be represented by:
+
+----

Review comment:
       For consistency, insert the line `[source,java]` here.

##########
File path: site-content/source/modules/ROOT/pages/blog/Harry-an-Open-Source-Fuzz-Testing-and-Verification-Tool-for-Apache-Cassandra.adoc
##########
@@ -0,0 +1,523 @@
+= Harry, an Open Source Fuzz Testing and Verification Tool for Apache Cassandra
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: November 2, 2021
+:page-post-author: The Apache Cassandra Community
+:description: The Apache Cassandra Community
+:keywords: 
+
+Over the years of working on Apache Cassandra, whether writing tests
+or trying to reproduce issues, I’ve found myself repeating the same
+procedure over and over: creating a schema, writing loops that
+generate data, then either manually reconciling the data to check the
+results or comparing the result set against some predetermined
+expected result. Not only is this approach tedious and
+time-consuming, but it also does not scale: if a set of operations
+works for one schema, there’s no way to know whether it will also
+work for an arbitrary schema, whether it will work if the operations
+are executed in a different order, or whether it will work if the
+operations themselves are slightly different.
+
+While preparing Apache Cassandra for the 4.0 release, we’ve made
+extensive progress in how we test. The new in-tree in-JVM distributed test
+framework enables us to easily write tests that exercise coordinated
+query execution code paths while giving us flexibility and control
+that was previously offered only by CQLTester, a tool for exercising
+node-local query paths. Many subsystems were audited and covered with
+tests. Cassandra users tried the new version out in their clusters and
+reported their findings. All of these things are useful and important,
+but we still needed a tool that would give us the same or a higher
+degree of confidence for every commit, so that we could know the
+database is working as expected not only for the exact set of
+operations exercised by unit and integration tests, but potentially
+for any use case and combination of operations under circumstances
+comparable to production.
+
+This all led us to develop Harry, a tool that combines properties of
+stress- and integration-testing tools: it can generate data for an
+arbitrary schema, execute data modification queries against the
+cluster, track the progress of operation execution, and make sure
+that responses to read queries are correct.
+
+After reading this post, you will understand:
+
+* how Harry generates the data
+* how Harry performs verification
+* which properties of the values generated by Harry make verification not only possible but also efficient.
+
+The primary audience for this post is Cassandra contributors, so you
+will need to be familiar with Apache Cassandra and its tooling.
+
+== Fuzz testing 
+
+Since most bugs are reproduced by taking a sequence of actions
+following some pattern, we need to specify which actions can lead to
+a given state. However, there’s a lot of flexibility regarding
+exactly which values are written to specific columns.
+
+For example, consider
+https://issues.apache.org/jira/browse/CASSANDRA-16453?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel[CASSANDRA-16453,window=_blank],
+which was reproduced using Harry. The code to reproduce the issue with
+in-JVM DTests looks something like this:
+
+.Repro.java
+[source,java]
+----
+try (Cluster cluster = init(builder().withNodes(2).start()))
+{
+    cluster.schemaChange(withKeyspace("CREATE TABLE distributed_test_keyspace.table_0 (pk0 bigint,ck0 bigint,regular0 bigint,regular1 bigint,regular2 bigint, PRIMARY KEY (pk0, ck0)) WITH  CLUSTERING ORDER BY (ck0 ASC);"));
+    cluster.coordinator(1).execute("DELETE FROM distributed_test_keyspace.table_0 USING TIMESTAMP 1 WHERE pk0=1 AND ck0>2;", ConsistencyLevel.ALL);
+    cluster.get(2).executeInternal("DELETE FROM distributed_test_keyspace.table_0 USING TIMESTAMP 1 WHERE pk0=1;");
+    cluster.coordinator(1).execute("SELECT * FROM distributed_test_keyspace.table_0 WHERE pk0=1 AND ck0>=1 AND ck0<3;",
+                                   ConsistencyLevel.ALL, 1L, 1L, 3L);
+}
+----
+
+You can see that, at first glance, there are only three things
+that are important to reproduce the issue:
+
+1. The table has to have at least one clustering column
+2. Two actions are executed against the cluster: a range deletion, and a partition deletion
+3. Both operations have the same timestamp
+
+The rest of the details do not matter: size of the cluster, number of
+replicas, clustering order, consistency level with which operations
+are executed, types of clustering keys and values written, and so on.
+
+The simplest way to cover a case like this with a test is to hardcode
+the schema and then execute a partition deletion and a range deletion
+hardcoding the values, much as we did above. This might work, but
+there’s still a chance that the proposed fix may not work for some
+other schema or some combination of values.
+
+To improve the situation, we can express the test in more abstract
+terms: instead of writing a repro using specific statements, we can
+use only the constraints we’ve specified above:
+
+.HarryDsl.java
+[source,java]
+----
+test(new SchemaGenerators.Builder("harry")
+                         .partitionKeySpec(1, 5)
+                         .clusteringKeySpec(1, 5)
+                         .regularColumnSpec(1, 10)
+                         .generator(),
+     historyBuilder -> {
+         historyBuilder.nextPartition()
+                       .simultaneously()
+                       .randomOrder()
+                       .partitionDeletion()
+                       .rangeDeletion()
+                       .finish();
+     });
+----
+
+This spec can be used to generate clusters of different sizes,
+configured with different schemas, executing the given sequence of
+actions both in isolation and combined with other randomly generated
+ones, with failure injection. Best of all, this test will _not only_
+ensure that such a sequence of actions does not produce an exception,
+but _also_ that the cluster will respond with correct results to
+_any_ allowed read query.
+
+== Generating data 
+
+Generating random values and sequences of actions and reconciling them
+during verification is in itself not a difficult task. Making this
+process time- and memory-efficient is what makes it more interesting.
+
+For space efficiency, the log of actions generated using Harry is not
+kept in memory or saved anywhere on disk since any generated operation
+can be reproduced from its sequence number. In Harry, a sequence
+number consists of two parts: the logical timestamp (LTS), which has
+a one-to-one mapping to a real-time timestamp, and the modification
+ID, which allows multiple uniquely identifiable operations per
+logical timestamp. For the sake of simplicity, we’ll just say that
+each operation is represented by its sequence number / LTS.
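+One possible way to pack the two parts into a single `long` looks as
+follows. The 16-bit split and the names here are assumptions made for
+this sketch, not Harry’s actual bit layout:

```java
// Illustrative packing of a sequence number: high bits hold the logical
// timestamp, low bits the modification id. The 16-bit split is an
// assumption made for this sketch.
final class SequenceNumber {
    static final int MODIFICATION_BITS = 16;
    static final long MODIFICATION_MASK = (1L << MODIFICATION_BITS) - 1;

    static long pack(long lts, long modificationId) {
        return (lts << MODIFICATION_BITS) | (modificationId & MODIFICATION_MASK);
    }

    static long lts(long sequenceNumber) {
        return sequenceNumber >>> MODIFICATION_BITS;
    }

    static long modificationId(long sequenceNumber) {
        return sequenceNumber & MODIFICATION_MASK;
    }
}
```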
+
+In the example above, the operation order is determined by the seed
+for the given run. Let’s say that partition deletion is executed
+first. To produce a `DELETE` statement from it, we now need to
+generate a partition key and get a timestamp. Similarly, to generate a
+range deletion, we will need a partition key, two clustering keys that
+will serve as lower and higher bounds for the range tombstone, and a
+timestamp.
+
+Using the sequence number and knowing the operation type, we can now
+produce _descriptors_ that are used as the compact internal
+representation of data in Harry. No matter how many parts it consists
+of, any partition key is represented by a single `long`. The same is
+true for the clustering keys: any clustering key, single-part or
+composite, is represented using a single `long` descriptor. If we were
+to generate an `INSERT` or `UPDATE` operation, each value for a
+regular or a static column would have its own descriptor since we
+would want to distinguish between two writes made by two different
+operations.
+
+To summarise, every operation has a sequence number, which determines
+everything that is required to fully reproduce this operation,
+including descriptors that we will later use to generate values
+themselves:
+
+* partition deletion only has a partition descriptor
+* range deletion has a partition descriptor and two clustering descriptors, specifying tombstone bounds
+* insert or update operation has a partition descriptor, a clustering descriptor, and a set of value descriptors, one for each regular and static column.
+
+Using descriptors rather than specific values for verification can be
+extremely useful for efficiency. Instead of comparing potentially
+large values, we could just compare two longs that uniquely identify
+them. This means we must be able not _only_ to generate a value
+from a descriptor, but _also_ to compute the descriptor a value was
+generated from.
+
+In Harry, we call such a generator a `Bijection<T>`: every bijection
+can _inflate_ a descriptor into a value of type `T` and _deflate_ a
+value of type `T` back into the descriptor from which it was
+originally generated.
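+In code, this contract looks roughly as follows (the signatures are
+illustrative rather than Harry’s exact API):

```java
// The generator contract described above: inflate turns a descriptor
// into a value, deflate recovers the descriptor the value came from.
interface Bijection<T> {
    T inflate(long descriptor);
    long deflate(T value);
}

// The simplest possible instance: a long descriptor is its own value.
final class LongBijection implements Bijection<Long> {
    public Long inflate(long descriptor) { return descriptor; }
    public long deflate(Long value) { return value; }
}
```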
+
+== Validating results
+
+Applying a predetermined sequence of operations against a single
+partition produces some partition state. Knowing the execution
+status of each operation, we can deterministically compute the state
+of each node in the cluster and validate the results of any `SELECT`
+query.
+
+Since we can represent any operation as a sequence of descriptors,
+we know the order of operations (the timestamp determines it). If we
+also know the status of each operation (whether or not it has been
+executed against some node), we can deterministically produce the
+partition state for any given point in time. Partition state is
+nothing but a sorted map, where the key is a clustering descriptor
+and the value is a row state. A row state holds the value descriptor
+for each column and the logical timestamps at which the operations
+that wrote those values were executed:
+
+.PartitionState.java
+[source,java]
+----
+public class PartitionState implements Iterable<RowState> {
+    long partitionDescriptor;
+    NavigableMap<Long, RowState> rowStates;
+}
+
+public static class RowState {
+    long[] valueDescriptors;
+    long[] logicalTimestamps;
+}
+----
+
+Similarly, since any value written to the database is generated using
+a bijection, we can produce the partition state from the result set by
+deflating every value returned by the database into the descriptor
+that it was generated from.
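+As a sketch, deflating one returned row back into descriptors could
+look like this (here `ToLongFunction` stands in for the deflate half
+of a column’s bijection, and a single deflater is used for brevity;
+in reality, each column has its own generator):

```java
import java.util.function.ToLongFunction;

// Turn a returned row back into value descriptors so it can be compared
// against the model's RowState.
final class RowDeflator {
    static long[] deflateRow(Object[] row, ToLongFunction<Object> deflater) {
        long[] descriptors = new long[row.length];
        for (int i = 0; i < row.length; i++)
            descriptors[i] = deflater.applyAsLong(row[i]);
        return descriptors;
    }
}
```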
+
+== Generating Descriptors
+
+Reproducible operation sequences can be generated from a set of rules
+that determines what the sequence is going to look like. For example,
+we can specify probability distributions for each operation type or
+give operations relative weights, which can be turned into the
+distribution internally later. Configuration for an insert / update /
+delete workload, in which an insert operation (100/251) is twice as
+probable as a row deletion (50/251) and one hundred times more
+probable than a partition deletion (1/251), would look like:
+
+----
+INSERT: 100
+UPDATE: 100
+DELETE_ROW: 50
+DELETE_PARTITION: 1
+----
+
+Since each operation is uniquely determined by its sequence number,
+we can deterministically compute its operation type from these
+probability distributions. One way to do this is with the PCG family
+of random number generators, which has some useful properties we’re
+going to use for generating our pseudorandom values.
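+A sketch of how relative weights could be turned into a deterministic
+pick (the names and structure here are hypothetical, not Harry’s
+actual implementation):

```java
// Map relative weights onto cumulative buckets; a uniformly distributed
// random value (derived deterministically from the sequence number) then
// selects an operation kind.
enum OperationKind { INSERT, UPDATE, DELETE_ROW, DELETE_PARTITION }

final class OperationSelector {
    final OperationKind[] kinds;
    final long[] cumulative; // running sums of the weights
    final long total;

    OperationSelector(OperationKind[] kinds, long[] weights) {
        this.kinds = kinds;
        this.cumulative = new long[weights.length];
        long sum = 0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i];
            cumulative[i] = sum;
        }
        this.total = sum;
    }

    OperationKind pick(long randomValue) {
        long bucket = Long.remainderUnsigned(randomValue, total);
        for (int i = 0; i < cumulative.length; i++)
            if (bucket < cumulative[i])
                return kinds[i];
        throw new IllegalStateException("unreachable");
    }
}
```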
+
+If you’d like to learn more about the mathematical underpinnings of
+PCG, you should read this paper
+(https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf[https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf,window=_blank]). However,
+to be able to use PCG, it is not necessary to know any of the
+internals. We need a random number generator that will have the
+following properties:
+
+  * Long period: the sequence of numbers it produces does not repeat
+    frequently; ideally, the period should be 2^64 when generating a
+    random number from 64 bits of entropy.
+  * Stream selection: the ability to produce different random
+    sequences from the same seed, identified by some stream id.
+  * Addressability: any number produced by the generator can be
+    reproduced from the seed and its sequence number. Ideally, we’d
+    like to have methods such as `long randomNumber(long
+    sequenceNumber, long stream)` and `long sequenceNumber(long
+    randomNumber, long stream)`. In other words, we should be able to
+    determine the sequence number of the random number in the given
+    stream. Using this method, we can also determine `distance(long x,
+    long y)` : how many random numbers we should skip to get `y` after
+    seeing `x`.
+  * Walkability: the ability to produce the number immediately
+    following (`long next(long randomNumber, long stream)`) or
+    preceding (`long prev(long randomNumber, long stream)`) the given
+    random number in the random sequence.
+
+You might have noticed that there are two ways to achieve the same
+thing: we can get a pseudorandom number from some number known to the
+system either by using `randomNumber(i, stream)` or by using `prev(i,
+stream)`. Both variants are valid, and both operations can be
+inverted. We have a slight preference toward using `prev`, since its
+inverse can be computed in constant time.
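+To make the walkability property concrete, here is a minimal
+reversible generator built from a 64-bit LCG. This is only a sketch:
+Harry’s actual RNG is PCG-based and more involved. The multiplier is
+borrowed from PCG, and the inverse is computed with Newton’s
+iteration, which converges for any odd multiplier modulo 2^64:

```java
// A walkable random sequence: next() advances one step, prev() undoes it.
// Multiplication by an odd constant is invertible mod 2^64, which is what
// makes prev() (and constant-time inversion) possible.
final class ReversibleRng {
    static final long MULT = 6364136223846793005L; // odd multiplier from PCG
    static final long MULT_INV = inverse(MULT);

    // Modular inverse of an odd 64-bit number via Newton's iteration:
    // each step doubles the number of correct low bits.
    static long inverse(long a) {
        long x = a; // a * a == 1 (mod 8), so x is correct to 3 bits
        for (int i = 0; i < 5; i++)
            x *= 2 - a * x;
        return x;
    }

    static long next(long value, long stream) {
        return value * MULT + (stream | 1); // odd increment selects the stream
    }

    static long prev(long value, long stream) {
        return (value - (stream | 1)) * MULT_INV;
    }
}
```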
+
+These properties allow us to reproduce partition state from just
+configuration (i.e., known distributions, schema, partition size,
+etc.) and a seed:
+
+  * The partition descriptor for the `N`th operation can be picked as
+    the `M`th random number in the stream of partition descriptors,
+    where the relation between `N` and `M` is determined by the chosen
+    pattern for visiting partitions.
+  * The clustering descriptor for the `N`th operation can be picked as
+    the `M`th random number in the stream of clustering descriptors
+    **for the given partition**, where the maximum `M` is determined by
+    the maximum partition size, so there can be no more than `max(M)`
+    rows in any generated partition.
+
+One of the simplest useful ways to pick a descriptor from the
+sequence is to use a sliding window. The sliding window begins with a
+preset number of items in it and visits each item in the current
+window one or several times in round-robin fashion. After this, it
+cycles one of the items out and adds a new one in its place.
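+One deterministic (and therefore addressable) formulation of such a
+window might look as follows; the parameters and behaviour are an
+illustration, not Harry’s exact scheme:

```java
// Which item of an infinite sequence is visited at step n, given a window
// of `windowSize` items that is visited `repeats` times round-robin before
// sliding forward by one item.
final class SlidingWindow {
    static long positionFor(long n, int windowSize, int repeats) {
        long perWindow = (long) windowSize * repeats;
        long epoch = n / perWindow;         // how many times the window has slid
        long offset = n % perWindow;        // visit index within the current window
        return epoch + offset % windowSize; // round-robin over [epoch, epoch + windowSize)
    }
}
```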
+
+Once the operation type, partition descriptor, and clustering
+descriptor are determined, all that is left to cover is how to
+generate value descriptors for `INSERT` and `UPDATE` operations. The
+value descriptor for a column is uniquely identified by its sequence
+number and is bound to the partition descriptor, clustering
+descriptor, and column.
+
+To summarise, all operations in Harry are deterministic and are
+represented using their descriptors. Descriptors can be computed
+hierarchically using the following rules:
+
+* Partition descriptor is picked from the _stream_ of partition descriptors. Its position in that stream is determined by some rule (for example, a sliding window):
+
+[source,java]
+----
+long pd = rng.randomNumber(positionFor(sequenceNumber), PARTITION_DESCRIPTOR_STREAM_ID);
+----
+
+* Clustering descriptor is picked from the _stream_ of clustering descriptors **for the given partition**.
+
+[source,java]
+----
+long cd = rng.prev(positionInPartition, pd);
+----
+
+* Value descriptor is picked from the _stream_ of descriptors identified by which partition, clustering, and column the value belongs to:
+
+[source,java]
+----
+long vd = rng.randomNumber(sequenceNumber, pd ^ cd ^ col);
+----
+
+== Inflation and Deflation
+
+We’ve mentioned before that one reason Harry’s state is so compact
+and can be validated so efficiently is that every value read from the
+database can be traced back to the descriptor it was generated
+from. To achieve this, we generate all values using order-preserving
+bijections. In other words, for any value generated from a
+descriptor, it should be possible to quickly find the descriptor the
+value was generated from, and two values generated from two distinct
+descriptors should sort the same way as the descriptors themselves.
+
+Implementing an order-preserving bijection for 64-bit longs is trivial
+and can be achieved by using an identity function. Essentially, any
+long descriptor _is_ the value it represents:
+
+[source,java]
+----
+long inflate(long descriptor) {
+  return descriptor;
+}
+
+long deflate(long value) {
+  return value; 
+}
+----
+
+There are many ways to make a bijection for strings. One way is to
+keep a sorted array of 256 short strings of the same length. When
+inflating a 64-bit descriptor into a string, we iterate over its 64
+bits, taking 8 bits (one byte) at a time and using the value of each
+byte as an index into the array of 256 strings:
+
+[source,java]
+----
+String inflate(long descriptor) {
+    StringBuilder builder = new StringBuilder();
+    for (int i = 0; i < Long.BYTES; i++) {
+        int idx = getByte(descriptor, i);
+        builder.append(nibbles[idx]);
+    }
+    return builder.toString();
+}
+----
+
+One thing we should take into account here is that strings are
+compared byte-wise, while longs use signed comparison. To make sure
+generated strings have the same order as descriptors, we need to XOR
+the sign bit.
+
+Any two strings produced by this generator are unique, and we can
+produce at most 2^64 values with it, so to generate longer strings we
+do not even need larger nibbles: we can simply append random data of
+arbitrary length to the end of the string. This does not change the
+order, since the order is determined by the nibble-generated prefix,
+which is unique to each value.
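+Putting both directions together, a toy version of such a string
+bijection (using two-character hex strings as the 256 “nibbles” — an
+arbitrary choice for this sketch) could look like:

```java
import java.util.Arrays;

// A toy order-preserving string bijection: each byte of the descriptor
// selects one of 256 sorted, fixed-length strings. XOR-ing the sign bit
// makes byte-wise string order match signed long order.
final class StringBijection {
    static final String[] NIBBLES = new String[256];
    static {
        for (int i = 0; i < 256; i++)
            NIBBLES[i] = String.format("%02x", i); // already lexicographically sorted
    }

    static String inflate(long descriptor) {
        long d = descriptor ^ Long.MIN_VALUE; // flip the sign bit
        StringBuilder builder = new StringBuilder();
        for (int i = Long.BYTES - 1; i >= 0; i--) // most significant byte first
            builder.append(NIBBLES[(int) ((d >>> (i * 8)) & 0xff)]);
        return builder.toString();
    }

    static long deflate(String value) {
        long d = 0;
        for (int i = 0; i < Long.BYTES; i++) {
            String part = value.substring(i * 2, i * 2 + 2);
            d = (d << 8) | Arrays.binarySearch(NIBBLES, part);
        }
        return d ^ Long.MIN_VALUE; // undo the sign-bit flip
    }
}
```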
+
+Such simple bijections can represent data types used for regular and
+static columns. We’ve previously mentioned that partition and
+clustering keys are also represented using 64-bit
+descriptors. Partition and clustering keys are composite: they consist
+of multiple distinct parts. One way to implement a bijection for a
+composite type is to “slice” the 64 bits of entropy into smaller chunks,
+each chunk giving some entropy to generate a different part of the
+key. Each slice is then inflated using a bijection that corresponds to
+the part of the key it represents. To convert the value back to the
+descriptor, we must deflate each part of the key and then “stitch” the
+values back together into a 64-bit descriptor.
+
+To summarise, key generators are just bijections that generate
+multiple values from a single 64-bit descriptor instead of one. A
+simplified and generalized version of such a bijection may look
+something like this:
+
+[source,java]
+----
+Object[] inflate(long descriptor) {
+  long[] slices = slice(descriptor);
+  Object[] key = new Object[slices.length];
+  for (int i = 0; i < slices.length; i++) {
+     key[i] = children[i].inflate(slices[i]);
+  }
+  return key;
+}
+
+long deflate(Object[] value) {
+  long[] slices = new long[value.length];
+  for (int i = 0; i < value.length; i++) {
+     slices[i] = children[i].deflate(value[i]);
+  }
+  return stitch(slices);
+}
+----
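+In the simplest case, the `slice` and `stitch` helpers could split
+the 64 bits evenly. This is only a sketch: it assumes the number of
+parts divides 64, and a real implementation would weight slices by
+the entropy each key part needs:

```java
// Split a 64-bit descriptor into equal-width slices and put them back
// together. Assumes `parts` divides Long.SIZE evenly.
final class KeySlicer {
    static long[] slice(long descriptor, int parts) {
        int bits = Long.SIZE / parts;
        long mask = bits == Long.SIZE ? -1L : (1L << bits) - 1;
        long[] slices = new long[parts];
        for (int i = 0; i < parts; i++)
            slices[i] = (descriptor >>> ((parts - 1 - i) * bits)) & mask;
        return slices;
    }

    static long stitch(long[] slices) {
        int bits = Long.SIZE / slices.length;
        long stitched = 0;
        for (long slice : slices)
            stitched = (stitched << bits) | slice;
        return stitched;
    }
}
```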
+
+Values generated by key generators preserve the order of descriptors
+they were generated from, which allows efficiently checking the order
+of results, comparing clustering descriptors, and validating range
+deletions.
+
+== Putting it all together 
+
+In this post, we’ve seen how the various parts of Harry work, from
+how a sequence of operations is reproduced to how the values
+themselves are generated. Using this information, we can create a
+quiescent model checker that can validate the state of the database in
+the absence of in-flight operations, assuming we know the state of all
+operations before this moment.
+
+As we’ve discussed, Harry works with reproducible histories of
+operations, where the following information identifies each operation:
+
+----

Review comment:
       For consistency, insert the line `[source,java]` here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org