
Posted to dev@cassandra.apache.org by Alex Petrov <al...@coffeenco.de> on 2022/02/16 11:45:51 UTC

Apache Cassandra fuzz testing

Hi everyone,

As you may know, we've been actively working on fuzz testing Apache Cassandra for the past several years and have made considerable progress on that front.

Re: Apache Cassandra fuzz testing

Posted by "benedict@apache.org" <be...@apache.org>.

I’m not sure we have much bandwidth for upgrading existing tests anyway. That said, flakiness in existing tests stems primarily from either the environment or poor test design (reliance on timings being a major culprit). If Harry tests were flaky, the failures would be more likely to indicate real problems, and they would be reproducible if the tests were deterministic.

The Simulator, on the other hand, might help with flaky tests because it is deterministic. We might want to develop a JUnit runner backed by the Simulator, so that we can easily switch it on to help diagnose flaky tests, and also to improve unit testing of concurrent code. We probably would not want to use it for all tests, however, as it might well slow down execution.
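As a rough illustration of why determinism helps with flaky concurrent tests, here is a minimal Python sketch. It assumes a toy model in which each task is a generator and every `yield` is a point where a scheduler may switch tasks; it is not the Simulator's API, and `incrementer` and `run_simulation` are hypothetical names. Because every scheduling decision is drawn from a seeded random number generator, a seed that exposes a lost update replays the same interleaving on every run.

```python
import random


def incrementer(state, name):
    """A non-atomic increment: read, then write. Each yield is a possible context switch."""
    local = state["counter"]
    yield f"{name} read {local}"
    state["counter"] = local + 1
    yield f"{name} wrote {local + 1}"


def run_simulation(seed):
    """Interleave two incrementers using scheduling choices drawn only from `seed`."""
    rng = random.Random(seed)
    state = {"counter": 0}
    tasks = [incrementer(state, "A"), incrementer(state, "B")]
    trace = []
    while tasks:
        task = rng.choice(tasks)
        try:
            trace.append(next(task))
        except StopIteration:
            tasks.remove(task)
    return state["counter"], trace


# Search seeds for the race: both tasks read 0 before either one writes.
failing = [seed for seed in range(100) if run_simulation(seed)[0] != 2]
if failing:
    seed = failing[0]
    # Re-running with the same seed replays the identical interleaving.
    assert run_simulation(seed) == run_simulation(seed)
    print(f"lost update with seed={seed}:", run_simulation(seed)[1])
```

A full simulator would have to control thread scheduling, time, and message delivery rather than rely on explicit `yield` points; the sketch only shows why a seed is enough to replay a failure.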

From: Stefan Miklosovic <st...@instaclustr.com>
Date: Friday, 18 February 2022 at 08:57
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: Apache Cassandra fuzz testing
Benjamin's email could have been written by me :) Fully agree.

Re: Apache Cassandra fuzz testing

Posted by Stefan Miklosovic <st...@instaclustr.com>.

Benjamin's email could have been written by me :) Fully agree.

On Fri, 18 Feb 2022 at 09:42, Benjamin Lerer <b....@gmail.com> wrote:
>
> Thanks a lot for raising that topic, Alex.
>
> I have not had the chance to use Harry yet, and I guess that is the case for most of us.
> Starting to use it in our new tests makes total sense to me.
> I am more concerned about starting to migrate/update existing tests. It took us time to build reliable, non-flaky tests to guarantee the correctness of the codebase. As far as I can see from Harry's documentation, some features are still missing. People lack experience with this tool, and it will take a bit of time for them to build that knowledge. Along the way, we might also discover some issues with Harry that need to be addressed.
>
> So I am +1 for starting to use it in our new tests and build our knowledge of Harry. Regarding a migration of existing tests to it, I would wait a bit before choosing to go down that path.

Re: Apache Cassandra fuzz testing

Posted by "benedict@apache.org" <be...@apache.org>.

> There are many tests that are currently purely manual, and some are just hard to maintain….. And whenever we add support for, say, UDTs, overnight you'll just get UDTs for all existing tests

Yes, something really worth highlighting here is that many of our tests are flaky because we have so many tests, many of them of low quality, for which determinism and reliability have been too costly to deliver. With fewer tests covering more functionality, investment in reliability and determinism pays off more easily. Also, by moving to frameworks that have already done some of this heavy lifting, determinism is easier to achieve anyway.

I agree that some areas of the codebase might be quite ripe for this kind of work, particularly more complex CQL features and ones being invested in today or in the near future. MVs seem an obvious example, as part of the work to move them out of experimental status. I’m uncertain whether SAI is suitable for use with Harry, but it could be explored.

From: Alex Petrov <al...@coffeenco.de>
Date: Friday, 18 February 2022 at 11:39
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: Apache Cassandra fuzz testing
I did not intend to imply that we should migrate all tests. To be more specific than I was: we can pick only the tests where Harry makes more sense than manual tests, where it can cover more ground. GROUP BY comes to mind as a perfect example: its current test suite is rather limited. Fuzzing it can yield a lot of useful results, with very little risk of flakiness. It can completely replace the existing test suite and test many more cases.

Another example is SelectTest and many tests like it, which just go manually through a bunch of cases while leaving out many other potential edge cases. TTL tests would be the next example. Range tombstones are yet another. Read repair tests would also be good to expand. Many Python dtests that use stress to load data are other potential candidates.

There are many tests that are currently purely manual, and some are just hard to maintain. Many of those could be good candidates for switching to property-based testing. But, as Benedict mentioned, we don't have much bandwidth to migrate the tests anyway.

It could be that you are skeptical because you haven't had much experience with Harry yet. While many features are still missing, it is still more powerful than many existing manually written tests. And whenever we add support for, say, UDTs, overnight you'll just get UDTs for all existing tests, followed by collections and other things. Moreover, we will be able to see whether all our tests pass under failure conditions, and run them with different sets of parameters.

Maybe I can reframe it: we add fuzz tests for the areas of code mentioned above and, if at some point in the future we decide that the manually written tests are redundant, we can consider deprecating them.

Re: Apache Cassandra fuzz testing

Posted by Alex Petrov <al...@coffeenco.de>.

I did not intend to imply that we should migrate all tests. To be more specific than I was: we can pick only the tests where Harry makes more sense than manual tests, where it can cover more ground. GROUP BY comes to mind as a perfect example: its current test suite is rather limited. Fuzzing it can yield a lot of useful results, with very little risk of flakiness. It can completely replace the existing test suite and test many more cases.
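To make the GROUP BY suggestion concrete, here is a minimal Python sketch of a property-based check over a toy table of `(partition_key, clustering, value)` rows. It is not Harry's API: `generate_rows`, `group_by_count_max`, and `reference_model` are hypothetical. The property compares a toy implementation that, like a streaming aggregator, only merges adjacent rows (and so depends on rows arriving in key order) against an obviously correct dictionary-based model, over many seeds. A deliberately buggy variant that skips sorting shows the kind of defect a handful of hand-written cases can miss.

```python
import random
from itertools import groupby


def generate_rows(seed, n=30):
    """Deterministically generate rows with unique (partition_key, clustering) pairs."""
    rng = random.Random(seed)
    keys = rng.sample([(pk, ck) for pk in range(4) for ck in range(10)], n)
    return [(pk, ck, rng.randrange(100)) for pk, ck in keys]


def group_by_count_max(rows, sort_first=True):
    """Toy GROUP BY partition_key computing COUNT(*) and MAX(value).

    itertools.groupby only merges *adjacent* rows, so correctness depends on order.
    """
    ordered = sorted(rows) if sort_first else rows
    result = {}
    for pk, group in groupby(ordered, key=lambda row: row[0]):
        values = [value for _, _, value in group]
        # Unsorted input can split a partition into several runs; later runs overwrite earlier ones.
        result[pk] = (len(values), max(values))
    return result


def reference_model(rows):
    """Obviously correct model: bucket every row by partition key."""
    buckets = {}
    for pk, _, value in rows:
        buckets.setdefault(pk, []).append(value)
    return {pk: (len(vs), max(vs)) for pk, vs in buckets.items()}


def check(seed, sort_first=True):
    rows = generate_rows(seed)
    return group_by_count_max(rows, sort_first) == reference_model(rows)


assert all(check(seed) for seed in range(200))
buggy_seeds = [seed for seed in range(200) if not check(seed, sort_first=False)]
print("buggy variant fails for seeds:", buggy_seeds[:5])
```

The same property keeps working as the generator grows (more partitions, more aggregates), whereas each hardcoded case only ever checks the one input it was written for.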

Another example is SelectTest and many tests like it, which just go manually through a bunch of cases while leaving out many other potential edge cases. TTL tests would be the next example. Range tombstones are yet another. Read repair tests would also be good to expand. Many Python dtests that use stress to load data are other potential candidates.

There are many tests that are currently purely manual, and some are just hard to maintain. Many of those could be good candidates for switching to property-based testing. But, as Benedict mentioned, we don't have much bandwidth to migrate the tests anyway.

It could be that you are skeptical because you haven't had much experience with Harry yet. While many features are still missing, it is still more powerful than many existing manually written tests. And whenever we add support for, say, UDTs, overnight you'll just get UDTs for all existing tests, followed by collections and other things. Moreover, we will be able to see whether all our tests pass under failure conditions, and run them with different sets of parameters.
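The point about UDTs follows from how generator-based tests are composed: tests draw their schemas and values from shared generators instead of hardcoded literals, so registering one new type generator widens every test that uses them. A minimal Python sketch of that idea, assuming a hypothetical registry `TYPE_GENERATORS` (this is not Harry's actual schema-generation API, and the UDT is modelled as a plain dict):

```python
import random

# Hypothetical registry: CQL type name -> function producing a random value of that type.
TYPE_GENERATORS = {
    "int": lambda rng: rng.randrange(-2**31, 2**31),
    "text": lambda rng: "".join(rng.choice("abcxyz") for _ in range(rng.randrange(1, 8))),
}


def generate_schema(rng, n_columns=3):
    """Choose column types from whatever the registry supports at call time."""
    return [(f"c{i}", rng.choice(sorted(TYPE_GENERATORS))) for i in range(n_columns)]


def generate_row(rng, schema):
    """Produce one row whose values match the schema's column types."""
    return {name: TYPE_GENERATORS[cql_type](rng) for name, cql_type in schema}


def types_exercised(seeds):
    """Which column types would a suite of seeded tests touch?"""
    seen = set()
    for seed in seeds:
        schema = generate_schema(random.Random(seed))
        seen.update(cql_type for _, cql_type in schema)
    return seen


before = types_exercised(range(50))
# Registering one generator (a toy UDT modelled as a dict) is the only change needed.
TYPE_GENERATORS["frozen<point>"] = lambda rng: {"x": rng.randrange(10), "y": rng.randrange(10)}
after = types_exercised(range(50))
print(sorted(before), "->", sorted(after))
```

No individual test was edited between `before` and `after`; the wider coverage comes entirely from the shared generator.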

Maybe I can reframe it: we add fuzz tests for the areas of code mentioned above and, if at some point in the future we decide that the manually written tests are redundant, we can consider deprecating them.

On Fri, Feb 18, 2022, at 9:41 AM, Benjamin Lerer wrote:
> Thanks a lot for raising that topic, Alex.
>
> I have not had the chance to use Harry yet, and I guess that is the case for most of us.
> Starting to use it in our new tests makes total sense to me.
> I am more concerned about starting to migrate/update existing tests. It took us time to build reliable, non-flaky tests to guarantee the correctness of the codebase. As far as I can see from Harry's documentation, some features are still missing. People lack experience with this tool, and it will take a bit of time for them to build that knowledge. Along the way, we might also discover some issues with Harry that need to be addressed.
>
> So I am +1 for starting to use it in our new tests and build our knowledge of Harry. Regarding a migration of existing tests to it, I would wait a bit before choosing to go down that path.

Re: Apache Cassandra fuzz testing

Posted by Benjamin Lerer <b....@gmail.com>.

Thanks a lot for raising that topic, Alex.

I have not had the chance to use Harry yet, and I guess that is the case for
most of us.
Starting to use it in our new tests makes total sense to me.
I am more concerned about starting to migrate/update existing tests. It
took us time to build reliable, non-flaky tests to guarantee the
correctness of the codebase. As far as I can see from Harry's documentation,
some features are still missing. People lack experience with this tool,
and it will take a bit of time for them to build that knowledge. Along the
way, we might also discover some issues with Harry that need to be addressed.

So I am +1 for starting to use it in our new tests and build our knowledge
of Harry. Regarding a migration of existing tests to it, I would wait a bit
before choosing to go down that path.



On Wed, 16 Feb 2022 at 16:30, benedict@apache.org <be...@apache.org> wrote:

> +1
>
>
>
> The Simulator is hopefully going to be another powerful tool for this kind
> of work, and we should be encouraging the use of both for large or complex
> pieces of work.

Re: Apache Cassandra fuzz testing

Posted by "benedict@apache.org" <be...@apache.org>.

+1

The Simulator is hopefully going to be another powerful tool for this kind of work, and we should be encouraging the use of both for large or complex pieces of work.

From: Alex Petrov <al...@coffeenco.de>
Date: Wednesday, 16 February 2022 at 11:56
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: Apache Cassandra fuzz testing
(apologies for sending an incomplete email)

Hi everyone,

As you may know, we’ve been actively working on fuzz testing Apache Cassandra for the past several years and have made considerable progress on that front.

We’ve cut a 0.0.1 release of Harry [1], a fuzz testing tool for Apache Cassandra, and merged CASSANDRA-16262 [2].

I’d recommend that we, as a community, take the next logical step: demand fuzz/property-based tests for all major patches, and start migrating/updating existing tests to be property-based rather than using hardcoded values.

Harry can be used to generate data, and then check that a sequence of events corresponds to Cassandra resolution rules. We will continue expanding Harry coverage and writing new models and checkers, too.

If you would like to learn more about Harry, you can refer to a recent blog post [3]. I will also be happy to answer any questions you may have about Harry, assist you in writing your tests, and help extend Harry if there’s a feature you need to accomplish them.

Thank you,
—Alex

[1] [GitHub - apache/cassandra-harry: Apache Cassandra - Harry](https://github.com/apache/cassandra-harry)
[2] [CASSANDRA-16262 4.0 Quality: Coordination & Replication Fuzz Testing - ASF JIRA](https://issues.apache.org/jira/browse/CASSANDRA-16262)
[3] [Apache Cassandra | Apache Cassandra Documentation](https://cassandra.apache.org/_/blog/Harry-an-Open-Source-Fuzz-Testing-and-Verification-Tool-for-Apache-Cassandra.html)

Re: Apache Cassandra fuzz testing

Posted by Alex Petrov <al...@coffeenco.de>.

(apologies for sending an incomplete email) 

Hi everyone,

As you may know, we’ve been actively working on fuzz testing Apache Cassandra for the past several years and have made considerable progress on that front.

We’ve cut a 0.0.1 release of Harry [1], a fuzz testing tool for Apache Cassandra, and merged CASSANDRA-16262 [2].

I’d recommend that we, as a community, take the next logical step: demand fuzz/property-based tests for all major patches, and start migrating/updating existing tests to be property-based rather than using hardcoded values.

Harry can be used to generate data, and then check that a sequence of events corresponds to Cassandra resolution rules. We will continue expanding Harry coverage and writing new models and checkers, too.
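As an illustration of the "generate data, then check against resolution rules" idea, here is a minimal Python sketch. It assumes a single cell and distinct write timestamps, so that last-write-wins reconciliation has exactly one correct answer (it deliberately sidesteps tie-breaking for equal timestamps). `ToyReplica` is a hypothetical stand-in for the system under test, not Harry's API; a real test would issue the generated writes to a cluster and compare its reads with the model.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: int
    timestamp: int


def generate_writes(seed, n=20):
    """Deterministically generate n writes with distinct timestamps from a seed."""
    rng = random.Random(seed)
    timestamps = rng.sample(range(1, 10 * n + 1), n)
    return [Write(rng.randrange(1000), ts) for ts in timestamps]


class ToyReplica:
    """Hypothetical system under test: keeps the newest write it has received."""

    def __init__(self):
        self.current = None

    def apply(self, write):
        if self.current is None or write.timestamp > self.current.timestamp:
            self.current = write

    def read(self):
        return None if self.current is None else self.current.value


def model_read(writes):
    """Last-write-wins model: the value carried by the highest timestamp."""
    return max(writes, key=lambda w: w.timestamp).value


def check_last_write_wins(seed):
    writes = generate_writes(seed)
    arrival_order = list(writes)
    random.Random(seed + 1).shuffle(arrival_order)  # a replica may see writes in any order
    replica = ToyReplica()
    for write in arrival_order:
        replica.apply(write)
    return replica.read() == model_read(writes)


failures = [seed for seed in range(500) if not check_last_write_wins(seed)]
print("failing seeds:", failures)  # prints "failing seeds: []" for this toy replica
```

Checking against a separate model instead of hardcoded expected values is what lets one check run over hundreds of generated scenarios, and because the writes are derived from the seed, any failing seed replays the exact scenario.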

If you would like to learn more about Harry, you can refer to a recent blog post [3]. I will also be happy to answer any questions you may have about Harry, assist you in writing your tests, and help extend Harry if there’s a feature you need to accomplish them.

Thank you,
—Alex

[1] [GitHub - apache/cassandra-harry: Apache Cassandra - Harry](https://github.com/apache/cassandra-harry)
[2] [CASSANDRA-16262 4.0 Quality: Coordination & Replication Fuzz Testing - ASF JIRA](https://issues.apache.org/jira/browse/CASSANDRA-16262)
[3] [Apache Cassandra | Apache Cassandra Documentation](https://cassandra.apache.org/_/blog/Harry-an-Open-Source-Fuzz-Testing-and-Verification-Tool-for-Apache-Cassandra.html)