Posted to dev@cassandra.apache.org by Joshua McKenzie <jm...@apache.org> on 2020/08/18 21:44:28 UTC

Re: [DISCUSS] A point of view on Testing Cassandra

This totally dropped off my radar; the call out from the SAI thread
reminded me. Thanks Benedict.

I think you raised some great points here about what a "minimum viable
testing" might look like for a new feature:

> New features should be required to include randomised integration tests
> that exercise all of the functions of the feature in random combinations
> and verify that the behaviour is consistent with expectation.  New
> functionality for an existing feature should augment any existing such
> tests to include the new functionality in its random exploration of
> behaviour.
>


> For a given system/feature/function, we should run with _every_ user
> option and every feature behaviour at least once;
>
Aim for testing all combinations of options and features if possible, fall
back to random combinations if not.
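
Roughly what I have in mind, as a throwaway sketch (nothing below is real
Cassandra code; the option axes and the runOnce() hook are made-up
placeholders): enumerate the option space, run every combination while the
space is small enough to be tractable, and otherwise sample random
combinations from a fixed seed so any failure is reproducible.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    public class OptionCombinationExploration
    {
        // Made-up option axes; a real test would enumerate the feature's
        // actual knobs.
        static final boolean[] COMPRESSION = { true, false };
        static final String[] READ_CL = { "ONE", "QUORUM", "ALL" };
        static final int[] FLUSH_SIZE_MB = { 1, 64, 1024 };

        static final int EXHAUSTIVE_LIMIT = 1_000; // run everything below this
        static final int RANDOM_SAMPLES = 200;     // otherwise sample this many

        public static void main(String[] args)
        {
            List<Object[]> combos = new ArrayList<>();
            for (boolean compression : COMPRESSION)
                for (String cl : READ_CL)
                    for (int flushSize : FLUSH_SIZE_MB)
                        combos.add(new Object[]{ compression, cl, flushSize });

            if (combos.size() <= EXHAUSTIVE_LIMIT)
            {
                // exhaustive: every combination at least once
                for (Object[] combo : combos)
                    runOnce(combo);
            }
            else
            {
                // intractable: random combinations, fixed seed for repro
                Random rng = new Random(42);
                for (int i = 0; i < RANDOM_SAMPLES; i++)
                    runOnce(combos.get(rng.nextInt(combos.size())));
            }
        }

        static void runOnce(Object[] combo)
        {
            // Placeholder: configure the feature with this combination of
            // options, exercise its behaviours, and assert the results
            // match expectation.
        }
    }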

Seems like some high-level principles surface from the discussion
(I'm taking a bit of editorial liberty adding some things and intentionally
simplifying here):

For releases:

   - No perf regressions
   - A healthy (several hundred) corpus of user schemas tested in both
   mixed version and final stable version clusters
   - A tool to empower end users to test their schemas and workloads on
   mixed version and new version clusters prior to upgrading (rough sketch
   after this list)
   - Green test board
   - Adversarial testing
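
To make the preflight-tool bullet above concrete, here's a loose sketch of
what the user-facing side of such a tool could look like. Everything here is
hypothetical: the cluster addresses, keyspace, and hard-coded workload are
placeholders, and a real tool (cassandra-diff, for instance) would also
handle schema cloning, paging, and retries. The idea is just to replay the
same queries against the current-version and candidate-version clusters via
the Java driver and flag rows that differ.

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.Row;
    import java.net.InetSocketAddress;
    import java.util.List;
    import java.util.Objects;

    public class UpgradePreflight
    {
        public static void main(String[] args)
        {
            // Placeholder contact points for the two clusters under comparison.
            try (CqlSession current = CqlSession.builder()
                     .addContactPoint(new InetSocketAddress("current-cluster", 9042))
                     .withLocalDatacenter("dc1")
                     .build();
                 CqlSession candidate = CqlSession.builder()
                     .addContactPoint(new InetSocketAddress("candidate-cluster", 9042))
                     .withLocalDatacenter("dc1")
                     .build())
            {
                // Placeholder workload; in practice this would come from an
                // FQL capture or a generator derived from the user's schema.
                List<String> workload = List.of("SELECT * FROM ks.tbl WHERE pk = 1");

                for (String query : workload)
                {
                    List<Row> before = current.execute(query).all();
                    List<Row> after = candidate.execute(query).all();
                    if (before.size() != after.size())
                    {
                        System.out.println("row count mismatch: " + query);
                        continue;
                    }
                    for (int i = 0; i < before.size(); i++)
                        if (!Objects.equals(before.get(i).getFormattedContents(),
                                            after.get(i).getFormattedContents()))
                            System.out.println("row mismatch: " + query);
                }
            }
        }
    }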


For new features:

   - All functions exercised w/random inputs and deliberate bad inputs (see
   the sketch after this list)
   - All functions exercised intentionally at boundary conditions
   - All functions exercised in a variety of failure and exception scenarios
   - Run with every user option and feature behavior at least once
   - Aim for testing all combinations of options and features if possible,
   fall back to random combinations if not
   - At least N% code coverage on tests
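
As a loose illustration of the first three bullets (hypothetical names
throughout; featureUnderTest() and the model are stand-ins, not real APIs):
drive the feature with a seeded random mix of ordinary, boundary, and
deliberately bad inputs, and check every result against a simple model of
the expected behaviour.

    import java.util.Random;

    public class RandomisedFeatureTest
    {
        public static void main(String[] args)
        {
            // Log the seed so any failure can be replayed exactly.
            long seed = Long.getLong("test.seed", System.nanoTime());
            System.out.println("seed=" + seed);
            Random rng = new Random(seed);

            for (int i = 0; i < 10_000; i++)
            {
                int input = pickInput(rng);
                try
                {
                    int actual = featureUnderTest(input); // stand-in for the real call
                    int expected = model(input);          // simple model of expected behaviour
                    if (actual != expected)
                        throw new AssertionError("mismatch for input " + input);
                }
                catch (IllegalArgumentException e)
                {
                    // Only deliberately bad inputs may be rejected.
                    if (input >= 0)
                        throw new AssertionError("unexpected rejection of " + input, e);
                }
            }
        }

        // Mix ordinary values with boundary conditions and deliberately bad input.
        static int pickInput(Random rng)
        {
            switch (rng.nextInt(4))
            {
                case 0:  return rng.nextInt(1000);         // ordinary
                case 1:  return 0;                         // boundary
                case 2:  return Integer.MAX_VALUE;         // boundary
                default: return -1 - rng.nextInt(1000);    // deliberately bad
            }
        }

        static int featureUnderTest(int input)
        {
            if (input < 0)
                throw new IllegalArgumentException("negative input");
            return input * 2; // stand-in behaviour
        }

        static int model(int input)
        {
            return input * 2;
        }
    }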

Maybe some of the above will prove useful or validating for the work you're
doing on articulating a tactical PoV on testing for the project, Benedict.

On Thu, Jul 16, 2020 at 5:26 AM Benedict Elliott Smith <be...@apache.org>
wrote:

> Thanks for getting the ball rolling.  I think we need to be a lot more
> specific, though, and it may take some time to hash it all out.
>
> For starters we need to distinguish between types of "done" - are we
> discussing:
>  - Release
>  - New Feature
>  - New Functionality (for an existing feature)
>  - Performance Improvement
>  - Minor refactor
>  - Bug fix
>
> ?  All of these (perhaps more) require unique criteria in my opinion.
>
> For example:
>  - New features should be required to include randomised integration tests
> that exercise all of the functions of the feature in random combinations
> and verify that the behaviour is consistent with expectation.  New
> functionality for an existing feature should augment any existing such
> tests to include the new functionality in its random exploration of
> behaviour.
>  - Releases are more suitable for many of your cluster-level tests, IMO,
> particularly if we get regular performance regression tests running against
> trunk (something for a shared roadmap)
>
> Then, there are various things that need specifying more clearly, e.g.:
>
> > Minimum 75% code coverage on non-boilerplate code
> Coverage by what? In my model, randomised integration tests of the
> relevant feature, but we need to agree specifically. Some thoughts:
>  - Not clear the value of code coverage measures, but 75% perhaps an
> acceptable arbitrary number if we want a lower bound
>  - More pertinent measure is options and behaviours
>     - For a given system/feature/function, we should run with _every_ user
> option and every feature behaviour at least once;
>     - Where tractable, exhaustive coverage (every combination of option,
> with every logical behaviour);
>     - Where not possible, random combinations of options and behaviours.
>
> > - Some form of the above in mixed-version clusters
> I think we need to include mixed-schema and modified-schema clusters as
> well, as this is a significant source of bugs
>
> > aggressively adversarial scenarios
> As far as chaos is concerned, I hope to bring an addition to in-jvm dtests
> soon that should facilitate this for more targeted correctness tests, so
> problems can be surfaced more rapidly and repeatably.  Also with much less
> hardware :)
>
>
> On 15/07/2020, 22:35, "Joshua McKenzie" <jm...@apache.org> wrote:
>
>     I like that the "we need a Definition of Done" seems to be surfacing.
> No
>     directed intent from opening this thread but it seems a serendipitous
>     outcome. And to reiterate - I didn't open this thread with the hope or
>     intent of getting all of us to agree on anything or explore what we
> should
>     or shouldn't agree on. That's not my place nor is it historically how
> we
>     seem to operate. :) Just looking to share a PoV so other project
>     participants know about some work coming down the pipe and can engage
> if
>     they're interested.
>
>     Brainstorming here to get discussion started, which we could drop in a
> doc
>     and riff on or high bandwidth w/collaborators interested in the topic:
>
>        - Tested on clusters with N nodes (10? 50? 3?) <- I'd start at
> proposing
>        min maybe 25
>        - Tested on data set sizes >= <M>TB (Maybe 30 given the 25 node
> count
>        w/current density)
>        - Soak tested in aggressively adversarial scenarios w/proven
> correctness
>        for 72 hours (fallout w/nodes down, up, bounce, GC pausing, major
>        compaction, major repair, packet loss, bootstrapping, etc. We could
> come up
>        with a list)
>        - Some form of the above in mixed-version clusters
>        - Minimum 75% code coverage on non-boilerplate code
>        - Where possible (i.e. not a brand new semantic / feature),
> diff-tested
>        against existing schemas making use of APIs in mixed version
> clusters as
>        well as on new-version only clusters (in case of refactor /
> internal black
>        box rewrite)
>
>     Some discrete bars like the above for a definition of done may make
> sense.
>     Any other ideas to add or differing points of view on what the #'s
> above
>     should be? Or disagreement on the items in the list above?
>
>     I hold all the above loosely, so don't hesitate to respond, disagree,
> or
>     totally shoot down. Or propose an entirely different approach to
>     determining a Definition of Done we could engage with.
>
>     Last but not least, we'd have to make infrastructure like this
> available to
>     the project at large for usage and validation on testing features or
> this
>     exercise will simply serve to deter engagement with the project
> outside a
>     small subset of the population with resources to dedicate to this type
> of
>     testing which I think we don't want.
>
>     On Wed, Jul 15, 2020 at 11:53 AM Benedict Elliott Smith <
> benedict@apache.org>
>     wrote:
>
>     > Perhaps you could clarify what you personally hope we _should_ agree
> as a
>     > project, and what you want us to _not_ agree (blossom in infinite
> variety)?
>     >
>     > My view: We need to agree a shared framework for quality going
> forwards.
>     > This will raise the bar to contributions, including above many that
> already
>     > exist.  So, we then need a roadmap to meeting the framework's
> requirements
>     > for past and future contributions, so that feature development does
> not
>     > suffer too greatly from the extra expectations imposed upon them.  I
> hope
>     > the framework and roadmap will be very specific and prescriptive in
> setting
>     > their minimum standards, which can of course be further augmented as
> any
>     > contributor desires.
>     >
>     > This seems to be the only way to come to an agreement about the
> point of
>     > contention you raise: some people perceive an insufficient concern
> about
>     > quality, others perceive a surplus of concern about quality.  Until
> we
>     > agree quite specifically what we mean, this tension will persist.  I
> also
>     > think it's a great way to improve project efficiency, if a
> contributor so
>     > cares: resources can be focused on the shared requirements first,
> since
>     > they're the "table stakes".
>     >
>     > Could you elaborate what you would prefer to leave out of this in
> your
>     > "Definition of Done"?
>     >
>     >
>     > On 15/07/2020, 16:28, "Joshua McKenzie" <jm...@apache.org>
> wrote:
>     >
>     >     >
>     >     > This section reads as very anti-adding tests to test/unit; I
> am 100%
>     > in
>     >     > favor of improving/creating our smoke, integration, regression,
>     >     > performance, E2E, etc. testing, but don't think I am as
> negative to
>     >     > test/unit, these tests are still valuable and more are welcome.
>     >
>     >     I am a strong proponent of unit tests; upon re-reading the
> document I
>     > don't
>     >     draw the same conclusion you do about the implications of the
>     >     verbiage, however it's completely reasonable to have a point of
> view
>     > that's
>     >     skeptical of people on this project's dedication to rigor and
> quality.
>     > :) I
>     >     think it's critical to "name and tame" the current architectural
>     >     constraints that undermine our ability to thoroughly unit test,
> as
>     > well as
>     >     understand and mitigate the weaknesses of our current unit
> testing
>     >     capabilities. A discrete example - attempting to "unit test"
> anything
>     > in
>     >     the CommitLog largely leads to the entire CommitLog package
> spinning
>     > up,
>     >     which drags in other packages, and before you know it you have
> multiple
>     >     modules up and running thanks to the dependency tree. This is
> something
>     >     myself, Jason, Stupp, Branimir, and others have all repeatedly
> burned
>     > time
>     >     on trying to delicately walk through re: test spin up and tear
> down.
>     > This
>     >     has ramifications far beyond just the time lost by engineers; the
>     >     opportunity cost of that combined with the fragility of systems
> means
>     > that
>     >     what testing we *do* perform is going to be constrained in scope
>     > relative
>     >     to a traditional battery against a stand-alone, modularized
> artifact.
>     >
>     >     Any and all contribution to *any* testing is strongly welcomed
> by all
>     > of us
>     >     on the project. In terms of "where I and a few others are going
> to
>     > choose
>     >     to invest our efforts" right now, accepting the current
> shortcomings
>     > of the
>     >     system to make as much headway on the urgent + important is
> where we're
>     >     headed.
>     >
>     >     I think it's more important that we set a standard for the
> project
>     > (e.g.,
>     >     > fundamental conformance to properties of the database) rather
> than
>     >     > attempting to measure quality relative to other DBs
>     >
>     >     I'm sympathetic to this, but then the pragmatist in me hammers me
> down. In
>     >     general, the adage "Software is never done; it is only released"
>     > resonates
>     >     for me as the core of what we have to navigate here. We will
> never be
>     > able
>     >     to state with 100% certainty that there is fundamental
> conformance to
>     > the
>     >     availability and correctness properties of the database; this
>     > dissatisfying
>     >     reality is why you have multiple teams implementing the software
> for
>     >     spacecraft and then redundancies within redundancies in each
> system for
>     >     unexpected failure scenarios and the unknown-unknown. In my
> opinion, we
>     >     need a very clear articulation of our Definition of Done when it
> comes
>     > to
>     >     correctness guarantees (yes Ariel, you were right) as well as a
> more
>     >     skillfully and deliberately articulated and implemented "failsafe"
> for
>     > catching
>     >     things and/or surfacing adverse conditions within the system upon
>     > failure.
>     >
>     >     It's tricky because in the past (in my opinion) we've been pretty
>     > remiss as
>     >     a project when it comes to a devotion to correctness and rigor.
> The
>     > danger
>     >     I'm anecdotally seeing is that if we let that pendulum swing too
> far
>     > in the
>     >     other direction without successfully clearly defining what
> "Done" looks
>     >     like from a quality perspective, that's an Everest we can all
> climb
>     > and die
>     >     on as a project.
>     >
>     >     On Wed, Jul 15, 2020 at 12:42 AM Scott Andreas <
> scott@paradoxica.net>
>     > wrote:
>     >
>     >     > Thanks for starting discussion!
>     >     >
>     >     > Replying to the thread with what I would have left as comments.
>     >     >
>     >     > ––––––
>     >     >
>     >     > > As yet, we lack empirical evidence to quantify the relative
>     > stability or
>     >     > instability of our project compared to a peer cohort
>     >     >
>     >     > I think it's more important that we set a standard for the
> project
>     > (e.g.,
>     >     > fundamental conformance to properties of the database) rather
> than
>     >     > attempting to measure quality relative to other DBs. That
> might be a
>     > useful
>     >     > measure, but I don't think it's the most important one. With
> regard
>     > to
>     >     > measuring against a common standard in the project, this is
> roughly
>     > what I
>     >     > had in mind when proposing "Release Quality Metrics" on the
> list in
>     > 2018. I
>     >     > still think making progress on something like this is essential
>     > toward
>     >     > defining a quantitative bar for release:
>     >     >
> https://www.mail-archive.com/dev@cassandra.apache.org/msg13154.html
>     >     >
>     >     > > Conversely, the ability to repeatedly and thoroughly
> validate the
>     >     > correctness of both new and existing functionality in the
> system is
>     > vital
>     >     > to the speed with which we can evolve its form and function.
>     >     >
>     >     > Strongly agreed.
>     >     >
>     >     > > Utopia (and following section)
>     >     >
>     >     > Some nods to great potential refactors to consider post-4.0
> here. ^
>     >     >
>     >     > > We should productize a kubernetes-centric, infra agnostic
> tool
>     > that has
>     >     > the following available testing paradigms:
>     >     >
>     >     > This would be an excellent set of capabilities to have.
>     >     >
>     >     > > We need to empower our user community to participate in the
> testing
>     >     > process...
>     >     >
>     >     > I really like this point. I took as a thought experiment "what
> would
>     > feel
>     >     > great to be able to say" if one were to write a product
> announcement
>     > for
>     >     > 4.0 and landed on something like "Users of Apache Cassandra can
>     > preflight
>     >     > their 4.0 upgrade by running $tool to clone, upgrade, and
> compare
>     > their
>     >     > clusters, ensuring that the upgrade will complete smoothly and
>     > correctly."
>     >     >
>     >     > > The less friction and less investment we can require from
> ecosystem
>     >     > participants, the more we can expect them to engage in desired
>     > behavior.
>     >     >
>     >     > +1
>     >     >
>     >     > ––––––
>     >     >
>     >     > I like the document and there's a lot that has me nodding.
> Toward the
>     >     > opening statement on "empirical evidence to quantify relative
>     > stability,"
>     >     > I'd love to revisit discussion on quantifying attributes like
> these
>     > here:
>     >     >
>     >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=93324430
>     >     >
>     >     > – Scott
>     >     >
>     >     > ________________________________________
>     >     > From: David Capwell <dc...@gmail.com>
>     >     > Sent: Tuesday, July 14, 2020 6:23 PM
>     >     > To: dev@cassandra.apache.org
>     >     > Subject: Re: [DISCUSS] A point of view on Testing Cassandra
>     >     >
>     >     > I am also not fully clear on the motives, but welcome anything
> which
>     > helps
>     >     > bring in better and more robust testing; thanks for starting
> this.
>     >     >
>     >     > Since I can not comment in the doc I have to copy/paste and put
>     > here... =(
>     >     >
>     >     > Reality
>     >     > > ...
>     >     > > investing in improving our smoke and integration testing as
> much
>     > as is
>     >     > > possible with our current constraints seems prudent.
>     >     >
>     >     >
>     >     > This section reads as very anti-adding tests to test/unit; I
> am 100%
>     > in
>     >     > favor of improving/creating our smoke, integration, regression,
>     >     > performance, E2E, etc. testing, but don't think I am as
> negative to
>     >     > test/unit, these tests are still valuable and more are welcome.
>     >     >
>     >     > To enumerate a punch list of traits we as engineers need from a
>     > testing
>     >     > > suite
>     >     >
>     >     >
>     >     > Would be good to speak about portability, accessibility, and
> version
>     >     > independence.  If a new contributor wants to add tests to this
> suite
>     > they
>     >     > need to be able to run it, and it should run within a
> "reasonable"
>     > time
>     >     > frame; one of the big issues with python dtests is that it
> takes
>     > 14+ hours
>     >     > to run, this makes it no longer accessible to new contributors.
>     >     >
>     >     >
>     >     > On Tue, Jul 14, 2020 at 11:47 AM Joshua McKenzie <
>     > jmckenzie@apache.org>
>     >     > wrote:
>     >     >
>     >     > > The purpose is purely to signal a point of view on the state
> of
>     > testing
>     >     > in
>     >     > > the codebase, some shortcomings of the architecture, and
> what a
>     > few of us
>     >     > > are doing and further planning to do about it. Kind of a
> "prompt
>     >     > discussion
>     >     > > if anyone has a wild allergic reaction to it, or encourage
>     > collaboration
>     >     > if
>     >     > > they have a wild positive reaction" sort of thing. Maybe a
>     > spiritual
>     >     > > "CEP-lite". :)
>     >     > >
>     >     > > I would advocate that we be very selective about the topics
> on
>     > which we
>     >     > > strive for a consistent shared point of view as a project.
> There
>     > are a
>     >     > lot
>     >     > > of us and we all have different experiences and different
> points
>     > of view
>     >     > > that lead to different perspectives and value systems.
> Agreeing on
>     >     > discrete
>     >     > > definitions of done, 100% - that's table stakes. But
> agreeing on
>     > how we
>     >     > get
>     >     > > there, my personal take is we'd all be well served to spend
> our
>     > energy
>     >     > > Doing the Work and expressing these complementary positions
> rather
>     > than
>     >     > > trying to bend everyone to one consistent point of view.
>     >     > >
>     >     > > Let a thousand flowers bloom, as someone wise recently told
> me. :)
>     >     > >
>     >     > > That said, this work will be happening in an open source
> repo with
>     > a
>     >     > > permissive license (almost certainly ASLv2), likely using
> github
>     > issues,
>     >     > so
>     >     > > anyone that wants to collaborate on it would be most
> welcome. I
>     > can make
>     >     > > sure Gianluca, Charles, Berenguer, and others bring that to
> this ML
>     >     > thread
>     >     > > once we've started open-sourcing things.
>     >     > >
>     >     > > On Tue, Jul 14, 2020 at 4:25 AM Benedict Elliott Smith <
>     >     > > benedict@apache.org>
>     >     > > wrote:
>     >     > >
>     >     > > > It does raise the bar to critiquing the document though,
> but
>     > perhaps
>     >     > > > that's also a feature.
>     >     > > >
>     >     > > > Perhaps we can first discuss the purpose of the document?
> It
>     > seems to
>     >     > be
>     >     > > a
>     >     > > > mix of mission statement for the project, as well as your
> own
>     > near term
>     >     > > > roadmap?  Should we interpret it only as an advertisement
> of
>     > your own
>     >     > > view
>     >     > > > of the problems the project faces, as a start to dialogue,
> or is
>     > the
>     >     > > > purpose to solicit feedback?
>     >     > > >
>     >     > > > Would it be helpful to work towards a similar document the
> whole
>     >     > > community
>     >     > > > endorses, with a shared mission statement, and a (perhaps
> loosely
>     >     > > defined)
>     >     > > > shared roadmap?
>     >     > > >
>     >     > > > I'd like to call out some specific things in the document
> that I
>     > am
>     >     > > > personally excited by: the project has long lacked a
> coherent,
>     >     > repeatable
>     >     > > > approach to performance testing and regressions; combined
> with
>     > easy
>     >     > > > visualisation tools this would be a huge win.  The FQL
> sampling
>     > with
>     >     > data
>     >     > > > distribution inference is also something that has been
> discussed
>     >     > > privately
>     >     > > > elsewhere, and would be hugely advantageous to the former,
> so
>     > that we
>     >     > can
>     >     > > > discover representative workloads.
>     >     > > >
>     >     > > > Thanks for taking the time to put this together, and start
> this
>     >     > dialogue.
>     >     > > >
>     >     > > >
>     >     > > > On 13/07/2020, 23:41, "Joshua McKenzie" <
> jmckenzie@apache.org>
>     > wrote:
>     >     > > >
>     >     > > >     >
>     >     > > >     > Can you please allow comments on the doc so we can
> leave
>     >     > feedback.
>     >     > > >     >
>     >     > > >
>     >     > > >
>     >     > > >     > Doc is view only; figured we could keep this to the
> ML.
>     >     > > >     >
>     >     > > >     That's a feature, not a bug.
>     >     > > >
>     >     > > >     Happy to chat here or on slack w/anyone. This is a
> complex
>     > topic so
>     >     > > >     long-form or high bandwidth communication is a better
> fit
>     > than gdoc
>     >     > > >     comments. They rapidly become unwieldy.
>     >     > > >
>     >     > > >     On Mon, Jul 13, 2020 at 6:17 PM sankalp kohli <
>     >     > > kohlisankalp@gmail.com>
>     >     > > >     wrote:
>     >     > > >
>     >     > > >     > Can you please allow comments on the doc so we can
> leave
>     >     > feedback.
>     >     > > >     >
>     >     > > >     > On Mon, Jul 13, 2020 at 2:16 PM Joshua McKenzie <
>     >     > > > jmckenzie@apache.org>
>     >     > > >     > wrote:
>     >     > > >     >
>     >     > > >     > > Link:
>     >     > > >     > >
>     >     > > >     > >
>     >     > > >     >
>     >     > > >
>     >     > >
>     >     >
>     >
> https://docs.google.com/document/d/1ktuBWpD2NLurB9PUvmbwGgrXsgnyU58koOseZAfaFBQ/edit#
>     >     > > >     > >
>     >     > > >     > >
>     >     > > >     > > Myself and a few other contributors are working
> with
>     > this point
>     >     > > of
>     >     > > > view
>     >     > > >     > as
>     >     > > >     > > our frame of where we're going to work on improving
>     > testing on
>     >     > > the
>     >     > > >     > project.
>     >     > > >     > > I figured it might be useful to foster
> collaboration more
>     >     > broadly
>     >     > > > in the
>     >     > > >     > > community as well as provide people with the
> opportunity
>     > to
>     >     > > > discuss work
>     >     > > >     > > they're doing they may not yet have had a chance to
>     > bring up or
>     >     > > > open
>     >     > > >     > > source. While fallout is already open-sourced,
> expect the
>     >     > schema
>     >     > > >     > anonymizer
>     >     > > >     > > and some of the cassandra-diff + nosqlbench
> framework
>     > effort to
>     >     > > be
>     >     > > >     > > open-sourced / openly worked on soon. Anyone that's
>     > interested
>     >     > in
>     >     > > >     > > collaborating, that would be highly welcome.
>     >     > > >     > >
>     >     > > >     > > Doc is view only; figured we could keep this to
> the ML.
>     >     > > >     > >
>     >     > > >     > > Thanks.
>     >     > > >     > >
>     >     > > >     > > ~Josh
>     >     > > >     > >
>     >     > > >     >
>     >     > > >
>     >     > > >
>     >     > > >
>     >     > > >
>     >     > > >
>     >     > > >
>     >     > >
>     >     >
>     >     >
>     >     >
>     >     >
>     >
>     >
>     >
>     >
>     >
>
>
>
>
>