Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2015/03/18 14:44:38 UTC

[jira] [Comment Edited] (CASSANDRA-8986) Major cassandra-stress refactor

    [ https://issues.apache.org/jira/browse/CASSANDRA-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367134#comment-14367134 ] 

Benedict edited comment on CASSANDRA-8986 at 3/18/15 1:43 PM:
--------------------------------------------------------------

I agree, but that is also independent of this goal. I plan to do that refactor first (as a separate ticket; I think I have a few related ones filed). I do intend to retain a "simple" mode, though, since the old mode is still used widely, but will transparently create a StressProfile to perform it.

edit: ... actually, we may disagree a little. I want to ensure the profile can specify everything, but the CLI is still a very useful way to override a number of properties, especially for scripting. Forcing users to write a separate YAML file for every possible test is really ugly IMO.
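
To make the "profile specifies everything, CLI overrides some of it" idea concrete, here is a minimal sketch. It is not cassandra-stress code: the setting names (keyspace, threads, ops), the key=value argument form, and both function names are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List


def parse_overrides(args: List[str]) -> Dict[str, str]:
    """Parse hypothetical key=value command-line arguments into a mapping."""
    overrides: Dict[str, str] = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        overrides[key] = value
    return overrides


def coerce(raw: str, default: Any) -> Any:
    """Convert a raw CLI string to the type of the profile's default value."""
    if isinstance(default, bool):
        # bool("false") would be True, so booleans are parsed explicitly.
        return raw.strip().lower() in ("true", "1", "yes")
    return type(default)(raw)


def effective_settings(profile: Dict[str, Any],
                       overrides: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of the profile with CLI overrides layered on top."""
    unknown = set(overrides) - set(profile)
    if unknown:
        # Assumption: the profile defines the full set of valid settings.
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    merged = dict(profile)  # the profile itself is left unmodified
    for key, raw in overrides.items():
        merged[key] = coerce(raw, profile[key])
    return merged


if __name__ == "__main__":
    profile = {"keyspace": "stress", "threads": 4, "ops": 1000}
    print(effective_settings(profile, parse_overrides(["threads=16"])))
```

With this layering, a script can reuse one profile and vary a single property per run instead of keeping a separate YAML file for each variation.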


was (Author: benedict):
I agree, but that is also independent of this goal. I plan to do that refactor first (as a separate ticket; I think I have a few related ones filed). I do intend to retain a "simple" mode, though, since the old mode is still used widely, but will transparently create a StressProfile to perform it.

> Major cassandra-stress refactor
> -------------------------------
>
>                 Key: CASSANDRA-8986
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8986
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Benedict
>            Assignee: Benedict
>
> We need a tool for both stressing _and_ validating more complex workloads than stress currently supports. Stress needs a raft of changes, and I think it would be easier to deliver many of these as a single major endeavour, which I think is justifiable given its audience. The rough behaviours I want stress to support are:
> * Ability to know exactly how many rows it will produce, for any clustering prefix, without generating those prefixes
> * Ability to generate an amount of data proportional to the amount it will send to the server (or consume from the server), rather than proportional to the variation in clustering columns
> * Ability to reliably produce near identical behaviour each run
> * Ability to understand complex overlays of operation types (LWT, Delete, Expiry; perhaps not all implemented immediately, but at least the framework for supporting them easily)
> * Ability to (with minimal internal state) understand the complete cluster state through overlays of multiple procedural generations
> * Ability to understand the in-flight state of in-progress operations (i.e. if we're applying a delete, understand that the delete may or may not have been applied, for potentially multiple conflicting in-flight operations)
> I think the necessary changes to support this would give us the _functional_ base to support all the functionality I can currently envisage stress needing. Before embarking on this (which I may attempt very soon), it would be helpful to get input from others on features missing from stress that I haven't covered here and that we will certainly want in the future. That way they can be factored into the overall design, and we can hopefully avoid another refactor one year from now: the complexity grows each time, and each time it is a higher sunk cost. [~jbellis] [~iamaleksey] [~slebresne] [~tjake] [~enigmacurry] [~aweisberg] [~blambov] [~jshook] ... and @everyone else :)
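
As an illustration of the first three behaviours in the description above (exact row counts for any clustering prefix without generating it, and reproducible output each run), here is a minimal sketch. It is not the design proposed in the ticket: it assumes a fixed fan-out per clustering level, which is a deliberate simplification, and every name in it is hypothetical.

```python
import hashlib
from math import prod
from typing import Sequence, Tuple


def rows_under_prefix(fanout: Sequence[int], prefix_len: int) -> int:
    """Rows below any clustering prefix of this length, without enumeration.

    Assumes each clustering level has a fixed number of distinct values.
    """
    if not 0 <= prefix_len <= len(fanout):
        raise ValueError("prefix length out of range")
    return prod(fanout[prefix_len:])


def clustering_for_row(fanout: Sequence[int], index: int) -> Tuple[int, ...]:
    """Map a row index to clustering coordinates via mixed-radix decoding."""
    if not 0 <= index < prod(fanout):
        raise IndexError("row index out of range")
    coords = []
    for width in reversed(fanout):
        index, digit = divmod(index, width)
        coords.append(digit)
    return tuple(reversed(coords))


def value_for_row(seed: int, partition: int, coords: Tuple[int, ...]) -> int:
    """Derive a value purely from (seed, partition, coords).

    No generator state is kept, so any row can be regenerated in any order
    and repeated runs with the same seed produce identical data.
    """
    key = repr((seed, partition, coords)).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


if __name__ == "__main__":
    fanout = (3, 4, 5)  # hypothetical: three clustering columns
    print(rows_under_prefix(fanout, 1))    # rows under any one-column prefix
    print(clustering_for_row(fanout, 59))  # coordinates of the last row
    print(value_for_row(42, 7, clustering_for_row(fanout, 59)))
```

Because each value depends only on its coordinates and the seed, a validator could recompute what a row should contain without storing what was written. Real workloads with variable fan-out would need a different counting scheme.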


