Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2014/12/17 19:17:14 UTC

[jira] [Comment Edited] (CASSANDRA-8503) Collect important stress profiles for regression analysis done by jenkins

    [ https://issues.apache.org/jira/browse/CASSANDRA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250242#comment-14250242 ] 

Ariel Weisberg edited comment on CASSANDRA-8503 at 12/17/14 6:17 PM:
---------------------------------------------------------------------

I think there are two general classes of benchmarks you would run in CI. Representative user workloads, and targeted microbenchmark workloads. Targeted workloads are a huge help during ongoing development because they magnify the impact of regressions from code changes that are harder to notice in representative workloads. They also point to the specific subsystem being benchmarked.

I will just cover the microbenchmarks. The full matrix is large, so there is an element of wanting ponies, but the reality is that all of them are interesting from the perspective of preventing performance regressions and understanding the impact of ongoing changes.

Benchmark the stress client itself: excess server capacity and a single client pushing lots of small messages, lots of large messages, and queries the servers can answer as fast as possible. The flip side of this workload is the same thing but for the server, where you measure how many trivially answerable tiny queries you can shove through a cluster given excess client capacity. Testing the server might also be the right place to cover the matrix of replication and consistency levels.
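
To make that concrete, the tiny/large message cases might look something like this with the 2.1-era stress tool; the flag spellings should be double checked against the tool's help, and the node names and counts are placeholders:

    # Tiny payloads: one small column per write, lots of client threads.
    # Exercises the request/messaging path rather than storage.
    cassandra-stress write n=10000000 cl=ONE \
        -col n=fixed\(1\) size=fixed\(8\) \
        -mode native cql3 \
        -rate threads=300 \
        -node node1,node2,node3

    # Same shape with large values to exercise the large-message path.
    cassandra-stress write n=1000000 cl=ONE \
        -col n=fixed\(1\) size=fixed\(65536\) \
        -mode native cql3 \
        -rate threads=100 \
        -node node1,node2,node3

Run with excess server capacity this characterizes the client; run with excess client capacity against a busy cluster it characterizes the server.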

Benchmark performance of non-prepared statements.

Benchmark performance of preparing statements?
 
A full test matrix for data intensive workloads would cover read, write, and 50/50 mixes, and for a bonus 90/10; single cell partitions with a small value and a large value, plus a range of wide rows (small, medium, large); all three compaction strategies with compression on and off. Data intensive workloads also need to run against both spinning rust and SSDs.
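
A couple of cells of that matrix, expressed as stress invocations (flag spellings approximate; population sizes and thread counts are placeholders, and compression and disk type would be toggled per run):

    # Assumes the key range was populated by a prior write pass so reads do not miss.

    # 50/50 read/write, narrow partitions, small values, STCS
    cassandra-stress mixed ratio\(write=1,read=1\) n=10000000 cl=QUORUM \
        -col n=fixed\(1\) size=fixed\(64\) \
        -pop dist=uniform\(1..10M\) \
        -schema replication\(factor=3\) compaction\(strategy=SizeTieredCompactionStrategy\) \
        -rate threads=200

    # 90/10 read-heavy, wide rows (1000 cells of 256 bytes), LCS
    cassandra-stress mixed ratio\(write=1,read=9\) n=10000000 cl=QUORUM \
        -col n=fixed\(1000\) size=fixed\(256\) \
        -pop dist=uniform\(1..1M\) \
        -schema replication\(factor=3\) compaction\(strategy=LeveledCompactionStrategy\) \
        -rate threads=200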

CQL-specific microbenchmarks against specific CQL datatypes. If there are important interactions we should capture those as well.
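
For the datatype benchmarks a user-mode profile is probably the right vehicle, in the spirit of the example yaml that ships with the tool; the schema, column specs, and query below are purely illustrative:

    # datatypes.yaml -- illustrative only; real profiles would cover one type (or interaction) each
    keyspace: stress_types
    keyspace_definition: |
      CREATE KEYSPACE stress_types WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
    table: typestest
    table_definition: |
      CREATE TABLE typestest (
          name text,
          event_time timestamp,
          flag boolean,
          score double,
          payload blob,
          PRIMARY KEY (name, event_time)
      );
    columnspec:
      - name: name
        size: uniform(5..20)
        population: uniform(1..1M)
      - name: event_time
        cluster: uniform(10..100)
      - name: payload
        size: gaussian(100..1000)
    insert:
      partitions: fixed(1)
    queries:
      readrow:
        cql: SELECT * FROM typestest WHERE name = ? LIMIT 10
        fields: samerow

It would be run with something along the lines of cassandra-stress user profile=datatypes.yaml ops\(insert=1,readrow=1\) n=1000000.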

Counters

Lightweight transactions
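
Counters at least are already covered by dedicated stress commands, so that slice is cheap to add; lightweight transactions probably need a custom profile or a small bespoke harness, I would have to check what stress supports. Roughly:

    # counter increment and read throughput (thread counts are placeholders)
    cassandra-stress counter_write n=1000000 cl=QUORUM -rate threads=100
    cassandra-stress counter_read  n=1000000 cl=QUORUM -rate threads=100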

The matrix also needs to include different permutations of replication strategies and consistency levels. Maybe we can constrain those variations to parts of the matrix that would best reflect the impact of replication strategies and CL. Probably a subset of the data intensive workloads.
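
The RF x CL sweep is mostly a harness concern; a minimal sketch of the loop, assuming the stress keyspace is dropped between iterations so the new replication factor actually takes effect (script and file names are placeholders):

    # sweep replication factor and consistency level for one write workload
    for rf in 1 3; do
      for cl in ONE QUORUM ALL; do
        # drop/recreate of the stress keyspace between iterations is assumed
        cassandra-stress write n=5000000 cl=$cl \
            -schema replication\(factor=$rf\) \
            -rate threads=200 \
            -log file=write_rf${rf}_cl${cl}.log
      done
    done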

We also want a workload targeting the row cache and key cache, both when everything is cached and when there is a realistic long tail of data not in the cache.
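
The access distribution in stress can probably express both cases: a read population narrow enough to stay resident versus a uniform long tail. Distribution names are as I remember them from the tool's help, the key ranges are placeholders, and both runs assume a prior write pass populated the keys:

    # hot working set: most reads land on keys that should stay in the key/row cache
    cassandra-stress read n=10000000 \
        -pop dist=gaussian\(1..10M,5\) \
        -rate threads=200

    # long tail: uniform reads over a key space much larger than the caches
    cassandra-stress read n=10000000 \
        -pop dist=uniform\(1..100M\) \
        -rate threads=200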

For every workload, to really get the value you would like a graph for throughput and a graph for latency at some percentile, with a data point per tested revision going back to the beginning, as well as a 90-day graph. A trend line also helps. Then someone has to own monitoring the graphs and poking people when there is an issue.

The workflow usually goes something like this: the monitor tags the author of the suspected bad revision, who triages it and either fixes it or hands it off to the correct person. Timeliness is really important, because once regressions start stacking it's a pain to tell whether the fix you made actually addressed the regression you were chasing.


> Collect important stress profiles for regression analysis done by jenkins
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8503
>             Project: Cassandra
>          Issue Type: Task
>            Reporter: Ryan McGuire
>            Assignee: Ryan McGuire
>
> We have a weekly job setup on CassCI to run a performance benchmark against the dev branches as well as the last stable releases.
> Here's an example:
> http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f
> This test is currently pretty basic: it's running on three nodes with the default stress profile. We should crowdsource a collection of stress profiles to run, and then once we have many of these tests running we can collect them all into a weekly email.
> Ideas:
>  * Timeseries (Can this be done with stress? not sure)
>  * compact storage
>  * compression off
>  * ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)