Posted to commits@cassandra.apache.org by sp...@apache.org on 2017/03/15 19:00:38 UTC

cassandra git commit: Initial docs for stress

Repository: cassandra
Updated Branches:
  refs/heads/trunk 9bd482d36 -> 5a8983c14


Initial docs for stress

patch by Christopher Batey; reviewed by Stefan Podkowinski for CASSANDRA-12365


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5a8983c1
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5a8983c1
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5a8983c1

Branch: refs/heads/trunk
Commit: 5a8983c1486a0c1021b7138d200e24e933e59f3a
Parents: 9bd482d
Author: Christopher Batey <ch...@gmail.com>
Authored: Wed Aug 31 19:51:05 2016 +0100
Committer: Stefan Podkowinski <s....@gmail.com>
Committed: Wed Mar 15 19:58:49 2017 +0100

----------------------------------------------------------------------
 doc/source/development/testing.rst        |   2 +-
 doc/source/tools/cassandra_stress.rst     | 240 +++++++++++++++++++++++++
 doc/source/tools/example-stress-graph.png | Bin 0 -> 359103 bytes
 doc/source/tools/index.rst                |   1 +
 doc/source/tools/stress-example.yaml      |  43 +++++
 tools/stress/README.txt                   |  85 +--------
 6 files changed, 288 insertions(+), 83 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cassandra/blob/5a8983c1/doc/source/development/testing.rst
----------------------------------------------------------------------
diff --git a/doc/source/development/testing.rst b/doc/source/development/testing.rst
index f95ec1d..e11989a 100644
--- a/doc/source/development/testing.rst
+++ b/doc/source/development/testing.rst
@@ -76,7 +76,7 @@ Performance tests for Cassandra are a special breed of tests that are not part o
 Cassandra Stress Tool
 ---------------------
 
-TODO: `CASSANDRA-12365 <https://issues.apache.org/jira/browse/CASSANDRA-12365>`_
+See :ref:`cassandra_stress`
 
 cstar_perf
 ----------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5a8983c1/doc/source/tools/cassandra_stress.rst
----------------------------------------------------------------------
diff --git a/doc/source/tools/cassandra_stress.rst b/doc/source/tools/cassandra_stress.rst
new file mode 100644
index 0000000..417288f
--- /dev/null
+++ b/doc/source/tools/cassandra_stress.rst
@@ -0,0 +1,240 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: yaml
+
+.. _cassandra_stress:
+
+Cassandra Stress
+----------------
+
+cassandra-stress is a tool for benchmarking and load testing a Cassandra
+cluster. cassandra-stress supports testing arbitrary CQL tables and queries
+to allow users to benchmark their data model.
+
+This documentation focuses on user mode as this allows the testing of your
+actual schema. 
+
+Usage
+^^^^^
+There are several operation types:
+
+    * write-only, read-only, and mixed workloads of standard data
+    * write-only and read-only workloads for counter columns
+    * user configured workloads, running custom queries on custom schemas
+
+The syntax is `cassandra-stress <command> [options]`. If you want more information on a given command
+or options, just run `cassandra-stress help <command|option>`.
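+
+As a sketch of the help syntax above, the following asks for help on the user command and then on the
+-pop option (the two topics chosen here are only illustrative)::
+
+    cassandra-stress help user
+    cassandra-stress help -pop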
+
+Commands:
+    read:
+        Multiple concurrent reads - the cluster must first be populated by a write test
+    write:
+        Multiple concurrent writes against the cluster
+    mixed:
+        Interleaving of any basic commands, with configurable ratio and distribution - the cluster must first be populated by a write test
+    counter_write:
+        Multiple concurrent updates of counters.
+    counter_read:
+        Multiple concurrent reads of counters. The cluster must first be populated by a counter_write test.
+    user:
+        Interleaving of user provided queries, with configurable ratio and distribution.
+    help:
+        Print help for a command or option
+    print:
+        Inspect the output of a distribution definition
+    legacy:
+        Legacy support mode
+
+Primary Options:
+    -pop:
+        Population distribution and intra-partition visit order
+    -insert:
+        Insert specific options relating to various methods for batching and splitting partition updates
+    -col:
+        Column details such as size and count distribution, data generator, names, comparator and whether super columns should be used
+    -rate:
+        Thread count, rate limit or automatic mode (default is auto)
+    -mode:
+        Thrift or CQL with options
+    -errors:
+        How to handle errors when encountered during stress
+    -sample:
+        Specify the number of samples to collect for measuring latency
+    -schema:
+        Replication settings, compression, compaction, etc.
+    -node:
+        Nodes to connect to
+    -log:
+        Where to log progress to, and the interval at which to do it
+    -transport:
+        Custom transport factories
+    -port:
+        The port to connect to cassandra nodes on
+    -sendto:
+        Specify a stress server to send this command to
+    -graph:
+        Graph recorded metrics
+    -tokenrange:
+        Token range settings
+
+
+Suboptions:
+    Every command and primary option has its own collection of suboptions. These are too numerous to list here.
+    For information on the suboptions for each command or option, please use the help command,
+    `cassandra-stress help <command|option>`.
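+
+For example, because the read command needs a populated cluster, a read test is normally preceded by a
+write test. A minimal sketch, assuming a node reachable at 192.168.1.101 (the address and the operation
+counts are placeholders, not recommendations)::
+
+    cassandra-stress write n=1000000 -node 192.168.1.101
+    cassandra-stress read n=1000000 -node 192.168.1.101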
+
+User mode
+^^^^^^^^^
+
+User mode allows you to stress your own schemas. This can save time in
+the long run compared with building an application and only then realising
+that your schema doesn't scale.
+
+Profile
++++++++
+
+User mode requires a profile defined in YAML. 
+
+The keyspace for the test::
+
+  keyspace: staff
+
+CQL for the keyspace. Optional if the keyspace already exists::
+
+  keyspace_definition: |
+   CREATE KEYSPACE staff WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
+
+The table to be stressed::
+  
+  table: staff_activities
+
+CQL for the table. Optional if the table already exists::
+
+  table_definition: |
+    CREATE TABLE staff_activities (
+        name text,
+        when timeuuid,
+        what text,
+        PRIMARY KEY(name, when, what)
+    ) 
+
+
+Optional meta information on the generated columns in the above table.
+The min and max only apply to text and blob types.
+The distribution field represents the total unique population
+distribution of that column across rows::
+
+    columnspec:
+      - name: name
+        size: uniform(5..10) # The names of the staff members are between 5-10 characters
+        population: uniform(1..10) # 10 possible staff members to pick from
+      - name: when
+        cluster: uniform(20..500) # Staff members do between 20 and 500 events
+      - name: what
+        size: normal(10..100,50)
+
+Supported distribution types are:
+
+An exponential distribution over the range [min..max]::
+
+    EXP(min..max)
+
+An extreme value (Weibull) distribution over the range [min..max]::
+
+    EXTREME(min..max,shape)
+
+A gaussian/normal distribution, where mean=(min+max)/2, and stdev is (mean-min)/stdvrng::
+
+    GAUSSIAN(min..max,stdvrng)
+
+A gaussian/normal distribution, with explicitly defined mean and stdev::
+
+    GAUSSIAN(min..max,mean,stdev)
+
+A uniform distribution over the range [min..max]::
+
+    UNIFORM(min..max)
+
+A fixed distribution, always returning the same value::
+
+    FIXED(val)
+      
+If a distribution is preceded by ~, it is inverted.
+
+Defaults for all columns are size: uniform(4..8), population: uniform(1..100B), cluster: fixed(1)
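+
+As an illustration of the syntax above, a columnspec could combine several of these distributions. This is
+a sketch only: the ranges are arbitrary assumptions chosen for the example, not tuned recommendations::
+
+    columnspec:
+      - name: name
+        size: fixed(8)              # every generated name is exactly 8 characters
+        population: exp(1..1000)    # exponential distribution over 1000 possible names
+      - name: when
+        cluster: ~exp(20..500)      # inverted exponential for the number of events per staff member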
+
+Insert distributions::
+
+    insert:
+      # How many partitions to insert per batch
+      partitions: fixed(1)
+      # How many rows to update per partition
+      select: fixed(1)/500
+      # UNLOGGED or LOGGED batch for insert
+      batchtype: UNLOGGED
+
+
+Currently all inserts are done inside batches.
+
+Read statements to use during the test::
+
+    queries:
+       events:
+          cql: select *  from staff_activities where name = ?
+          fields: samerow
+       latest_event:
+          cql: select * from staff_activities where name = ?  LIMIT 1
+          fields: samerow
+
+Running a user mode test::
+
+    cassandra-stress user profile=./example.yaml duration=1m "ops(insert=1,latest_event=1,events=1)" truncate=once
+
+This will create the schema then run tests for 1 minute with an equal number of inserts, latest_event queries and events
+queries. Additionally the table will be truncated once before the test.
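+
+The numbers in ops act as relative weights, so a more read-heavy mix can be sketched by changing them. The
+values below are arbitrary and assume the same profile file as the command above::
+
+    cassandra-stress user profile=./example.yaml duration=1m "ops(insert=1,latest_event=3,events=3)"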
+
+The full example profile can be downloaded here: :download:`yaml <./stress-example.yaml>`
+
+Graphing
+^^^^^^^^
+
+Graphs can be generated for each run of stress.
+
+.. image:: example-stress-graph.png
+
+To create a new graph::
+
+    cassandra-stress user profile=./stress-example.yaml "ops(insert=1,latest_event=1,events=1)" -graph file=graph.html title="Awesome graph"
+
+To add a new run to an existing graph, point to the existing file and add a revision name::
+
+    cassandra-stress user profile=./stress-example.yaml duration=1m "ops(insert=1,latest_event=1,events=1)" -graph file=graph.html title="Awesome graph" revision="Second run"
+
+FAQ
+^^^^
+
+**How do you use NetworkTopologyStrategy for the keyspace?**
+
+Use the schema option, making sure to either escape the parentheses or enclose the option in quotes::
+
+    cassandra-stress write -schema "replication(strategy=NetworkTopologyStrategy,datacenter1=3)"
+
+**How do you use SSL?**
+
+Use the transport option::
+
+    cassandra-stress "write n=100k cl=ONE no-warmup" -transport "truststore=$HOME/jks/truststore.jks truststore-password=cassandra"
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5a8983c1/doc/source/tools/example-stress-graph.png
----------------------------------------------------------------------
diff --git a/doc/source/tools/example-stress-graph.png b/doc/source/tools/example-stress-graph.png
new file mode 100644
index 0000000..a65b08b
Binary files /dev/null and b/doc/source/tools/example-stress-graph.png differ

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5a8983c1/doc/source/tools/index.rst
----------------------------------------------------------------------
diff --git a/doc/source/tools/index.rst b/doc/source/tools/index.rst
index bdb98fd..20a5383 100644
--- a/doc/source/tools/index.rst
+++ b/doc/source/tools/index.rst
@@ -24,3 +24,4 @@ This section describes the command line tools provided with Apache Cassandra.
 
    cqlsh
    nodetool/nodetool
+   cassandra_stress

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5a8983c1/doc/source/tools/stress-example.yaml
----------------------------------------------------------------------
diff --git a/doc/source/tools/stress-example.yaml b/doc/source/tools/stress-example.yaml
new file mode 100644
index 0000000..0384de2
--- /dev/null
+++ b/doc/source/tools/stress-example.yaml
@@ -0,0 +1,43 @@
+keyspace: example
+
+# Would almost always be network topology unless running something locally
+keyspace_definition: |
+  CREATE KEYSPACE example WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
+
+table: staff_activities
+
+# The table under test. Start with a partition per staff member
+# Is this a good idea?
+table_definition: |
+  CREATE TABLE staff_activities (
+        name text,
+        when timeuuid,
+        what text,
+        PRIMARY KEY(name, when)
+  ) 
+
+columnspec:
+  - name: name
+    size: uniform(5..10) # The names of the staff members are between 5-10 characters
+    population: uniform(1..10) # 10 possible staff members to pick from 
+  - name: when
+    cluster: uniform(20..500) # Staff members do between 20 and 500 events
+  - name: what
+    size: normal(10..100,50)
+
+insert:
+  # we only update a single partition in any given insert 
+  partitions: fixed(1) 
+  # we want to insert a single row per partition and we have between 20 and 500
+  # rows per partition
+  select: fixed(1)/500 
+  batchtype: UNLOGGED             # Single partition unlogged batches are essentially noops
+
+queries:
+   events:
+      cql: select *  from staff_activities where name = ?
+      fields: samerow
+   latest_event:
+      cql: select * from staff_activities where name = ?  LIMIT 1
+      fields: samerow
+

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5a8983c1/tools/stress/README.txt
----------------------------------------------------------------------
diff --git a/tools/stress/README.txt b/tools/stress/README.txt
index 585409e..355415b 100644
--- a/tools/stress/README.txt
+++ b/tools/stress/README.txt
@@ -1,92 +1,13 @@
 cassandra-stress
 ======
 
-Description
------------
-cassandra-stress is a tool for benchmarking and load testing a Cassandra
-cluster. cassandra-stress supports testing arbitrary CQL tables and queries
-to allow users to benchmark their data model.
-
 Setup
 -----
 Run `ant` from the Cassandra source directory, then cassandra-stress can be invoked from tools/bin/cassandra-stress.
 cassandra-stress supports benchmarking any Cassandra cluster of version 2.0+.
 
-Usage
------
-There are several operation types:
-
-    * write-only, read-only, and mixed workloads of standard data
-    * write-only and read-only workloads for counter columns
-    * user configured workloads, running custom queries on custom schemas
-    * support for legacy cassandra-stress operations
-
-The syntax is `cassandra-stress <command> [options]`. If you want more information on a given command
-or options, just run `cassandra-stress help <command|option>`.
-
-Commands:
-    read:
-        Multiple concurrent reads - the cluster must first be populated by a write test
-    write:
-        Multiple concurrent writes against the cluster
-    mixed:
-        Interleaving of any basic commands, with configurable ratio and distribution - the cluster must first be populated by a write test
-    counter_write:
-        Multiple concurrent updates of counters.
-    counter_read:
-        Multiple concurrent reads of counters. The cluster must first be populated by a counterwrite test.
-    user:
-        Interleaving of user provided queries, with configurable ratio and distribution.
-        See http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
-    help:
-        Print help for a command or option
-    print:
-        Inspect the output of a distribution definition
-    legacy:
-        Legacy support mode
-
-Primary Options:
-    -pop:
-        Population distribution and intra-partition visit order
-    -insert:
-        Insert specific options relating to various methods for batching and splitting partition updates
-    -col:
-        Column details such as size and count distribution, data generator, names, comparator and if super columns should be used
-    -rate:
-        Thread count, rate limit or automatic mode (default is auto)
-    -mode:
-        CQL transport options
-    -errors:
-        How to handle errors when encountered during stress
-    -sample:
-        Specify the number of samples to collect for measuring latency
-    -schema:
-        Replication settings, compression, compaction, etc.
-    -node:
-        Nodes to connect to
-    -log:
-        Where to log progress to, and the interval at which to do it
-    -transport:
-        Custom transport factories
-    -port:
-        The port to connect to cassandra nodes on
-    -sendto:
-        Specify a stress server to send this command to
-    -graph:
-        Graph recorded metrics
-    -tokenrange:
-        Token range settings
-
-
-Suboptions:
-    Every command and primary option has its own collection of suboptions. These are too numerous to list here.
-    For information on the suboptions for each command or option, please use the help command,
-    `cassandra-stress help <command|option>`.
+Usage & Examples
+----------------
 
-Examples
---------
+See: https://cassandra.apache.org/doc/latest/tools/cassandra_stress.html
 
-    * tools/bin/cassandra-stress write n=1000000 -node 192.168.1.101 # 1M inserts to given host
-    * tools/bin/cassandra-stress read n=10000000 -node 192.168.1.101 -o read # 1M reads
-    * tools/bin/cassandra-stress write -node 192.168.1.101,192.168.1.102 n=10000000 # 10M inserts spread across two nodes
-    * tools/bin/cassandra-stress help -pop # Print help for population distribution option