Posted to user@cassandra.apache.org by Fred Habash <fm...@gmail.com> on 2019/12/10 14:56:47 UTC

Predicting Read/Write Latency as a Function of Total Requests & Cluster Size

I'm looking for an empirical way to answer these two questions:

1. If I increase the application workload (read/write requests) by some
percentage, how is it going to affect read/write latency? Assume all
other factors remain constant, e.g. EC2 instance class, SSD specs, number
of nodes, etc.

2. How many nodes do I have to add to maintain a given read/write latency?

Are there any methods or instruments out there that can help answer
these questions?



----------------------------------------
Thank you

Re: Predicting Read/Write Latency as a Function of Total Requests & Cluster Size

Posted by Peter Corless <pe...@scylladb.com>.
The theoretical answer involves Little's Law
<https://en.wikipedia.org/wiki/Little%27s_law> (*L = λW*). But the practical
experience is, as you say, dependent on a fair number of factors. We recently
wrote a blog post
<https://www.scylladb.com/2019/11/20/maximizing-performance-via-concurrency-while-minimizing-timeouts-in-distributed-databases/>
that should be applicable to your thinking about parallelism,
throughput, latency, and timeouts.
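
To make Little's Law concrete, here is a rough back-of-the-envelope sketch
in Python; the concurrency and throughput figures are made-up illustrations,
not measurements from any cluster:

# Little's Law: L = lambda * W
#   L      = average number of requests in flight (concurrency)
#   lambda = throughput in requests/second
#   W      = average latency in seconds

concurrency = 128        # assumed in-flight requests the application allows
throughput = 10_000      # assumed requests/second the cluster sustains

avg_latency_s = concurrency / throughput                       # W = L / lambda
print(f"average latency: {avg_latency_s * 1000:.1f} ms")       # 12.8 ms

# Rearranged: to hold latency at 10 ms with the same concurrency,
# the cluster must sustain L / W requests per second.
target_latency_s = 0.010
print(f"required throughput: {concurrency / target_latency_s:,.0f} req/s")  # 12,800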

Earlier this year, we also wrote a blog post about sizing Scylla clusters
<https://www.scylladb.com/2019/06/20/sizing-up-your-scylla-cluster/> that
touches on latency and throughput. For example, a general rule of thumb is
that with the current generation of Intel cores, for payloads of <1 KB you
can get ~12.5k ops/core with Scylla. If there are similar blog posts about
sizing Cassandra clusters, I'd be interested in reading them as well!
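
For a rough sense of how that rule of thumb translates into cluster sizing,
here is some napkin math in Python; the per-core figure is the Scylla number
quoted above (substitute your own measured value for Cassandra), and the
replication factor, headroom, and cores-per-node figures are assumptions
picked purely for illustration:

# Napkin math using the ~12.5k ops/core rule of thumb quoted above.
# Replication factor, headroom, and core count per node are illustrative
# assumptions, not recommendations.

peak_ops = 300_000        # client requests/second you need to serve
replication_factor = 3    # assume each operation is applied on RF replicas (worst case)
headroom = 0.5            # run nodes at ~50% of max to absorb spikes and repairs
ops_per_core = 12_500     # rule of thumb for <1 KB payloads
cores_per_node = 16       # e.g. a 16-vCPU instance class

internal_ops = peak_ops * replication_factor
cores_needed = internal_ops / (ops_per_core * headroom)
nodes_needed = -(-cores_needed // cores_per_node)     # ceiling division

print(f"cores needed: {cores_needed:.0f}, nodes needed: {nodes_needed:.0f}")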

Also, in terms of latency, I want to point out that a great deal depends
on the nature of your data, queries, and caching. For example, if you have
a very low cache hit rate, expect greater latencies: data will still need
to be read from storage even if you add more nodes.
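
A tiny weighted-average model makes the point; the two latency figures
below are placeholder assumptions, not benchmarks:

# Why cache hit rate dominates read latency: a simple weighted-average model.
cache_hit_latency_ms = 0.5    # assumed: served from memory (key/row/page cache)
disk_read_latency_ms = 8.0    # assumed: has to touch SSTables on SSD

def expected_read_latency(hit_rate: float) -> float:
    """Mean read latency for a given cache hit rate (0.0 - 1.0)."""
    return hit_rate * cache_hit_latency_ms + (1 - hit_rate) * disk_read_latency_ms

for hit_rate in (0.95, 0.80, 0.50):
    print(f"hit rate {hit_rate:.0%}: ~{expected_read_latency(hit_rate):.2f} ms mean")

Adding nodes changes neither of the two per-request costs; only a better
hit rate (or faster storage) moves the disk-bound term.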

On Tue, Dec 10, 2019 at 6:57 AM Fred Habash <fm...@gmail.com> wrote:

> I'm looking for an empirical way to answer these two questions:
>
> 1. If I increase the application workload (read/write requests) by some
> percentage, how is it going to affect read/write latency? Assume all
> other factors remain constant, e.g. EC2 instance class, SSD specs, number
> of nodes, etc.
>
> 2. How many nodes do I have to add to maintain a given read/write latency?
>
> Are there any methods or instruments out there that can help answer
> these questions?
>
>
>
> ----------------------------------------
> Thank you
>
>
>

-- 
Peter Corless
Technical Marketing Manager
ScyllaDB
e: peter@scylladb.com
t: @petercorless <https://twitter.com/PeterCorless>
v: 650-906-3134

Re: Predicting Read/Write Latency as a Function of Total Requests & Cluster Size

Posted by Reid Pinchback <rp...@tripadvisor.com>.
Latency SLAs are very much *not* Cassandra’s sweet spot; scaling throughput and storage is where C*’s strengths shine.  If you only need median latency you’ll find things a bit more amenable to modeling, but not if you have 2-nines and particularly not 3-nines SLA expectations.  Basically, the harder you push on the nodes, the more you get sporadic but non-ignorable timing artifacts from garbage collection and from IO stalls when the flushing of writes chokes out disk reads.  Also, running in AWS, you’ll find that noisy neighbors are a routine issue no matter what the specifics of your use are.
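
To see why the harder-you-push effect is so non-linear, here is a toy single-queue (M/M/1) sketch in Python. Real Cassandra nodes are nothing like an M/M/1 queue, so treat this as shape-of-the-curve intuition only, with a made-up service rate:

# Why "push harder, latency rises non-linearly": a single-server queueing
# model (M/M/1). Illustrative only; real C* nodes behave far messier.
service_rate = 20_000.0          # assumed max ops/second a node can service

def mm1_mean_latency_ms(arrival_rate: float) -> float:
    """Mean time in system W = 1 / (mu - lambda), in milliseconds."""
    if arrival_rate >= service_rate:
        return float("inf")      # past saturation the queue grows without bound
    return 1000.0 / (service_rate - arrival_rate)

for utilization in (0.50, 0.70, 0.80, 0.90, 0.95, 0.99):
    arrival_rate = utilization * service_rate
    print(f"utilization {utilization:.0%}: ~{mm1_mean_latency_ms(arrival_rate):.2f} ms mean")

Going from 50% to 90% utilization multiplies mean latency several times over, and the tail percentiles degrade even faster than the mean.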

Your actual data model, your patterns of reads and writes, the impact of deletes and TTLs requiring tombstone cleanup, etc., all dramatically change the picture.

If you aren’t already aware of it, there is something called cassandra-stress that can help you do some experiments. The challenge though is determining if the experiments are representative of what your actual usage will be.  Because of the GC issues in anything implemented in a JVM or interpreter, it’s pretty easy to fall off the cliff of relevance.  TLP wrote an article about some of the challenges of this with cassandra-stress:

https://thelastpickle.com/blog/2017/02/08/Modeling-real-life-workloads-with-cassandra-stress.html

Note that one way to avoid having to care so much about variable latency is to make use of speculative retry.  Basically you’re trading off some of your median throughput to help achieve a latency SLA.  The tradeoff benefit breaks down when you get to 3 nines.
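
To illustrate that tradeoff, here is a small Monte Carlo sketch in Python; the latency distribution, stall rate, and retry threshold are all invented for illustration and are not meant to mirror Cassandra’s actual speculative-retry implementation:

# Monte Carlo sketch of the speculative-retry tradeoff: a duplicate request is
# sent if the first replica hasn't answered within a threshold, and the
# coordinator takes whichever response arrives first. All parameters are assumptions.
import random

def replica_latency_ms() -> float:
    """Mostly-fast response time with an occasional GC/IO stall."""
    if random.random() < 0.02:                 # assume ~2% of replica responses stall
        return random.uniform(50.0, 200.0)     # stalled response
    return random.uniform(1.0, 5.0)            # normal response

def percentile(samples, p):
    return sorted(samples)[int(p * (len(samples) - 1))]

random.seed(42)
N = 100_000
threshold_ms = 10.0        # speculate if no reply within 10 ms

plain, speculative, duplicates = [], [], 0
for _ in range(N):
    first = replica_latency_ms()
    plain.append(first)
    if first <= threshold_ms:
        speculative.append(first)              # fast path: no retry needed
    else:
        duplicates += 1                        # duplicate request sent at the threshold
        second = threshold_ms + replica_latency_ms()
        speculative.append(min(first, second)) # take whichever reply lands first

for name, data in (("no retry", plain), ("speculative", speculative)):
    print(f"{name:12s} p50={percentile(data, 0.50):6.1f} ms  "
          f"p99={percentile(data, 0.99):6.1f} ms  "
          f"p99.9={percentile(data, 0.999):6.1f} ms")
print(f"extra load from duplicate requests: {duplicates / N:.1%}")

The median stays the same, the tail tightens, and the cost shows up as the fraction of duplicate requests, which is the throughput you give back.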

I’m actually hoping to start on some modeling of what the latency surface looks like with different assumptions in the new year, not because I expect the specific numbers to translate to anybody else, but just to show how the underlying dynamics evidence themselves in metrics when C* nodes are under duress.

R


From: Fred Habash <fm...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Tuesday, December 10, 2019 at 9:57 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Predicting Read/Write Latency as a Function of Total Requests & Cluster Size

I'm looking for an empirical way to answer these two questions:

1. If I increase the application workload (read/write requests) by some percentage, how is it going to affect read/write latency? Assume all other factors remain constant, e.g. EC2 instance class, SSD specs, number of nodes, etc.

2. How many nodes do I have to add to maintain a given read/write latency?

Are there any methods or instruments out there that can help answer these questions?



----------------------------------------
Thank you