Posted to user@cassandra.apache.org by Robert Wille <rw...@fold3.com> on 2014/11/25 16:57:01 UTC

Rule of thumb for concurrent asynchronous queries?

Suppose I have the primary keys for 10,000 rows and I want them all. Is there a rule of thumb for the maximum number of concurrent asynchronous queries I should execute?

Re: Rule of thumb for concurrent asynchronous queries?

Posted by Nikolai Grigoriev <ng...@gmail.com>.
I think it all depends on how many machines will be involved in the query
(read consistency is also a factor) and how large a typical response is in
bytes. Large responses will put more pressure on the GC, which will result
in more time spent in GC and possibly long(er) GC pauses.

Cassandra can tolerate many things - but at a cost to other queries and,
ultimately, to the health of the individual node.

From the original question it is not clear whether all these rows come
from the same node or a few nodes (one token range), or whether these are
really 10K independent primary keys spread more or less evenly across the
cluster.

Also the node disk I/O may be a concern - especially if the data is not in
OS cache (or row cache if applicable).

I think it is a tough question to answer precisely. If I had such a
problem I would first try to determine the peak speed I can achieve, i.e.
find the limiting factor (CPU or disk I/O most likely), then fire as many
requests from as many threads as is practical for the client app. Measure
the load to prove that you've determined the limiting factor correctly
(either CPU or I/O, I doubt it will be network). Then measure the latency
and decide what kind of latency you can tolerate for your use case. And
then back off from the peak load you've created by some factor (i.e. limit
yourself to XX% of the peak load you have achieved).
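
The measure-then-back-off procedure above could be sketched roughly as
follows. This is only an illustration in Python's asyncio, not code for
any particular Cassandra driver: fake_query is a hypothetical stand-in
for a real asynchronous read, and the concurrency levels and the 70%
factor are arbitrary assumptions.

```python
import asyncio
import time


async def measure(query, keys, max_in_flight):
    """Return (requests per second, worst latency in seconds) at one level."""
    gate = asyncio.Semaphore(max_in_flight)
    latencies = []

    async def one(key):
        async with gate:  # at most max_in_flight queries outstanding
            start = time.perf_counter()
            await query(key)
            latencies.append(time.perf_counter() - start)

    begin = time.perf_counter()
    await asyncio.gather(*(one(k) for k in keys))
    elapsed = time.perf_counter() - begin
    return len(latencies) / elapsed, max(latencies)


def budget_from_peak(rates, fraction):
    """Given {concurrency: requests/sec}, return (peak level, target rate)."""
    peak_level = max(rates, key=rates.get)
    return peak_level, rates[peak_level] * fraction


async def fake_query(key):
    # Hypothetical stand-in: replace with a real asynchronous read.
    await asyncio.sleep(0.002)


async def sweep():
    rates = {}
    for level in (1, 8, 64):  # assumed levels, pick your own
        rate, worst = await measure(fake_query, range(100), level)
        rates[level] = rate
        print(f"{level:>3} in flight: {rate:8.0f} req/s, worst {worst * 1000:.1f} ms")
    return rates


rates = asyncio.run(sweep())
peak_level, target_rate = budget_from_peak(rates, fraction=0.7)
print(f"peak at {peak_level} in flight, target about {target_rate:.0f} req/s")
```

Against a real cluster the numbers only mean something if you also watch
CPU and disk I/O on the nodes while the sweep runs, as described above.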

On Tue, Nov 25, 2014 at 11:34 AM, Jack Krupansky <ja...@basetechnology.com>
wrote:

> Great question. The safe answer is to do a proof of concept implementation
> and try various rates to determine where the bottleneck is. It will also
> depend on the row size. Hard to say if you will be limited by the cluster
> load or network bandwidth.
>
> Is there only one client talking to your cluster? Or are you asking what
> each of, say, one million clients can be simultaneously requesting?
>
> The rate of requests will matter as well, particularly if the cluster has
> a non-trivial load.
>
> My ultimate rule of thumb is simple: Moderation. Not too many threads, not
> too frequent request rate.
>
> It would be nice if we had a way to calculate this number (both numbers)
> for you so that a client (driver) could ping for it from the cluster, as
> well as for the cluster to return a suggested wait interval before sending
> another request based on actual load.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Robert Wille
> Sent: Tuesday, November 25, 2014 10:57 AM
> To: user@cassandra.apache.org
> Subject: Rule of thumb for concurrent asynchronous queries?
>
> Suppose I have the primary keys for 10,000 rows and I want them all. Is
> there a rule of thumb for the maximum number of concurrent asynchronous
> queries I should execute?
>



-- 
Nikolai Grigoriev
(514) 772-5178

Re: Rule of thumb for concurrent asynchronous queries?

Posted by Jack Krupansky <ja...@basetechnology.com>.
Great question. The safe answer is to do a proof of concept implementation 
and try various rates to determine where the bottleneck is. It will also 
depend on the row size. Hard to say if you will be limited by the cluster 
load or network bandwidth.

Is there only one client talking to your cluster? Or are you asking what 
each of, say, one million clients can be simultaneously requesting?

The rate of requests will matter as well, particularly if the cluster has a 
non-trivial load.

My ultimate rule of thumb is simple: Moderation. Not too many threads, not 
too frequent request rate.
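
One way to apply "moderation" on the client is to cap the number of
requests in flight. The following is a minimal sketch using Python's
asyncio, under stated assumptions: fake_query is a hypothetical stand-in
for a real asynchronous driver call, and the cap of 64 is an arbitrary
starting point to be tuned by measurement, not a recommendation.

```python
import asyncio


async def fetch_all(keys, query, max_in_flight):
    """Run query(key) for every key with at most max_in_flight outstanding."""
    gate = asyncio.Semaphore(max_in_flight)

    async def one(key):
        async with gate:  # wait for a free slot before issuing the query
            return await query(key)

    # gather returns results in the same order as the input keys
    return await asyncio.gather(*(one(k) for k in keys))


# Counters used only to show that the cap is respected.
in_flight = 0
peak_in_flight = 0


async def fake_query(key):
    # Hypothetical stand-in: replace with a real asynchronous read.
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.001)
    in_flight -= 1
    return {"id": key}


rows = asyncio.run(fetch_all(range(10_000), fake_query, max_in_flight=64))
print(len(rows), "rows fetched, peak in flight:", peak_in_flight)
```

The semaphore bounds concurrency only; limiting the request rate as well
would need a separate delay or token-bucket mechanism.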

It would be nice if we had a way to calculate this number (both numbers) for 
you so that a client (driver) could ping for it from the cluster, as well as 
for the cluster to return a suggested wait interval before sending another 
request based on actual load.

-- Jack Krupansky

-----Original Message----- 
From: Robert Wille
Sent: Tuesday, November 25, 2014 10:57 AM
To: user@cassandra.apache.org
Subject: Rule of thumb for concurrent asynchronous queries?

Suppose I have the primary keys for 10,000 rows and I want them all. Is 
there a rule of thumb for the maximum number of concurrent asynchronous 
queries I should execute?