Posted to user@cassandra.apache.org by Ben Hood <0x...@gmail.com> on 2014/02/05 15:14:23 UTC

CQL flow control

Hi,

A discussion has arisen in the gocql team about how to handle
saturation when CQL clients send requests at a faster rate than the
Cassandra cluster can sustain.

What is the general approach to this from a server perspective? Is
there any flow control that the server can apply to put back pressure
on the sending driver? If so, the driver could be coded to propagate
that back pressure onto the sending app.

If not, how do other driver implementors view this situation? Do you
try to maintain some kind of flow control at the driver level so that
you can push back onto the app, or do you just let the effects of IO
saturation bubble up to the app?

Cheers,

Ben

Re: CQL flow control

Posted by Ben Hood <0x...@gmail.com>.
On Wed, Feb 5, 2014 at 7:32 PM, Edward Capriolo <ed...@gmail.com> wrote:
> I agree you cannot really ask your database to do capacity planning for
> you. Cassandra does have backpressure of sorts, in that requests fail
> with TimedOutException or UnavailableException. You might be having a
> capacity problem.
>
> The way I would handle this is:
> 1) prototype at scale (dark launches, similar hardware loaded with the
> data you expect in production)
> 2) collect stats like 95th percentile response time and request/failure
> counts.
>
> When your 95th percentile starts degrading, that is a good indication
> that it is time to deal with the performance issue.

This is a good point when you're assessing the end-to-end flow control
of a particular app. As I was saying to Rob, I'm looking at what
options a driver can provide, if any.
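
For example, one shape this could take is a cap on in-flight requests,
so that callers block once the window is full and the back pressure is
felt directly by the app. This is only a sketch; the throttledSession
wrapper below is hypothetical, not actual gocql API:

package throttle

import "github.com/gocql/gocql"

// throttledSession is a hypothetical wrapper (not gocql API) that caps
// the number of in-flight queries. Callers block when the window is
// full, which propagates back pressure up to the application.
type throttledSession struct {
	session  *gocql.Session
	inFlight chan struct{} // semaphore: capacity = max in-flight queries
}

func newThrottledSession(s *gocql.Session, maxInFlight int) *throttledSession {
	return &throttledSession{
		session:  s,
		inFlight: make(chan struct{}, maxInFlight),
	}
}

// Exec blocks while maxInFlight queries are already outstanding, so the
// app can only submit work as fast as the cluster drains it.
func (t *throttledSession) Exec(stmt string, values ...interface{}) error {
	t.inFlight <- struct{}{}        // acquire a slot (blocks when full)
	defer func() { <-t.inFlight }() // release the slot
	return t.session.Query(stmt, values...).Exec()
}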

Re: CQL flow control

Posted by Edward Capriolo <ed...@gmail.com>.
I agree you cannot really ask your database to do capacity planning for
you. Cassandra does have backpressure of sorts, in that requests fail
with TimedOutException or UnavailableException. You might be having a
capacity problem.
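
For example, a client could treat those failures as a signal to slow
down. Here is a rough Go sketch against gocql; the error type names
(RequestErrUnavailable, RequestErrWriteTimeout) are assumptions that
may differ between driver versions:

package backoff

import (
	"errors"
	"time"

	"github.com/gocql/gocql"
)

// ExecWithBackoff treats timeout/unavailable errors as a capacity
// signal and backs off exponentially before retrying; anything else
// is returned to the caller unchanged.
func ExecWithBackoff(session *gocql.Session, stmt string, values ...interface{}) error {
	backoff := 50 * time.Millisecond
	for attempt := 0; attempt < 5; attempt++ {
		err := session.Query(stmt, values...).Exec()
		switch err.(type) {
		case nil:
			return nil
		case *gocql.RequestErrUnavailable, *gocql.RequestErrWriteTimeout:
			// The cluster is telling us it cannot keep up: slow down.
			time.Sleep(backoff)
			backoff *= 2
		default:
			return err // not a capacity signal; surface it as-is
		}
	}
	return errors.New("giving up after repeated overload signals")
}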

The way I would handle this is:
1) prototype at scale (dark launches, similar hardware loaded with the
data you expect in production)
2) collect stats like 95th percentile response time and request/failure
counts.

When your 95th percentile starts degrading, that is a good indication
that it is time to deal with the performance issue.
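
Point 2 is cheap to do in the client itself. A minimal sketch of the
measurement in Go (a real deployment would use a proper metrics
library, and this sampler is not thread-safe):

package stats

import (
	"sort"
	"time"
)

// latencySampler records per-request durations so you can watch the
// 95th percentile over time.
type latencySampler struct {
	samples []time.Duration
}

// observe is called with the start time of each request, e.g.:
//
//	start := time.Now()
//	err := session.Query(stmt).Exec()
//	sampler.observe(start)
func (l *latencySampler) observe(start time.Time) {
	l.samples = append(l.samples, time.Since(start))
}

// p95 returns the 95th percentile of the recorded latencies.
func (l *latencySampler) p95() time.Duration {
	if len(l.samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), l.samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	return sorted[int(float64(len(sorted))*0.95)]
}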


On Wed, Feb 5, 2014 at 1:55 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Feb 5, 2014 at 6:14 AM, Ben Hood <0x...@gmail.com> wrote:
>
>> What is the general approach to this from a server perspective? Is
>> there any flow control that the server can apply to put back pressure
>> on the sending driver?
>
>
> No. In theory the client could look at dynamic snitch scores, I suppose,
> if the dynamic snitch worked right...
>
> For most clients, my belief is the only backpressure is that, once a node
> is severely overloaded, it will stop attempting to write hints and return
> an OverloadedException. But this is only on the hint write path, not the
> normal write path.
>
>
>> If not, how do other driver implementors view this situation? Do you
>> try to maintain some kind of flow control at the driver level so that
>> you can push back onto the app, or do you just let the effects of IO
>> saturation bubble up to the app?
>>
>
> I think most deploys of Cassandra deal with this reality by carefully
> managing available capacity so that they don't risk getting into this
> situation.
>
> I understand that is not a technical solution appropriate to your
> question's scope, but I do believe it describes the status quo.
>
> =Rob
>

Re: CQL flow control

Posted by Ben Hood <0x...@gmail.com>.
On Wed, Feb 5, 2014 at 6:55 PM, Robert Coli <rc...@eventbrite.com> wrote:
> I think most deploys of Cassandra deal with this reality by carefully
> managing available capacity so that they don't risk getting into this
> situation.

This is what I have done in my production apps. Basically I found the
system's sweet spot by calibrating the sustainable throughput, and then
used netlink to shape the ingress into the CQL drivers.
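
The same shaping can also be done in-process rather than at the network
layer, e.g. with a token bucket. A sketch using golang.org/x/time/rate,
where sustainableQPS is the figure from the calibration above and the
table and statement are made up:

package shaping

import (
	"context"

	"github.com/gocql/gocql"
	"golang.org/x/time/rate"
)

// RunWriter gates every query behind a token bucket so the process
// never exceeds the calibrated sustainable rate. Wait blocks the
// caller, which is exactly the push-back onto the app discussed here.
func RunWriter(session *gocql.Session, sustainableQPS float64) error {
	limiter := rate.NewLimiter(rate.Limit(sustainableQPS), 1)
	for {
		if err := limiter.Wait(context.Background()); err != nil {
			return err
		}
		// Hypothetical table and statement, purely for illustration.
		err := session.Query(`INSERT INTO kv (k, v) VALUES (?, ?)`, "k", "v").Exec()
		if err != nil {
			return err
		}
	}
}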

That said, I was asking this question from the perspective of
implementing the CQL driver, which needs to take a far more generic
approach. In the end it won't free people from properly assessing
end-to-end flow control in their apps, but I was looking at ways for
the driver to push back more neatly onto the sending app.

Re: CQL flow control

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Feb 5, 2014 at 6:14 AM, Ben Hood <0x...@gmail.com> wrote:

> What is the general approach to this from a server perspective? Is
> there any flow control that the server can apply to put back pressure
> on the sending driver?


No. In theory the client could look at dynamic snitch scores, I suppose, if
the dynamic snitch worked right...

For most clients, my belief is the only backpressure is that, once a node
is severely overloaded, it will stop attempting to write hints and return
an OverloadedException. But this is only on the hint write path, not the
normal write path.


> If not, how do other driver implementors view this situation? Do you
> try to maintain some kind of flow control at the driver level so that
> you can push back onto the app, or do you just let the effects of IO
> saturation bubble up to the app?
>

I think most deploys of Cassandra deal with this reality by carefully
managing available capacity so that they don't risk getting into this
situation.

I understand that is not a technical solution appropriate to your
question's scope, but I do believe it describes the status quo.

=Rob