
Posted to commits@cassandra.apache.org by "Sumanth Pasupuleti (JIRA)" <ji...@apache.org> on 2019/04/01 06:06:00 UTC

[jira] [Commented] (CASSANDRA-15049) Requests blocked at NTR stage should be rejected

    [ https://issues.apache.org/jira/browse/CASSANDRA-15049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806417#comment-16806417 ] 

Sumanth Pasupuleti commented on CASSANDRA-15049:
------------------------------------------------

FYI, I have submitted a patch on CASSANDRA-15013.

> Requests blocked at NTR stage should be rejected
> ------------------------------------------------
>
>                 Key: CASSANDRA-15049
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15049
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Michaël Figuière
>            Priority: Normal
>
> CASSANDRA-11363 showed that if the NTR stage's thread pool and queue are both full, the Netty Event Loops may block waiting on the NTR queue. The fix adopted in CASSANDRA-11363 was to increase the default queue size from 128 to 1024. This significantly reduced the number of blocked requests observed but did not eliminate the problem. Whenever a Netty Event Loop is blocked, Cassandra's responsiveness is significantly impacted, so it seems inappropriate to rely solely on increasing this queue size until everything looks fine, which only holds for the workload at the time the tuning was done.
> In fact, this situation looks exactly like the definition of the {{Overloaded}} error of the CQL Protocol:
> {code:java}
> 0x1001 Overloaded: the request cannot be processed because the
> 	coordinator node is overloaded{code}
> Therefore, whenever a request can't make it to the NTR stage, it should be rejected with an {{Overloaded}} error sent to the client. This can be done at low cost since we're already in the Netty Event Loop that owns the channel to that client.
> It would then be the client's responsibility to retry with another coordinator, which is likely to yield a better P99 latency than blocking on an already overlong queue.
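
For illustration only, the reject-instead-of-block idea in the description above might be sketched as follows. This is not the patch submitted on CASSANDRA-15013 and it does not use Cassandra's or Netty's actual classes: the class name, newStage, trySubmit and blockedOn are hypothetical, and a plain java.util.concurrent.ThreadPoolExecutor with a bounded queue stands in for the NTR stage. The only value taken from the ticket is the 0x1001 Overloaded error code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a bounded executor stands in for the NTR stage.
class OverloadedRejectionSketch {
    // Error code quoted in the ticket for the CQL protocol's Overloaded error.
    static final int OVERLOADED = 0x1001;
    // Assumed placeholder meaning "request accepted by the stage".
    static final int ACCEPTED = 0;

    // Fixed-size pool with a bounded queue; AbortPolicy makes execute()
    // throw instead of blocking the caller when pool and queue are full.
    static ThreadPoolExecutor newStage(int threads, int queueSize) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),
                new ThreadPoolExecutor.AbortPolicy());
    }

    // Called from the (simulated) event loop. Never blocks: a full stage
    // is reported to the caller as OVERLOADED so the client can retry elsewhere.
    static int trySubmit(ThreadPoolExecutor stage, Runnable request) {
        try {
            stage.execute(request);
            return ACCEPTED;
        } catch (RejectedExecutionException e) {
            return OVERLOADED;
        }
    }

    // A request that stays busy until the latch is released.
    static Runnable blockedOn(CountDownLatch latch) {
        return () -> {
            try {
                latch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        ThreadPoolExecutor stage = newStage(1, 1);
        CountDownLatch release = new CountDownLatch(1);

        int first = trySubmit(stage, blockedOn(release));  // runs on the single worker
        int second = trySubmit(stage, blockedOn(release)); // fills the single queue slot
        int third = trySubmit(stage, blockedOn(release));  // pool and queue full: rejected

        System.out.printf("0x%x 0x%x 0x%x%n", first, second, third);

        release.countDown();
        stage.shutdown();
    }
}
```

In this sketch the submitting thread learns immediately that the stage is saturated and can answer the client itself, rather than waiting for a queue slot. How the real server would build and write the error response on the Netty channel is outside what this example covers.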



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
