Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2010/06/30 03:26:54 UTC

[jira] Commented: (CASSANDRA-685) add backpressure to StorageProxy

    [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883760#action_12883760 ] 

Jonathan Ellis commented on CASSANDRA-685:
------------------------------------------

Following the line of reasoning from my comment on Jan 22, I think the best thing to do is to keep what we're doing now -- allowing TimedoutExceptions to serve as flow control -- but handle overload situations better. Right now there is the potential for a vicious cycle of getting farther and farther behind while the RMS/RRS executors waste time processing requests that the coordinator node has long since stopped waiting for. So:

- uncap the RMS and RRS executors.  Instead,
- MessageDeserializer will check recent RMS/RRS throughput and will simply discard requests that won't make it through the task queue within RPCTimeout (this prevents memory pressure from a huge task queue backlog; I have seen upwards of 1.5M pending tasks on MD)
- MD will tag requests with a timestamp as they arrive, and RMS/RRS will again discard requests that have spent longer than RPCTimeout in the task queue
- log replay will have to self-throttle, since the RMS queue won't be doing that for it (it would be nice to deal with this by adjusting the queue size, but concurrent queue sizes are fixed once created)
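
A rough sketch of the two discard checks described above, to make the idea concrete. This is not code from the Cassandra tree: BackpressureSketch, shouldAdmit, TimestampedTask, and the rpcTimeoutMillis parameter are hypothetical names, and the throughput estimate (pending tasks divided by recent tasks per millisecond) is an assumed way of deciding whether a request "won't make it through the task queue within RPCTimeout".

```java
import java.util.function.LongSupplier;

// Hypothetical sketch; none of these names exist in Cassandra.
class BackpressureSketch {

    /**
     * Admission check (the MessageDeserializer step): estimate how long a new
     * request would wait in the stage queue, given recent throughput, and admit
     * it only if that wait fits within the RPC timeout.
     * Assumption: with no throughput measurement yet, the request is admitted.
     */
    public static boolean shouldAdmit(long pendingTasks, double recentTasksPerMilli, long rpcTimeoutMillis) {
        if (recentTasksPerMilli <= 0) {
            return true; // nothing measured yet, so no basis for discarding
        }
        double estimatedWaitMillis = pendingTasks / recentTasksPerMilli;
        return estimatedWaitMillis <= rpcTimeoutMillis;
    }

    /**
     * Dequeue check (the RMS/RRS step): the task records its arrival time when
     * it is created and is skipped if it sat in the queue longer than the timeout.
     */
    public static final class TimestampedTask implements Runnable {
        private final Runnable delegate;
        private final long rpcTimeoutMillis;
        private final LongSupplier clockMillis;
        private final long arrivalMillis;

        public TimestampedTask(Runnable delegate, long rpcTimeoutMillis, LongSupplier clockMillis) {
            this.delegate = delegate;
            this.rpcTimeoutMillis = rpcTimeoutMillis;
            this.clockMillis = clockMillis;
            this.arrivalMillis = clockMillis.getAsLong(); // tag on arrival
        }

        public boolean isExpired() {
            return clockMillis.getAsLong() - arrivalMillis > rpcTimeoutMillis;
        }

        @Override
        public void run() {
            if (isExpired()) {
                return; // the coordinator has already given up on this request
            }
            delegate.run();
        }
    }

    public static void main(String[] args) {
        long rpcTimeoutMillis = 10_000;
        // A 1.5M-task backlog draining at 1 task/ms would take about 25 minutes.
        System.out.println("admit with huge backlog? " + shouldAdmit(1_500_000, 1.0, rpcTimeoutMillis));
        Runnable work = () -> System.out.println("processed");
        new TimestampedTask(work, rpcTimeoutMillis, System::currentTimeMillis).run();
    }
}
```

The clock is passed in as a LongSupplier only so the expiry logic can be exercised without sleeping; in a real stage it would be the wall clock.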

> add backpressure to StorageProxy
> --------------------------------
>
>                 Key: CASSANDRA-685
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-685
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: 0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt, 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
>
>
> Now that we have CASSANDRA-401 and CASSANDRA-488 there is one last piece: we need to stop the target node from pulling mutations out of MessagingService as fast as it can only to take up space in the mutation queue and eventually fill up memory.
