Posted to commits@cassandra.apache.org by "Jason Brown (JIRA)" <ji...@apache.org> on 2017/01/03 22:26:58 UTC

[jira] [Commented] (CASSANDRA-13039) Mutation time mostly spent in LinkedBlockingQueue.put() when writing with ONE

    [ https://issues.apache.org/jira/browse/CASSANDRA-13039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796392#comment-15796392 ] 

Jason Brown commented on CASSANDRA-13039:
-----------------------------------------

[~iksaif] For background, we used to do one-off {{take()}} calls on the LBQ, but with CASSANDRA-1632 we switched to grabbing elements in bulk (via {{#drainTo()}}). That change predates the introduction of coalescing.
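
To make the difference between the two consumption styles concrete, here is a minimal sketch. It is not the Cassandra code: the class name {{BacklogDrainSketch}}, the helper names, and the {{maxBatch}} parameter are hypothetical, and only the {{java.util.concurrent}} calls are real.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: names and parameters are assumptions, not Cassandra's.
final class BacklogDrainSketch {

    // One-off style: each call blocks until a single element is available.
    static <T> T takeOne(BlockingQueue<T> backlog) throws InterruptedException {
        return backlog.take();
    }

    // Bulk style: block for the first element, then grab whatever else is
    // already queued (up to maxBatch elements in total) with one drainTo() call.
    static <T> List<T> takeBatch(BlockingQueue<T> backlog, int maxBatch)
            throws InterruptedException {
        List<T> batch = new ArrayList<>(maxBatch);
        batch.add(backlog.take());
        backlog.drainTo(batch, maxBatch - 1);
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> backlog = new LinkedBlockingQueue<>();
        for (int i = 0; i < 5; i++) {
            backlog.put("msg-" + i);
        }
        System.out.println(takeOne(backlog));      // msg-0
        System.out.println(takeBatch(backlog, 3)); // [msg-1, msg-2, msg-3]
        System.out.println(backlog.size());        // 1
    }
}
```

With the bulk style the consumer goes through the queue's take-side lock once per batch instead of once per message.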

> Mutation time mostly spent in LinkedBlockingQueue.put() when writing with ONE
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13039
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13039
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Corentin Chary
>         Attachments: mutation-linkedlist-block.png, profiler-snapshot.nps
>
>
> On a setup with a sustained write load of 70kQPS per node and an RF of 2, it looks like most of the mutation time is spent in OutboundTcpConnection.enqueue() -> backlog.put().
> backlog is an unbounded LinkedBlockingQueue, which means that .put() can only block while waiting for a lock. I strongly suspect that this is caused by the use of drainTo() in CoalescingStrategies, which causes contention for the producers.
> On the other hand, not using drainTo() could lead to starvation of the consumers.
> Possible solutions:
> - Allow multiple connections per size and per host in OutboundTcpConnectionPool
> - Switch from drainTo() to multiple take() calls
> - Switch to ConcurrentLinkedQueue (which is lock-free); this also means we need active polling.
> Maybe a good solution would be something hybrid: a bounded LinkedBlockingQueue and an unbounded ConcurrentLinkedQueue. This way you get low latency when you don't have a lot of messages, and throughput when you do.
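
As a rough illustration of the ConcurrentLinkedQueue option and the active polling it implies, here is a minimal sketch. It is an assumption-laden example, not a proposed patch: the class {{PollingBacklogSketch}}, the method {{pollBatch}}, and the {{maxBatch}}, {{parkNanos}} and {{maxIdleSpins}} parameters are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;

// Illustrative only: names and tuning parameters are assumptions.
final class PollingBacklogSketch {

    // Collects up to maxBatch elements from a non-blocking queue.
    // The queue has no blocking take(), so an idle consumer has to poll:
    // it parks briefly between attempts and gives up after maxIdleSpins
    // consecutive empty polls, returning an empty batch.
    static <T> List<T> pollBatch(Queue<T> backlog, int maxBatch,
                                 long parkNanos, int maxIdleSpins) {
        List<T> batch = new ArrayList<>();
        int idle = 0;
        while (batch.size() < maxBatch) {
            T msg = backlog.poll(); // returns null when the queue is empty
            if (msg != null) {
                batch.add(msg);
                idle = 0;
                continue;
            }
            // Queue is empty: return what we have, or stop once idle too long.
            if (!batch.isEmpty() || ++idle > maxIdleSpins) {
                break;
            }
            LockSupport.parkNanos(parkNanos);
        }
        return batch;
    }

    public static void main(String[] args) {
        Queue<String> backlog = new ConcurrentLinkedQueue<>();
        backlog.offer("a");
        backlog.offer("b");
        backlog.offer("c");
        System.out.println(pollBatch(backlog, 2, 1_000L, 3)); // [a, b]
        System.out.println(pollBatch(backlog, 2, 1_000L, 3)); // [c]
        System.out.println(pollBatch(backlog, 2, 1_000L, 3)); // []
    }
}
```

Producers never block in {{offer()}}, but the cost moves to the consumer: the park interval trades wake-up latency against CPU spent polling an empty queue.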


