
Posted to commits@cassandra.apache.org by "Corentin Chary (JIRA)" <ji...@apache.org> on 2016/12/13 09:56:58 UTC

[jira] [Created] (CASSANDRA-13039) Mutation time mostly spent in LinkedBlockingQueue.put()

Corentin Chary created CASSANDRA-13039:
------------------------------------------

             Summary: Mutation time mostly spent in LinkedBlockingQueue.put()
                 Key: CASSANDRA-13039
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13039
             Project: Cassandra
          Issue Type: Bug
          Components: Coordination
            Reporter: Corentin Chary
         Attachments: mutation-linkedlist-block.png, profiler-snapshot.nps

On a setup with a sustained write load of 70k QPS per node and an RF of 2, it looks like most of the mutation time is spent in OutboundTcpConnection.enqueue() -> backlog.put().

backlog is an unbounded LinkedBlockingQueue, which means that .put() can only block while waiting to acquire a lock. I strongly suspect that this is caused by the use of drainTo() in CoalescingStrategies, which creates contention for the producers.
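
For illustration, the access pattern described above can be sketched as follows. This is a minimal stand-alone example and not Cassandra's code: the class name, the batch size of 128 and the thread counts are assumptions. It only shows several producers calling put() on an unbounded LinkedBlockingQueue while a single consumer removes messages in batches with drainTo(); it does not measure contention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: many producers put() into one unbounded backlog,
// one consumer drains it in batches.
final class BacklogDrainSketch {
    // Assumed batch size, chosen only for the example.
    static final int MAX_BATCH = 128;

    static int produceAndDrain(int producers, int perProducer) {
        BlockingQueue<Integer> backlog = new LinkedBlockingQueue<>(); // unbounded
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            Thread t = new Thread(() -> {
                try {
                    for (int i = 0; i < perProducer; i++) {
                        backlog.put(i); // never waits for space, only for the lock
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }

        int total = producers * perProducer;
        int drained = 0;
        List<Integer> batch = new ArrayList<>(MAX_BATCH);
        try {
            while (drained < total) {
                batch.add(backlog.take());             // block until one message exists
                backlog.drainTo(batch, MAX_BATCH - 1); // then grab whatever else is queued
                drained += batch.size();
                batch.clear();
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return drained;
    }

    public static void main(String[] args) {
        System.out.println("drained " + produceAndDrain(4, 100_000) + " messages");
    }
}
```

Running this under a profiler with different producer counts is one way to check where put() spends its time on a given JVM.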

On the other hand, not using drainTo() could lead to starvation of the consumers.

Possible solutions:
- Allow multiple connections per size and per host in OutboundTcpConnectionPool
- Switch from drainTo() to multiple take() calls
- Switch to ConcurrentLinkedQueue (which is lock-free); this also means we need active polling.
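
To make the last option concrete, here is a rough sketch of what active polling on a ConcurrentLinkedQueue could look like. The class name, the spin limit of 100 and the 50 microsecond park interval are assumptions made for the example, not values from Cassandra. Since ConcurrentLinkedQueue has no blocking take(), the consumer has to poll() and back off on its own when the queue is empty.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch: a consumer that actively polls a lock-free queue.
final class PollingConsumerSketch {
    // Assumed tuning values, chosen only for the example.
    static final int SPIN_LIMIT = 100;
    static final long PARK_NANOS = 50_000L;

    // Consumes until `expected` messages have been seen and returns that count.
    static int consume(ConcurrentLinkedQueue<String> queue, int expected) {
        int seen = 0;
        int idleSpins = 0;
        while (seen < expected) {
            String msg = queue.poll(); // non-blocking; null when the queue is empty
            if (msg != null) {
                seen++;
                idleSpins = 0;
            } else if (++idleSpins < SPIN_LIMIT) {
                Thread.yield(); // short busy wait keeps latency low
            } else {
                LockSupport.parkNanos(PARK_NANOS); // back off to avoid burning a core
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        Thread producer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                queue.add("msg-" + i); // add() never blocks on this queue
            }
        });
        producer.start();
        System.out.println("consumed " + consume(queue, 100_000) + " messages");
    }
}
```

The trade-off is visible in the loop: producers never wait, but an idle consumer either spins or adds up to one park interval of latency.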

Maybe a good solution would be a hybrid: a bounded LinkedBlockingQueue combined with an unbounded ConcurrentLinkedQueue. This way you get low latency when you don't have a lot of messages, and throughput when you do.
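
One possible reading of the hybrid idea, sketched under assumptions: producers first try a non-blocking offer() on a small bounded LinkedBlockingQueue, and fall back to an unbounded ConcurrentLinkedQueue when it is full. The class name, method names and timeout handling are hypothetical, and this sketch does not preserve message ordering across the two queues when producers and the consumer race.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a hybrid backlog: bounded blocking queue for the
// low-load path, unbounded lock-free queue for overflow under high load.
final class HybridBacklogSketch<T> {
    private final LinkedBlockingQueue<T> fast;
    private final ConcurrentLinkedQueue<T> overflow = new ConcurrentLinkedQueue<>();

    HybridBacklogSketch(int fastCapacity) {
        this.fast = new LinkedBlockingQueue<>(fastCapacity);
    }

    // Producers never wait for space: offer() fails fast when the bounded queue is full.
    void enqueue(T msg) {
        if (!fast.offer(msg)) {
            overflow.add(msg);
        }
    }

    // Moves everything currently queued into `out`. If nothing is queued, waits
    // up to timeoutMillis for one message on the bounded queue.
    // Returns the number of messages added to `out`.
    int drainTo(List<T> out, long timeoutMillis) {
        int before = out.size();
        fast.drainTo(out);
        T msg;
        while ((msg = overflow.poll()) != null) {
            out.add(msg);
        }
        if (out.size() == before) {
            try {
                // The overflow queue is only used while `fast` is full, so a
                // blocked consumer is always woken by the bounded queue.
                T first = fast.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                if (first != null) {
                    out.add(first);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return out.size() - before;
    }

    public static void main(String[] args) {
        HybridBacklogSketch<String> backlog = new HybridBacklogSketch<>(2);
        for (int i = 0; i < 5; i++) {
            backlog.enqueue("msg-" + i);
        }
        List<String> out = new ArrayList<>();
        int n = backlog.drainTo(out, 10L);
        System.out.println(n + " messages: " + out);
    }
}
```

With few messages the consumer blocks on the bounded queue and wakes as soon as one arrives; under load the overflow queue absorbs bursts without producers waiting for capacity.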


