Posted to commits@cassandra.apache.org by "Jeff Jirsa (JIRA)" <ji...@apache.org> on 2016/03/17 06:46:33 UTC

[jira] [Commented] (CASSANDRA-11366) Compaction backlog triggered backpressure to write operations

    [ https://issues.apache.org/jira/browse/CASSANDRA-11366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198786#comment-15198786 ] 

Jeff Jirsa commented on CASSANDRA-11366:
----------------------------------------

There exist other times when pending compactions will increase (notably bootstrapping / repair, where streaming - especially in a vnode situation - will create dozens/hundreds/thousands of sstables). If you apply backpressure and drop inbound mutations in these situations, you'll end up needing to repair, which will again create tons of sstables, which will create more backpressure, which could start a vicious cycle.

As an operator, I would strongly prefer the existing behavior to having one more source of backpressure to try to reason about - the right answer is for operations teams to monitor pending compactions and sstable counts and adjust accordingly (especially since adjusting may mean tuning concurrent compactors or compaction throughput, not throttling writes).
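
A minimal sketch of that monitoring loop, in Python, shelling out to the real nodetool commands (compactionstats and setcompactionthroughput); the threshold, the poll interval, and the choice to simply uncap compaction throughput are illustrative assumptions, not anything prescribed here:

    #!/usr/bin/env python
    # Hypothetical operator-side loop: watch the compaction backlog and
    # loosen the compaction throughput cap when it grows too large.
    import re
    import subprocess
    import time

    PENDING_ALERT = 100  # assumed backlog threshold; tune per cluster

    def pending_compactions():
        # `nodetool compactionstats` prints a line like "pending tasks: 5"
        out = subprocess.check_output(["nodetool", "compactionstats"])
        match = re.search(r"pending tasks:\s*(\d+)", out.decode())
        return int(match.group(1)) if match else 0

    while True:
        if pending_compactions() > PENDING_ALERT:
            # 0 removes the throughput cap so compaction can catch up
            subprocess.check_call(["nodetool", "setcompactionthroughput", "0"])
        time.sleep(60)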
 

> Compaction backlog triggered backpressure to write operations
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-11366
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11366
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>            Reporter: Wei Deng
>
> A general perception about Cassandra is that it is super efficient at taking writes and can handle very high write throughput, thanks to its simple and efficient commitlog/memtable write path. However, one aspect often overlooked by new Cassandra users is that when they ingest a lot of data at the highest throughput possible, trying to push the hardware's limit, as soon as compaction starts to kick in, a seemingly harmless write workload can create far more I/O and CPU consumption than their hardware can accommodate. As a result, memtable flushes take much longer to finish due to I/O contention from compaction, thousands of SSTables start to show up on the data disk because compaction cannot process them all in time, and GC pauses become more frequent and impactful because reads have to touch far more SSTables than in the ideal situation. Depending on the compaction strategy a Cassandra table chooses, this kind of overwhelmed, backlogged situation can take a long time to clear up (for LeveledCompactionStrategy, or LCS, it could be hours) even after the write workload is removed.
> Currently the write path has a back pressure mechanism called load shedding: if the MutationStage gets too overwhelmed and a mutation takes more than 2 seconds to get acknowledged, the mutation message is dropped and a hint is added by StorageProxy. However, this mechanism does not take compaction backlogs into consideration. As a result, compaction can fall far behind under a heavy write workload and continue to accumulate more and more pending compactions and unprocessed SSTables, while writes keep flowing at the same rate because write timeouts have not been triggered yet. For LCS and DTCS this can get out of control and become hard to recover from. As a practical workaround, you can monitor the number of pending compactions and, if it starts trending upward, reduce the write throughput. This has proven effective. However, these two types of activities are usually conducted by different teams in an enterprise, so they are not always easy to coordinate. If there were some back pressure mechanism (to slow down the intake of writes) that could be automatically triggered based on how far compaction is behind, it would make a DBA's life easier.
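
For illustration, a minimal Python sketch of what such compaction-aware throttling could look like on the client side, assuming the writer can read the pending-compactions count (e.g. via the JMX metric org.apache.cassandra.metrics:type=Compaction,name=PendingTasks); the water marks and the linear back-off are invented for the example:

    MAX_WRITES_PER_SEC = 50000       # assumed uncontended ingest rate
    LOW_WATER, HIGH_WATER = 50, 500  # assumed backlog bounds

    def allowed_write_rate(pending):
        # Full speed below LOW_WATER, linear back-off up to HIGH_WATER,
        # then a 10% floor so ingest slows down but never stops.
        if pending <= LOW_WATER:
            return MAX_WRITES_PER_SEC
        if pending >= HIGH_WATER:
            return MAX_WRITES_PER_SEC // 10
        frac = (pending - LOW_WATER) / float(HIGH_WATER - LOW_WATER)
        return int(MAX_WRITES_PER_SEC * (1.0 - 0.9 * frac))

A real throttle would feed this rate into a token bucket in front of the ingest clients - the coordination between teams that the ticket wants to automate.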



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)