Posted to commits@cassandra.apache.org by "Wei Deng (JIRA)" <ji...@apache.org> on 2016/03/17 05:48:33 UTC

[jira] [Created] (CASSANDRA-11366) Compaction backlog triggered backpressure to write operations

Wei Deng created CASSANDRA-11366:
------------------------------------

             Summary: Compaction backlog triggered backpressure to write operations
                 Key: CASSANDRA-11366
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11366
             Project: Cassandra
          Issue Type: Improvement
          Components: Local Write-Read Paths
            Reporter: Wei Deng


A general perception about Cassandra is that it is extremely efficient at taking writes and can handle very high write throughput, thanks to its simple and efficient commitlog/memtable write path. However, one aspect often overlooked by new Cassandra users is what happens when they ingest data at the highest throughput possible, trying to push the hardware's limit: as soon as compaction kicks in, a seemingly harmless write workload can generate far more I/O and CPU consumption than the hardware can accommodate. As a result, memtable flushes take much longer to finish due to I/O contention from compaction, thousands of SSTables pile up on the data disk because compaction cannot process them all in time, and GC pauses become more frequent and more impactful because reads have to touch far more SSTables than in the ideal situation. Depending on the compaction strategy the table uses, this overwhelmed, backlogged state can take a long time to clear (for LeveledCompactionStrategy, or LCS, it could be hours) even after the write workload is removed.
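
For context, the size of this backlog is directly observable through Cassandra's JMX metrics. A minimal probe along the following lines could be used to watch it (the host, port, and class name here are illustrative; the Compaction PendingTasks gauge itself is a standard Cassandra metric):

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CompactionBacklogProbe {
    public static void main(String[] args) throws Exception {
        // Cassandra exposes JMX on port 7199 by default; the host is an assumption.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Gauge tracking compaction tasks queued behind the compaction executor.
            ObjectName pendingTasks = new ObjectName(
                    "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks");
            Number backlog = (Number) mbs.getAttribute(pendingTasks, "Value");
            System.out.println("Pending compactions: " + backlog);
        }
    }
}
{code}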

Writes currently have a back pressure mechanism called load shedding: if the MutationStage gets too overwhelmed and a mutation takes more than 2 seconds to be acknowledged, the mutation message is dropped and a hint is written by StorageProxy. However, this mechanism does not take compaction backlog into consideration. As a result, compaction can fall far behind under a heavy write workload, accumulating more and more pending compactions and unprocessed SSTables, while writes keep flowing at the same rate because write timeouts have not yet been triggered. For LCS and DTCS this can get out of control and become hard to recover from. As a practical workaround, you can monitor the number of pending compactions and reduce the write throughput whenever that number trends upward; this has proved effective. However, these two activities are usually handled by different teams in an enterprise, so they are not always easy to coordinate. A back pressure mechanism that automatically slows the intake of writes based on how far behind compaction is would make a DBA's life easier.
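
As a sketch of that client-side workaround (the watermarks, rates, and the AdaptiveWriteThrottle class are all made-up assumptions, not an existing Cassandra API; the rate limiter is Guava's RateLimiter), the feedback loop could look roughly like this, fed by the PendingTasks gauge from the probe above:

{code:java}
import com.google.common.util.concurrent.RateLimiter;

public class AdaptiveWriteThrottle {
    // Hypothetical thresholds; real values depend on hardware and compaction strategy.
    private static final int HIGH_WATERMARK = 100;   // pending compactions: back off
    private static final int LOW_WATERMARK  = 20;    // pending compactions: recover
    private static final double MAX_RATE = 50_000;   // writes/sec, assumed baseline
    private static final double MIN_RATE = 1_000;    // writes/sec, assumed floor

    private final RateLimiter limiter = RateLimiter.create(MAX_RATE);

    // Called periodically with the PendingTasks value read over JMX.
    public void adjust(int pendingCompactions) {
        double rate = limiter.getRate();
        if (pendingCompactions > HIGH_WATERMARK) {
            limiter.setRate(Math.max(MIN_RATE, rate * 0.5));  // halve throughput
        } else if (pendingCompactions < LOW_WATERMARK) {
            limiter.setRate(Math.min(MAX_RATE, rate * 1.2));  // ramp back up
        }
    }

    // Every client write acquires a permit first, so ingest slows when compaction lags.
    public void beforeWrite() {
        limiter.acquire();
    }
}
{code}

A server-side hook driven by the same gauge, slowing the intake on the write path instead of in the client, is essentially what this ticket asks for.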





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)