Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2014/12/17 00:05:13 UTC

[jira] [Resolved] (CASSANDRA-7552) Compactions Pending build up when using LCS

     [ https://issues.apache.org/jira/browse/CASSANDRA-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-7552.
---------------------------------------
    Resolution: Not a Problem
      Assignee:     (was: Yuki Morishita)

"LCS falls behind" is an expected condition under heavy write load.  (Even STCS can fall behind, but it will recover faster.)

> Compactions Pending build up when using LCS
> -------------------------------------------
>
>                 Key: CASSANDRA-7552
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7552
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Darla Baker
>
> We seem to be hitting an issue with LeveledCompactionStrategy while running performance tests on a 4-node Cassandra installation. We are currently using Cassandra 2.0.7.
> In summary, we run a test consisting of approximately 8,000 inserts/sec, 16,000 gets/sec, and 8,000 deletes/sec. We have a grace period of 12 hours on our column families.
> At this rate, we observe a stable pending compaction task count for about 22 to 26 hours. After that period, something happens and the pending compaction tasks start to increase rapidly, sometimes on one or two servers, but sometimes on all four of them. This goes on until the uncompacted SSTables start consuming all the disk space, after which the Cassandra cluster generally fails.
> When this occurs, the rate of completed compaction tasks usually decreases over time, which seems to indicate that it takes more and more time to run the existing compaction tasks.
> On different occasions, I have been able to reproduce a similar issue in less than 12 hours. While the traffic rate remains constant, we seem to be hitting this at various intervals. Yesterday I could reproduce it in less than 6 hours.
> We have two different deployments on which we have tested this issue: 
> 1. 4x IBM HS22, using a RAM disk as the Cassandra data directory (thus eliminating disk I/O) 
> 2. 8x IBM HS23, with SSD disks, deployed in two "geo-redundant" data centers of 4 nodes each, and a latency of 50ms between the data centers.
> I can reproduce the "compaction tasks falling behind" on both of these setups, although it could be occurring for different reasons. Because of #1, I do not believe we are hitting an I/O bottleneck just yet.
> As an additional interesting note, if I artificially pause the traffic when I see the pending compaction task issue occurring, then: 
> 1. The pending compaction tasks obviously stop increasing, but they stay at the same number for 15 minutes (as if nothing is running). 
> 2. The completed compaction task rate falls to 0 for 15 minutes. 
> 3. After 15 to 20 minutes, out of the blue, all compactions complete in less than 2 minutes.
> If I restart the traffic after that, the system is stable for a few hours, but the issue always comes back.
> We have written a small test tool that reproduces our application's Cassandra interaction.
> We have not successfully run a test for more than 30 hours under load, and every failure after that time would follow a similar pattern.
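
A rough, hypothetical sketch of what a test driver like the one described above might look like follows, for illustration only; it is not the reporter's actual tool. The keyspace, table, and column names are invented, the DataStax Python driver is assumed, and a single paced loop cannot really sustain 8,000/16,000/8,000 ops/sec (that would need many concurrent workers), but it shows the shape of the workload: LCS, a 12-hour gc grace period, and a steady mix of inserts, reads, and deletes.

    import time
    import uuid

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])   # placeholder contact point
    session = cluster.connect()

    # Schema roughly matching the report: LCS plus a 12-hour (43,200 s) gc grace period.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS loadtest
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
    """)
    session.execute("""
        CREATE TABLE IF NOT EXISTS loadtest.events (
            id uuid PRIMARY KEY,
            payload text
        ) WITH compaction = {'class': 'LeveledCompactionStrategy'}
          AND gc_grace_seconds = 43200
    """)

    insert = session.prepare("INSERT INTO loadtest.events (id, payload) VALUES (?, ?)")
    select = session.prepare("SELECT payload FROM loadtest.events WHERE id = ?")
    delete = session.prepare("DELETE FROM loadtest.events WHERE id = ?")

    TARGET_ITERS_PER_SEC = 8000  # each iteration: 1 insert, 2 reads, 1 delete

    while True:
        start = time.time()
        key = uuid.uuid4()
        session.execute(insert, (key, "x" * 200))   # ~8,000 inserts/sec overall
        session.execute(select, (key,))             # ~16,000 gets/sec overall
        session.execute(select, (key,))
        session.execute(delete, (key,))             # ~8,000 deletes/sec (creates tombstones)
        # Crude single-threaded pacing; sleep off whatever is left of this slot.
        elapsed = time.time() - start
        time.sleep(max(0.0, 1.0 / TARGET_ITERS_PER_SEC - elapsed))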



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)