Posted to commits@cassandra.apache.org by "Yuki Morishita (JIRA)" <ji...@apache.org> on 2014/07/16 00:45:07 UTC

[jira] [Commented] (CASSANDRA-7552) Compactions Pending build up when using LCS

    [ https://issues.apache.org/jira/browse/CASSANDRA-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062790#comment-14062790 ] 

Yuki Morishita commented on CASSANDRA-7552:
-------------------------------------------

The number of pending compaction tasks in LCS is estimated from the total bytes of SSTable files in each level, so lower levels get more 'pending tasks' than higher levels to show that compaction is behind (https://github.com/apache/cassandra/blob/cassandra-2.0.7/src/java/org/apache/cassandra/db/compaction/LeveledManifest.java#L526).
I suspect your write rate is piling up more SSTables in L0 than a node using LCS can handle.
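To make that estimate concrete, here is a minimal, self-contained sketch of the idea (the class name, constants and per-level capacities below are simplifications for illustration, not the actual LeveledManifest code): pending tasks grow roughly with the bytes each level holds above its target capacity, so a backed-up L0 dominates the count.

    // Simplified illustration of an LCS-style pending-task estimate.
    // Constants and level capacities are assumptions for this sketch,
    // not the values Cassandra actually uses.
    public class LcsPendingTasksSketch
    {
        static final long TARGET_SSTABLE_BYTES = 160L * 1024 * 1024; // assumed 160 MB target
        static final long FANOUT = 10;                               // assumed 10x growth per level

        // Assumed capacity per level: L1 = 10 * target, L2 = 100 * target, ...
        // L0 is treated as "a few flushes worth" of data.
        static long maxBytesForLevel(int level)
        {
            if (level == 0)
                return 4 * TARGET_SSTABLE_BYTES;
            long bytes = TARGET_SSTABLE_BYTES;
            for (int i = 0; i < level; i++)
                bytes *= FANOUT;
            return bytes;
        }

        // Pending tasks ~= excess bytes in each level / target SSTable size.
        static long estimatePendingTasks(long[] bytesPerLevel)
        {
            long tasks = 0;
            for (int level = 0; level < bytesPerLevel.length; level++)
            {
                long excess = Math.max(0, bytesPerLevel[level] - maxBytesForLevel(level));
                tasks += (excess + TARGET_SSTABLE_BYTES - 1) / TARGET_SSTABLE_BYTES; // ceiling division
            }
            return tasks;
        }

        public static void main(String[] args)
        {
            long GB = 1024L * 1024 * 1024;
            // Hypothetical distribution: 50 GB stuck in L0, healthy L1/L2.
            long[] bytesPerLevel = { 50 * GB, 1 * GB, 10 * GB };
            System.out.println("estimated pending tasks: " + estimatePendingTasks(bytesPerLevel));
        }
    }

With 50 GB sitting in L0, this toy estimate is already in the hundreds even though L1 and L2 are within bounds, which is why the pending count explodes once L0 falls behind.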

Check your SSTable level distribution using nodetool cfstats.
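For example (illustrative output only; exact wording and numbers vary by version and table), the line to look for in the per-column-family section is the level histogram:

    $ nodetool cfstats
    ...
    SSTables in each level: [5432/4, 10, 103/100, 1, 0, 0, 0, 0, 0]

A very large first entry means L0 holds far more SSTables than LCS expects, i.e. flushes are outpacing L0 -> L1 compaction.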

> Compactions Pending build up when using LCS
> -------------------------------------------
>
>                 Key: CASSANDRA-7552
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7552
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Darla Baker
>            Assignee: Yuki Morishita
>
> We seem to be hitting an issue with LeveledCompactionStrategy while running performance tests on a 4-node Cassandra installation. We are currently using Cassandra 2.0.7.
> In summary, we run a test consisting of approximately 8,000 inserts/sec, 16,000 gets/sec, and 8,000 deletes/sec. We have a grace period of 12 hours on our column families.
> At this rate, we observe a stable number of pending compaction tasks for about 22 to 26 hours. After that period, something happens and the pending compaction tasks start to increase rapidly, sometimes on one or two servers, but sometimes on all four of them. This goes on until the uncompacted SSTables start consuming all the disk space, after which the Cassandra cluster generally fails.
> When this occurs, the rate of completed compaction tasks is usually decreasing over time, which seems to indicate that it takes more and more time to run the existing compaction tasks.
> On different occasions, I have been able to reproduce a similar issue in less than 12 hours. While the traffic rate remains constant, we seem to be hitting this at varying intervals. Yesterday I could reproduce it in less than 6 hours.
> We have two different deployments on which we have tested this issue: 
> 1. 4x IBM HS22, using a RAM disk as the Cassandra data directory (thus eliminating disk I/O) 
> 2. 8x IBM HS23, with SSD disks, deployed in two "geo-redundant" data centers of 4 nodes each, with a latency of 50ms between the data centers.
> I can reproduce the "compaction tasks falling behind" behavior on both of these setups, although it could be occurring for different reasons. Because of #1, I do not believe we are hitting an I/O bottleneck just yet.
> As an additional interesting note, if I artificially pause the traffic when I see the pending compaction task issue occurring, then: 
> 1. The number of pending compaction tasks obviously stops increasing, but stays at the same value for 15 minutes (as if nothing is running). 
> 2. The rate of completed compaction tasks falls to 0 for 15 minutes. 
> 3. After 15 to 20 minutes, out of the blue, all compactions complete in less than 2 minutes.
> If I restart the traffic after that, the system is stable for a few hours, but the issue always comes back.
> We have written a small test tool that reproduces our application's Cassandra interaction.
> We have never successfully run a test under load for more than 30 hours, and every failure after that time follows a similar pattern.


