Posted to commits@cassandra.apache.org by "Jan Karlsson (JIRA)" <ji...@apache.org> on 2017/03/23 19:47:41 UTC

[jira] [Commented] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

    [ https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939107#comment-15939107 ] 

Jan Karlsson commented on CASSANDRA-13354:
------------------------------------------

I did some tests simulating traffic on a 4 node cluster. 2 of the nodes were running with my patch while the other two ran without it.
Steps to reproduce:
1. Start traffic against the cluster
2. Turn one of the nodes off
3. Wait 7 minutes
4. Truncate hints on all other nodes
5. Turn the node back on
6. Run repair on the node

As you can see in the attached graphs, the pending compaction count on the unpatched nodes kept increasing, as unrepaired data from ongoing traffic was prioritized. Had there been more discrepancies in my data set, this would simply have kept growing until the configured file descriptor limit was hit or the node died from heap pressure.

Repair completed at 8:11pm, but the small repaired files were not compacted, as compaction kept picking new unrepaired sstables over the small repaired ones. The count did show a downward trend, as compaction was slightly faster than insertion, so it would probably eventually have ended with the repaired files compacted.

During the unpatched test, only 2 pending compactions were reported, despite ~22k open file descriptors across ~10k sstables. At 8:33pm I disabled the traffic completely to hurry this along.
SSTables in each level: [10347/4, 5, 0, 0, 0, 0, 0, 0, 0]

> LCS estimated compaction tasks does not take number of files into account
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13354
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>         Environment: Cassandra 2.2.9
>            Reporter: Jan Karlsson
>            Assignee: Jan Karlsson
>         Attachments: 13354-trunk.txt, patchedTest.png, unpatchedTest.png
>
>
> In LCS, the way we estimate the number of compaction tasks remaining for L0 is by taking the size of an SSTable and multiplying it by four. This gives 4*160mb with default settings. This calculation is used to determine whether repaired or unrepaired data is being compacted.
> Now this works well until you take repair into account. Repair streams over many sstables, which can be smaller than the configured SSTable size depending on your use case. In our case we are talking about many thousands of tiny SSTables. As the number of files increases, one can run into any number of problems, including GC issues, too many open files, or a plain increase in read latency.
> With the current algorithm we will choose repaired or unrepaired depending on whichever side has more data in it, even if the repaired files outnumber the unrepaired files by a large margin.
> Similarly, our algorithm that selects compaction candidates takes up to 32 SSTables at a time in L0, yet our estimated task calculation does not take this number into account. These two mechanisms should be aligned with each other.
> I propose that we take the number of files in L0 into account when estimating remaining tasks. 
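For illustration, the difference between the current bytes-only estimate and the proposed file-aware estimate could be sketched as follows. This is a minimal Java sketch under stated assumptions: the class, method, and constant names are illustrative, not the actual Cassandra patch; it only assumes the two facts from the description above (an L0 task targets 4 * max sstable size bytes, and a compaction takes up to 32 L0 sstables).

```java
// Minimal sketch of L0 pending-task estimation for LCS. All names here are
// illustrative assumptions, not the actual Cassandra patch. Assumptions from
// the issue description: an L0 compaction targets 4 * max_sstable_size bytes
// of data, and candidate selection takes at most 32 sstables at a time.
public class L0TaskEstimate {
    static final int MAX_COMPACTING_L0 = 32; // L0 candidate cap per compaction

    // Current behaviour: estimate pending tasks from total bytes alone.
    static long byBytes(long totalBytes, long maxSSTableSizeBytes) {
        return totalBytes / (4L * maxSSTableSizeBytes);
    }

    // Proposed behaviour: also account for the number of files, since repair
    // can stream in thousands of tiny sstables that carry very little data.
    static long byBytesAndFiles(long totalBytes, long maxSSTableSizeBytes, int fileCount) {
        long byFiles = (fileCount + MAX_COMPACTING_L0 - 1L) / MAX_COMPACTING_L0; // ceil(files/32)
        return Math.max(byBytes(totalBytes, maxSSTableSizeBytes), byFiles);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 10347 tiny ~1 MB sstables in L0 (as in the level listing above),
        // with the default 160 MB max sstable size:
        System.out.println(byBytes(10347 * mb, 160 * mb));                // 16
        System.out.println(byBytesAndFiles(10347 * mb, 160 * mb, 10347)); // 324
    }
}
```

With 10347 one-megabyte sstables, the bytes-only formula reports only 16 pending tasks, while counting files (ceil(10347/32)) reports 324, which better reflects the work left and stops the file count from being ignored when choosing between repaired and unrepaired data.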



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)