Posted to commits@cassandra.apache.org by "Jeff Jirsa (Jira)" <ji...@apache.org> on 2020/05/28 21:49:01 UTC

[jira] [Assigned] (CASSANDRA-12200) Backlogged compactions can make repair on trivially small tables waiting for a long time to finish

     [ https://issues.apache.org/jira/browse/CASSANDRA-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa reassigned CASSANDRA-12200:
--------------------------------------

    Assignee:     (was: Jeff Jirsa)

> Backlogged compactions can make repair on trivially small tables waiting for a long time to finish
> --------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12200
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Core
>            Reporter: Wei Deng
>            Priority: Normal
>
> In C* 3.0 we started to use incremental repair by default. However, this seems to create a repair performance problem when a relatively write-heavy workload keeps all available concurrent_compactors busy with active compactions.
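> To make the contention concrete, here is a toy sketch in plain java.util.concurrent (not Cassandra code; the class and task names are made up for illustration). With a 2-thread pool already saturated by long-running tasks, a tiny task submitted afterwards simply sits in the queue until a slot frees up:
> {noformat}
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.TimeUnit;
>
> public class SaturatedCompactionPoolDemo
> {
>     public static void main(String[] args) throws Exception
>     {
>         // Pool sized like the default concurrent_compactors.
>         ExecutorService compactionPool = Executors.newFixedThreadPool(2);
>
>         // Two long-running tasks stand in for the backlogged L0 compactions.
>         for (int i = 0; i < 2; i++)
>         {
>             final int id = i;
>             compactionPool.submit(() -> {
>                 System.out.println("compaction " + id + " started");
>                 TimeUnit.SECONDS.sleep(10); // pretend this is a multi-minute compaction
>                 System.out.println("compaction " + id + " finished");
>                 return null;
>             });
>         }
>
>         // A tiny task (think: anti-compaction for a 6-row table) queues behind them.
>         final long queuedAt = System.nanoTime();
>         compactionPool.submit(() -> {
>             long waitedMs = (System.nanoTime() - queuedAt) / 1_000_000;
>             System.out.println("tiny task finally ran after waiting " + waitedMs + " ms");
>         });
>
>         compactionPool.shutdown();
>         compactionPool.awaitTermination(1, TimeUnit.MINUTES);
>     }
> }
> {noformat}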
> I was able to demonstrate this issue by the following scenario:
> 1. On a three-node C* 3.0.7 cluster, use "cassandra-stress write n=100000000" to generate ~100GB of data in the keyspace1.standard1 table using LCS (ctrl+c the stress client once the data size on each node reaches 35+GB).
> 2. At this point there will be hundreds of L0 SSTables on each node waiting for LCS to digest, and with concurrent_compactors at its default of 2, both compaction threads are constantly busy processing the backlogged L0 SSTables.
> 3. Now create a new keyspace "trivial_ks" with RF=3, create a small two-column CQL table in it, and insert 6 records.
> 4. Start a "nodetool repair trivial_ks" session on one of the nodes, and watch the following behavior:
> {noformat}
> automaton@wdengdse50google-98425b985-3:~$ nodetool repair trivial_ks
> [2016-07-13 01:57:28,364] Starting repair command #1, repairing keyspace trivial_ks with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> [2016-07-13 01:57:31,027] Repair session 27212dd0-489d-11e6-a6d6-cd06faa0aaa2 for range [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]] finished (progress: 66%)
> [2016-07-13 02:07:47,637] Repair completed successfully
> [2016-07-13 02:07:47,657] Repair command #1 finished in 10 minutes 19 seconds
> {noformat}
> Basically, for such a small table it took 10+ minutes to finish the repair. Looking at debug.log for this particular repair session UUID, you will find that every node got through validation compaction within 15ms, but one node then got stuck waiting for a compaction slot: it has to run an anti-compaction step before it can tell the initiating node that its part of the repair session is done, and it took 10+ minutes for a slot in the regular compaction pool to free up, as shown in the following debug.log entries:
> {noformat}
> DEBUG [AntiEntropyStage:1] 2016-07-13 01:57:30,956  RepairMessageVerbHandler.java:149 - Got anticompaction request AnticompactionRequest{parentRepairSession=27103de0-489d-11e6-a6d6-cd06faa0aaa2} org.apache.cassandra.repair.messages.AnticompactionRequest@34449ff4
> <...>
> <snip>
> <...>
> DEBUG [CompactionExecutor:5] 2016-07-13 02:07:47,506  CompactionTask.java:217 - Compacted (286609e0-489d-11e6-9e03-1fd69c5ec46c) 32 sstables to [/var/lib/cassandra/data/keyspace1/standard1-9c02e9c1487c11e6b9161dbd340a212f/mb-499-big,] to level=0.  2,892,058,050 bytes to 2,874,333,820 (~99% of original) in 616,880ms = 4.443617MB/s.  0 total partitions merged to 12,233,340.  Partition merge counts were {1:12086760, 2:146580, }
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,512  CompactionManager.java:511 - Starting anticompaction for trivial_ks.weitest on 1/[BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')] sstables
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,513  CompactionManager.java:540 - SSTable BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db') fully contained in range (-9223372036854775808,-9223372036854775808], mutating repairedAt instead of anticompacting
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,570  CompactionManager.java:578 - Completed anticompaction successfully
> {noformat}
> Since validation compaction has its own threads outside the regular compaction thread pool bounded by concurrent_compactors, validation completed without any issue. If anti-compaction were treated the same way (i.e. given its own thread pool), this kind of repair performance problem could be avoided.
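> As a rough sketch of the idea (plain Java, not the actual CompactionManager/CompactionExecutor wiring, and the names here are hypothetical), anti-compaction tasks could simply be routed to a small dedicated executor instead of the shared pool:
> {noformat}
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
>
> public class SeparatePoolsSketch
> {
>     // Regular compactions stay on the pool bounded by concurrent_compactors.
>     private final ExecutorService compactionExecutor = Executors.newFixedThreadPool(2);
>
>     // Hypothetical dedicated executor for anti-compaction, analogous to the
>     // separate threads that validation compaction already gets.
>     private final ExecutorService antiCompactionExecutor = Executors.newFixedThreadPool(1);
>
>     public void submitCompaction(Runnable compaction)
>     {
>         compactionExecutor.submit(compaction);
>     }
>
>     public void submitAntiCompaction(Runnable antiCompaction)
>     {
>         // Not queued behind the L0 backlog, so repair completion is not gated on it.
>         antiCompactionExecutor.submit(antiCompaction);
>     }
> }
> {noformat}
> Sizing and lifecycle of that dedicated pool are open questions, but it would at least stop a trivially small anti-compaction from queuing behind hours of backlogged L0 work.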


