Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2011/02/18 05:57:12 UTC

[jira] Updated: (CASSANDRA-2191) Multithread across compaction buckets

     [ https://issues.apache.org/jira/browse/CASSANDRA-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2191:
--------------------------------

    Attachment: 0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt
                0001-Add-a-compacting-set-to-sstabletracker.txt

Patch to add a "compacting" set to the SSTableTracker, which is modified atomically when scheduling compactions. SSTables are removed from the compacting set in a finally block, so they are released even if a compaction fails.
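A minimal sketch of the claim/release pattern described above follows. It assumes a scheduler claims a whole bucket of sstables at once and gives up if any of them is already compacting. The names `Tracker`, `markCompacting`, `unmarkCompacting` and `compactBucket` are hypothetical and are not the actual SSTableTracker API. Sstables are modeled as plain string names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CompactingSetSketch {
    // Hypothetical stand-in for SSTableTracker; sstables are modeled as names.
    public static final class Tracker {
        private final Set<String> compacting = new HashSet<>();

        // Atomically claims all candidates, or none if any is already compacting.
        public synchronized boolean markCompacting(Collection<String> candidates) {
            for (String sstable : candidates) {
                if (compacting.contains(sstable)) {
                    return false;
                }
            }
            compacting.addAll(candidates);
            return true;
        }

        public synchronized void unmarkCompacting(Collection<String> sstables) {
            compacting.removeAll(sstables);
        }

        public synchronized Set<String> compactingSnapshot() {
            return new HashSet<>(compacting);
        }
    }

    // Claims a bucket, runs the (simulated) compaction, and always releases the claim.
    public static boolean compactBucket(Tracker tracker, Collection<String> bucket, Runnable work) {
        if (!tracker.markCompacting(bucket)) {
            return false; // another thread already owns part of this bucket
        }
        try {
            work.run();
            return true;
        } finally {
            tracker.unmarkCompacting(bucket); // released even if work throws
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        List<String> bucket = Arrays.asList("sst-1", "sst-2");
        boolean ran = compactBucket(tracker, bucket,
                () -> System.out.println("compacting " + tracker.compactingSnapshot()));
        System.out.println("ran=" + ran + ", still compacting=" + tracker.compactingSnapshot());
    }
}
```

Because the check and the insert happen under one lock, two threads can never both claim the same sstable. The finally block guarantees that a failed compaction does not leave its sstables stuck in the set.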

Also converts the "compactionLock", which is used only by migrations (to stop compactions completely), into a read-write lock: running compactions acquire it as readers, and migrations acquire it as the writer.
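The reader/writer split can be sketched with `java.util.concurrent.locks.ReentrantReadWriteLock`. This is a minimal sketch, assuming compactions and migrations are simple `Runnable`s. The names `runCompaction`, `runMigration` and `twoCompactionsOverlap` are hypothetical and are not taken from the patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CompactionLockSketch {
    // Hypothetical stand-in for "compactionLock": compactions share the read lock,
    // a migration takes the write lock and so excludes every compaction.
    private static final ReentrantReadWriteLock compactionLock = new ReentrantReadWriteLock();

    public static ReentrantReadWriteLock lock() {
        return compactionLock;
    }

    public static void runCompaction(Runnable compaction) {
        compactionLock.readLock().lock();
        try {
            compaction.run();
        } finally {
            compactionLock.readLock().unlock();
        }
    }

    // Blocks until running compactions release the read lock; new ones wait meanwhile.
    public static void runMigration(Runnable migration) {
        compactionLock.writeLock().lock();
        try {
            migration.run();
        } finally {
            compactionLock.writeLock().unlock();
        }
    }

    // Two compactions each wait inside the read lock until both have entered;
    // this only succeeds because readers do not exclude one another.
    public static boolean twoCompactionsOverlap() {
        CountDownLatch bothInside = new CountDownLatch(2);
        boolean[] entered = new boolean[2];
        Thread[] threads = new Thread[2];
        for (int i = 0; i < 2; i++) {
            final int id = i;
            threads[i] = new Thread(() -> runCompaction(() -> {
                bothInside.countDown();
                try {
                    entered[id] = bothInside.await(5, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return entered[0] && entered[1];
    }

    public static void main(String[] args) {
        System.out.println("compactions overlap: " + twoCompactionsOverlap());
        runMigration(() -> System.out.println("migration holds write lock: "
                + compactionLock.isWriteLockedByCurrentThread()));
    }
}
```

Many compactions can hold the read lock concurrently. A migration's write-lock request waits for all of them to finish and excludes any that arrive later. This matches the "stop compactions completely" behavior without serializing compactions against each other.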

Implications: up to #num-procs compactions may run at once, possibly within the same bucket but more likely in different buckets.
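The #num-procs bound can be illustrated with a minimal sketch. It assumes compactions are submitted to a fixed-size pool with one thread per available processor (`Runtime.getRuntime().availableProcessors()`); the patch may size or build its executor differently. The class and method names are hypothetical, and each "compaction" is just a short sleep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedCompactionSketch {
    // Runs one simulated compaction per bucket on a pool of `threads` workers and
    // returns the highest number of compactions observed running at the same time.
    public static int maxConcurrent(int buckets, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int b = 0; b < buckets; b++) {
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20); // stand-in for compaction work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                try {
                    f.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        int procs = Runtime.getRuntime().availableProcessors();
        System.out.println("peak concurrent compactions: "
                + maxConcurrent(procs * 4, procs) + " (limit " + procs + ")");
    }
}
```

A fixed pool caps concurrency without any per-bucket bookkeeping. Keeping two compactions from touching the same sstables is the job of the compacting set, not the pool.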

This patch goes hand in hand with CASSANDRA-2156, which ensures that despite our multithreading, we don't trample other operations on the system.

> Multithread across compaction buckets
> -------------------------------------
>
>                 Key: CASSANDRA-2191
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2191
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.8
>
>         Attachments: 0001-Add-a-compacting-set-to-sstabletracker.txt, 0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt
>
>
> This ticket overlaps with CASSANDRA-1876 to a degree, but the approaches and reasoning are different enough to warrant a separate issue.
> The current problem with compactions is that each one compacts only the set of sstables that existed at the moment it started. This means that while a long-running compaction is in progress (even one running as fast as the hardware allows), a very large number of new sstables may be created. We have observed this proliferation of sstables killing performance during major/high-bucketed compactions.
> One approach would be to pause compactions in upper buckets (containing larger files) when compactions in lower buckets become possible. While this would likely solve the problem with read performance, it does not actually help us perform compaction any faster, which is a reasonable requirement for other situations.
> Instead, we need to be able to perform any compactions that are currently required in parallel, independent of what bucket they might be in.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira