You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2017/04/06 21:47:41 UTC

[jira] [Updated] (CASSANDRA-13422) CompactionStrategyManager should take write not read lock when handling add/remove notifications

     [ https://issues.apache.org/jira/browse/CASSANDRA-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-13422:
---------------------------------------
    Description: 
{{getNextBackgroundTask}} in various compaction strategies (definitely {{LCS}}) rely on checking the result of {{DataTracker.getCompacting()}} to avoid accessing data and metadata related to tables that have already head their resources released.

There is a race where this check is unreliable and will claim a table that has its resources already released is not compacting resulting in use after free.

[{{LeveledCompactionStrategy.findDroppableSSTable}}|https://github.com/apache/cassandra/blob/c794d2bed7ca1d10e13c4da08a3d45f5c755c1d8/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java#L504] for instance has this three part logical && condition where the first check is against the compacting set before calling {{worthDroppingTombstones}} which fails if the table has been released.

The order of events is basically that CompactionStrategyManager acquires the read lock in getNextBackgroundTask(), then proceeds eventually to findDroppableSSTable and acquires a set of SSTables from the manifest. While the manifest is thread safe it's not accessed atomically WRT to other operations. Once it has acquire the set of tables it acquires the (not atomically) the set of compacting SSTables and iterates checking the former against the latter.

Meanwhile other compaction threads are marking tables obsolete or compacted and releasing their references. Doing this removes them from {{DataTracker}} and publishes a notification to the strategies, but this notification only requires the read lock. After the compaction thread has published the notifications it eventually marks the table as not compacting in {{DataTracker}} or removes it entirely.

The race is then that the compaction thread generating a new background task acquires the sstables from the manifest on the stack. Any table in that set that was compacting at that time must remain compacting so that it can be skipped. Another compaction thread finishes a compaction and is able to remove the table from the manifest and then remove it from the compacting set. The thread generating the background task then acquires the list of compacting tables which doesn't include the table it is supposed to skip.

The simple fix appears to be to require threads to acquire the write lock in order to publish notifications of tables being added/removed from compaction strategies. While holding the write lock it won't be possible for someone to see a view of tables in the manifest where tables that are compacting aren't compacting in the view

> CompactionStrategyManager should take write not read lock when handling add/remove notifications
> ------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13422
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13422
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 4.0, 3.11.x
>
>
> {{getNextBackgroundTask}} in various compaction strategies (definitely {{LCS}}) rely on checking the result of {{DataTracker.getCompacting()}} to avoid accessing data and metadata related to tables that have already head their resources released.
> There is a race where this check is unreliable and will claim a table that has its resources already released is not compacting resulting in use after free.
> [{{LeveledCompactionStrategy.findDroppableSSTable}}|https://github.com/apache/cassandra/blob/c794d2bed7ca1d10e13c4da08a3d45f5c755c1d8/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java#L504] for instance has this three part logical && condition where the first check is against the compacting set before calling {{worthDroppingTombstones}} which fails if the table has been released.
> The order of events is basically that CompactionStrategyManager acquires the read lock in getNextBackgroundTask(), then proceeds eventually to findDroppableSSTable and acquires a set of SSTables from the manifest. While the manifest is thread safe it's not accessed atomically WRT to other operations. Once it has acquire the set of tables it acquires the (not atomically) the set of compacting SSTables and iterates checking the former against the latter.
> Meanwhile other compaction threads are marking tables obsolete or compacted and releasing their references. Doing this removes them from {{DataTracker}} and publishes a notification to the strategies, but this notification only requires the read lock. After the compaction thread has published the notifications it eventually marks the table as not compacting in {{DataTracker}} or removes it entirely.
> The race is then that the compaction thread generating a new background task acquires the sstables from the manifest on the stack. Any table in that set that was compacting at that time must remain compacting so that it can be skipped. Another compaction thread finishes a compaction and is able to remove the table from the manifest and then remove it from the compacting set. The thread generating the background task then acquires the list of compacting tables which doesn't include the table it is supposed to skip.
> The simple fix appears to be to require threads to acquire the write lock in order to publish notifications of tables being added/removed from compaction strategies. While holding the write lock it won't be possible for someone to see a view of tables in the manifest where tables that are compacting aren't compacting in the view



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)