Posted to commits@cassandra.apache.org by "Nick Bailey (JIRA)" <ji...@apache.org> on 2013/09/12 16:58:52 UTC
[jira] [Created] (CASSANDRA-6011) Race condition in snapshot repair
Nick Bailey created CASSANDRA-6011:
--------------------------------------
Summary: Race condition in snapshot repair
Key: CASSANDRA-6011
URL: https://issues.apache.org/jira/browse/CASSANDRA-6011
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Nick Bailey
Fix For: 1.2.10, 2.0.1
When we do a snapshot/sequential repair, we use the repair session id as the snapshot name. Unfortunately, when we delete a snapshot in Directories.java, we delete it for all column families, even when the delete is called on a specific column family store.
So what can happen is this:
Node B finishes validation compaction for CF1 and notifies Node A
Node B *starts* to delete the snapshot for CF1
Node A finishes repair of CF1 and starts repair of CF2
Node B takes a snapshot of CF2 and starts validation compaction. The cleanup from the previous validation compaction is still deleting snapshots, so the snapshot that the new validation needs gets deleted out from under it.
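The interleaving above can be replayed deterministically with a small in-memory model. This is only a sketch of the sequence as described in this report, not Cassandra code: the class SnapshotRaceSketch, its methods, and the "repair-session-id" name are all hypothetical, and the per-CF sets stand in for the snapshot directories on disk.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for Node B's snapshot directories.
public class SnapshotRaceSketch {
    // column family name -> names of snapshots that currently exist for it
    private final Map<String, Set<String>> snapshots = new LinkedHashMap<>();

    public SnapshotRaceSketch(String... columnFamilies) {
        for (String cf : columnFamilies) {
            snapshots.put(cf, new HashSet<>());
        }
    }

    public void takeSnapshot(String cf, String name) {
        snapshots.get(cf).add(name);
    }

    public boolean snapshotExists(String cf, String name) {
        return snapshots.get(cf).contains(name);
    }

    // One step of the cleanup: remove the named snapshot from a single CF.
    // The cleanup described in the report performs this step for every CF.
    public void deleteFrom(String cf, String name) {
        snapshots.get(cf).remove(name);
    }

    // Replays the reported interleaving and reports whether CF2's snapshot survives.
    public static boolean cf2SnapshotSurvives() {
        String session = "repair-session-id"; // assumed: same name reused for every CF
        SnapshotRaceSketch nodeB = new SnapshotRaceSketch("CF1", "CF2");

        nodeB.takeSnapshot("CF1", session);  // validation of CF1 runs against this
        nodeB.deleteFrom("CF1", session);    // CF1 cleanup starts (step 1 of 2)
        nodeB.takeSnapshot("CF2", session);  // Node A has moved on to CF2
        nodeB.deleteFrom("CF2", session);    // CF1 cleanup reaches CF2 (step 2 of 2)

        return nodeB.snapshotExists("CF2", session);
    }

    public static void main(String[] args) {
        System.out.println("CF2 snapshot survives: " + cf2SnapshotSurvives());
    }
}
```

In this model the CF2 snapshot is gone by the time its validation would read it, because the cleanup that began for CF1 is keyed only on the shared snapshot name.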
I've only reproduced this on 1.2.6, but looking at the code it definitely looks like the bug exists in 1.2 HEAD. I'm not positive about 2.0.
I think the fix is just to update Directories.java to not delete the snapshot from all column families.
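A rough sketch of what a deletion scoped to one column family could look like is below. It is not the actual Directories.java code or the eventual patch: the class ScopedSnapshotDelete, the clearSnapshot signature, and the <cf directory>/snapshots/<name> layout are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper: removes a named snapshot for one column family only.
public class ScopedSnapshotDelete {

    // Deletes <cfDir>/snapshots/<name> and nothing outside that directory.
    public static void clearSnapshot(Path cfDir, String name) throws IOException {
        Path target = cfDir.resolve("snapshots").resolve(name);
        if (!Files.exists(target)) {
            return; // nothing to delete for this column family
        }
        try (Stream<Path> walk = Files.walk(target)) {
            // Reverse order visits children before their parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Builds two CF directories sharing a snapshot name, clears CF1 only,
    // and reports whether CF1's snapshot is gone while CF2's is intact.
    public static boolean demo() {
        try {
            Path data = Files.createTempDirectory("snapshot-sketch");
            Path cf1 = data.resolve("CF1");
            Path cf2 = data.resolve("CF2");
            String session = "repair-session-id"; // assumed shared snapshot name
            Files.createDirectories(cf1.resolve("snapshots").resolve(session));
            Files.createDirectories(cf2.resolve("snapshots").resolve(session));

            clearSnapshot(cf1, session);

            return !Files.exists(cf1.resolve("snapshots").resolve(session))
                    && Files.exists(cf2.resolve("snapshots").resolve(session));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("only CF1 snapshot removed: " + demo());
    }
}
```

With the deletion limited to the directory of the column family it was called on, a cleanup that is still running for CF1 never touches the snapshot that was just taken for CF2, even though both share the repair session id as their name.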