Posted to commits@cassandra.apache.org by "Nick Bailey (JIRA)" <ji...@apache.org> on 2013/09/12 16:58:52 UTC

[jira] [Created] (CASSANDRA-6011) Race condition in snapshot repair

Nick Bailey created CASSANDRA-6011:
--------------------------------------

             Summary: Race condition in snapshot repair
                 Key: CASSANDRA-6011
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6011
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Nick Bailey
             Fix For: 1.2.10, 2.0.1


When we do a snapshot/sequential repair, we use the repair session id as the snapshot name. Unfortunately, when Directories.java deletes a snapshot, it deletes it for all column families, even when called on a specific CF store.

So what can happen is this:

Node B finishes validation compaction for CF1 and notifies Node A
Node B *starts* to delete the snapshot for CF1
Node A finishes repair of CF1 and starts repair of CF2
Node B takes a snapshot of CF2 and starts validation compaction, but the cleanup from the previous validation compaction is still deleting snapshots. Because both snapshots share the repair session id as their name, the snapshot that the CF2 validation wants to run on gets deleted out from under it.

I've only reproduced this on 1.2.6, but looking at the code, it definitely looks like it exists in 1.2 HEAD. I'm not positive about 2.0.

I think the fix is just to update Directories.java to not delete the snapshot from all column families.
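
To make the difference concrete, here is a minimal sketch of the two deletion behaviors. It is not the actual Directories.java code: the class, the method names, the session name, and the <dataDir>/<cf>/snapshots/<name> layout are all assumptions for illustration, and the interleaving above is replayed sequentially rather than with real threads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SnapshotCleanupSketch {
    // Assumed layout, for illustration only: <dataDir>/<cf>/snapshots/<snapshotName>
    static Path snapshotDir(Path dataDir, String cf, String name) {
        return dataDir.resolve(cf).resolve("snapshots").resolve(name);
    }

    // A snapshot is modeled as an empty directory.
    static void takeSnapshot(Path dataDir, String cf, String name) throws IOException {
        Files.createDirectories(snapshotDir(dataDir, cf, name));
    }

    // Behavior described in the report: removes the named snapshot under every CF.
    static void clearSnapshotAllCfs(Path dataDir, String name) throws IOException {
        try (DirectoryStream<Path> cfDirs = Files.newDirectoryStream(dataDir)) {
            for (Path cfDir : cfDirs) {
                Files.deleteIfExists(cfDir.resolve("snapshots").resolve(name));
            }
        }
    }

    // Proposed behavior: removes the named snapshot for the given CF only.
    static void clearSnapshotForCf(Path dataDir, String cf, String name) throws IOException {
        Files.deleteIfExists(snapshotDir(dataDir, cf, name));
    }

    // Replays the interleaving in order: the CF2 snapshot is taken before the
    // cleanup that follows CF1's validation has finished. The temp directory
    // is left behind; this is a throwaway demo.
    static boolean cf2SnapshotSurvives(boolean scopedDelete) {
        String session = "repair-session-1"; // hypothetical repair session id
        try {
            Path dataDir = Files.createTempDirectory("snapshot-sketch");
            takeSnapshot(dataDir, "CF1", session);
            takeSnapshot(dataDir, "CF2", session); // same name: the session id
            if (scopedDelete) {
                clearSnapshotForCf(dataDir, "CF1", session);
            } else {
                clearSnapshotAllCfs(dataDir, session);
            }
            return Files.exists(snapshotDir(dataDir, "CF2", session));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("delete across all CFs, CF2 snapshot survives: "
                + cf2SnapshotSurvives(false));
        System.out.println("delete scoped to CF1, CF2 snapshot survives: "
                + cf2SnapshotSurvives(true));
    }
}
```

In this model, the CF2 snapshot is lost only when the CF1 cleanup deletes by name across all column families. Scoping the delete to the CF store it was called on leaves the CF2 snapshot in place.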
