Posted to commits@cassandra.apache.org by "Stefan Miklosovic (Jira)" <ji...@apache.org> on 2022/10/10 11:07:00 UTC

[jira] [Commented] (CASSANDRA-17955) Race condition on repair snapshots

    [ https://issues.apache.org/jira/browse/CASSANDRA-17955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615026#comment-17615026 ] 

Stefan Miklosovic commented on CASSANDRA-17955:
-----------------------------------------------

[~marcuse] [~dcapwell] would you mind taking a look? According to git blame, you were the most recent people to work on that part of the code.

This is reported to be quite a big issue: we have a case of a 200-node cluster where 80 nodes across 3 DCs hit this problem.

Having this in 4.1 GA would be really great. Isn't this actually a blocker?

> Race condition on repair snapshots
> ----------------------------------
>
>                 Key: CASSANDRA-17955
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17955
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair, Local/Snapshots
>            Reporter: Cameron Zemek
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>              Labels: 4.0
>             Fix For: 4.0.x, 4.1-rc, 4.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If an endpoint is convicted and that endpoint is a coordinator then ActiveRepairService::removeParentRepairSession is called.
> The issue is that this runs on clearSnapshotExecutor and can happen while RepairMessageVerbHandler is in the process of taking a snapshot. The result is a race condition in which clearSnapshot throws a java.nio.file.DirectoryNotEmptyException.
>  
> {code:java}
> public static void deleteRecursiveWithThrottle(File dir, RateLimiter rateLimiter)
> {
>     if (dir.isDirectory())
>     {
>         String[] children = dir.list();
>         for (String child : children)
>             deleteRecursiveWithThrottle(new File(dir, child), rateLimiter);
>     }
>     // The directory is now empty so now it can be smoked
>     deleteWithConfirmWithThrottle(dir, rateLimiter);
> } {code}
> The exception is thrown because the directory is no longer empty by the time the code goes to remove it at the end.
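
The interleaving described above can be reproduced deterministically outside Cassandra. The following is a minimal sketch, not the Cassandra code: the class SnapshotRaceSketch, the method names, the file names, and the afterChildrenDeleted hook are all hypothetical and exist only for illustration. The hook stands in for a concurrent snapshot writer that adds a file after the children have been deleted but before the directory itself is removed. It assumes a platform where java.nio.file.Files.delete reports a non-empty directory as DirectoryNotEmptyException, which the JDK documents as an optional specific exception.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SnapshotRaceSketch
{
    // Hypothetical stand-in for a recursive delete: remove the children, then the directory.
    // afterChildrenDeleted runs in the window where a concurrent writer could add a file.
    static void deleteRecursive(Path dir, Runnable afterChildrenDeleted) throws IOException
    {
        if (Files.isDirectory(dir))
        {
            List<Path> children;
            try (Stream<Path> listing = Files.list(dir))
            {
                children = listing.collect(Collectors.toList());
            }
            for (Path child : children)
                deleteRecursive(child, () -> {});
            afterChildrenDeleted.run();
        }
        // Fails if something was written into dir after the listing above.
        Files.delete(dir);
    }

    // Returns true if the simulated interleaving ended in DirectoryNotEmptyException.
    static boolean reproduce()
    {
        try
        {
            Path snapshotDir = Files.createTempDirectory("snapshot-race");
            Files.createFile(snapshotDir.resolve("existing.db"));
            Path lateFile = snapshotDir.resolve("late.db");
            try
            {
                // Simulate the snapshot writer adding a file mid-delete.
                deleteRecursive(snapshotDir, () -> {
                    try
                    {
                        Files.createFile(lateFile);
                    }
                    catch (IOException e)
                    {
                        throw new UncheckedIOException(e);
                    }
                });
                return false;
            }
            catch (DirectoryNotEmptyException expected)
            {
                return true;
            }
            finally
            {
                // Clean up the temporary files regardless of the outcome.
                Files.deleteIfExists(lateFile);
                Files.deleteIfExists(snapshotDir);
            }
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println("DirectoryNotEmptyException reproduced: " + reproduce());
    }
}
```

In a real cluster the late file would come from another thread rather than a hook, so the failure would be timing-dependent instead of guaranteed.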


