Posted to commits@cassandra.apache.org by "Wei Deng (Jira)" <ji...@apache.org> on 2020/08/13 16:11:00 UTC

[jira] [Comment Edited] (CASSANDRA-16047) Potential race condition in creating hard link when incremental backup is turned on

    [ https://issues.apache.org/jira/browse/CASSANDRA-16047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177122#comment-17177122 ] 

Wei Deng edited comment on CASSANDRA-16047 at 8/13/20, 4:10 PM:
----------------------------------------------------------------

The version was mentioned in the description: 3.0.15.

The deployment runs in a public cloud environment on EBS-like, SSD-backed disks with decent latency, throughput and IOPS, so it is hard to believe the culprit is in the OS or I/O layer.


was (Author: weideng):
The version was mentioned in the description: 3.0.15.

> Potential race condition in creating hard link when incremental backup is turned on
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16047
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16047
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/SSTable
>            Reporter: Wei Deng
>            Priority: Urgent
>         Attachments: incremental_backup_hardlink_exception.jpg, incremental_backup_hardlink_exception1.jpg
>
>
> There seems to be a race condition in creating the hard link when incremental backup is turned on.
> The following screenshot was captured in a production cluster running Cassandra 3.0.15 after turning on incremental backup. When this {{NoSuchFileException}} happens, the resulting {{FSWriteError}} combined with the default disk failure policy shuts down the JVM, so it's a pretty critical bug.
>  !incremental_backup_hardlink_exception.jpg!
> Because of the risk of production database downtime (if a similar issue happens on multiple nodes within a short time frame), and because the same exception has already caused multiple JVM shutdowns, incremental backup had to be turned off for now, but this is not an ideal situation.
> !incremental_backup_hardlink_exception1.jpg!
> The deployment runs in a public cloud environment on EBS-like, SSD-backed disks with decent latency, throughput and IOPS, so it is hard to believe the culprit is in the OS or I/O layer. Based on the second screenshot above, the affected table is {{system.size_estimates}}, which sees little flush traffic, so compaction of the source SSTable doesn't seem to be at play here.
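
For readers unfamiliar with the failure mode, here is a minimal sketch of the kind of interleaving the description suspects. It is not Cassandra's code and not a confirmed root cause. It only shows that java.nio's Files.createLink raises NoSuchFileException when the source file has disappeared before the link is created. The class name, method name, and file names are hypothetical, and the "other thread" is simulated by deleting the source file inline.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class HardLinkRaceSketch {

    // Returns true if creating the hard link failed with NoSuchFileException
    // because the source file was removed first.
    public static boolean linkFailsWhenSourceRemoved() {
        try {
            // Hypothetical stand-ins for a table directory and its backups/ subdirectory.
            Path dir = Files.createTempDirectory("hardlink-sketch");
            Path source = Files.createFile(dir.resolve("hypothetical-Data.db"));
            Path backups = Files.createDirectory(dir.resolve("backups"));
            Path link = backups.resolve(source.getFileName());
            try {
                // Assumption: some concurrent actor removes the source before the
                // backup step runs. Simulated here by deleting it inline.
                Files.delete(source);

                // The backup step then tries to hard-link a file that no longer exists.
                Files.createLink(link, source);
                return false;
            } catch (NoSuchFileException e) {
                return true;
            } finally {
                // Clean up the temporary files and directories.
                Files.deleteIfExists(link);
                Files.deleteIfExists(source);
                Files.deleteIfExists(backups);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("NoSuchFileException raised: " + linkFailsWhenSourceRemoved());
    }
}
```

The sketch assumes a filesystem that supports hard links. Which component (if any) removes or renames the source SSTable in the reported case is exactly the open question of this ticket.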


