Posted to commits@cassandra.apache.org by "Marcus Eriksson (JIRA)" <ji...@apache.org> on 2018/09/19 08:26:00 UTC

[jira] [Commented] (CASSANDRA-14763) Fail incremental repair prepare phase if it encounters sstables from un-finalized sessions

    [ https://issues.apache.org/jira/browse/CASSANDRA-14763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620283#comment-16620283 ] 

Marcus Eriksson commented on CASSANDRA-14763:
---------------------------------------------

A few comments:

* The error message from the failing nodetool could be better: {{Repair job has failed with the error message: [2018-09-19 10:01:51,386] null}}. Maybe we could add that the user should look in the logs for further details.
* I left a comment about isPending() on the commit on GitHub.

I wrote a dtest that makes sure we throw an exception if this happens: https://github.com/krummas/cassandra-dtest/commits/marcuse/14763

It also looks like a few repair dtests need fixing.

> Fail incremental repair prepare phase if it encounters sstables from un-finalized sessions
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14763
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14763
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Repair
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Major
>             Fix For: 4.0
>
>
> Raised in CASSANDRA-14685. If we encounter sstables from other IR sessions during an IR prepare phase, we should fail the new session. If we don't, the expectation that all data received before a repair session is consistent when it completes wouldn't always be true.
> In more detail: 
> We don’t have a foolproof way of determining if a repair session has hung. To prevent hung repair sessions from locking up sstables indefinitely, incremental repair sessions will auto-fail after 24 hours. During this time, the sstables for this session will remain isolated from the rest of the data set. Afterwards, the sstables are moved back into the unrepaired set.
>  
> During the prepare phase of an incremental repair, we isolate the data to be repaired. However, we ignore other sstables marked pending repair for the same token range. I think the intention here was to prevent a hung repair from locking up incremental repairs for 24 hours without manual intervention. Assuming the session succeeds, its data will be moved to repaired. _However, the data from a hung session will eventually be moved back to unrepaired._ This means that you can’t use the most recent successful incremental repair as the high water mark for fully repaired data.
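
The check proposed in the description can be sketched roughly as follows. This is only an illustration and not the actual patch: every name in it (PrepareCheck, SSTable, PrepareFailedException, selectForPrepare, finalizedSessions) is hypothetical and does not correspond to Cassandra's real classes. It assumes each sstable carries the id of the session it is pending for (null if none) and that the set of finalized session ids is known. The exception carries a descriptive message so that a caller reporting it would have something more useful than "null" to print.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch; none of these types are Cassandra's real API.
final class PrepareCheck {

    /** Stand-in for an sstable: a name plus the IR session it is pending for (null if none). */
    static final class SSTable {
        final String name;
        final UUID pendingRepair;

        SSTable(String name, UUID pendingRepair) {
            this.name = name;
            this.pendingRepair = pendingRepair;
        }

        boolean isPending() {
            return pendingRepair != null;
        }
    }

    /** Thrown when the prepare phase must fail; always carries a non-null message. */
    static final class PrepareFailedException extends RuntimeException {
        PrepareFailedException(String message) {
            super(message);
        }
    }

    /**
     * Returns the sstables the new session should isolate, or throws if any
     * candidate is still pending for a different, un-finalized session.
     */
    static List<SSTable> selectForPrepare(UUID newSession,
                                          List<SSTable> candidates,
                                          Set<UUID> finalizedSessions) {
        List<SSTable> toIsolate = new ArrayList<>();
        for (SSTable sstable : candidates) {
            boolean pendingElsewhere = sstable.isPending()
                    && !sstable.pendingRepair.equals(newSession)
                    && !finalizedSessions.contains(sstable.pendingRepair);
            if (pendingElsewhere) {
                // Fail the new session instead of silently skipping the sstable.
                throw new PrepareFailedException(
                        "Prepare phase for session " + newSession + " failed: sstable "
                        + sstable.name + " is pending repair in un-finalized session "
                        + sstable.pendingRepair + "; see the logs for further details");
            }
            if (!sstable.isPending()) {
                // Only sstables with no pending session are selected for isolation.
                toIsolate.add(sstable);
            }
        }
        return toIsolate;
    }
}
```

In this sketch, an sstable left behind by a hung session makes the new prepare fail immediately, so a later successful session cannot be mistaken for a high water mark that covers data which will eventually return to unrepaired.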



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org