You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Branimir Lambov (JIRA)" <ji...@apache.org> on 2015/08/18 17:36:47 UTC

[jira] [Commented] (CASSANDRA-9669) If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records

    [ https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701436#comment-14701436 ] 

Branimir Lambov commented on CASSANDRA-9669:
--------------------------------------------

First round of review, focussing at the replay logic:

{{ReplayPosition.add}} is suspect, as [in extending the end|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-2b76f3efa4aaa38339ab8f4869c9b7bfR70] {{extend != null}} implies {{extend.getKey().compareTo(end) >= 0}} and this makes the branch reachable only if {{end == entend.getKey()}} which I don't think is what you want.

The other direction appears correct, but the two can't be symmetrical as we can only key into starts. This is sufficient as {{range\[i\].key/lower < range\[i\].value/upper < range\[i+1\].key/lower}} (which I think you should clarify in a comment) and thus the entry we want to extend us on the right is the one before the first start larger than our end. {{end = max(end, floorEntry(end).value)}} should do it.

There's no need to recursively rerun, extending in either direction has no effect on the other side so you can do the two in sequence.

{{ReplayPosition.min}} should be a bit more descriptively named, e.g. "minUpperBound" or "firstNotCovered".

I'd call [CommitLogReplayer.replay|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-348a1347dacf897385fb0a97116a1b5eR166] "shouldReplay". Consider moving the containment logic (the last two lines) into {{ReplayPosition}} to keep the details of the range format within the class. I believe the intervals are start-inclusive, end-exclusive, thus the [final comparison|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-348a1347dacf897385fb0a97116a1b5eR172] should be {{<=}}.

Could you update the JavaDoc, especially returned values (e.g. [here|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-98f5acb96aa6d684781936c141132e2aR2498] and [here|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-f0a15c3588b56c5ce53ece7c48e325b5R247])? The comment [here|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-348a1347dacf897385fb0a97116a1b5eR333] is outdated.

Is [FlushRunnable.runMayThrow|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-f0a15c3588b56c5ce53ece7c48e325b5R352] still needed? DiskAwareRunnable?

I'll continue looking at rest of the code tomorrow.

> If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9669
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Critical
>              Labels: correctness
>             Fix For: 3.x, 2.1.x, 2.2.x, 3.0.x
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, on restart we simply take the maximum replay position of any sstable on disk, and ignore anything prior. 
> It is quite possible for there to be two flushes triggered for a given table, and for the second to finish first by virtue of containing a much smaller quantity of live data (or perhaps the disk is just under less pressure). If we crash before the first sstable has been written, then on restart the data it would have represented will disappear, since we will not replay the CL records.
> This looks to be a bug present since time immemorial, and also seems pretty serious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)