Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2014/12/02 22:12:12 UTC

[jira] [Commented] (CASSANDRA-8383) Memtable flush may expire records from the commit log that are in a later memtable

    [ https://issues.apache.org/jira/browse/CASSANDRA-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232130#comment-14232130 ] 

Ariel Weisberg commented on CASSANDRA-8383:
-------------------------------------------

Does this deserve a regression test? I almost wish ReplayPosition implemented wrapper methods for GT, GTE, LT and LTE rather than relying on compareTo; for me, parsing that kind of condition carries some mental overhead.
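For illustration, here is a minimal sketch of what such wrappers could look like. The class below is a simplified, hypothetical stand-in for ReplayPosition, assuming it is ordered by a commit log segment id and then an offset within that segment; the gt/gte/lt/lte helpers are the suggestion above, not methods of the real class.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified stand-in for Cassandra's ReplayPosition.
 * Assumed ordering: by segment id, then by offset within the segment.
 * The gt/gte/lt/lte wrappers are illustrative, not part of the real class.
 */
final class ReplayPosition implements Comparable<ReplayPosition> {
    final long segment;   // commit log segment id (assumed field)
    final int position;   // offset within the segment (assumed field)

    ReplayPosition(long segment, int position) {
        this.segment = segment;
        this.position = position;
    }

    @Override
    public int compareTo(ReplayPosition that) {
        int c = Long.compare(segment, that.segment);
        return c != 0 ? c : Integer.compare(position, that.position);
    }

    // Named comparisons so call sites read as intent rather than compareTo arithmetic.
    boolean gt(ReplayPosition that)  { return compareTo(that) > 0; }
    boolean gte(ReplayPosition that) { return compareTo(that) >= 0; }
    boolean lt(ReplayPosition that)  { return compareTo(that) < 0; }
    boolean lte(ReplayPosition that) { return compareTo(that) <= 0; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ReplayPosition)) return false;
        ReplayPosition that = (ReplayPosition) o;
        return segment == that.segment && position == that.position;
    }

    @Override
    public int hashCode() { return Objects.hash(segment, position); }

    @Override
    public String toString() { return "ReplayPosition(" + segment + ", " + position + ")"; }
}
```

With these, a condition such as last.compareTo(write) >= 0 would read as last.gte(write).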

If I understand correctly, when this race occurs and the writing thread loses, the write is kicked forward to the next memtable even though the op group says it could go into the current one.

So, for a memtable to accept a write: (either no barrier exists || the barrier exists but is after the op group) && (if a last replay position is set, it must be >= the replay position of the write).
If it is not set, the writer updates the replay position, so the flusher correctly gets the position of the last write to the memtable.
If the replay position has been finalized, then even though the op group says the write could go into this memtable, it is kicked into the next one. That is harmless, and op order still works because it chains dependencies in order.
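The rule above can be modeled roughly as follows. This is a hypothetical, simplified sketch, not Cassandra's Memtable code: op groups and CL positions are plain longs, the barrier is modeled as the first op group id routed to the next memtable, and the class and method names are invented for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical model of the memtable acceptance rule. Real code uses
 * OpOrder.Group, OpOrder.Barrier and ReplayPosition; here they are longs.
 */
final class MemtableSketch {
    /** Immutable snapshot: last replay position plus whether the flusher froze it. */
    private static final class Last {
        final long position;
        final boolean frozen;
        Last(long position, boolean frozen) {
            this.position = position;
            this.frozen = frozen;
        }
    }

    // First op group that belongs to the next memtable; Long.MAX_VALUE means no barrier yet.
    private volatile long barrier = Long.MAX_VALUE;

    // Position and frozen flag share one atomic reference, so a writer's max()
    // update and the flusher's freeze are ordered against each other.
    private final AtomicReference<Last> last = new AtomicReference<>(new Last(-1, false));

    /** True if a write in opGroup at CL position pos belongs in this memtable. */
    boolean accepts(long opGroup, long pos) {
        // (no barrier || barrier is after the op group)
        if (opGroup >= barrier)
            return false;
        while (true) {
            Last cur = last.get();
            if (cur.frozen)
                return cur.position >= pos;   // frozen: later writes are kicked forward
            if (cur.position >= pos)
                return true;                  // a later position is already recorded
            if (last.compareAndSet(cur, new Last(pos, false)))
                return true;                  // raised the max to this write's position
        }
    }

    /** Flusher, called once: freeze at clPosition, or at the recorded max if higher. */
    long freeze(long clPosition) {
        while (true) {
            Last cur = last.get();
            long p = Math.max(cur.position, clPosition);
            if (last.compareAndSet(cur, new Last(p, true)))
                return p;
        }
    }

    /** Flusher: route op groups >= firstNextGroup to the next memtable. */
    void setBarrier(long firstNextGroup) {
        barrier = firstNextGroup;
    }
}
```

Keeping the position and the frozen flag in a single AtomicReference is the design choice that matters in this sketch: the writer's max() update and the flusher's freeze are compare-and-set operations on the same value, so a write can never be accepted into this memtable at a position beyond the frozen point.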

In effect, the last replay position is frozen earlier, so that when the second op group is created and its writes start interleaving in the CL, anything beyond the frozen position is not considered for truncation after the memtable flushes.

I think this does what I just described, and I think it fixes the problem in the description: once the next op group is created, CL entries from different op groups interleave, so the truncation point used for the CL can cover entries that belong to the next memtable. Freezing the truncation point before creating the second op group solves the problem.

> Memtable flush may expire records from the commit log that are in a later memtable
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8383
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8383
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Critical
>              Labels: commitlog
>             Fix For: 2.1.3
>
>
> This is a pretty obvious bug given any careful thought, so I'm not sure how I managed to introduce it. We use OpOrder to ensure all writes to a memtable have finished before flushing, and we also use this OpOrder to direct writes to the correct memtable. However, this is insufficient, since the OpOrder is only a partial order; an operation from the "future" (i.e. for the next memtable) could still interleave with the "past" operations in such a way that it grabs a CL entry in between the "past" operations. Since we simply take the max ReplayPosition of those in the past, any interleaved future operations would be expired even though they haven't been persisted to disk.
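The interleaving can be replayed deterministically. In this hypothetical sketch (the Entry type and method names are invented for illustration), a past write takes CL position 1, a future write takes 2, and another past write takes 3, which the partial order permits. Taking the max of the past positions truncates up to 3 and expires the unflushed future write at 2; freezing the point at 1 before the next op group starts instead routes the late write at 3 to the next memtable, so nothing unflushed falls at or below the truncation point.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical single-threaded replay of the interleaving in the bug report.
 * Each Entry is a commit log (CL) position plus the memtable its write went to.
 */
final class TruncationDemo {
    static final class Entry {
        final long position;
        final boolean future;   // true = the write lives in the next ("future") memtable
        Entry(long position, boolean future) {
            this.position = position;
            this.future = future;
        }
    }

    /** Old scheme: the flush truncates the CL up to the max position of the "past" writes. */
    static long maxOfPast(List<Entry> log) {
        long max = -1;
        for (Entry e : log)
            if (!e.future)
                max = Math.max(max, e.position);
        return max;
    }

    /** Fixed scheme: any write positioned after the frozen point goes to the next memtable. */
    static List<Entry> routeWithFrozenPoint(List<Entry> log, long frozen) {
        List<Entry> routed = new ArrayList<>();
        for (Entry e : log)
            routed.add(new Entry(e.position, e.future || e.position > frozen));
        return routed;
    }

    /** Counts "future" writes whose CL entries would be expired although not yet flushed. */
    static int lostFutureWrites(List<Entry> log, long truncateUpTo) {
        int lost = 0;
        for (Entry e : log)
            if (e.future && e.position <= truncateUpTo)
                lost++;
        return lost;
    }
}
```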



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)