You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Paulo Motta (JIRA)" <ji...@apache.org> on 2017/08/24 16:15:00 UTC

[jira] [Comment Edited] (CASSANDRA-13069) Local batchlog for MV may not be correctly written on node movements

    [ https://issues.apache.org/jira/browse/CASSANDRA-13069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139795#comment-16139795 ] 

Paulo Motta edited comment on CASSANDRA-13069 at 8/24/17 4:14 PM:
------------------------------------------------------------------

Finally getting back to this after a while, sorry for the delay! Going back to the question of the last review round:

{quote}
bq. As far as I can tell, the goal of using a local batchlog is to guarantee eventual consistency of a the base table and its views. That is, no matter what happens for a given update, either both that update and all the related view updates get eventually applied, or none of it is. So I don't understand why:
1. we don't include local view mutations in the batchlog in SP.mutateMV.
2. the base table mutation isn't included in said batchlog alongside it's related view updates.
{quote}

I ended up writing a [trunk patch|https://github.com/pauloricardomg/cassandra/commit/00a0bef75129a396aeee2cedc89d6a6ec66f610e] to include both local and base table mutations in the batchlog as suggested, but then looking at the original code I figured that whatever failure happens during views update (but before the local base or views are persisted) is safeguarded by the [base tablecommit log write|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Keyspace.java#L595] prior to the view update, so I don't think we actually need to include the local mutations in the batchlog.

Given that remote view writes are [fire and forget|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageProxy.java#L843],  the most probable cause of failure during local view writing would be a crash, and in that case the commit log replay will re-apply the base mutations and views. The two downsides I can think of are:
a) We cannot ensure the commit log is actually persisted before the crash, unless batch commit log is used.
b) The user may have durable_writes=false
c) I'm not sure if many other non-crash failures are possible in this case (fail local base and view mutations), besides full/corrupted disk, in which case you are screwed anyway, but if they happen you'll need to wait until the next restart/commitlog replay to have your base-views consistent.

There's not much we can do about a), so it seems we'll just need to live with this and rely on repair to fix potential inconsistencies? b) is actually a problem even with including local mutations in the batchlog write, unless we force batchlog writes to be durable. c) is not a big deal unless there are other legitimate non-crash scenarios which I'm not aware of.

Adding the local base mutation to the view batchlog requires the following changes:
a) Special case the batchlog write-path to [skip writing the base mutation|https://github.com/pauloricardomg/cassandra/commit/00a0bef75129a396aeee2cedc89d6a6ec66f610e#diff-b8ec5deb7141558cf8f868460077be31R94], since that will be written by the calling thread after the view updates, and [ack it|https://github.com/pauloricardomg/cassandra/commit/00a0bef75129a396aeee2cedc89d6a6ec66f610e#diff-9ca8d303d375e01a14d42154eaf46e37R556] after the base table write is done so the batchlog can be clean.
b) Special case the batchlog [replay path|https://github.com/pauloricardomg/cassandra/commit/00a0bef75129a396aeee2cedc89d6a6ec66f610e#diff-c77d5f1027ee5e8a49e106903a4ca937R458] to avoid generating views when replaying base table mutations in the case of view batchlogs.

So, unless there's a good reason not mentioned above to include the local mutations on the view batchlog, I'd prefer to keep the current approach of writing only remote view mutations in the view batchlog. If you agree with that, I [added a comment|https://github.com/pauloricardomg/cassandra/commit/8314ced9f1077224008e79e4c23d00d3277073d5#diff-71f06c193f5b5e270cf8ac695164f43aR732] in the original patch explaining why the local mutations are not included in the local batchlog to avoid confusion in the future. Please find the updated patch below:

||3.0||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-13069]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-13069]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:13069]|
|[testall|http://jenkins-cassandra.datastax.lan/view/Dev/view/paulomotta/job/pauloricardomg-3.0-13069-testall/lastCompletedBuild/testReport/]|[testall|http://jenkins-cassandra.datastax.lan/view/Dev/view/paulomotta/job/pauloricardomg-trunk-13069-testall/lastCompletedBuild/testReport/]|
|[dtest|http://jenkins-cassandra.datastax.lan/view/Dev/view/paulomotta/job/pauloricardomg-3.0-13069-dtest/lastCompletedBuild/testReport/]|[dtest|http://jenkins-cassandra.datastax.lan/view/Dev/view/paulomotta/job/pauloricardomg-trunk-13069-dtest/lastCompletedBuild/testReport/]|


I've added 2 [dtests|https://github.com/pauloricardomg/cassandra-dtest/commit/f0b7983dd26cd269c8f629f6de8ddd585e644b29#diff-62ba429edee6a4681782f078246c9893R1575] to check that base and view are consistent on crash before and after the view is applied, which detected that a newly [created|https://github.com/apache/cassandra/blob/8314ced9f1077224008e79e4c23d00d3277073d5/src/java/org/apache/cassandra/service/StorageProxy.java#L726] batchlog from commit-log replay is not immediately replayed due to the batchlog replay delay, so I removed that wait on the [first replay|https://github.com/apache/cassandra/blob/96168c00a85559fd687d64f9df664b08711b933b/src/java/org/apache/cassandra/batchlog/BatchlogManager.java#L210]. I also updated the MV test to verify that the view batchlog is [empty after replay|https://github.com/pauloricardomg/cassandra-dtest/commit/f0b7983dd26cd269c8f629f6de8ddd585e644b29#diff-62ba429edee6a4681782f078246c9893R124].

I did a few simplifications in the MV and batchlog write path in the [other patch|https://github.com/pauloricardomg/cassandra/tree/trunk-13069-add-local-to-view-batchlog] which may be worth keeping, so if we decide keeping the current approach of skipping local mutations on the batchlog I will open a new ticket with the refactoring suggestions.


was (Author: pauloricardomg):
Finally getting back to this after a while, sorry for the delay! Going back to the question of the last review round:

{quote}
bq. As far as I can tell, the goal of using a local batchlog is to guarantee eventual consistency of a the base table and its views. That is, no matter what happens for a given update, either both that update and all the related view updates get eventually applied, or none of it is. So I don't understand why:
1. we don't include local view mutations in the batchlog in SP.mutateMV.
2. the base table mutation isn't included in said batchlog alongside it's related view updates.
{quote}

I ended up writing a [trunk patch|https://github.com/pauloricardomg/cassandra/commit/00a0bef75129a396aeee2cedc89d6a6ec66f610e] to include both local and base table mutations in the batchlog as suggested, but then looking at the original code I figured that whatever failure happens during views update (but before the local base or views are persisted) is safeguarded by the [base tablecommit log write|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Keyspace.java#L595] prior to the view update, so I don't think we actually need to include the local mutations in the batchlog.

Given that remote view writes are [fire and forget|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageProxy.java#L843],  the most probable cause of failure during local view writing would be a crash, and in that case the commit log replay will re-apply the base mutations and views. The two downsides I can think of are:
a) We cannot ensure the commit log is actually persisted before the crash, unless batch commit log is used.
b) The user may have durable_writes=false
c) I'm not sure if many other non-crash failures are possible in this case (fail local base and view mutations), besides full/corrupted disk, in which case you are screwed anyway, but if they happen you'll need to wait until the next restart/commitlog replay to have your base-views consistent.

There's not much we can do about a), so it seems we'll just need to live with this and rely on repair to fix potential inconsistencies? b) is actually a problem even with including local mutations in the batchlog write, unless we force batchlog writes to be durable. c) is not a big deal unless there are other legitimate non-crash scenarios which I'm not aware of.

Adding the local base mutation to the view batchlog requires the following changes:
a) Special case the batchlog write-path to [skip writing the base mutation|https://github.com/pauloricardomg/cassandra/commit/00a0bef75129a396aeee2cedc89d6a6ec66f610e#diff-b8ec5deb7141558cf8f868460077be31R94], since that will be written by the calling thread after the view updates, and [ack it|https://github.com/pauloricardomg/cassandra/commit/00a0bef75129a396aeee2cedc89d6a6ec66f610e#diff-9ca8d303d375e01a14d42154eaf46e37R556] after the base table write is done so the batchlog can be clean.
b) Special case the batchlog [replay path|https://github.com/pauloricardomg/cassandra/commit/00a0bef75129a396aeee2cedc89d6a6ec66f610e#diff-c77d5f1027ee5e8a49e106903a4ca937R458] to avoid generating views when replaying base table mutations in the case of view batchlogs.

So, unless there's a good reason not mentioned above to include the local mutations on the view batchlog, I'd prefer to keep the current approach of writing only remote view mutations in the view batchlog. If you agree with that, I [added a comment|https://github.com/pauloricardomg/cassandra/commit/8314ced9f1077224008e79e4c23d00d3277073d5#diff-71f06c193f5b5e270cf8ac695164f43aR732] in the original patch explaining why the local mutations are not included in the local batchlog to avoid confusion in the future. Please find the updated patch below:

||3.0||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-13069]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-13069]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:13069]|
|[testall|http://jenkins-cassandra.datastax.lan/view/Dev/view/paulomotta/job/pauloricardomg-3.0-13069-testall/lastCompletedBuild/testReport/]|[testall|http://jenkins-cassandra.datastax.lan/view/Dev/view/paulomotta/job/pauloricardomg-trunk-13069-testall/lastCompletedBuild/testReport/]|
|[dtest|http://jenkins-cassandra.datastax.lan/view/Dev/view/paulomotta/job/pauloricardomg-3.0-13069-dtest/lastCompletedBuild/testReport/]|[dtest|http://jenkins-cassandra.datastax.lan/view/Dev/view/paulomotta/job/pauloricardomg-trunk-13069-dtest/lastCompletedBuild/testReport/]|


I've added 2 [dtests|https://github.com/pauloricardomg/cassandra-dtest/commit/f0b7983dd26cd269c8f629f6de8ddd585e644b29#diff-62ba429edee6a4681782f078246c9893R1575] to check that base and view are consistent on crash before and after the view is applied, which detected that recovered batchlog from commitlogs is not replayed straight away due to the batchlog timeout, so I removed that wait on the [first replay|https://github.com/pauloricardomg/cassandra/commit/8314ced9f1077224008e79e4c23d00d3277073d5#diff-c77d5f1027ee5e8a49e106903a4ca937R200]. I also updated the MV test to verify that the view batchlog is [empty after replay|https://github.com/pauloricardomg/cassandra-dtest/commit/f0b7983dd26cd269c8f629f6de8ddd585e644b29#diff-62ba429edee6a4681782f078246c9893R124].

I did a few simplifications in the MV and batchlog write path in the [other patch|https://github.com/pauloricardomg/cassandra/tree/trunk-13069-add-local-to-view-batchlog] which may be worth keeping, so if we decide keeping the current approach of skipping local mutations on the batchlog I will open a new ticket with the refactoring suggestions.

> Local batchlog for MV may not be correctly written on node movements
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-13069
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13069
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Materialized Views
>            Reporter: Sylvain Lebresne
>            Assignee: Paulo Motta
>
> Unless I'm really reading this wrong, I think the code [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageProxy.java#L829-L843], which comes from CASSANDRA-10674, isn't working properly.
> More precisely, I believe we can have both paired and unpaired mutations, so that both {{if}} can be taken, but if that's the case, the 2nd write to the batchlog will basically overwrite (remove) the batchlog write of the 1st {{if}} and I don't think that's the intention. In practice, this means "paired" mutation won't be in the batchlog, which mean they won't be replayed at all if they fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org