You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2021/10/08 18:37:50 UTC

[GitHub] [couchdb] nickva opened a new pull request #3780: Try harder to avoid a change feed rewind after a shard move

nickva opened a new pull request #3780:
URL: https://github.com/apache/couchdb/pull/3780


   In the previous attempt [1] we improved the logic by spawning workers on the matching target shards only. However, that wasn't enough as workers might still reject the passed in sequence from the old node when it asserts ownership
   locally on each shard.
   
   Re-use the already existing replacement clause, where after uuid is matched, we try harder to find the highest viable sequence. To use the unit test setup as an example, if the moved from `node1` to `node2`, and recorded epoch `{node2, 10}`
   on the new node, then a sequence generated on `node1` before the move, for example `12`, would rewind down to `10` only when calculated on new location on `node2`, instead of being rewound all the way down to `0`.
   
   [1] https://github.com/apache/couchdb/commit/e83935c7f8c3e47b47f07f22ece327f6529d4da0
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3780: Try harder to avoid a change feed rewind after a shard move

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3780:
URL: https://github.com/apache/couchdb/pull/3780#discussion_r725272278



##########
File path: src/couch/src/couch_db.erl
##########
@@ -2047,6 +2051,18 @@ t_calculate_start_seq_epoch_mismatch() ->
         ?assertEqual(0, Seq)
     end).
 
+t_calculate_start_seq_shard_move() ->
+    ?_test(begin
+        Db = test_util:fake_db([]),
+        % Sequence when shard was on node1
+        ?assertEqual(2, calculate_start_seq(Db, node1, {2, <<"foo">>})),
+        % Sequence from node1 after the move happened, we reset back to the
+        % start of the epoch on node2 = 10
+        ?assertEqual(10, calculate_start_seq(Db, node1, {16, <<"foo">>})),

Review comment:
       > ?assertEqual(10, calculate_start_seq(Db, node1, {8, <<"foo">>})),
   
   Good question!
   
   `calculate_start_seq(Db, node1, {8, <<"foo">>})),` should be 8, because sequence 8 was generated on node1 before the the shard was moved to node2. It was moved to node2 on sequence 10. This is essentially the same case as `?assertEqual(2, calculate_start_seq(Db, node1, {2, <<"foo">>})),` in the test.
   
   In other words, for sequences from 1-9 when the shard lived on node1 and the _changes feed has sequence  {node1, 1}, {node1, 2}, {node1, 9} we'd start streaming from 1, 2 or 9 safely.
   
   But then, the shard was moved to node2, the first time it was opened on node2 we recorded its new epoch on node2 as with `Seq = 10`. So from now on, even if the shard copy `node1` was updated and generated say sequence `15`, when we get `{node1, 15}` we can't start streaming from 15, but know we've been node2 since `Seq = 10`, so we start with `Seq = 10` then. Previously we could have started with 0 which is what this PR is fixing.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva merged pull request #3780: Try harder to avoid a change feed rewind after a shard move

Posted by GitBox <gi...@apache.org>.
nickva merged pull request #3780:
URL: https://github.com/apache/couchdb/pull/3780


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3780: Try harder to avoid a change feed rewind after a shard move

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3780:
URL: https://github.com/apache/couchdb/pull/3780#discussion_r725272278



##########
File path: src/couch/src/couch_db.erl
##########
@@ -2047,6 +2051,18 @@ t_calculate_start_seq_epoch_mismatch() ->
         ?assertEqual(0, Seq)
     end).
 
+t_calculate_start_seq_shard_move() ->
+    ?_test(begin
+        Db = test_util:fake_db([]),
+        % Sequence when shard was on node1
+        ?assertEqual(2, calculate_start_seq(Db, node1, {2, <<"foo">>})),
+        % Sequence from node1 after the move happened, we reset back to the
+        % start of the epoch on node2 = 10
+        ?assertEqual(10, calculate_start_seq(Db, node1, {16, <<"foo">>})),

Review comment:
       > ?assertEqual(10, calculate_start_seq(Db, node1, {8, <<"foo">>})),
   
   Good question!
   
   `calculate_start_seq(Db, node1, {8, <<"foo">>})),` should be 8, because sequence 8 was generated on node1 before the the shard was moved to node2. It was moved to node2 on sequence 10. This is essentially the same case as `?assertEqual(2, calculate_start_seq(Db, node1, {2, <<"foo">>})),` in the test.
   
   In other words, for sequences from 1-9 when the shard lived on `node1` and the _changes feed has sequences  `{node1, 1}`, or `{node1, 2}`, or `{node1, 9}` etc., we'd start streaming from `1`, `2` or `9` safely.
   
   But then, the shard was moved to node2, the first time it was opened on node2 we recorded its new epoch on `node2` as with `Seq = 10`. So from now on, even if the shard copy on `node1` was updated and generated say sequence `15`, when we get `{node1, 15}` we can't start streaming from `15`, but know we've been `node2` since `Seq = 10`, so we start with `Seq = 10` then. Previously we could have started with 0 which is what this PR is fixing.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3780: Try harder to avoid a change feed rewind after a shard move

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3780:
URL: https://github.com/apache/couchdb/pull/3780#discussion_r725257375



##########
File path: src/couch/src/couch_db.erl
##########
@@ -2047,6 +2051,18 @@ t_calculate_start_seq_epoch_mismatch() ->
         ?assertEqual(0, Seq)
     end).
 
+t_calculate_start_seq_shard_move() ->
+    ?_test(begin
+        Db = test_util:fake_db([]),
+        % Sequence when shard was on node1
+        ?assertEqual(2, calculate_start_seq(Db, node1, {2, <<"foo">>})),
+        % Sequence from node1 after the move happened, we reset back to the
+        % start of the epoch on node2 = 10
+        ?assertEqual(10, calculate_start_seq(Db, node1, {16, <<"foo">>})),

Review comment:
       Trying to think if this ever happens and if it's the right result:
   ```
   ?assertEqual(10, calculate_start_seq(Db, node1, {8, <<"foo">>})),
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on pull request #3780: Try harder to avoid a change feed rewind after a shard move

Posted by GitBox <gi...@apache.org>.
nickva commented on pull request #3780:
URL: https://github.com/apache/couchdb/pull/3780#issuecomment-939089720


   @jaydoane good catch, Thanks!
   
   Fixed the commit message and the PR comment.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3780: Try harder to avoid a change feed rewind after a shard move

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3780:
URL: https://github.com/apache/couchdb/pull/3780#discussion_r725274772



##########
File path: src/couch/src/couch_db.erl
##########
@@ -2047,6 +2051,18 @@ t_calculate_start_seq_epoch_mismatch() ->
         ?assertEqual(0, Seq)
     end).
 
+t_calculate_start_seq_shard_move() ->
+    ?_test(begin
+        Db = test_util:fake_db([]),
+        % Sequence when shard was on node1
+        ?assertEqual(2, calculate_start_seq(Db, node1, {2, <<"foo">>})),
+        % Sequence from node1 after the move happened, we reset back to the
+        % start of the epoch on node2 = 10
+        ?assertEqual(10, calculate_start_seq(Db, node1, {16, <<"foo">>})),

Review comment:
       Thank you for the very detailed answer!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on pull request #3780: Try harder to avoid a change feed rewind after a shard move

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on pull request #3780:
URL: https://github.com/apache/couchdb/pull/3780#issuecomment-939090703


   +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org