You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Mike Percy (Code Review)" <ge...@cloudera.org> on 2017/08/30 01:09:31 UTC

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Hello David Ribeiro Alves, Alexey Serbin,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/7888

to review the following change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................

consensus: Save previous last-logged OpId across tablet copies

Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
5 files changed, 79 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/7888/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7888/1/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

Line 2854:   vector<string> master_flags = {
nit: const?


http://gerrit.cloudera.org:8080/#/c/7888/1/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

PS1, Line 348:   RETURN_NOT_OK(DownloadBlocks());
             :   RETURN_NOT_OK(DownloadWALs());
Just curious: what if the crash happens between calls of there two methods?  Will it be handled appropriately as well?


-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7888

to look at the new patch set (#2).

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................

consensus: Save previous last-logged OpId across tablet copies

Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
5 files changed, 79 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/7888/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7888/2/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS2, Line 2893: ASSERT_OK(cluster_->tablet_server_by_uuid(leader_uuid)->Restart());
> Thank you for the clarification.
Err, I slightly messed up my explanation (I was thinking about another test I recently wrote). I should have said:

it was simpler to write than looping and trying to make sure we revived the tombstoned one (the follower) and one (but not both) of the others.


-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 2:

(1 comment)

Just passing through...

http://gerrit.cloudera.org:8080/#/c/7888/2//COMMIT_MSG
Commit Message:

Line 7: consensus: Save previous last-logged OpId across tablet copies
> nit: as an afterthought, I it would be nice to add some more information on
Could you flesh out the commit description? Why is saving the last-logged OpId a good thing (or the right thing) to do?


-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


consensus: Save previous last-logged OpId across tablet copies

When a replica falls behind the leader's log, and then fails a tablet
copy, it should still be able to vote in leader elections while
tombstoned. With this patch, that becomes possible. This helps with
availability.

Without this patch, asking a post-tablet copy failure replica to vote
would result in the following RPC response:

IllegalState: must be running to vote when last-logged opid is not known

Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Reviewed-on: http://gerrit.cloudera.org:8080/7888
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: David Ribeiro Alves <da...@gmail.com>
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
5 files changed, 81 insertions(+), 13 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 4: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7888/2/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS2, Line 2846:   vector<string> ts_flags;
> Yes, this test failed before the fix.
ok, great!


PS2, Line 2893: workload.set_timeout_allowed(true);
> Yes, we should be able to bring back any other two, since that was a majori
Thank you for the clarification.


-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7888

to look at the new patch set (#4).

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................

consensus: Save previous last-logged OpId across tablet copies

When a replica falls behind the leader's log, and then fails a tablet
copy, it should still be able to vote in leader elections while
tombstoned. With this patch, that becomes possible. This helps with
availability.

Without this patch, asking a post-tablet copy failure replica to vote
would result in the following RPC response:

IllegalState: must be running to vote when last-logged opid is not known

Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
5 files changed, 81 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/7888/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7888/4/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS4, Line 2884:   int follower_idx = cluster_->tablet_server_index_by_uuid(follower_uuid);
              :   ASSERT_OK(inspect_->CheckTabletDataStateOnTS(follower_idx, tablet_id_, { TABLET_DATA_COPYING }));
              : 
              :   ASSERT_OK(cluster_->master()->Restart());
              :   ASSERT_OK(cluster_->tablet_server_by_uuid(leader_uuid)->Restart());
              :   ASSERT_OK(cluster_->tablet_server_by_uuid(follower_uuid)->Restart());
> hum, but that would happen also if there was a last op id, right? (this is 
Just reiterating what we discussed over chat: since we're now clobbering the last-logged opid in one fewer instance, we're able to successfully vote in that case, which serves as a functional regression test.


-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7888/4/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS4, Line 2884:   int follower_idx = cluster_->tablet_server_index_by_uuid(follower_uuid);
              :   ASSERT_OK(inspect_->CheckTabletDataStateOnTS(follower_idx, tablet_id_, { TABLET_DATA_COPYING }));
              : 
              :   ASSERT_OK(cluster_->master()->Restart());
              :   ASSERT_OK(cluster_->tablet_server_by_uuid(leader_uuid)->Restart());
              :   ASSERT_OK(cluster_->tablet_server_by_uuid(follower_uuid)->Restart());
is there a way to assert that the follower doesn't have a last op id?


-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7888/4/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS4, Line 2884:   int follower_idx = cluster_->tablet_server_index_by_uuid(follower_uuid);
              :   ASSERT_OK(inspect_->CheckTabletDataStateOnTS(follower_idx, tablet_id_, { TABLET_DATA_COPYING }));
              : 
              :   ASSERT_OK(cluster_->master()->Restart());
              :   ASSERT_OK(cluster_->tablet_server_by_uuid(leader_uuid)->Restart());
              :   ASSERT_OK(cluster_->tablet_server_by_uuid(follower_uuid)->Restart());
> is there a way to assert that the follower doesn't have a last op id?
It will in this case, and we do assert on it functionally, because if it didn't then the new leader would not get elected and be able to tablet copy over the follower, and we would not be able to write to the tablet again below.


-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: No

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 2:

(5 comments)

Added some after-thought nits: they are about making sure my understanding was correct.

http://gerrit.cloudera.org:8080/#/c/7888/2//COMMIT_MSG
Commit Message:

Line 7: consensus: Save previous last-logged OpId across tablet copies
nit: as an afterthought, I it would be nice to add some more information on how the system behave prior to this patch in the particular case which is now exercised by the test.  Is that something like 'IllegalState: must be running to vote when last-logged opid is not known' or it was something more cryptic?

Would it make sense?


http://gerrit.cloudera.org:8080/#/c/7888/2/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS2, Line 2846: TEST_F(RaftConsensusITest, TestTombstonedVoteAfterFailedTabletCopy)
Just for my better understanding: did this test fail before the fix?


PS2, Line 2849:   vector<string> ts_flags = {
              :     // We write 128KB cells in this test, so bump the limit.
              :     "--max_cell_size_bytes=1000000"
              :   };
nit: could you add a comment explaining why it's crucial to write above the default cell size limit?


Line 2860:   ASSERT_EQ(3, active_tablet_servers.size());
nit: Since the FLAGS_num_replicas might be altered in the command line, would it make sense to add:
  ASSERT_EQ(3, FLAGS_num_replicas);
?


PS2, Line 2893: ASSERT_OK(cluster_->tablet_server_by_uuid(leader_uuid)->Restart());
nit (just wanted to make sure my understanding is correct): would it work if bringing back not the former leader, but the other replica (not the crashed follower, of course)?


-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7888/2//COMMIT_MSG
Commit Message:

Line 7: consensus: Save previous last-logged OpId across tablet copies
> Could you flesh out the commit description? Why is saving the last-logged O
Done; Done


http://gerrit.cloudera.org:8080/#/c/7888/2/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS2, Line 2846: TEST_F(RaftConsensusITest, TestTombstonedVoteAfterFailedTabletCopy)
> Just for my better understanding: did this test fail before the fix?
Yes, this test failed before the fix.


PS2, Line 2849:   vector<string> ts_flags = {
              :     // We write 128KB cells in this test, so bump the limit.
              :     "--max_cell_size_bytes=1000000"
              :   };
> nit: could you add a comment explaining why it's crucial to write above the
I just moved this to CauseFollowerToFallBehindLogGC()


Line 2860:   ASSERT_EQ(3, active_tablet_servers.size());
> nit: Since the FLAGS_num_replicas might be altered in the command line, wou
Done


PS2, Line 2893: ASSERT_OK(cluster_->tablet_server_by_uuid(leader_uuid)->Restart());
> nit (just wanted to make sure my understanding is correct): would it work i
Yes, we should be able to bring back any other two, since that was a majority. However I already had a handle to these two, and it was simpler to write than looping and trying to make sure we revived the tombstoned one and two (but not all 3) of the others.


-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7888/1/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

Line 2854:   vector<string> master_flags = {
> nit: const?
Done


http://gerrit.cloudera.org:8080/#/c/7888/1/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

PS1, Line 348:   RETURN_NOT_OK(DownloadBlocks());
             :   RETURN_NOT_OK(DownloadWALs());
> Just curious: what if the crash happens between calls of there two methods?
We may orphan the blocks, but nothing "bad" happens. Relevant tests are in tablet_copy-itest (see BadTabletCopyITest).


-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7888/4/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS4, Line 2884:   int follower_idx = cluster_->tablet_server_index_by_uuid(follower_uuid);
              :   ASSERT_OK(inspect_->CheckTabletDataStateOnTS(follower_idx, tablet_id_, { TABLET_DATA_COPYING }));
              : 
              :   ASSERT_OK(cluster_->master()->Restart());
              :   ASSERT_OK(cluster_->tablet_server_by_uuid(leader_uuid)->Restart());
              :   ASSERT_OK(cluster_->tablet_server_by_uuid(follower_uuid)->Restart());
> It will in this case, and we do assert on it functionally, because if it di
hum, but that would happen also if there was a last op id, right? (this is the "additional" case you're solving here, or am I missing something)


-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7888/3//COMMIT_MSG
Commit Message:

PS3, Line 14: poat
> Thanks for adding the description. There's a typo here.
erg


-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7888/3//COMMIT_MSG
Commit Message:

PS3, Line 14: poat
Thanks for adding the description. There's a typo here.


-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7888

to look at the new patch set (#3).

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................

consensus: Save previous last-logged OpId across tablet copies

When a replica falls behind the leader's log, and then fails a tablet
copy, it should still be able to vote in leader elections while
tombstoned. With this patch, that becomes possible. This helps with
availability.

Without this patch, asking a poat-tablet copy failure replica to vote
would result in the following RPC response:

IllegalState: must be running to vote when last-logged opid is not known

Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
5 files changed, 81 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/7888/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] consensus: Save previous last-logged OpId across tablet copies

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: consensus: Save previous last-logged OpId across tablet copies
......................................................................


Patch Set 2: Code-Review+2

LGTM

You might want to get more feedback from David, though.

-- 
To view, visit http://gerrit.cloudera.org:8080/7888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie886764adbdc6d84406df44d6cdb693d038bcd57
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: No