Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2018/03/15 06:55:38 UTC

[kudu-CR](branch-1.7.x) [consensus] FAILED_UNRECOVERABLE replica health status

Hello Mike Percy,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/9649

to review the following change.


Change subject: [consensus] FAILED_UNRECOVERABLE replica health status
......................................................................

[consensus] FAILED_UNRECOVERABLE replica health status

Added HealthStatus::FAILED_UNRECOVERABLE for a tablet replica. This
status marks replicas that cannot catch up with the leader, either
because the WAL segments they need have been garbage-collected or
because of other unrecoverable failures.

With the introduction of the FAILED_UNRECOVERABLE health status, the
replica management scheme becomes hybrid: the system evicts replicas
with FAILED_UNRECOVERABLE health status before adding a replacement
if it anticipates that it can commit the transaction.
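
The evict-first rule described above can be sketched roughly as follows.
This is a minimal illustration, not the code in quorum_util.cc: the
Health enum, the Voter struct, and PickVoterToEvictFirst() are
hypothetical names, and "anticipates that it can commit" is simplified
here to "the voters reported as healthy form a majority of the current
config".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not Kudu's actual types.
enum class Health { UNKNOWN, HEALTHY, FAILED, FAILED_UNRECOVERABLE };

struct Voter {
  std::string uuid;
  Health health;
};

// Number of voters that make up a strict majority of `num_voters`.
inline std::size_t MajoritySize(std::size_t num_voters) {
  return num_voters / 2 + 1;
}

// Returns the first FAILED_UNRECOVERABLE voter that may be evicted before
// a replacement is added, or nullptr if there is none or if eviction looks
// unsafe. Assumption: eviction is considered safe only when the voters
// reported as HEALTHY form a majority of the current config, so the
// config change is expected to commit.
inline const Voter* PickVoterToEvictFirst(const std::vector<Voter>& voters) {
  std::size_t healthy = 0;
  const Voter* candidate = nullptr;
  for (const Voter& v : voters) {
    if (v.health == Health::HEALTHY) {
      ++healthy;
    } else if (v.health == Health::FAILED_UNRECOVERABLE && candidate == nullptr) {
      candidate = &v;
    }
  }
  if (candidate != nullptr && healthy >= MajoritySize(voters.size())) {
    return candidate;
  }
  return nullptr;
}
```

Under these assumptions, a three-voter config with two HEALTHY voters
and one FAILED_UNRECOVERABLE voter yields the unrecoverable voter for
immediate eviction. If one of the other two voters is FAILED instead,
the function returns nullptr and the add-a-replacement-first path would
apply.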

This patch is part of the fix for KUDU-2342. It also addresses
KUDU-2322: evicting voter replicas more aggressively if they
fall behind the log segment GC threshold.

Change-Id: I35637c5bda6681b732dbc2bbf94b9d4258b12095
Reviewed-on: http://gerrit.cloudera.org:8080/9625
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Mike Percy <mp...@apache.org>
(cherry picked from commit a74f9a0dcaf88315c8563b95cdeb5701d9ce5438)
---
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/quorum_util-test.cc
M src/kudu/consensus/quorum_util.cc
M src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
6 files changed, 455 insertions(+), 164 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/9649/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.7.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I35637c5bda6681b732dbc2bbf94b9d4258b12095
Gerrit-Change-Number: 9649
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR](branch-1.7.x) [consensus] FAILED_UNRECOVERABLE replica health status

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9649 )

Change subject: [consensus] FAILED_UNRECOVERABLE replica health status
......................................................................

[consensus] FAILED_UNRECOVERABLE replica health status

Added HealthStatus::FAILED_UNRECOVERABLE for a tablet replica. This
status marks replicas that cannot catch up with the leader, either
because the WAL segments they need have been garbage-collected or
because of other unrecoverable failures.

With the introduction of the FAILED_UNRECOVERABLE health status, the
replica management scheme becomes hybrid: the system evicts replicas
with FAILED_UNRECOVERABLE health status before adding a replacement
if it anticipates that it can commit the transaction.

This patch is part of the fix for KUDU-2342. It also addresses
KUDU-2322: evicting voter replicas more aggressively if they
fall behind the log segment GC threshold.

Change-Id: I35637c5bda6681b732dbc2bbf94b9d4258b12095
Reviewed-on: http://gerrit.cloudera.org:8080/9625
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Mike Percy <mp...@apache.org>
(cherry picked from commit a74f9a0dcaf88315c8563b95cdeb5701d9ce5438)
Reviewed-on: http://gerrit.cloudera.org:8080/9649
Tested-by: Kudu Jenkins
Reviewed-by: Grant Henke <gr...@gmail.com>
---
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/quorum_util-test.cc
M src/kudu/consensus/quorum_util.cc
M src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
6 files changed, 455 insertions(+), 164 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Grant Henke: Looks good to me, approved

Gerrit-Project: kudu
Gerrit-Branch: branch-1.7.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I35637c5bda6681b732dbc2bbf94b9d4258b12095
Gerrit-Change-Number: 9649
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](branch-1.7.x) [consensus] FAILED_UNRECOVERABLE replica health status

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/9649 )

Change subject: [consensus] FAILED_UNRECOVERABLE replica health status
......................................................................


Patch Set 1: Code-Review+2


Gerrit-Project: kudu
Gerrit-Branch: branch-1.7.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I35637c5bda6681b732dbc2bbf94b9d4258b12095
Gerrit-Change-Number: 9649
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 15 Mar 2018 15:43:36 +0000
Gerrit-HasComments: No