Posted to dev@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/05/09 19:08:46 UTC

[kudu-CR] Don't crash TS if consensus metadata is corrupted

Hello Adar Dembo,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/3006

to review the following change.

Change subject: Don't crash TS if consensus metadata is corrupted
......................................................................

Don't crash TS if consensus metadata is corrupted

If the consensus metadata somehow gets corrupted with a too-early term, the TS
should not crash with a CHECK failure. Instead, it should just mark that tablet
as FAILED.
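
A minimal sketch of the idea, not the code in this patch: a fatal CHECK is
replaced by a returned error, and the caller marks only that tablet as FAILED.
All names below (SketchStatus, SketchTabletState, ValidateTerm, StartTablet)
are hypothetical and are not Kudu APIs. It also assumes that "too-early" means
the persisted term is lower than a term already recorded elsewhere, such as in
the log.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a status type; this is NOT Kudu's Status class.
struct SketchStatus {
  bool ok;
  std::string message;
  static SketchStatus OK() { return SketchStatus{true, ""}; }
  static SketchStatus Corruption(std::string msg) {
    return SketchStatus{false, std::move(msg)};
  }
};

// Hypothetical per-tablet lifecycle state.
enum class SketchTabletState { RUNNING, FAILED };

// Assumption: "too-early" means the term persisted in the consensus metadata
// is lower than a term already recorded elsewhere (for example, in the log).
SketchStatus ValidateTerm(int64_t persisted_term, int64_t last_logged_term) {
  if (persisted_term < last_logged_term) {
    // A fatal CHECK here would bring down the whole process; returning an
    // error lets the caller decide what to do with this one tablet.
    return SketchStatus::Corruption(
        "consensus metadata term " + std::to_string(persisted_term) +
        " is earlier than logged term " + std::to_string(last_logged_term));
  }
  return SketchStatus::OK();
}

// Starts one tablet; a validation failure fails only this tablet.
SketchTabletState StartTablet(int64_t persisted_term,
                              int64_t last_logged_term) {
  SketchStatus s = ValidateTerm(persisted_term, last_logged_term);
  return s.ok ? SketchTabletState::RUNNING : SketchTabletState::FAILED;
}
```

In this sketch, StartTablet(3, 7) yields FAILED for that tablet alone, while
StartTablet(5, 5) yields RUNNING, so other tablets on the same server are
unaffected by one corrupted record.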

Currently, the leader does not auto-evict a FAILED replica. However, the
administrator can use the CLI tools to delete the bad replica, which should
cause it to be repaired automatically.

This fix is based on an issue encountered in Bruce Song Zhang's cluster. His
cluster had been affected by KUDU-1436, which caused tablets on many servers to
have incorrect consensus metadata. Because of the CHECK that was in place, he
was unable to restart and recover those servers, causing an outage. With this
patch in place, only the tablets with corrupted metadata would have been
affected, and assuming a majority of replicas were still available, the
table's availability would not have been compromised.

Change-Id: If9f85c1ce31a32e89e57c74e9750e66073b9752c
---
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.h
M src/kudu/integration-tests/raft_consensus-itest.cc
4 files changed, 73 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/3006/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3006
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If9f85c1ce31a32e89e57c74e9750e66073b9752c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>