Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2017/11/01 19:29:07 UTC
[kudu-CR] shutdown tablets on disk failure at runtime
Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8442 )
Change subject: shutdown tablets on disk failure at runtime
......................................................................
shutdown tablets on disk failure at runtime
Previously, various code paths passed disk failure Statuses along until
they eventually hit a CHECK failure and crashed the server. Such fatal
errors were "safe" by design, as they ensured that no additional changes
were made durable to each tablet. This patch aims to achieve similar
behavior for failed replicas while keeping the server alive.
Disk failures are now tolerated, provided the following have occurred
for each tablet in the affected directory:
* The failed directory is immediately marked as failed, preventing
further tablets from being striped across the failed disk.
* The tablet's MvccManager is shut down, preventing further writes from
being made durable and preventing further I/O to the tablet.
* A request is submitted to a threadpool to eventually shut the replica
down completely and mark it for eviction.
Note: failures of the metadata directory and the WAL directory are
fatal.
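The sequence above can be sketched roughly as follows. This is only an
illustrative model, not the code in this patch: DiskFailureHandler,
Replica, ReplicaState, DirType and every method name are hypothetical,
the WritesBlocked state stands in for shutting down the MvccManager, and
a plain task queue stands in for the threadpool so that the example runs
deterministically.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are Kudu's real types.
enum class DirType { kData, kWal, kMetadata };
enum class ReplicaState { kRunning, kWritesBlocked, kShutDown };

struct Replica {
  ReplicaState state = ReplicaState::kRunning;
  bool marked_for_eviction = false;
};

class DiskFailureHandler {
 public:
  // Registers a replica whose data lives in 'data_dir'.
  void AddReplica(const std::string& tablet_id, const std::string& data_dir) {
    replicas_[tablet_id] = Replica();
    tablets_by_dir_[data_dir].push_back(tablet_id);
  }

  // A directory marked as failed must not receive new tablets.
  bool IsDirUsable(const std::string& dir) const {
    return failed_dirs_.count(dir) == 0;
  }

  // Returns false when the failure has to be treated as fatal
  // (metadata or WAL directory); true when the server can stay alive.
  bool HandleDiskFailure(const std::string& dir, DirType type) {
    if (type != DirType::kData) return false;

    // Step 1: mark the directory as failed right away.
    failed_dirs_.insert(dir);

    for (const std::string& id : tablets_by_dir_[dir]) {
      Replica& r = replicas_.at(id);
      if (r.state != ReplicaState::kRunning) continue;

      // Step 2: block further durable writes (models the MvccManager
      // shutdown described above).
      r.state = ReplicaState::kWritesBlocked;

      // Step 3: defer the full shutdown; the queue models a threadpool.
      pending_.push_back([this, id] {
        Replica& replica = replicas_.at(id);
        assert(replica.state == ReplicaState::kWritesBlocked);
        replica.state = ReplicaState::kShutDown;
        replica.marked_for_eviction = true;
      });
    }
    return true;
  }

  // Runs the deferred shutdown tasks, as a pool worker eventually would.
  void RunPendingShutdowns() {
    std::vector<std::function<void()>> tasks;
    tasks.swap(pending_);
    for (auto& task : tasks) task();
  }

  const Replica& replica(const std::string& tablet_id) const {
    return replicas_.at(tablet_id);
  }

 private:
  std::set<std::string> failed_dirs_;
  std::map<std::string, std::vector<std::string>> tablets_by_dir_;
  std::map<std::string, Replica> replicas_;
  std::vector<std::function<void()>> pending_;
};
```

In this model a replica spends some time in the WritesBlocked state
before it is fully shut down, which mirrors the gap between stopping
durable writes immediately and completing the shutdown later on the
pool.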
This is part of a series of patches to handle disk failure. To see how
this patch fits in, see section 2.4 of:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit
Change-Id: I0141f1c83a81d029b1e3c2659bbfcbe48a992626
---
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
7 files changed, 155 insertions(+), 35 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/42/8442/1
--
To view, visit http://gerrit.cloudera.org:8080/8442
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0141f1c83a81d029b1e3c2659bbfcbe48a992626
Gerrit-Change-Number: 8442
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
[kudu-CR] shutdown tablets on disk failure at runtime
Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has abandoned this change. ( http://gerrit.cloudera.org:8080/8442 )
Change subject: shutdown tablets on disk failure at runtime
......................................................................
Abandoned
Pushed a new patch instead of a new revision.
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I0141f1c83a81d029b1e3c2659bbfcbe48a992626
Gerrit-Change-Number: 8442
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins