Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2017/11/01 19:29:07 UTC

[kudu-CR] shutdown tablets on disk failure at runtime

Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8442 )


Change subject: shutdown tablets on disk failure at runtime
......................................................................

shutdown tablets on disk failure at runtime

Previously, various code paths passed disk failure Statuses along until
they eventually hit a CHECK failure and crashed the server. Such fatal
errors were "safe" by design, as they ensured that no additional changes
were made durable to any tablet. This patch aims to achieve similar
behavior for failed replicas while keeping the server alive.

Disk failures at runtime are now permitted, provided the following have
occurred for each tablet in the affected directory:
* The failed directory is immediately marked as failed, preventing
  further tablets from being striped across the failed disk.
* The tablet's MvccManager is shut down, preventing further writes from
  being made durable and preventing further I/O to the tablet.
* A request is submitted to a threadpool to completely shut down the
  replica, eventually marking it for eviction.
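
The three steps above can be sketched roughly as follows. This is only an
illustrative model, not the code in this patch: TabletServerModel, Replica,
HandleDataDirFailure, and RunPendingShutdowns are hypothetical names, a
boolean stands in for the MvccManager, and a plain task queue stands in for
the threadpool.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a tablet replica; not Kudu's real class.
struct Replica {
  std::string data_dir;
  bool accepting_writes = true;      // models an open MvccManager
  bool shut_down = false;            // models a fully stopped replica
  bool marked_for_eviction = false;
};

// Hypothetical model of the server-side bookkeeping.
class TabletServerModel {
 public:
  void AddReplica(const std::string& tablet_id, const std::string& data_dir) {
    Replica r;
    r.data_dir = data_dir;
    replicas_[tablet_id] = r;
  }

  // New tablets may only be placed on directories not marked as failed.
  bool CanPlaceNewTablet(const std::string& data_dir) const {
    return failed_dirs_.count(data_dir) == 0;
  }

  void HandleDataDirFailure(const std::string& data_dir) {
    assert(!data_dir.empty());
    // Step 1: mark the directory as failed right away.
    failed_dirs_.insert(data_dir);
    for (auto& entry : replicas_) {
      Replica& replica = entry.second;
      if (replica.data_dir != data_dir) continue;
      // Step 2: stop accepting writes so nothing more is made durable.
      replica.accepting_writes = false;
      // Step 3: defer the full shutdown; a real server would submit this
      // task to a threadpool instead of a local queue.
      const std::string tablet_id = entry.first;
      pending_shutdowns_.push_back([this, tablet_id] {
        Replica& r = replicas_.at(tablet_id);
        r.shut_down = true;
        r.marked_for_eviction = true;
      });
    }
  }

  // Stand-in for the threadpool draining its queue.
  void RunPendingShutdowns() {
    for (auto& task : pending_shutdowns_) task();
    pending_shutdowns_.clear();
  }

  const Replica& replica(const std::string& tablet_id) const {
    return replicas_.at(tablet_id);
  }

 private:
  std::set<std::string> failed_dirs_;
  std::map<std::string, Replica> replicas_;
  std::vector<std::function<void()>> pending_shutdowns_;
};
```

In this model, a replica on the failed directory stops accepting writes as
soon as HandleDataDirFailure returns, but is only shut down and marked for
eviction once RunPendingShutdowns runs. Replicas on other directories are
left untouched.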

Note: failures of the metadata directory and the WAL directory are
fatal.

This is part of a series of patches to handle disk failure. To see how
this patch fits in, see section 2.4 of:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit

Change-Id: I0141f1c83a81d029b1e3c2659bbfcbe48a992626
---
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
7 files changed, 155 insertions(+), 35 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/42/8442/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8442
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0141f1c83a81d029b1e3c2659bbfcbe48a992626
Gerrit-Change-Number: 8442
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] shutdown tablets on disk failure at runtime

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has abandoned this change. ( http://gerrit.cloudera.org:8080/8442 )

Change subject: shutdown tablets on disk failure at runtime
......................................................................


Abandoned

Pushed a new patch instead of a new revision.
-- 
To view, visit http://gerrit.cloudera.org:8080/8442
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I0141f1c83a81d029b1e3c2659bbfcbe48a992626
Gerrit-Change-Number: 8442
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins