Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2017/11/01 19:29:06 UTC

[kudu-CR] error manager: synchronize/serialize handling

Hello Tidy Bot, Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8395

to look at the new patch set (#4).

Change subject: error_manager: synchronize/serialize handling
......................................................................

error_manager: synchronize/serialize handling

The state of a tablet server after a disk failure depends significantly
on the completion of disk-failure-handling callbacks, i.e. error handling
_must_ finish before anything is propagated back to the offending caller.
This is trickier when multiple in-flight calls may each trigger error
handling for a single tablet.

This patch extends the error manager to serialize such interleaved
calls: when a disk fails, it runs a disk-failure-handling callback, and
only once that callback completes can another error be handled. Errors
that may be caused only indirectly by disk failures can go through
non-disk-specific handling, which is serialized in the same fashion.
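
As a rough illustration of the serialization described above (not the
code in this patch), the sketch below runs every error-handling callback
under a single mutex, so a second caller blocks until the first callback
has finished. SerializedErrorManager, ErrorHandlerType, and
MaxConcurrentHandlers are hypothetical names chosen for the example and
are assumptions, not identifiers from error_manager.h.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical error categories; the names are illustrative only.
enum class ErrorHandlerType { DISK, TABLET };

// Minimal sketch of an error manager that runs at most one
// error-handling callback at a time.
class SerializedErrorManager {
 public:
  using Callback = std::function<void(const std::string&)>;

  // Registers the callback to run for errors of the given type.
  void SetCallback(ErrorHandlerType type, Callback cb) {
    std::lock_guard<std::mutex> l(lock_);
    callbacks_[type] = std::move(cb);
  }

  // Runs the callback for 'type' while holding the lock, so a second
  // caller blocks until the first callback has completed. Because the
  // lock is held, a callback must not call back into this manager.
  void RunErrorCallback(ErrorHandlerType type, const std::string& id) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = callbacks_.find(type);
    if (it != callbacks_.end()) {
      it->second(id);
    }
  }

 private:
  std::mutex lock_;
  std::map<ErrorHandlerType, Callback> callbacks_;
};

// Usage sketch: several threads report a disk error at once. Returns the
// largest number of callbacks observed running at the same time.
int MaxConcurrentHandlers(int num_threads) {
  assert(num_threads > 0);
  SerializedErrorManager em;
  std::atomic<int> active(0);
  std::atomic<int> max_active(0);
  em.SetCallback(ErrorHandlerType::DISK, [&](const std::string&) {
    int now = ++active;
    int prev = max_active.load();
    // Record a new maximum if this callback overlaps with others.
    while (now > prev && !max_active.compare_exchange_weak(prev, now)) {}
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    --active;
  });
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; i++) {
    threads.emplace_back([&em] {
      em.RunErrorCallback(ErrorHandlerType::DISK, "data-dir-0");
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  return max_active.load();
}
```

Holding the lock across the callback is the simplest way to get the
ordering guarantee, at the cost that callbacks cannot re-enter the
manager without deadlocking.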

As an example of where this is necessary, say a tablet has data in a
single directory and hits a bad disk. That directory is immediately
marked failed, and handling starts to fail all tablets in the directory.
Before this patch, if the tablet tried to create a new block before it
had itself been failed, block creation would fail immediately,
complaining that no directories are available, and the tablet would
eventually fail a CHECK that translates roughly to: "Has error handling
for this tablet completed?"

With block creation wrapped in tablet-specific error handling, and with
error handling serialized, this CHECK now passes.
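
A minimal sketch of what "wrapping block creation" could look like,
under stated assumptions: Status here is a simplified stand-in, and
TabletFailureHandler and CreateBlockHandlingErrors are hypothetical
helpers invented for this example, not functions from the block
managers. The point it shows is ordering: if block creation fails, the
tablet's error handling runs to completion before the error is returned
to the caller.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for a status type.
struct Status {
  bool ok;
  std::string msg;
};

// Hypothetical registry that tracks which tablets have been failed.
class TabletFailureHandler {
 public:
  // Marks the tablet as failed. The lock serializes handling, so one
  // error is fully handled before the next one starts.
  void HandleTabletError(const std::string& tablet_id) {
    std::lock_guard<std::mutex> l(lock_);
    failed_.insert(tablet_id);
  }

  // Roughly: "has error handling for this tablet completed?"
  bool IsFailed(const std::string& tablet_id) {
    std::lock_guard<std::mutex> l(lock_);
    return failed_.count(tablet_id) > 0;
  }

 private:
  std::mutex lock_;
  std::set<std::string> failed_;
};

// Runs 'create_block' on behalf of the tablet. On failure, runs
// tablet-specific error handling to completion before returning the
// error, so the caller never sees the error ahead of the handling.
Status CreateBlockHandlingErrors(const std::string& tablet_id,
                                 const std::function<Status()>& create_block,
                                 TabletFailureHandler* handler) {
  assert(handler != nullptr);
  Status s = create_block();
  if (!s.ok) {
    handler->HandleTabletError(tablet_id);
  }
  return s;
}
```

By the time a caller observes the failed status, a check along the lines
of handler->IsFailed(tablet_id) already holds.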

Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
---
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/data_dirs-test.cc
A src/kudu/fs/error_manager-test.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/tserver/tablet_server.cc
10 files changed, 295 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/8395/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>