Posted to dev@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2016/04/01 05:21:26 UTC

[kudu-CR] KUDU-495 (part 2): ensure all catalog writes for an operation are batched

Hello David Ribeiro Alves, Mike Percy, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/2695

to review the following change.

Change subject: KUDU-495 (part 2): ensure all catalog writes for an operation are batched
......................................................................

KUDU-495 (part 2): ensure all catalog writes for an operation are batched

The motivation is simple: by combining all catalog writes for a single
logical operation into one write, we can safely downgrade the CHECK_OK in
DeleteTable() to a runtime failure, and we can get away with absolutely no
"roll forward" repair of on-disk metadata when it is reloaded. For this to
work, we need all of the master metadata to be in a single tablet, and that
was done a long time ago; all that remained was the previous change to the
sys_catalog API.

Let's start with DeleteTable(). The disparate writes have been combined into
one. The order of operations has been changed to be safe, with in-memory
modifications taking place only after the write has succeeded. Doing this
requires that we commit tablet mutations before table mutations, and forces
a change to tablet iteration order in ExtractTabletsToProcess(). I also
audited all tablet readers to make sure they're OK with seeing a deleted
tablet before its table (they are).
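The ordering described above can be illustrated with a minimal sketch. This
is not the code in the patch: ToyCatalogStore, ToyCatalog, Mutation, and the
"RUNNING"/"DELETED" state strings are hypothetical stand-ins, and the store is
assumed to apply a batch entirely or not at all. The point is only that the
tablet and table mutations go out in one write, tablets first, and that
in-memory state changes only after that write succeeds.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one row change in the catalog.
struct Mutation {
  std::string key;        // e.g. "tablet/t1" or "table/users"
  std::string new_state;  // e.g. "RUNNING" or "DELETED"
};

// Toy persistent store. Assumption: a batch is applied all-or-nothing.
class ToyCatalogStore {
 public:
  bool fail_next_write = false;  // lets a caller simulate a failed write

  bool WriteBatch(const std::vector<Mutation>& batch) {
    if (fail_next_write) {
      fail_next_write = false;
      return false;  // nothing from the batch is persisted
    }
    for (const Mutation& m : batch) rows_[m.key] = m.new_state;
    return true;
  }

  std::string Get(const std::string& key) const {
    auto it = rows_.find(key);
    return it == rows_.end() ? "" : it->second;
  }

 private:
  std::map<std::string, std::string> rows_;
};

// Toy in-memory catalog layered on top of the store.
class ToyCatalog {
 public:
  explicit ToyCatalog(ToyCatalogStore* store) : store_(store) {
    assert(store_ != nullptr);
  }

  bool CreateTable(const std::string& table,
                   const std::vector<std::string>& tablets) {
    std::vector<Mutation> batch;
    batch.push_back({"table/" + table, "RUNNING"});
    for (const std::string& t : tablets) {
      batch.push_back({"tablet/" + t, "RUNNING"});
    }
    if (!store_->WriteBatch(batch)) return false;
    Apply(batch);
    tablets_of_[table] = tablets;
    return true;
  }

  bool DeleteTable(const std::string& table) {
    auto it = tablets_of_.find(table);
    if (it == tablets_of_.end()) return false;
    // One batch for the whole operation: tablet mutations first, table last.
    std::vector<Mutation> batch;
    for (const std::string& t : it->second) {
      batch.push_back({"tablet/" + t, "DELETED"});
    }
    batch.push_back({"table/" + table, "DELETED"});
    // A failed write is an ordinary runtime error; memory is untouched.
    if (!store_->WriteBatch(batch)) return false;
    Apply(batch);  // in-memory changes happen only after the write succeeded
    tablets_of_.erase(it);
    return true;
  }

  // Returns the in-memory state for a key, or "" if unknown.
  std::string State(const std::string& key) const {
    auto it = mem_.find(key);
    return it == mem_.end() ? "" : it->second;
  }

 private:
  void Apply(const std::vector<Mutation>& batch) {
    for (const Mutation& m : batch) mem_[m.key] = m.new_state;
  }

  ToyCatalogStore* store_;
  std::map<std::string, std::string> mem_;
  std::map<std::string, std::vector<std::string>> tablets_of_;
};
```

In this sketch, setting fail_next_write before calling DeleteTable() leaves
both the store and the in-memory map exactly as they were, so a caller can
simply retry; there is no half-deleted table to repair on reload.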

The disparate writes in CreateTable() have also been combined. The operation
is still not completely safe (some in-memory changes take place before the
write succeeds); I will tackle that in a follow-on patch.

To test this, I used a combination of white box and black box methods.
First, I extended fault injection to return errors and hooked that up to
sys_catalog. The new test enables fault injection, then performs a bunch of
random operations as if it were a client, coping with failures as they
arise. When it's done, it does a verification pass on the master metadata
using the sys_catalog visitor hooks.
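
The shape of such a test can be sketched roughly as follows. This is an
illustration only, not the test added to master-test.cc: FaultInjector,
FaultyCatalog, VisitPersistedTables, and RunRandomizedTest are hypothetical
names, and the real fault injection and visitor interfaces may look quite
different. The sketch injects errors into persistent writes with a fixed
probability, runs random creates and deletes while tracking what a client
would believe exists, and then verifies the persisted state through a
visitor-style hook.

```cpp
#include <cassert>
#include <random>
#include <set>
#include <string>

// Hypothetical fault injector: ShouldFail() returns true with probability
// `fraction`, telling the caller to return an error rather than crash.
class FaultInjector {
 public:
  FaultInjector(double fraction, unsigned seed)
      : dist_(fraction), rng_(seed) {
    assert(fraction >= 0.0 && fraction <= 1.0);
  }
  bool ShouldFail() { return dist_(rng_); }

 private:
  std::bernoulli_distribution dist_;
  std::mt19937 rng_;
};

// Toy catalog whose persistent writes can fail. In-memory state is updated
// only after the corresponding write has succeeded.
class FaultyCatalog {
 public:
  explicit FaultyCatalog(FaultInjector* faults) : faults_(faults) {}

  bool CreateTable(const std::string& name) {
    if (in_memory_.count(name)) return false;  // already exists
    if (!Persist(name, /*live=*/true)) return false;
    in_memory_.insert(name);
    return true;
  }

  bool DeleteTable(const std::string& name) {
    if (!in_memory_.count(name)) return false;  // not found
    if (!Persist(name, /*live=*/false)) return false;
    in_memory_.erase(name);
    return true;
  }

  // Visitor-style hook: calls `visit` once per persisted live table.
  template <typename Visitor>
  void VisitPersistedTables(Visitor visit) const {
    for (const std::string& name : on_disk_) visit(name);
  }

 private:
  bool Persist(const std::string& name, bool live) {
    if (faults_->ShouldFail()) return false;  // injected error
    if (live) {
      on_disk_.insert(name);
    } else {
      on_disk_.erase(name);
    }
    return true;
  }

  FaultInjector* faults_;
  std::set<std::string> in_memory_;
  std::set<std::string> on_disk_;
};

// Performs `num_ops` random operations as a client would, coping with
// failures by leaving its expectations unchanged, then checks that the
// persisted metadata matches what the client believes exists.
bool RunRandomizedTest(unsigned seed, int num_ops, double fault_fraction) {
  FaultInjector faults(fault_fraction, seed);
  FaultyCatalog catalog(&faults);
  std::mt19937 rng(seed + 1);
  std::set<std::string> expected;
  for (int i = 0; i < num_ops; ++i) {
    std::string name = "table-" + std::to_string(rng() % 8);
    if (rng() % 2 == 0) {
      if (catalog.CreateTable(name)) expected.insert(name);
    } else {
      if (catalog.DeleteTable(name)) expected.erase(name);
    }
  }
  std::set<std::string> persisted;
  catalog.VisitPersistedTables(
      [&](const std::string& n) { persisted.insert(n); });
  return persisted == expected;
}
```

A call such as RunRandomizedTest(42, 1000, 0.3) exercises many interleavings
of successful and failed writes. The final comparison holds in this sketch
only because no operation changes in-memory state before its write succeeds.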

Change-Id: I5cbccf5ce22c005d7aa25bbdefe7502873a8ed7d
---
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/util/fault_injection.cc
M src/kudu/util/fault_injection.h
6 files changed, 464 insertions(+), 179 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/2695/1
-- 
To view, visit http://gerrit.cloudera.org:8080/2695

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5cbccf5ce22c005d7aa25bbdefe7502873a8ed7d
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>