You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2019/12/04 04:28:11 UTC

[kudu-CR] KUDU-2993: don't require update dirs to fix directory inconsistencies

Hello Tidy Bot, YangSong, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14760

to look at the new patch set (#10).

Change subject: KUDU-2993: don't require update_dirs to fix directory inconsistencies
......................................................................

KUDU-2993: don't require update_dirs to fix directory inconsistencies

This patch removes the ENFORCE_CONSISTENCY behavior when opening the
DataDirManager. By default, the FS layout will be opened with the new
UPDATE_AND_IGNORE_FAILURE mode, wherein:
- We update the PIMFs if we notice any are missing or their metadata is
  not consistent with the actual set of directory UUIDs.
- We tolerate failures when creating and updating the PIMFs.

This also maintains the previous UPDATE_ON_DISK behavior as
UPDATE_AND_ERROR_ON_FAILURE, wherein a disk failure during the update
would halt any further updates and revert any metadata changes thus far.
This is only used by the 'update_dirs' tool to maintain existing
behavior.

Since we now rewrite the PIMFs to be consistent by default, the
"integrity check" is now gone. This check was previously useful to
ensure that the 'all_uuids' fields matched for every PIMF, which ensured
that every data directory that was expected to exist actually existed.
This was important for a couple reasons:
- When a single missing data directory spelled failure for the entire
  node, starting up with even a single "inconsistent" directory would
  break all tablets on the tserver.
- The file block manager requires that the UUID indexes used by the
  DataDirManager are static. These indexes are defined by the ordering
  of the UUIDs in the PIMFs, so we used the integrity check to ensure
  the ordering was consistent across PIMFs.

Now that Kudu tablets can start up with missing directories, the first
reason isn't particularly enticing.

The second is trickier to work around. To work around it, I've kept the
essence of the UUID indexing for the file block manager, though I've
made the "integrity checking" virtually non-existent. For the log block
manager, I've made the UUID indexing much simpler: rather than relying
on the integrity check, we'll now always assign a PIMF a UUID, even if
we couldn't read one from disk.

Tests:
- Updated a few tests that previously enforced consistency among PIMFs
  to instead check for the correct instance-updating behavior.
- Added a test to check that failures while updating the PIMFs don't
  stop us from opening the FS layout.
- Added a test that checks that the adding/removing behavior on a
  tserver affects and fails tablets as expected.
- Added a test to make sure that this doesn't completely break the file
  block manager. Given we don't expect heavy usage of the FBM, I didn't
  do extensive testing when the PIMFs are tampered with.
- Added a test to ensure we don't regress the rollback behavior of
  the 'update_dirs' tool in the face of a disk failure.

Change-Id: Ic3027e7edb5c60e96ced6160fec1a380b38353a5
---
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager_util-test.cc
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/block_manager_util.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/error_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/server/server_base.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tserver/tablet_server-test.cc
14 files changed, 832 insertions(+), 706 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/14760/10
-- 
To view, visit http://gerrit.cloudera.org:8080/14760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic3027e7edb5c60e96ced6160fec1a380b38353a5
Gerrit-Change-Number: 14760
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: YangSong <sy...@yeah.net>