You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2019/12/04 04:28:11 UTC
[kudu-CR] KUDU-2993: don't require update dirs to fix directory inconsistencies
Hello Tidy Bot, YangSong, Kudu Jenkins, Adar Dembo,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/14760
to look at the new patch set (#10).
Change subject: KUDU-2993: don't require update_dirs to fix directory inconsistencies
......................................................................
KUDU-2993: don't require update_dirs to fix directory inconsistencies
This patch removes the ENFORCE_CONSISTENCY behavior when opening the
DataDirManager. By default, the FS layout will be opened with the new
UPDATE_AND_IGNORE_FAILURE mode, wherein:
- We update the PIMFs if we notice any are missing or their metadata is
not consistent with the actual set of directory UUIDs.
- We tolerate failures when creating and updating the PIMFs.
This also maintains the previous UPDATE_ON_DISK behavior as
UPDATE_AND_ERROR_ON_FAILURE, wherein a disk failure during the update
would halt any further updates and revert any metadata changes thus far.
This is only used by the 'update_dirs' tool to maintain existing
behavior.
Since we now rewrite the PIMFs to be consistent by default, the
"integrity check" is now gone. This check was previously useful to
ensure that the 'all_uuids' fields matched for every PIMF, which ensured
that every data directory that was expected to exist actually existed.
This was important for a couple reasons:
- When a single missing data directory spelled failure for the entire
node, starting up with even a single "inconsistent" directory would
break all tablets on the tserver.
- The file block manager requires that the UUID indexes used by the
DataDirManager are static. These indexes are defined by the ordering
of the UUIDs in the PIMFs, so we used the integrity check to ensure
the ordering was consistent across PIMFs.
Now that Kudu tablets can start up with missing directories, the first
reason isn't particularly enticing.
The second is trickier to work around. To work around it, I've kept the
essence of the UUID indexing for the file block manager, though I've
made the "integrity checking" virtually non-existent. For the log block
manager, I've made the UUID indexing much simpler: rather than relying
on the integrity check, we'll now always assign a PIMF a UUID, even if
we couldn't read one from disk.
Tests:
- Updated a few tests that previously enforced consistency among PIMFs
to instead check for the correct instance-updating behavior.
- Added a test to check that failures while updating the PIMFs don't
stop us from opening the FS layout.
- Added a test that checks that the adding/removing behavior on a
tserver affects and fails tablets as expected.
- Added a test to make sure that this doesn't completely break the file
block manager. Given we don't expect heavy usage of the FBM, I didn't
do extensive testing when the PIMFs are tampered with.
- Added a test to ensure we don't regress the rollback behavior of
the 'update_dirs' tool in the face of a disk failure.
Change-Id: Ic3027e7edb5c60e96ced6160fec1a380b38353a5
---
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager_util-test.cc
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/block_manager_util.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/error_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/server/server_base.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tserver/tablet_server-test.cc
14 files changed, 832 insertions(+), 706 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/14760/10
--
To view, visit http://gerrit.cloudera.org:8080/14760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic3027e7edb5c60e96ced6160fec1a380b38353a5
Gerrit-Change-Number: 14760
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: YangSong <sy...@yeah.net>