You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2018/08/16 17:06:03 UTC

[kudu-CR] WIP KUDU-2469 pt 2: fail replicas on CFile corruption

Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11249


Change subject: WIP KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................

WIP KUDU-2469 pt 2: fail replicas on CFile corruption

wip: add tests

This patch adds handling for CFile corruption errors via the error
manager, pulling the tablet id of interest with the SCOPED_IO_CONTEXT.

The tablet id is plumbed down at various points that may incur CFile IO:
- when performing any maintenance op
- when opening the tablet to open the keys
- when beginning a scan

I opted to add a new CFILE ErrorHandlerType enum rather than using the
TABLET enum (which is currently a no-op and serves to serialize certain
failures, see error_manager.h for more), as updating the TABLET error
handling behavior would change the existing error handling behavior.

Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/fs/error_manager.cc
M src/kudu/fs/error_manager.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/tablet_service.cc
10 files changed, 125 insertions(+), 28 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/11249/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc
File src/kudu/cfile/bloomfile.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc@257
PS9, Line 257:     return Status::Corruption("bloom file is compressed (compression not supported)",
> I suppose the type of corruption is interesting. 
Yeah, these ones are pretty fuzzy. I think a few of these `return Corruption`s could probably be DCHECKs or CHECKs, but returning an error is more conservative. If anything, I can just omit this handling since it's a corruption that we wouldn't expect, and it's probably worth surfacing as a bug, rather than handling it.


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc@344
PS9, Line 344:     RETURN_NOT_OK(ParseBlockHeader(io_context, dblk_data.data(), &hdr, &bloom_data));
> Can we rely on the the above reader_->ReadBlock to handle any corruption is
Hrm, that's a good point. As long as checksums are doing their job, all the other `return Corruption` paths _should_ be indicative of a bug. I wonder if this extra layer of protection is valuable at all.


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@185
PS9, Line 185:   RETURN_NOT_OK(ReadAndParseFooter(io_context));
> Can we handle header and footer corruption here with RETURN_NOT_OK_HANDLE_C
I'm not too keen on that approach. You're right that that's what we did for EIO handling, but I think this approach makes it much clearer where handling is happening, _and_ makes it easier to say that all the places that should be covered _are_ covered.

That approach circumvents this extra plumbing, but I'm not sure it reduces the number of places we need to cover. With current handling, we have to handle at every `return Corruption` site. Relying on RETURN_NOT_OK_HANDLE_CORRUPTION, we would have to wrap every call to a function that may return a corruption.



-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 30 Aug 2018 17:54:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11249

to look at the new patch set (#6).

Change subject: WIP KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................

WIP KUDU-2469 pt 2: fail replicas on CFile corruption

wip: add tests for scans

This patch adds handling for CFile corruption errors via the error
manager, pulling the tablet id of interesting from the IOContext.

Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/index_btree.cc
M src/kudu/cfile/index_btree.h
M src/kudu/fs/error_manager.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/io_context.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_server.cc
13 files changed, 171 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/11249/6
-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 12: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@185
PS9, Line 185:         "cfile uses features from an incompatible bitset value $0 vs supported $1 ",
> I'm not too keen on that approach. You're right that that's what we did for
I think these 2 lines and



-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 12
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 04 Sep 2018 14:25:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11249/10/src/kudu/integration-tests/disk_failure-itest.cc
File src/kudu/integration-tests/disk_failure-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11249/10/src/kudu/integration-tests/disk_failure-itest.cc@267
PS10, Line 267:         Substitute("--$0=$1", flag_pair.first, flag_pair.second));
> warning: passing result of std::move() as a const reference argument; no mo
Done


http://gerrit.cloudera.org:8080/#/c/11249/10/src/kudu/integration-tests/disk_failure-itest.cc@268
PS10, Line 268:   }
> warning: passing result of std::move() as a const reference argument; no mo
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 11
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 01 Sep 2018 17:20:05 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 9: Verified+1

Failure appears to be a known dist test issue


-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 30 Aug 2018 06:42:56 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11249

to look at the new patch set (#7).

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................

KUDU-2469 pt 2: fail replicas on CFile corruption

This adds handling for CFile corruption errors via the error manager. If
a CFile corruption is encountered, the tablet affected will be failed
and scheduled to be shutdown, pulling the tablet id of interest from the
IOContext.

Corruption handling is delegated to the CFileReaders, which have access
to the error manager. If a CFileReader method is to return a corruption
error, it first must call the appropriate error-handling. Wrappers
classes of CFileReaders that yield corruption errors can do the same.

This patch includes some fault injection macros that helped facilitate
testing, and some extra plumbing of IOContexts in places that were
caught without coverage: the IndexTreeIterator, the BloomCache, and
DeltaFileReader::ReadDeltaStats().

Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/bloomfile.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/index_btree.cc
M src/kudu/cfile/index_btree.h
M src/kudu/fs/error_manager.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/io_context.h
M src/kudu/integration-tests/disk_failure-itest.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_server.cc
15 files changed, 385 insertions(+), 99 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/11249/7
-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 9:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc
File src/kudu/cfile/bloomfile.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc@257
PS9, Line 257:     return Status::Corruption("bloom file is compressed (compression not supported)",
> I suppose the type of corruption is interesting. 
I'm also wondering why compression or a missing value index are grounds for corruption. They seem more like NotSupported to me (i.e backwards incompatible on-disk state that is being loaded by an older version of Kudu). Does it really make sense to fail these replicas? What happens if reader_->HandleCorruption() isn't called; does the server crash?


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@234
PS9, Line 234:       .CloneAndPrepend("failed to parse CFile pre-header");
Shouldn't the CloneAndPrepend be conditioned on !s.OK()?


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/fs/io_context.h
File src/kudu/fs/io_context.h:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/fs/io_context.h@30
PS9, Line 30: user
s/user/module/ based on the text in the examples.


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc
File src/kudu/integration-tests/disk_failure-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@217
PS9, Line 217:     ::testing::Values(ErrorType::CFILE_CORRUPTION));
Shouldn't this parameterize on DISK_FAILURE too?


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@240
PS9, Line 240:   // Wait a bit for some errors to occur.
Can we ASSERT_EVENTUALLY() on something here, to ensure that we only run the ClusterVerifier when an error has actually occurred? Is there a concern that rereplication may kick in and finish too quickly for us to observe an error in the test? Maybe we can look at a metric?


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@250
PS9, Line 250:   // Set up a workload that does both reads and writes.
But you're setting the number of write threads to 0...


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@272
PS9, Line 272:   // Wait a bit for some errors to occur.
Same feedback about waiting for an error here.


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc@278
PS9, Line 278:     return Status::Corruption("file does not have a value index!");
Again, seems more like NotSupported than Corruption.


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc@291
PS9, Line 291:     return Status::Corruption("missing delta stats from the delta file metadata");
Same here.


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tserver/tablet_server-test.cc@1676
PS9, Line 1676:     auto log_req = MakeScopedCleanup([&] {
              :       LOG(INFO) << SecureDebugString(req);
              :     });
Does SCOPED_TRACE not work in here?



-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 30 Aug 2018 17:58:18 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@185
PS9, Line 185:   RETURN_NOT_OK(ReadAndParseFooter(io_context));
> I'm not too keen on that approach. You're right that that's what we did for
I think if we eliminate all the Corruption status that should actually be something else. Like "not supported", then we only need to wrap InitOnce and the VerifyChecksum call in ReadBlock. These are the private methods that validate checksums.



-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 30 Aug 2018 19:58:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11249

to look at the new patch set (#9).

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................

KUDU-2469 pt 2: fail replicas on CFile corruption

This adds handling for CFile corruption errors via the error manager. If
a CFile corruption is encountered, the tablet affected will be failed
and scheduled to be shutdown, pulling the tablet id of interest from the
IOContext.

Corruption handling is delegated to the CFileReaders, which have access
to the error manager. If a CFileReader method is to return a corruption
error, it first must call the appropriate error-handling. Wrappers
classes of CFileReaders that yield corruption errors can do the same.

This patch includes some fault injection macros that helped facilitate
testing, and some extra plumbing of IOContexts in places that were
caught without coverage: the IndexTreeIterator, the BloomCache, and
DeltaFileReader::ReadDeltaStats().

Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/bloomfile.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/index_btree.cc
M src/kudu/cfile/index_btree.h
M src/kudu/fs/error_manager.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/io_context.h
M src/kudu/integration-tests/disk_failure-itest.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_server.cc
15 files changed, 399 insertions(+), 105 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/11249/9
-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has removed a vote on this change.

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 9:

(5 comments)

I did a quick first pass. The theme of my feedback is that handling the corruption at the boundaries of the public api instead of each location of internal api should reduce the location we need to add handling code while still providing the same coverage. 

As long as the lower level calls return Status:Corruption the higher level handling can handle and fail the tablet.

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc
File src/kudu/cfile/bloomfile.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc@257
PS9, Line 257:     return Status::Corruption("bloom file is compressed (compression not supported)",
I suppose the type of corruption is interesting. 

It's clear the a Cfile with a checksum issue is corrupt and should be failed. 

How could some of these other corruptions happen?  Missing index, compression unsupported, etc. All of these should result in checksum errors when initializing in theory. Is this catch-all safe?


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc@344
PS9, Line 344:     RETURN_NOT_OK(ParseBlockHeader(io_context, dblk_data.data(), &hdr, &bloom_data));
Can we rely on the the above reader_->ReadBlock to handle any corruption issues instead of handling all cases in the bloomfile code too?


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@185
PS9, Line 185:   RETURN_NOT_OK(ReadAndParseFooter(io_context));
Can we handle header and footer corruption here with RETURN_NOT_OK_HANDLE_CORRUPTION instead of at each location in the internal methods?


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@328
PS9, Line 328:     RETURN_NOT_OK(VerifyChecksum(io_context, slices, checksum));
Can we handle the checksum corruption here with RETURN_NOT_OK_HANDLE_CORRUPTION instead of at each location in the internal method?


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc@274
PS9, Line 274:   RETURN_NOT_OK(reader_->Init(io_context));
Same comment in this file as the ones in the bloomfile code. Can we depend on CfileReader correctly handling any corruption so that we don't need to handle it in all the wrappers?



-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 30 Aug 2018 17:13:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 12: Code-Review+1

But will defer to Grant.


-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 12
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 01 Sep 2018 22:52:38 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................

KUDU-2469 pt 2: fail replicas on CFile corruption

This adds handling for CFile corruption errors via the error manager. If
a CFile corruption is encountered, the replica affected will be failed
and scheduled to be shutdown (pulling the tablet id of interest from the
IOContext), and eventually resulting in its re-replication.

Corruption handling is entirely delegated to the CFileReaders, which
have access to the error manager. Given that checksum errors are
detected in VerifyChecksum(), methods that wrap VerifyChecksum() must
expect the corruption and handle it, namely ReadBlock() and Init().

This patch also includes a fault injection flag that helped facilitate
testing, and some extra plumbing of IOContexts in places that were
caught without coverage: the IndexTreeIterator and the BloomCache.

Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Reviewed-on: http://gerrit.cloudera.org:8080/11249
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Reviewed-by: Grant Henke <gr...@apache.org>
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/index_btree.cc
M src/kudu/cfile/index_btree.h
M src/kudu/fs/error_manager.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/io_context.h
M src/kudu/integration-tests/disk_failure-itest.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_server.cc
13 files changed, 342 insertions(+), 77 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, but someone else must approve
  Grant Henke: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 13
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11249/7/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/11249/7/src/kudu/cfile/cfile_reader.h@115
PS7, Line 115:   // Reads the data block pointed to by `ptr`. Will pull the data block from
> warning: missing username/bug in TODO [google-readability-todo]
Rewrote this comment


http://gerrit.cloudera.org:8080/#/c/11249/7/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/11249/7/src/kudu/cfile/cfile_reader.cc@59
PS7, Line 59: #include "kudu/util/debug/trace_event.h"
> warning: #includes are not sorted properly [llvm-include-order]
Removed this



-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 30 Aug 2018 05:29:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11249

to look at the new patch set (#3).

Change subject: WIP KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................

WIP KUDU-2469 pt 2: fail replicas on CFile corruption

wip: add tests

This patch adds handling for CFile corruption errors via the error
manager, pulling the tablet id of interesting from the IOContext.

Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/fs/error_manager.cc
M src/kudu/fs/error_manager.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_server.cc
9 files changed, 67 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/11249/3
-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11249

to look at the new patch set (#10).

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................

KUDU-2469 pt 2: fail replicas on CFile corruption

This adds handling for CFile corruption errors via the error manager. If
a CFile corruption is encountered, the tablet affected will be failed
and scheduled to be shutdown, pulling the tablet id of interest from the
IOContext.

Corruption handling is entirely delegated to the CFileReaders, which
have access to the error manager. Given checksum errors are detected in
VerifyChecksum(), methods that wrap VerifyChecksum() must expect the
corruption and handle it, namely ReadBlock() and Init().

This patch includes a fault injection flag that helped facilitate
testing, and some extra plumbing of IOContexts in places that were
caught without coverage: the IndexTreeIterator and the BloomCache.

Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/index_btree.cc
M src/kudu/cfile/index_btree.h
M src/kudu/fs/error_manager.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/io_context.h
M src/kudu/integration-tests/disk_failure-itest.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_server.cc
13 files changed, 341 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/11249/10
-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] WIP KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: WIP KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 5:

(6 comments)

Still a WIP since I'd like to maybe add more docs to IOContext and how it should be used and add more tests, but just checking this in.

http://gerrit.cloudera.org:8080/#/c/11249/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11249/2//COMMIT_MSG@18
PS2, Line 18: 
            : 
            : 
            : 
            : 
> I read through the comment in error_manager.h but still don't understand wh
I tried clarifying this in a separate patch: https://gerrit.cloudera.org/c/11303/


http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/cfile/cfile_reader.cc@81
PS2, Line 81: DEFINE_double(cfile_inject_corruption, 0,
> One of the reasons I didn't do this while writing the checksum code  was du
Ah nice, thanks for the pointer. Seems pretty useful for unit testing, though I think I'll still add this since I find it a bit easier to use in more integration-y tests. I agree it's kind of clunky in terms of the number of locations, but IMO it also makes it easier to get broader coverage of errors.


http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/cfile/cfile_reader.cc@227
PS2, Line 227: 
> Should we use a macro for this pattern? Something similar to the HANDLE_DIS
Done


http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/cfile/cfile_reader.cc@348
PS2, Line 348:   if (PREDICT_FALSE(checksum_value != expected_checksum)) {
> Can we assert that this exists?
Done


http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/fs/error_manager.h@76
PS2, Line 76:   return _s; \
> Do we need a new handler type? Or is TABLET good enough to re-use?
I tried clarifying this in this patch: https://gerrit.cloudera.org/c/11303/
Basically, the error handling should be based on the type of error, not the error handling callback. Sure, we want to fail a single tablet, but the TABLET handler is being used elsewhere, and changing the callback would have effects elsewhere.


http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/fs/error_manager.h@103
PS2, Line 103: 
> safety
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Wed, 29 Aug 2018 03:18:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11249

to look at the new patch set (#4).

Change subject: WIP KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................

WIP KUDU-2469 pt 2: fail replicas on CFile corruption

wip: add tests

This patch adds handling for CFile corruption errors via the error
manager, pulling the tablet id of interesting from the IOContext.

Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/fs/error_manager.cc
M src/kudu/fs/error_manager.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_server.cc
9 files changed, 66 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/11249/4
-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11249

to look at the new patch set (#12).

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................

KUDU-2469 pt 2: fail replicas on CFile corruption

This adds handling for CFile corruption errors via the error manager. If
a CFile corruption is encountered, the replica affected will be failed
and scheduled to be shutdown (pulling the tablet id of interest from the
IOContext), and eventually resulting in its re-replication.

Corruption handling is entirely delegated to the CFileReaders, which
have access to the error manager. Given that checksum errors are
detected in VerifyChecksum(), methods that wrap VerifyChecksum() must
expect the corruption and handle it, namely ReadBlock() and Init().

This patch also includes a fault injection flag that helped facilitate
testing, and some extra plumbing of IOContexts in places that were
caught without coverage: the IndexTreeIterator and the BloomCache.

Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/index_btree.cc
M src/kudu/cfile/index_btree.h
M src/kudu/fs/error_manager.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/io_context.h
M src/kudu/integration-tests/disk_failure-itest.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_server.cc
13 files changed, 342 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/11249/12
-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 12
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11249/6/src/kudu/cfile/index_btree.h
File src/kudu/cfile/index_btree.h:

http://gerrit.cloudera.org:8080/#/c/11249/6/src/kudu/cfile/index_btree.h@110
PS6, Line 110:   Status LoadBlock(const BlockPointer &block, int depth);
> warning: function 'kudu::cfile::IndexTreeIterator::LoadBlock' has a definit
Done


http://gerrit.cloudera.org:8080/#/c/11249/6/src/kudu/cfile/index_btree.cc
File src/kudu/cfile/index_btree.cc:

http://gerrit.cloudera.org:8080/#/c/11249/6/src/kudu/cfile/index_btree.cc@318
PS6, Line 318: }
> warning: do not use 'else' after 'return' [readability-else-after-return]
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 30 Aug 2018 04:09:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 9:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc
File src/kudu/cfile/bloomfile.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc@257
PS9, Line 257:     return Status::Corruption("bloom file is compressed (compression not supported)",
> I'm also wondering why compression or a missing value index are grounds for
Yeah, I agree with returning some other error, or just not handling the corruption since it's not indicative of a disk-related corruption. It might crash, depending on the operation (e.g. if this is done in the context of Flush, it will, if on a scan, compaction, it won't).

Regardless, failing the tablet seems inappropriate here.


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@185
PS9, Line 185:   RETURN_NOT_OK(ReadAndParseFooter(io_context));
> I think if we eliminate all the Corruption status that should actually be s
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@234
PS9, Line 234:       .CloneAndPrepend("failed to parse CFile pre-header");
> Shouldn't the CloneAndPrepend be conditioned on !s.OK()?
It could be, though CloneAndPrepend does that check already.


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@328
PS9, Line 328:     RETURN_NOT_OK(VerifyChecksum(io_context, slices, checksum));
> Can we handle the checksum corruption here with RETURN_NOT_OK_HANDLE_CORRUP
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/fs/io_context.h
File src/kudu/fs/io_context.h:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/fs/io_context.h@30
PS9, Line 30: user
> s/user/module/ based on the text in the examples.
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc
File src/kudu/integration-tests/disk_failure-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@217
PS9, Line 217:     ::testing::Values(ErrorType::CFILE_CORRUPTION));
> Shouldn't this parameterize on DISK_FAILURE too?
Whoops yes


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@240
PS9, Line 240:   // Wait a bit for some errors to occur.
> Can we ASSERT_EVENTUALLY() on something here, to ensure that we only run th
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@250
PS9, Line 250:   // Set up a workload that does both reads and writes.
> But you're setting the number of write threads to 0...
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@272
PS9, Line 272:   // Wait a bit for some errors to occur.
> Same feedback about waiting for an error here.
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc@274
PS9, Line 274:   RETURN_NOT_OK(reader_->Init(io_context));
> Same comment in this file as the ones in the bloomfile code. Can we depend 
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc@278
PS9, Line 278:     return Status::Corruption("file does not have a value index!");
> Again, seems more like NotSupported than Corruption.
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc@291
PS9, Line 291:     return Status::Corruption("missing delta stats from the delta file metadata");
> Same here.
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tserver/tablet_server-test.cc@1676
PS9, Line 1676:     auto log_req = MakeScopedCleanup([&] {
              :       LOG(INFO) << SecureDebugString(req);
              :     });
> Does SCOPED_TRACE not work in here?
Indeed, seems ASSERT_EVENTUALLY and SCOPED_TRACE don't like each other. It would break the build as:

 ../../src/kudu/tserver/tablet_server-test.cc:1674:77: note: ‘testing::internal::ScopedTrace gtest_trace_1683’ previously declared here

I'll just tweak this a bit, since this is kind of ugly.



-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 01 Sep 2018 07:11:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: WIP KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11249/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11249/2//COMMIT_MSG@18
PS2, Line 18: 
            : I opted to add a new CFILE ErrorHandlerType enum rather than using the
            : TABLET enum (which is currently a no-op and serves to serialize certain
            : failures, see error_manager.h for more), as using the TABLET enum would
            : change the existing error handling behavior of certain errors.
I read through the comment in error_manager.h but still don't understand why we can't reuse TABLET for these errors. I don't remember the details of our initial discussion on error serialization, unfortunately.


http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/cfile/cfile_reader.cc@348
PS2, Line 348:   if (ctx && !ctx->tablet_id().empty()) {
Can we assert that this exists?


http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/fs/error_manager.h@103
PS2, Line 103: saftey
safety



-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 16 Aug 2018 18:40:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11249

to look at the new patch set (#5).

Change subject: WIP KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................

WIP KUDU-2469 pt 2: fail replicas on CFile corruption

wip: add tests

This patch adds handling for CFile corruption errors via the error
manager, pulling the tablet id of interesting from the IOContext.

Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/index_btree.cc
M src/kudu/cfile/index_btree.h
M src/kudu/fs/error_manager.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/io_context.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_server.cc
13 files changed, 167 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/11249/5
-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] WIP KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11249

to look at the new patch set (#2).

Change subject: WIP KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................

WIP KUDU-2469 pt 2: fail replicas on CFile corruption

wip: add tests

This patch adds handling for CFile corruption errors via the error
manager, pulling the tablet id of interest with the SCOPED_IO_CONTEXT.

The tablet id is plumbed down at various points that may incur CFile IO:
- when performing any maintenance op
- when opening the tablet to open the keys
- when beginning a scan

I opted to add a new CFILE ErrorHandlerType enum rather than using the
TABLET enum (which is currently a no-op and serves to serialize certain
failures, see error_manager.h for more), as using the TABLET enum would
change the existing error handling behavior of certain errors.

Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/fs/error_manager.cc
M src/kudu/fs/error_manager.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/tablet_service.cc
10 files changed, 125 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/11249/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11249

to look at the new patch set (#8).

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................

KUDU-2469 pt 2: fail replicas on CFile corruption

This adds handling for CFile corruption errors via the error manager. If
a CFile corruption is encountered, the tablet affected will be failed
and scheduled to be shutdown, pulling the tablet id of interest from the
IOContext.

Corruption handling is delegated to the CFileReaders, which have access
to the error manager. If a CFileReader method is to return a corruption
error, it first must call the appropriate error-handling. Wrappers
classes of CFileReaders that yield corruption errors can do the same.

This patch includes some fault injection macros that helped facilitate
testing, and some extra plumbing of IOContexts in places that were
caught without coverage: the IndexTreeIterator, the BloomCache, and
DeltaFileReader::ReadDeltaStats().

Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/bloomfile.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/index_btree.cc
M src/kudu/cfile/index_btree.h
M src/kudu/fs/error_manager.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/io_context.h
M src/kudu/integration-tests/disk_failure-itest.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_server.cc
15 files changed, 392 insertions(+), 104 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/11249/8
-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] WIP KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )

Change subject: WIP KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/cfile/cfile_reader.cc@81
PS2, Line 81: DEFINE_double(cfile_inject_corruption, 0,
One of the reasons I didn't do this while writing the checksum code  was due to all the locations it would need to be called/used. Instead I wrote a test that manually corrupts each bit. See cfile-test.cc CorruptAndReadBlock. 

Not saying you can't do this, it's just another option.


http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/cfile/cfile_reader.cc@227
PS2, Line 227:   if (s.IsCorruption()) {
Should we use a macro for this pattern? Something similar to the HANDLE_DISK_FAILURE macro?


http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/fs/error_manager.h@76
PS2, Line 76:   CFILE,
Do we need a new handler type? Or is TABLET good enough to re-use?

Is there a different action we would take because it's CFILE and not TABLET?



-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 16 Aug 2018 17:30:09 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2469 pt 2: fail replicas on CFile corruption

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11249

to look at the new patch set (#11).

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................

KUDU-2469 pt 2: fail replicas on CFile corruption

This adds handling for CFile corruption errors via the error manager. If
a CFile corruption is encountered, the replica affected will be failed
and scheduled to be shutdown (pulling the tablet id of interest from the
IOContext), and eventually resulting in its re-replication.

Corruption handling is entirely delegated to the CFileReaders, which
have access to the error manager. Given that checksum errors are
detected in VerifyChecksum(), methods that wrap VerifyChecksum() must
expect the corruption and handle it, namely ReadBlock() and Init().

This patch also includes a fault injection flag that helped facilitate
testing, and some extra plumbing of IOContexts in places that were
caught without coverage: the IndexTreeIterator and the BloomCache.

Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/index_btree.cc
M src/kudu/cfile/index_btree.h
M src/kudu/fs/error_manager.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/io_context.h
M src/kudu/integration-tests/disk_failure-itest.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_server.cc
13 files changed, 340 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/11249/11
-- 
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 11
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot