You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kudu.apache.org by "Mike Percy (Code Review)" <ge...@cloudera.org> on 2016/04/01 09:58:17 UTC

[kudu-CR] KUDU-1377: Ignore corrupted trailing records on LBM metadata

Mike Percy has posted comments on this change.

Change subject: KUDU-1377: Ignore corrupted trailing records on LBM metadata
......................................................................


Patch Set 3:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 759: // The idea behind this test is that we should tolerate partial records at the
> Nit: we tolerate only _one_ partial record; the last one. A partial non-las
Done


Line 761: // metdata files. A system crash or disk-full event can result in
> I wish we had more precision w.r.t. the kinds of failures that could result
Done


Line 767:   // TODO: Create block, then truncate metadata file, then start the thing up,
> Why is this a TODO?
Removed


Line 784:   gscoped_ptr<ReadableBlock> block;
> Why read the last block? What does that test?
To prove that we can open the block. It fails later.


Line 788:   // Reopen the block manager to prove that we can still append to a container
> Hmm, are you sure this isn't already tested by another test here, or by blo
Removed


Line 809:   LOG(INFO) << path;
> Is this still useful?
Removed


Line 810:   string metadata_path = path + ".metadata";
> Would be good to consolidate this in a constant within LogBlockManager, so 
Done


Line 819:   int ret = truncate(metadata_path.c_str(), meta_size + 1);
> How would you feel about opening metadata_path as an RWFile, then relying o
Done


Line 821:     PLOG(ERROR) << "truncate() of file " << metadata_path << " failed";
> Can you PLOG(FATAL) and thus avoid the FAIL() below?
Moot point with the RWFile


Line 826:   // our blocks.
> Should add that the container will not be reused.
Done


http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

Line 297:   mutable bool append_disabled_;
> Why does this need to be mutable? Where are we modifying it in a const meth
In LogBlockContainer::ReadContainerRecords()


Line 443:   if (PREDICT_TRUE(read_status.IsEndOfFile() || last_record_is_partial)) {
> This is another reason why ValidateAndRead() should return EOF in the parti
See my other comment about this. let's leave the API as-is.


Line 447:       append_disabled_ = true;
> Should probably LOG(WARN) something too.
Done


http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

Line 271:   // Return the path of the given container.
> Nit: add ForTests() or some such? It'd be brittle for real code to rely on 
That's why I made it private, but this doesn't hurt. Done


http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/util/pb_util-test.cc
File src/kudu/util/pb_util-test.cc:

Line 249:   {
> Nit: why the new scope? The other test cases don't do that.
It creates variables that are not reusable (pb_file). Better to keep the test scope a little cleaner.


Line 254:     int ret = truncate(path_.c_str(), known_good_size - 2);
> Again, consider adding Truncate() as a method to RWFile, and opening that h
Done


Line 256:       PLOG(ERROR) << "truncate() of file " << path_ << " failed";
> Combine the if statement with PLOG_IF(FATAL, ret != 0) here.
Moot point w/ RWFile::Truncate()


Line 266:     for (int i = 0; i < 2; i++) {
> Why the loop?
To verify that the response is repeatable. I'll add a comment.


http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/util/pb_util.cc
File src/kudu/util/pb_util.cc:

Line 458:     // Avoid returning Status::EndOfFile to caller since it may be confusing.
> How so? At the moment, the LBM can't recover from any non-OK status returne
Well, this was a suggestion you made on the previous rev, but I've now reverted it (it's simpler not to coerce it to Corruption).


Line 476:     return Status::EndOfFile("Reached end of file");
> This case doesn't set is_partial_record, and it should set it to true, righ
No, there is no partial record. It is truly the end of the file. I've added an assert for this.


http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/util/pb_util.h
File src/kudu/util/pb_util.h:

Line 233:   enum EofOK {
> Now unused?
Thanks for the catch.


Line 249:   // Return values:
> Rephrase:
Rephrased


Line 282:   // If 'file_size' < 'offset_' + 'length' then this function will not
> Let's make this more precise: we avoid incrementing offset_ in the case of 
Done


Line 285:   Status ValidateAndRead(size_t length, uint64_t file_size,
> On second thought, I don't think you need is_partial_record here at all. If
Yeah but the API is clunkier and ReadNextPB() is uglier.


Line 287:                          bool* is_partial_record);
> is_partial_record isn't a great name here, because ValidateAndRead() doesn'
renamed to is_partial_read


-- 
To view, visit http://gerrit.cloudera.org:8080/2595
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0b88ba051b390b6a2587eecdd205c478f1edccb4
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes