You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Grant Henke (Code Review)" <ge...@cloudera.org> on 2017/04/13 19:08:53 UTC

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Grant Henke has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6630

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block.

cfile_write_checksums is defaulted to false to ensure upgrades don't
immeditaly result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not writen the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enabled validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- More unit and perf testing
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
4 files changed, 103 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 13:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 56: TAG_FLAG(cfile_verify_checksums, experimental);
> Done. I am not sure I am clear on the difference between experimental and e
I think the main difference is that experimental is for things we don't necessarily want to draw attention to, while evolving is for something that we want people to use, but retain the right to adjust how it's configured in the future.


Line 137: Status CFileReader::ReadMagicAndLength(uint64_t offset, Slice* slice) {
> I only kept it for the sake of the TRACE_EVENT1. Can remove it.
I think every caller already has a TRACE_EVENT() of some kind in it, so we don't need to retain this purely for that.


Line 438:     }
> This is mainly to prevent data_size -= checksum_size from becoming negative
I understand, but what exactly does it mean for a ReadBlock() call to provide ptr.size() < 4? Is that impossible when hasChecksums() is true?


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 67: TAG_FLAG(cfile_write_checksums, experimental);
> I describe the defaults a bit in the patch summary. Think I should add a mo
No, my reading skills aren't as good as they should be. Sorry about that.

But, since we want people to try this feature out, I think evolving is the better tag choice. You could check with Todd/Mike for a second opinion.


Line 263:   // Prepend a footer checksum.
> I saw the same thing but don't have a great answer. I can change it if you 
If you can't prove to yourself that NOT using WriteRawData() is more correct, then change it.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 13:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 244:     if (FLAGS_cfile_write_checksums) {
Can you make TestCFile a friend of CFileReader (see the FRIEND_TEST macro) and call HasChecksum() instead? Or make HasChecksum() a public method?


Line 330:     source->Size(&file_size);
RETURN_NOT_OK


Line 333:     source->Read(0, &data);
RETURN_NOT_OK


Line 843:   source->Size(&file_size);
Should ASSERT_OK on this too.


PS13, Line 847: 254
Just in case one of the file's bytes actually has the value 254, maybe you should do something more drastic like flip one of the bits? That way there's always a "change" to the byte's value.

Bonus points: if it's not too slow, add another loop which selects which of 8 bits in the byte to flip.


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

PS13, Line 56: TAG_FLAG(cfile_verify_checksums, experimental);
Should probably default to evolving.


Line 137: Status CFileReader::ReadMagicAndLength(uint64_t offset, Slice* slice) {
Since this function is now just a block read, I think it's net more readable to remove it and do the block read inline wherever it was needed.


Line 210:     if (header.size() != header_size || checksum.size() != checksum_size) {
Do we need this check now that there are no short reads?


Line 215:     if (FLAGS_cfile_verify_checksums) {
If this is false, why bother reading the checksum out of the block at all?


Line 226:     RETURN_NOT_OK(block_->Read(off, &header));
Would be cleaner to call ReadV() regardless of whether checksums exist or not, and just pass either a single Slice or two Slices as needed.


Line 227:     if (header.size() != header_size) {
Do we need this check now that there are no short reads?


PS13, Line 271:   // Read both the header and checksum in one call.
              :   // We read the checksum position in case one exists.
              :   // This is done to avoid the need for a follow up read call.
Just to be clear, if there are no checksums, the four extra bytes we're reading:
1. Are definitely not checksum data, but,
2. Are guaranteed to exist.

Is that right? And AFAICT, since the footer itself tells us whether the cfile has checksums, the only alternative is to do two separate reads, and we think reading an extra 4 bytes is the better trade-off, right?


Line 277:   if (footer.size() != footer_size || checksum.size() != checksum_size) {
No short reads; do we still need this?


PS13, Line 282:   // This needs to be done before validating the checksum since the
              :   // incompatible_features flag tells us if a checksum exists at all.
What's the point of the footer checksum then? It seems extraordinarily rare that we'll find a footer than can serialize properly but fails its checksum test.


PS13, Line 436:     if (checksum_size > data_size) {
              :       return Status::Corruption("invalid data size");
              :     }
This effectively forbids zero-length block reads, right? Is that an invariant that we already enforce?


Line 461:     if (block.size() != data_size || checksum.size() != checksum_size) {
No short reads...


PS13, Line 466:     if (FLAGS_cfile_verify_checksums) {
              :       uint32_t expected_checksum = DecodeFixed32(checksum.data());
              :       uint32_t checksum_value = crc::Crc32c(block.data(), block.size());
              :       if (PREDICT_FALSE(checksum_value != expected_checksum)) {
              :         return Status::Corruption(
              :             Substitute("Checksum does not match at offset $0 size $1: $2 vs $3",
              :                        ptr.offset(), block.size(), checksum_value, expected_checksum));
              :       }
              :     }
I think I've seen this block of code three times; can you factor it out into a helper function? Something like:

  Status VerifyChecksum(...):
    if !FLAGS_cfile_verify_checksums:
      return Status::OK();
    <verify the checksum>


Line 476:     RETURN_NOT_OK(block_->Read(ptr.offset(), &block));
Let's use ReadV() for both cases, and vary whether we pass a one-slice vector or a two-slice vector.


Line 477:     if (block.size() != data_size) {
No short reads...


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

Line 188:   // Returns true if the file has checksums on the header, footer, and data blocks
Nit: terminate with a period.


Line 189:   bool hasChecksums() const;
Nit: should be either has_checksums() or HasChecksums()


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

PS13, Line 65: DEFINE_bool(cfile_write_checksums, false,
             :             "Write CRC32 checksums for each block");
             : TAG_FLAG(cfile_write_checksums, experimental);
1. Why default to false?
2. Why experimental and not evolving?


Line 186:     RETURN_NOT_OK_PREPEND(WriteRawData(Slice(checksum_buf, sizeof(checksum_buf))),
Rather than two separate calls to WriteRawData(), could we append the checksum to header_str and write out the whole thing in a single call to WriteRawData()?


Line 263:   // Prepend a footer checksum.
Any idea why we're using block_->Append() here and not WriteRawData() as in the header? Seems like using the same code path would be cleaner, but maybe there's a good reason (i.e. not wanting to update off_).


Line 272:   RETURN_NOT_OK_PREPEND(block_->Append(footer_str), "Couldn't write footer");
Likewise here, could we combine the checksum and footer_str and make just a single Append() call?


Line 494:     RETURN_NOT_OK(WriteRawData(data));
Hmm, a little surprised that Todd didn't suggest coalescing these into one syscall too, via WriteV(). I'm guessing it's because we expect to read far more than to write. Still, it seems like a WriteV()-abstraction to use here would be overall useful too. Do you have any interest in doing that? Not as part of this patch though.


Line 504:     RETURN_NOT_OK(WriteRawData(Slice(crc_buf, sizeof(crc_buf))));
Let's also combine the writes here. It's a little trickier since we need to compute the combined crc32 first and then add an additional slice to out_slices, but it should be doable.


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/util/crc-test.cc
File src/kudu/util/crc-test.cc:

PS13, Line 58: 0xa9421b7
Since this is used 3 times now, store it up front in a "const uint64_t kKnownGoodCrc32" or some such.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6630/14/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

Line 27: Header
> I will add that they are optional. I am not sure if the cflags should go in
That's fair.


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 336:     data.mutable_data()[corrupt_offset] = corrupt_value;
Nit: could do data.data()[corrupt_offset] here I think (though it may mean making 'orig' const uint8_t), since the data isn't yet beint modified.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#17).

Change subject: KUDU-463. Add checksumming to cfile
......................................................................

KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/util/crc-test.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
10 files changed, 325 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/17
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-463. Add checksumming to cfile

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged.

Change subject: KUDU-463. Add checksumming to cfile
......................................................................


KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Reviewed-on: http://gerrit.cloudera.org:8080/6630
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/util/crc-test.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
10 files changed, 325 insertions(+), 56 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Todd Lipcon: Looks good to me, but someone else must approve
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 18
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 13:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6630/14/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

Line 27: Header
> Can you update this to reflect that the checksums are optional (and maybe d
I will add that they are optional. I am not sure if the cflags should go in this document since it describes the format and not the details of the reader/writer implementations.


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 847:     Status s = CorruptAndReadBlock(id, i, 254);
> I had something similar to this at one point but I worried I was getting to
Done


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 438:     }
> I understand, but what exactly does it mean for a ReadBlock() call to provi
Since this is a public method I want to validate the input parameters. I don't think there is ever a valid scenario where a block would ever have 3 bytes or less. However, when hasChecksums is true, this is always the case given the checksum is 4 bytes and is included in the size.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change.

Change subject: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 16:

I still want to performance test this yet too.

-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 3:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6630/3//COMMIT_MSG
Commit Message:

Line 19: immeditaly result in performance degredation and incompatible data on
> typo
Done


Line 25: are not writen the incompatible_features "checksum" bit is not set.
> typo
Done


Line 27: cfile_verify_checksums is used in the CFileReader to enabled validating the
> typo
Done


http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 790:   TestReadWriteRawBlocks(SNAPPY, 1000);
> maybe this could be collapsed into a for loop?
Done


Line 819:   unique_ptr<CorruptReadableBlock> corrupt_source(
> would it be simpler to just corrupt the data on disk instead of this code-i
I will look into this. I think ideally I would like to corrupt each byte in a sample file and show it results in a corrupt/io error. I was planning to use this to do that. However I ran into some issues with the caching of index blocks. So regardless a slightly different approach than what is here is needed.


http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 157:   if (PREDICT_FALSE(footer_->incompatible_features() > IncompatibleFeatures::ALL)) {
> this strikes me as odd... don't you mean & (~IncompatibleFeatures::ALL)?
Your check is technically more correct since if there are bits in the middle that are not valid but are set, it will check that. I assumed incompatible feature bits would be used sequentially. Especially since we output "supported" as an integer in the log message below.


Line 363:   uint32_t data_size = ptr.size();
> best to use a signed int here
ptr.size() returns uint32_t, so I kept it the same for consistency.


Line 366:     data_size = ptr.size() - sizeof(uint32_t);
> we should verify that ptr.size() >= sizeof(uint32_t) first and return a Cor
Done


Line 392:     RETURN_NOT_OK(block_->Read(checksum_offset, sizeof(uint32_t), &checksum, checksum_scratch));
> I think perf wise it may be better to just allocate the extra 4 bytes in th
I didn't want to cache the checksum since that seamed to complicate cache reads.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-463. Add checksumming to cfile

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 17: Code-Review+2

Should I merge this? Or are you still doing performance testing?

-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 15:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/6630/15/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

Line 31: <header length>: 32-bit unsigned integer length delimiter
specify whether this includes the checksum or just the following PB


Line 33: <checksum>: An optional Crc32 checksum of the entire header
does this include the magic?


Line 43: <checksum>: An optional Crc32 checksum of the entire footer
same - including the footer, magic, and pb len?


Line 211: When checksums for the header, data, and footer are written in the CFile
nit: add comma at end of line


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 223:     opts.storage_attributes.compression = compression;
oh, that's interesting, we forgot to actually use the compression here?


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 141:   // Parse Footer first to find unsupported features.
does this mean that an old server will have problems with checksummed headers, because it will fail while reading the header rather than seeing the incompatible feature flag in the footer first?


Line 146:   if (PREDICT_FALSE(footer_->incompatible_features() > IncompatibleFeatures::ALL)) {
'>' strikes me as strange here vs doing & ~IncompatibleFeatures::ALL


PS15, Line 150: ALL
instead of ALL, maybe a better name would be 'SUPPORTED'?


Line 239:   if (footer_size >= file_size_ - kMagicAndLengthSize) {
do we want to have some kind of max size here? am afraid a flipped bit in a large block would cause a huge stack alloc below.

alernatively we could use faststring for the buffer below, which is still stack-allocated for small sizes but avoids blowing out the stack in the case of a corruption


PS15, Line 416:   uint32_t checksum_size = sizeof(uint32_t);
you have this in a few places. Maybe better to just have a global constant kChecksumSize?


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

PS15, Line 230: =
maybe |= here so it doesn't need to change when we add some other feature


PS15, Line 500: checksum data checksum
typo?


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/util/crc-test.cc
File src/kudu/util/crc-test.cc:

Line 50:   const uint64_t expected_crc = 0xa9421b7; // Known value from crcutil usage test program.
nit: kExpectedCrc


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#10).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/env_util.cc
M src/kudu/util/env_util.h
M src/kudu/util/file_cache.cc
19 files changed, 792 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 15:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6630/15/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

Line 31: <header length>: 32-bit unsigned integer length delimiter
> specify whether this includes the checksum or just the following PB
Done


Line 33: <checksum>: An optional Crc32 checksum of the entire header
> does this include the magic?
Done


Line 43: <checksum>: An optional Crc32 checksum of the entire footer
> same - including the footer, magic, and pb len?
Done


Line 211: When checksums for the header, data, and footer are written in the CFile
> nit: add comma at end of line
Done


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 223:     opts.storage_attributes.compression = compression;
> oh, that's interesting, we forgot to actually use the compression here?
I think so. I noticed it while doing my testing.


Line 336:     uint8_t orig = data.mutable_data()[corrupt_offset];
> Nit: could do data.data()[corrupt_offset] here I think (though it may mean 
Done


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 141:   // Parse Footer first to find unsupported features.
> does this mean that an old server will have problems with checksummed heade
It shouldn't because the length doesn't include the checksum.


Line 146:   if (PREDICT_FALSE(footer_->incompatible_features() > IncompatibleFeatures::ALL)) {
> '>' strikes me as strange here vs doing & ~IncompatibleFeatures::ALL
I put a comment on a similar older review and wasn't sure if I should change it. Here was the response:

> Your check is technically more correct since if there are bits in the middle that are not valid but are set, it will check that. I assumed incompatible feature bits would be used sequentially. Especially since we output "supported" as an integer in the log message below.


Line 150:         IncompatibleFeatures::ALL));
> instead of ALL, maybe a better name would be 'SUPPORTED'?
Done


Line 239:   if (footer_size >= file_size_ - kMagicAndLengthSize) {
> do we want to have some kind of max size here? am afraid a flipped bit in a
ParseMagicAndLength has a max size check:

   if (len > kMaxHeaderFooterPBSize) {
      return Status::Corruption("invalid data size");
   }


Line 416:   uint32_t checksum_size = sizeof(uint32_t);
> you have this in a few places. Maybe better to just have a global constant 
Done


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 230:     incompatible_features = IncompatibleFeatures::CHECKSUM;
> maybe |= here so it doesn't need to change when we add some other feature
Done


Line 500:   // Calculate and append a checksum data checksum.
> typo?
Done


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/util/crc-test.cc
File src/kudu/util/crc-test.cc:

Line 50:   const uint64_t expected_crc = 0xa9421b7; // Known value from crcutil usage test program.
> nit: kExpectedCrc
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/11/src/kudu/util/env.h
File src/kudu/util/env.h:

Line 375:   Slice*   result;  // Set to the data and length that was read (OUT)
Since there are no short reads, can we omit 'result' altogether? Without short reads, ReadV() either fails (in which case 'entries' is totally invalid) or it succeeds fully (in which case the results of the read is described fully by each 'length' and 'scratch' pair). If the caller needs slices, it can make its own.

Alternatively, perhaps we can replace vector<ReadEntry> with vector<Slice>? Each slice describes both 'length' and 'scratch', and would be used as input when constructing the iov internally.

In either case, ReadV() won't modify the vector and so it should be passed by const ref to avoid needless copies. An alternative would to be pass it by value and move it, but I think the caller still needs access after ReadV(), and it would lose access if move were used.

Given the amount of discussion around this interface, I really think this part of the patch should be split out into its own gerrit change, partly to ease code review and partly so the new git commit can more fully document the design trade-offs taken.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-463. Add checksumming to cfile

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 17: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 6:

(2 comments)

Just did a quick look at the approach using readv, it seems like a good direction to me. I didn't review the new header/footer checksum stuff yet.

It may be worth checking for Adar's opinion on the API in Env, he tends to have opinions on that stuff. The current one doesn't seem super idiomatic to me but also don't really have a better suggestion :)

http://gerrit.cloudera.org:8080/#/c/6630/6/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 446:     data_size -= checksum_size;
careful of underflow


PS6, Line 472: &&
||


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#5).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- Finish testing
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
9 files changed, 346 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change.

Change subject: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 146:   if (PREDICT_FALSE(footer_->incompatible_features() & ~IncompatibleFeatures::SUPPORTED)) {
> eh, I still think that the "more correct" way is better. The point of using
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 14:

(1 comment)

Looks good, just two other things:
1. What was your conclusion re: bit flipping in the test?
2. Let's continue the comment thread about ReadBlock() when ptr.size() < 4.

http://gerrit.cloudera.org:8080/#/c/6630/14/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

Line 27: Header
Can you update this to reflect that the checksums are optional (and maybe describe the cflags that control them a little)?


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 1:

This is just a quick push to sanity check my approach.

-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#13).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/util/crc-test.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
9 files changed, 344 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/13
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 3:

I need to handle the header and footer yet, which may require a change in how to handle backwards compatibility.

-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 13:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 244:     if (FLAGS_cfile_write_checksums) {
> Can you make TestCFile a friend of CFileReader (see the FRIEND_TEST macro) 
Done


Line 330:     source->Size(&file_size);
> RETURN_NOT_OK
Done


Line 333:     source->Read(0, &data);
> RETURN_NOT_OK
Done


Line 843:   source->Size(&file_size);
> Should ASSERT_OK on this too.
Done


Line 847:     Status s = CorruptAndReadBlock(id, i, 254);
> Just in case one of the file's bytes actually has the value 254, maybe you 
I had something similar to this at one point but I worried I was getting to clever in a test. I figured a simple static value that was known to corrupt all bytes in the given test would be a clear way to fail fast if something in the test changed. I will take a look again now that my corrupt code is more straightforward.


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 56: TAG_FLAG(cfile_verify_checksums, experimental);
> Should probably default to evolving.
Done. I am not sure I am clear on the difference between experimental and evolving.


Line 137: Status CFileReader::ReadMagicAndLength(uint64_t offset, Slice* slice) {
> Since this function is now just a block read, I think it's net more readabl
I only kept it for the sake of the TRACE_EVENT1. Can remove it.


Line 210:     if (header.size() != header_size || checksum.size() != checksum_size) {
> Do we need this check now that there are no short reads?
Done


Line 215:     if (FLAGS_cfile_verify_checksums) {
> If this is false, why bother reading the checksum out of the block at all?
Done


Line 226:     RETURN_NOT_OK(block_->Read(off, &header));
> Would be cleaner to call ReadV() regardless of whether checksums exist or n
Done


Line 227:     if (header.size() != header_size) {
> Do we need this check now that there are no short reads?
Done


Line 273:   // This is done to avoid the need for a follow up read call.
> Just to be clear, if there are no checksums, the four extra bytes we're rea
That is exactly the choice being made. An earlier version of the patch did this in 2 reads. However because we know the header alone has enough bytes to support the over read, it should be safe to do this. 

The draw back here is that cfiles without checksums read 4 bytes more than required.


Line 277:   if (footer.size() != footer_size || checksum.size() != checksum_size) {
> No short reads; do we still need this?
Done


Line 283:   // incompatible_features flag tells us if a checksum exists at all.
> What's the point of the footer checksum then? It seems extraordinarily rare
Though rare, I am viewing checksumming as an all or nothing feature.

It's possible the contents of the protobuf could be parsable but incorrect. In the case parsing fails, we know its corrupt. In the case parsing succeeds the checksum ensures we parsed the right thing.

Given the importance of the fields in the footer I think it's worth being sure.


Line 438:     }
> This effectively forbids zero-length block reads, right? Is that an invaria
This is mainly to prevent data_size -= checksum_size from becoming negative and resulting in a cryptic error.


Line 461:     if (block.size() != data_size || checksum.size() != checksum_size) {
> No short reads...
Done


Line 474:     }
> Maybe it would even be possible to incorporate the ReadV() too? Basically a
Good idea. The one place that is different is the footer since we "preemptively" read checksums data and the checksum comes before the data. This could be reused for the header and data blocks though.


Line 476:     RETURN_NOT_OK(block_->Read(ptr.offset(), &block));
> Let's use ReadV() for both cases, and vary whether we pass a one-slice vect
Done


Line 477:     if (block.size() != data_size) {
> No short reads...
Done


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

Line 188:   // Returns true if the file has checksums on the header, footer, and data blocks
> Nit: terminate with a period.
Done


Line 189:   bool hasChecksums() const;
> Nit: should be either has_checksums() or HasChecksums()
Done


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 67: TAG_FLAG(cfile_write_checksums, experimental);
> 1. Why default to false?
I describe the defaults a bit in the patch summary. Think I should add a more detailed comment here?

Here is a quote from the patch summary:
   
   cfile_write_checksums is defaulted to false to ensure upgrades don't
   immediately result in performance degredation and incompatible data on
   downgrade. It can and likely should be defaulted to true in a later release.

Essentially I think they should be opt in for at least 1 release. This also ensure we support a downgrade of at least 1 release.


Line 263:   // Prepend a footer checksum.
> Any idea why we're using block_->Append() here and not WriteRawData() as in
I saw the same thing but don't have a great answer. I can change it if you think I should. I opted to follow the existing code since I was unsure.


Line 494:     RETURN_NOT_OK(WriteRawData(data));
> Hmm, a little surprised that Todd didn't suggest coalescing these into one 
As I was reading through the comments I started getting a similar feeling that WriteV would be a good idea. I can definitely look into adding it for use in this patch following the ReadV() patch.


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/util/crc-test.cc
File src/kudu/util/crc-test.cc:

Line 58:   ASSERT_EQ(0xa9421b7, data_crc); // Known value from crcutil usage test program.
> Since this is used 3 times now, store it up front in a "const uint64_t kKno
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#12).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc-test.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/file_cache.cc
18 files changed, 724 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/15//COMMIT_MSG
Commit Message:

Line 7: WIP: KUDU-463. Add checksumming to cfile
Also, this is no longer a WIP, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-463. Add checksumming to cfile

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 146:   if (PREDICT_FALSE(footer_->incompatible_features() > IncompatibleFeatures::SUPPORTED)) {
> I put a comment on a similar older review and wasn't sure if I should chang
eh, I still think that the "more correct" way is better. The point of using feature flags instead of just a version number is so that you can backport non-sequential features into past version branches, etc, and still get correct behavior.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 10:

(21 comments)

I didn't review the checksumming stuff yet, just looked at the rest and the plumbing.

Since the ReadV() stuff turned out to be pretty substantial, could you move it into its own commit? Would be nice to finish reviewing it separately.

http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

Line 210:   bool has_checksums_;
This will be padded to 8 bytes, which is actually quite a bit given how many CFileReaders we instantiate. Perhaps we can calculate this on the fly instead, using a mask on footer_'s features?


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

PS10, Line 262: gscoped_ptr
Nit: use unique_ptr.

Actually, these scratch spaces are tiny; can you just allocate them on the stack?


Line 267:       ReadResult {
Surprised that this style of struct initialization works; isn't it C99 but not C++11 compatible?


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

Line 38: class Env;
No need for this anymore.

Though if you decide to pass the vector by pointer as I suggested in env.h, you can omit the env.h include, forward declare ReadResult, and continue to forward declare Env.


PS10, Line 162: result
You mean results?


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/fs-test-util.h
File src/kudu/fs/fs-test-util.h:

PS10, Line 72: vector
This is a header so we retain namespace qualifications (i.e. this should be std::vector).


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.cc
File src/kudu/util/crc.cc:

PS10, Line 50:  
Nit: extra space here.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.h
File src/kudu/util/crc.h:

PS10, Line 38: crc32
Maybe previous_crc32 to be more clear?


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env.h
File src/kudu/util/env.h:

Line 371: struct ReadResult {
Should provide a short high-level struct document too.


PS10, Line 403: results
I'm finding this to be somewhat confusing.

Part of the issue is with the name ReadResult: the majority of stuff in it aren't "results"; they're actually input to the function. Maybe ReadEntry or ReadVEntry would be better?

Part of the issue is that the Slice* means an extra level of indirection in the setup that callers have to do. Perhaps they can be regular Slices, that are default-initialized by the caller and set by the callee? That means 'results' should be passed by pointer, but I think that's a clearer model anyway, since we pass by pointer things that the function is expected to modify.

Also, could you reorganize ReadResult so that the pure IN parameters precede IN/OUT and OUT? Meaning, the order should probably be length, scratch, then result.


Line 536:   // TODO (GH): Document
Yep. :)


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

PS10, Line 175: const struct iovec *
Should be const struct iovec*


Line 180:       return read_bytes;
Nit: if you're going to elide the braces, merge this line into the conditional line. This makes it harder to accidentally mess up the conditional if another line is added.


Line 182:       break;
Don't we need to update total_read_bytes with our partial read results before breaking?


Line 334:   virtual Status ReadV(uint64_t offset, vector<ReadResult> results) const OVERRIDE {
Could you add TRACE_EVENT1() calls to ReadV() and Read()? Not sure why they were missing in the first place.


PS10, Line 340: ReadResult
Can be cref?


Line 341:       return iovec {
Again, surprised this kind of initialization works in C++11.


PS10, Line 347: results.size()
arraylength(iov) is more idiomatic.


Line 353:     // Adjust slice sizes based on bytes read for short reads
As we discussed, let's make short reads an error, for Read() too. Do it as a separate patch and layer this one on top of it.


PS10, Line 355: ReadResult
Nit: auto is fine for short-scoped types like this one.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_util.h
File src/kudu/util/env_util.h:

Line 66: Status ReadVFully(RandomAccessFile* file, uint64_t offset, std::vector<ReadResult> results);
If you end up solving the partial read stuff directly in RandomAccessFile, this won't be necessary.

But if it is, please also doc it.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#8).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- Finish testing
- Method documentation
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/env_util.cc
M src/kudu/util/env_util.h
M src/kudu/util/file_cache.cc
18 files changed, 804 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 10:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 301:   RETURN_NOT_OK(block_->ReadV(off, results));
> I think it would be a little less obtrusive to have ReadResult construct an
My goal was to match the pattern and variables used in the Read function. Especially since this is essentially a version of Read for multiple Slices.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

Line 210:   bool has_checksums_;
> This will be padded to 8 bytes, which is actually quite a bit given how man
Done


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 262:   gscoped_ptr<uint8_t[]> scratch1(new uint8_t[size1]);
> Nit: use unique_ptr.
I mainly just followed the "pattern" from a few lines above. Happy to change it.


Line 267:       ReadResult {
> Surprised that this style of struct initialization works; isn't it C99 but 
Will change this.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

Line 162:   // Does not modify 'result' on error (but may modify 'scratch').
> You mean results?
I mean the result field in the results. I will make this more clear.


Line 163:   virtual Status ReadV(uint64_t offset, vector<ReadResult> results) const = 0;
> nit: Consider making this vector<ReadResult>* perhaps, based on my comments
Looking into this. Adar mentioned it too.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/fs-test-util.h
File src/kudu/fs/fs-test-util.h:

Line 72:   virtual Status ReadV(uint64_t offset, vector<ReadResult> results) const OVERRIDE {
> This is a header so we retain namespace qualifications (i.e. this should be
Done


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.cc
File src/kudu/util/crc.cc:

Line 50:   uint64_t crc_tmp =  static_cast<uint64_t>(crc32);
> Nit: extra space here.
Done


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.h
File src/kudu/util/crc.h:

Line 38: uint32_t Crc32c(const void* data, size_t length, uint32_t crc32);
> Maybe previous_crc32 to be more clear?
Done


Line 38: uint32_t Crc32c(const void* data, size_t length, uint32_t crc32);
> Maybe name it RollingCrc32()? Also, needs a unit test.
I though it was nice to have the same name as the original function since it works in tandem with it. 
ie:
uint32_t crc = Crc32c(data, length);
crc = Crc32c(data, length, crc);


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env.h
File src/kudu/util/env.h:

Line 371: struct ReadResult {
> Should provide a short high-level struct document too.
Done


Line 403:   virtual Status ReadV(uint64_t offset, std::vector<ReadResult> results) const = 0;
> I'm finding this to be somewhat confusing.
I agree. I had a hard time naming the struct too. I am not sure ReadEntry is perfect either, but it's definitely better. 

Will change and re-organize ReadResult.


Line 536:   // TODO (GH): Document
> Yep. :)
I was waiting until it was decided if I should allow short reads here or not. I will move the readfully logic here and document accordingly.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

Line 175: ssize_t preadv(int fd, const struct iovec *iovec, int count, off_t offset) {
> Should be const struct iovec*
Done


Line 180:       return read_bytes;
> Nit: if you're going to elide the braces, merge this line into the conditio
Done


Line 182:       break;
> Don't we need to update total_read_bytes with our partial read results befo
Done


Line 334:   virtual Status ReadV(uint64_t offset, vector<ReadResult> results) const OVERRIDE {
> Could you add TRACE_EVENT1() calls to ReadV() and Read()? Not sure why they
It doesn't look like any read or write calls have this now. Since these are called frequently is there some overhead to worry about?


Line 341:       return iovec {
> Again, surprised this kind of initialization works in C++11.
Done


Line 353:     // Adjust slice sizes based on bytes read for short reads
> As we discussed, let's make short reads an error, for Read() too. Do it as 
Will do.


Line 355:     for (ReadResult& readResult : results) {
> Nit: auto is fine for short-scoped types like this one.
Done


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_util.cc
File src/kudu/util/env_util.cc:

Line 196:   size_t n = accumulate(results.begin(), results.end(), static_cast<size_t>(0),
> nit: name this req_len ?
I can rename this. Often when in a new codebase I try to match the style that exists. Given this is a "mirror" of ReadFully above I used similar names.


Line 207:     // Calculate the read amount of data
> Nit: Use punctuation like periods at the end of comment sentences per https
Done


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_util.h
File src/kudu/util/env_util.h:

Line 66: Status ReadVFully(RandomAccessFile* file, uint64_t offset, std::vector<ReadResult> results);
> If you end up solving the partial read stuff directly in RandomAccessFile, 
I will move this into the lower level read methods.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6630/6/src/kudu/util/env.h
File src/kudu/util/env.h:

Line 21: #include <sys/uio.h>
Would prefer not to leak this sort of platform-specific thing outside of env_posix.cc. An iovec is a pretty simple structure; perhaps you can define a Kudu version and adapt it to/from an iovec inside env_posix.cc?


Line 390:   virtual Status ReadV(uint64_t offset, Slice** results,
Would std::vector<Slice>* be more intuitive for 'results'? That is, ReadV() takes a caller-provided vector of Slices and replaces its contents with a couple Slices of its own creation?

Might be interesting to combine 'scratches' and 'results' into the same structure, since I imagine the length of the two must be the same.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#16).

Change subject: KUDU-463. Add checksumming to cfile
......................................................................

KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/util/crc-test.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
10 files changed, 325 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/16
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 10:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

PS10, Line 277:   uint8_t footer_scratch[footer_size];
              :   Slice footer;
              : 
              :   uint32_t checksum_size = sizeof(uint32_t);
              :   uint8_t checksum_scratch[checksum_size];
              :   Slice checksum;
              : 
              :   // Read both the header and checksum in one call.
              :   // We read the checksum position in case one exists.
              :   // This is done to avoid the need for a follow up read call.
              :   // Read both the data and checksum in one call
              :   vector<ReadResult> results = {
              :       ReadResult {
              :           .result = &checksum,
              :           .scratch = checksum_scratch,
              :           .length = checksum_size,
              :       },
              :       ReadResult {
              :           .result = &footer,
              :           .scratch = footer_scratch,
              :           .length = footer_size,
              :       }
              :   };
              :   uint64_t off = file_size_ - kMagicAndLengthSize - footer_size - checksum_size;
              :   RETURN_NOT_OK(block_->ReadV(off, results));
I think it would be a little less obtrusive to have ReadResult construct and own the Slice, the scratch buffer (you could use faststring) and the length and use a constructor like

  explicit ReadResult(buffer_size)

So then instead of all this you could just say

  uint32_t checksum_size = sizeof(uint32_t);
  vector<ReadResult> results = { ReadResult(checksum_size), ReadResult(footer_size) };
  uint64_t off = file_size_ - kMagicAndLengthSize - footer_size - checksum_size;
  RETURN_NOT_OK(block_->ReadV(off, &results));

And avoid copying the vector.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

PS10, Line 163: vector<ReadResult>
nit: Consider making this vector<ReadResult>* perhaps, based on my comments elsewhere


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.h
File src/kudu/util/crc.h:

PS10, Line 38: Crc32c
Maybe name it RollingCrc32()? Also, needs a unit test.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_util.cc
File src/kudu/util/env_util.cc:

PS10, Line 196: n
nit: name this req_len ?


Line 207:     // Calculate the read amount of data
Nit: Use punctuation like periods at the end of comment sentences per https://google.github.io/styleguide/cppguide.html#Punctuation,_Spelling_and_Grammar


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#15).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/util/crc-test.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
9 files changed, 328 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/15
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#4).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- Update cfile docs
- Finish testing
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
8 files changed, 321 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

PS13, Line 466:     if (FLAGS_cfile_verify_checksums) {
              :       uint32_t expected_checksum = DecodeFixed32(checksum.data());
              :       uint32_t checksum_value = crc::Crc32c(block.data(), block.size());
              :       if (PREDICT_FALSE(checksum_value != expected_checksum)) {
              :         return Status::Corruption(
              :             Substitute("Checksum does not match at offset $0 size $1: $2 vs $3",
              :                        ptr.offset(), block.size(), checksum_value, expected_checksum));
              :       }
              :     }
> I think I've seen this block of code three times; can you factor it out int
Maybe it would even be possible to incorporate the ReadV() too? Basically a function that takes as input the data Slice and:
1. Creates a slice vector to pass to ReadV().
2. Checks if checksums are to be verified.
3. If so, creates a checksum buffer+slice and modifies the slice vector.
4. Calls ReadV().
5. If checksums are to be verified, performs the verification.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 67: TAG_FLAG(cfile_write_checksums, experimental);
> No, my reading skills aren't as good as they should be. Sorry about that.
Done


Line 186:     RETURN_NOT_OK_PREPEND(WriteRawData(Slice(checksum_buf, sizeof(checksum_buf))),
> Rather than two separate calls to WriteRawData(), could we append the check
Done


Line 494:     RETURN_NOT_OK(WriteRawData(data));
> As I was reading through the comments I started getting a similar feeling t
Done


Line 504:     RETURN_NOT_OK(WriteRawData(Slice(crc_buf, sizeof(crc_buf))));
> Let's also combine the writes here. It's a little trickier since we need to
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6630/6/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 446:     data_size -= checksum_size;
> careful of underflow
Done


Line 472:     if (block.size() != data_size && checksum.size() != checksum_size) {
> ||
Done


http://gerrit.cloudera.org:8080/#/c/6630/6/src/kudu/util/env.h
File src/kudu/util/env.h:

Line 21: #include <sys/uio.h>
> Would prefer not to leak this sort of platform-specific thing outside of en
I agree. This was just a rough patch to see if the approach and use of preadv made sense before polishing.  See my comment below as well.


Line 390:   virtual Status ReadV(uint64_t offset, Slice** results,
> Would std::vector<Slice>* be more intuitive for 'results'? That is, ReadV()
I agree that a vector with all the combined parameters would be better in the end. Something that has slice, scratch, and length so we don't need to expose iovec. I planned to do this, I just posted this super rough version to see if the approach and use of preadv made sense before spending the time polishing, testing, etc.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 819:   unique_ptr<CorruptReadableBlock> corrupt_source(
> I will look into this. I think ideally I would like to corrupt each byte in
yea, you probably need to re-open the file and clear the block cache in between


http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 363:   uint32_t data_size = ptr.size();
> ptr.size() returns uint32_t, so I kept it the same for consistency.
yea, though the style guide says to use signed quantities anywhere that you aren't doing bitwise ops (that way overflow will be more obvious since it will go to -1 instead of a very large number)


Line 392:     RETURN_NOT_OK(block_->Read(checksum_offset, sizeof(uint32_t), &checksum, checksum_scratch));
> I didn't want to cache the checksum since that seamed to complicate cache r
hm, possibly, but am worried that doubling the syscalls is expensive


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#2).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block.

cfile_write_checksums is defaulted to false to ensure upgrades don't
immeditaly result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not writen the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enabled validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- More unit and perf testing
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
4 files changed, 102 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#3).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block.

cfile_write_checksums is defaulted to false to ensure upgrades don't
immeditaly result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not writen the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enabled validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- More unit and perf testing
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/fs-test-util.h
5 files changed, 195 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 3:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6630/3//COMMIT_MSG
Commit Message:

PS3, Line 19: immeditaly
typo


PS3, Line 25: writen
typo


PS3, Line 27: enabled
typo


http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

PS3, Line 772:   FLAGS_cfile_write_checksums = true;
             :   FLAGS_cfile_verify_checksums = true;
             :   TestReadWriteRawBlocks(NO_COMPRESSION, 1000);
             :   TestReadWriteRawBlocks(SNAPPY, 1000);
             : 
             :   FLAGS_cfile_write_checksums = true;
             :   FLAGS_cfile_verify_checksums = false;
             :   TestReadWriteRawBlocks(NO_COMPRESSION, 1000);
             :   TestReadWriteRawBlocks(SNAPPY, 1000);
             : 
             :   FLAGS_cfile_write_checksums = false;
             :   FLAGS_cfile_verify_checksums = true;
             :   TestReadWriteRawBlocks(NO_COMPRESSION, 1000);
             :   TestReadWriteRawBlocks(SNAPPY, 1000);
             : 
             :   FLAGS_cfile_write_checksums = false;
             :   FLAGS_cfile_verify_checksums = false;
             :   TestReadWriteRawBlocks(NO_COMPRESSION, 1000);
             :   TestReadWriteRawBlocks(SNAPPY, 1000);
maybe this could be collapsed into a for loop?

for (FLAGS_cfile_write_checksum : {false, true}) {
  for (FLAGS_cfile_verify_checksum : {false, true}) {
    ...
  }
}


PS3, Line 819: CorruptReadableBlock
would it be simpler to just corrupt the data on disk instead of this code-injection-based approach? A hook in the fs manager code to corrupt a block on disk is probably more generally reusable in higher-level integration tests as well (where injecting this CorruptReadableBlock class may be a bit tricky).

I think that would also enable making this code quite a bit simpler by reusing existing ReadFile/WriteFile functions


http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

PS3, Line 157: > IncompatibleFeatures::ALL
this strikes me as odd... don't you mean & (~IncompatibleFeatures::ALL)?


PS3, Line 363: uint32_t
best to use a signed int here


Line 366:     data_size = ptr.size() - sizeof(uint32_t);
we should verify that ptr.size() >= sizeof(uint32_t) first and return a Corruption otherwise.


Line 392:     RETURN_NOT_OK(block_->Read(checksum_offset, sizeof(uint32_t), &checksum, checksum_scratch));
I think perf wise it may be better to just allocate the extra 4 bytes in the block cache so that we can read the checksum in the same IO vs issuing the extra syscall here. Plus the code is likely to be a little simpler?


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#7).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- Finish testing
- Method documentation
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/env_util.cc
M src/kudu/util/env_util.h
M src/kudu/util/file_cache.cc
18 files changed, 804 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

Line 174: // Simulates Linux's preadv API on OS X.
> BTW, I just noticed that we simulate pwritev() inline, in DoWriteV(). If yo
DoWriteV actually changes it entire implementation instead of simulating pwritev. I can definitely move this, but it might be best in a follow up patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 262:   gscoped_ptr<uint8_t[]> scratch1(new uint8_t[size1]);
> I mainly just followed the "pattern" from a few lines above. Happy to chang
Yeah, we should be using unique_ptr in new sites. Not sure why I used heap allocation above, but that can change too.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

Line 174: // Simulates Linux's preadv API on OS X.
BTW, I just noticed that we simulate pwritev() inline, in DoWriteV(). If you're feeling generous perhaps you can pull that out into this area too, so the two simulated functions can be defined side by side?


Line 334:   virtual Status ReadV(uint64_t offset, vector<ReadResult> results) const OVERRIDE {
> It doesn't look like any read or write calls have this now. Since these are
Hmm you're probably right; I didn't realize there weren't any calls in Write either. Nevermind then.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#6).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- Finish testing
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/env_util.h
M src/kudu/util/file_cache.cc
17 files changed, 539 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

Line 174: // Simulates Linux's preadv API on OS X.
> DoWriteV actually changes it entire implementation instead of simulating pw
Sure, works for me.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 56: TAG_FLAG(cfile_verify_checksums, experimental);
> I think the main difference is that experimental is for things we don't nec
Adar is right, also here's the reference: https://github.com/apache/kudu/blob/master/src/kudu/util/flag_tags.h#L21

The main thing about experimental is that you must unlock them first to be able to use them. So if you enable something by default... but then mark it experimental, it means that users have to go out of their way to disable the experimental feature.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#11).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc-test.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/file_cache.cc
18 files changed, 724 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#9).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- Finish testing
- Method documentation
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/env_util.cc
M src/kudu/util/env_util.h
M src/kudu/util/file_cache.cc
19 files changed, 828 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#14).

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/util/crc-test.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
9 files changed, 324 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/14
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>