You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/01/11 09:45:54 UTC

[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

Hello Adar Dembo,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/5679

to review the following change.

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
......................................................................

KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

This changes the compressed block header for CFile v2 so that it only
includes the uncompressed block length, and not a redundant copy of the
compressed block length (which is already known by the length of the
block itself).

In addition, this enables a special behavior if the stored uncompressed
block length is exactly equal to the length of the compressed data. This
signifies that the data has not been compressed and that the reader
should not execute the configured codec.

On the write side, we always execute the codec, and in the case that the
resulting compression ratio is not good enough, we just store the
uncompressed data as above. By default, the compression ratio is 0.9x
(i.e at least 10% space reduction). It can be configured by an
experimental flag.

Initially, the JIRA had described doing this based on a policy rather
than based on the actual output of the codec. However, during
development I realized that, even with encodings like BIT_SHUFFLE, there
are actually some patterns that can benefit from a second pass of LZ4.
And even if the second pass is not effective, it isn't too expensive to
try a second pass on the write side anyway. See [1] for a discussion of
a dataset that benefits from multiple passes of compression.

Along the way, made a few changes to how the compressed block decoder is
implemented:

* Use strings::Substitute instead of StringPrintf (easier to read)
* Cleaned up unnecessary header includes
* Made the CompressedBlockDecoder a short-lifetime stack object rather
  than an allocated member of CFileReader. This avoids an allocation and
  indirection, and additionally allows us to put more state inside the
  object itself without worrying about thread safety, etc.
* The CompressedBlockDecoder is now stateful, making for an
  easier-to-use API.

[1] https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ

Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
---
M src/kudu/cfile/block_compression.cc
M src/kudu/cfile/block_compression.h
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/compression-test.cc
6 files changed, 200 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/5679/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>

[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.

Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5679

to look at the new patch set (#2).

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
......................................................................

KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

This changes the compressed block header for CFile v2 so that it only
includes the uncompressed block length, and not a redundant copy of the
compressed block length (which is already known by the length of the
block itself).

In addition, this enables a special behavior if the stored uncompressed
block length is exactly equal to the length of the compressed data. This
signifies that the data has not been compressed and that the reader
should not execute the configured codec.

On the write side, we always execute the codec, and in the case that the
resulting compression ratio is not good enough, we just store the
uncompressed data as above. By default, the compression ratio is 0.9x
(i.e at least 10% space reduction). It can be configured by an
experimental flag.

Initially, the JIRA had described doing this based on a policy rather
than based on the actual output of the codec. However, during
development I realized that, even with encodings like BIT_SHUFFLE, there
are actually some patterns that can benefit from a second pass of LZ4.
And even if the second pass is not effective, it isn't too expensive to
try a second pass on the write side anyway. See [1] for a discussion of
a dataset that benefits from multiple passes of compression.

Along the way, made a few changes to how the compressed block decoder is
implemented:

* Use strings::Substitute instead of StringPrintf (easier to read)
* Cleaned up unnecessary header includes
* Made the CompressedBlockDecoder a short-lifetime stack object rather
  than an allocated member of CFileReader. This avoids an allocation and
  indirection, and additionally allows us to put more state inside the
  object itself without worrying about thread safety, etc.
* The CompressedBlockDecoder is now stateful, making for an
  easier-to-use API.

[1] https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ

Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
---
M src/kudu/cfile/block_compression.cc
M src/kudu/cfile/block_compression.h
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/compression-test.cc
6 files changed, 200 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/5679/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.

Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5679

to look at the new patch set (#4).

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
......................................................................

KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

This changes the compressed block header for CFile v2 so that it only
includes the uncompressed block length, and not a redundant copy of the
compressed block length (which is already known by the length of the
block itself).

In addition, this enables a special behavior if the stored uncompressed
block length is exactly equal to the length of the compressed data. This
signifies that the data has not been compressed and that the reader
should not execute the configured codec.

On the write side, we always execute the codec, and in the case that the
resulting compression ratio is not good enough, we just store the
uncompressed data as above. By default, the compression ratio is 0.9x
(i.e at least 10% space reduction). It can be configured by an
experimental flag.

Initially, the JIRA had described doing this based on a policy rather
than based on the actual output of the codec. However, during
development I realized that, even with encodings like BIT_SHUFFLE, there
are actually some patterns that can benefit from a second pass of LZ4.
And even if the second pass is not effective, it isn't too expensive to
try a second pass on the write side anyway. See [1] for a discussion of
a dataset that benefits from multiple passes of compression.

Along the way, made a few changes to how the compressed block decoder is
implemented:

* Use strings::Substitute instead of StringPrintf (easier to read)
* Cleaned up unnecessary header includes
* Made the CompressedBlockDecoder a short-lifetime stack object rather
  than an allocated member of CFileReader. This avoids an allocation and
  indirection, and additionally allows us to put more state inside the
  object itself without worrying about thread safety, etc.
* The CompressedBlockDecoder is now stateful, making for an
  easier-to-use API.

[1] https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ

Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
---
M src/kudu/cfile/block_compression.cc
M src/kudu/cfile/block_compression.h
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/compression-test.cc
6 files changed, 216 insertions(+), 115 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/5679/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.

Adar Dembo has submitted this change and it was merged.

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
......................................................................


KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

This changes the compressed block header for CFile v2 so that it only
includes the uncompressed block length, and not a redundant copy of the
compressed block length (which is already known by the length of the
block itself).

In addition, this enables a special behavior if the stored uncompressed
block length is exactly equal to the length of the compressed data. This
signifies that the data has not been compressed and that the reader
should not execute the configured codec.

On the write side, we always execute the codec, and in the case that the
resulting compression ratio is not good enough, we just store the
uncompressed data as above. By default, the compression ratio is 0.9x
(i.e at least 10% space reduction). It can be configured by an
experimental flag.

Initially, the JIRA had described doing this based on a policy rather
than based on the actual output of the codec. However, during
development I realized that, even with encodings like BIT_SHUFFLE, there
are actually some patterns that can benefit from a second pass of LZ4.
And even if the second pass is not effective, it isn't too expensive to
try a second pass on the write side anyway. See [1] for a discussion of
a dataset that benefits from multiple passes of compression.

Along the way, made a few changes to how the compressed block decoder is
implemented:

* Use strings::Substitute instead of StringPrintf (easier to read)
* Cleaned up unnecessary header includes
* Made the CompressedBlockDecoder a short-lifetime stack object rather
  than an allocated member of CFileReader. This avoids an allocation and
  indirection, and additionally allows us to put more state inside the
  object itself without worrying about thread safety, etc.
* The CompressedBlockDecoder is now stateful, making for an
  easier-to-use API.

[1] https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ

Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Reviewed-on: http://gerrit.cloudera.org:8080/5679
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/cfile/block_compression.cc
M src/kudu/cfile/block_compression.h
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/compression-test.cc
6 files changed, 216 insertions(+), 115 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.

Adar Dembo has posted comments on this change.

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/block_compression.cc
File src/kudu/cfile/block_compression.cc:

Line 137:     // Check if the on-disk data size matches with the buffer
Nit: terminate with period.


Line 144:   } else {
Shouldn't we explicitly test for == 2? Otherwise if/when we add cfile v3, old deployments will assume the compression header looks like a v2 compression header.


PS2, Line 149: uncompressed_size_
Isn't this -1 at this point?


Line 172:   DCHECK_GE(uncompressed_size_, 0);
If you call uncompressed_size() below, you won't need this.


http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/block_compression.h
File src/kudu/cfile/block_compression.h:

Line 43: // CFile version 2
Should include the <compressed data> bit in here too.


Line 61:   // Sets "*result" to the compressed version of the "data".
Update to data_slices.


http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/compression-test.cc
File src/kudu/cfile/compression-test.cc:

Line 77: class TestCompression : public CFileTestBase {
Curious: what prompted the changes to this test?


-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.

Todd Lipcon has posted comments on this change.

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/block_compression.cc
File src/kudu/cfile/block_compression.cc:

Line 137:     // Check if the on-disk data size matches with the buffer
> Nit: terminate with period.
Done


Line 144:   } else {
> Shouldn't we explicitly test for == 2? Otherwise if/when we add cfile v3, o
I think if we were to make another change to this format, we would not bump the cfile version number, but instead we'd introduce it as an 'incompatible_feature' flag in the footer. Put another way, I dont expect us to ever have a new cfile version number.


PS2, Line 149: uncompressed_size_
> Isn't this -1 at this point?
woops. I wonder how this worked... perhaps due to unsigned integer nonsense. Will move it back where it belongs.


Line 172:   DCHECK_GE(uncompressed_size_, 0);
> If you call uncompressed_size() below, you won't need this.
yea, though I think calling your own public accessors is a bit ugly because it obscures what's going on (if I saw that I might think something's getting calculated, for example)


http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/block_compression.h
File src/kudu/cfile/block_compression.h:

Line 43: // CFile version 2
> Should include the <compressed data> bit in here too.
Done


Line 61:   // Sets "*result" to the compressed version of the "data".
> Update to data_slices.
Done


http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/compression-test.cc
File src/kudu/cfile/compression-test.cc:

Line 77: class TestCompression : public CFileTestBase {
> Curious: what prompted the changes to this test?
(a) switched from uint32/GROUP_VARINT to INT32/BIT_SHUFFLE since users can no longer specify uint32 columns in the first place and GROUP_VARINT encoding is deprecated.
(b) added the 'random' data generation with PLAIN encoding to ensure that some blocks we created were non-compressible and exercise the new code path. Will add a comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.

Hello Jean-Daniel Cryans, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5679

to look at the new patch set (#3).

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
......................................................................

KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

This changes the compressed block header for CFile v2 so that it only
includes the uncompressed block length, and not a redundant copy of the
compressed block length (which is already known by the length of the
block itself).

In addition, this enables a special behavior if the stored uncompressed
block length is exactly equal to the length of the compressed data. This
signifies that the data has not been compressed and that the reader
should not execute the configured codec.

On the write side, we always execute the codec, and in the case that the
resulting compression ratio is not good enough, we just store the
uncompressed data as above. By default, the compression ratio is 0.9x
(i.e at least 10% space reduction). It can be configured by an
experimental flag.

Initially, the JIRA had described doing this based on a policy rather
than based on the actual output of the codec. However, during
development I realized that, even with encodings like BIT_SHUFFLE, there
are actually some patterns that can benefit from a second pass of LZ4.
And even if the second pass is not effective, it isn't too expensive to
try a second pass on the write side anyway. See [1] for a discussion of
a dataset that benefits from multiple passes of compression.

Along the way, made a few changes to how the compressed block decoder is
implemented:

* Use strings::Substitute instead of StringPrintf (easier to read)
* Cleaned up unnecessary header includes
* Made the CompressedBlockDecoder a short-lifetime stack object rather
  than an allocated member of CFileReader. This avoids an allocation and
  indirection, and additionally allows us to put more state inside the
  object itself without worrying about thread safety, etc.
* The CompressedBlockDecoder is now stateful, making for an
  easier-to-use API.

[1] https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ

Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
---
M src/kudu/cfile/block_compression.cc
M src/kudu/cfile/block_compression.h
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/compression-test.cc
6 files changed, 214 insertions(+), 115 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/5679/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.

Adar Dembo has posted comments on this change.

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
......................................................................


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.

Todd Lipcon has posted comments on this change.

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5679/3/src/kudu/cfile/block_compression.cc
File src/kudu/cfile/block_compression.cc:

Line 152:   // In CFile v2, we ensure that compressed size <= uncompressed size.
> I found this statement a little confusing w.r.t. the actual behavior in Com
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.

Adar Dembo has posted comments on this change.

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5679/3/src/kudu/cfile/block_compression.cc
File src/kudu/cfile/block_compression.cc:

Line 152:   // In CFile v2, we ensure that compressed size <= uncompressed size.
I found this statement a little confusing w.r.t. the actual behavior in CompressedBlockBuilder::Compress() (the part that elides compression if compressed_size == uncompressed_size). 

Perhaps: "// CFile V2 guarantees that compressed size <= uncompressed size (though, as per the file format, if the two are equal the file data was not compressed)" ?


-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes