You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org> on 2019/10/09 20:52:04 UTC

[kudu-CR] [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs

Bankim Bhavsar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14405


Change subject: [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs
......................................................................

[util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs

RLE encoding may encode "literally" when it doesn't find sufficient
repeated values.

When the number of values are not multiple of 8, literal count is
rounded up to multiple of 8. After reading the last value,
another attempt is made to read the value in GetNextRun() method.
This is beyond the max_bytes in the BitReader which returns false
and causes DCHECK() failure in GetNextRun() method.
For further details see JIRA KUDU-2968.

Fix is to check for the return value of BitReader::_GetValue()
and return without modifying further internal state of RleDecoder.
Similarly if an attempt is made by the caller of
RleDecoder::GetNextRun() to read after having read the last value
then BitReader::GetValue() would also return false again
leading to DCHECK() failure.

Tests:
- Added a test case for various int data types
that inserts wide range of non-repeated values
including non-multiple-of-8 and verfies GetNextRun() method.

Change-Id: If1703efd5307c66e282a2d2d7600dc4658c252df
---
M src/kudu/util/rle-encoding.h
M src/kudu/util/rle-test.cc
2 files changed, 57 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/14405/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If1703efd5307c66e282a2d2d7600dc4658c252df
Gerrit-Change-Number: 14405
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>

[kudu-CR] [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14405 )

Change subject: [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14405/1/src/kudu/util/rle-test.cc
File src/kudu/util/rle-test.cc:

http://gerrit.cloudera.org:8080/#/c/14405/1/src/kudu/util/rle-test.cc@33
PS1, Line 33: #include "kudu/util/bit-stream-utils.h"
            : #include "kudu/util/bit-stream
> Code in util shouldn't depend on code in common. That's because util is int
Thanks for pointing that out. Fixed.



-- 
To view, visit http://gerrit.cloudera.org:8080/14405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If1703efd5307c66e282a2d2d7600dc4658c252df
Gerrit-Change-Number: 14405
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 09 Oct 2019 21:18:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14405

to look at the new patch set (#2).

Change subject: [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs
......................................................................

[util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs

RLE encoding may encode "literally" when it doesn't find sufficient
repeated values.

When the number of values are not multiple of 8, literal count is
rounded up to multiple of 8. After reading the last value,
another attempt is made to read the value in GetNextRun() method.
This is beyond the max_bytes in the BitReader which returns false
and causes DCHECK() failure in GetNextRun() method.
For further details see JIRA KUDU-2968.

Fix is to check for the return value of BitReader::_GetValue()
and return without modifying further internal state of RleDecoder.
Similarly if an attempt is made by the caller of
RleDecoder::GetNextRun() to read after having read the last value
then BitReader::GetValue() would also return false again
leading to DCHECK() failure.

Tests:
- Added a test case for various int data types
that inserts wide range of non-repeated values
including non-multiple-of-8 and verfies GetNextRun() method.

Change-Id: If1703efd5307c66e282a2d2d7600dc4658c252df
---
M src/kudu/util/rle-encoding.h
M src/kudu/util/rle-test.cc
2 files changed, 50 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/14405/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If1703efd5307c66e282a2d2d7600dc4658c252df
Gerrit-Change-Number: 14405
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14405 )

Change subject: [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs
......................................................................


Patch Set 2: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14405/2/src/kudu/util/rle-encoding.h
File src/kudu/util/rle-encoding.h:

http://gerrit.cloudera.org:8080/#/c/14405/2/src/kudu/util/rle-encoding.h@326
PS2, Line 326:         literal_count_--;
I don't think it is, but is this important to update? Maybe set to 0 if we return early?


http://gerrit.cloudera.org:8080/#/c/14405/2/src/kudu/util/rle-test.cc
File src/kudu/util/rle-test.cc:

http://gerrit.cloudera.org:8080/#/c/14405/2/src/kudu/util/rle-test.cc@555
PS2, Line 555: 
             :     auto max_num_vals = std::min<uint64_t>(1024, std::numeric_limits<IntType>::max());
             :     for (auto num_vals = 1; num_vals <= max_num_vals; num_vals++) {
nit: could you comment briefly what the high-level pieces and rationale of this test is? I can parse it out by reading and taking time to grok the issue; but thinking about reading this code a year from now, it'd be much quicker to read comments and then validate correctness rather than re-grok this.



-- 
To view, visit http://gerrit.cloudera.org:8080/14405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If1703efd5307c66e282a2d2d7600dc4658c252df
Gerrit-Change-Number: 14405
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 10 Oct 2019 03:06:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14405 )

Change subject: [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14405/2/src/kudu/util/rle-encoding.h
File src/kudu/util/rle-encoding.h:

http://gerrit.cloudera.org:8080/#/c/14405/2/src/kudu/util/rle-encoding.h@326
PS2, Line 326:         literal_count_--;
> I don't think it is, but is this important to update? Maybe set to 0 if we 
It's not necessary/important to update literal_count here.
As per decoding algo, literal_count starts off as multiple of 8 and
may not reach zero for the last set of values.
In this case I feel the end of the values is better handled by the bits and byte offset
maintained by bit_reader that correctly returns false.


http://gerrit.cloudera.org:8080/#/c/14405/2/src/kudu/util/rle-test.cc
File src/kudu/util/rle-test.cc:

http://gerrit.cloudera.org:8080/#/c/14405/2/src/kudu/util/rle-test.cc@555
PS2, Line 555: 
             :     auto max_num_vals = std::min<uint64_t>(1024, std::numeric_limits<IntType>::max());
             :     for (auto num_vals = 1; num_vals <= max_num_vals; num_vals++) {
> nit: could you comment briefly what the high-level pieces and rationale of 
Added clarifying comment at the top of the test and here.



-- 
To view, visit http://gerrit.cloudera.org:8080/14405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If1703efd5307c66e282a2d2d7600dc4658c252df
Gerrit-Change-Number: 14405
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 10 Oct 2019 21:51:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14405

to look at the new patch set (#3).

Change subject: [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs
......................................................................

[util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs

RLE encoding may encode "literally" when it doesn't find sufficient
repeated values.

When the number of values are not multiple of 8, literal count is
rounded up to multiple of 8. After reading the last value,
another attempt is made to read the value in GetNextRun() method.
This is beyond the max_bytes in the BitReader which returns false
and causes DCHECK() failure in GetNextRun() method.
For further details see JIRA KUDU-2968.

Fix is to check for the return value of BitReader::_GetValue()
and return without modifying further internal state of RleDecoder.
Similarly if an attempt is made by the caller of
RleDecoder::GetNextRun() to read after having read the last value
then BitReader::GetValue() would also return false again
leading to DCHECK() failure.

Tests:
- Added a test case for various int data types
that inserts wide range of non-repeated values
including non-multiple-of-8 and verfies GetNextRun() method.

Change-Id: If1703efd5307c66e282a2d2d7600dc4658c252df
---
M src/kudu/util/rle-encoding.h
M src/kudu/util/rle-test.cc
2 files changed, 52 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/14405/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If1703efd5307c66e282a2d2d7600dc4658c252df
Gerrit-Change-Number: 14405
Gerrit-PatchSet: 3
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14405 )

Change subject: [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs
......................................................................


Patch Set 1:

(1 comment)

Just passing through...

http://gerrit.cloudera.org:8080/#/c/14405/1/src/kudu/util/rle-test.cc
File src/kudu/util/rle-test.cc:

http://gerrit.cloudera.org:8080/#/c/14405/1/src/kudu/util/rle-test.cc@33
PS1, Line 33: #include "kudu/common/common.pb.h"
            : #include "kudu/common/types.h"
Code in util shouldn't depend on code in common. That's because util is intended to be things that are generally useful to a C++ application and entirely agnostic to Kudu, while common is intended to hold Kudu-specific logic that's common to multiple modules.



-- 
To view, visit http://gerrit.cloudera.org:8080/14405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If1703efd5307c66e282a2d2d7600dc4658c252df
Gerrit-Change-Number: 14405
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 09 Oct 2019 20:59:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14405 )

Change subject: [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If1703efd5307c66e282a2d2d7600dc4658c252df
Gerrit-Change-Number: 14405
Gerrit-PatchSet: 3
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 10 Oct 2019 22:48:32 +0000
Gerrit-HasComments: No

[kudu-CR] [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14405 )

Change subject: [util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs
......................................................................

[util] KUDU-2968 Fix RleDecoder::GetNextRun() in case of literal runs

RLE encoding may encode "literally" when it doesn't find sufficient
repeated values.

When the number of values are not multiple of 8, literal count is
rounded up to multiple of 8. After reading the last value,
another attempt is made to read the value in GetNextRun() method.
This is beyond the max_bytes in the BitReader which returns false
and causes DCHECK() failure in GetNextRun() method.
For further details see JIRA KUDU-2968.

Fix is to check for the return value of BitReader::_GetValue()
and return without modifying further internal state of RleDecoder.
Similarly if an attempt is made by the caller of
RleDecoder::GetNextRun() to read after having read the last value
then BitReader::GetValue() would also return false again
leading to DCHECK() failure.

Tests:
- Added a test case for various int data types
that inserts wide range of non-repeated values
including non-multiple-of-8 and verfies GetNextRun() method.

Change-Id: If1703efd5307c66e282a2d2d7600dc4658c252df
Reviewed-on: http://gerrit.cloudera.org:8080/14405
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M src/kudu/util/rle-encoding.h
M src/kudu/util/rle-test.cc
2 files changed, 52 insertions(+), 3 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/14405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If1703efd5307c66e282a2d2d7600dc4658c252df
Gerrit-Change-Number: 14405
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)