You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/11/23 00:52:23 UTC

[kudu-CR] bshuf block: fix GetLastKey()

Todd Lipcon has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5192

Change subject: bshuf_block: fix GetLastKey()
......................................................................

bshuf_block: fix GetLastKey()

This fixes a bug where the bitshuffle encoding would return the wrong
value for GetLastKey(). This function is only used in the case that the
column in question is a non-composite primary key, and so far in real
use cases users have apparently only used bitshuffle-encoded ints as
non-key columns or part of a composite key (in which case the index is
generated on the encoded composite key, not the individual columns)

The result of returning the wrong value for GetLastKey() was that the
resulting index blocks would be incorrect, and seeks based on key would
end up at the wrong row. This could cause spurious "row not found"
errors or even crashes in some cases.

The patch fixes the issue and also expands test coverage to cover the
GetFirstKey/GetLastKey calls.

Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580
---
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/encoding-test.cc
2 files changed, 30 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/5192/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>

[kudu-CR] bshuf block: fix GetFirstKey() and GetLastKey()

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5192

to look at the new patch set (#4).

Change subject: bshuf_block: fix GetFirstKey() and GetLastKey()
......................................................................

bshuf_block: fix GetFirstKey() and GetLastKey()

This fixes a bug where the bitshuffle encoding would return the wrong
value for GetFirstKey() and GetLastKey(). The issue was that, in the
case of UINT32s, the data_ array was being inspected after 'Finish()'
had already collapsed down the elements to their minimum width.

Since this collapsing is only done on UINT32 types, this would only
affect these functions on dictionary-encoded binary blocks, which use
bitshuffle internally for storing the UINT32 code words. However, the
"cleanup" commit prior to this one also fixed some incorrect indexing
that would affect BIT_SHUFFLE blocks outside the context of UINT32.

These functions are only used in the case that the column in question is
a non-composite primary key, in order to build the value index as well
as to determine the min/max bounds of a written CFile.

The result of returning the wrong value for GetLastKey() was that the
resulting index blocks would be incorrect, and seeks based on key would
end up at the wrong row. This could cause spurious "row not found"
errors or even crashes in some cases.

Apparently real users have not made use of BITSHUFFLE or DICT_ENCODING
on such columns, since we haven't seen reports of this in the wild; I
instead discovered this issue while working on KUDU-1751 (changing those
encodings to be the default).

The patch fixes the issue and also expands test coverage to cover the
GetFirstKey/GetLastKey calls.

Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580
---
M src/kudu/cfile/bshuf_block.cc
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/encoding-test.cc
3 files changed, 49 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/5192/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] bshuf block: fix GetFirstKey() and GetLastKey()

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5192

to look at the new patch set (#5).

Change subject: bshuf_block: fix GetFirstKey() and GetLastKey()
......................................................................

bshuf_block: fix GetFirstKey() and GetLastKey()

This fixes a bug where the bitshuffle encoding would return the wrong
value for GetFirstKey() and GetLastKey(). The issue was that, in the
case of UINT32s, the data_ array was being inspected after 'Finish()'
had already collapsed down the elements to their minimum width.

Since this collapsing is only done on UINT32 types, this would only
affect these functions on dictionary-encoded binary blocks, which use
bitshuffle internally for storing the UINT32 code words. However, the
"cleanup" commit prior to this one also fixed some incorrect indexing
that would affect BIT_SHUFFLE blocks outside the context of UINT32.

These functions are only used in the case that the column in question is
a non-composite primary key, in order to build the value index as well
as to determine the min/max bounds of a written CFile.

The result of returning the wrong value for GetLastKey() was that the
resulting index blocks would be incorrect, and seeks based on key would
end up at the wrong row. This could cause spurious "row not found"
errors or even crashes in some cases.

Apparently real users have not made use of BITSHUFFLE or DICT_ENCODING
on such columns, since we haven't seen reports of this in the wild; I
instead discovered this issue while working on KUDU-1751 (changing those
encodings to be the default).

The patch fixes the issue and also expands test coverage to cover the
GetFirstKey/GetLastKey calls.

Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580
---
M src/kudu/cfile/bshuf_block.cc
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/encoding-test.cc
3 files changed, 49 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/5192/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] bshuf block: fix GetFirstKey() and GetLastKey()

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: bshuf_block: fix GetFirstKey() and GetLastKey()
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No

[kudu-CR] bshuf block: fix GetFirstKey() and GetLastKey()

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5192

to look at the new patch set (#2).

Change subject: bshuf_block: fix GetFirstKey() and GetLastKey()
......................................................................

bshuf_block: fix GetFirstKey() and GetLastKey()

This fixes a bug where the bitshuffle encoding would return the wrong
value for GetFirstKey() and GetLastKey(). The issue was that, in the
case of UINT32s, the data_ array was being inspected after 'Finish()'
had already collapsed down the elements to their minimum width.

Since this collapsing is only done on UINT32 types, this would only
affect these functions on dictionary-encoded binary blocks, which use
bitshuffle internally for storing the UINT32 code words. However, the
"cleanup" commit prior to this one also fixed some incorrect indexing
that would affect BIT_SHUFFLE blocks outside the context of UINT32.

These functions are only used in the case that the column in question is
a non-composite primary key, in order to build the value index as well
as to determine the min/max bounds of a written CFile.

The result of returning the wrong value for GetLastKey() was that the
resulting index blocks would be incorrect, and seeks based on key would
end up at the wrong row. This could cause spurious "row not found"
errors or even crashes in some cases.

Apparently real users have not made use of BITSHUFFLE or DICT_ENCODING
on such columns, since we haven't seen reports of this in the wild; I
instead discovered this issue while working on KUDU-1751 (changing those
encodings to be the default).

The patch fixes the issue and also expands test coverage to cover the
GetFirstKey/GetLastKey calls.

Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580
---
M src/kudu/cfile/bshuf_block.cc
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/encoding-test.cc
3 files changed, 49 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/5192/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] bshuf block: fix GetFirstKey() and GetLastKey()

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: bshuf_block: fix GetFirstKey() and GetLastKey()
......................................................................


bshuf_block: fix GetFirstKey() and GetLastKey()

This fixes a bug where the bitshuffle encoding would return the wrong
value for GetFirstKey() and GetLastKey(). The issue was that, in the
case of UINT32s, the data_ array was being inspected after 'Finish()'
had already collapsed down the elements to their minimum width.

Since this collapsing is only done on UINT32 types, this would only
affect these functions on dictionary-encoded binary blocks, which use
bitshuffle internally for storing the UINT32 code words. However, the
"cleanup" commit prior to this one also fixed some incorrect indexing
that would affect BIT_SHUFFLE blocks outside the context of UINT32.

These functions are only used in the case that the column in question is
a non-composite primary key, in order to build the value index as well
as to determine the min/max bounds of a written CFile.

The result of returning the wrong value for GetLastKey() was that the
resulting index blocks would be incorrect, and seeks based on key would
end up at the wrong row. This could cause spurious "row not found"
errors or even crashes in some cases.

Apparently real users have not made use of BITSHUFFLE or DICT_ENCODING
on such columns, since we haven't seen reports of this in the wild; I
instead discovered this issue while working on KUDU-1751 (changing those
encodings to be the default).

The patch fixes the issue and also expands test coverage to cover the
GetFirstKey/GetLastKey calls.

Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580
Reviewed-on: http://gerrit.cloudera.org:8080/5192
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <da...@apache.org>
---
M src/kudu/cfile/bshuf_block.cc
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/encoding-test.cc
3 files changed, 49 insertions(+), 11 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>