You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/11/23 16:11:53 UTC

[kudu-CR] bshuf block: fix GetFirstKey() and GetLastKey()

Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5192

to look at the new patch set (#2).

Change subject: bshuf_block: fix GetFirstKey() and GetLastKey()
......................................................................

bshuf_block: fix GetFirstKey() and GetLastKey()

This fixes a bug where the bitshuffle encoding would return the wrong
value for GetFirstKey() and GetLastKey(). The issue was that, in the
case of UINT32s, the data_ array was being inspected after 'Finish()'
had already collapsed down the elements to their minimum width.

Since this collapsing is only done on UINT32 types, this would only
affect these functions on dictionary-encoded binary blocks, which use
bitshuffle internally for storing the UINT32 code words. However, the
"cleanup" commit prior to this one also fixed some incorrect indexing
that would affect BIT_SHUFFLE blocks outside the context of UINT32.

These functions are only used in the case that the column in question is
a non-composite primary key, in order to build the value index as well
as to determine the min/max bounds of a written CFile.

The result of returning the wrong value for GetLastKey() was that the
resulting index blocks would be incorrect, and seeks based on key would
end up at the wrong row. This could cause spurious "row not found"
errors or even crashes in some cases.

Apparently real users have not made use of BITSHUFFLE or DICT_ENCODING
on such columns, since we haven't seen reports of this in the wild; I
instead discovered this issue while working on KUDU-1751 (changing those
encodings to be the default).

The patch fixes the issue and also expands test coverage to cover the
GetFirstKey/GetLastKey calls.

Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580
---
M src/kudu/cfile/bshuf_block.cc
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/encoding-test.cc
3 files changed, 49 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/5192/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins