You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2016/06/03 19:07:02 UTC

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Will Berkeley has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3304

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................

KUDU-1398 CFile index blocks can store shortest separating prefix

This changes the values stored as index keys to be a shortest key
between the first key of the data block and the last key of the previous
data block. This change does not apply to deltafiles, as they expect to
be able to decode an index key into a DeltaKey.

Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
---
M src/kudu/cfile/binary_dict_block.cc
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/binary_plain_block.cc
M src/kudu/cfile/binary_plain_block.h
M src/kudu/cfile/binary_prefix_block.cc
M src/kudu/cfile/binary_prefix_block.h
M src/kudu/cfile/block_encodings.h
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/cfile_util.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/cfile/gvint_block.cc
M src/kudu/cfile/gvint_block.h
M src/kudu/cfile/plain_bitmap_block.h
M src/kudu/cfile/plain_block.h
M src/kudu/cfile/rle_block.h
M src/kudu/tablet/deltafile.cc
18 files changed, 196 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] [WIP] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3304

to look at the new patch set (#2).

Change subject: [WIP] KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................

[WIP] KUDU-1398 CFile index blocks can store shortest separating prefix

This changes the values stored as index keys to be a shortest key
between the first key of the data block and the last key of the previous
data block. This change does not apply to deltafiles or bloomfiles.
Deltafiles expect to be able to decode an index key into a DeltaKey.
Bloom files don't work with this change for reasons I haven't tracked
down yet.

Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
---
M src/kudu/cfile/binary_dict_block.cc
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/binary_plain_block.cc
M src/kudu/cfile/binary_plain_block.h
M src/kudu/cfile/binary_prefix_block.cc
M src/kudu/cfile/binary_prefix_block.h
M src/kudu/cfile/block_encodings.h
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/cfile_util.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/cfile/gvint_block.cc
M src/kudu/cfile/gvint_block.h
M src/kudu/cfile/plain_bitmap_block.h
M src/kudu/cfile/plain_block.h
M src/kudu/cfile/rle_block.h
M src/kudu/tablet/deltafile.cc
18 files changed, 196 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/2
-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 9:

> Looks like a small compile error here

:( whoops...was too hasty...

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 8:

Build Started http://104.196.14.100/job/kudu-gerrit/1870/

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [WIP] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: [WIP] KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 2:

Build Started http://104.196.14.100/job/kudu-gerrit/1741/

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 4:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/3304/4//COMMIT_MSG
Commit Message:

PS4, Line 17: compatibility with bloomfile
> what's this mean?
There was a bug making bloom files not work. The trick is that bloom files call AppendRawBlock directly instead of FinishCurDataBlock. I altered AppendRawBlock to accept the last key as well as the first key, so now a BloomFileWriter maintains its own last key, just like it maintains its own first key. I feel like the change is a little awkward though, so I'd appreciate any ideas on how to improve it.


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/binary_dict_block.cc
File src/kudu/cfile/binary_dict_block.cc:

Line 131:     last_key_.assign_copy(last->data(), last->size());
> instead of copying this every time, could we use GetLast on the code word b
Doesn't the dictionary in the block builder map from string to codeword? The last key we want is the last string in the block, not the codeword stored for it, so we can't look it up in the dictionary. I did improve this, though, by looking up in the dict block via GetKeyAtIdx, a method which was added to plain encoding per one of your comments. This means I had to drop CHECK(finished_) from it (but kept it in GetFirstKey and GetLastKey). Does that solution seem reasonable to you?


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/binary_plain_block.cc
File src/kudu/cfile/binary_plain_block.cc:

Line 132: Status BinaryPlainBlockBuilder::GetLastKey(void *key_void) const {
> can we refactor this to share code with GetFirstKey? eg a GetKeyAtIndex(int
Done


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_util.cc
File src/kudu/cfile/cfile_util.cc:

Line 126: void SeparatingKey(const faststring& left, const faststring& right, faststring *target) {
> can you add some unit tests for this function? (eg the examples from the co
Done


Line 127:   DCHECK(memcmp(left.data(), right.data(), min(left.size(), right.size())) <=
> I think DCHECK_LE(Slice(left).compare(Slice(right), 0); is probably more re
Done


Line 130:   if (left.size() == 0) {
> .empty()
Done


Line 131:     // special case for first block: use the whole key
> not sure why this is the case
Bloomfiles work with this now; no more special case needed.


Line 135:     target->assign_copy(right.data(), cpl == right.size() ? cpl : cpl + 1);
> hm, this is a slightly different algorithm than the one used in gutil/strin
I didn't realize that function existed! Still, it finds a separator in the range left <= * < right, and we require it in the range left < * <= right. I thought about it a bit and I don't see any advantage.


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_util.h
File src/kudu/cfile/cfile_util.h:

Line 63:   bool deltafile;
> I think I'd prefer this be called something like 'optimize_index_keys' or s
I'm going to keep it simple for now and just rename, but I'll consider implementing your suggestion for delta keys a little later.


Line 110: // Computes the shortest key satisfying left < key < right and places it into target.
> should clarify whether 'target' may be the same faststring as 'left' or 'ri
Superseded by another change.


PS4, Line 111: const faststring& left, const faststring& right
> may make sense to use StringPiece or Slice here rather than specifically fa
Agreed. Converted to use Slice.


Line 111: void SeparatingKey(const faststring& left, const faststring& right, faststring *target);
> rename to GetSeparatingKey
Done


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 141:     last_key_.clear();
> no need (it's initialized to clear)
Done


Line 437:       SeparatingKey(last_key_, tmp_buf_, &sep_buf);
> maybe make SeparatingKey actually just shorten tmp_buf_ in place to avoid t
Done


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/rle_block.h
File src/kudu/cfile/rle_block.h:

Line 265:     if (count_ > 0) {
> usually we put the error case first
I'm being consistent with GetFirstKey's (preexisting) implementation. If you like, I'll adjust both of them to handle error case first.


-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 11:

Build Started http://104.196.14.100/job/kudu-gerrit/1973/

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3304

to look at the new patch set (#7).

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................

KUDU-1398 CFile index blocks can store shortest separating prefix

(No changes: resubmitting to trigger Jenkins build)

This changes the values stored as index keys to be a shortest key
between the first key of the data block and the last key of the previous
data block. However, this change does not apply to deltafiles, because
deltafiles expect to be able to decode an index key into a DeltaKey.

The way the change works is illustrated with the example from the JIRA,
extended:

Block 1: apple, banana, cardamom
Block 2: carrot, epazote, fennel
Block 3: fig, guava, kiwi

Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3]

After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3]

Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
---
M src/kudu/cfile/binary_dict_block.cc
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/binary_plain_block.cc
M src/kudu/cfile/binary_plain_block.h
M src/kudu/cfile/binary_prefix_block.cc
M src/kudu/cfile/binary_prefix_block.h
M src/kudu/cfile/block_encodings.h
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/bloomfile.h
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_util.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/cfile/gvint_block.cc
M src/kudu/cfile/gvint_block.h
M src/kudu/cfile/index-test.cc
M src/kudu/cfile/index_block.cc
M src/kudu/cfile/plain_bitmap_block.h
M src/kudu/cfile/plain_block.h
M src/kudu/cfile/rle_block.h
M src/kudu/tablet/deltafile.cc
23 files changed, 226 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/7
-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


KUDU-1398 CFile index blocks can store shortest separating prefix

(No changes: resubmitting to trigger Jenkins build)

This changes the values stored as index keys to be a shortest key
between the first key of the data block and the last key of the previous
data block. However, this change does not apply to deltafiles, because
deltafiles expect to be able to decode an index key into a DeltaKey.

The way the change works is illustrated with the example from the JIRA,
extended:

Block 1: apple, banana, cardamom
Block 2: carrot, epazote, fennel
Block 3: fig, guava, kiwi

Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3]

After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3]

Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Reviewed-on: http://gerrit.cloudera.org:8080/3304
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/cfile/binary_dict_block.cc
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/binary_plain_block.cc
M src/kudu/cfile/binary_plain_block.h
M src/kudu/cfile/binary_prefix_block.cc
M src/kudu/cfile/binary_prefix_block.h
M src/kudu/cfile/block_encodings.h
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/bloomfile.h
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_util.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/cfile/gvint_block.cc
M src/kudu/cfile/gvint_block.h
M src/kudu/cfile/index-test.cc
M src/kudu/cfile/index_block.cc
M src/kudu/cfile/plain_bitmap_block.h
M src/kudu/cfile/plain_block.h
M src/kudu/cfile/rle_block.h
M src/kudu/tablet/deltafile.cc
23 files changed, 234 insertions(+), 71 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 4:

Sorry for the delay on this, Will. I was on PTO the last couple of weeks. Hope to get back to review this week. I noticed the last build failed but the build results have been purged by this point. Is the current rev passing tests, etc?

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 9:

Build Started http://104.196.14.100/job/kudu-gerrit/1927/

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 9:

Looks like a small compile error here

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/binary_dict_block.cc
File src/kudu/cfile/binary_dict_block.cc:

Line 176:     data_builder_->GetLastKey(reinterpret_cast<void*>(&last_codeword));
> RETURN_NOT_OK here
Done


http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/binary_plain_block.h
File src/kudu/cfile/binary_plain_block.h:

Line 62:   // key should be a Slice *
> 'key_void' should be a Slice*
Done


PS8, Line 63: void *k
> style: void* (no space)
Done


PS8, Line 63: size_t
> better to use 'int' (we try to avoid unsigned types these days)
Done


Line 71:   Status GetLastKey(void *key) const OVERRIDE;
> same
Done


http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/cfile_util.h
File src/kudu/cfile/cfile_util.h:

Line 109: void GetSeparatingKey(const Slice& left, Slice& right);
> should be Slice* (we avoid non-const refs)
Done


http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 141:     last_key_.clear();
> the default is empty, so no need to clear
Done


http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/rle_block.h
File src/kudu/cfile/rle_block.h:

Line 269:     return Status::NotFound("No keys in the block");
> usually we put the error case first, eg:
Done


http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

Line 76:   opts.optimize_index_keys = false;
> would be good to include a comment here explaining why (it's in the commit 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 7:

Build Started http://104.196.14.100/job/kudu-gerrit/1869/

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3304

to look at the new patch set (#4).

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................

KUDU-1398 CFile index blocks can store shortest separating prefix

This changes the values stored as index keys to be a shortest key
between the first key of the data block and the last key of the previous
data block. Two exceptions:

1. This change does not apply to deltafiles. Deltafiles expect to be able
to decode an index key into a DeltaKey.

2. The first key in a file, which could always be "", is left the same
as before for compatibility with bloomfiles.

The way the change works is illustrated with the example from the JIRA,
extended:

Block 1: apple, banana, cardamom
Block 2: carrot, epazote, fennel
Block 3: fig, guava, kiwi

Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3]

After: ['apple' -> block 1, 'carr' -> block 2, 'fi' -> block 3]

Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
---
M src/kudu/cfile/binary_dict_block.cc
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/binary_plain_block.cc
M src/kudu/cfile/binary_plain_block.h
M src/kudu/cfile/binary_prefix_block.cc
M src/kudu/cfile/binary_prefix_block.h
M src/kudu/cfile/block_encodings.h
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/cfile_util.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/cfile/gvint_block.cc
M src/kudu/cfile/gvint_block.h
M src/kudu/cfile/index_block.cc
M src/kudu/cfile/plain_bitmap_block.h
M src/kudu/cfile/plain_block.h
M src/kudu/cfile/rle_block.h
M src/kudu/tablet/deltafile.cc
19 files changed, 204 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/4
-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/1739/

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3304

to look at the new patch set (#6).

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................

KUDU-1398 CFile index blocks can store shortest separating prefix

This changes the values stored as index keys to be a shortest key
between the first key of the data block and the last key of the previous
data block. However, this change does not apply to deltafiles, because
deltafiles expect to be able to decode an index key into a DeltaKey.

The way the change works is illustrated with the example from the JIRA,
extended:

Block 1: apple, banana, cardamom
Block 2: carrot, epazote, fennel
Block 3: fig, guava, kiwi

Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3]

After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3]

Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
---
M src/kudu/cfile/binary_dict_block.cc
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/binary_plain_block.cc
M src/kudu/cfile/binary_plain_block.h
M src/kudu/cfile/binary_prefix_block.cc
M src/kudu/cfile/binary_prefix_block.h
M src/kudu/cfile/block_encodings.h
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/bloomfile.h
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_util.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/cfile/gvint_block.cc
M src/kudu/cfile/gvint_block.h
M src/kudu/cfile/index-test.cc
M src/kudu/cfile/index_block.cc
M src/kudu/cfile/plain_bitmap_block.h
M src/kudu/cfile/plain_block.h
M src/kudu/cfile/rle_block.h
M src/kudu/tablet/deltafile.cc
23 files changed, 226 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/6
-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 5:

Build Started http://104.196.14.100/job/kudu-gerrit/1786/

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3304

to look at the new patch set (#5).

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................

KUDU-1398 CFile index blocks can store shortest separating prefix

This changes the values stored as index keys to be a shortest key
between the first key of the data block and the last key of the previous
data block. However, this change does not apply to deltafiles, because
deltafiles expect to be able to decode an index key into a DeltaKey.

The way the change works is illustrated with the example from the JIRA,
extended:

Block 1: apple, banana, cardamom
Block 2: carrot, epazote, fennel
Block 3: fig, guava, kiwi

Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3]

After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3]

Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
---
M src/kudu/cfile/binary_dict_block.cc
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/binary_plain_block.cc
M src/kudu/cfile/binary_plain_block.h
M src/kudu/cfile/binary_prefix_block.cc
M src/kudu/cfile/binary_prefix_block.h
M src/kudu/cfile/block_encodings.h
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/cfile_util.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/cfile/gvint_block.cc
M src/kudu/cfile/gvint_block.h
M src/kudu/cfile/index-test.cc
M src/kudu/cfile/index_block.cc
M src/kudu/cfile/plain_bitmap_block.h
M src/kudu/cfile/plain_block.h
M src/kudu/cfile/rle_block.h
M src/kudu/tablet/deltafile.cc
20 files changed, 206 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 10:

Build Started http://104.196.14.100/job/kudu-gerrit/1948/

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [WIP] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: [WIP] KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 2: Code-Review-1

(1 comment)

> (1 comment)

http://gerrit.cloudera.org:8080/#/c/3304/2//COMMIT_MSG
Commit Message:

Line 12: Deltafiles expect to be able to decode an index key into a DeltaKey.
> hmm, really? where's that code path? I would have guessed that delta and bl
I think the errors I saw when I didn't except delta files came from GetFirstRowIndexInCurrentBlock, about L449 of deltafile.cc. It decodes the index entry into a delta key, but a shortened key isn't decodable into a DeltaKey. The bloom file thing is speculation on my part while I investigate...I also forgot to actually commit the changes for my second patch set. I'll post an updated set after I get some more sleep.


-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 6:

> Sorry for the delay on this, Will. I was on PTO the last couple of
 > weeks. Hope to get back to review this week. I noticed the last
 > build failed but the build results have been purged by this point.
 > Is the current rev passing tests, etc?

No worries Mr Lipcon. I hope you enjoyed your vacation. IIRC it passes everything, but on the last build Jenkins choked somehow and failed the build through no fault on my own. I'll verify it passes the basic stuff locally and repush so Jenkins can (hopefully) verify.

Unrelatedly, I have a patch in the works for KUDU-1227 (update-merge). It works and passes the tests + the tests I wrote for it, but definitely needs some review love. Is there interest in that for 1.0? I mostly did it for personal enrichment but obv happy to contribute it if it's wanted.

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [WIP] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3304

to look at the new patch set (#3).

Change subject: [WIP] KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................

[WIP] KUDU-1398 CFile index blocks can store shortest separating prefix

This changes the values stored as index keys to be a shortest key
between the first key of the data block and the last key of the previous
data block. This change does not apply to deltafiles or bloomfiles.
Deltafiles expect to be able to decode an index key into a DeltaKey.
Bloom files don't work with this change for reasons I haven't tracked
down yet.

Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
---
M src/kudu/cfile/binary_dict_block.cc
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/binary_plain_block.cc
M src/kudu/cfile/binary_plain_block.h
M src/kudu/cfile/binary_prefix_block.cc
M src/kudu/cfile/binary_prefix_block.h
M src/kudu/cfile/block_encodings.h
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/cfile_util.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/cfile/gvint_block.cc
M src/kudu/cfile/gvint_block.h
M src/kudu/cfile/plain_bitmap_block.h
M src/kudu/cfile/plain_block.h
M src/kudu/cfile/rle_block.h
M src/kudu/tablet/deltafile.cc
19 files changed, 197 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/3
-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/binary_dict_block.cc
File src/kudu/cfile/binary_dict_block.cc:

Line 176:     data_builder_->GetLastKey(reinterpret_cast<void*>(&last_codeword));
RETURN_NOT_OK here


http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/binary_plain_block.h
File src/kudu/cfile/binary_plain_block.h:

Line 62:   // key should be a Slice *
'key_void' should be a Slice*


PS8, Line 63: void *k
style: void* (no space)


PS8, Line 63: size_t
better to use 'int' (we try to avoid unsigned types these days)


Line 71:   Status GetLastKey(void *key) const OVERRIDE;
same


http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/cfile_util.h
File src/kudu/cfile/cfile_util.h:

Line 109: void GetSeparatingKey(const Slice& left, Slice& right);
should be Slice* (we avoid non-const refs)


http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 141:     last_key_.clear();
the default is empty, so no need to clear


http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/rle_block.h
File src/kudu/cfile/rle_block.h:

Line 269:     return Status::NotFound("No keys in the block");
usually we put the error case first, eg:

if (PREDICT_FALSE(count_ == 0)) {
  return Status::NotFound(...);
}

(the philosophy is to keep the 'straight line' or 'common' code path with as little indentation as possible)


http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

Line 76:   opts.optimize_index_keys = false;
would be good to include a comment here explaining why (it's in the commit message, but people won't see that when reading the code)


-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] [WIP] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: [WIP] KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3304/2//COMMIT_MSG
Commit Message:

Line 12: Deltafiles expect to be able to decode an index key into a DeltaKey.
hmm, really? where's that code path? I would have guessed that delta and bloom files treat the indexes identically to cfiles.


-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3304

to look at the new patch set (#9).

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................

KUDU-1398 CFile index blocks can store shortest separating prefix

(No changes: resubmitting to trigger Jenkins build)

This changes the values stored as index keys to be a shortest key
between the first key of the data block and the last key of the previous
data block. However, this change does not apply to deltafiles, because
deltafiles expect to be able to decode an index key into a DeltaKey.

The way the change works is illustrated with the example from the JIRA,
extended:

Block 1: apple, banana, cardamom
Block 2: carrot, epazote, fennel
Block 3: fig, guava, kiwi

Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3]

After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3]

Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
---
M src/kudu/cfile/binary_dict_block.cc
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/binary_plain_block.cc
M src/kudu/cfile/binary_plain_block.h
M src/kudu/cfile/binary_prefix_block.cc
M src/kudu/cfile/binary_prefix_block.h
M src/kudu/cfile/block_encodings.h
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/bloomfile.h
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_util.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/cfile/gvint_block.cc
M src/kudu/cfile/gvint_block.h
M src/kudu/cfile/index-test.cc
M src/kudu/cfile/index_block.cc
M src/kudu/cfile/plain_bitmap_block.h
M src/kudu/cfile/plain_block.h
M src/kudu/cfile/rle_block.h
M src/kudu/tablet/deltafile.cc
23 files changed, 234 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/9
-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 4:

Build Started http://104.196.14.100/job/kudu-gerrit/1749/

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3304

to look at the new patch set (#10).

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................

KUDU-1398 CFile index blocks can store shortest separating prefix

(No changes: resubmitting to trigger Jenkins build)

This changes the values stored as index keys to be a shortest key
between the first key of the data block and the last key of the previous
data block. However, this change does not apply to deltafiles, because
deltafiles expect to be able to decode an index key into a DeltaKey.

The way the change works is illustrated with the example from the JIRA,
extended:

Block 1: apple, banana, cardamom
Block 2: carrot, epazote, fennel
Block 3: fig, guava, kiwi

Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3]

After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3]

Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
---
M src/kudu/cfile/binary_dict_block.cc
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/binary_plain_block.cc
M src/kudu/cfile/binary_plain_block.h
M src/kudu/cfile/binary_prefix_block.cc
M src/kudu/cfile/binary_prefix_block.h
M src/kudu/cfile/block_encodings.h
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/bloomfile.h
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_util.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/cfile/gvint_block.cc
M src/kudu/cfile/gvint_block.h
M src/kudu/cfile/index-test.cc
M src/kudu/cfile/index_block.cc
M src/kudu/cfile/plain_bitmap_block.h
M src/kudu/cfile/plain_block.h
M src/kudu/cfile/rle_block.h
M src/kudu/tablet/deltafile.cc
23 files changed, 234 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/10
-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 6:

Build Started http://104.196.14.100/job/kudu-gerrit/1787/

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [WIP] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: [WIP] KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 3:

Build Started http://104.196.14.100/job/kudu-gerrit/1746/

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 4:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/3304/4//COMMIT_MSG
Commit Message:

PS4, Line 17: compatibility with bloomfile
what's this mean?


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/binary_dict_block.cc
File src/kudu/cfile/binary_dict_block.cc:

Line 131:     last_key_.assign_copy(last->data(), last->size());
instead of copying this every time, could we use GetLast on the code word builder, then look it up in our dictionary? What you have here will probably hurt write performance


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/binary_plain_block.cc
File src/kudu/cfile/binary_plain_block.cc:

Line 132: Status BinaryPlainBlockBuilder::GetLastKey(void *key_void) const {
can we refactor this to share code with GetFirstKey? eg a GetKeyAtIndex(int idx)?


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_util.cc
File src/kudu/cfile/cfile_util.cc:

Line 126: void SeparatingKey(const faststring& left, const faststring& right, faststring *target) {
can you add some unit tests for this function? (eg the examples from the commit message)


Line 127:   DCHECK(memcmp(left.data(), right.data(), min(left.size(), right.size())) <=
I think DCHECK_LE(Slice(left).compare(Slice(right), 0); is probably more readable. Especially if you change this function to take slices as arguments


Line 130:   if (left.size() == 0) {
.empty()


Line 131:     // special case for first block: use the whole key
not sure why this is the case


Line 135:     target->assign_copy(right.data(), cpl == right.size() ? cpl : cpl + 1);
hm, this is a slightly different algorithm than the one used in gutil/strings/util.h (FindShortestSeparator). eg their example:

  // FindShortestSeparator("foobar", "foxhunt", &sep) => sep == "fop"

you would return 'fox', right? Either one is the same length, but I guess theirs ends up tighter to the left bound whereas yours ends up somewhere in the middle. Can you think of any advantages/disadvantages between the two? As far as I can think, either is correct, but maybe I'm missing something.


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_util.h
File src/kudu/cfile/cfile_util.h:

Line 63:   bool deltafile;
I think I'd prefer this be called something like 'optimize_index_keys' or something, rather than specifically referring to whether it's a deltafile.

Another option is to have an int 'preserve_index_key_prefix_length' or somesuch, where for delta keys we pass the maximum possible encoded length of a delta key. This ensures that we can still decode it, but we might still be able to truncate the value (since the delta entries themselves can be arbitrarily long with user data)


Line 110: // Computes the shortest key satisfying left < key < right and places it into target.
should clarify whether 'target' may be the same faststring as 'left' or 'right' or not


Line 111: void SeparatingKey(const faststring& left, const faststring& right, faststring *target);
rename to GetSeparatingKey


PS4, Line 111: const faststring& left, const faststring& right
may make sense to use StringPiece or Slice here rather than specifically faststring


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 141:     last_key_.clear();
no need (it's initialized to clear)


Line 437:       SeparatingKey(last_key_, tmp_buf_, &sep_buf);
maybe make SeparatingKey actually just shorten tmp_buf_ in place to avoid the extra allocation here


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/rle_block.h
File src/kudu/cfile/rle_block.h:

Line 265:     if (count_ > 0) {
usually we put the error case first


-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix
......................................................................


Patch Set 11: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No