You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2020/03/24 21:47:16 UTC

[kudu-CR] wire protocol: some simplification and optimization for rowwise encoding

Hello Andrew Wong, Grant Henke,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/15550

to review the following change.


Change subject: wire_protocol: some simplification and optimization for rowwise encoding
......................................................................

wire_protocol: some simplification and optimization for rowwise encoding

* Re-implement GetSelectedRows based on the new ForEachSetBit(...)
  utility, which operates word-by-word instead of byte-by-byte.

* use boost::none as a special case return value which indicates the
  common case of "all rows are selected". Currently the rowwise
  serialization code path doesn't use this special value (and just
  reproduces the old std::iota() call to generate the sequence of all
  indexes), but the columnar code path will special case this as a
  memcpy.

* Avoid one call to CountSelected() in SerializeRowBlock() by calculating
  num_rows from the size of the row index vector.

* Change SerializeRowBlock() to return an int indicating the number of
  rows selected, instead of accumulating it into the protobuf. This
  value can then be re-used to eliminate one extra call to CountSelected
  in ScanResultCopier::HandleRowBlock().

  After this change, the protobuf is no longer used by
  SerializeRowBlock, so I removed the parameter, which required a bit of
  fixup in the tests.

Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
---
M src/kudu/common/rowblock-test.cc
M src/kudu/common/rowblock.cc
M src/kudu/common/rowblock.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
M src/kudu/tserver/tablet_service.cc
7 files changed, 95 insertions(+), 62 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/15550/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
Gerrit-Change-Number: 15550
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>

[kudu-CR] wire protocol: some simplification and optimization for rowwise encoding

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15550 )

Change subject: wire_protocol: some simplification and optimization for rowwise encoding
......................................................................

wire_protocol: some simplification and optimization for rowwise encoding

* Re-implement GetSelectedRows based on the new ForEachSetBit(...)
  utility, which operates word-by-word instead of byte-by-byte.

* use a boolean return value which indicates the common case of "all
  rows are selected". Currently the rowwise serialization code path
  doesn't use this special value (and just reproduces the old
  std::iota() call to generate the sequence of all indexes), but the
  columnar code path will special case this as a memcpy.

* Avoid one call to CountSelected() in SerializeRowBlock() by calculating
  num_rows from the size of the row index vector.

* Change SerializeRowBlock() to return an int indicating the number of
  rows selected, instead of accumulating it into the protobuf. This
  value can then be re-used to eliminate one extra call to CountSelected
  in ScanResultCopier::HandleRowBlock().

  After this change, the protobuf is no longer used by
  SerializeRowBlock, so I removed the parameter, which required a bit of
  fixup in the tests.

Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
Reviewed-on: http://gerrit.cloudera.org:8080/15550
Reviewed-by: Grant Henke <gr...@apache.org>
Tested-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/common/rowblock-test.cc
M src/kudu/common/rowblock.cc
M src/kudu/common/rowblock.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
M src/kudu/tserver/tablet_service.cc
7 files changed, 89 insertions(+), 67 deletions(-)

Approvals:
  Grant Henke: Looks good to me, approved
  Todd Lipcon: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/15550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
Gerrit-Change-Number: 15550
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] wire protocol: some simplification and optimization for rowwise encoding

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15550 )

Change subject: wire_protocol: some simplification and optimization for rowwise encoding
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15550/2/src/kudu/common/rowblock.h
File src/kudu/common/rowblock.h:

http://gerrit.cloudera.org:8080/#/c/15550/2/src/kudu/common/rowblock.h@108
PS2, Line 108:   // In the case that all rows are selected, sets *selected to 'boost::none'
> It feels a little unintuitive that none == all, but as long as we document 
I felt similarly, but I also don't mind it if it's documented. Alternatively, we could forego optionality entirely and return a bool indicating whether all rows were selected or something, but I don't feel strongly about it.


http://gerrit.cloudera.org:8080/#/c/15550/2/src/kudu/common/wire_protocol.h
File src/kudu/common/wire_protocol.h:

http://gerrit.cloudera.org:8080/#/c/15550/2/src/kudu/common/wire_protocol.h@164
PS2, Line 164: protobuf and 
nit: this no longer applies



-- 
To view, visit http://gerrit.cloudera.org:8080/15550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
Gerrit-Change-Number: 15550
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 25 Mar 2020 04:12:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] wire protocol: some simplification and optimization for rowwise encoding

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15550 )

Change subject: wire_protocol: some simplification and optimization for rowwise encoding
......................................................................


Patch Set 2: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15550/2/src/kudu/common/rowblock.h
File src/kudu/common/rowblock.h:

http://gerrit.cloudera.org:8080/#/c/15550/2/src/kudu/common/rowblock.h@108
PS2, Line 108:   // In the case that all rows are selected, sets *selected to 'boost::none'
It feels a little unintuitive that none == all, but as long as we document it everywhere I think it make sense.


http://gerrit.cloudera.org:8080/#/c/15550/2/src/kudu/common/rowblock.cc
File src/kudu/common/rowblock.cc:

http://gerrit.cloudera.org:8080/#/c/15550/2/src/kudu/common/rowblock.cc@91
PS2, Line 91:   CHECK_LE(n_rows_, std::numeric_limits<uint16_t>::max());
What prevent's n_rows_ > 65535?



-- 
To view, visit http://gerrit.cloudera.org:8080/15550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
Gerrit-Change-Number: 15550
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 25 Mar 2020 04:08:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] wire protocol: some simplification and optimization for rowwise encoding

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15550 )

Change subject: wire_protocol: some simplification and optimization for rowwise encoding
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15550/2/src/kudu/common/rowblock.h
File src/kudu/common/rowblock.h:

http://gerrit.cloudera.org:8080/#/c/15550/2/src/kudu/common/rowblock.h@108
PS2, Line 108:   // In the case that all rows are selected, sets *selected to 'boost::none'
> I felt similarly, but I also don't mind it if it's documented. Alternativel
yea, I couldn't find a great way to do this. I can change to a bool return if you guys prefer


http://gerrit.cloudera.org:8080/#/c/15550/2/src/kudu/common/rowblock.cc
File src/kudu/common/rowblock.cc:

http://gerrit.cloudera.org:8080/#/c/15550/2/src/kudu/common/rowblock.cc@91
PS2, Line 91:   CHECK_LE(n_rows_, std::numeric_limits<uint16_t>::max());
> What prevent's n_rows_ > 65535?
just in practice we don't set rowblock sizes that large since the point of a RowBlock is to fit all the rows into CPU cache. I found some small perf improvement using uint16s here,



-- 
To view, visit http://gerrit.cloudera.org:8080/15550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
Gerrit-Change-Number: 15550
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 25 Mar 2020 05:44:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] wire protocol: some simplification and optimization for rowwise encoding

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has removed a vote on this change.

Change subject: wire_protocol: some simplification and optimization for rowwise encoding
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/15550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
Gerrit-Change-Number: 15550
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] wire protocol: some simplification and optimization for rowwise encoding

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15550 )

Change subject: wire_protocol: some simplification and optimization for rowwise encoding
......................................................................


Patch Set 3: Verified+1

k, skipping IWYU for now


-- 
To view, visit http://gerrit.cloudera.org:8080/15550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
Gerrit-Change-Number: 15550
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 25 Mar 2020 17:34:42 +0000
Gerrit-HasComments: No

[kudu-CR] wire protocol: some simplification and optimization for rowwise encoding

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15550 )

Change subject: wire_protocol: some simplification and optimization for rowwise encoding
......................................................................


Patch Set 3: Code-Review+2

There is an IWYU failure but you could probably skip it given Adar is fixing it right now.


-- 
To view, visit http://gerrit.cloudera.org:8080/15550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
Gerrit-Change-Number: 15550
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 25 Mar 2020 17:31:02 +0000
Gerrit-HasComments: No

[kudu-CR] wire protocol: some simplification and optimization for rowwise encoding

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Andrew Wong, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15550

to look at the new patch set (#2).

Change subject: wire_protocol: some simplification and optimization for rowwise encoding
......................................................................

wire_protocol: some simplification and optimization for rowwise encoding

* Re-implement GetSelectedRows based on the new ForEachSetBit(...)
  utility, which operates word-by-word instead of byte-by-byte.

* use boost::none as a special case return value which indicates the
  common case of "all rows are selected". Currently the rowwise
  serialization code path doesn't use this special value (and just
  reproduces the old std::iota() call to generate the sequence of all
  indexes), but the columnar code path will special case this as a
  memcpy.

* Avoid one call to CountSelected() in SerializeRowBlock() by calculating
  num_rows from the size of the row index vector.

* Change SerializeRowBlock() to return an int indicating the number of
  rows selected, instead of accumulating it into the protobuf. This
  value can then be re-used to eliminate one extra call to CountSelected
  in ScanResultCopier::HandleRowBlock().

  After this change, the protobuf is no longer used by
  SerializeRowBlock, so I removed the parameter, which required a bit of
  fixup in the tests.

Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
---
M src/kudu/common/rowblock-test.cc
M src/kudu/common/rowblock.cc
M src/kudu/common/rowblock.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
M src/kudu/tserver/tablet_service.cc
7 files changed, 94 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/15550/2
-- 
To view, visit http://gerrit.cloudera.org:8080/15550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
Gerrit-Change-Number: 15550
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] wire protocol: some simplification and optimization for rowwise encoding

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Andrew Wong, Kudu Jenkins, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15550

to look at the new patch set (#3).

Change subject: wire_protocol: some simplification and optimization for rowwise encoding
......................................................................

wire_protocol: some simplification and optimization for rowwise encoding

* Re-implement GetSelectedRows based on the new ForEachSetBit(...)
  utility, which operates word-by-word instead of byte-by-byte.

* use a boolean return value which indicates the common case of "all
  rows are selected". Currently the rowwise serialization code path
  doesn't use this special value (and just reproduces the old
  std::iota() call to generate the sequence of all indexes), but the
  columnar code path will special case this as a memcpy.

* Avoid one call to CountSelected() in SerializeRowBlock() by calculating
  num_rows from the size of the row index vector.

* Change SerializeRowBlock() to return an int indicating the number of
  rows selected, instead of accumulating it into the protobuf. This
  value can then be re-used to eliminate one extra call to CountSelected
  in ScanResultCopier::HandleRowBlock().

  After this change, the protobuf is no longer used by
  SerializeRowBlock, so I removed the parameter, which required a bit of
  fixup in the tests.

Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
---
M src/kudu/common/rowblock-test.cc
M src/kudu/common/rowblock.cc
M src/kudu/common/rowblock.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
M src/kudu/tserver/tablet_service.cc
7 files changed, 89 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/15550/3
-- 
To view, visit http://gerrit.cloudera.org:8080/15550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
Gerrit-Change-Number: 15550
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] wire protocol: some simplification and optimization for rowwise encoding

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has removed a vote on this change.

Change subject: wire_protocol: some simplification and optimization for rowwise encoding
......................................................................


Removed -Verified by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/15550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I24dfb1bd036fde514ca6494bae0ddc171dd225dd
Gerrit-Change-Number: 15550
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>