You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "ZhangYao (Code Review)" <ge...@cloudera.org> on 2019/06/25 08:05:30 UTC

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

ZhangYao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13721


Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................

KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

SerializeRowBlock to operate by row instead of by column so that we only
need to iterate over the selection bitmap once.

Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
---
M src/kudu/common/wire_protocol.cc
1 file changed, 64 insertions(+), 71 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/13721/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 1
Gerrit-Owner: ZhangYao <tr...@gmail.com>

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................

KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

This improves the performance of serializing RowBlocks to the wire by
amortizing the cost of iterating over the set bits of the selection
bitmap. Instead of using the bitmap once per column, we convert the
bitmap to a list of set indices up front, and then use those indices for
conversion.

This changes the benchmarks to report cycles/cell instead of raw times,
making it easier to see the effects of column count or sparsity.

Benchmark results:
  column count 3 and row select rate 1:     5.12520529   ->  5.44280228 cycles/cell
  column count 3 and row select rate 0.8:   12.74473127  ->  7.04588262 cycles/cell
  column count 3 and row select rate 0.5:   23.98607461  ->  7.51201477 cycles/cell
  column count 3 and row select rate 0.2:   40.66053179  ->  8.30233998 cycles/cell
  column count 30 and row select rate 1:    15.43040511  ->  15.97765642 cycles/cell
  column count 30 and row select rate 0.8:  23.7480557   ->  17.84433817 cycles/cell
  column count 30 and row select rate 0.5:  40.08323337  ->  17.67888749 cycles/cell
  column count 30 and row select rate 0.2:  48.62210244  ->  16.56884988 cycles/cell
  column count 300 and row select rate 1:   18.9223316   ->  20.90426976 cycles/cell
  column count 300 and row select rate 0.8: 27.50793008  ->  21.92481189 cycles/cell
  column count 300 and row select rate 0.5: 40.34367716  ->  21.32180024 cycles/cell
  column count 300 and row select rate 0.2: 52.7446843   ->  20.92634437 cycles/cell

Patch co-authored by Zhang Yao.

Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Reviewed-on: http://gerrit.cloudera.org:8080/13721
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M src/kudu/common/rowblock-test.cc
M src/kudu/common/rowblock.cc
M src/kudu/common/rowblock.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
5 files changed, 107 insertions(+), 53 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 10
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................


Patch Set 2:

Here is the result(Before/After), I only pick the real time in the form below.
total row count:10000

Schema size      3           30            300
Select rate
    1 0     0.002/0.002   0.085/0.082   1.617/1.656(maybe all selected need futher optimization)
    0.8     0.003/0.002   0.072/0.064   1.457/1.266
    0.5     0.004/0.002   0.054/0.041   1.044/0.825
    0.2     0.002/0.001   0.027/0.017   0.509/0.280

We can see it have some optimization althought no that much in some case :)
The cpu occupation benchmark will be update later.


-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 2
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Jul 2019 09:47:04 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Yao Xu, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13721

to look at the new patch set (#4).

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................

KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Convert the bitmap into vector before serialize row block, thereby it
avoid traverse bitmap for each column.

Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
---
M src/kudu/common/wire_protocol.cc
1 file changed, 60 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/13721/4
-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 4
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13721/2/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/13721/2/src/kudu/common/wire_protocol.cc@918
PS2, Line 918: convert_bitmap_to_vector
style: this should be ConvertBitmapToVector


http://gerrit.cloudera.org:8080/#/c/13721/2/src/kudu/common/wire_protocol.cc@923
PS2, Line 923:   if (!sel_vec->AnySelected()) {
             :     return;
             :   }
             :   if (sel_vec->CountSelected() == nrows) {
seems like these two could be combined into just calling CountSelected once, since in the case that 'AnySelected' returns false, it would have passed over the whole array anyway


http://gerrit.cloudera.org:8080/#/c/13721/2/src/kudu/common/wire_protocol.cc@935
PS2, Line 935:       select_row_idx->resize(select_row_idx->size() + run_size);
If you call CountSelected above and save the result, you can get a single resize() call, instead of one resize() per run



-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 2
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 06:29:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................


Patch Set 9: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 9
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 06:23:49 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................


Patch Set 1:

I chatted with Zhang Yao last night on Slack, but reproducing the content here in case others want to follow along:

Long ago, we used to do this conversion row-by-row, and moved to column-by-column in commit 3cd5496710351f76a72841f7a691fb1bfd50bb9a back in 2013. This had some big performance improvements as measured by the benchmark in wire_protocol-test.

However, that benchmark is quite simplistic -- it only tests the case when all rows are selected, and only for a schema with three columns. In KUDU-2847 (and in other places over the last few years) we found that the columnar conversion is a bottleneck in other situations. In particular:

- if only a small number of rows are selected, or if around half of the rows are selected (resulting in small runs) then the existing CopyColumn path spends a lot of work in branches trying to find runs (BitmapFindFirst, etc). We could probably improve this bitmap iteration code somewhat, but this issue is magnifed by:
- we do the iteration over the selection vector once per column, so if there are many columns, we don't get to amortize that cost.

It looks like the patch here doesn't go fully back to "row-by-row", but instead switches to one iteration over the selection vector, and then for each run, iterates column-wise. That might be a good compromise, since it avoids the per-column selvec iteration cost while also avoiding per-row iteration over the schema.

That said, I suggested that Zhang Yao extend the benchmarks to test all combinations of the two important dimensions:
- schema column count (narrow, medium, wide), perhaps testing both STRING typed columns and primitives
- selection vector pattern (all set, mostly set, half set, few set)

If we have a good set of benchmarks to work from, I think we'll be in a better place to evaluate different options here.


-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 1
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Comment-Date: Wed, 26 Jun 2019 16:09:42 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13721/2/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/13721/2/src/kudu/common/wire_protocol.cc@918
PS2, Line 918:  if (IS_NULLABLE) {
> style: this should be ConvertBitmapToVector
Done


http://gerrit.cloudera.org:8080/#/c/13721/2/src/kudu/common/wire_protocol.cc@923
PS2, Line 923:       if (IS_NULLABLE) {
             :         BitmapChange(dst + offset_to_null_bitmap, dst_col_idx, false);
             :       }
             :     }
> seems like these two could be combined into just calling CountSelected once
Done :)


http://gerrit.cloudera.org:8080/#/c/13721/2/src/kudu/common/wire_protocol.cc@935
PS2, Line 935:   bool selected = false;
> I think it would be better to save the selected row idx boundary instead of
The rowblock is used with small size(128) in scan, so maybe it doesn't matter to copy full index. But there is no doubt that only save the boundary will benefit to the memory cost:)


http://gerrit.cloudera.org:8080/#/c/13721/2/src/kudu/common/wire_protocol.cc@935
PS2, Line 935:   bool selected = false;
> If you call CountSelected above and save the result, you can get a single r
Yea :)



-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 3
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 16:27:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Yao Xu, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13721

to look at the new patch set (#8).

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................

KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

This improves the performance of serializing RowBlocks to the wire by
amortizing the cost of iterating over the set bits of the selection
bitmap. Instead of using the bitmap once per column, we convert the
bitmap to a list of set indices up front, and then use those indices for
conversion.

This changes the benchmarks to report cycles/cell instead of raw times,
making it easier to see the effects of column count or sparsity.

Benchmark results:
  column count 3 and row select rate 1:     5.12520529   ->  5.44280228 cycles/cell
  column count 3 and row select rate 0.8:   12.74473127  ->  7.04588262 cycles/cell
  column count 3 and row select rate 0.5:   23.98607461  ->  7.51201477 cycles/cell
  column count 3 and row select rate 0.2:   40.66053179  ->  8.30233998 cycles/cell
  column count 30 and row select rate 1:    15.43040511  ->  15.97765642 cycles/cell
  column count 30 and row select rate 0.8:  23.7480557   ->  17.84433817 cycles/cell
  column count 30 and row select rate 0.5:  40.08323337  ->  17.67888749 cycles/cell
  column count 30 and row select rate 0.2:  48.62210244  ->  16.56884988 cycles/cell
  column count 300 and row select rate 1:   18.9223316   ->  20.90426976 cycles/cell
  column count 300 and row select rate 0.8: 27.50793008  ->  21.92481189 cycles/cell
  column count 300 and row select rate 0.5: 40.34367716  ->  21.32180024 cycles/cell
  column count 300 and row select rate 0.2: 52.7446843   ->  20.92634437 cycles/cell

Patch co-authored by Zhang Yao.

Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
---
M src/kudu/common/rowblock-test.cc
M src/kudu/common/rowblock.cc
M src/kudu/common/rowblock.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
5 files changed, 107 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/13721/8
-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 8
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................


Patch Set 1:

After I do the benchmark https://gerrit.cloudera.org/#/c/13749/ , I found this may be a retrogression :(
Maybe my change break the cpu cache lines somehow. I will try another way to do this optimization later.
Here is the result(Before/After), I only pick the real time in the form below.
Total row count:100000

Schema size      3           30            300
Select rate
    1 0     0.004/0.195   2.209/2.159   24.947/35.265
    0.8     0.046/0.155   1.358/1.801   18.824/28.634
    0.5     0.048/0.102   0.823/1.182   12.799/17.946
    0.2     0.025/0.043   0.365/0.732   6.51/8.238


-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 1
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Thu, 27 Jun 2019 09:49:01 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Yao Xu, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13721

to look at the new patch set (#9).

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................

KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

This improves the performance of serializing RowBlocks to the wire by
amortizing the cost of iterating over the set bits of the selection
bitmap. Instead of using the bitmap once per column, we convert the
bitmap to a list of set indices up front, and then use those indices for
conversion.

This changes the benchmarks to report cycles/cell instead of raw times,
making it easier to see the effects of column count or sparsity.

Benchmark results:
  column count 3 and row select rate 1:     5.12520529   ->  5.44280228 cycles/cell
  column count 3 and row select rate 0.8:   12.74473127  ->  7.04588262 cycles/cell
  column count 3 and row select rate 0.5:   23.98607461  ->  7.51201477 cycles/cell
  column count 3 and row select rate 0.2:   40.66053179  ->  8.30233998 cycles/cell
  column count 30 and row select rate 1:    15.43040511  ->  15.97765642 cycles/cell
  column count 30 and row select rate 0.8:  23.7480557   ->  17.84433817 cycles/cell
  column count 30 and row select rate 0.5:  40.08323337  ->  17.67888749 cycles/cell
  column count 30 and row select rate 0.2:  48.62210244  ->  16.56884988 cycles/cell
  column count 300 and row select rate 1:   18.9223316   ->  20.90426976 cycles/cell
  column count 300 and row select rate 0.8: 27.50793008  ->  21.92481189 cycles/cell
  column count 300 and row select rate 0.5: 40.34367716  ->  21.32180024 cycles/cell
  column count 300 and row select rate 0.2: 52.7446843   ->  20.92634437 cycles/cell

Patch co-authored by Zhang Yao.

Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
---
M src/kudu/common/rowblock-test.cc
M src/kudu/common/rowblock.cc
M src/kudu/common/rowblock.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
5 files changed, 107 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/13721/9
-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 9
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has uploaded a new patch set (#5) to the change originally created by ZhangYao. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................

KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

This improves the performance of serializing RowBlocks to the wire by
amortiznig the cost of iterating over the set bits of the selection
bitmap. Instead of using the bitmap once per column, we convert the
bitmap to a list of set indices up front, and then use those indices for
conversion.

This changes the benchmarks to report cycles/cell instead of raw times,
making it easier to see the effects of column count or sparsity.

Benchmark results:
  column count 3 and row select rate 1:     5.12520529   ->  5.44280228 cycles/cell
  column count 3 and row select rate 0.8:   12.74473127  ->  7.04588262 cycles/cell
  column count 3 and row select rate 0.5:   23.98607461  ->  7.51201477 cycles/cell
  column count 3 and row select rate 0.2:   40.66053179  ->  8.30233998 cycles/cell
  column count 30 and row select rate 1:    15.43040511  ->  15.97765642 cycles/cell
  column count 30 and row select rate 0.8:  23.7480557   ->  17.84433817 cycles/cell
  column count 30 and row select rate 0.5:  40.08323337  ->  17.67888749 cycles/cell
  column count 30 and row select rate 0.2:  48.62210244  ->  16.56884988 cycles/cell
  column count 300 and row select rate 1:   18.9223316   ->  20.90426976 cycles/cell
  column count 300 and row select rate 0.8: 27.50793008  ->  21.92481189 cycles/cell
  column count 300 and row select rate 0.5: 40.34367716  ->  21.32180024 cycles/cell
  column count 300 and row select rate 0.2: 52.7446843   ->  20.92634437 cycles/cell

Patch co-authored by Zhang Yao.

Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
---
M src/kudu/common/rowblock-test.cc
M src/kudu/common/rowblock.cc
M src/kudu/common/rowblock.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
5 files changed, 104 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/13721/5
-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 5
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Yao Xu, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13721

to look at the new patch set (#3).

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................

KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Convert the bitmap into vector before serialize row block, thereby it
avoid traverse bitmap for each column.

Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
---
M src/kudu/common/wire_protocol.cc
1 file changed, 59 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/13721/3
-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 3
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................


Patch Set 1:

Maybe my description is a little bit confusing.
The total row count in benchmark is 100000.
The horizontal axis "schema size" is the column count in schema, here 
we have 3 columns, 30 columns and 300 columns for test.
The vertical axis "Select rate" decide how many rows we randomly selected,
here we have 1.0(all rows) * 100000, 0.8 * 100000, 0.5(half rows) * 100000
and 0.2 * 100000 for test.


-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 1
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Thu, 27 Jun 2019 12:38:12 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................


Patch Set 4:

Talked to ZhangYao offline and he suggested I upload my change as a patch. Here it is, with up-to-date perf results in the commit message


-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 4
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Wed, 24 Jul 2019 06:01:42 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13721/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13721/7//COMMIT_MSG@10
PS7, Line 10: amortizing
> amortizing
Done


http://gerrit.cloudera.org:8080/#/c/13721/7/src/kudu/common/rowblock.cc
File src/kudu/common/rowblock.cc:

http://gerrit.cloudera.org:8080/#/c/13721/7/src/kudu/common/rowblock.cc@91
PS7, Line 91:  the bit index to the output until none are set.
            :   for (int i = 0; i < n_bytes_; i++) {
            :     uint8_t bm = *bitmap++;
            :     while (bm != 0) {
            :       int bit = Bits::FindLSBSetNonZero(bm);
            :      
> nit: Jumping into this is can be intimidating with no comments. Maybe add a
Done


http://gerrit.cloudera.org:8080/#/c/13721/7/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/13721/7/src/kudu/common/wire_protocol.cc@927
PS7, Line 927: * slice = reinterpret_cast<const Slice *>(src);
             :       size_t offset_in_indirect = indirect_data->size();
             :       indirect_data->append(reinterpret_cast<const char*>(slice->data()), slice->size());
             : 
             :       Slice* 
> nit: fix ordering of *s
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 8
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Thu, 22 Aug 2019 09:14:20 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has uploaded a new patch set (#6) to the change originally created by ZhangYao. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................

KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

This improves the performance of serializing RowBlocks to the wire by
amortiznig the cost of iterating over the set bits of the selection
bitmap. Instead of using the bitmap once per column, we convert the
bitmap to a list of set indices up front, and then use those indices for
conversion.

This changes the benchmarks to report cycles/cell instead of raw times,
making it easier to see the effects of column count or sparsity.

Benchmark results:
  column count 3 and row select rate 1:     5.12520529   ->  5.44280228 cycles/cell
  column count 3 and row select rate 0.8:   12.74473127  ->  7.04588262 cycles/cell
  column count 3 and row select rate 0.5:   23.98607461  ->  7.51201477 cycles/cell
  column count 3 and row select rate 0.2:   40.66053179  ->  8.30233998 cycles/cell
  column count 30 and row select rate 1:    15.43040511  ->  15.97765642 cycles/cell
  column count 30 and row select rate 0.8:  23.7480557   ->  17.84433817 cycles/cell
  column count 30 and row select rate 0.5:  40.08323337  ->  17.67888749 cycles/cell
  column count 30 and row select rate 0.2:  48.62210244  ->  16.56884988 cycles/cell
  column count 300 and row select rate 1:   18.9223316   ->  20.90426976 cycles/cell
  column count 300 and row select rate 0.8: 27.50793008  ->  21.92481189 cycles/cell
  column count 300 and row select rate 0.5: 40.34367716  ->  21.32180024 cycles/cell
  column count 300 and row select rate 0.2: 52.7446843   ->  20.92634437 cycles/cell

Patch co-authored by Zhang Yao.

Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
---
M src/kudu/common/rowblock-test.cc
M src/kudu/common/rowblock.cc
M src/kudu/common/rowblock.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
5 files changed, 105 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/13721/6
-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 6
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................


Patch Set 3:

> I played a bit with this patch and made a few changes shown here:
 > 
 > https://gist.github.com/e00c7f8e8360cd663d2ab1ac48959c9b
 > 
 > this changes the benchmark to run a constant number of cells and
 > output cycles/cell. It also makes a few optimizations to speed up
 > the conversion of bitmaps to vector of indices, similar to what I
 > suggested above.
 > 
 > original:
 > I0716 00:05:15.684546 18747 wire_protocol-test.cc:172] Converting
 > to PB with column count 3 and row select rate 1: 5.12520529
 > cycles/cell
 > I0716 00:05:16.123811 18747 wire_protocol-test.cc:172] Converting
 > to PB with column count 3 and row select rate 0.8: 12.74473127
 > cycles/cell
 > I0716 00:05:16.950168 18747 wire_protocol-test.cc:172] Converting
 > to PB with column count 3 and row select rate 0.5: 23.98607461
 > cycles/cell
 > I0716 00:05:18.350739 18747 wire_protocol-test.cc:172] Converting
 > to PB with column count 3 and row select rate 0.2: 40.66053179
 > cycles/cell
 > I0716 00:05:18.883632 18747 wire_protocol-test.cc:172] Converting
 > to PB with column count 30 and row select rate 1: 15.43040511
 > cycles/cell
 > I0716 00:05:19.702869 18747 wire_protocol-test.cc:172] Converting
 > to PB with column count 30 and row select rate 0.8: 23.7480557
 > cycles/cell
 > I0716 00:05:21.084743 18747 wire_protocol-test.cc:172] Converting
 > to PB with column count 30 and row select rate 0.5: 40.08323337
 > cycles/cell
 > I0716 00:05:22.760478 18747 wire_protocol-test.cc:172] Converting
 > to PB with column count 30 and row select rate 0.2: 48.62210244
 > cycles/cell
 > I0716 00:05:23.425213 18747 wire_protocol-test.cc:172] Converting
 > to PB with column count 300 and row select rate 1: 18.9223316
 > cycles/cell
 > I0716 00:05:24.384176 18747 wire_protocol-test.cc:172] Converting
 > to PB with column count 300 and row select rate 0.8: 27.50793008
 > cycles/cell
 > I0716 00:05:25.785208 18747 wire_protocol-test.cc:172] Converting
 > to PB with column count 300 and row select rate 0.5: 40.34367716
 > cycles/cell
 > I0716 00:05:27.613420 18747 wire_protocol-test.cc:172] Converting
 > to PB with column count 300 and row select rate 0.2: 52.7446843
 > cycles/cell
 > 
 > zhangyao's version:
 > 
 > I0716 00:05:43.160219 19261 wire_protocol-test.cc:172] Converting
 > to PB with column count 3 and row select rate 1: 5.32750006
 > cycles/cell
 > I0716 00:05:43.544219 19261 wire_protocol-test.cc:172] Converting
 > to PB with column count 3 and row select rate 0.8: 11.13753333
 > cycles/cell
 > I0716 00:05:44.195641 19261 wire_protocol-test.cc:172] Converting
 > to PB with column count 3 and row select rate 0.5: 18.90386941
 > cycles/cell
 > I0716 00:05:45.187793 19261 wire_protocol-test.cc:172] Converting
 > to PB with column count 3 and row select rate 0.2: 28.7992143
 > cycles/cell
 > I0716 00:05:45.713094 19261 wire_protocol-test.cc:172] Converting
 > to PB with column count 30 and row select rate 1: 15.20973827
 > cycles/cell
 > I0716 00:05:46.259582 19261 wire_protocol-test.cc:172] Converting
 > to PB with column count 30 and row select rate 0.8: 15.82916978
 > cycles/cell
 > I0716 00:05:46.809581 19261 wire_protocol-test.cc:172] Converting
 > to PB with column count 30 and row select rate 0.5: 15.92793484
 > cycles/cell
 > I0716 00:05:47.421159 19261 wire_protocol-test.cc:172] Converting
 > to PB with column count 30 and row select rate 0.2: 17.71918057
 > cycles/cell
 > I0716 00:05:48.091591 19261 wire_protocol-test.cc:172] Converting
 > to PB with column count 300 and row select rate 1: 19.09610579
 > cycles/cell
 > I0716 00:05:48.761175 19261 wire_protocol-test.cc:172] Converting
 > to PB with column count 300 and row select rate 0.8: 18.89817833
 > cycles/cell
 > I0716 00:05:49.393072 19261 wire_protocol-test.cc:172] Converting
 > to PB with column count 300 and row select rate 0.5: 18.01899173
 > cycles/cell
 > I0716 00:05:50.126744 19261 wire_protocol-test.cc:172] Converting
 > to PB with column count 300 and row select rate 0.2: 20.69221174
 > cycles/cell
 > 
 > 
 > with my changes:
 > 
 > I0716 00:04:49.736728 18599 wire_protocol-test.cc:172] Converting
 > to PB with column count 3 and row select rate 1: 6.15231488
 > cycles/cell
 > I0716 00:04:50.063104 18599 wire_protocol-test.cc:172] Converting
 > to PB with column count 3 and row select rate 0.8: 9.46510837
 > cycles/cell
 > I0716 00:04:50.354218 18599 wire_protocol-test.cc:172] Converting
 > to PB with column count 3 and row select rate 0.5: 8.44479528
 > cycles/cell
 > I0716 00:04:50.666503 18599 wire_protocol-test.cc:172] Converting
 > to PB with column count 3 and row select rate 0.2: 9.05289943
 > cycles/cell
 > I0716 00:04:51.228058 18599 wire_protocol-test.cc:172] Converting
 > to PB with column count 30 and row select rate 1: 16.26476494
 > cycles/cell
 > I0716 00:04:51.763269 18599 wire_protocol-test.cc:172] Converting
 > to PB with column count 30 and row select rate 0.8: 15.49973305
 > cycles/cell
 > I0716 00:04:52.277858 18599 wire_protocol-test.cc:172] Converting
 > to PB with column count 30 and row select rate 0.5: 14.89959128
 > cycles/cell
 > I0716 00:04:52.803534 18599 wire_protocol-test.cc:172] Converting
 > to PB with column count 30 and row select rate 0.2: 15.22482627
 > cycles/cell
 > I0716 00:04:53.483842 18599 wire_protocol-test.cc:172] Converting
 > to PB with column count 300 and row select rate 1: 19.39585408
 > cycles/cell
 > I0716 00:04:54.123446 18599 wire_protocol-test.cc:172] Converting
 > to PB with column count 300 and row select rate 0.8: 18.24819305
 > cycles/cell
 > I0716 00:04:54.757288 18599 wire_protocol-test.cc:172] Converting
 > to PB with column count 300 and row select rate 0.5: 18.06256705
 > cycles/cell
 > I0716 00:04:55.405241 18599 wire_protocol-test.cc:172] Converting
 > to PB with column count 300 and row select rate 0.2: 18.25103313
 > cycles/cell

WoW, I think your result is more better :D . You can just push your code here freely.


-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 3
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 16:33:32 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................


Patch Set 7: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13721/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13721/7//COMMIT_MSG@10
PS7, Line 10: amortiznig
amortizing


http://gerrit.cloudera.org:8080/#/c/13721/7/src/kudu/common/rowblock.cc
File src/kudu/common/rowblock.cc:

http://gerrit.cloudera.org:8080/#/c/13721/7/src/kudu/common/rowblock.cc@91
PS7, Line 91: uint8_t bm = *bitmap++;
            :     while (bm != 0) {
            :       int bit = Bits::FindLSBSetNonZero(bm);
            :       *dst++ = (i * 8) + bit;
            :       bm ^= (1 << bit);
            :     }
nit: Jumping into this is can be intimidating with no comments. Maybe add a comment along the lines of:

Within each byte, keep flipping the least significant non-zero bit and adding the bit index to the output until none are set.


http://gerrit.cloudera.org:8080/#/c/13721/7/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/13721/7/src/kudu/common/wire_protocol.cc@927
PS7, Line 927:  *slice = reinterpret_cast<const Slice *>(src);
             :       size_t offset_in_indirect = indirect_data->size();
             :       indirect_data->append(reinterpret_cast<const char*>(slice->data()), slice->size());
             : 
             :       Slice *
nit: fix ordering of *s



-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 7
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Tue, 13 Aug 2019 01:04:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................


Patch Set 2:

I played a bit with this patch and made a few changes shown here:

https://gist.github.com/e00c7f8e8360cd663d2ab1ac48959c9b

this changes the benchmark to run a constant number of cells and output cycles/cell. It also makes a few optimizations to speed up the conversion of bitmaps to vector of indices, similar to what I suggested above.

original:
I0716 00:05:15.684546 18747 wire_protocol-test.cc:172] Converting to PB with column count 3 and row select rate 1: 5.12520529 cycles/cell
I0716 00:05:16.123811 18747 wire_protocol-test.cc:172] Converting to PB with column count 3 and row select rate 0.8: 12.74473127 cycles/cell
I0716 00:05:16.950168 18747 wire_protocol-test.cc:172] Converting to PB with column count 3 and row select rate 0.5: 23.98607461 cycles/cell
I0716 00:05:18.350739 18747 wire_protocol-test.cc:172] Converting to PB with column count 3 and row select rate 0.2: 40.66053179 cycles/cell
I0716 00:05:18.883632 18747 wire_protocol-test.cc:172] Converting to PB with column count 30 and row select rate 1: 15.43040511 cycles/cell
I0716 00:05:19.702869 18747 wire_protocol-test.cc:172] Converting to PB with column count 30 and row select rate 0.8: 23.7480557 cycles/cell
I0716 00:05:21.084743 18747 wire_protocol-test.cc:172] Converting to PB with column count 30 and row select rate 0.5: 40.08323337 cycles/cell
I0716 00:05:22.760478 18747 wire_protocol-test.cc:172] Converting to PB with column count 30 and row select rate 0.2: 48.62210244 cycles/cell
I0716 00:05:23.425213 18747 wire_protocol-test.cc:172] Converting to PB with column count 300 and row select rate 1: 18.9223316 cycles/cell
I0716 00:05:24.384176 18747 wire_protocol-test.cc:172] Converting to PB with column count 300 and row select rate 0.8: 27.50793008 cycles/cell
I0716 00:05:25.785208 18747 wire_protocol-test.cc:172] Converting to PB with column count 300 and row select rate 0.5: 40.34367716 cycles/cell
I0716 00:05:27.613420 18747 wire_protocol-test.cc:172] Converting to PB with column count 300 and row select rate 0.2: 52.7446843 cycles/cell

zhangyao's version:

I0716 00:05:43.160219 19261 wire_protocol-test.cc:172] Converting to PB with column count 3 and row select rate 1: 5.32750006 cycles/cell
I0716 00:05:43.544219 19261 wire_protocol-test.cc:172] Converting to PB with column count 3 and row select rate 0.8: 11.13753333 cycles/cell
I0716 00:05:44.195641 19261 wire_protocol-test.cc:172] Converting to PB with column count 3 and row select rate 0.5: 18.90386941 cycles/cell
I0716 00:05:45.187793 19261 wire_protocol-test.cc:172] Converting to PB with column count 3 and row select rate 0.2: 28.7992143 cycles/cell
I0716 00:05:45.713094 19261 wire_protocol-test.cc:172] Converting to PB with column count 30 and row select rate 1: 15.20973827 cycles/cell
I0716 00:05:46.259582 19261 wire_protocol-test.cc:172] Converting to PB with column count 30 and row select rate 0.8: 15.82916978 cycles/cell
I0716 00:05:46.809581 19261 wire_protocol-test.cc:172] Converting to PB with column count 30 and row select rate 0.5: 15.92793484 cycles/cell
I0716 00:05:47.421159 19261 wire_protocol-test.cc:172] Converting to PB with column count 30 and row select rate 0.2: 17.71918057 cycles/cell
I0716 00:05:48.091591 19261 wire_protocol-test.cc:172] Converting to PB with column count 300 and row select rate 1: 19.09610579 cycles/cell
I0716 00:05:48.761175 19261 wire_protocol-test.cc:172] Converting to PB with column count 300 and row select rate 0.8: 18.89817833 cycles/cell
I0716 00:05:49.393072 19261 wire_protocol-test.cc:172] Converting to PB with column count 300 and row select rate 0.5: 18.01899173 cycles/cell
I0716 00:05:50.126744 19261 wire_protocol-test.cc:172] Converting to PB with column count 300 and row select rate 0.2: 20.69221174 cycles/cell


with my changes:

I0716 00:04:49.736728 18599 wire_protocol-test.cc:172] Converting to PB with column count 3 and row select rate 1: 6.15231488 cycles/cell
I0716 00:04:50.063104 18599 wire_protocol-test.cc:172] Converting to PB with column count 3 and row select rate 0.8: 9.46510837 cycles/cell
I0716 00:04:50.354218 18599 wire_protocol-test.cc:172] Converting to PB with column count 3 and row select rate 0.5: 8.44479528 cycles/cell
I0716 00:04:50.666503 18599 wire_protocol-test.cc:172] Converting to PB with column count 3 and row select rate 0.2: 9.05289943 cycles/cell
I0716 00:04:51.228058 18599 wire_protocol-test.cc:172] Converting to PB with column count 30 and row select rate 1: 16.26476494 cycles/cell
I0716 00:04:51.763269 18599 wire_protocol-test.cc:172] Converting to PB with column count 30 and row select rate 0.8: 15.49973305 cycles/cell
I0716 00:04:52.277858 18599 wire_protocol-test.cc:172] Converting to PB with column count 30 and row select rate 0.5: 14.89959128 cycles/cell
I0716 00:04:52.803534 18599 wire_protocol-test.cc:172] Converting to PB with column count 30 and row select rate 0.2: 15.22482627 cycles/cell
I0716 00:04:53.483842 18599 wire_protocol-test.cc:172] Converting to PB with column count 300 and row select rate 1: 19.39585408 cycles/cell
I0716 00:04:54.123446 18599 wire_protocol-test.cc:172] Converting to PB with column count 300 and row select rate 0.8: 18.24819305 cycles/cell
I0716 00:04:54.757288 18599 wire_protocol-test.cc:172] Converting to PB with column count 300 and row select rate 0.5: 18.06256705 cycles/cell
I0716 00:04:55.405241 18599 wire_protocol-test.cc:172] Converting to PB with column count 300 and row select rate 0.2: 18.25103313 cycles/cell


-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 2
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 07:10:44 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Yao Xu, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13721

to look at the new patch set (#2).

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................

KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Convert the bitmap into vector before serialize row block, thereby it
avoid traverse bitmap for each column.

Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
---
M src/kudu/common/wire_protocol.cc
1 file changed, 58 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/13721/2
-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 2
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/13721 )

Change subject: KUDU-2847: Optimize iteration over selection vector in SerializeRowBlock
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13721/2/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/13721/2/src/kudu/common/wire_protocol.cc@935
PS2, Line 935:       select_row_idx->resize(select_row_idx->size() + run_size);
I think it would be better to save the selected row idx boundary instead of all the selected row idx.


http://gerrit.cloudera.org:8080/#/c/13721/2/src/kudu/common/wire_protocol.cc@999
PS2, Line 999: const ColumnBlock column_block 
const ColumnBlock& column_block



-- 
To view, visit http://gerrit.cloudera.org:8080/13721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19917d1875c46fd4cf98ef8a471b0340a76161e7
Gerrit-Change-Number: 13721
Gerrit-PatchSet: 2
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Jul 2019 10:27:47 +0000
Gerrit-HasComments: Yes