You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2020/03/25 21:11:03 UTC

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Hello Andrew Wong, Grant Henke,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/15560

to review the following change.


Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................

Add functionality to serialize a RowBlock into columnar format

This adds the core functionality to take a RowBlock and convert it into
a set of buffers for each column: the cell data, the null bitmap, and
the indirect (string) data.

This also updates wire_protocol-test to add coverage for nulls and
unselected rows, and adds a comparison benchmark. The columnar code path
is 4-5x faster in the best case, and in the worst case only 30% slower
that the existing row-wise code path.

Some follow-up commits will add further optimizations.

Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
---
M src/kudu/common/columnar_serialization.cc
M src/kudu/common/columnar_serialization.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/util/faststring.h
4 files changed, 477 insertions(+), 113 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/15560/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15560 )

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/15560/4/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

http://gerrit.cloudera.org:8080/#/c/15560/4/src/kudu/common/wire_protocol-test.cc@68
PS4, Line 68:   WireProtocolTest()
> warning: argument name 'nullable' in comment does not match parameter name 
Done


http://gerrit.cloudera.org:8080/#/c/15560/4/src/kudu/common/wire_protocol-test.cc@70
PS4, Line 70:                   ColumnSchema("nullable_string", STRING, /* is_nullable=*/true),
> warning: argument name 'nullable' in comment does not match parameter name 
Done


http://gerrit.cloudera.org:8080/#/c/15560/4/src/kudu/common/wire_protocol-test.cc@76
PS4, Line 76:   }
> warning: method 'FillRowBlockWithTestRows' can be made static [readability-
Done


http://gerrit.cloudera.org:8080/#/c/15560/4/src/kudu/common/wire_protocol-test.cc@449
PS4, Line 449: struct RowwiseConverter {
> warning: multiple declarations in a single statement reduces readability [r
Done


http://gerrit.cloudera.org:8080/#/c/15560/4/src/kudu/common/wire_protocol-test.cc@537
PS4, Line 537:     }
> warning: method 'SelectRandomRowsWithRate' can be made static [readability-
Done


http://gerrit.cloudera.org:8080/#/c/15560/4/src/kudu/common/wire_protocol-test.cc@571
PS4, Line 571:     // Regardless of the config, use a constant number of cells for the test by
> warning: narrowing conversion from 'double' to 'int' [bugprone-narrowing-co
Done


http://gerrit.cloudera.org:8080/#/c/15560/4/src/kudu/util/faststring.h
File src/kudu/util/faststring.h:

http://gerrit.cloudera.org:8080/#/c/15560/4/src/kudu/util/faststring.h@82
PS4, Line 82:       delete[] data_;
> warning: Attempt to free released memory [clang-analyzer-cplusplus.NewDelet
tidy's not getting thi sright -- fails even this one: https://gist.github.com/toddlipcon/a5be097a734f7abda08a35e953302010



-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 27 Mar 2020 16:23:17 +0000
Gerrit-HasComments: Yes

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Andrew Wong, Kudu Jenkins, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15560

to look at the new patch set (#3).

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................

Add functionality to serialize a RowBlock into columnar format

This adds the core functionality to take a RowBlock and convert it into
a set of buffers for each column: the cell data, the null bitmap, and
the indirect (string) data.

This also updates wire_protocol-test to add coverage for nulls and
unselected rows, and adds a comparison benchmark. The columnar code path
is 4-5x faster in the best case, and in the worst case only 30% slower
that the existing row-wise code path.

Some follow-up commits will add further optimizations.

Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
---
M src/kudu/common/columnar_serialization.cc
M src/kudu/common/columnar_serialization.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/util/faststring.h
4 files changed, 497 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/15560/3
-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15560 )

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................


Patch Set 5:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.h
File src/kudu/common/columnar_serialization.h:

http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.h@31
PS5, Line 31: block
nit: maybe "batch", to avoid batch/block confusion. Maybe also note that this may contain data from multiple RowBlocks?


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.cc
File src/kudu/common/columnar_serialization.cc:

http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.cc@43
PS5, Line 43: using boost::optional;
> warning: using decl 'optional' is unused [misc-unused-using-decls]
Yeah, don't need this now that we're using SelectedRows.


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.cc@373
PS5, Line 373: slice's pointer
nit: just "slice"?


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.cc@374
PS5, Line 374: void RelocateSlicesToIndirect(uint8_t* __restrict__ cells_buf, int n_rows,
             :                               faststring* indirect) {
nit: Since 'cells_buf' is an in-out param, could you reorder these so it's after 'n_rows'?


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.cc@466
PS5, Line 466:   if (out->columns.size() != projection_schema->num_columns()) {
             :     CHECK_EQ(out->columns.size(), 0);
Do you think we'll want to move this initialization out of this method at some point?


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/wire_protocol-test.cc@336
PS5, Line 336:         if (!col.is_nullable() || !row.is_null(c)) {
readability nit: mind inverting to avoid the negatives? eg if col.is_nullable() && row.is_null() continue? Also avoids some scoping.


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/wire_protocol-test.cc@348
PS5, Line 348:       }
Should we test the serialized non null bitmap if the row is null?


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/wire_protocol-test.cc@574
PS5, Line 574:     const int kNumTrials = static_cast<int>(
             :         kNumCellsToConvert / select_rate / spec.columns.size() / block.nrows());
nit: maybe note that we're doing something along the lines of:

# blocks iters (trials) = (selected cells) * (rows /selected rows) * (1 row / # cells) * (1 block iter / # rows)


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/wire_protocol-test.cc@601
PS5, Line 601: cycles_per_cell_rowwise / cycles_per_cell_columnar;
Reverse these?


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/util/faststring.h
File src/kudu/util/faststring.h:

http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/util/faststring.h@108
PS5, Line 108:   void resize_with_extra_capacity(size_t newsize) {
nit: should doc this and it'd be nice to explain when to use this over 'resize'.



-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 28 Mar 2020 03:47:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Andrew Wong, Kudu Jenkins, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15560

to look at the new patch set (#5).

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................

Add functionality to serialize a RowBlock into columnar format

This adds the core functionality to take a RowBlock and convert it into
a set of buffers for each column: the cell data, the null bitmap, and
the indirect (string) data.

This also updates wire_protocol-test to add coverage for nulls and
unselected rows, and adds a comparison benchmark. The columnar code path
is 4-5x faster in the best case, and in the worst case only 30% slower
that the existing row-wise code path.

Some follow-up commits will add further optimizations.

Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
---
M src/kudu/common/CMakeLists.txt
M src/kudu/common/columnar_serialization.cc
M src/kudu/common/columnar_serialization.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/util/faststring.h
5 files changed, 505 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/15560/5
-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15560 )

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................

Add functionality to serialize a RowBlock into columnar format

This adds the core functionality to take a RowBlock and convert it into
a set of buffers for each column: the cell data, the null bitmap, and
the indirect (string) data.

This also updates wire_protocol-test to add coverage for nulls and
unselected rows, and adds a comparison benchmark. The columnar code path
is 4-5x faster in the best case, and in the worst case only 30% slower
that the existing row-wise code path.

Some follow-up commits will add further optimizations.

Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Reviewed-on: http://gerrit.cloudera.org:8080/15560
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/common/CMakeLists.txt
M src/kudu/common/columnar_serialization.cc
M src/kudu/common/columnar_serialization.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/util/faststring.h
5 files changed, 510 insertions(+), 116 deletions(-)

Approvals:
  Andrew Wong: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15560 )

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................


Patch Set 5:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.h
File src/kudu/common/columnar_serialization.h:

http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.h@31
PS5, Line 31: block
> nit: maybe "batch", to avoid batch/block confusion. Maybe also note that th
Done


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.cc
File src/kudu/common/columnar_serialization.cc:

http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.cc@43
PS5, Line 43: using boost::optional;
> Yeah, don't need this now that we're using SelectedRows.
Done


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.cc@373
PS5, Line 373: slice's pointer
> nit: just "slice"?
clarified


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.cc@374
PS5, Line 374: void RelocateSlicesToIndirect(uint8_t* __restrict__ cells_buf, int n_rows,
             :                               faststring* indirect) {
> nit: Since 'cells_buf' is an in-out param, could you reorder these so it's 
I went back and forth on this but kept it as 'buf, count" because it seems that's the more common order when passing an array in a C-style parameter like this


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/columnar_serialization.cc@466
PS5, Line 466:   if (out->columns.size() != projection_schema->num_columns()) {
             :     CHECK_EQ(out->columns.size(), 0);
> Do you think we'll want to move this initialization out of this method at s
yea, maybe worth considering in a future refactor


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/wire_protocol-test.cc@336
PS5, Line 336:         if (!col.is_nullable() || !row.is_null(c)) {
> readability nit: mind inverting to avoid the negatives? eg if col.is_nullab
Done


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/wire_protocol-test.cc@348
PS5, Line 348:       }
> Should we test the serialized non null bitmap if the row is null?
Done


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/wire_protocol-test.cc@574
PS5, Line 574:     const int kNumTrials = static_cast<int>(
             :         kNumCellsToConvert / select_rate / spec.columns.size() / block.nrows());
> nit: maybe note that we're doing something along the lines of:
added some named variables here to make it clearer


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/wire_protocol-test.cc@601
PS5, Line 601: cycles_per_cell_rowwise / cycles_per_cell_columnar;
> Reverse these?
why? we're displaying a throughput ratio below, so if rowwise takes 10 cycles and columnar takes 5, then the ratio is "columnar has 2x throughput"


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/util/faststring.h
File src/kudu/util/faststring.h:

http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/util/faststring.h@108
PS5, Line 108:   void resize_with_extra_capacity(size_t newsize) {
> nit: should doc this and it'd be nice to explain when to use this over 'res
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 30 Mar 2020 17:39:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15560 )

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15560/2/src/kudu/common/columnar_serialization.h
File src/kudu/common/columnar_serialization.h:

http://gerrit.cloudera.org:8080/#/c/15560/2/src/kudu/common/columnar_serialization.h@19
PS2, Line 19: #include <stdint.h>
nit: don't we prefer cstdint these days?


http://gerrit.cloudera.org:8080/#/c/15560/2/src/kudu/common/columnar_serialization.cc
File src/kudu/common/columnar_serialization.cc:

http://gerrit.cloudera.org:8080/#/c/15560/2/src/kudu/common/columnar_serialization.cc@479
PS2, Line 479:     int t_schema_idx = tablet_schema->find_column(col.name());
Do we eventually need to be wary of virtual columns here? Or is projection_schema the mapped read projection at this point?


http://gerrit.cloudera.org:8080/#/c/15560/2/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

http://gerrit.cloudera.org:8080/#/c/15560/2/src/kudu/common/wire_protocol-test.cc@617
PS2, Line 617:             UniformColumns(10, INT64, 0.1),
Doesn't have to be here if you prefer not to keep them for benchmarking purposes, but it'd be nice to also cover the various differently sized types ranging from 1 byte to 16.



-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 26 Mar 2020 00:36:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15560 )

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15560/2/src/kudu/common/columnar_serialization.h
File src/kudu/common/columnar_serialization.h:

http://gerrit.cloudera.org:8080/#/c/15560/2/src/kudu/common/columnar_serialization.h@19
PS2, Line 19: #include <stdint.h>
> nit: don't we prefer cstdint these days?
Done


http://gerrit.cloudera.org:8080/#/c/15560/2/src/kudu/common/columnar_serialization.cc
File src/kudu/common/columnar_serialization.cc:

http://gerrit.cloudera.org:8080/#/c/15560/2/src/kudu/common/columnar_serialization.cc@479
PS2, Line 479:     int t_schema_idx = tablet_schema->find_column(col.name());
> Do we eventually need to be wary of virtual columns here? Or is projection_
I'm just following the example from wire_protocol.cc which does the same thing.


http://gerrit.cloudera.org:8080/#/c/15560/2/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

http://gerrit.cloudera.org:8080/#/c/15560/2/src/kudu/common/wire_protocol-test.cc@617
PS2, Line 617:             UniformColumns(10, INT64, 0.1),
> Doesn't have to be here if you prefer not to keep them for benchmarking pur
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 26 Mar 2020 19:11:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15560 )

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................


Patch Set 4: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15560/4/src/kudu/common/columnar_serialization-test.cc
File src/kudu/common/columnar_serialization-test.cc:

http://gerrit.cloudera.org:8080/#/c/15560/4/src/kudu/common/columnar_serialization-test.cc@45
PS4, Line 45:   // TODO(todd): templatize this test for other types once we have specialized
            :   // implementations.
nit: can remove this now?


http://gerrit.cloudera.org:8080/#/c/15560/4/src/kudu/common/columnar_serialization.cc
File src/kudu/common/columnar_serialization.cc:

http://gerrit.cloudera.org:8080/#/c/15560/4/src/kudu/common/columnar_serialization.cc@464
PS4, Line 464:   // TODO(todd) don't pre-size these to 1MB per column -- quite
             :   // expensive if there are a lot of columns!
Yeah, I imagine once this is plumbed into the tserver, there'll be a more correct value related to the scanner batch size or something similar.



-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 26 Mar 2020 22:52:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Andrew Wong, Kudu Jenkins, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15560

to look at the new patch set (#7).

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................

Add functionality to serialize a RowBlock into columnar format

This adds the core functionality to take a RowBlock and convert it into
a set of buffers for each column: the cell data, the null bitmap, and
the indirect (string) data.

This also updates wire_protocol-test to add coverage for nulls and
unselected rows, and adds a comparison benchmark. The columnar code path
is 4-5x faster in the best case, and in the worst case only 30% slower
that the existing row-wise code path.

Some follow-up commits will add further optimizations.

Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
---
M src/kudu/common/CMakeLists.txt
M src/kudu/common/columnar_serialization.cc
M src/kudu/common/columnar_serialization.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/util/faststring.h
5 files changed, 510 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/15560/7
-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15560 )

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................


Patch Set 6: Code-Review+2

(2 comments)

IWYU's still got a bone to pick with a couple files though.

http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/wire_protocol-test.cc@574
PS5, Line 574:     RowBlock block(&benchmark_schema_, 1000, &arena);
             :     // Regardless of the config, use a constant number of selected cells for the
> added some named variables here to make it clearer
Thanks!


http://gerrit.cloudera.org:8080/#/c/15560/5/src/kudu/common/wire_protocol-test.cc@601
PS5, Line 601: olBenchmark, TestRowBlockToPBBenchmark) {
> why? we're displaying a throughput ratio below, so if rowwise takes 10 cycl
Ah, missed that the log below referred to throughput and not rate.



-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 30 Mar 2020 19:10:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Andrew Wong, Kudu Jenkins, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15560

to look at the new patch set (#4).

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................

Add functionality to serialize a RowBlock into columnar format

This adds the core functionality to take a RowBlock and convert it into
a set of buffers for each column: the cell data, the null bitmap, and
the indirect (string) data.

This also updates wire_protocol-test to add coverage for nulls and
unselected rows, and adds a comparison benchmark. The columnar code path
is 4-5x faster in the best case, and in the worst case only 30% slower
that the existing row-wise code path.

Some follow-up commits will add further optimizations.

Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
---
M src/kudu/common/CMakeLists.txt
M src/kudu/common/columnar_serialization.cc
M src/kudu/common/columnar_serialization.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/util/faststring.h
5 files changed, 498 insertions(+), 113 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/15560/4
-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15560 )

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 30 Mar 2020 19:43:12 +0000
Gerrit-HasComments: No

[kudu-CR] Add functionality to serialize a RowBlock into columnar format

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Andrew Wong, Kudu Jenkins, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15560

to look at the new patch set (#6).

Change subject: Add functionality to serialize a RowBlock into columnar format
......................................................................

Add functionality to serialize a RowBlock into columnar format

This adds the core functionality to take a RowBlock and convert it into
a set of buffers for each column: the cell data, the null bitmap, and
the indirect (string) data.

This also updates wire_protocol-test to add coverage for nulls and
unselected rows, and adds a comparison benchmark. The columnar code path
is 4-5x faster in the best case, and in the worst case only 30% slower
that the existing row-wise code path.

Some follow-up commits will add further optimizations.

Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
---
M src/kudu/common/CMakeLists.txt
M src/kudu/common/columnar_serialization.cc
M src/kudu/common/columnar_serialization.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/util/faststring.h
5 files changed, 512 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/15560/6
-- 
To view, visit http://gerrit.cloudera.org:8080/15560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I287a8aa6736f19816b0edbe16409c01f35c0319e
Gerrit-Change-Number: 15560
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>