You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2020/03/25 07:37:00 UTC

[kudu-CR] Add core algorithms for columnar serialization

Hello Andrew Wong, Grant Henke,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/15556

to review the following change.


Change subject: Add core algorithms for columnar serialization
......................................................................

Add core algorithms for columnar serialization

This adds the core of the columnar serialization code paths. Even though
we internally scan in a columnar fashion in the tablet server, sending
those columns across the wire isn't straightforward. We have two bits of
necessary processing:

1) the selection vector needs to be taken into account so we only send
back selected rows. This means we need to copy out the selected cells
and also copy out the selected bits from the null bitmap where relevant.
Doing the null bitmap portion efficiently with wide platform support
makes up a lot of this patch.

2) for the case of null values, we want to make sure we don't send
uninitialized memory (which might include secrets!) to the client. So we
need to zero out any cells where the corresponding non-null bitmap bit
is unset.

To keep the review manageable, this just adds some unit tests and all
the new code is initially "dead". Later commits will add the parts that
construct the full block of columns to be sent on the wire, hook this
into the tserver, etc.

Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
---
M src/kudu/common/CMakeLists.txt
A src/kudu/common/columnar_serialization-test.cc
A src/kudu/common/columnar_serialization.cc
A src/kudu/common/columnar_serialization.h
A src/kudu/common/zp7.cc
A src/kudu/common/zp7.h
6 files changed, 763 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/15556/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Andrew Wong, Kudu Jenkins, Andrew Wong, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15556

to look at the new patch set (#5).

Change subject: Add core algorithms for columnar serialization
......................................................................

Add core algorithms for columnar serialization

This adds the core of the columnar serialization code paths. Even though
we internally scan in a columnar fashion in the tablet server, sending
those columns across the wire isn't straightforward. We have two bits of
necessary processing:

1) the selection vector needs to be taken into account so we only send
back selected rows. This means we need to copy out the selected cells
and also copy out the selected bits from the null bitmap where relevant.
Doing the null bitmap portion efficiently with wide platform support
makes up a lot of this patch.

2) for the case of null values, we want to make sure we don't send
uninitialized memory (which might include secrets!) to the client. So we
need to zero out any cells where the corresponding non-null bitmap bit
is unset.

To keep the review manageable, this just adds some unit tests and all
the new code is initially "dead". Later commits will add the parts that
construct the full block of columns to be sent on the wire, hook this
into the tserver, etc.

Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
---
M LICENSE.txt
M src/kudu/common/CMakeLists.txt
A src/kudu/common/columnar_serialization-test.cc
A src/kudu/common/columnar_serialization.cc
A src/kudu/common/columnar_serialization.h
A src/kudu/common/zp7.cc
A src/kudu/common/zp7.h
7 files changed, 842 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/15556/5
-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15556 )

Change subject: Add core algorithms for columnar serialization
......................................................................


Patch Set 5: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc
File src/kudu/common/columnar_serialization.cc:

http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@314
PS4, Line 314: ////////////////////////////////////////////////////////////
> if n_sel_rows is zero this loop will just get skipped, no?
The concern was if that n_sel_rows were negative, this would go astray.

It seems I meant DCHECK_GE(n_sel_rows, 0).



-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 26 Mar 2020 23:23:14 +0000
Gerrit-HasComments: Yes

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15556 )

Change subject: Add core algorithms for columnar serialization
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization-test.cc
File src/kudu/common/columnar_serialization-test.cc:

http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization-test.cc@46
PS4, Line 46:   static constexpr int kTypeSize = sizeof(DataType);
> Think it's worth testing other data sizes?
I'll add a TODO here to add tests once we have actual specializations for other types. Right now it's the same code for all types, just a different argument to memset.


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc
File src/kudu/common/columnar_serialization.cc:

http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@78
PS4, Line 78:   void Flush() {
            :     while (num_buffered_bits_ > 0) {
            :       *dst_++ = buffered_values_ & 0xff;
            :       buffered_values_ >>= 8;
            :       num_buffered_bits_ -= 8;
            :     }
            :   }
> nit: maybe mention that this won't fix "alignment" issues, eg if num_buffer
I'll add some kind of CHECK to ensure that it's only called once


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@132
PS4, Line 132: //
             : // NOTE: this assumes that dst_values_buf and non_null_bitmap are valid for the full range
             : // of indices [0, dst_idx + n_rows). The implementation may redundantly re-zero cells
             : // at indexes less than dst_idx.
> Just making sure I'm reading this right -- this is referring to the alignme
yea -- basically we may rewrite the tail of the prior call. If you ask to zero nulls from index 9 to 10 we'll actually zero nulls from 8-10 so we can operate bytewise on the bitmap


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@136
PS4, Line 136: type_size
> nit: maybe add units to the name? Same with the template arg.
will do


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@314
PS4, Line 314:   while (rem--) {
> Add DCHECK_GT(n_sel_rows, 0)  ?
if n_sel_rows is zero this loop will just get skipped, no?



-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 26 Mar 2020 18:34:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15556 )

Change subject: Add core algorithms for columnar serialization
......................................................................


Patch Set 4: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 25 Mar 2020 21:59:47 +0000
Gerrit-HasComments: No

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15556 )

Change subject: Add core algorithms for columnar serialization
......................................................................


Patch Set 2:

(3 comments)

Just passing through with a few comments.

http://gerrit.cloudera.org:8080/#/c/15556/2/src/kudu/common/columnar_serialization-test.cc
File src/kudu/common/columnar_serialization-test.cc:

http://gerrit.cloudera.org:8080/#/c/15556/2/src/kudu/common/columnar_serialization-test.cc@20
PS2, Line 20: #include <stddef.h>
Prefer cstddef for new code.


http://gerrit.cloudera.org:8080/#/c/15556/2/src/kudu/common/zp7.h
File src/kudu/common/zp7.h:

http://gerrit.cloudera.org:8080/#/c/15556/2/src/kudu/common/zp7.h@23
PS2, Line 23: // Extract bits from 'a' corresponding to 'mask'.
Want to expand on this to help differentiate between the two function calls?


http://gerrit.cloudera.org:8080/#/c/15556/2/src/kudu/common/zp7.cc
File src/kudu/common/zp7.cc:

http://gerrit.cloudera.org:8080/#/c/15556/2/src/kudu/common/zp7.cc@27
PS2, Line 27: // project has no header file. It has been modified as follows:
Could you also reindent the file to conform to Kudu's indentation?



-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 25 Mar 2020 20:29:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15556 )

Change subject: Add core algorithms for columnar serialization
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 26 Mar 2020 22:07:37 +0000
Gerrit-HasComments: No

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Andrew Wong, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15556

to look at the new patch set (#3).

Change subject: Add core algorithms for columnar serialization
......................................................................

Add core algorithms for columnar serialization

This adds the core of the columnar serialization code paths. Even though
we internally scan in a columnar fashion in the tablet server, sending
those columns across the wire isn't straightforward. We have two bits of
necessary processing:

1) the selection vector needs to be taken into account so we only send
back selected rows. This means we need to copy out the selected cells
and also copy out the selected bits from the null bitmap where relevant.
Doing the null bitmap portion efficiently with wide platform support
makes up a lot of this patch.

2) for the case of null values, we want to make sure we don't send
uninitialized memory (which might include secrets!) to the client. So we
need to zero out any cells where the corresponding non-null bitmap bit
is unset.

To keep the review manageable, this just adds some unit tests and all
the new code is initially "dead". Later commits will add the parts that
construct the full block of columns to be sent on the wire, hook this
into the tserver, etc.

Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
---
M LICENSE.txt
M src/kudu/common/CMakeLists.txt
A src/kudu/common/columnar_serialization-test.cc
A src/kudu/common/columnar_serialization.cc
A src/kudu/common/columnar_serialization.h
A src/kudu/common/zp7.cc
A src/kudu/common/zp7.h
7 files changed, 830 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/15556/3
-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15556 )

Change subject: Add core algorithms for columnar serialization
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization-test.cc
File src/kudu/common/columnar_serialization-test.cc:

http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization-test.cc@46
PS4, Line 46:   static constexpr int kTypeSize = sizeof(DataType);
Think it's worth testing other data sizes?


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc
File src/kudu/common/columnar_serialization.cc:

http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@78
PS4, Line 78:   void Flush() {
            :     while (num_buffered_bits_ > 0) {
            :       *dst_++ = buffered_values_ & 0xff;
            :       buffered_values_ >>= 8;
            :       num_buffered_bits_ -= 8;
            :     }
            :   }
nit: maybe mention that this won't fix "alignment" issues, eg if num_buffered_bits_ becomes negative. But that's OK since this should only be called once anyway.


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@132
PS4, Line 132: //
             : // NOTE: this assumes that dst_values_buf and non_null_bitmap are valid for the full range
             : // of indices [0, dst_idx + n_rows). The implementation may redundantly re-zero cells
             : // at indexes less than dst_idx.
Just making sure I'm reading this right -- this is referring to the alignment calls in the implementation, right?


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@136
PS4, Line 136: type_size
nit: maybe add units to the name? Same with the template arg.



-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 25 Mar 2020 22:39:16 +0000
Gerrit-HasComments: Yes

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15556 )

Change subject: Add core algorithms for columnar serialization
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc
File src/kudu/common/columnar_serialization.cc:

http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@51
PS4, Line 51: skip_initial_bits
Maybe, add DCHECK_GE(skip_initial_bits, 0) or make type of skip_initial_bits an unsigned type?  Or skip_initial_bits can be negative?


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@64
PS4, Line 64:     buffered_values_ |= v << num_buffered_bits_;
In addition, maybe it's worth adding DCHECK_GE(num_buffered_bits_, 0) due to the Flush()'s behavior?


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@251
PS4, Line 251: #endif
nit: for better readability, could you add comments about current scope?  For example, here it would be

#endif // #ifdef __x86_64__ ...


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@314
PS4, Line 314:   while (rem--) {
Add DCHECK_GT(n_sel_rows, 0)  ?


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@338
PS4, Line 338:     case 1:
             :       CopySelectedRowsImpl<1>(sel_rows, src_buf, dst_buf);
             :       break;
             :     case 2:
             :       CopySelectedRowsImpl<2>(sel_rows, src_buf, dst_buf);
             :       break;
             :     case 4:
             :       CopySelectedRowsImpl<4>(sel_rows, src_buf, dst_buf);
             :       break;
             :     case 8:
             :       CopySelectedRowsImpl<8>(sel_rows, src_buf, dst_buf);
             :       break;
             :     case 16:
             :       CopySelectedRowsImpl<16>(sel_rows, src_buf, dst_buf);
             :       break;
Use same macro-alike approach here as at line 144?



-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 26 Mar 2020 05:16:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15556 )

Change subject: Add core algorithms for columnar serialization
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15556/2/src/kudu/common/columnar_serialization-test.cc
File src/kudu/common/columnar_serialization-test.cc:

http://gerrit.cloudera.org:8080/#/c/15556/2/src/kudu/common/columnar_serialization-test.cc@20
PS2, Line 20: #include <stddef.h>
> Prefer cstddef for new code.
Done


http://gerrit.cloudera.org:8080/#/c/15556/2/src/kudu/common/zp7.h
File src/kudu/common/zp7.h:

http://gerrit.cloudera.org:8080/#/c/15556/2/src/kudu/common/zp7.h@23
PS2, Line 23: // Extract bits from 'a' corresponding to 'mask'.
> Want to expand on this to help differentiate between the two function calls
Done


http://gerrit.cloudera.org:8080/#/c/15556/2/src/kudu/common/zp7.cc
File src/kudu/common/zp7.cc:

http://gerrit.cloudera.org:8080/#/c/15556/2/src/kudu/common/zp7.cc@27
PS2, Line 27: // project has no header file. It has been modified as follows:
> Could you also reindent the file to conform to Kudu's indentation?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 25 Mar 2020 21:13:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Andrew Wong, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15556

to look at the new patch set (#4).

Change subject: Add core algorithms for columnar serialization
......................................................................

Add core algorithms for columnar serialization

This adds the core of the columnar serialization code paths. Even though
we internally scan in a columnar fashion in the tablet server, sending
those columns across the wire isn't straightforward. We have two bits of
necessary processing:

1) the selection vector needs to be taken into account so we only send
back selected rows. This means we need to copy out the selected cells
and also copy out the selected bits from the null bitmap where relevant.
Doing the null bitmap portion efficiently with wide platform support
makes up a lot of this patch.

2) for the case of null values, we want to make sure we don't send
uninitialized memory (which might include secrets!) to the client. So we
need to zero out any cells where the corresponding non-null bitmap bit
is unset.

To keep the review manageable, this just adds some unit tests and all
the new code is initially "dead". Later commits will add the parts that
construct the full block of columns to be sent on the wire, hook this
into the tserver, etc.

Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
---
M LICENSE.txt
M src/kudu/common/CMakeLists.txt
A src/kudu/common/columnar_serialization-test.cc
A src/kudu/common/columnar_serialization.cc
A src/kudu/common/columnar_serialization.h
A src/kudu/common/zp7.cc
A src/kudu/common/zp7.h
7 files changed, 835 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/15556/4
-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Andrew Wong, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15556

to look at the new patch set (#2).

Change subject: Add core algorithms for columnar serialization
......................................................................

Add core algorithms for columnar serialization

This adds the core of the columnar serialization code paths. Even though
we internally scan in a columnar fashion in the tablet server, sending
those columns across the wire isn't straightforward. We have two bits of
necessary processing:

1) the selection vector needs to be taken into account so we only send
back selected rows. This means we need to copy out the selected cells
and also copy out the selected bits from the null bitmap where relevant.
Doing the null bitmap portion efficiently with wide platform support
makes up a lot of this patch.

2) for the case of null values, we want to make sure we don't send
uninitialized memory (which might include secrets!) to the client. So we
need to zero out any cells where the corresponding non-null bitmap bit
is unset.

To keep the review manageable, this just adds some unit tests and all
the new code is initially "dead". Later commits will add the parts that
construct the full block of columns to be sent on the wire, hook this
into the tserver, etc.

Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
---
M LICENSE.txt
M src/kudu/common/CMakeLists.txt
A src/kudu/common/columnar_serialization-test.cc
A src/kudu/common/columnar_serialization.cc
A src/kudu/common/columnar_serialization.h
A src/kudu/common/zp7.cc
A src/kudu/common/zp7.h
7 files changed, 809 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/15556/2
-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15556 )

Change subject: Add core algorithms for columnar serialization
......................................................................

Add core algorithms for columnar serialization

This adds the core of the columnar serialization code paths. Even though
we internally scan in a columnar fashion in the tablet server, sending
those columns across the wire isn't straightforward. We have two bits of
necessary processing:

1) the selection vector needs to be taken into account so we only send
back selected rows. This means we need to copy out the selected cells
and also copy out the selected bits from the null bitmap where relevant.
Doing the null bitmap portion efficiently with wide platform support
makes up a lot of this patch.

2) for the case of null values, we want to make sure we don't send
uninitialized memory (which might include secrets!) to the client. So we
need to zero out any cells where the corresponding non-null bitmap bit
is unset.

To keep the review manageable, this just adds some unit tests and all
the new code is initially "dead". Later commits will add the parts that
construct the full block of columns to be sent on the wire, hook this
into the tserver, etc.

Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Reviewed-on: http://gerrit.cloudera.org:8080/15556
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M LICENSE.txt
M src/kudu/common/CMakeLists.txt
A src/kudu/common/columnar_serialization-test.cc
A src/kudu/common/columnar_serialization.cc
A src/kudu/common/columnar_serialization.h
A src/kudu/common/zp7.cc
A src/kudu/common/zp7.h
7 files changed, 842 insertions(+), 1 deletion(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved
  Alexey Serbin: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Add core algorithms for columnar serialization

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15556 )

Change subject: Add core algorithms for columnar serialization
......................................................................


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization-test.cc
File src/kudu/common/columnar_serialization-test.cc:

http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization-test.cc@46
PS4, Line 46:   static constexpr int kTypeSize = sizeof(DataType);
> I'll add a TODO here to add tests once we have actual specializations for o
Done


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc
File src/kudu/common/columnar_serialization.cc:

http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@51
PS4, Line 51: skip_initial_bits
> Maybe, add DCHECK_GE(skip_initial_bits, 0) or make type of skip_initial_bit
Done


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@64
PS4, Line 64:     buffered_values_ |= v << num_buffered_bits_;
> In addition, maybe it's worth adding DCHECK_GE(num_buffered_bits_, 0) due t
added a DCHECK(!flushed_)


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@78
PS4, Line 78:   void Flush() {
            :     while (num_buffered_bits_ > 0) {
            :       *dst_++ = buffered_values_ & 0xff;
            :       buffered_values_ >>= 8;
            :       num_buffered_bits_ -= 8;
            :     }
            :   }
> I'll add some kind of CHECK to ensure that it's only called once
Done


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@136
PS4, Line 136: type_size
> will do
i'll rename to sizeof_type since people should understand thta to be equivalent to sizeof(type) which is clearly bytes


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@251
PS4, Line 251: #endif
> nit: for better readability, could you add comments about current scope?  F
Done


http://gerrit.cloudera.org:8080/#/c/15556/4/src/kudu/common/columnar_serialization.cc@338
PS4, Line 338:     case 1:
             :       CopySelectedRowsImpl<1>(sel_rows, src_buf, dst_buf);
             :       break;
             :     case 2:
             :       CopySelectedRowsImpl<2>(sel_rows, src_buf, dst_buf);
             :       break;
             :     case 4:
             :       CopySelectedRowsImpl<4>(sel_rows, src_buf, dst_buf);
             :       break;
             :     case 8:
             :       CopySelectedRowsImpl<8>(sel_rows, src_buf, dst_buf);
             :       break;
             :     case 16:
             :       CopySelectedRowsImpl<16>(sel_rows, src_buf, dst_buf);
             :       break;
> Use same macro-alike approach here as at line 144?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16f2993081aac54609aab4d8219ef0bf6c7708c2
Gerrit-Change-Number: 15556
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 26 Mar 2020 19:05:05 +0000
Gerrit-HasComments: Yes