You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2020/04/07 22:55:21 UTC

[kudu-CR] columnar serialization: avoid preallocating 8MB per column

Hello Andrew Wong, Grant Henke,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/15679

to review the following change.


Change subject: columnar_serialization: avoid preallocating 8MB per column
......................................................................

columnar_serialization: avoid preallocating 8MB per column

Previously we would reserve an 8MB buffer for every column of data to be
scanned. This wouldn't scale well for high number of concurrent queries
with lots of columns.

The new approach is to use the configured batch size and apportion that
memory budget across the columns based on the size of those columns.
It's not 100% accurate but at least shouldn't overshoot by hundreds of
MB like the prior approach.

Change-Id: I9b7ff78547792acbd975a606a02ec388dba3a8e8
---
M src/kudu/common/columnar_serialization.cc
M src/kudu/common/columnar_serialization.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/tserver/tablet_service.cc
4 files changed, 17 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/15679/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9b7ff78547792acbd975a606a02ec388dba3a8e8
Gerrit-Change-Number: 15679
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>

[kudu-CR] columnar serialization: avoid preallocating 8MB per column

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15679 )

Change subject: columnar_serialization: avoid preallocating 8MB per column
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15679/1/src/kudu/common/columnar_serialization.h
File src/kudu/common/columnar_serialization.h:

http://gerrit.cloudera.org:8080/#/c/15679/1/src/kudu/common/columnar_serialization.h@38
PS1, Line 38: batch_size_bytes
> Does it make sense to document at least this parameter?
+1

Or rename this to expected_max_batch_size_bytes or something. Without reading the implementation, this can be a bit confusing, given we're not initializing the batch with any data, but we're giving it a size.



-- 
To view, visit http://gerrit.cloudera.org:8080/15679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b7ff78547792acbd975a606a02ec388dba3a8e8
Gerrit-Change-Number: 15679
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 13 Apr 2020 07:12:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] columnar serialization: avoid preallocating 8MB per column

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15679 )

Change subject: columnar_serialization: avoid preallocating 8MB per column
......................................................................


Patch Set 4: Code-Review+2

(1 comment)

It seems I'm late with the review, but anyways :)


Feel free to ignore the nit (especially given the fact this patch is already committed).

http://gerrit.cloudera.org:8080/#/c/15679/3/src/kudu/common/columnar_serialization.h
File src/kudu/common/columnar_serialization.h:

http://gerrit.cloudera.org:8080/#/c/15679/3/src/kudu/common/columnar_serialization.h@44
PS3, Line 44: is 
nit: extra 'is'



-- 
To view, visit http://gerrit.cloudera.org:8080/15679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b7ff78547792acbd975a606a02ec388dba3a8e8
Gerrit-Change-Number: 15679
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 16 Apr 2020 00:41:14 +0000
Gerrit-HasComments: Yes

[kudu-CR] columnar serialization: avoid preallocating 8MB per column

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15679 )

Change subject: columnar_serialization: avoid preallocating 8MB per column
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15679/1/src/kudu/common/columnar_serialization.h
File src/kudu/common/columnar_serialization.h:

http://gerrit.cloudera.org:8080/#/c/15679/1/src/kudu/common/columnar_serialization.h@38
PS1, Line 38: batch_size_bytes
Does it make sense to document at least this parameter?


http://gerrit.cloudera.org:8080/#/c/15679/1/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/15679/1/src/kudu/tserver/tablet_service.cc@925
PS1, Line 925: batch_size_bytes
Where does it come from?  Should it be somehow passed in as a parameter of the enclosing method or be a member field?



-- 
To view, visit http://gerrit.cloudera.org:8080/15679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b7ff78547792acbd975a606a02ec388dba3a8e8
Gerrit-Change-Number: 15679
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 08 Apr 2020 05:07:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] columnar serialization: avoid preallocating 8MB per column

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Andrew Wong, Kudu Jenkins, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15679

to look at the new patch set (#3).

Change subject: columnar_serialization: avoid preallocating 8MB per column
......................................................................

columnar_serialization: avoid preallocating 8MB per column

Previously we would reserve an 8MB buffer for every column of data to be
scanned. This wouldn't scale well for high number of concurrent queries
with lots of columns.

The new approach is to use the configured batch size and apportion that
memory budget across the columns based on the size of those columns.
It's not 100% accurate but at least shouldn't overshoot by hundreds of
MB like the prior approach.

Change-Id: I9b7ff78547792acbd975a606a02ec388dba3a8e8
---
M src/kudu/common/columnar_serialization.cc
M src/kudu/common/columnar_serialization.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/tserver/tablet_service.cc
4 files changed, 25 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/15679/3
-- 
To view, visit http://gerrit.cloudera.org:8080/15679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b7ff78547792acbd975a606a02ec388dba3a8e8
Gerrit-Change-Number: 15679
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] columnar serialization: avoid preallocating 8MB per column

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Andrew Wong, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15679

to look at the new patch set (#2).

Change subject: columnar_serialization: avoid preallocating 8MB per column
......................................................................

columnar_serialization: avoid preallocating 8MB per column

Previously we would reserve an 8MB buffer for every column of data to be
scanned. This wouldn't scale well for high number of concurrent queries
with lots of columns.

The new approach is to use the configured batch size and apportion that
memory budget across the columns based on the size of those columns.
It's not 100% accurate but at least shouldn't overshoot by hundreds of
MB like the prior approach.

Change-Id: I9b7ff78547792acbd975a606a02ec388dba3a8e8
---
M src/kudu/common/columnar_serialization.cc
M src/kudu/common/columnar_serialization.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/tserver/tablet_service.cc
4 files changed, 20 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/15679/2
-- 
To view, visit http://gerrit.cloudera.org:8080/15679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b7ff78547792acbd975a606a02ec388dba3a8e8
Gerrit-Change-Number: 15679
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] columnar serialization: avoid preallocating 8MB per column

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15679 )

Change subject: columnar_serialization: avoid preallocating 8MB per column
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15679/1/src/kudu/common/columnar_serialization.h
File src/kudu/common/columnar_serialization.h:

http://gerrit.cloudera.org:8080/#/c/15679/1/src/kudu/common/columnar_serialization.h@38
PS1, Line 38: 
> +1
Done


http://gerrit.cloudera.org:8080/#/c/15679/1/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/15679/1/src/kudu/tserver/tablet_service.cc@925
PS1, Line 925: erializer::Creat
> Where does it come from?  Should it be somehow passed in as a parameter of 
looks like I lost part of this diff :)



-- 
To view, visit http://gerrit.cloudera.org:8080/15679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b7ff78547792acbd975a606a02ec388dba3a8e8
Gerrit-Change-Number: 15679
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 15 Apr 2020 22:36:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] columnar serialization: avoid preallocating 8MB per column

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15679 )

Change subject: columnar_serialization: avoid preallocating 8MB per column
......................................................................

columnar_serialization: avoid preallocating 8MB per column

Previously we would reserve an 8MB buffer for every column of data to be
scanned. This wouldn't scale well for high number of concurrent queries
with lots of columns.

The new approach is to use the configured batch size and apportion that
memory budget across the columns based on the size of those columns.
It's not 100% accurate but at least shouldn't overshoot by hundreds of
MB like the prior approach.

Change-Id: I9b7ff78547792acbd975a606a02ec388dba3a8e8
Reviewed-on: http://gerrit.cloudera.org:8080/15679
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/common/columnar_serialization.cc
M src/kudu/common/columnar_serialization.h
M src/kudu/common/wire_protocol-test.cc
M src/kudu/tserver/tablet_service.cc
4 files changed, 25 insertions(+), 11 deletions(-)

Approvals:
  Andrew Wong: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/15679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9b7ff78547792acbd975a606a02ec388dba3a8e8
Gerrit-Change-Number: 15679
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] columnar serialization: avoid preallocating 8MB per column

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15679 )

Change subject: columnar_serialization: avoid preallocating 8MB per column
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b7ff78547792acbd975a606a02ec388dba3a8e8
Gerrit-Change-Number: 15679
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 15 Apr 2020 22:33:38 +0000
Gerrit-HasComments: No