Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2018/03/23 19:24:42 UTC

[kudu-CR] KUDU-16 pt 2: add client-side limits on scanners

Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9790 )


Change subject: KUDU-16 pt 2: add client-side limits on scanners
......................................................................

KUDU-16 pt 2: add client-side limits on scanners

This patch adds a public API to allow the specification of
per-client-side-scanner limits on the number of rows returned. Each
scanner will maintain a count of the number of rows already read, and
adjust the server-side limit upon sending the next scan request.
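
The counting scheme described above can be illustrated with a minimal sketch. This is not the code from the patch: the class name LimitTracker and its methods are hypothetical, and a negative limit standing for "no limit" is an assumption made only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the actual Kudu client code: tracks how many rows
// a scanner has handed back so far against a client-side limit.
class LimitTracker {
 public:
  // Assumption for this sketch: a negative limit means "no limit".
  explicit LimitTracker(int64_t limit) : limit_(limit), rows_read_(0) {}

  bool has_limit() const { return limit_ >= 0; }

  // Limit to attach to the next scan request: the rows still owed to the
  // caller, never negative.
  int64_t NextRequestLimit() const {
    assert(has_limit());
    return std::max<int64_t>(0, limit_ - rows_read_);
  }

  // Records a received batch and returns how many of its rows may be passed
  // on to the caller without exceeding the limit.
  int64_t OnBatchReceived(int64_t rows_in_batch) {
    const int64_t usable =
        has_limit() ? std::min(rows_in_batch, NextRequestLimit())
                    : rows_in_batch;
    rows_read_ += usable;
    return usable;
  }

  // True once a limited scan has returned all the rows it is allowed to.
  bool Done() const { return has_limit() && rows_read_ >= limit_; }

 private:
  const int64_t limit_;
  int64_t rows_read_;
};
```

With a limit of 25 and batches of 10 rows, such a tracker would ask the server for 25, then 15, then 5 rows, and stop after the third batch.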

A couple of unit tests are included to verify that the limits act as
expected. I also verified that lowering the limit reduces the number of
bytes read on disk (at the granularity of a single scan batch at a
time).

Note: this patch does not implement the behavior for scan tokens.

Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
7 files changed, 184 insertions(+), 40 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9790/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9790
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
Gerrit-Change-Number: 9790
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] KUDU-16 pt 2: add client-side limits on scanners

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change. ( http://gerrit.cloudera.org:8080/9790 )

Change subject: KUDU-16 pt 2: add client-side limits on scanners
......................................................................


Patch Set 1:

Haven't fully reviewed this patch, just looked at the tests, overall.
I think we need a more randomized test that scans from both memrowsets and diskrowsets and has arbitrary limits.
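
A rough sketch of what such a randomized check could look like, independent of the Kudu test harness. Everything here is hypothetical: ScanWithLimit simulates a scan over several rowsets (whether in memory or on disk makes no difference to the simulation), and the invariant checked is that a scan returns exactly min(limit, total rows).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for a scan: each rowset holds some number of rows,
// served in batches of at most batch_size rows, until the limit is reached.
int64_t ScanWithLimit(const std::vector<int64_t>& rowset_sizes,
                      int64_t batch_size, int64_t limit) {
  int64_t returned = 0;
  for (int64_t rows_left : rowset_sizes) {
    while (rows_left > 0 && returned < limit) {
      // Never fetch more than the rowset has, the batch allows, or the
      // limit still permits.
      const int64_t batch = std::min({rows_left, batch_size, limit - returned});
      rows_left -= batch;
      returned += batch;
    }
  }
  return returned;
}

// Runs num_trials scans with random rowset sizes, batch sizes, and limits.
// Returns true if every scan returned exactly min(limit, total rows).
bool RunRandomizedLimitTrials(uint32_t seed, int num_trials) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int64_t> size_dist(0, 500);
  std::uniform_int_distribution<int64_t> batch_dist(1, 64);
  for (int i = 0; i < num_trials; ++i) {
    const std::vector<int64_t> rowsets = {size_dist(rng), size_dist(rng),
                                          size_dist(rng)};
    const int64_t total = rowsets[0] + rowsets[1] + rowsets[2];
    std::uniform_int_distribution<int64_t> limit_dist(0, total + 10);
    const int64_t limit = limit_dist(rng);
    const int64_t batch_size = batch_dist(rng);
    if (ScanWithLimit(rowsets, batch_size, limit) != std::min(limit, total)) {
      return false;
    }
  }
  return true;
}
```

Seeding the generator explicitly keeps a failing trial reproducible, which matters more in a randomized test than the particular distributions chosen.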


Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
Gerrit-Change-Number: 9790
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Fri, 23 Mar 2018 20:03:39 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-16 pt 2: add client-side limits on scanners

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9790

to look at the new patch set (#2).

Change subject: KUDU-16 pt 2: add client-side limits on scanners
......................................................................

KUDU-16 pt 2: add client-side limits on scanners

This patch adds a public API to allow the specification of
per-client-side-scanner limits on the number of rows returned. Each
scanner will maintain a count of the number of rows already read, and
adjust the server-side limit upon sending the next scan request.

A couple of unit tests are included to verify that the limits act as
expected. I also verified that lowering the limit reduces the number of
bytes read on disk (at the granularity of a single scan batch at a
time).

Note: this patch does not implement the behavior for scan tokens.

Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
7 files changed, 184 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9790/2

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
Gerrit-Change-Number: 9790
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-16 pt 2: add client-side limits on scanners

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9790

to look at the new patch set (#5).

Change subject: KUDU-16 pt 2: add client-side limits on scanners
......................................................................

KUDU-16 pt 2: add client-side limits on scanners

This patch adds a public API to allow the specification of
per-client-side-scanner limits on the number of rows returned. Each
scanner will maintain a count of the number of rows already read, and
adjust the server-side limit upon sending the next scan request.

A couple of tests are included to verify that the limits act as
expected. I also verified that lowering the limit reduces the number of
bytes read on disk (at the granularity of a single scan batch at a
time).

Note: this patch does not implement the behavior for scan tokens.

Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
7 files changed, 184 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9790/5

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
Gerrit-Change-Number: 9790
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-16 pt 2: add client-side limits on scanners

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9790 )

Change subject: KUDU-16 pt 2: add client-side limits on scanners
......................................................................


Patch Set 5: Code-Review+2


Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
Gerrit-Change-Number: 9790
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Apr 2018 22:19:50 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-16 pt 2: add client-side limits on scanners

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9790

to look at the new patch set (#4).

Change subject: KUDU-16 pt 2: add client-side limits on scanners
......................................................................

KUDU-16 pt 2: add client-side limits on scanners

This patch adds a public API to allow the specification of
per-client-side-scanner limits on the number of rows returned. Each
scanner will maintain a count of the number of rows already read, and
adjust the server-side limit upon sending the next scan request.

A couple of tests are included to verify that the limits act as
expected. I also verified that lowering the limit reduces the number of
bytes read on disk (at the granularity of a single scan batch at a
time).

Note: this patch does not implement the behavior for scan tokens.

Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
7 files changed, 185 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9790/4

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
Gerrit-Change-Number: 9790
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-16 pt 2: add client-side limits on scanners

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9790 )

Change subject: KUDU-16 pt 2: add client-side limits on scanners
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9790/4/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

http://gerrit.cloudera.org:8080/#/c/9790/4/src/kudu/client/client-test.cc@812
PS4, Line 812: TEST_F(ClientTest, TestRandomizedLimitScans) {
> If I'm reading client-test correctly, this only sets up a table with two ta
Done. I just modified this test to use hash partitioning, keeping it randomized.


http://gerrit.cloudera.org:8080/#/c/9790/4/src/kudu/client/scan_configuration.h
File src/kudu/client/scan_configuration.h:

http://gerrit.cloudera.org:8080/#/c/9790/4/src/kudu/client/scan_configuration.h@153
PS4, Line 153:     return has_limit_;
> hm, why not use boost::optional here?
Better yet, the limit is completely owned by the scan spec.



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
Gerrit-Change-Number: 9790
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 30 Mar 2018 07:52:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-16 pt 2: add client-side limits on scanners

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9790

to look at the new patch set (#3).

Change subject: KUDU-16 pt 2: add client-side limits on scanners
......................................................................

KUDU-16 pt 2: add client-side limits on scanners

This patch adds a public API to allow the specification of
per-client-side-scanner limits on the number of rows returned. Each
scanner will maintain a count of the number of rows already read, and
adjust the server-side limit upon sending the next scan request.

A couple of unit tests are included to verify that the limits act as
expected. I also verified that lowering the limit reduces the number of
bytes read on disk (at the granularity of a single scan batch at a
time).

Note: this patch does not implement the behavior for scan tokens.

Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
7 files changed, 188 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9790/3

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
Gerrit-Change-Number: 9790
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-16 pt 2: add client-side limits on scanners

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9790 )

Change subject: KUDU-16 pt 2: add client-side limits on scanners
......................................................................


Patch Set 4:

> Patch Set 1:
> 
> Haven't fully reviewed this patch, just looked at the tests, overall.
> I think we need a more randomized test that scans from both memrowsets and diskrowsets and has arbitrary limits.

Done


Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
Gerrit-Change-Number: 9790
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 29 Mar 2018 23:25:13 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-16 pt 2: add client-side limits on scanners

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9790 )

Change subject: KUDU-16 pt 2: add client-side limits on scanners
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9790/4/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

http://gerrit.cloudera.org:8080/#/c/9790/4/src/kudu/client/client-test.cc@812
PS4, Line 812: TEST_F(ClientTest, TestRandomizedLimitScans) {
If I'm reading client-test correctly, this only sets up a table with two tablets, and the first tablet only has 9 rows in it. Perhaps we should have some test which uses hash partitioning and ensures that each tablet has more than one batch worth of rows?
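
The precondition suggested here (every tablet holding more than one batch of rows) could be checked along these lines. This is only a sketch: the function names are hypothetical, and a trivial modulo on the key stands in for Kudu's real hash partitioning, which is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assigns keys [0, num_rows) to tablets. Assumption: a plain modulo is used
// as a stand-in for a real hash function, purely for illustration.
std::vector<int64_t> RowsPerTablet(int64_t num_rows, int num_tablets) {
  assert(num_tablets > 0);
  std::vector<int64_t> counts(num_tablets, 0);
  for (int64_t key = 0; key < num_rows; ++key) {
    counts[key % num_tablets]++;
  }
  return counts;
}

// True if every tablet holds strictly more rows than fit in a single batch,
// so that a scan of any tablet needs at least two batches.
bool EveryTabletSpansMultipleBatches(const std::vector<int64_t>& counts,
                                     int64_t batch_size) {
  for (int64_t count : counts) {
    if (count <= batch_size) return false;
  }
  return true;
}
```

For instance, 1000 rows spread over 4 tablets with a batch size of 100 satisfies the check, while 18 rows over 2 tablets with a batch size of 9 does not.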


http://gerrit.cloudera.org:8080/#/c/9790/4/src/kudu/client/scan_configuration.h
File src/kudu/client/scan_configuration.h:

http://gerrit.cloudera.org:8080/#/c/9790/4/src/kudu/client/scan_configuration.h@153
PS4, Line 153:     return has_limit_;
hm, why not use boost::optional here?
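
For illustration, the optional-based alternative to a separate has_limit_ flag might look like the sketch below. The class ScanLimitConfig is hypothetical and is not the actual ScanConfiguration; std::optional (C++17) is used here as a stand-in for boost::optional so the example needs no external dependency.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical configuration holder, not the real ScanConfiguration class.
// A single optional replaces the pair (bool has_limit_, int64_t limit_), so
// the "is set" state and the value cannot drift out of sync.
class ScanLimitConfig {
 public:
  void SetLimit(int64_t limit) { limit_ = limit; }

  bool has_limit() const { return limit_.has_value(); }

  // Only valid when a limit has been set.
  int64_t limit() const {
    assert(has_limit());
    return *limit_;
  }

 private:
  std::optional<int64_t> limit_;  // empty means "no limit configured"
};
```

One consequence of this shape is that a limit of 0 is distinguishable from "no limit" without reserving a sentinel value.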



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
Gerrit-Change-Number: 9790
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 30 Mar 2018 02:07:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-16 pt 2: add client-side limits on scanners

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9790 )

Change subject: KUDU-16 pt 2: add client-side limits on scanners
......................................................................

KUDU-16 pt 2: add client-side limits on scanners

This patch adds a public API to allow the specification of
per-client-side-scanner limits on the number of rows returned. Each
scanner will maintain a count of the number of rows already read, and
adjust the server-side limit upon sending the next scan request.

A couple of tests are included to verify that the limits act as
expected. I also verified that lowering the limit reduces the number of
bytes read on disk (at the granularity of a single scan batch at a
time).

Note: this patch does not implement the behavior for scan tokens.

Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
Reviewed-on: http://gerrit.cloudera.org:8080/9790
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
7 files changed, 184 insertions(+), 39 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Todd Lipcon: Looks good to me, approved

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
Gerrit-Change-Number: 9790
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>