You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2021/10/05 04:46:08 UTC

[kudu-CR] [master] update on max returned locations semantics

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17902


Change subject: [master] update on max_returned_locations semantics
......................................................................

[master] update on max_returned_locations semantics

While looking at the partition-related code, I noticed that
GetTableLocations RPC requires the 'max_returned_locations' field to be
always set, even if it's necessary to fetch information on all tablets
in a table.  This patch updates the implementation of the
GetTableLocations RPC to output information on every tablet in a table
if the GetTableLocationsRequestPB::max_returned_locations field isn't
set.  This patch also contains a unit test for the new functionality.

Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
---
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/table_locations-itest.cc
M src/kudu/master/catalog_manager-test.cc
M src/kudu/master/catalog_manager.cc
4 files changed, 63 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/02/17902/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
Gerrit-Change-Number: 17902
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [master] update on max returned locations semantics

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17902 )

Change subject: [master] update on max_returned_locations semantics
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
Gerrit-Change-Number: 17902
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 07 Oct 2021 20:30:18 +0000
Gerrit-HasComments: No

[kudu-CR] [master] update on max returned locations semantics

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17902 )

Change subject: [master] update on max_returned_locations semantics
......................................................................


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17902/1/src/kudu/master/catalog_manager-test.cc
File src/kudu/master/catalog_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/17902/1/src/kudu/master/catalog_manager-test.cc@222
PS1, Line 222:     req.clear_max_returned_locations(); // the default is 10 in protobuf
I'm curious, are we expecting to actually clear this value for our clients? Especially for huge ever-growing fact tables, this seems like it could yield huge responses



-- 
To view, visit http://gerrit.cloudera.org:8080/17902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
Gerrit-Change-Number: 17902
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 05 Oct 2021 08:24:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] update on max returned locations semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17902

to look at the new patch set (#2).

Change subject: [master] update on max_returned_locations semantics
......................................................................

[master] update on max_returned_locations semantics

While looking at the partition-related code, I noticed that
GetTableLocations RPC requires the 'max_returned_locations' field to be
always set, even if it's necessary to fetch information on all tablets
in a table.  This patch updates the implementation of the
GetTableLocations RPC to output information on every tablet in a table
if the GetTableLocationsRequestPB::max_returned_locations field isn't
set.  This patch also contains a unit test for the new functionality.

The primary incentive for this update was getting rid of the hard-coded
values in test scenarios like we had in table_locations-itest.cc (see
the diff for TableLocationsTest.TestRangeSpecificHashing as a part of
this changelist).  That hard-coding the expected limit on the number of
returned locations for the 'max_returned_locations' field might hide a
bug if there are more tablets actually created, and making it possible
to request getting information on all the existing tablets in the API
looked cleaner than introducing some hard-coded values like INT_MAX in
the mentioned test scenario.  As for the Kudu client library code, the
'max_returned_locations' field is still set to some heuristic values
based on the type of tablet lookup performed, so there shouldn't be a
huge amount of metadata returned for large fact tables in case of a
real-life use case.

Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
---
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/table_locations-itest.cc
M src/kudu/master/catalog_manager-test.cc
M src/kudu/master/catalog_manager.cc
4 files changed, 63 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/02/17902/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
Gerrit-Change-Number: 17902
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [master] update on max returned locations semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17902 )

Change subject: [master] update on max_returned_locations semantics
......................................................................

[master] update on max_returned_locations semantics

While looking at the partition-related code, I noticed that
GetTableLocations RPC requires the 'max_returned_locations' field to be
always set, even if it's necessary to fetch information on all tablets
in a table.  This patch updates the implementation of the
GetTableLocations RPC to output information on every tablet in a table
if the GetTableLocationsRequestPB::max_returned_locations field isn't
set.  This patch also contains a unit test for the new functionality.

The primary incentive for this update was getting rid of the hard-coded
values in test scenarios like we had in table_locations-itest.cc (see
the diff for TableLocationsTest.TestRangeSpecificHashing as a part of
this changelist).  That hard-coding the expected limit on the number of
returned locations for the 'max_returned_locations' field might hide a
bug if there are more tablets actually created, and making it possible
to request getting information on all the existing tablets in the API
looked cleaner than introducing some hard-coded values like INT_MAX in
the mentioned test scenario.  As for the Kudu client library code, the
'max_returned_locations' field is still set to some heuristic values
based on the type of tablet lookup performed, so there shouldn't be a
huge amount of metadata returned for large fact tables in case of a
real-life use case.

Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
Reviewed-on: http://gerrit.cloudera.org:8080/17902
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/table_locations-itest.cc
M src/kudu/master/catalog_manager-test.cc
M src/kudu/master/catalog_manager.cc
4 files changed, 63 insertions(+), 6 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
Gerrit-Change-Number: 17902
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [master] update on max returned locations semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: [master] update on max_returned_locations semantics
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/17902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
Gerrit-Change-Number: 17902
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [master] update on max returned locations semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17902 )

Change subject: [master] update on max_returned_locations semantics
......................................................................


Patch Set 2: Verified+1

unrelated test failure in org.apache.kudu.client.TestKuduMetrics


-- 
To view, visit http://gerrit.cloudera.org:8080/17902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
Gerrit-Change-Number: 17902
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 07 Oct 2021 20:38:11 +0000
Gerrit-HasComments: No

[kudu-CR] [master] update on max returned locations semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17902 )

Change subject: [master] update on max_returned_locations semantics
......................................................................


Patch Set 3: Verified+1

unrelated test failures:
  * WebserverCrawlITest.TestAllWebPages
  * DeleteTabletITest.TestLeaderElectionDuringDeleteTablet


-- 
To view, visit http://gerrit.cloudera.org:8080/17902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
Gerrit-Change-Number: 17902
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 08 Oct 2021 03:52:48 +0000
Gerrit-HasComments: No

[kudu-CR] [master] update on max returned locations semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17902 )

Change subject: [master] update on max_returned_locations semantics
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17902/1/src/kudu/master/catalog_manager-test.cc
File src/kudu/master/catalog_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/17902/1/src/kudu/master/catalog_manager-test.cc@222
PS1, Line 222:     req.clear_max_returned_locations(); // the default is 10 in protobuf
> I'm curious, are we expecting to actually clear this value for our clients?
The primary incentive for this change was getting rid of the hard-coded values in test scenarios like one in table_locations-itest.cc (it's a part of this changelist).  The idea is that hard-coding the expected limit on the number of returned locations for the 'max_returned_locations' field might hide a bug if there are more tablets actually created.

Also, it looked a bit strange to try getting the field's value for those checks in catalog_manager.cc without first verifying the field is set at all.

As for the code in currently existing Kudu clients, the code is always using some heuristics to fetch up to some limit of locations: https://github.com/apache/kudu/blob/7b0a94d72de4475c6e6c3d7d78569403407cb278/src/kudu/client/meta_cache.h#L72-L77

I updated the commit description to be more clear what was the incentive behind this patch.



-- 
To view, visit http://gerrit.cloudera.org:8080/17902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
Gerrit-Change-Number: 17902
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 05 Oct 2021 17:55:59 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] update on max returned locations semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17902 )

Change subject: [master] update on max_returned_locations semantics
......................................................................


Patch Set 3: Code-Review+2

Carrying over Andrew's +2 from PS2.


-- 
To view, visit http://gerrit.cloudera.org:8080/17902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
Gerrit-Change-Number: 17902
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 08 Oct 2021 03:53:11 +0000
Gerrit-HasComments: No

[kudu-CR] [master] update on max returned locations semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: [master] update on max_returned_locations semantics
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/17902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Iebc2b2295576c0b1b637fdeb95d4f9ff7b4d51c7
Gerrit-Change-Number: 17902
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)