Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/12/14 08:41:18 UTC

[kudu-CR] KUDU-1806. java: fetching scan tokens should fetch larger batches

Todd Lipcon has uploaded a new patch set (#2).

Change subject: KUDU-1806. java: fetching scan tokens should fetch larger batches
......................................................................

KUDU-1806. java: fetching scan tokens should fetch larger batches

This changes the number of tablets fetched in a single GetTableLocations
RPC from 10 to 1000. In a stress test on 200 nodes with 40 concurrent
query streams, this substantially reduced the time spent in scan token
generation by the Impala planner.

Initially, I was going to try to set the fetch size parameter in a
context-dependent way, fetching few locations for point lookups (i.e.
writes) and many locations for scans. However, it turns out that point
lookups already set an 'endPartitionKey' parameter in the RPC, and the
master will stop returning locations after reaching this point. So,
setting a batch size of 1000 for such queries will not change the amount
of work done by the master for point queries.
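
To illustrate why the batch size does not matter for point lookups, here
is a minimal sketch of the master-side behavior described above. It is
not the Kudu master code: the class name, the integer partition keys, and
the convention that a point lookup passes endKey = key + 1 are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the master's tablet location lookup.
final class LocationLookupSketch {
    // Tablet start key -> tablet id, ordered by start key.
    private final TreeMap<Integer, String> tabletsByStartKey = new TreeMap<>();

    LocationLookupSketch(int numTablets, int keysPerTablet) {
        for (int i = 0; i < numTablets; i++) {
            tabletsByStartKey.put(i * keysPerTablet, "tablet-" + i);
        }
    }

    /**
     * Returns the tablets covering [startKey, endKey), capped at maxReturned.
     * A null endKey means "no upper bound", as for a full table scan.
     */
    List<String> getTableLocations(int startKey, Integer endKey, int maxReturned) {
        List<String> out = new ArrayList<>();
        // Start from the tablet whose range contains startKey.
        Integer first = tabletsByStartKey.floorKey(startKey);
        if (first == null) {
            return out;
        }
        for (Map.Entry<Integer, String> e : tabletsByStartKey.tailMap(first, true).entrySet()) {
            // Stop as soon as the end key is reached, regardless of the cap.
            if (endKey != null && e.getKey() >= endKey) {
                break;
            }
            if (out.size() >= maxReturned) {
                break;
            }
            out.add(e.getValue());
        }
        return out;
    }
}
```

With 50 tablets, a point lookup such as getTableLocations(250, 251, 1000)
returns the same single tablet as getTableLocations(250, 251, 10), while an
unbounded scan returns 10 tablets with the old cap and all 50 with the new
one.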

Although this will slightly increase the amount of work done by a
GetTableLocations RPC, my guess is that the RPC cost is dominated by
fixed per-RPC costs rather than by the linear cost based on the
number of tablets. This is especially true when taking into account the
typical RTT within a large/busy cluster (~1ms). So, it is a lot cheaper,
both in wall clock and total resources consumed, to process one larger
RPC rather than tens or hundreds of small ones.
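
The cost argument above can be made concrete with a back-of-the-envelope
model. This is only a sketch: the class and method names are hypothetical,
the ~1ms fixed cost comes from the RTT estimate above, and the 2000-tablet
table and 5us per-tablet cost are assumed numbers, not measurements.

```java
// Hypothetical cost model; not part of the Kudu client.
final class RpcCostSketch {
    /** Number of GetTableLocations round trips needed to cover all tablets. */
    static long rpcCount(long tablets, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (tablets + batchSize - 1) / batchSize; // ceiling division
    }

    /** Total cost in microseconds: a fixed cost per RPC plus a linear cost per tablet. */
    static long totalMicros(long tablets, long batchSize,
                            long fixedMicrosPerRpc, long microsPerTablet) {
        return rpcCount(tablets, batchSize) * fixedMicrosPerRpc + tablets * microsPerTablet;
    }

    public static void main(String[] args) {
        long tablets = 2000;   // assumed table size
        long fixed = 1000;     // ~1ms RTT per RPC, from the estimate above
        long perTablet = 5;    // assumed linear cost per tablet
        System.out.println("batch=10:   " + totalMicros(tablets, 10, fixed, perTablet) + "us");
        System.out.println("batch=1000: " + totalMicros(tablets, 1000, fixed, perTablet) + "us");
    }
}
```

Under these assumptions the linear term is identical in both cases, so the
whole difference comes from the number of round trips (200 versus 2).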

In addition to changing the constant, this patch also modifies the scan
token generation test case to set the fetch size down to a low value.
This ensures that the code path to go back and fetch more locations is
still exercised, rather than always fetching all of the tablets in one
RPC.
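
As a rough illustration of why the test lowers the fetch size, the sketch
below models a client that keeps fetching until it sees a short batch. It
is not the AsyncKuduClient logic: the class, the integer tablet indexes,
and the "short batch means done" termination rule are assumptions chosen
to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical paged fetcher; counts how many simulated RPCs were issued.
final class PagedFetchSketch {
    int rpcCount = 0;
    private final int totalTablets;

    PagedFetchSketch(int totalTablets) {
        this.totalTablets = totalTablets;
    }

    // Stand-in for one RPC: returns up to fetchSize tablet indexes from startIndex.
    private List<Integer> fetchBatch(int startIndex, int fetchSize) {
        rpcCount++;
        List<Integer> batch = new ArrayList<>();
        for (int i = startIndex; i < totalTablets && batch.size() < fetchSize; i++) {
            batch.add(i);
        }
        return batch;
    }

    /** Fetches every tablet, going back for more until a short batch arrives. */
    List<Integer> fetchAll(int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        List<Integer> all = new ArrayList<>();
        while (true) {
            List<Integer> batch = fetchBatch(all.size(), fetchSize);
            all.addAll(batch);
            if (batch.size() < fetchSize) {
                break; // a short batch means there is nothing left to fetch
            }
        }
        return all;
    }
}
```

With 25 tablets and a fetch size of 1000, the loop body runs once and the
"fetch more" branch is never taken; with a fetch size of 5 it runs several
times, which is the path the modified test is meant to keep covered.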

Change-Id: I46260a96dfd0847f70146496e48c2766b8e17ea9
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
2 files changed, 42 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/98/5498/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5498

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46260a96dfd0847f70146496e48c2766b8e17ea9
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>