
Posted to issues@kudu.apache.org by "Dan Burkert (JIRA)" <ji...@apache.org> on 2016/12/13 04:39:58 UTC

[jira] [Commented] (KUDU-1806) Creating a list of scan tokens should retrieve tablets in larger batches

    [ https://issues.apache.org/jira/browse/KUDU-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744135#comment-15744135 ] 

Dan Burkert commented on KUDU-1806:
-----------------------------------

We could probably get pretty far with a simple heuristic here without having to extend the meta cache APIs:

If the request is for a specific partition key, then request 10 tablet locations, since it's probably for an individual write. If the request is for the empty partition key, then request 1,000, since it's probably the start of a table scan or scan token operation.
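The heuristic above can be sketched as follows. This is a minimal illustration, not Kudu's meta cache code: the function name `locations_to_request` and the constant names are hypothetical. Only the values 10 and 1,000 and the empty-key rule come from the comment.

```python
# Hypothetical constants; the values are the ones proposed in the comment.
SMALL_BATCH = 10     # a specific key is probably a single write
LARGE_BATCH = 1000   # an empty key is probably the start of a scan or scan-token build


def locations_to_request(partition_key: bytes) -> int:
    """Return how many tablet locations to ask the master for.

    An empty partition key is treated as the start of a table scan or
    scan token operation; any other key as an individual write.
    """
    if partition_key == b"":
        return LARGE_BATCH
    return SMALL_BATCH


print(locations_to_request(b""))          # 1000
print(locations_to_request(b"\x00\x01"))  # 10
```

The appeal of this approach is that it needs only the key already passed to the lookup, so no new meta cache API is required to signal intent.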

> Creating a list of scan tokens should retrieve tablets in larger batches
> ------------------------------------------------------------------------
>
>                 Key: KUDU-1806
>                 URL: https://issues.apache.org/jira/browse/KUDU-1806
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 1.2.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>
> In a test on a 200-node cluster with 40 concurrent query streams, we found that the Impala planner was sometimes taking minutes to fetch the list of scan tokens. The tables in the query had several thousand tablets, so with the default batch size of 10 tablets per GetTableLocations RPC, the planning required hundreds of round trips, each of which had some chance of getting bumped from the queue due to backpressure, etc.
> A local hack to change the batching to 1000 tablets per RPC reduced planning times to under a second.
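The round-trip arithmetic from the issue description can be made concrete with a small calculation. The sketch assumes each GetTableLocations call returns at most `batch_size` tablets and that lookups are issued one after another. The tablet count of 5,000 is an illustrative stand-in for "several thousand", not a figure from the test, and `round_trips` is a hypothetical helper.

```python
TABLETS = 5000  # illustrative value for "several thousand tablets"


def round_trips(num_tablets: int, batch_size: int) -> int:
    """Number of RPCs needed if each returns at most batch_size tablets."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer ceiling division: a partial final batch still costs one RPC.
    return (num_tablets + batch_size - 1) // batch_size


for batch in (10, 1000):
    print(f"batch={batch}: {round_trips(TABLETS, batch)} round trips")
# batch=10: 500 round trips
# batch=1000: 5 round trips
```

Under these assumptions, raising the batch size from 10 to 1,000 cuts the number of round trips by a factor of 100, which also reduces the number of chances for any single RPC to be rejected under backpressure.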


