Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/07/30 00:17:00 UTC

[jira] [Commented] (KUDU-1802) Deserializing scan tokens should avoid round-trip to master

    [ https://issues.apache.org/jira/browse/KUDU-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167570#comment-17167570 ] 

ASF subversion and git services commented on KUDU-1802:
-------------------------------------------------------

Commit 5ad5d3d6606bbca33fc5909f7a96c7fefd0299e0 in kudu's branch refs/heads/master from Grant Henke
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=5ad5d3d ]

KUDU-1802: Add KuduScanner.GetKuduTable

This patch adds the ability to get the KuduTable instance from the
KuduScanner. This allows the user to use the KuduTable instance
populated by the scan token instead of making a GetTableSchema
call to the master.

The complete KuduTable is already passed to the KuduScanner in its
constructor; this patch lets the consumer of the scanner reuse it.
It is a simple passthrough, but it helps when the KuduTable was
constructed internally (e.g. via a ScanToken).

Change-Id: I09a80c6d499987553aef1338db93397a1de2491e
Reviewed-on: http://gerrit.cloudera.org:8080/16251
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
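
The passthrough described in the commit message can be illustrated with a small standalone sketch. It does not use the Kudu client library: Table, FakeMaster, Scanner, ScannerFromToken and LookupsAfterTokenScan are all hypothetical stand-ins invented for this illustration, and the real KuduScanner and KuduScanToken signatures may differ. It assumes only what the message states, namely that the scanner already holds the table it was constructed with and can hand it back, so no extra schema lookup is needed.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a table handle; NOT the Kudu KuduTable class.
struct Table {
  std::string name;
};

// Hypothetical stand-in for the master. It counts schema lookups so the
// saved round trip is observable.
struct FakeMaster {
  int schema_lookups = 0;

  std::shared_ptr<Table> OpenTable(const std::string& name) {
    ++schema_lookups;  // each call models one round trip to the master
    return std::make_shared<Table>(Table{name});
  }
};

// Hypothetical stand-in for a scanner that keeps the table it was built with.
class Scanner {
 public:
  explicit Scanner(std::shared_ptr<Table> table) : table_(std::move(table)) {}

  // Pure passthrough: returns the table handed to the constructor.
  std::shared_ptr<Table> GetTable() const { return table_; }

 private:
  std::shared_ptr<Table> table_;
};

// Models token deserialization that builds the table from metadata carried
// in the token itself (an assumption of this sketch), without the master.
Scanner ScannerFromToken(const std::string& table_name_in_token) {
  return Scanner(std::make_shared<Table>(Table{table_name_in_token}));
}

// Builds a scanner from a token, reads the table through the passthrough,
// and reports how many master lookups that took.
int LookupsAfterTokenScan(FakeMaster& master, const std::string& table_name) {
  Scanner scanner = ScannerFromToken(table_name);
  std::shared_ptr<Table> table = scanner.GetTable();
  (void)table;  // a real consumer would read schema details from it here
  return master.schema_lookups;
}
```

In this sketch a consumer that needs the table after deserializing a token calls scanner.GetTable() rather than master.OpenTable(), so the lookup counter stays at zero on the token path.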


> Deserializing scan tokens should avoid round-trip to master
> -----------------------------------------------------------
>
>                 Key: KUDU-1802
>                 URL: https://issues.apache.org/jira/browse/KUDU-1802
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, perf
>    Affects Versions: 1.2.0
>            Reporter: Todd Lipcon
>            Assignee: Grant Henke
>            Priority: Major
>              Labels: impala, ramp-up
>             Fix For: 1.13.0
>
>
> Currently, KuduScanToken::DeserializeIntoScanner calls KuduClient::OpenTable() which makes a GetTableSchema call to the master. This round trip is a bit expensive because it's always a "thundering herd" for an Impala query or Spark job -- every host deserializes a bunch of scan tokens at the same time and ends up having to back off.
> We should consider some ways to avoid this.


