Posted to issues@kudu.apache.org by "Xu Yao (JIRA)" <ji...@apache.org> on 2018/06/20 03:48:00 UTC
[jira] [Updated] (KUDU-2437) Split a tablet into some chunks by size
[ https://issues.apache.org/jira/browse/KUDU-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xu Yao updated KUDU-2437:
-------------------------
Summary: Split a tablet into some chunks by size (was: Generate ScanToken from small chunks in tablet)
> Split a tablet into some chunks by size
> ---------------------------------------
>
> Key: KUDU-2437
> URL: https://issues.apache.org/jira/browse/KUDU-2437
> Project: Kudu
> Issue Type: Improvement
> Components: client, master, tablet
> Reporter: Xu Yao
> Assignee: Xu Yao
> Priority: Major
>
> When reading a Kudu table from Spark, a tablet that holds a large amount of data takes a long time to read.
> The reason is that KuduRDD generates one scanToken per tablet, so a single Spark task has to process all the data in that tablet. Proposed approach:
> # The tablet server (TS) reports the DiskRowSet (DRS) bounds info to the Master.
> # The client gets the bounds info from the Master.
> # The client generates scanTokens from the tablet's bounds info (by setting LowerBoundPrimaryKey and UpperBoundPrimaryKey).
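
The third step could be sketched roughly as follows. This is only an illustration of splitting a tablet's key space by size, not Kudu code: DrsBounds, split_by_size and target_bytes are hypothetical names, and the sketch assumes the client has already received, for each DRS, its encoded primary-key bounds and an on-disk size. Python is used only for brevity.

```python
from typing import List, NamedTuple, Optional, Tuple


class DrsBounds(NamedTuple):
    """Hypothetical per-DRS summary: encoded key bounds and on-disk size."""
    min_key: bytes
    max_key: bytes
    size_bytes: int


KeyRange = Tuple[Optional[bytes], Optional[bytes]]


def split_by_size(drs_list: List[DrsBounds], target_bytes: int) -> List[KeyRange]:
    """Return [lower, upper) primary-key ranges of roughly target_bytes each.

    None means "unbounded", so the ranges always cover the whole tablet.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")

    split_points: List[bytes] = []
    accumulated = 0
    chunk_start: Optional[bytes] = None  # min_key of the first DRS in the chunk

    for drs in sorted(drs_list, key=lambda d: d.min_key):
        if chunk_start is None:
            chunk_start = drs.min_key
        elif accumulated >= target_bytes and drs.min_key > chunk_start:
            # The current chunk is large enough: start a new one at this DRS.
            split_points.append(drs.min_key)
            chunk_start = drs.min_key
            accumulated = 0
        accumulated += drs.size_bytes

    lowers: List[Optional[bytes]] = [None] + split_points
    uppers: List[Optional[bytes]] = split_points + [None]
    return list(zip(lowers, uppers))


if __name__ == "__main__":
    example = [
        DrsBounds(b"a", b"f", 60),
        DrsBounds(b"g", b"m", 50),
        DrsBounds(b"n", b"s", 70),
        DrsBounds(b"t", b"z", 10),
    ]
    # Two chunks: everything below b"n", and everything from b"n" upward.
    print(split_by_size(example, target_bytes=100))
```

Each returned pair would correspond to the LowerBoundPrimaryKey and UpperBoundPrimaryKey of one scanToken. Because DRS key ranges can overlap, the per-chunk sizes here are only estimates; the sketch cuts on DRS lower bounds and does not try to account for rows that an overlapping DRS contributes to a neighbouring chunk.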
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)