You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/10/06 09:51:00 UTC
[jira] [Updated] (HIVE-24207) LimitOperator can leverage
ObjectCache to bail out quickly
[ https://issues.apache.org/jira/browse/HIVE-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HIVE-24207:
----------------------------------
Labels: pull-request-available (was: )
> LimitOperator can leverage ObjectCache to bail out quickly
> ----------------------------------------------------------
>
> Key: HIVE-24207
> URL: https://issues.apache.org/jira/browse/HIVE-24207
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> {noformat}
> select ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk limit 100;
> select distinct ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk limit 100;
> {noformat}
> Queries like the above generate a large number of map tasks. Currently they don't bail out after generating enough amount of data.
> It would be good to make use of ObjectCache & retain the number of records generated. LimitOperator/VectorLimitOperator can bail out for the later tasks in the operator's init phase itself.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58
--
This message was sent by Atlassian Jira
(v8.3.4#803005)