You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/10/06 09:51:00 UTC

[jira] [Updated] (HIVE-24207) LimitOperator can leverage ObjectCache to bail out quickly

     [ https://issues.apache.org/jira/browse/HIVE-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24207:
----------------------------------
    Labels: pull-request-available  (was: )

> LimitOperator can leverage ObjectCache to bail out quickly
> ----------------------------------------------------------
>
>                 Key: HIVE-24207
>                 URL: https://issues.apache.org/jira/browse/HIVE-24207
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {noformat}
> select  ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk limit 100;
>  select distinct ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk limit 100;
>  {noformat}
> Queries like the above generate a large number of map tasks. Currently they don't bail out after generating enough amount of data. 
> It would be good to make use of ObjectCache & retain the number of records generated. LimitOperator/VectorLimitOperator can bail out for the later tasks in the operator's init phase itself. 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58



--
This message was sent by Atlassian Jira
(v8.3.4#803005)