Posted to issues@hive.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2021/03/24 13:32:00 UTC

[jira] [Updated] (HIVE-24930) Operator.setDone() short-circuit from child op is not used in vectorized codepath (childSize == 1)

     [ https://issues.apache.org/jira/browse/HIVE-24930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-24930:
--------------------------------
    Description: 
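This looks like a possible performance regression for queries with LIMIT. In the vectorized codepath, the Operator.setDone() short-circuit coming from a child operator is not used when an operator has a single child (childSize == 1), so upstream operators keep processing rows after the limit has been reached. Consider the following query: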
{code}
explain vectorization detail select
  ws_item_sk item_sk, d_date,
  sum(ws_sales_price) over (partition by ws_item_sk order by d_date range between 10 preceding and current row) cume_sales,
  last_value(ws_sales_price) over (partition by ws_item_sk order by d_date range between 10 preceding and current row) last_price
from web_sales
    ,date_dim
where ws_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ws_item_sk is not NULL
group by ws_item_sk, d_date, ws_sales_price
limit 100;
{code}

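Non-vectorized (the second value in "Num rows" drops from 1439591782 to 449131 at the PTF Operator and to 11526 at the following Select Operator):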
{code}
set hive.vectorized.execution.ptf.enabled=false;
...
|               Select Operator                      |
|                 Statistics: Num rows: 1415172503/1439591782 Data size: 248969569264 Basic stats: COMPLETE Column stats: COMPLETE |
|                 PTF Operator                       |
|                   Statistics: Num rows: 1415172503/449131 Data size: 248969569264 Basic stats: COMPLETE Column stats: COMPLETE |
|                   Select Operator                  |
|                     Statistics: Num rows: 1415172503/11526 Data size: 565867418560 Basic stats: COMPLETE Column stats: COMPLETE |
{code}
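A minimal, hypothetical Java model of the row-mode short-circuit described in the title follows. The class names (SketchOp, PassThroughSketch, LimitSketch, RowModeSketch) and the process()/forward() structure are assumptions for illustration, not Hive's Operator code. A limit-like leaf marks itself done once it has accepted enough rows. Each parent marks itself done when all of its children are done, and the driver stops feeding rows once the top operator is done. The model does not try to reproduce the exact counts in the plan above, since the real PTF operator buffers partitions and work is split across tasks. It only shows the mechanism that presumably produces the falling row counts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of row-mode operator forwarding.
// SketchOp, PassThroughSketch, LimitSketch and RowModeSketch are
// illustrative names only; they are not Hive classes.
abstract class SketchOp {
    private final List<SketchOp> children = new ArrayList<>();
    private boolean done = false;
    long rowsProcessed = 0;

    boolean isDone() { return done; }
    void setDone(boolean d) { done = d; }

    <T extends SketchOp> T addChild(T child) {
        children.add(child);
        return child;
    }

    abstract void process(Object row);

    // Forward a row to every child that is not done yet. Once every child
    // reports done, this operator marks itself done too, so the short-circuit
    // travels upstream one operator at a time.
    protected void forward(Object row) {
        if (done) return;
        int childrenDone = 0;
        for (SketchOp child : children) {
            if (!child.isDone()) child.process(row);
            if (child.isDone()) childrenDone++;
        }
        if (!children.isEmpty() && childrenDone == children.size()) setDone(true);
    }
}

// Stands in for the PTF and Select operators: counts rows, then forwards them.
class PassThroughSketch extends SketchOp {
    @Override void process(Object row) {
        rowsProcessed++;
        forward(row);
    }
}

// Stands in for the limit: accepts rows until `limit` is reached, then goes done.
class LimitSketch extends SketchOp {
    private final long limit;
    LimitSketch(long limit) { this.limit = limit; }

    @Override void process(Object row) {
        rowsProcessed++;
        if (rowsProcessed >= limit) setDone(true);
    }
}

public class RowModeSketch {
    // Pushes up to totalRows rows into the top operator, stopping as soon as it
    // is done; returns how many rows the top operator processed.
    static long run(long totalRows, long limit) {
        PassThroughSketch ptf = new PassThroughSketch();
        PassThroughSketch select = ptf.addChild(new PassThroughSketch());
        select.addChild(new LimitSketch(limit));
        for (long i = 0; i < totalRows && !ptf.isDone(); i++) {
            ptf.process(i);
        }
        return ptf.rowsProcessed;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000, 100)); // prints 100
    }
}
```

Here the done flag set by the limit reaches the top operator after the 100th row, so the remaining 999,900 input rows are never pushed.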

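Vectorized (the second value in "Num rows" stays at 1439591782 through the PTF Operator and the following Select Operator, while the File Output Operator shows 100/11300):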
{code}
set hive.vectorized.execution.ptf.enabled=true;
...
|               Select Operator                      |
|                 Statistics: Num rows: 1415172503/1439591782 Data size: 248969569264 Basic stats: COMPLETE Column stats: COMPLETE |
|                 PTF Operator                       |
|                   Statistics: Num rows: 1415172503/1439591782 Data size: 248969569264 Basic stats: COMPLETE Column stats: COMPLETE |
|                   Select Operator                  |
|                     Statistics: Num rows: 1415172503/1439591782 Data size: 565867418560 Basic stats: COMPLETE Column stats: COMPLETE |
|                       File Output Operator         |
|                         Statistics: Num rows: 100/11300 Data size: 40000 Basic stats: COMPLETE Column stats: COMPLETE |
{code}
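The behaviour in the title, where the short-circuit is lost when childSize == 1, can be sketched with a minimal, hypothetical batch-mode model in Java. When an operator has exactly one child, a fast path forwards the batch. If that fast path does not read the child's done flag, the limit's done state never reaches the upstream operators, and they process every input row. All class names and the checkDoneOnFastPath switch are illustrative assumptions. This is not Hive's vectorized forwarding code, and it is not necessarily how the issue is fixed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch-mode operators; names are illustrative, not Hive's API.
abstract class BatchOp {
    final List<BatchOp> children = new ArrayList<>();
    boolean done = false;
    long rowsProcessed = 0;
    // true: consult the single child's done flag on the fast path;
    // false: skip that check (models the behaviour reported in this issue).
    final boolean checkDoneOnFastPath;

    BatchOp(boolean checkDoneOnFastPath) { this.checkDoneOnFastPath = checkDoneOnFastPath; }

    <T extends BatchOp> T addChild(T child) { children.add(child); return child; }

    abstract void processBatch(int size);

    protected void vectorForward(int size) {
        if (done) return;
        if (children.size() == 1) {
            // Fast path for a single child.
            BatchOp only = children.get(0);
            if (!checkDoneOnFastPath) {
                only.processBatch(size);      // child's done flag is never read
                return;
            }
            if (!only.done) only.processBatch(size);
            if (only.done) done = true;       // propagate the short-circuit upstream
            return;
        }
        // General path: forward to live children, go done when all are done.
        int childrenDone = 0;
        for (BatchOp c : children) {
            if (!c.done) c.processBatch(size);
            if (c.done) childrenDone++;
        }
        if (!children.isEmpty() && childrenDone == children.size()) done = true;
    }
}

// Stands in for the PTF and Select operators.
class PassThroughBatch extends BatchOp {
    PassThroughBatch(boolean check) { super(check); }
    @Override void processBatch(int size) { rowsProcessed += size; vectorForward(size); }
}

// Stands in for the limit: keeps at most `limit` rows, then goes done.
class LimitBatch extends BatchOp {
    private final long limit;
    LimitBatch(boolean check, long limit) { super(check); this.limit = limit; }
    @Override void processBatch(int size) {
        if (done) return;
        rowsProcessed += Math.min(size, limit - rowsProcessed);
        if (rowsProcessed >= limit) done = true;
    }
}

public class VectorizedLimitSketch {
    // Returns how many rows the top (PTF-like) operator processed.
    static long run(boolean checkDoneOnFastPath, long totalRows, int batchSize, long limit) {
        PassThroughBatch ptf = new PassThroughBatch(checkDoneOnFastPath);
        PassThroughBatch select = ptf.addChild(new PassThroughBatch(checkDoneOnFastPath));
        select.addChild(new LimitBatch(checkDoneOnFastPath, limit));
        long remaining = totalRows;
        while (remaining > 0 && !ptf.done) {
            int size = (int) Math.min(batchSize, remaining);
            ptf.processBatch(size);
            remaining -= size;
        }
        return ptf.rowsProcessed;
    }

    public static void main(String[] args) {
        System.out.println(run(false, 1_000_000, 1024, 100)); // prints 1000000
        System.out.println(run(true, 1_000_000, 1024, 100));  // prints 1024
    }
}
```

With the check skipped, the top operator processes all 1,000,000 rows even though the limit is satisfied by the first batch. With the check in place, it stops after that first 1024-row batch.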

> Operator.setDone() short-circuit from child op is not used in vectorized codepath (childSize == 1)
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-24930
>                 URL: https://issues.apache.org/jira/browse/HIVE-24930
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)