Posted to issues@hive.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2021/03/24 13:32:00 UTC
[jira] [Updated] (HIVE-24930) Operator.setDone() short-circuit from child op is not used in vectorized codepath (childSize == 1)
[ https://issues.apache.org/jira/browse/HIVE-24930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated HIVE-24930:
--------------------------------
Description:
This looks like a possible performance regression for queries with a LIMIT clause. Consider the following query:
{code}
explain vectorization detail select
ws_item_sk item_sk, d_date,
sum(ws_sales_price) over (partition by ws_item_sk order by d_date range between 10 preceding and current row) cume_sales,
last_value(ws_sales_price) over (partition by ws_item_sk order by d_date range between 10 preceding and current row) last_price
from web_sales
,date_dim
where ws_sold_date_sk=d_date_sk
and d_month_seq between 1214 and 1214+11
and ws_item_sk is not NULL
group by ws_item_sk, d_date, ws_sales_price
limit 100;
{code}
non-vectorized:
{code}
set hive.vectorized.execution.ptf.enabled=false;
...
| Select Operator |
| Statistics: Num rows: 1415172503/1439591782 Data size: 248969569264 Basic stats: COMPLETE Column stats: COMPLETE |
| PTF Operator |
| Statistics: Num rows: 1415172503/449131 Data size: 248969569264 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| Statistics: Num rows: 1415172503/11526 Data size: 565867418560 Basic stats: COMPLETE Column stats: COMPLETE |
{code}
vectorized:
{code}
set hive.vectorized.execution.ptf.enabled=true;
...
| Select Operator |
| Statistics: Num rows: 1415172503/1439591782 Data size: 248969569264 Basic stats: COMPLETE Column stats: COMPLETE |
| PTF Operator |
| Statistics: Num rows: 1415172503/1439591782 Data size: 248969569264 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| Statistics: Num rows: 1415172503/1439591782 Data size: 565867418560 Basic stats: COMPLETE Column stats: COMPLETE |
| File Output Operator |
| Statistics: Num rows: 100/11300 Data size: 40000 Basic stats: COMPLETE Column stats: COMPLETE |
{code}
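To illustrate the short-circuit named in the issue title, here is a minimal, self-contained sketch. It is not Hive's actual code: the classes Op, PassThrough and Limit, the honorChildDone flag and the run() helper are all hypothetical names made up for this example. It assumes that the second number in each "Num rows" pair above is a runtime row count, and it models a chain of single-child operators ending in a limit. When a parent checks its only child's done flag, the "done" state travels upstream and the driver stops feeding rows; when it does not, every upstream operator keeps processing the full input.

```java
public class ShortCircuitSketch {

    /** Toy operator. All names are hypothetical; this is not Hive's Operator class. */
    static abstract class Op {
        Op child;               // single child, mirroring the childSize == 1 case
        boolean done = false;   // set once this operator needs no more input
        long rowsSeen = 0;      // toy runtime row counter

        abstract void process(int[] batch);
    }

    /** Forwards every batch unchanged to its single child. */
    static class PassThrough extends Op {
        final boolean honorChildDone;

        PassThrough(boolean honorChildDone) {
            this.honorChildDone = honorChildDone;
        }

        @Override
        void process(int[] batch) {
            if (done) {
                return;
            }
            rowsSeen += batch.length;
            child.process(batch);
            // Short-circuit: once the only child is done, this operator is done too.
            if (honorChildDone && child.done) {
                done = true;
            }
        }
    }

    /** Accepts rows until the limit is reached, then marks itself done. */
    static class Limit extends Op {
        final int limit;

        Limit(int limit) {
            this.limit = limit;
        }

        @Override
        void process(int[] batch) {
            if (done) {
                return;
            }
            rowsSeen = Math.min((long) limit, rowsSeen + batch.length);
            if (rowsSeen >= limit) {
                done = true;
            }
        }
    }

    /** Runs top -> bottom -> limit and returns rows seen by {top, bottom, limit}. */
    public static long[] run(boolean honorChildDone, int batches, int batchSize, int limit) {
        PassThrough top = new PassThrough(honorChildDone);
        PassThrough bottom = new PassThrough(honorChildDone);
        Limit lim = new Limit(limit);
        top.child = bottom;
        bottom.child = lim;

        int[] batch = new int[batchSize];
        // The driver stops reading input as soon as the root operator reports done.
        for (int i = 0; i < batches && !top.done; i++) {
            top.process(batch);
        }
        return new long[] { top.rowsSeen, bottom.rowsSeen, lim.rowsSeen };
    }

    public static void main(String[] args) {
        // With the child-done check: prints [100, 100, 100]
        System.out.println(java.util.Arrays.toString(run(true, 10, 50, 100)));
        // Without it: prints [500, 500, 100]
        System.out.println(java.util.Arrays.toString(run(false, 10, 50, 100)));
    }
}
```

In this toy model the second run resembles the vectorized plan above, where the upstream operators report the full input row count even though only the limit's worth of rows is needed downstream.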
> Operator.setDone() short-circuit from child op is not used in vectorized codepath (childSize == 1)
> --------------------------------------------------------------------------------------------------
>
> Key: HIVE-24930
> URL: https://issues.apache.org/jira/browse/HIVE-24930
> Project: Hive
> Issue Type: Bug
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
>