Posted to dev@hive.apache.org by "Zoltan Haindrich (Jira)" <ji...@apache.org> on 2019/10/17 15:10:00 UTC
[jira] [Created] (HIVE-22363) ReduceDeduplication may leave an invalid GroupByOperator behind in some cases
Zoltan Haindrich created HIVE-22363:
---------------------------------------
Summary: ReduceDeduplication may leave an invalid GroupByOperator behind in some cases
Key: HIVE-22363
URL: https://issues.apache.org/jira/browse/HIVE-22363
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 3.1.2
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich
Since HIVE-11387, ReduceDeduplication may traverse {{GroupByOperators}} [as well|https://github.com/apache/hive/blob/c6626edb65c2cd00576647e54db1995628fe64da/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java#L244].
However, the removal logic only removes the first parent. If some other operator (a FIL in this case) sits between the sink and the GBY, the removal may not happen [here|https://github.com/apache/hive/blob/c6626edb65c2cd00576647e54db1995628fe64da/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java#L458].
{code}
set hive.cbo.enable=false;

drop table if exists xl1;

create table xl1 as
select '1' as mdl_yr_desc, 2 as seq_no, '3' as opt_desc1, 4 as opt_desc, 1 as row_num;

explain
select trim(base.mdl_yr_desc) mdl_yr_desc, trim(base.opt_desc) opt_desc
from
(
  select trim(mdl_yr_desc) mdl_yr_desc, concat_ws(' ', collect_set(trim(opt_desc1))) as opt_desc
  from
  (
    select t14304.*
    from
    (
      select * from xl1
    ) t14304
    where row_num = 1
    order by trim(mdl_yr_desc), cast(seq_no as int) asc
  ) x
  group by trim(mdl_yr_desc)
) base
inner join
(
  select 1 as v
) dedup
on trim(base.mdl_yr_desc) != dedup.v
group by trim(base.mdl_yr_desc), trim(base.opt_desc);
{code}
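
The mismatch between traversal and removal described in this report can be illustrated with a toy model. This is only a sketch in Python, not Hive's code: the {{Op}} class and the helpers {{chain}}, {{plan}}, {{find_gby_upstream}} and {{remove_gby_direct_parent_only}} are hypothetical names invented here, and the linear TS/GBY/FIL/RS chains are assumed shapes, not the actual plan of the query. It shows how a traversal that walks past intermediate operators can locate a GBY that a removal step, which only inspects the sink's first parent, then fails to remove.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    """Toy operator node in a linear plan (hypothetical, not a Hive class)."""
    name: str
    parent: Optional["Op"] = None


def chain(*names):
    """Build a linear chain from source to sink and return the sink."""
    node = None
    for name in names:
        node = Op(name, parent=node)
    return node


def plan(sink):
    """Return operator names ordered from source to sink."""
    names = []
    node = sink
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names[::-1]


def find_gby_upstream(sink):
    """Traversal step: walk up past any operators until a GBY is found."""
    node = sink.parent
    while node is not None and node.name != "GBY":
        node = node.parent
    return node


def remove_gby_direct_parent_only(sink):
    """Removal step that only inspects the sink's first parent."""
    parent = sink.parent
    if parent is not None and parent.name == "GBY":
        sink.parent = parent.parent  # splice the GBY out of the chain
        return True
    return False  # a GBY further upstream, if any, is left behind


# GBY directly above the sink: traversal and removal agree.
direct = chain("TS", "GBY", "RS")
found_direct = find_gby_upstream(direct) is not None
removed_direct = remove_gby_direct_parent_only(direct)
plan_direct = plan(direct)

# A FIL between the GBY and the sink: traversal finds the GBY,
# but the removal step misses it.
with_filter = chain("TS", "GBY", "FIL", "RS")
found_filtered = find_gby_upstream(with_filter) is not None
removed_filtered = remove_gby_direct_parent_only(with_filter)
plan_filtered = plan(with_filter)

print(plan_direct, removed_direct)
print(plan_filtered, removed_filtered)
```

In the second chain the traversal reports a GBY while the removal reports nothing removed, so any later step that assumes the GBY is gone would be working on a plan that still contains it.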