Posted to dev@hive.apache.org by "Zoltan Haindrich (Jira)" <ji...@apache.org> on 2019/10/17 15:10:00 UTC

[jira] [Created] (HIVE-22363) ReduceDeduplication may leave an invalid GroupByOperator behind in some cases

Zoltan Haindrich created HIVE-22363:
---------------------------------------

             Summary: ReduceDeduplication may leave an invalid GroupByOperator behind in some cases
                 Key: HIVE-22363
                 URL: https://issues.apache.org/jira/browse/HIVE-22363
             Project: Hive
          Issue Type: Bug
          Components: Physical Optimizer
    Affects Versions: 3.1.2
            Reporter: Zoltan Haindrich
            Assignee: Zoltan Haindrich


Since HIVE-11387, ReduceDeduplication may traverse {{GroupByOperators}} [as well|https://github.com/apache/hive/blob/c6626edb65c2cd00576647e54db1995628fe64da/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java#L244].
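A minimal Java sketch of what traversing through a GBY means, over a hypothetical toy model. A linear plan is just a list of operator names ordered from source to sink. The names `ParentRsSearchSketch`, `findParentReduceSink` and `traverseGby` are invented for illustration and are not Hive's `Operator` API. The sketch also assumes that FIL and SEL count as pass-through operators; the real rules in `CorrelationUtilities` are more involved.

```java
import java.util.List;

// Hypothetical toy model (not Hive's Operator API): a linear pipeline of
// operator names ordered from source to sink, e.g. [RS, GBY, FIL, RS].
class ParentRsSearchSketch {

    // Walks upward from the child ReduceSink at rsIdx looking for a parent RS.
    // Assumption: FIL and SEL are always pass-through; GBY is passed through
    // only when traverseGby is true (the behavior attributed to HIVE-11387).
    // Returns the index of the parent RS, or -1 if the walk stops first.
    static int findParentReduceSink(List<String> ops, int rsIdx, boolean traverseGby) {
        for (int i = rsIdx - 1; i >= 0; i--) {
            String op = ops.get(i);
            if (op.equals("RS")) {
                return i;
            }
            boolean passThrough = op.equals("FIL") || op.equals("SEL")
                    || (traverseGby && op.equals("GBY"));
            if (!passThrough) {
                return -1; // any other operator ends the search
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> plan = List.of("RS", "GBY", "FIL", "RS");
        System.out.println(findParentReduceSink(plan, 3, false)); // -1: GBY blocks the walk
        System.out.println(findParentReduceSink(plan, 3, true));  // 0: parent RS reached through GBY
    }
}
```

Once the walk may cross a GBY, the optimizer can match a child sink with a parent sink that has both a GBY and a FIL in between, which is the plan shape the removal step then has to handle.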

However, the removal logic only removes the first parent, so if another operator (a FIL in this case) sits between the sink and the GBY, the GroupByOperator may not be removed [here|https://github.com/apache/hive/blob/c6626edb65c2cd00576647e54db1995628fe64da/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java#L458].
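A minimal Java sketch of this failure mode, again over a hypothetical toy model: a linear list of operator names, not Hive's `Operator` classes. `ReduceDedupSketch`, `removeFirstParentOnly` and `gbyHiddenBehindOtherOps` are invented names. `removeFirstParentOnly` imitates the reported behavior by inspecting only the immediate parent of the child sink, and `gbyHiddenBehindOtherOps` flags the plan shape where that leaves a GBY in place. It illustrates the problem only and does not reproduce the actual Hive fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model (not Hive's Operator API): a linear pipeline of
// operator names ordered from source to sink.
class ReduceDedupSketch {

    // Imitates the reported behavior: only the first (immediate) parent of the
    // child ReduceSink at rsIdx is examined, and it is dropped only if it is a GBY.
    static List<String> removeFirstParentOnly(List<String> ops, int rsIdx) {
        List<String> out = new ArrayList<>(ops);
        if (rsIdx > 0 && out.get(rsIdx - 1).equals("GBY")) {
            out.remove(rsIdx - 1); // remove(int index), not remove(Object)
        }
        return out;
    }

    // True when a GBY sits above the child ReduceSink (before the parent RS)
    // but is not its immediate parent, i.e. the case the removal above misses.
    static boolean gbyHiddenBehindOtherOps(List<String> ops, int rsIdx) {
        for (int i = rsIdx - 1; i >= 0; i--) {
            String op = ops.get(i);
            if (op.equals("GBY")) {
                return i < rsIdx - 1; // hidden only if something sits in between
            }
            if (op.equals("RS")) {
                return false; // reached the parent sink without finding a GBY
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> direct = List.of("RS", "GBY", "RS");
        List<String> withFilter = List.of("RS", "GBY", "FIL", "RS");

        System.out.println(removeFirstParentOnly(direct, 2));       // [RS, RS]
        System.out.println(removeFirstParentOnly(withFilter, 3));   // [RS, GBY, FIL, RS]
        System.out.println(gbyHiddenBehindOtherOps(withFilter, 3)); // true
    }
}
```

With a plan shaped like `RS-GBY-FIL-RS`, the first-parent-only removal returns the plan unchanged, which corresponds to the GroupByOperator being left behind.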

{code}
set hive.cbo.enable=false;

drop table if exists xl1;
create table xl1 as
select '1' as mdl_yr_desc, 2 as seq_no,'3' as opt_desc1,4 as opt_desc,1 as row_num;

explain
select trim(base.mdl_yr_desc) mdl_yr_desc, trim(base.opt_desc) opt_desc
from
(
    SELECT trim(mdl_yr_desc) mdl_yr_desc, concat_ws(' ', collect_set(trim(opt_desc1))) AS opt_desc
    from
    (
        select t14304.* 
        from
        (
            select * from xl1
        ) t14304  
        where row_num = 1
        order by trim(mdl_yr_desc), cast(seq_no as int) asc
    ) x
    group by trim(mdl_yr_desc)
) base
inner join
    (
        select 1 as v
    ) dedup
    on trim(base.mdl_yr_desc) != dedup.v
group by trim(base.mdl_yr_desc), trim(base.opt_desc);
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)