Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/08/31 18:35:00 UTC

[jira] [Commented] (IMPALA-10099) Push down DISTINCT aggregation for EXCEPT/INTERSECT

    [ https://issues.apache.org/jira/browse/IMPALA-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187945#comment-17187945 ] 

ASF subversion and git services commented on IMPALA-10099:
----------------------------------------------------------

Commit 827070b473c02da480f3a9d77c59f7031f9070c2 in impala's branch refs/heads/master from Shant Hovsepian
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=827070b ]

IMPALA-10099: Push down DISTINCT in Set operations

INTERSECT/EXCEPT are not duplicate-preserving operations. The distinct
aggregation can happen in each operand, in the leftmost operand only, or
after all the operands in a separate aggregation step. Except for a
couple of special cases, we would use the last strategy most often.

This change pushes the distinct aggregation down to the leftmost operand
in cases where there are no analytic functions, or when a distinct or
grouping operation already eliminates duplicates.
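The three DISTINCT placements described above can be compared with a toy model. This is a minimal sketch, not Impala's planner code. It assumes INTERSECT and EXCEPT behave like a left semi join and a left anti join of the leftmost operand against the other operand. The helpers `distinct`, `semi_join`, `anti_join` and `set_op`, and the strategy names, are hypothetical. Under that assumption, deduplicating only the leftmost operand is enough, because a semi or anti join never emits a given left row more than once. The model does not capture Impala's analytic-function or existing-grouping checks.

```python
# Hypothetical toy model: relations are Python lists (bags, duplicates allowed).
# INTERSECT/EXCEPT are ASSUMED to filter the leftmost operand with a
# semi/anti join against the other operand; this is not Impala's code.

def distinct(rows):
    """Remove duplicates, keeping first-seen order."""
    return list(dict.fromkeys(rows))

def semi_join(left, right):
    """Keep left rows that have a match in right (INTERSECT-style filter)."""
    keys = set(right)
    return [r for r in left if r in keys]

def anti_join(left, right):
    """Keep left rows that have no match in right (EXCEPT-style filter)."""
    keys = set(right)
    return [r for r in left if r not in keys]

STRATEGIES = ("each_operand", "leftmost_only", "after_all")

def set_op(join, left, right, strategy):
    """Apply a set operation with DISTINCT placed according to `strategy`."""
    if strategy == "each_operand":
        # Deduplicate every operand before the set operation.
        return join(distinct(left), distinct(right))
    if strategy == "leftmost_only":
        # Deduplicate only the leftmost operand; the join cannot
        # reintroduce duplicates of left rows.
        return join(distinct(left), right)
    if strategy == "after_all":
        # Separate aggregation step after the set operation.
        return distinct(join(left, right))
    raise ValueError(f"unknown strategy: {strategy}")

if __name__ == "__main__":
    left = [1, 1, 2, 3, 3, 3]
    right = [1, 1, 3, 4]
    for name, join in (("INTERSECT", semi_join), ("EXCEPT", anti_join)):
        for strategy in STRATEGIES:
            print(name, strategy, set_op(join, left, right, strategy))
```

All three strategies return the same rows in this model. The difference is how many rows flow into the set operation: with `after_all`, all six left rows are processed (and, in a distributed plan, potentially exchanged) before being deduplicated. With `leftmost_only`, only the three distinct left rows are processed, which is the kind of saving that matters for inputs with many duplicates.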

In general, DISTINCT placement such as in this case should be done
throughout the entire plan tree in a cost-based manner, as described in
IMPALA-5260.

Testing:
 * TpcdsPlannerTest
 * PlannerTest
 * TPC-DS 30TB Perf run for any affected queries
   - Q14-1 180s -> 150s
   - Q14-2 109s -> 90s
   - Q8 no significant change
 * SetOperation Planner Tests
 * Analyzer tests
 * Tpcds Functional Workload

Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5
Reviewed-on: http://gerrit.cloudera.org:8080/16350
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Tim Armstrong <ta...@cloudera.com>


> Push down DISTINCT aggregation for EXCEPT/INTERSECT
> ---------------------------------------------------
>
>                 Key: IMPALA-10099
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10099
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Shant Hovsepian
>            Assignee: Shant Hovsepian
>            Priority: Major
>
> The implementation of SetOperations for EXCEPT/INTERSECT in IMPALA-9943 produced query rewrites that would apply the DISTINCT aggregation after exchanges for distributed plans. In cases where the query can be directly rewritten to apply the DISTINCT to the set operation operands, doing so would result in better performance for most large queries.
> This should help the performance of TPC-DS Q14, which does an INTERSECT of queries with large result sets that contain many duplicates.
> In general, it would be better to have an optimization phase during planning that moves DISTINCT around the plan, which would handle this case as well as many others.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
