Posted to issues@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/12/15 19:58:00 UTC

[jira] [Resolved] (IMPALA-1728) sub-query with duplicate values used IN conditional operator should discard the duplicate values before applying the operator

     [ https://issues.apache.org/jira/browse/IMPALA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-1728.
-----------------------------------
    Resolution: Duplicate

IMPALA-1270 implemented this and changed the plan for TPC-DS Q95

> sub-query with duplicate values used IN conditional operator should discard the duplicate values before applying the operator
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-1728
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1728
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>    Affects Versions: Impala 2.0, Impala 2.1
>            Reporter: Dileep Kumar
>            Priority: Minor
>              Labels: performance, planner, tpc-ds
>         Attachments: q95.sql, q95.sql.DISTINCT
>
>
> When running TPC-DS Q95, we found that it uses the result of a CTE in an IN condition later in the query.
> In this case the CTE generates many duplicate values for the column used in the condition. When we applied DISTINCT to the CTE, the query took about 40% less time to complete.
> The timings (in seconds) are:
> Without DISTINCT : 1240
> With DISTINCT : 728
> Both versions of the query are attached.
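For readers without the attachments, the following is a minimal sketch of the query shape described above, not the actual Q95 text. It runs on SQLite through Python's standard sqlite3 module rather than on Impala, and the table and column names (sales, returned_orders, order_number) are hypothetical. It only illustrates why the rewrite is safe: an IN condition tests membership, so deduplicating the CTE does not change the result. It says nothing about the timings reported above.

```python
import sqlite3

# In-memory SQLite database standing in for the real warehouse (assumption:
# SQLite is used only to illustrate semantics, not Impala's planner behavior).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (order_number INTEGER, amount INTEGER);
    CREATE TABLE returned_orders (order_number INTEGER);
    INSERT INTO sales VALUES (1, 10), (2, 20), (3, 30);
    -- Many duplicate values in the column that feeds the IN condition.
    INSERT INTO returned_orders VALUES (1), (1), (1), (3), (3);
""")

# Hypothetical query shape: a CTE whose result is used in an IN condition.
WITHOUT_DISTINCT = """
    WITH r AS (SELECT order_number FROM returned_orders)
    SELECT order_number, amount
    FROM sales
    WHERE order_number IN (SELECT order_number FROM r)
    ORDER BY order_number
"""

# Same query, with the duplicates discarded inside the CTE.
WITH_DISTINCT = """
    WITH r AS (SELECT DISTINCT order_number FROM returned_orders)
    SELECT order_number, amount
    FROM sales
    WHERE order_number IN (SELECT order_number FROM r)
    ORDER BY order_number
"""

rows_plain = conn.execute(WITHOUT_DISTINCT).fetchall()
rows_distinct = conn.execute(WITH_DISTINCT).fetchall()

# Both forms select the same rows; only the size of the IN input differs.
print(rows_plain == rows_distinct)
```

Because the two forms are equivalent, a planner can in principle apply the deduplication itself, which is the behavior this issue requests.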



--
This message was sent by Atlassian Jira
(v8.3.4#803005)