Posted to issues@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/12/15 19:58:00 UTC
[jira] [Resolved] (IMPALA-1728) sub-query with duplicate values used IN conditional operator should discard the duplicate values before applying the operator
[ https://issues.apache.org/jira/browse/IMPALA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-1728.
-----------------------------------
Resolution: Duplicate
IMPALA-1270 implemented this and changed the plan for TPC-DS Q95.
> sub-query with duplicate values used IN conditional operator should discard the duplicate values before applying the operator
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-1728
> URL: https://issues.apache.org/jira/browse/IMPALA-1728
> Project: IMPALA
> Issue Type: New Feature
> Components: Frontend
> Affects Versions: Impala 2.0, Impala 2.1
> Reporter: Dileep Kumar
> Priority: Minor
> Labels: performance, planner, tpc-ds
> Attachments: q95.sql, q95.sql.DISTINCT
>
>
> When running TPC-DS Q95, we found that it uses the result of a CTE in an IN condition later in the query.
> In this case the CTE generates many duplicate values for the column that is used in the condition. When we applied DISTINCT to the CTE, the query took about 40% less time to complete.
> The timings (in seconds) are:
> Without DISTINCT : 1240
> With DISTINCT : 728
> Both versions of the query are attached.
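
The rewrite described above can be illustrated with a minimal sketch. It uses Python's sqlite3 module rather than Impala, and the tables `orders` and `shipments` and the CTE name `multi_wh` are hypothetical stand-ins, not taken from the attached q95.sql. It only shows that adding DISTINCT shrinks the CTE while leaving the IN result unchanged; it says nothing about Impala's plans or timings.

```python
import sqlite3

# Hypothetical CTE body: a self-join that emits the same order_id once per
# pair of differing warehouses, so popular keys are heavily duplicated.
CTE_BODY = """
    SELECT {distinct} s1.order_id AS order_id
    FROM shipments s1
    JOIN shipments s2
      ON s1.order_id = s2.order_id
     AND s1.warehouse <> s2.warehouse
"""


def build_demo_db():
    """Create a tiny in-memory database with made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_id INTEGER);
        CREATE TABLE shipments (order_id INTEGER, warehouse INTEGER);
        INSERT INTO orders VALUES (1), (2), (3);
        INSERT INTO shipments VALUES
            (1, 10), (1, 20), (1, 30),
            (2, 10),
            (3, 10), (3, 20);
    """)
    return conn


def cte_row_count(conn, distinct):
    """Number of rows the CTE produces; distinct is '' or 'DISTINCT'."""
    sql = (
        "WITH multi_wh AS (" + CTE_BODY.format(distinct=distinct) + ") "
        "SELECT COUNT(*) FROM multi_wh"
    )
    return conn.execute(sql).fetchone()[0]


def matching_orders(conn, distinct):
    """Orders whose id appears in the CTE, filtered with IN."""
    sql = (
        "WITH multi_wh AS (" + CTE_BODY.format(distinct=distinct) + ") "
        "SELECT order_id FROM orders "
        "WHERE order_id IN (SELECT order_id FROM multi_wh) "
        "ORDER BY order_id"
    )
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = build_demo_db()
    for distinct in ("", "DISTINCT"):
        label = distinct or "no DISTINCT"
        print(label, cte_row_count(conn, distinct), matching_orders(conn, distinct))
```

Because IN only tests membership, duplicate values on the subquery side cannot change which rows qualify, which is why the deduplicated form is a safe rewrite in this sketch.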
--
This message was sent by Atlassian Jira
(v8.3.4#803005)