Posted to issues@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2021/02/12 18:05:00 UTC
[jira] [Resolved] (IMPALA-4805) Avoid hash exchanges before analytic functions in more situations.
[ https://issues.apache.org/jira/browse/IMPALA-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Sinha resolved IMPALA-4805.
--------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed
> Avoid hash exchanges before analytic functions in more situations.
> ------------------------------------------------------------------
>
> Key: IMPALA-4805
> URL: https://issues.apache.org/jira/browse/IMPALA-4805
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 2.8.0
> Reporter: Alexander Behm
> Assignee: Aman Sinha
> Priority: Major
> Labels: performance, ramp-up
> Fix For: Impala 4.0
>
>
> This case works as expected. There is no hash exchange before sort+analytic:
> {code}
> explain select /* +straight_join */ count(*) over (partition by t1.id)
> from functional.alltypes t1
> inner join /* +shuffle */ functional.alltypes t2
> on t1.id = t2.id
> +-----------------------------------------------------------+
> | Explain String |
> +-----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=168.01MB VCores=2 |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 07:EXCHANGE [UNPARTITIONED] |
> | | |
> | 04:ANALYTIC |
> | | functions: count(*) |
> | | partition by: t1.id |
> | | |
> | 03:SORT |
> | | order by: id ASC NULLS FIRST |
> | | |
> | 02:HASH JOIN [INNER JOIN, PARTITIONED] |
> | | hash predicates: t1.id = t2.id |
> | | runtime filters: RF000 <- t2.id |
> | | |
> | |--06:EXCHANGE [HASH(t2.id)] |
> | | | |
> | | 01:SCAN HDFS [functional.alltypes t2] |
> | | partitions=24/24 files=24 size=478.45KB |
> | | |
> | 05:EXCHANGE [HASH(t1.id)] |
> | | |
> | 00:SCAN HDFS [functional.alltypes t1] |
> | partitions=24/24 files=24 size=478.45KB |
> | runtime filters: RF000 -> t1.id |
> +-----------------------------------------------------------+
> {code}
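> This equivalent case partitions by t2.id instead of t1.id. Because the inner join predicate is t1.id = t2.id, the join output is already hash-partitioned on a value equal to t2.id, yet the plan has an unnecessary hash exchange (07) before the sort: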
> {code}
> explain select /* +straight_join */ count(*) over (partition by t2.id)
> from functional.alltypes t1
> inner join /* +shuffle */ functional.alltypes t2
> on t1.id = t2.id
> +-----------------------------------------------------------+
> | Explain String |
> +-----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=168.01MB VCores=3 |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 08:EXCHANGE [UNPARTITIONED] |
> | | |
> | 04:ANALYTIC |
> | | functions: count(*) |
> | | partition by: t2.id |
> | | |
> | 03:SORT |
> | | order by: id ASC NULLS FIRST |
> | | |
> | 07:EXCHANGE [HASH(t2.id)] |
> | | |
> | 02:HASH JOIN [INNER JOIN, PARTITIONED] |
> | | hash predicates: t1.id = t2.id |
> | | runtime filters: RF000 <- t2.id |
> | | |
> | |--06:EXCHANGE [HASH(t2.id)] |
> | | | |
> | | 01:SCAN HDFS [functional.alltypes t2] |
> | | partitions=24/24 files=24 size=478.45KB |
> | | |
> | 05:EXCHANGE [HASH(t1.id)] |
> | | |
> | 00:SCAN HDFS [functional.alltypes t1] |
> | partitions=24/24 files=24 size=478.45KB |
> | runtime filters: RF000 -> t1.id |
> +-----------------------------------------------------------+
> {code}
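The reasoning behind the two plans above can be sketched in a few lines of Python. This is only an illustration of the equivalence argument, not Impala's planner code: the class `EquivalenceClasses` and the function `needs_hash_exchange` are hypothetical names, columns are plain strings, and the sketch assumes inner equi-joins only (with an outer join, unmatched rows carry NULLs on one side and the two columns are no longer interchangeable).

```python
class EquivalenceClasses:
    """Union-find over column names, built from inner equi-join predicates."""

    def __init__(self):
        self._parent = {}

    def find(self, col):
        # Register unseen columns as their own class.
        self._parent.setdefault(col, col)
        root = col
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[col] != root:
            self._parent[col], col = root, self._parent[col]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self._parent[ra] = rb


def needs_hash_exchange(input_partition_cols, required_partition_cols, equi_join_preds):
    """Return True if the input must be re-shuffled before the analytic.

    Hypothetical helper. The input is assumed to be hash-partitioned on
    input_partition_cols; the analytic needs rows with equal
    required_partition_cols values on the same node.
    """
    eq = EquivalenceClasses()
    for left, right in equi_join_preds:
        eq.union(left, right)
    have = {eq.find(c) for c in input_partition_cols}
    need = {eq.find(c) for c in required_partition_cols}
    # If every column the data is already partitioned on is equivalent to
    # some PARTITION BY column, rows with equal PARTITION BY values are
    # already co-located and no further exchange is needed.
    return not (bool(have) and have <= need)


preds = [("t1.id", "t2.id")]
print(needs_hash_exchange(["t1.id"], ["t1.id"], preds))
print(needs_hash_exchange(["t1.id"], ["t2.id"], preds))
```

Under these assumptions both calls report that no exchange is needed, which matches the expectation in this issue that `partition by t2.id` should plan the same way as `partition by t1.id`.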
--
This message was sent by Atlassian Jira
(v8.3.4#803005)