You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2021/02/12 18:05:00 UTC

[jira] [Resolved] (IMPALA-4805) Avoid hash exchanges before analytic functions in more situations.

     [ https://issues.apache.org/jira/browse/IMPALA-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha resolved IMPALA-4805.
--------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> Avoid hash exchanges before analytic functions in more situations.
> ------------------------------------------------------------------
>
>                 Key: IMPALA-4805
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4805
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.8.0
>            Reporter: Alexander Behm
>            Assignee: Aman Sinha
>            Priority: Major
>              Labels: performance, ramp-up
>             Fix For: Impala 4.0
>
>
> This case works as expected. There is no no hash exchange before sort+analytic:
> {code}
> explain select /* +straight_join */ count(*) over (partition by t1.id)
> from functional.alltypes t1
> inner join /* +shuffle */ functional.alltypes t2
>   on t1.id = t2.id
> +-----------------------------------------------------------+
> | Explain String                                            |
> +-----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=168.01MB VCores=2 |
> |                                                           |
> | PLAN-ROOT SINK                                            |
> | |                                                         |
> | 07:EXCHANGE [UNPARTITIONED]                               |
> | |                                                         |
> | 04:ANALYTIC                                               |
> | |  functions: count(*)                                    |
> | |  partition by: t1.id                                    |
> | |                                                         |
> | 03:SORT                                                   |
> | |  order by: id ASC NULLS FIRST                           |
> | |                                                         |
> | 02:HASH JOIN [INNER JOIN, PARTITIONED]                    |
> | |  hash predicates: t1.id = t2.id                         |
> | |  runtime filters: RF000 <- t2.id                        |
> | |                                                         |
> | |--06:EXCHANGE [HASH(t2.id)]                              |
> | |  |                                                      |
> | |  01:SCAN HDFS [functional.alltypes t2]                  |
> | |     partitions=24/24 files=24 size=478.45KB             |
> | |                                                         |
> | 05:EXCHANGE [HASH(t1.id)]                                 |
> | |                                                         |
> | 00:SCAN HDFS [functional.alltypes t1]                     |
> |    partitions=24/24 files=24 size=478.45KB                |
> |    runtime filters: RF000 -> t1.id                        |
> +-----------------------------------------------------------+
> {code}
> This equivalent case has an unnecessary hash exchange:
> {code}
> explain select /* +straight_join */ count(*) over (partition by t2.id)
> from functional.alltypes t1
> inner join /* +shuffle */ functional.alltypes t2
>   on t1.id = t2.id
> +-----------------------------------------------------------+
> | Explain String                                            |
> +-----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=168.01MB VCores=3 |
> |                                                           |
> | PLAN-ROOT SINK                                            |
> | |                                                         |
> | 08:EXCHANGE [UNPARTITIONED]                               |
> | |                                                         |
> | 04:ANALYTIC                                               |
> | |  functions: count(*)                                    |
> | |  partition by: t2.id                                    |
> | |                                                         |
> | 03:SORT                                                   |
> | |  order by: id ASC NULLS FIRST                           |
> | |                                                         |
> | 07:EXCHANGE [HASH(t2.id)]                                 |
> | |                                                         |
> | 02:HASH JOIN [INNER JOIN, PARTITIONED]                    |
> | |  hash predicates: t1.id = t2.id                         |
> | |  runtime filters: RF000 <- t2.id                        |
> | |                                                         |
> | |--06:EXCHANGE [HASH(t2.id)]                              |
> | |  |                                                      |
> | |  01:SCAN HDFS [functional.alltypes t2]                  |
> | |     partitions=24/24 files=24 size=478.45KB             |
> | |                                                         |
> | 05:EXCHANGE [HASH(t1.id)]                                 |
> | |                                                         |
> | 00:SCAN HDFS [functional.alltypes t1]                     |
> |    partitions=24/24 files=24 size=478.45KB                |
> |    runtime filters: RF000 -> t1.id                        |
> +-----------------------------------------------------------+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)