You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Tianyi Wang (JIRA)" <ji...@apache.org> on 2017/11/21 01:42:00 UTC
[jira] [Resolved] (IMPALA-6213) The partitioning compatibility check is wrong in consecutive outer join cases

     [ https://issues.apache.org/jira/browse/IMPALA-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianyi Wang resolved IMPALA-6213.
---------------------------------
       Resolution: Resolved
    Fix Version/s: Impala 2.11.0

> The partitioning compatibility check is wrong in consecutive outer join cases
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-6213
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6213
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
>            Reporter: Tianyi Wang
>            Assignee: Tianyi Wang
>            Priority: Blocker
>              Labels: correctness
>             Fix For: Impala 2.11.0
>
>
> Currently createAnalyticFragment() and createMergeAggregationFragment() uses child fragment partitioning info and refsNullableTupleId() to determine whether the child fragment partitioning can be directly adapted to the parent fragment without an extra exchange.
> It is wrong because:
> # The output partition of an outer join node is always assigned its lhs input partition, which is not correct for full/right outer joins.
> # refsNullableTupleId() seems to be designed to handle the outer join case, but can be broken by 2 consecutive joins.
> Given the query
> {noformat}
> select /* +straight_join */ t2.id, count(*)
> from functional.alltypes t1
> left outer join /* +shuffle */ functional.alltypessmall t2
>   on t1.int_col = t2.int_col
> right outer join /* +shuffle */ functional.alltypestiny t3
>   on t2.id = t3.id
> group by t2.id
> {noformat}
> impala@3ddafcd29505614a01c8f4362396635c84ab4052 generates the following plan:
> {noformat}
> +--------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=5.81MB |
> | Per-Host Resource Estimates: Memory=205.88MB     |
> | Codegen disabled by planner                      |
> |                                                  |
> | PLAN-ROOT SINK                                   |
> | |                                                |
> | 10:EXCHANGE [UNPARTITIONED]                      |
> | |                                                |
> | 05:AGGREGATE [FINALIZE]                          |
> | |  output: count(*)                              |
> | |  group by: t2.id                               |
> | |                                                |
> | 04:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]     |
> | |  hash predicates: t2.id = t3.id                |
> | |  runtime filters: RF000 <- t3.id               |
> | |                                                |
> | |--09:EXCHANGE [HASH(t3.id)]                     |
> | |  |                                             |
> | |  02:SCAN HDFS [functional.alltypestiny t3]     |
> | |     partitions=4/4 files=4 size=460B           |
> | |                                                |
> | 08:EXCHANGE [HASH(t2.id)]                        |
> | |                                                |
> | 03:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]      |
> | |  hash predicates: t1.int_col = t2.int_col      |
> | |                                                |
> | |--07:EXCHANGE [HASH(t2.int_col)]                |
> | |  |                                             |
> | |  01:SCAN HDFS [functional.alltypessmall t2]    |
> | |     partitions=4/4 files=4 size=6.32KB         |
> | |     runtime filters: RF000 -> t2.id            |
> | |                                                |
> | 06:EXCHANGE [HASH(t1.int_col)]                   |
> | |                                                |
> | 00:SCAN HDFS [functional.alltypes t1]            |
> |    partitions=24/24 files=24 size=478.45KB       |
> +--------------------------------------------------+
> {noformat}, which is wrong because the rows with t2.id=null can appear in any partition after the outer join. So it's incorrect to aggregate without an exchange.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)