You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/06/06 02:14:00 UTC

[jira] [Commented] (IMPALA-10182) Rows with NULLs filtered out with duplicate columns in subquery select inside UNION ALL

    [ https://issues.apache.org/jira/browse/IMPALA-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550264#comment-17550264 ] 

ASF subversion and git services commented on IMPALA-10182:
----------------------------------------------------------

Commit 13bbff4e4e5fc5d459cc6f7a5512f84ceba897cd in impala's branch refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=13bbff4e4 ]

IMPALA-11323: Don't evaluate constants-only inferred predicates

IMPALA-10182 fixed the problem of creating inferred predicates when
both sides of an equality predicate came from the same slot.

Inferred predicates also should not be created when both sides
of an equality predicate are constant values which do not have
scan slots.

Change-Id: If1cd4559dda406d2d38703ed594b70b41ed336fd
Reviewed-on: http://gerrit.cloudera.org:8080/18579
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Aman Sinha <am...@cloudera.com>


> Rows with NULLs filtered out with duplicate columns in subquery select inside UNION ALL
> ---------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10182
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10182
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Tim Armstrong
>            Assignee: Aman Sinha
>            Priority: Blocker
>              Labels: correctness
>             Fix For: Impala 4.0.0
>
>
> Bug report from here - https://community.cloudera.com/t5/Support-Questions/quot-union-all-quot-dropping-records-with-all-null-empty/m-p/303153#M221415
> Repro:
> {noformat}
> create database if not exists as_adventure;
> use as_adventure;
> CREATE tABLE IF NOT EXISTS
>     as_adventure.t1 
>     ( 
>         productsubcategorykey INT, 
>         productline STRING);
> insert into t1 values (1,'l1');
> insert into t1 values (2,'l1');
> insert into t1 values (1,'l2');
> insert into t1 values (3,'l3');
> insert into t1 values (null,'');
> select * from t1; 
> SELECT
>     MIN(t_53.c_41)       c_41,
>     CAST(NULL AS DOUBLE) c_43,
>     CAST(NULL AS BIGINT) c_44,
>     t_53.c2              c2,
>     t_53.c3s0            c3s0,
>     t_53.c4              c4,
>     t_53.c5s0            c5s0
> FROM
>     (   SELECT
>             t.productsubcategorykey c_41,
>             t.productline           c2,
>             t.productline           c3s0,
>             t.productsubcategorykey c4,
>             t.productsubcategorykey c5s0
>         FROM
>             as_adventure.t1 t
>         WHERE
>             true
>         GROUP BY
>             2,
>             3,
>             4,
>             5 ) t_53
> GROUP BY
>     4,
>     5,
>     6,
>     7
>  
> UNION ALL
> SELECT
>     MIN(t_53.c_41)       c_41,
>     CAST(NULL AS DOUBLE) c_43,
>     CAST(NULL AS BIGINT) c_44,
>     t_53.c2              c2,
>     t_53.c3s0            c3s0,
>     t_53.c4              c4,
>     t_53.c5s0            c5s0
> FROM
>     (   SELECT
>             t.productsubcategorykey c_41,
>             t.productline           c2,
>             t.productline           c3s0,
>             t.productsubcategorykey c4,
>             t.productsubcategorykey c5s0
>         FROM
>             as_adventure.t1 t
>         WHERE
>             true
>         GROUP BY
>             2,
>             3,
>             4,
>             5 ) t_53
> GROUP BY
>     4,
>     5,
>     6,
>     7
> {noformat}
> Somewhat similar to IMPALA-7957 in that the inferred predicates from the column equivalences get placed in a Select node. It's a bit different in that the NULLs that are filtered out from the predicates come from the base table.
> {noformat}
> +------------------------------------------------------------------------------------------------+
> | Explain String                                                                                 |
> +------------------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=136.02MB Threads=6                                   |
> | Per-Host Resource Estimates: Memory=576MB                                                      |
> | WARNING: The following tables are missing relevant table and/or column statistics.             |
> | as_adventure.t1                                                                                |
> |                                                                                                |
> | PLAN-ROOT SINK                                                                                 |
> | |                                                                                              |
> | 13:EXCHANGE [UNPARTITIONED]                                                                    |
> | |                                                                                              |
> | 00:UNION                                                                                       |
> | |  row-size=52B cardinality=2                                                                  |
> | |                                                                                              |
> | |--08:SELECT                                                                                   |
> | |  |  predicates: t_53.c4 = t_53.c5s0, t_53.c2 = t_53.c3s0                                     |
> | |  |  row-size=36B cardinality=1                                                               |
> | |  |                                                                                           |
> | |  07:AGGREGATE [FINALIZE]                                                                     |
> | |  |  output: min(t.productsubcategorykey)                                                     |
> | |  |  group by: t.productline, t.productline, t.productsubcategorykey, t.productsubcategorykey |
> | |  |  row-size=36B cardinality=2                                                               |
> | |  |                                                                                           |
> | |  12:AGGREGATE [FINALIZE]                                                                     |
> | |  |  group by: t.productline, t.productsubcategorykey                                         |
> | |  |  row-size=16B cardinality=2                                                               |
> | |  |                                                                                           |
> | |  11:EXCHANGE [HASH(t.productline,t.productsubcategorykey)]                                   |
> | |  |                                                                                           |
> | |  06:AGGREGATE [STREAMING]                                                                    |
> | |  |  group by: t.productline, t.productsubcategorykey                                         |
> | |  |  row-size=16B cardinality=2                                                               |
> | |  |                                                                                           |
> | |  05:SCAN HDFS [as_adventure.t1 t]                                                            |
> | |     HDFS partitions=1/1 files=5 size=24B                                                     |
> | |     row-size=16B cardinality=2                                                               |
> | |                                                                                              |
> | 04:SELECT                                                                                      |
> | |  predicates: t_53.c4 = t_53.c5s0, t_53.c2 = t_53.c3s0                                        |
> | |  row-size=36B cardinality=1                                                                  |
> | |                                                                                              |
> | 03:AGGREGATE [FINALIZE]                                                                        |
> | |  output: min(t.productsubcategorykey)                                                        |
> | |  group by: t.productline, t.productline, t.productsubcategorykey, t.productsubcategorykey    |
> | |  row-size=36B cardinality=2                                                                  |
> | |                                                                                              |
> | 10:AGGREGATE [FINALIZE]                                                                        |
> | |  group by: t.productline, t.productsubcategorykey                                            |
> | |  row-size=16B cardinality=2                                                                  |
> | |                                                                                              |
> | 09:EXCHANGE [HASH(t.productline,t.productsubcategorykey)]                                      |
> | |                                                                                              |
> | 02:AGGREGATE [STREAMING]                                                                       |
> | |  group by: t.productline, t.productsubcategorykey                                            |
> | |  row-size=16B cardinality=2                                                                  |
> | |                                                                                              |
> | 01:SCAN HDFS [as_adventure.t1 t]                                                               |
> |    HDFS partitions=1/1 files=5 size=24B                                                        |
> |    row-size=16B cardinality=2                                                                  |
> +------------------------------------------------------------------------------------------------+
> {noformat}
> This query is a simplified version of the original one:
> {noformat}
> SELECT
>     t_53.c4              c4,
>     t_53.c5s0            c5s0
> FROM
>     (   SELECT
>             t.productsubcategorykey c4,
>             t.productsubcategorykey c5s0
>         FROM
>             as_adventure.t1 t
>         GROUP BY
>             1,
>             2 ) t_53
> GROUP BY 1,2
> UNION ALL
> SELECT
>     t_53.c4              c4,
>     t_53.c5s0            c5s0
> FROM
>     (   SELECT
>             t.productsubcategorykey c4,
>             t.productsubcategorykey c5s0
>         FROM
>             as_adventure.t1 t
>         GROUP BY
>             1, 2 ) t_53
> GROUP BY 1, 2;
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org