You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2021/01/04 00:13:00 UTC
[jira] [Commented] (IMPALA-10182) Rows with NULLs filtered out with
duplicate columns in subquery select inside UNION ALL
[ https://issues.apache.org/jira/browse/IMPALA-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257880#comment-17257880 ]
Aman Sinha commented on IMPALA-10182:
-------------------------------------
[~tarmstrong][~stigahuang] thanks for your analysis. I only recently got around to looking at this and have posted a patch: https://gerrit.cloudera.org/c/16917/ . It is along the lines of the above discussion although I did not need to create the IS NOT DISTINCT predicate .. I simply did not add such inferred identity conjuncts to the select node.
> Rows with NULLs filtered out with duplicate columns in subquery select inside UNION ALL
> ---------------------------------------------------------------------------------------
>
> Key: IMPALA-10182
> URL: https://issues.apache.org/jira/browse/IMPALA-10182
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Tim Armstrong
> Assignee: Aman Sinha
> Priority: Blocker
> Labels: correctness
>
> Bug report from here - https://community.cloudera.com/t5/Support-Questions/quot-union-all-quot-dropping-records-with-all-null-empty/m-p/303153#M221415
> Repro:
> {noformat}
> create database if not exists as_adventure;
> use as_adventure;
> CREATE tABLE IF NOT EXISTS
> as_adventure.t1
> (
> productsubcategorykey INT,
> productline STRING);
> insert into t1 values (1,'l1');
> insert into t1 values (2,'l1');
> insert into t1 values (1,'l2');
> insert into t1 values (3,'l3');
> insert into t1 values (null,'');
> select * from t1;
> SELECT
> MIN(t_53.c_41) c_41,
> CAST(NULL AS DOUBLE) c_43,
> CAST(NULL AS BIGINT) c_44,
> t_53.c2 c2,
> t_53.c3s0 c3s0,
> t_53.c4 c4,
> t_53.c5s0 c5s0
> FROM
> ( SELECT
> t.productsubcategorykey c_41,
> t.productline c2,
> t.productline c3s0,
> t.productsubcategorykey c4,
> t.productsubcategorykey c5s0
> FROM
> as_adventure.t1 t
> WHERE
> true
> GROUP BY
> 2,
> 3,
> 4,
> 5 ) t_53
> GROUP BY
> 4,
> 5,
> 6,
> 7
>
> UNION ALL
> SELECT
> MIN(t_53.c_41) c_41,
> CAST(NULL AS DOUBLE) c_43,
> CAST(NULL AS BIGINT) c_44,
> t_53.c2 c2,
> t_53.c3s0 c3s0,
> t_53.c4 c4,
> t_53.c5s0 c5s0
> FROM
> ( SELECT
> t.productsubcategorykey c_41,
> t.productline c2,
> t.productline c3s0,
> t.productsubcategorykey c4,
> t.productsubcategorykey c5s0
> FROM
> as_adventure.t1 t
> WHERE
> true
> GROUP BY
> 2,
> 3,
> 4,
> 5 ) t_53
> GROUP BY
> 4,
> 5,
> 6,
> 7
> {noformat}
> Somewhat similar to IMPALA-7957 in that the inferred predicates from the column equivalences get placed in a Select node. It's a bit different in that the NULLs that are filtered out from the predicates come from the base table.
> {noformat}
> +------------------------------------------------------------------------------------------------+
> | Explain String |
> +------------------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=136.02MB Threads=6 |
> | Per-Host Resource Estimates: Memory=576MB |
> | WARNING: The following tables are missing relevant table and/or column statistics. |
> | as_adventure.t1 |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 13:EXCHANGE [UNPARTITIONED] |
> | | |
> | 00:UNION |
> | | row-size=52B cardinality=2 |
> | | |
> | |--08:SELECT |
> | | | predicates: t_53.c4 = t_53.c5s0, t_53.c2 = t_53.c3s0 |
> | | | row-size=36B cardinality=1 |
> | | | |
> | | 07:AGGREGATE [FINALIZE] |
> | | | output: min(t.productsubcategorykey) |
> | | | group by: t.productline, t.productline, t.productsubcategorykey, t.productsubcategorykey |
> | | | row-size=36B cardinality=2 |
> | | | |
> | | 12:AGGREGATE [FINALIZE] |
> | | | group by: t.productline, t.productsubcategorykey |
> | | | row-size=16B cardinality=2 |
> | | | |
> | | 11:EXCHANGE [HASH(t.productline,t.productsubcategorykey)] |
> | | | |
> | | 06:AGGREGATE [STREAMING] |
> | | | group by: t.productline, t.productsubcategorykey |
> | | | row-size=16B cardinality=2 |
> | | | |
> | | 05:SCAN HDFS [as_adventure.t1 t] |
> | | HDFS partitions=1/1 files=5 size=24B |
> | | row-size=16B cardinality=2 |
> | | |
> | 04:SELECT |
> | | predicates: t_53.c4 = t_53.c5s0, t_53.c2 = t_53.c3s0 |
> | | row-size=36B cardinality=1 |
> | | |
> | 03:AGGREGATE [FINALIZE] |
> | | output: min(t.productsubcategorykey) |
> | | group by: t.productline, t.productline, t.productsubcategorykey, t.productsubcategorykey |
> | | row-size=36B cardinality=2 |
> | | |
> | 10:AGGREGATE [FINALIZE] |
> | | group by: t.productline, t.productsubcategorykey |
> | | row-size=16B cardinality=2 |
> | | |
> | 09:EXCHANGE [HASH(t.productline,t.productsubcategorykey)] |
> | | |
> | 02:AGGREGATE [STREAMING] |
> | | group by: t.productline, t.productsubcategorykey |
> | | row-size=16B cardinality=2 |
> | | |
> | 01:SCAN HDFS [as_adventure.t1 t] |
> | HDFS partitions=1/1 files=5 size=24B |
> | row-size=16B cardinality=2 |
> +------------------------------------------------------------------------------------------------+
> {noformat}
> This query is a simplified version of the original one:
> {noformat}
> SELECT
> t_53.c4 c4,
> t_53.c5s0 c5s0
> FROM
> ( SELECT
> t.productsubcategorykey c4,
> t.productsubcategorykey c5s0
> FROM
> as_adventure.t1 t
> GROUP BY
> 1,
> 2 ) t_53
> GROUP BY 1,2
> UNION ALL
> SELECT
> t_53.c4 c4,
> t_53.c5s0 c5s0
> FROM
> ( SELECT
> t.productsubcategorykey c4,
> t.productsubcategorykey c5s0
> FROM
> as_adventure.t1 t
> GROUP BY
> 1, 2 ) t_53
> GROUP BY 1, 2;
> {noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org