You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/09/21 17:03:00 UTC
[jira] [Created] (IMPALA-10182) Rows with NULLs filtered out with
duplicate columns in subquery select inside UNION ALL
Tim Armstrong created IMPALA-10182:
--------------------------------------
Summary: Rows with NULLs filtered out with duplicate columns in subquery select inside UNION ALL
Key: IMPALA-10182
URL: https://issues.apache.org/jira/browse/IMPALA-10182
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Tim Armstrong
Assignee: Aman Sinha
Bug report from here - https://community.cloudera.com/t5/Support-Questions/quot-union-all-quot-dropping-records-with-all-null-empty/m-p/303153#M221415
Repro:
{noformat}
create database if not exists as_adventure;
use as_adventure;
CREATE tABLE IF NOT EXISTS
as_adventure.t1
(
productsubcategorykey INT,
productline STRING);
insert into t1 values (1,'l1');
insert into t1 values (2,'l1');
insert into t1 values (1,'l2');
insert into t1 values (3,'l3');
insert into t1 values (null,'');
select * from t1;
SELECT
MIN(t_53.c_41) c_41,
CAST(NULL AS DOUBLE) c_43,
CAST(NULL AS BIGINT) c_44,
t_53.c2 c2,
t_53.c3s0 c3s0,
t_53.c4 c4,
t_53.c5s0 c5s0
FROM
( SELECT
t.productsubcategorykey c_41,
t.productline c2,
t.productline c3s0,
t.productsubcategorykey c4,
t.productsubcategorykey c5s0
FROM
as_adventure.t1 t
WHERE
true
GROUP BY
2,
3,
4,
5 ) t_53
GROUP BY
4,
5,
6,
7
UNION ALL
SELECT
MIN(t_53.c_41) c_41,
CAST(NULL AS DOUBLE) c_43,
CAST(NULL AS BIGINT) c_44,
t_53.c2 c2,
t_53.c3s0 c3s0,
t_53.c4 c4,
t_53.c5s0 c5s0
FROM
( SELECT
t.productsubcategorykey c_41,
t.productline c2,
t.productline c3s0,
t.productsubcategorykey c4,
t.productsubcategorykey c5s0
FROM
as_adventure.t1 t
WHERE
true
GROUP BY
2,
3,
4,
5 ) t_53
GROUP BY
4,
5,
6,
7
{noformat}
Somewhat similar to IMPALA-7957 in that the inferred predicates from the column equivalences get placed in a Select node. It's a bit different in that the NULLs that are filtered out from the predicates come from the base table.
{noformat}
+------------------------------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=136.02MB Threads=6 |
| Per-Host Resource Estimates: Memory=576MB |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| as_adventure.t1 |
| |
| PLAN-ROOT SINK |
| | |
| 13:EXCHANGE [UNPARTITIONED] |
| | |
| 00:UNION |
| | row-size=52B cardinality=2 |
| | |
| |--08:SELECT |
| | | predicates: t_53.c4 = t_53.c5s0, t_53.c2 = t_53.c3s0 |
| | | row-size=36B cardinality=1 |
| | | |
| | 07:AGGREGATE [FINALIZE] |
| | | output: min(t.productsubcategorykey) |
| | | group by: t.productline, t.productline, t.productsubcategorykey, t.productsubcategorykey |
| | | row-size=36B cardinality=2 |
| | | |
| | 12:AGGREGATE [FINALIZE] |
| | | group by: t.productline, t.productsubcategorykey |
| | | row-size=16B cardinality=2 |
| | | |
| | 11:EXCHANGE [HASH(t.productline,t.productsubcategorykey)] |
| | | |
| | 06:AGGREGATE [STREAMING] |
| | | group by: t.productline, t.productsubcategorykey |
| | | row-size=16B cardinality=2 |
| | | |
| | 05:SCAN HDFS [as_adventure.t1 t] |
| | HDFS partitions=1/1 files=5 size=24B |
| | row-size=16B cardinality=2 |
| | |
| 04:SELECT |
| | predicates: t_53.c4 = t_53.c5s0, t_53.c2 = t_53.c3s0 |
| | row-size=36B cardinality=1 |
| | |
| 03:AGGREGATE [FINALIZE] |
| | output: min(t.productsubcategorykey) |
| | group by: t.productline, t.productline, t.productsubcategorykey, t.productsubcategorykey |
| | row-size=36B cardinality=2 |
| | |
| 10:AGGREGATE [FINALIZE] |
| | group by: t.productline, t.productsubcategorykey |
| | row-size=16B cardinality=2 |
| | |
| 09:EXCHANGE [HASH(t.productline,t.productsubcategorykey)] |
| | |
| 02:AGGREGATE [STREAMING] |
| | group by: t.productline, t.productsubcategorykey |
| | row-size=16B cardinality=2 |
| | |
| 01:SCAN HDFS [as_adventure.t1 t] |
| HDFS partitions=1/1 files=5 size=24B |
| row-size=16B cardinality=2 |
+------------------------------------------------------------------------------------------------+
{noformat}
[~amansinha] [~superdupershant] [~stigahuang]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org