You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/07/11 17:26:00 UTC

[jira] [Created] (IMPALA-9949) Subqueries in select can result in rows not being returned

Tim Armstrong created IMPALA-9949:
-------------------------------------

             Summary: Subqueries in select can result in rows not being returned
                 Key: IMPALA-9949
                 URL: https://issues.apache.org/jira/browse/IMPALA-9949
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong


IMPALA-8954 added support for uncorrelated subqueries but some do not return correct results. Both of those queries should return rows with NULLs where the subquery returned 0 rows.
{noformat}
[localhost.EXAMPLE.COM:21000] default> select (select min(int_col) from functional.alltypes having min(int_col) < 0) from functional.alltypestiny;
Fetched 0 row(s) in 0.16s

[localhost.EXAMPLE.COM:21000] default> select (select min(int_col) from functional.alltypes limit 0) from functional.alltypestiny;
Fetched 0 row(s) in 0.14s
{noformat}

The problem is that the CROSS JOIN will return 0 rows if the subquery returns 0 rows.
{noformat}
[localhost.EXAMPLE.COM:21000] default> explain select (select min(int_col) from functional.alltypes having min(int_col) < 0) from functional.alltypestiny;
Query: explain select (select min(int_col) from functional.alltypes having min(int_col) < 0) from functional.alltypestiny
+-------------------------------------------------------------+
| Explain String                                              |
+-------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=40.00KB Threads=5 |
| Per-Host Resource Estimates: Memory=180MB                   |
| Codegen disabled by planner                                 |
|                                                             |
| PLAN-ROOT SINK                                              |
| |                                                           |
| 03:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]                 |
| |  row-size=4B cardinality=8                                |
| |                                                           |
| |--06:EXCHANGE [UNPARTITIONED]                              |
| |  |                                                        |
| |  00:SCAN HDFS [functional.alltypestiny]                   |
| |     HDFS partitions=4/4 files=4 size=460B                 |
| |     row-size=0B cardinality=8                             |
| |                                                           |
| 05:AGGREGATE [FINALIZE]                                     |
| |  output: min:merge(int_col)                               |
| |  having: min(int_col) < 0                                 |
| |  row-size=4B cardinality=1                                |
| |                                                           |
| 04:EXCHANGE [UNPARTITIONED]                                 |
| |                                                           |
| 02:AGGREGATE                                                |
| |  output: min(int_col)                                     |
| |  row-size=4B cardinality=1                                |
| |                                                           |
| 01:SCAN HDFS [functional.alltypes]                          |
|    HDFS partitions=24/24 files=24 size=478.45KB             |
|    row-size=4B cardinality=7.30K                            |
+-------------------------------------------------------------+
Fetched 29 row(s) in 0.04s
{noformat}

We need to detect cases where the subquery can return 0 rows and instead insert a left outer join. 

I did this in a patch and it fixed the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)