You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/10/30 15:42:00 UTC

[jira] [Commented] (IMPALA-7782) discrepancy in results with a subquery containing an agg that produces an empty set

    [ https://issues.apache.org/jira/browse/IMPALA-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668899#comment-16668899 ] 

Tim Armstrong commented on IMPALA-7782:
---------------------------------------

Looks like a bad rewrite in the planner. The plan doesn't make any sense to me, it seems like it did some incorrect optimisation based on the subquery being empty:
{noformat}
**
[localhost:21000] default> use functional;
Query: use functional
[localhost:21000] functional> explain 
                            > SELECT id
                            > FROM alltypestiny
                            > WHERE -1 NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false);
Query: explain SELECT id
FROM alltypestiny
WHERE -1 NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false)
+------------------------------------------------------------+
| Explain String                                             |
+------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=4 |
| Per-Host Resource Estimates: Memory=32MB                   |
| Codegen disabled by planner                                |
|                                                            |
| PLAN-ROOT SINK                                             |
| |                                                          |
| 04:EXCHANGE [UNPARTITIONED]                                |
| |                                                          |
| 02:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]                |
| |                                                          |
| |--03:EXCHANGE [BROADCAST]                                 |
| |  |                                                       |
| |  01:EMPTYSET                                             |
| |                                                          |
| 00:SCAN HDFS [functional.alltypestiny]                     |
|    partitions=4/4 files=4 size=460B                        |
+------------------------------------------------------------+
{noformat}

> discrepancy in results with a subquery containing an agg that produces an empty set
> -----------------------------------------------------------------------------------
>
>                 Key: IMPALA-7782
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7782
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.12.0, Impala 3.1.0
>            Reporter: Michael Brown
>            Priority: Blocker
>              Labels: correctness, query_generator
>
> A discrepancy exists between Impala and Postgres when a subquery contains an agg and results in an empty set, yet the WHERE clause looking at the subquery should produce a "True" condition.
> Example queries include:
> {noformat}
> USE functional;
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false);
> SELECT id
> FROM alltypestiny
> WHERE NULL NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false);
> SELECT id
> FROM alltypestiny
> WHERE (SELECT COUNT(id) FROM alltypestiny HAVING false) IS NULL;
> {noformat}
> These queries do not produce any rows in Impala. In Postgres, the queries produce all 8 rows for the functional.alltypestiny id column.
> Thinking maybe there were Impala and Postgres differences with {{NOT IN}} behavior, I also tried this:
> {noformat}
> USE functional;
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT 1 FROM alltypestiny WHERE bool_col IS NULL);
> {noformat}
> This subquery also produces an empty set just like the subquery in the problematic queries at the top, but unlike those queries, this full query returns the same results in Impala and Postgres (all 8 rows for the functional.alltypestiny id column).
> For anyone interested in this bug, you can migrate data into postgres in a dev environment using
> {noformat}
> tests/comparison/data_generator.py --use-postgresql --migrate-table-names alltypestiny --db-name functional migrate
> {noformat}
> This is in 2.12 at least, so it's not a 3.1 regression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org