You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Yuchen Fan (Jira)" <ji...@apache.org> on 2022/11/15 12:54:00 UTC

[jira] [Created] (IMPALA-11725) Query result incorrect when querying and filtering NULL values of sub-query

Yuchen Fan created IMPALA-11725:
-----------------------------------

             Summary: Query result incorrect when querying and filtering NULL values of sub-query 
                 Key: IMPALA-11725
                 URL: https://issues.apache.org/jira/browse/IMPALA-11725
             Project: IMPALA
          Issue Type: Bug
          Components: Distributed Exec, Frontend
    Affects Versions: Impala 4.1.0
            Reporter: Yuchen Fan


We found that Impala can't filter NULL values from sub-query. For example, prepare two test tables err_tbl1 and err_tbl2(id INT, dt STRING):

+----+------------+
| id | dt         |
+----+------------+
| 14 | 2022-11-13 |
| 15 | 2022-11-13 |
| 13 | 2022-11-13 |
+----+------------+

+----+------------+
| id | dt         |
+----+------------+
| 14 | 2022-11-13 |
| 16 | 2022-11-13 |
| 13 | 2022-11-13 |
+----+------------+

And submit query below:
{code:java}
SELECT *
FROM (
    SELECT aid, bid, COUNT(*) AS c
    FROM (
        SELECT id AS aid
        FROM err_tbl1
        WHERE dt = '2022-11-13'
    ) a
        FULL JOIN (
            SELECT id AS bid
            FROM err_tbl2
            WHERE dt = '2022-11-13'
        ) b
        ON a.aid = b.bid
    GROUP BY aid, bid
) t1
WHERE aid = bid;{code}
Out result includes 4 rows:

+------+------+---+
| aid  | bid  | c |
+------+------+---+
| 13   | 13   | 1 |
| 14   | 14   | 1 |
| NULL | 15   | 1 |
| 16   | NULL | 1 |
+------+------+—+

Obviously, condition of 'aid=bid' is invalid. 'NULL' value should be filtered out. But if we use condition of 'aid!=bid', there will be empty result set, which means '!=' can filter out 'NULL' value while '=' cannot. What's more, if we create a table as select * from sub-query and execute 'SELECT * FROM sub_table WHERE aid=bid'. The result is correct. If we surround 'aid=bid' with 'trim()' like 'trim(cast(aid as string))=trim(cast(bid as string))', result is also correct. In Spark, result of the same query doesn't contain 'NULL' values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org