You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Yuchen Fan (Jira)" <ji...@apache.org> on 2023/06/26 02:23:00 UTC

[jira] [Assigned] (IMPALA-11725) Query result incorrect when querying and filtering NULL values of sub-query

     [ https://issues.apache.org/jira/browse/IMPALA-11725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuchen Fan reassigned IMPALA-11725:
-----------------------------------

    Assignee: Yuchen Fan

> Query result incorrect when querying and filtering NULL values of sub-query 
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-11725
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11725
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec, Frontend
>    Affects Versions: Impala 4.1.0
>            Reporter: Yuchen Fan
>            Assignee: Yuchen Fan
>            Priority: Critical
>
> We found that Impala can't filter NULL values from sub-query. For example, prepare two test tables err_tbl1 and err_tbl2(id INT, dt STRING):
> {noformat}
> +----+------------+
> | id | dt         |
> +----+------------+
> | 14 | 2022-11-13 |
> | 15 | 2022-11-13 |
> | 13 | 2022-11-13 |
> +----+------------+
> +----+------------+
> | id | dt         |
> +----+------------+
> | 14 | 2022-11-13 |
> | 16 | 2022-11-13 |
> | 13 | 2022-11-13 |
> +----+------------+
> {noformat}
> And submit query below:
> {code:java}
> SELECT *
> FROM (
>     SELECT aid, bid, COUNT(*) AS c
>     FROM (
>         SELECT id AS aid
>         FROM err_tbl1
>         WHERE dt = '2022-11-13'
>     ) a
>         FULL JOIN (
>             SELECT id AS bid
>             FROM err_tbl2
>             WHERE dt = '2022-11-13'
>         ) b
>         ON a.aid = b.bid
>     GROUP BY aid, bid
> ) t1
> WHERE aid = bid;{code}
> Out result includes 4 rows:
> {noformat}
> +------+------+---+
> | aid  | bid  | c |
> +------+------+---+
> | 13   | 13   | 1 |
> | 14   | 14   | 1 |
> | NULL | 15   | 1 |
> | 16   | NULL | 1 |
> +------+------+—+
> {noformat}
> Obviously, condition of 'aid=bid' is invalid. 'NULL' value should be filtered out. But if we use condition of 'aid!=bid', there will be empty result set, which means '!=' can filter out 'NULL' value while '=' cannot. What's more, if we create a table as select * from sub-query and execute 'SELECT * FROM sub_table WHERE aid=bid'. The result is correct. If we surround 'aid=bid' with 'trim()' like 'trim(cast(aid as string))=trim(cast(bid as string))', result is also correct. In Spark, result of the same query doesn't contain 'NULL' values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org