You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2023/06/25 23:58:00 UTC
[jira] [Commented] (IMPALA-11725) Query result incorrect when querying and filtering NULL values of sub-query
[ https://issues.apache.org/jira/browse/IMPALA-11725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736940#comment-17736940 ]
Quanlong Huang commented on IMPALA-11725:
-----------------------------------------
I can still reproduce this in the master branch. I think the bug is due to predicate "aid = bid" is not enforced in the outermost query block. We should copy it into the inlineview instead of migrating it (pushing it down).
[~Yuchen Fan] [~tangzhi] Do you want to work on this?
> Query result incorrect when querying and filtering NULL values of sub-query
> ----------------------------------------------------------------------------
>
> Key: IMPALA-11725
> URL: https://issues.apache.org/jira/browse/IMPALA-11725
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec, Frontend
> Affects Versions: Impala 4.1.0
> Reporter: Yuchen Fan
> Priority: Critical
>
> We found that Impala can't filter NULL values from sub-query. For example, prepare two test tables err_tbl1 and err_tbl2(id INT, dt STRING):
> {noformat}
> +----+------------+
> | id | dt |
> +----+------------+
> | 14 | 2022-11-13 |
> | 15 | 2022-11-13 |
> | 13 | 2022-11-13 |
> +----+------------+
> +----+------------+
> | id | dt |
> +----+------------+
> | 14 | 2022-11-13 |
> | 16 | 2022-11-13 |
> | 13 | 2022-11-13 |
> +----+------------+
> {noformat}
> And submit query below:
> {code:java}
> SELECT *
> FROM (
> SELECT aid, bid, COUNT(*) AS c
> FROM (
> SELECT id AS aid
> FROM err_tbl1
> WHERE dt = '2022-11-13'
> ) a
> FULL JOIN (
> SELECT id AS bid
> FROM err_tbl2
> WHERE dt = '2022-11-13'
> ) b
> ON a.aid = b.bid
> GROUP BY aid, bid
> ) t1
> WHERE aid = bid;{code}
> Out result includes 4 rows:
> {noformat}
> +------+------+---+
> | aid | bid | c |
> +------+------+---+
> | 13 | 13 | 1 |
> | 14 | 14 | 1 |
> | NULL | 15 | 1 |
> | 16 | NULL | 1 |
> +------+------+—+
> {noformat}
> Obviously, condition of 'aid=bid' is invalid. 'NULL' value should be filtered out. But if we use condition of 'aid!=bid', there will be empty result set, which means '!=' can filter out 'NULL' value while '=' cannot. What's more, if we create a table as select * from sub-query and execute 'SELECT * FROM sub_table WHERE aid=bid'. The result is correct. If we surround 'aid=bid' with 'trim()' like 'trim(cast(aid as string))=trim(cast(bid as string))', result is also correct. In Spark, result of the same query doesn't contain 'NULL' values.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org