Posted to issues@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/11/22 23:43:00 UTC
[jira] [Resolved] (IMPALA-11707) Wrong results when global runtime IN-list filters are applied
[ https://issues.apache.org/jira/browse/IMPALA-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang resolved IMPALA-11707.
-------------------------------------
Resolution: Fixed
> Wrong results when global runtime IN-list filters are applied
> -------------------------------------------------------------
>
> Key: IMPALA-11707
> URL: https://issues.apache.org/jira/browse/IMPALA-11707
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.1.0, Impala 4.1.1
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Labels: correctness
>
> Found this bug while running a large-scale TPC-H benchmark. It can be reproduced with the following query:
> {code:sql}
> use tpch_orc_def;
> set enabled_runtime_filter_types=in_list;
> select count(*) from supplier, nation, region
> where s_nationkey = n_nationkey
> and n_regionkey = r_regionkey
> and r_name = 'EUROPE';{code}
> The result is 0, which is wrong; the expected result is 1987. The summary shows that the ScanNode on the "nation" table returns 0 rows:
> {noformat}
> Operator                #Hosts  #Inst  Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
> 04:HASH JOIN            1       1      445.629us  445.629us  0      2.00K       1.98 MB    1.94 MB        INNER JOIN, BROADCAST
> |--07:EXCHANGE          1       1      40.466us   40.466us   1      1           16.00 KB   16.00 KB       BROADCAST
> |  F02:EXCHANGE SENDER  1       1      217.341us  217.341us                     8.60 KB    99.20 KB
> |  02:SCAN HDFS         1       1      4.507ms    4.507ms    1      1           917.09 KB  96.00 MB       tpch_orc_def.region
> 03:HASH JOIN            1       1      2.112ms    2.112ms    0      10.00K      1.97 MB    1.94 MB        INNER JOIN, BROADCAST
> |--06:EXCHANGE          1       1      27.803us   27.803us   0      25          0          16.00 KB       BROADCAST
> |  F01:EXCHANGE SENDER  1       1      89.872us   89.872us                      25.59 KB   32.00 KB
> |  01:SCAN HDFS         1       1      12.833ms   12.833ms   0      25          32.00 KB   64.00 MB       tpch_orc_def.nation
> 00:SCAN HDFS            1       1      371.636us  371.636us  0      10.00K      16.00 KB   32.00 MB       tpch_orc_def.supplier {noformat}
> There is a runtime IN-list filter applied on this node:
> {noformat}
> 01:SCAN HDFS [tpch_orc_def.nation, RANDOM]
>    HDFS partitions=1/1 files=1 size=1.74KB
>    runtime filters: RF000[in_list] -> n_regionkey
>    stored statistics:
>      table: rows=25 size=1.74KB
>      columns: all
>    extrapolated-rows=disabled max-scan-range-rows=25
>    mem-estimate=64.00MB mem-reservation=32.00KB thread-reservation=1
>    tuple-ids=1 row-size=4B cardinality=25
>    in pipelines: 01(GETNEXT){noformat}
> The filter is generated from a build side that reads the "region" table with the predicate "r_name = 'EUROPE'". Note that it is a global runtime filter generated by other impalads (not the impalad scanning the "nation" table).
> The profile shows that this filter rejects one file, which is the only file of the "nation" table.
> {noformat}
> Filter 0 (2.00 KB):
> - Files processed: 1 (1)
> - Files rejected: 1 (1)
> - Files total: 1 (1){noformat}
> This is wrong since at least 5 rows in the file should pass the filter:
> {code:sql}
> impala-shell> select count(*) from nation, region where n_regionkey = r_regionkey and r_name = 'EUROPE';
> +----------+
> | count(*) |
> +----------+
> | 5        |
> +----------+{code}
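The expected behaviour described above can be modelled in a few lines. This is only a sketch of the intended IN-list filter semantics, not Impala's implementation and not an explanation of the root cause. The REGION and NATION values are assumed to match the standard TPC-H reference data, and the helper names are hypothetical.

```python
# Assumed TPC-H reference data: r_regionkey -> r_name.
REGION = {0: "AFRICA", 1: "AMERICA", 2: "ASIA", 3: "EUROPE", 4: "MIDDLE EAST"}

# Assumed TPC-H reference data: (n_nationkey, n_regionkey) for all 25 nations.
NATION = [
    (0, 0), (1, 1), (2, 1), (3, 1), (4, 4),
    (5, 0), (6, 3), (7, 3), (8, 2), (9, 2),
    (10, 4), (11, 4), (12, 2), (13, 4), (14, 0),
    (15, 0), (16, 0), (17, 1), (18, 2), (19, 3),
    (20, 4), (21, 2), (22, 3), (23, 3), (24, 1),
]


def build_in_list(region_name):
    """Hypothetical build side: collect r_regionkey values matching r_name."""
    return {key for key, name in REGION.items() if name == region_name}


def rows_passing(in_list):
    """Hypothetical probe side: nation rows whose n_regionkey is in the list."""
    return [row for row in NATION if row[1] in in_list]


def file_can_be_rejected(in_list):
    """A file may be skipped only if none of its rows can pass the filter."""
    return not rows_passing(in_list)


if __name__ == "__main__":
    europe = build_in_list("EUROPE")
    print("IN-list:", sorted(europe))
    print("rows passing:", len(rows_passing(europe)))
    print("file can be rejected:", file_can_be_rejected(europe))
```

Under these assumptions the single "nation" file contains rows that pass the filter, so rejecting the whole file, as the profile reports, cannot be correct.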
--
This message was sent by Atlassian Jira
(v8.20.10#820010)