Posted to issues@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/11/07 03:14:00 UTC
[jira] [Created] (IMPALA-11707) Wrong results when global runtime IN-list filters are applied
Quanlong Huang created IMPALA-11707:
---------------------------------------
Summary: Wrong results when global runtime IN-list filters are applied
Key: IMPALA-11707
URL: https://issues.apache.org/jira/browse/IMPALA-11707
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 4.1.1, Impala 4.1.0
Reporter: Quanlong Huang
Assignee: Quanlong Huang
Found this bug while running a large-scale TPC-H benchmark. It can be reproduced with the following query:
{code:sql}
use tpch_orc_def;
set enabled_runtime_filter_types=in_list;
select count(*) from supplier, nation, region
where s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE';{code}
The result is 0, which is wrong; the expected result is 1987. The summary shows that the scan node on the "nation" table returns 0 rows:
{noformat}
04:HASH JOIN 1 1 445.629us 445.629us 0 2.00K 1.98 MB 1.94 MB INNER JOIN, BROADCAST
|--07:EXCHANGE 1 1 40.466us 40.466us 1 1 16.00 KB 16.00 KB BROADCAST
| F02:EXCHANGE SENDER 1 1 217.341us 217.341us 8.60 KB 99.20 KB
| 02:SCAN HDFS 1 1 4.507ms 4.507ms 1 1 917.09 KB 96.00 MB tpch_orc_def.region
03:HASH JOIN 1 1 2.112ms 2.112ms 0 10.00K 1.97 MB 1.94 MB INNER JOIN, BROADCAST
|--06:EXCHANGE 1 1 27.803us 27.803us 0 25 0 16.00 KB BROADCAST
| F01:EXCHANGE SENDER 1 1 89.872us 89.872us 25.59 KB 32.00 KB
| 01:SCAN HDFS 1 1 12.833ms 12.833ms 0 25 32.00 KB 64.00 MB tpch_orc_def.nation
00:SCAN HDFS 1 1 371.636us 371.636us 0 10.00K 16.00 KB 32.00 MB tpch_orc_def.supplier {noformat}
There is a runtime IN-list filter applied on this node:
{noformat}
01:SCAN HDFS [tpch_orc_def.nation, RANDOM]
HDFS partitions=1/1 files=1 size=1.74KB
runtime filters: RF000[in_list] -> n_regionkey
stored statistics:
table: rows=25 size=1.74KB
columns: all
extrapolated-rows=disabled max-scan-range-rows=25
mem-estimate=64.00MB mem-reservation=32.00KB thread-reservation=1
tuple-ids=1 row-size=4B cardinality=25
in pipelines: 01(GETNEXT){noformat}
The filter is generated by the build side that reads the "region" table with the predicate "r_name = 'EUROPE'". Note that it is a global runtime filter produced by other impalads (not the impalad scanning the "nation" table).
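To make the expected behavior concrete, here is a simplified C++ model of an IN-list runtime filter. It is a sketch, not Impala's code: `InListFilterSketch`, `BuildFilter` and `CountPassing` are hypothetical names, and it assumes that a global filter is the union of the partial filters sent by every build-side instance for the same filter id. Under that assumption, RF000 should hold exactly the r_regionkey values of the region rows that pass "r_name = 'EUROPE'".

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified model of an IN-list runtime filter.
// Not Impala's actual implementation.
struct InListFilterSketch {
  int filter_id = 0;                   // e.g. 0 for RF000
  std::unordered_set<int32_t> values;  // join-key values seen on the build side

  void Insert(int32_t v) { values.insert(v); }

  // Assumed global-filter semantics: the union of the partial filters
  // produced for the same filter id by every build-side instance.
  void Merge(const InListFilterSketch& other) {
    assert(other.filter_id == filter_id);
    values.insert(other.values.begin(), other.values.end());
  }

  // A probe-side row passes if its key is in the list.
  bool Eval(int32_t v) const { return values.count(v) > 0; }
};

struct RegionRow {
  int32_t r_regionkey;
  std::string r_name;
};

// Build side: keep r_regionkey of the region rows surviving r_name = 'EUROPE'.
InListFilterSketch BuildFilter(int filter_id, const std::vector<RegionRow>& rows) {
  InListFilterSketch f;
  f.filter_id = filter_id;
  for (const RegionRow& r : rows) {
    if (r.r_name == "EUROPE") f.Insert(r.r_regionkey);
  }
  return f;
}

// Probe side: count nation rows whose n_regionkey passes the filter.
int CountPassing(const InListFilterSketch& f,
                 const std::vector<int32_t>& n_regionkeys) {
  int passed = 0;
  for (int32_t k : n_regionkeys) {
    if (f.Eval(k)) ++passed;
  }
  return passed;
}
```

With the standard TPC-H region table, EUROPE has r_regionkey = 3, so the merged filter should contain {3} and five nation rows should pass it, which agrees with the count query in this report.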
The profile shows that this filter rejects one file, which is the only file of the "nation" table.
{noformat}
Filter 0 (2.00 KB):
- Files processed: 1 (1)
- Files rejected: 1 (1)
- Files total: 1 (1){noformat}
This is wrong since at least 5 rows in the file should pass the filter:
{noformat}
impala-shell> select count(*) from nation, region where n_regionkey = r_regionkey and r_name = 'EUROPE';
+----------+
| count(*) |
+----------+
| 5 |
+----------+{noformat}
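The report does not say which mechanism rejected the file. Assuming, for illustration only, a file-level check based on per-file min/max statistics of the filtered column, the sketch below shows the soundness condition such a check must respect: a file may be rejected only when no value in the IN-list can occur in it. `ColumnStats` and `FileMayPass` are hypothetical names, not Impala or ORC APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical per-file statistics for the filtered column (n_regionkey here).
struct ColumnStats {
  int32_t min;
  int32_t max;
};

// Returns true when the file may contain a row whose value is in the IN-list,
// so it must be scanned. Returning false (rejecting the file) is only sound
// when no listed value can fall inside [min, max]. An empty list rejects the
// file, which is safe for an inner join whose build side produced no rows.
bool FileMayPass(const std::unordered_set<int32_t>& in_list,
                 const ColumnStats& stats) {
  assert(stats.min <= stats.max);
  for (int32_t v : in_list) {
    if (v >= stats.min && v <= stats.max) return true;
  }
  return false;
}
```

With an IN-list of {3} and a nation file whose n_regionkey values span 0 to 4 (the TPC-H region keys), `FileMayPass` returns true, so a sound check would keep the file that the profile reports as rejected.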