Posted to issues@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/11/07 03:14:00 UTC
[jira] [Created] (IMPALA-11707) Wrong results when global runtime IN-list filters are applied

Quanlong Huang created IMPALA-11707:
---------------------------------------

             Summary: Wrong results when global runtime IN-list filters are applied
                 Key: IMPALA-11707
                 URL: https://issues.apache.org/jira/browse/IMPALA-11707
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 4.1.1, Impala 4.1.0
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


Found this bug while running a large-scale TPC-H benchmark. It can be reproduced with the following query:
{code:sql}
use tpch_orc_def;
set enabled_runtime_filter_types=in_list;
select count(*) from supplier, nation, region
where s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'EUROPE';{code}
The result is 0, which is wrong; the expected result is 1987. The summary shows that the scan node on the "nation" table returns 0 rows:
{noformat}
04:HASH JOIN                  1      1  445.629us  445.629us      0       2.00K    1.98 MB        1.94 MB  INNER JOIN, BROADCAST 
|--07:EXCHANGE                1      1   40.466us   40.466us      1           1   16.00 KB       16.00 KB  BROADCAST             
|  F02:EXCHANGE SENDER        1      1  217.341us  217.341us                       8.60 KB       99.20 KB                        
|  02:SCAN HDFS               1      1    4.507ms    4.507ms      1           1  917.09 KB       96.00 MB  tpch_orc_def.region   
03:HASH JOIN                  1      1    2.112ms    2.112ms      0      10.00K    1.97 MB        1.94 MB  INNER JOIN, BROADCAST 
|--06:EXCHANGE                1      1   27.803us   27.803us      0          25          0       16.00 KB  BROADCAST             
|  F01:EXCHANGE SENDER        1      1   89.872us   89.872us                      25.59 KB       32.00 KB                        
|  01:SCAN HDFS               1      1   12.833ms   12.833ms      0          25   32.00 KB       64.00 MB  tpch_orc_def.nation   
00:SCAN HDFS                  1      1  371.636us  371.636us      0      10.00K   16.00 KB       32.00 MB  tpch_orc_def.supplier {noformat}
There is a runtime IN-list filter applied on this node:
{noformat}
01:SCAN HDFS [tpch_orc_def.nation, RANDOM]
   HDFS partitions=1/1 files=1 size=1.74KB
   runtime filters: RF000[in_list] -> n_regionkey
   stored statistics:
     table: rows=25 size=1.74KB
     columns: all 
   extrapolated-rows=disabled max-scan-range-rows=25
   mem-estimate=64.00MB mem-reservation=32.00KB thread-reservation=1
   tuple-ids=1 row-size=4B cardinality=25
   in pipelines: 01(GETNEXT){noformat}
The filter is generated from a build side that reads the "region" table with the predicate "r_name = 'EUROPE'". Note that it is a global runtime filter generated by other impalads (not the impalad scanning the "nation" table).

The profile shows that this filter rejects one file, which is the only file of the "nation" table.
{noformat}
        Filter 0 (2.00 KB):
           - Files processed: 1 (1)
           - Files rejected: 1 (1)
           - Files total: 1 (1){noformat}
This is wrong, since at least 5 rows in the file should pass the filter:
{code:java}
impala-shell> select count(*) from nation, region where n_regionkey = r_regionkey and r_name = 'EUROPE';
+----------+
| count(*) |
+----------+
| 5        |
+----------+{code}
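
The expectation above can be illustrated with a small, self-contained sketch. It is not Impala's implementation: the function names, the in-memory stand-in for the file, and the sample key values are all assumptions made for illustration (region key 3 stands in for the row matching 'EUROPE'). It only shows the invariant that a file-level IN-list check has to satisfy: a file may be rejected only if none of its values for the filtered column appear in the list.

```python
from typing import Iterable, Set


def file_may_be_rejected(column_values: Iterable[int], in_list: Set[int]) -> bool:
    """Return True only if no value in the file can pass the IN-list filter."""
    return not any(v in in_list for v in column_values)


def count_passing_rows(column_values: Iterable[int], in_list: Set[int]) -> int:
    """Count rows whose filtered column value is contained in the IN-list."""
    return sum(1 for v in column_values if v in in_list)


# Hypothetical stand-in for the n_regionkey column of the single "nation" file:
# 25 rows spread evenly over 5 region keys (0..4).
nation_regionkeys = [i % 5 for i in range(25)]

# Hypothetical IN-list built from the "region" side; 3 is assumed to be the
# key of the one row matching r_name = 'EUROPE'.
europe_in_list = {3}

passing = count_passing_rows(nation_regionkeys, europe_in_list)
rejectable = file_may_be_rejected(nation_regionkeys, europe_in_list)
print(passing, rejectable)
```

With this sample data, 5 rows pass and the file is not rejectable, so a "Files rejected: 1" counter for such a file would indicate that the filter contents or the file-level check are wrong, not the data.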


