Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/11/07 12:49:00 UTC

[jira] [Commented] (IMPALA-11707) Wrong results when global runtime IN-list filters are applied

    [ https://issues.apache.org/jira/browse/IMPALA-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629797#comment-17629797 ] 

Quanlong Huang commented on IMPALA-11707:
-----------------------------------------

Uploaded a fix for review: https://gerrit.cloudera.org/c/19220/

> Wrong results when global runtime IN-list filters are applied
> -------------------------------------------------------------
>
>                 Key: IMPALA-11707
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11707
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.1.0, Impala 4.1.1
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> Found this bug when running a large-scale TPC-H benchmark. It can be reproduced with the following query:
> {code:sql}
> use tpch_orc_def;
> set enabled_runtime_filter_types=in_list;
> select count(*) from supplier, nation, region
> where s_nationkey = n_nationkey
>   and n_regionkey = r_regionkey
>   and r_name = 'EUROPE';{code}
> The result is 0, which is wrong; the expected result is 1987. The summary shows that the ScanNode on the "nation" table returns 0 rows:
> {noformat}
> 04:HASH JOIN                  1      1  445.629us  445.629us      0       2.00K    1.98 MB        1.94 MB  INNER JOIN, BROADCAST 
> |--07:EXCHANGE                1      1   40.466us   40.466us      1           1   16.00 KB       16.00 KB  BROADCAST             
> |  F02:EXCHANGE SENDER        1      1  217.341us  217.341us                       8.60 KB       99.20 KB                        
> |  02:SCAN HDFS               1      1    4.507ms    4.507ms      1           1  917.09 KB       96.00 MB  tpch_orc_def.region   
> 03:HASH JOIN                  1      1    2.112ms    2.112ms      0      10.00K    1.97 MB        1.94 MB  INNER JOIN, BROADCAST 
> |--06:EXCHANGE                1      1   27.803us   27.803us      0          25          0       16.00 KB  BROADCAST             
> |  F01:EXCHANGE SENDER        1      1   89.872us   89.872us                      25.59 KB       32.00 KB                        
> |  01:SCAN HDFS               1      1   12.833ms   12.833ms      0          25   32.00 KB       64.00 MB  tpch_orc_def.nation   
> 00:SCAN HDFS                  1      1  371.636us  371.636us      0      10.00K   16.00 KB       32.00 MB  tpch_orc_def.supplier {noformat}
> There is a runtime IN-list filter applied on this node:
> {noformat}
> 01:SCAN HDFS [tpch_orc_def.nation, RANDOM]
>    HDFS partitions=1/1 files=1 size=1.74KB
>    runtime filters: RF000[in_list] -> n_regionkey
>    stored statistics:
>      table: rows=25 size=1.74KB
>      columns: all 
>    extrapolated-rows=disabled max-scan-range-rows=25
>    mem-estimate=64.00MB mem-reservation=32.00KB thread-reservation=1
>    tuple-ids=1 row-size=4B cardinality=25
>    in pipelines: 01(GETNEXT){noformat}
> The filter is generated from the build side, which reads the "region" table with the predicate "r_name = 'EUROPE'". Note that it is a global runtime filter, generated by other impalads (not the impalad scanning the "nation" table).
> The profile shows that this filter rejects one file, which is the only file of the "nation" table.
> {noformat}
>         Filter 0 (2.00 KB):
>            - Files processed: 1 (1)
>            - Files rejected: 1 (1)
>            - Files total: 1 (1){noformat}
> This is wrong since at least 5 rows in the file should pass the filter:
> {code:sql}
> impala-shell> select count(*) from nation, region where n_regionkey = r_regionkey and r_name = 'EUROPE';
> +----------+
> | count(*) |
> +----------+
> | 5        |
> +----------+{code}
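>
> A possible cross-check, sketched here under assumptions rather than taken from the report: re-run the reproducer with the RUNTIME_FILTER_MODE query option set to OFF, then to LOCAL. If the global IN-list filter is the cause, the OFF run should return the expected 1987. Whether the LOCAL run also returns the right count was not verified here.
> {code:sql}
> use tpch_orc_def;
> set enabled_runtime_filter_types=in_list;
> -- Baseline: no runtime filters are generated or applied.
> set runtime_filter_mode=off;
> select count(*) from supplier, nation, region
> where s_nationkey = n_nationkey
>   and n_regionkey = r_regionkey
>   and r_name = 'EUROPE';
> -- Assumption: LOCAL keeps only filters that do not cross the network,
> -- so the global filter on n_regionkey should not be produced.
> set runtime_filter_mode=local;
> select count(*) from supplier, nation, region
> where s_nationkey = n_nationkey
>   and n_regionkey = r_regionkey
>   and r_name = 'EUROPE';{code}
> Comparing the "Files rejected" counter in the profile of each run against the one above would show whether the file of "nation" is still being skipped.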


