Posted to issues@impala.apache.org by "Qifan Chen (Jira)" <ji...@apache.org> on 2021/02/23 17:51:00 UTC

[jira] [Closed] (IMPALA-10325) Parquet scan should use min/max statistics to skip pages based on equi-join predicate

     [ https://issues.apache.org/jira/browse/IMPALA-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qifan Chen closed IMPALA-10325.
-------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

Follow-up work is tracked in two issues: IMPALA-10494, "Making use of the min/max column" (https://issues.apache.org/jira/browse/IMPALA-10494), and
IMPALA-10495, "Computing correlation coefficient for certain columns can be useful to min/max filters" (https://issues.apache.org/jira/browse/IMPALA-10495).

> Parquet scan should use min/max statistics to skip pages based on equi-join predicate
> -------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10325
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10325
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Qifan Chen
>            Assignee: Qifan Chen
>            Priority: Major
>             Fix For: Impala 4.0
>
>
> Parquet stores min/max statistics for each page, which can be used to skip pages that cannot satisfy an equi-join predicate.
> The query below ends up scanning all rows of table a, which may be unnecessary if the min/max of b.ss_addr_sk can be determined and applied during the scan of a.
> {code:java}
> select a.ss_sold_time_sk from
> store_sales a join [SHUFFLE] store_sales b
> where a.ss_addr_sk = b.ss_addr_sk and
> b.ss_customer_sk < 10
> ;
> {code}
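
The idea described above can be illustrated with a minimal sketch. This is not Impala's implementation: the names PageStats, build_range and pages_to_read are hypothetical, the page statistics and join-key values are made up, and NULL handling is ignored. It only shows how the min/max of the build-side join keys (b.ss_addr_sk after the filter on b.ss_customer_sk) could be compared against per-page min/max statistics on the probe side (a.ss_addr_sk) to decide which pages need to be read.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PageStats:
    """Min/max statistics of one data page (hypothetical, simplified layout)."""
    min_value: int
    max_value: int


def build_range(keys: List[int]) -> Optional[Tuple[int, int]]:
    """Return (min, max) of the build-side join keys, or None if there are none."""
    if not keys:
        return None
    return min(keys), max(keys)


def pages_to_read(pages: List[PageStats],
                  key_range: Optional[Tuple[int, int]]) -> List[int]:
    """Return indexes of pages whose [min, max] overlaps the build-side range.

    A page that does not overlap cannot contain a row matching the
    equi-join predicate, so it can be skipped.
    """
    if key_range is None:
        # Empty build side: no probe row can match, so every page is skipped.
        return []
    lo, hi = key_range
    return [i for i, p in enumerate(pages)
            if p.max_value >= lo and p.min_value <= hi]


# Assumed values of b.ss_addr_sk that survive b.ss_customer_sk < 10.
build_keys = [120, 135, 180]

# Assumed per-page statistics for a.ss_addr_sk.
probe_pages = [
    PageStats(1, 100),     # entirely below the range: skipped
    PageStats(101, 200),   # overlaps [120, 180]: read
    PageStats(201, 300),   # entirely above the range: skipped
    PageStats(150, 400),   # overlaps [120, 180]: read
]

selected = pages_to_read(probe_pages, build_range(build_keys))
print(selected)  # [1, 3]
```

In this sketch an overlapping page is still read in full, because min/max statistics can only rule pages out and cannot confirm that a match exists. The benefit therefore depends on how well the join column is clustered within the file.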



--
This message was sent by Atlassian Jira
(v8.3.4#803005)