Posted to issues@impala.apache.org by "Qifan Chen (Jira)" <ji...@apache.org> on 2021/02/23 17:51:00 UTC
[jira] [Closed] (IMPALA-10325) Parquet scan should use min/max
statistics to skip pages based on equi-join predicate
[ https://issues.apache.org/jira/browse/IMPALA-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Qifan Chen closed IMPALA-10325.
-------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed
Follow-up work is tracked in two issues: IMPALA-10494, "Making use of the min/max column" (https://issues.apache.org/jira/browse/IMPALA-10494), and
IMPALA-10495, "Computing correlation coefficient for certain columns can be useful to min/max filters" (https://issues.apache.org/jira/browse/IMPALA-10495).
> Parquet scan should use min/max statistics to skip pages based on equi-join predicate
> -------------------------------------------------------------------------------------
>
> Key: IMPALA-10325
> URL: https://issues.apache.org/jira/browse/IMPALA-10325
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Qifan Chen
> Assignee: Qifan Chen
> Priority: Major
> Fix For: Impala 4.0
>
>
> Parquet stores min/max statistics for each page. The scan can use them to skip pages whose value range cannot satisfy an equi-join predicate.
> The query below scans all rows of table a. Much of that scan may be unnecessary if the min/max of b.ss_addr_sk is computed and applied during the scan of a.
> {code:sql}
> select a.ss_sold_time_sk from
> store_sales a join [SHUFFLE] store_sales b
> where a.ss_addr_sk = b.ss_addr_sk and
> b.ss_customer_sk < 10
> ;
> {code}
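The idea in the description can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Impala's implementation: the `Page` class and the function names `build_side_range` and `scan_with_min_max_filter` are hypothetical, and the page contents are made up. The build side (table b after its filter) yields a [lo, hi] range of join keys, and the probe-side scan (table a) skips every page whose min/max range does not overlap it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    """Hypothetical stand-in for a Parquet data page with min/max statistics."""
    min_val: int
    max_val: int
    values: List[int]


def build_side_range(join_keys: List[int]) -> Tuple[int, int]:
    """Compute the min/max of the build-side join keys (assumed non-empty)."""
    return min(join_keys), max(join_keys)


def scan_with_min_max_filter(pages: List[Page], lo: int, hi: int) -> Tuple[List[int], int]:
    """Scan the probe side, skipping pages that cannot contain a key in [lo, hi].

    Returns the surviving values and the number of pages skipped.
    """
    survivors: List[int] = []
    skipped = 0
    for page in pages:
        # No overlap between [page.min_val, page.max_val] and [lo, hi]:
        # no row in this page can satisfy the equi-join predicate.
        if page.max_val < lo or page.min_val > hi:
            skipped += 1
            continue
        # The page overlaps, so it is read; rows outside the range are dropped.
        survivors.extend(v for v in page.values if lo <= v <= hi)
    return survivors, skipped


# Made-up data: three pages of a.ss_addr_sk and the b.ss_addr_sk keys
# that remain after the b.ss_customer_sk < 10 filter.
pages = [
    Page(1, 10, [1, 5, 10]),
    Page(11, 20, [11, 15, 20]),
    Page(21, 30, [21, 25, 30]),
]
lo, hi = build_side_range([12, 18])
survivors, skipped = scan_with_min_max_filter(pages, lo, hi)
```

The range check is conservative: a value inside [lo, hi] may still have no match on the build side, so the join itself must still apply the exact equality. The filter only reduces how much data reaches it.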
--
This message was sent by Atlassian Jira
(v8.3.4#803005)