Posted to issues@impala.apache.org by "Qifan Chen (Jira)" <ji...@apache.org> on 2021/02/09 16:26:00 UTC

[jira] [Created] (IMPALA-10494) Making use of the min/max column stats to improve min/max filters

Qifan Chen created IMPALA-10494:
-----------------------------------

             Summary: Making use of the min/max column stats to improve min/max filters
                 Key: IMPALA-10494
                 URL: https://issues.apache.org/jira/browse/IMPALA-10494
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Qifan Chen


The HMS (Hive Metastore) API provides a way to store the minimum and maximum values of each column (https://hive.apache.org/javadocs/r3.0.0/api/org/apache/hadoop/hive/metastore/api/ColumnStatisticsData.html). For example, these stats for an integer column can be stored in a LongColumnStatsData object (https://hive.apache.org/javadocs/r3.0.0/api/org/apache/hadoop/hive/metastore/api/LongColumnStatsData.html).

It is desirable to use the per-column min and max stats to help form useful min/max filters, which in turn can reduce the amount of data scanned for Parquet tables.
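
One possible reading of "useful" is sketched below in Python. This is an illustration only, not Impala's implementation: the names ColumnRange, MinMaxFilter, overlap_fraction and is_filter_useful, and the 0.5 threshold, are all hypothetical. The sketch assumes integer columns and compares the filter's range with the column's stored low/high values. A filter whose range covers most of the column's range is unlikely to skip much data, while one that covers only a small part is worth applying.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnRange:
    """Hypothetical stand-in for the low/high values kept in column stats."""
    low: int
    high: int


@dataclass(frozen=True)
class MinMaxFilter:
    """Hypothetical min/max filter: keeps values v with low <= v <= high."""
    low: int
    high: int


def overlap_fraction(stats: ColumnRange, flt: MinMaxFilter) -> float:
    """Fraction of the column's integer value range that the filter lets through.

    Assumes stats.low <= stats.high. This measures the value range only and
    says nothing about how the actual values are distributed within it.
    """
    lo = max(stats.low, flt.low)
    hi = min(stats.high, flt.high)
    if lo > hi:
        return 0.0  # filter range and column range are disjoint
    return (hi - lo + 1) / (stats.high - stats.low + 1)


def is_filter_useful(stats: Optional[ColumnRange],
                     flt: MinMaxFilter,
                     threshold: float = 0.5) -> bool:
    """Treat the filter as useful if it passes at most `threshold` of the range.

    With no stats available there is nothing to compare against, so the
    filter is kept (an assumption of this sketch).
    """
    if stats is None:
        return True
    return overlap_fraction(stats, flt) <= threshold
```

For a column with stats low=0 and high=99, a filter on [10, 19] covers a tenth of the range and is kept, whereas a filter on [-50, 200] covers the whole range and could be dropped under this heuristic.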



--
This message was sent by Atlassian Jira
(v8.3.4#803005)