You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Qifan Chen (Code Review)" <ge...@cloudera.org> on 2020/11/24 19:35:13 UTC

[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

Qifan Chen has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
......................................................................

IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

This patch adds the logic to utilize min/max stats for Parquet row
groups or pages to skip these entities when they don't qualify an
equi-join predicate.

A new class of predicates called overlap predicates is introduced to aid
in the determination of whether a Parquet row group or a page overlap
with a range computed from the hash join. If not, then the entire
Parquet row group or the page are skipped.

The new class of predicates co-exist with the existing min/max conjuncts
that are introduced based on the local scan predicates. Both classes of
predicates can work individually or together with each other. The overlap
predicates are evaluted after the existing min/max conjuncts.

To be done:
1. Handle STRING, DATE, TIME and DECIMAL data tyes;
2. Unit/performance testing;
3. Core testing.

Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
---
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
16 files changed, 525 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/16
-- 
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>