Posted to reviews@impala.apache.org by "Tamas Mate (Code Review)" <ge...@cloudera.org> on 2022/01/03 15:29:09 UTC

[Impala-ASF-CR] IMPALA-10910, IMPALA-5509: Runtime filter: dictionary filter support

Tamas Mate has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18017 )

Change subject: IMPALA-10910, IMPALA-5509: Runtime filter: dictionary filter support
......................................................................

IMPALA-10910, IMPALA-5509: Runtime filter: dictionary filter support

This commit is based on Csaba Ringhofer's earlier work on IMPALA-5509.

If a runtime filter uses only a single column, then it can be used to
filter Parquet dictionaries, and if all dictionary values are filtered
out, the whole row group can be skipped. This is especially useful for
Iceberg tables, where the partition column is stored in the data file,
so this can help eliminate unnecessary reads.
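
The row group check described above can be sketched roughly as follows.
This is only an illustration in Python, not the scanner code in
hdfs-parquet-scanner.cc: the function name, the sample values, and the
use of a plain set in place of a real runtime filter are all assumptions
made for the example. It also assumes every value of the column in the
row group is dictionary-encoded.

```python
from typing import Callable, Iterable


def can_skip_row_group(dictionary: Iterable[str],
                       filter_may_contain: Callable[[str], bool]) -> bool:
    """Return True if no dictionary value passes the runtime filter.

    Hypothetical helper: if every distinct value of the column is
    rejected, no row in the row group can match the join.
    """
    return not any(filter_may_contain(value) for value in dictionary)


# Stand-in for a single-column runtime filter built from the join's
# build side. A real filter may report false positives; a set does not.
build_side_keys = {"2021-12-01", "2021-12-02"}
runtime_filter = build_side_keys.__contains__

# Dictionaries of the filtered column in two hypothetical row groups.
row_group_a = ["2021-11-29", "2021-11-30"]
row_group_b = ["2021-11-30", "2021-12-01"]

skip_a = can_skip_row_group(row_group_a, runtime_filter)  # all values rejected
skip_b = can_skip_row_group(row_group_b, runtime_filter)  # one value passes
print(skip_a, skip_b)
```

Only the dictionary has to be evaluated to reach the decision, which is
why the check is cheap compared to decoding the data pages.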

The chance of false positives grows exponentially with the size of the
dictionary, so this optimisation is only useful for small dictionaries.
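
To make the effect of dictionary size concrete, here is a small Python
sketch. It assumes a simplified model that is not taken from the patch:
each dictionary value that should be rejected passes the filter by
accident with the same independent probability fpp. The 1% rate and the
function name are illustrative only.

```python
def prob_row_group_not_skipped(fpp: float, dict_size: int) -> float:
    """Chance that at least one of dict_size non-matching values
    passes the filter as a false positive (independence assumed)."""
    return 1.0 - (1.0 - fpp) ** dict_size


# Illustrative false positive rate per value; not an Impala default.
FPP = 0.01

for dict_size in (1, 10, 100, 1000):
    p = prob_row_group_not_skipped(FPP, dict_size)
    print(f"dictionary size {dict_size:>4}: {p:.3f}")
```

Under this model the chance that a row group can still be skipped is
(1 - fpp) ** dict_size, which shrinks exponentially as the dictionary
grows.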

Testing:
 - Added e2e test that creates an Iceberg/Parquet table and queries it

Change-Id: Ida0ada8799774be34312eaa4be47336149f637c7
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
A testdata/workloads/functional-query/queries/QueryTest/iceberg-dictionary-runtime-filter.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-dictionary-runtime-filter.test
M tests/query_test/test_runtime_filters.py
5 files changed, 153 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/18017/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18017
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ida0ada8799774be34312eaa4be47336149f637c7
Gerrit-Change-Number: 18017
Gerrit-PatchSet: 4
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>