Posted to reviews@impala.apache.org by "Tamas Mate (Code Review)" <ge...@cloudera.org> on 2022/06/09 12:52:38 UTC

[Impala-ASF-CR] IMPALA-10453: Support file pruning via runtime filters on Iceberg

Tamas Mate has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18531 )

Change subject: IMPALA-10453: Support file pruning via runtime filters on Iceberg
......................................................................

IMPALA-10453: Support file pruning via runtime filters on Iceberg

Iceberg tables store partition information in manifest files rather
than in the file path. This metadata is already pushed down to the
scanners; this commit uses it to evaluate runtime filters on Iceberg
files.
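
For readers unfamiliar with the idea, here is a minimal Python sketch of
partition-based file pruning. It is not the code from this patch (the
actual change is in the C++ scanners and the Java planner). The DataFile
class, the prune_files helper, the file paths and the filter values are
all hypothetical, and a plain set stands in for the runtime filter.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class DataFile:
    """Hypothetical stand-in for one data file of an Iceberg table."""
    path: str
    # Partition values taken from table metadata, not parsed from the path.
    partition: Dict[str, int]


def prune_files(files: Iterable[DataFile], column: str,
                filter_values: Set[int]) -> List[DataFile]:
    """Keep only the files whose partition value may pass the filter."""
    kept = []
    for f in files:
        value = f.partition.get(column)
        # Without a partition value for this column we cannot decide,
        # so the file is kept and has to be scanned.
        if value is None or value in filter_values:
            kept.append(f)
    return kept


# Made-up files and partition values, for illustration only.
files = [
    DataFile("data/00000-a.parquet", {"ss_sold_date_sk": 2452369}),
    DataFile("data/00001-b.parquet", {"ss_sold_date_sk": 2452400}),
    DataFile("data/00002-c.parquet", {"ss_sold_date_sk": 2451000}),
]
# Assumed to hold the join keys produced by the build side of the join.
runtime_filter = {2452369, 2452400}

survivors = prune_files(files, "ss_sold_date_sk", runtime_filter)
print([f.path for f in survivors])
```

In this sketch the third file is skipped without being opened, because
its partition value is not in the filter.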

Performance measurement:
Used TPC-DS Q10 [1] at scale factor 10 to measure query performance.
Min/Max filters were disabled and the wait time for runtime filters
was increased to 5 seconds. After pre-warming the Catalog, I executed
Q10 5 times on my local machine. The fastest execution times were:
Baseline Parquet tables: 1.08s
Baseline Iceberg tables without this patch: 1.43s
Iceberg tables with this patch: 1.09s

Testing:
  * Added e2e tests.
  * Initial performance test with TPC-DS Q10.

Ref:
[1] TPC-DS Q10:
select cd_gender, cd_marital_status, cd_education_status, count(*) cnt1,
  cd_purchase_estimate, count(*) cnt2, cd_credit_rating, count(*) cnt3,
  cd_dep_count, count(*) cnt4, cd_dep_employed_count, count(*) cnt5,
  cd_dep_college_count, count(*) cnt6
 from customer c, customer_address ca, customer_demographics
 where c.c_current_addr_sk = ca.ca_address_sk and
  ca_county in ('Walker County','Richland County','Gaines County',
  'Douglas County','Dona Ana County') and
  cd_demo_sk = c.c_current_cdemo_sk and
  exists (select *
          from store_sales, date_dim
          where c.c_customer_sk = ss_customer_sk and
                ss_sold_date_sk = d_date_sk and
                d_year = 2002 and
                d_moy between 4 and 4+3) and
   exists (select *
          from (select ws_bill_customer_sk as customer_sk, d_year,d_moy
             from web_sales, date_dim where ws_sold_date_sk = d_date_sk
              and d_year = 2002 and
                  d_moy between 4 and 4+3
             union all
             select cs_ship_customer_sk as customer_sk, d_year, d_moy
             from catalog_sales, date_dim
             where cs_sold_date_sk = d_date_sk and d_year = 2002 and
                  d_moy between 4 and 4+3
             ) x
            where c.c_customer_sk = customer_sk)
 group by cd_gender, cd_marital_status, cd_education_status,
  cd_purchase_estimate, cd_credit_rating, cd_dep_count,
  cd_dep_employed_count, cd_dep_college_count
 order by cd_gender, cd_marital_status, cd_education_status,
  cd_purchase_estimate, cd_credit_rating, cd_dep_count,
  cd_dep_employed_count, cd_dep_college_count
limit 100;

Change-Id: I7762e1238bdf236b85d2728881a402a2bb41f36a
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-runtime-filter.test
M tests/query_test/test_iceberg.py
M tests/query_test/test_runtime_filters.py
13 files changed, 217 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18531/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7762e1238bdf236b85d2728881a402a2bb41f36a
Gerrit-Change-Number: 18531
Gerrit-PatchSet: 4
Gerrit-Owner: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>