Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/11/03 12:23:00 UTC
[jira] [Commented] (IMPALA-10777) Enable min/max filtering for Iceberg partitions.
[ https://issues.apache.org/jira/browse/IMPALA-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438016#comment-17438016 ]
ASF subversion and git services commented on IMPALA-10777:
----------------------------------------------------------
Commit 9ed4b3689784670532e840c5cb0389bdd9d5c0e8 in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9ed4b36 ]
IMPALA-10777: Enable min/max filtering for Iceberg partitions
This patch enables min/max filters for Iceberg columns that
participate in table partitioning. The min/max filters are
evaluated at the Parquet row group level, so the files still
have to be opened. This makes the approach slower than dynamic
partition pruning (which doesn't even need to open the files),
but much faster than no pruning at all.
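As a rough illustration of what row-group-level min/max filtering means, the sketch below keeps only the row groups whose column statistics overlap the range carried by a filter. It is not Impala's implementation: the names RowGroupStats, MinMaxFilter, may_contain_matches and surviving_row_groups are hypothetical, and integer keys are assumed for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical min/max statistics of one column in one row group."""
    min_value: int
    max_value: int


@dataclass
class MinMaxFilter:
    """Hypothetical runtime filter: range of join keys seen on the build side."""
    min_value: int
    max_value: int


def may_contain_matches(stats: RowGroupStats, flt: MinMaxFilter) -> bool:
    # Two closed ranges overlap unless one ends before the other begins.
    return stats.max_value >= flt.min_value and stats.min_value <= flt.max_value


def surviving_row_groups(row_groups: List[RowGroupStats],
                         flt: MinMaxFilter) -> List[int]:
    # Return the indexes of row groups that still have to be scanned.
    return [i for i, s in enumerate(row_groups) if may_contain_matches(s, flt)]
```

A row group is skipped only when its range cannot overlap the filter, so the check can let through row groups that turn out to have no matching rows, but it never drops one that does.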
Performance:
I used the following query to measure perf on a scale 10 TPC-DS
dataset:
select i_item_id, sum(ss_ext_sales_price) total_sales
from
  store_sales,
  date_dim,
  customer_address,
  item
where i_item_id in (select
    i_item_id
  from item
  where i_color in ('orchid','chiffon','lace'))
  and ss_item_sk = i_item_sk
  and ss_sold_date_sk = d_date_sk
  and d_year = 2000
  and d_moy = 1
  and ss_addr_sk = ca_address_sk
  and ca_gmt_offset = -8
The above query took the following times to execute:
Regular Parquet table: 1.16s
Iceberg table without min/max filters: 4.39s
Iceberg table with min/max filters: 1.77s
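To put the three timings in relative terms, the short calculation below derives the ratios between them. It uses only the numbers listed above; the dictionary keys and variable names are made up for illustration.

```python
# Timings in seconds, copied from the measurements above.
timings = {
    "parquet": 1.16,
    "iceberg_no_filter": 4.39,
    "iceberg_minmax": 1.77,
}

# How much faster the Iceberg scan gets once min/max filters are enabled.
speedup = timings["iceberg_no_filter"] / timings["iceberg_minmax"]

# How much slower the filtered Iceberg scan remains than regular Parquet.
remaining_gap = timings["iceberg_minmax"] / timings["parquet"]

print(f"speedup from min/max filters: {speedup:.2f}x")
print(f"remaining gap vs. Parquet:    {remaining_gap:.2f}x")
```

On this single query the filters make the Iceberg table roughly 2.5x faster, while it stays about 1.5x slower than the regular Parquet table.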
Testing:
* added e2e test
* a planner test could not be added because Iceberg tables behave
differently during planner tests (due to some hacks that need
refactoring)
Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac
Reviewed-on: http://gerrit.cloudera.org:8080/17960
Reviewed-by: Qifan Chen <qc...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
> Enable min/max filtering for Iceberg partitions.
> ------------------------------------------------
>
> Key: IMPALA-10777
> URL: https://issues.apache.org/jira/browse/IMPALA-10777
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Qifan Chen
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg
>
> The work to enable min/max filters for partition columns is underway. See IMPALA-10738.
> It would be nice to enable this filtering for Iceberg partitions as well.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)