Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/11/03 12:23:00 UTC

[jira] [Commented] (IMPALA-10777) Enable min/max filtering for Iceberg partitions.

    [ https://issues.apache.org/jira/browse/IMPALA-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438016#comment-17438016 ] 

ASF subversion and git services commented on IMPALA-10777:
----------------------------------------------------------

Commit 9ed4b3689784670532e840c5cb0389bdd9d5c0e8 in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9ed4b36 ]

IMPALA-10777: Enable min/max filtering for Iceberg partitions

This patch enables min/max filters for Iceberg columns that
participate in table partitioning. The min/max filters are
evaluated at the Parquet row group level. This means that the
filtering is still slower than dynamic partition pruning (which
doesn't even need to open the files), but much faster than no
pruning at all.
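
To illustrate what evaluating a min/max filter at the row group level
means, here is a minimal sketch. It is not Impala's implementation; the
class and function names (RowGroupStats, MinMaxFilter,
row_groups_to_scan) and all values are hypothetical. The idea it shows:
a row group is skipped when the value range recorded in its statistics
cannot intersect the filter's range. The statistics have to be read
from the file first, which fits the note above that this is slower
than partition pruning.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for one column's min/max statistics in a Parquet row group."""
    min_value: int
    max_value: int


@dataclass
class MinMaxFilter:
    """Hypothetical runtime filter: smallest and largest join key seen on the build side."""
    min_value: int
    max_value: int

    def overlaps(self, stats: RowGroupStats) -> bool:
        # The row group can contain matching rows only if the two ranges intersect.
        return stats.min_value <= self.max_value and stats.max_value >= self.min_value


def row_groups_to_scan(row_groups, flt):
    """Return the indexes of row groups whose value range overlaps the filter."""
    return [i for i, stats in enumerate(row_groups) if flt.overlaps(stats)]


# Illustrative values only: ranges of a partition column in four row groups.
row_groups = [
    RowGroupStats(100, 199),
    RowGroupStats(200, 299),
    RowGroupStats(300, 399),
    RowGroupStats(400, 499),
]
flt = MinMaxFilter(250, 310)
selected = row_groups_to_scan(row_groups, flt)
print(selected)
```

In this example only the second and third row groups overlap the
filter range, so the other two would not need to be decoded.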

Performance

I used the following query to measure performance on a scale 10
TPC-DS dataset:

 select i_item_id, sum(ss_ext_sales_price) total_sales
 from store_sales,
      date_dim,
      customer_address,
      item
 where i_item_id in (select i_item_id
                     from item
                     where i_color in ('orchid', 'chiffon', 'lace'))
   and ss_item_sk      = i_item_sk
   and ss_sold_date_sk = d_date_sk
   and d_year          = 2000
   and d_moy           = 1
   and ss_addr_sk      = ca_address_sk
   and ca_gmt_offset   = -8
 group by i_item_id

The above query took the following times to execute:

Regular Parquet table: 1.16s
Iceberg table without min/max filters: 4.39s
Iceberg table with min/max filters: 1.77s
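
As a quick reading of these numbers, the ratios can be worked out as
below. This is plain arithmetic on the three timings reported above;
the variable names are only for illustration.

```python
# Timings in seconds, copied from the measurements in this message.
parquet = 1.16
iceberg_no_filter = 4.39
iceberg_with_filter = 1.77

# How much faster the Iceberg scan gets with min/max filters enabled.
speedup = iceberg_no_filter / iceberg_with_filter
# How much slower the filtered Iceberg scan still is than regular Parquet.
remaining_gap = iceberg_with_filter / parquet

print(f"speedup from min/max filters: {speedup:.2f}x")
print(f"remaining slowdown vs. regular Parquet: {remaining_gap:.2f}x")
```

So for this single query the filters give roughly a 2.5x speedup,
while the Iceberg table remains roughly 1.5x slower than the regular
Parquet table.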

Testing:
 * added e2e test
 * planner test could not be added because Iceberg tables behave
   differently during planner tests (due to some hacks that need
   refactoring)

Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac
Reviewed-on: http://gerrit.cloudera.org:8080/17960
Reviewed-by: Qifan Chen <qc...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Enable min/max filtering for Iceberg partitions.
> ------------------------------------------------
>
>                 Key: IMPALA-10777
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10777
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Qifan Chen
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> The work to enable min/max filters for partition columns is underway. See IMPALA-10738. 
> It would be nice to enable the filtering for Iceberg partitions as well.


