Posted to issues-all@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2023/07/24 15:19:00 UTC

[jira] [Updated] (IMPALA-11986) Optimize MIN(part_col)/ MAX(part_col)/ COUNT(DISTINCT part_col)/ queries for Iceberg tables

     [ https://issues.apache.org/jira/browse/IMPALA-11986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-11986:
---------------------------------------
    Labels: impala-iceberg performance  (was: impala-iceberg)

> Optimize MIN(part_col)/ MAX(part_col)/ COUNT(DISTINCT part_col)/ queries for Iceberg tables
> -------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-11986
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11986
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Li Penglin
>            Priority: Major
>              Labels: impala-iceberg, performance
>
> For Iceberg V1 and V2 tables without deletes:
> The OPTIMIZE_PARTITION_KEY_SCANS query option (https://impala.apache.org/docs/build/html/topics/impala_optimize_partition_key_scans.html) optimizes MIN(key_column), MAX(key_column), and COUNT(DISTINCT key_column) by answering them from the partition key values in the HMS metadata (the 'TBLS' and 'PARTITION_KEY_VALS' tables) instead of scanning data files. Iceberg tables do not store their partition information in the HMS, but it can be obtained through the Iceberg API. We can apply the same idea to speed up MIN(key_column), MAX(key_column), and COUNT(DISTINCT key_column) on Iceberg tables, but only when the column's partition transform is 'identity', because only the identity transform keeps the column's actual values in the partition data.
> For non-partition columns, if min/max statistics are stored in the Iceberg metadata, could MIN(column) and MAX(column) queries be optimized in the same way?
> However, Impala does not guarantee that the statistics for non-partition columns are complete, which makes this less straightforward.
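
A minimal Python sketch of the partition-key check described above. PartitionField, DataFile, and answer_from_partition_metadata are hypothetical, simplified stand-ins for Iceberg's partition spec and data-file metadata, not the real Iceberg API or Impala's implementation. The sketch assumes a single partition spec (no partition evolution). It answers only when the table has no delete files and the column uses the 'identity' transform, and otherwise returns None to signal that a normal scan is needed.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


# Hypothetical stand-ins for Iceberg metadata (NOT the real Iceberg API).
@dataclass(frozen=True)
class PartitionField:
    source_column: str
    transform: str  # e.g. "identity", "bucket[16]", "day"


@dataclass(frozen=True)
class DataFile:
    partition: Dict[str, Any]  # source column name -> partition value
    record_count: int


def answer_from_partition_metadata(
    column: str,
    spec: List[PartitionField],
    files: List[DataFile],
    has_deletes: bool,
) -> Optional[Dict[str, Any]]:
    """Return MIN/MAX/COUNT(DISTINCT) of `column`, or None if a scan is needed."""
    # Delete files (V2) can remove rows, so the partition values alone may be stale.
    if has_deletes:
        return None
    # Only an identity transform stores the column's real values in the partition.
    if not any(f.source_column == column and f.transform == "identity" for f in spec):
        return None
    # Collect distinct partition values of non-empty files.
    values = {f.partition[column] for f in files if f.record_count > 0}
    # MIN, MAX, and COUNT(DISTINCT) all ignore NULLs.
    values.discard(None)
    if not values:
        return {"min": None, "max": None, "count_distinct": 0}
    return {"min": min(values), "max": max(values), "count_distinct": len(values)}


# Usage with made-up example metadata.
spec = [PartitionField("year", "identity")]
files = [DataFile({"year": 2021}, 10), DataFile({"year": 2023}, 5), DataFile({"year": 2021}, 7)]
print(answer_from_partition_metadata("year", spec, files, has_deletes=False))
# → {'min': 2021, 'max': 2023, 'count_distinct': 2}
```

Returning None rather than raising keeps the fallback explicit: the planner would simply keep the regular scan whenever the metadata cannot be trusted.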



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
