Posted to issues-all@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2022/09/15 12:34:00 UTC

[jira] [Assigned] (IMPALA-11583) Use Iceberg APIs to update table properties for Iceberg tables

     [ https://issues.apache.org/jira/browse/IMPALA-11583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy reassigned IMPALA-11583:
------------------------------------------

    Assignee: Zoltán Borók-Nagy

> Use Iceberg APIs to update table properties for Iceberg tables
> --------------------------------------------------------------
>
>                 Key: IMPALA-11583
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11583
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> COMPUTE STATS updates table-level stats via the alter_table() HMS API. This call replaces the whole HMS table, so if another engine (e.g. Hive) modifies the table concurrently, those modifications can be lost.
> This is critical for Iceberg tables, as the 'metadata_location' table property must always point to the latest table metadata (and thus the latest snapshot). Inadvertently rewriting it during COMPUTE STATS can result in data loss.
> Table-level stats like 'numRows' and 'totalSize' are already updated by Iceberg during table modifications, so there is no need to update these values in COMPUTE STATS.
> Column stats are not affected, as they are updated via a different API call ([updateTableColumnStatistics()|https://github.com/apache/impala/blob/4e813b7085c995a7244ef886b00c22e9d93cc80c/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1638]) that doesn't touch the table properties. However, updating statistics also requires updating the table property "impala.lastComputeStatsTime". When HiveCatalog is used, we should update it via the Iceberg APIs:
> https://github.com/apache/impala/blob/4e813b7085c995a7244ef886b00c22e9d93cc80c/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java#L211
> For catalogs other than HiveCatalog we still need to update the table property via the HMS API. This should be safe, as other catalogs don't depend on HMS table properties.
> In other cases (including non-Iceberg tables), reloading the HMS table right before invoking 'alter_table()' could be considered, to reduce the chance of losing concurrent table updates.
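
The race described above is a classic lost update: a read-modify-write of the whole table object overwrites anything another engine committed in between. The Java sketch below models it with a hypothetical in-memory `HmsTable` class (not the real HMS client or Iceberg API) and contrasts an alter_table()-style whole-object replace with a targeted single-key update. The targeted update is roughly what Iceberg's `table.updateProperties().set(key, value).commit()` offers; the real Iceberg commit path (writing a new metadata file and swapping 'metadata_location' with an optimistic check) is deliberately simplified away here. The timestamp value is arbitrary.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the lost-update race; not real HMS/Iceberg code. */
public class LostUpdateDemo {
    // Hypothetical stand-in for the table parameters stored in HMS.
    static final class HmsTable {
        private Map<String, String> params = new HashMap<>();

        // Returns a copy, like fetching the table object before alter_table().
        Map<String, String> getTable() {
            return new HashMap<>(params);
        }

        // Models alter_table(): the whole parameter map is replaced.
        void alterTable(Map<String, String> newParams) {
            params = new HashMap<>(newParams);
        }

        // Models a targeted update: only the given key changes.
        void setProperty(String key, String value) {
            params.put(key, value);
        }
    }

    static final String METADATA_LOCATION = "metadata_location";
    static final String LAST_STATS = "impala.lastComputeStatsTime";

    static String wholeTableReplace() {
        HmsTable hms = new HmsTable();
        hms.setProperty(METADATA_LOCATION, "v1.metadata.json");
        // COMPUTE STATS loads the table...
        Map<String, String> stale = hms.getTable();
        // ...meanwhile another engine commits a new snapshot...
        hms.setProperty(METADATA_LOCATION, "v2.metadata.json");
        // ...then COMPUTE STATS writes back its stale copy, reverting metadata_location.
        stale.put(LAST_STATS, "12345");
        hms.alterTable(stale);
        return hms.getTable().get(METADATA_LOCATION);
    }

    static String targetedUpdate() {
        HmsTable hms = new HmsTable();
        hms.setProperty(METADATA_LOCATION, "v1.metadata.json");
        // Another engine commits a new snapshot at the same point as above.
        hms.setProperty(METADATA_LOCATION, "v2.metadata.json");
        // COMPUTE STATS only sets its own key; metadata_location is untouched.
        hms.setProperty(LAST_STATS, "12345");
        return hms.getTable().get(METADATA_LOCATION);
    }

    public static void main(String[] args) {
        System.out.println("whole-table replace: " + wholeTableReplace());
        System.out.println("targeted update:     " + targetedUpdate());
    }
}
```

Running `main` prints `v1.metadata.json` for the whole-table replace (the concurrent commit is lost) and `v2.metadata.json` for the targeted update.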

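The proposed split between HiveCatalog and other catalogs could look roughly like the following Java sketch. `CatalogKind`, `Hms`, `IcebergTable` and `updateLastComputeStatsTime` are hypothetical names, not Impala or Iceberg classes; `IcebergTable.setProperty` merely stands in for an Iceberg property commit, which with HiveCatalog ends up in HMS. The non-HiveCatalog branch also applies the optional "reload right before alter_table()" idea, which narrows the race window but does not close it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the per-catalog dispatch; not actual Impala code. */
public class StatsTimeUpdater {
    // Hypothetical: OTHER stands for any catalog that is not HiveCatalog.
    enum CatalogKind { HIVE_CATALOG, OTHER }

    // Hypothetical stand-in for HMS table parameters.
    static final class Hms {
        final Map<String, String> params = new HashMap<>();

        // Returns a fresh copy, like reloading the table from HMS.
        Map<String, String> getTable() { return new HashMap<>(params); }

        // Models alter_table(): replaces all parameters.
        void alterTable(Map<String, String> newParams) {
            params.clear();
            params.putAll(newParams);
        }
    }

    // Hypothetical stand-in for an Iceberg table backed by HiveCatalog.
    static final class IcebergTable {
        final Hms hms;
        IcebergTable(Hms hms) { this.hms = hms; }

        // Models updateProperties().set(k, v).commit(): changes one key only.
        void setProperty(String key, String value) { hms.params.put(key, value); }
    }

    static final String LAST_STATS = "impala.lastComputeStatsTime";

    static void updateLastComputeStatsTime(CatalogKind kind, Hms hms,
                                           IcebergTable table, String time) {
        if (kind == CatalogKind.HIVE_CATALOG) {
            // HiveCatalog keeps metadata_location in HMS, so avoid alter_table().
            table.setProperty(LAST_STATS, time);
        } else {
            // Other catalogs don't rely on HMS table properties; reload right
            // before alter_table() to narrow (not eliminate) the race window.
            Map<String, String> fresh = hms.getTable();
            fresh.put(LAST_STATS, time);
            hms.alterTable(fresh);
        }
    }

    public static void main(String[] args) {
        Hms hms = new Hms();
        hms.params.put("metadata_location", "v2.metadata.json");
        updateLastComputeStatsTime(CatalogKind.HIVE_CATALOG, hms, new IcebergTable(hms), "12345");
        System.out.println(hms.params.get("metadata_location"));
        System.out.println(hms.params.get(LAST_STATS));
    }
}
```
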


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
