Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/11/07 21:28:00 UTC

[jira] [Commented] (IMPALA-11681) Optimize 'setTableStats' for the Iceberg tables without stats in HMS

    [ https://issues.apache.org/jira/browse/IMPALA-11681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630034#comment-17630034 ] 

ASF subversion and git services commented on IMPALA-11681:
----------------------------------------------------------

Commit f3504566fb97719eec81771a61785cedc85ba6fa in impala's branch refs/heads/master from LPL
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f3504566f ]

IMPALA-11681: Set table stats for the Iceberg table by its partition stats

For Iceberg tables, table-level statistics such as numRows can be
computed from Iceberg partition stats, which are more accurate and
up to date. Obtaining these statistics does not depend on
StatsSetupConst.ROW_COUNT and StatsSetupConst.TOTAL_SIZE in HMS. This
improves cardinality estimation for Iceberg tables. The calculation is
not yet accurate for V2 Iceberg tables; once IMPALA-11516 (Return
better partition stats for V2 tables) is ready, these statistics can be
considered as a replacement for the HMS statistics.
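
As a rough illustration of the idea above (not the code from this
commit), the sketch below derives a table-level row count by summing
per-partition row counts and uses it when HMS has no row count. The
class IcebergStatsSketch, the PartitionStats holder, the UNKNOWN
sentinel of -1 and the "prefer HMS when present" ordering are all
assumptions made for the example; the real logic lives in
IcebergTable.java and Table.java and may differ.

```java
import java.util.List;

// Hypothetical sketch: none of these names are Impala's actual classes.
final class IcebergStatsSketch {
  // Assumed sentinel meaning "HMS has no row count for this table".
  static final long UNKNOWN = -1L;

  // Minimal stand-in for the per-partition stats Iceberg metadata provides.
  static final class PartitionStats {
    final long numRows;
    final long fileSizeInBytes;

    PartitionStats(long numRows, long fileSizeInBytes) {
      this.numRows = numRows;
      this.fileSizeInBytes = fileSizeInBytes;
    }
  }

  // Table-level numRows as the sum of the per-partition row counts.
  static long numRowsFromPartitions(List<PartitionStats> partitions) {
    long total = 0;
    for (PartitionStats p : partitions) {
      total += p.numRows;
    }
    return total;
  }

  // Use the HMS row count when it is set; otherwise fall back to the
  // value derived from partition stats (ordering assumed for this sketch).
  static long tableNumRows(long hmsRowCount, List<PartitionStats> partitions) {
    if (hmsRowCount != UNKNOWN && hmsRowCount >= 0) {
      return hmsRowCount;
    }
    return numRowsFromPartitions(partitions);
  }
}
```

With two partitions of 10 and 5 rows and no HMS row count,
tableNumRows(UNKNOWN, partitions) yields 15, so the planner has a
cardinality to work with even when HMS stats are missing.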

Testing:
 - Existing tests
 - Test on 'On-demand Metadata' mode
 - For 'select * from
 iceberg_v2_positional_not_all_data_files_have_delete_files where i =
 (select max(i) from iceberg_v2_positional_update_all_rows)', the 'Join
 Order' and 'Distribution Mode' are the same as when table stats are
 present

Change-Id: I3e92d3f25e2a57a64556249410d0af3522598c00
Reviewed-on: http://gerrit.cloudera.org:8080/19168
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Optimize 'setTableStats' for the Iceberg tables without stats in HMS
> --------------------------------------------------------------------
>
>                 Key: IMPALA-11681
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11681
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: LiPenglin
>            Assignee: LiPenglin
>            Priority: Major
>              Labels: impala-iceberg
>
> For Iceberg tables, when the table parameters StatsSetupConst.ROW_COUNT and StatsSetupConst.TOTAL_SIZE cannot be obtained from HMS, they can be retrieved from Iceberg's metadata. This allows an appropriate join order to be chosen for Iceberg tables without stats.
> https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L348
> https://github.com/apache/impala/blob/1e30ca228d683821e42e51f94478c77642f5331a/fe/src/main/java/org/apache/impala/catalog/Table.java#L412



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
