Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/06/21 23:06:00 UTC

[jira] [Commented] (IMPALA-11534) Skip reloading file metadata for some ALTER_TABLE events

    [ https://issues.apache.org/jira/browse/IMPALA-11534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735924#comment-17735924 ] 

ASF subversion and git services commented on IMPALA-11534:
----------------------------------------------------------

Commit bb7ad61c6c381ac25cd3fa198944230799fb8be8 in impala's branch refs/heads/master from Sai Hemanth Gantasala
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bb7ad61c6 ]

IMPALA-11534: Skip reloading file metadata for some ALTER_TABLE events

Reloading file metadata for medium to wide tables is generally a
heavyweight operation. From the event processor's perspective, it is
therefore ideal to minimize file metadata reloading, especially for
ALTER_TABLE statements, which are quite common in metastore events.

This patch implements the above optimization by comparing the before
and after table objects of an alter event to see whether the event
corresponds to ALTER TABLE add/change/replace column, set owner, or
set table properties. If the event corresponds to one of these, file
metadata reloading can be skipped. For interoperability, this patch
introduces a new startup flag, 'file_metadata_reload_properties', which
can be used to define which table properties require file metadata to
be reloaded. If this flag is set to empty, the optimization is not in
effect and file metadata is always reloaded.
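
The decision described above can be illustrated with a minimal Python
sketch. This is not the patch itself (the actual change lives in
Impala's catalog code); the TableSnapshot type, the function names, the
comma-separated flag format, and the example flag value are all
assumptions made for illustration only.

```python
from dataclasses import dataclass, field, replace


@dataclass
class TableSnapshot:
    """Hypothetical, simplified stand-in for a metastore table object."""
    owner: str
    columns: list            # list of (name, type, comment) tuples
    location: str
    properties: dict = field(default_factory=dict)


def parse_reload_properties(flag_value):
    """Split the flag into property keys (comma-separated is an assumption)."""
    return {key.strip() for key in flag_value.split(",") if key.strip()}


def can_skip_file_metadata_reload(before, after, flag_value):
    """Return True if the alter event only touches columns, owner, or
    table properties that are not listed in the flag."""
    reload_props = parse_reload_properties(flag_value)
    if not reload_props:
        # Empty flag: the optimization is off, always reload.
        return False
    if before.location != after.location:
        # Not one of the cases named in the commit message; stay conservative.
        return False
    for key in reload_props:
        if before.properties.get(key) != after.properties.get(key):
            # A property that requires file metadata reload has changed.
            return False
    # Skip only if the event matches one of the recognized kinds of change.
    return (before.owner != after.owner
            or before.columns != after.columns
            or before.properties != after.properties)


# Illustrative usage; "transactional" is only an example flag value.
before = TableSnapshot("alice", [("id", "int", "")], "/warehouse/t",
                       {"comment": "old"})
owner_changed = replace(before, owner="bob")
print(can_skip_file_metadata_reload(before, owner_changed, "transactional"))
```

In this sketch any change outside the recognized kinds falls back to a
reload, which keeps the conservative behavior as the default.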

Testing: Added a unit test to confirm that file metadata isn't
reloaded for certain alter table statements.

Change-Id: Ia66b96a7c4b7f50fbf46b2e02296cd29a47347b6
Reviewed-on: http://gerrit.cloudera.org:8080/19838
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Skip reloading file metadata for some ALTER_TABLE events
> --------------------------------------------------------
>
>                 Key: IMPALA-11534
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11534
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Sai Hemanth Gantasala
>            Priority: Critical
>
> EventProcessor will ignore trivial ALTER_TABLE events that only modify tblproperties like "transient_lastDdlTime", "totalSize", "numFilesErasureCoded", and "numFiles" (IMPALA-8579). For other kinds of non-rename ALTER_TABLE events, it will trigger a full refresh on the table, which is expensive when there are tens of thousands of partitions.
> We can actually skip reloading file metadata for some ALTER_TABLE events, e.g. events that involve
>  * changing table comment
>  * adding/dropping columns
>  * changing column definition (name/type/comment)
>  * changing ownership
>  * setting customized tblproperties


