Posted to issues-all@impala.apache.org by "Tamas Mate (JIRA)" <ji...@apache.org> on 2019/08/12 15:08:00 UTC

[jira] [Commented] (IMPALA-2426) COMPUTE INCREMENTAL STATS doesn't compute stats for newly discovered partitions

    [ https://issues.apache.org/jira/browse/IMPALA-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905295#comment-16905295 ] 

Tamas Mate commented on IMPALA-2426:
------------------------------------

An alter table {{UPDATE_STATS}} operation is called to persist the stats. At the moment it leaves {{reloadMetadata}} set to true, which later causes a table reload from HMS. Because the alter table {{UPDATE_STATS}} is called only after the stats collection queries have executed, a new partition is discovered by that reload too late, and Impala does not have stats for it.

These are the related code parts from [CatalogOpExecutor|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L685]:
{code:java}
case UPDATE_STATS:
  Preconditions.checkState(params.isSetUpdate_stats_params());
  Reference<Long> numUpdatedColumns = new Reference<>(0L);
  alterTableUpdateStats(tbl, params.getUpdate_stats_params(),
      numUpdatedPartitions, numUpdatedColumns);
  reloadTableSchema = true;
  addSummary(response, "Updated " + numUpdatedPartitions.getRef() +
      " partition(s) and " + numUpdatedColumns.getRef() + " column(s).");
  break;
{code}
{code:java}
if (reloadMetadata) {
  loadTableMetadata(tbl, newCatalogVersion, reloadFileMetadata,
      reloadTableSchema, null, "ALTER TABLE " + params.getAlter_type().name());
  addTableToCatalogUpdate(tbl, response.result);
}
{code}
We talked about this Jira during a discussion with [~balazsj_impala_220b], and this unexpected side effect should possibly be removed. Having compute stats refresh the metadata could cause trouble during a Hive ingestion, for example.
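
To make the ordering easier to follow, here is a minimal, self-contained sketch. It is a toy model and not Impala code: the class {{StatsOrderingSketch}} and all of its fields and methods are hypothetical names invented for illustration. It only assumes what is described above, namely that stats collection runs against the partitions the catalog already knows about and that the reload from HMS happens afterwards.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: every name here is hypothetical, not an Impala API.
class StatsOrderingSketch {
  // Partitions registered in the simulated Hive Metastore (HMS).
  final Set<String> hmsPartitions = new LinkedHashSet<>();
  // Partitions the simulated Impala catalog currently knows about.
  final Set<String> cachedPartitions = new LinkedHashSet<>();
  // Partitions for which stats have been persisted.
  final Set<String> partitionsWithStats = new LinkedHashSet<>();

  // Impala-side change: visible to both HMS and the catalog cache.
  void addPartitionViaImpala(String partition) {
    hmsPartitions.add(partition);
    cachedPartitions.add(partition);
  }

  // Hive-side change: only HMS knows about it until the next reload.
  void addPartitionViaHive(String partition) {
    hmsPartitions.add(partition);
  }

  // Stands in for the table reload triggered when reloadMetadata is true.
  void reloadFromHms() {
    cachedPartitions.clear();
    cachedPartitions.addAll(hmsPartitions);
  }

  void computeIncrementalStats() {
    // 1. Stats collection only sees cached partitions that lack stats.
    List<String> collected = new ArrayList<>();
    for (String p : cachedPartitions) {
      if (!partitionsWithStats.contains(p)) {
        collected.add(p);
      }
    }
    // 2. Persist the collected stats (stands in for alter table UPDATE_STATS).
    partitionsWithStats.addAll(collected);
    // 3. Side effect: the reload happens only now, after collection is done.
    reloadFromHms();
  }

  // Partitions the catalog knows about but that have no stats.
  List<String> partitionsMissingStats() {
    List<String> missing = new ArrayList<>();
    for (String p : cachedPartitions) {
      if (!partitionsWithStats.contains(p)) {
        missing.add(p);
      }
    }
    return missing;
  }

  // Mirrors the sequence from the issue description.
  static StatsOrderingSketch reproduce() {
    StatsOrderingSketch t = new StatsOrderingSketch();
    t.addPartitionViaImpala("y=42");   // insert through Impala
    t.addPartitionViaHive("y=333");    // alter table add partition in Hive
    t.computeIncrementalStats();
    return t;
  }

  public static void main(String[] args) {
    StatsOrderingSketch t = reproduce();
    System.out.println("Partitions without stats: " + t.partitionsMissingStats());
  }
}
```

In this model the partition {{y=333}} becomes visible after the compute stats call but is left without stats, which matches the behaviour reported in the description. Moving the reload before step 1 would make the model collect stats for {{y=333}} as well; whether that is the right fix in Impala is a separate question, given the concern about the reload side effect.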

> COMPUTE INCREMENTAL STATS doesn't compute stats for newly discovered partitions
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-2426
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2426
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.2.4, Impala 2.3.0
>            Reporter: Jim Apple
>            Assignee: Tamas Mate
>            Priority: Minor
>              Labels: catalog-server, ramp-up
>
> In the following sequence, I expect the stats for partition 333 to be computed, but they are not:
> # In Impala: create table T (x int) partitioned by (y int)
> # In Impala: insert into table T partition (y=42) values (2)
> # In Hive: alter table T add partition (y=333)
> # In Impala: compute incremental stats T
> # In Impala: show table stats T


