Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/05/31 15:37:00 UTC

[jira] [Commented] (IMPALA-10700) Introduce an option to skip deleting column statistics on truncate

    [ https://issues.apache.org/jira/browse/IMPALA-10700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354514#comment-17354514 ] 

ASF subversion and git services commented on IMPALA-10700:
----------------------------------------------------------

Commit e24bdd2175062c30cc9a6bcf0eb2910e8f1f7cae in impala's branch refs/heads/master from Vihang Karajgaonkar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e24bdd2 ]

IMPALA-10700: Add query options to skip deleting stats

When a truncate table command is issued on a non-transactional table,
the table and column statistics for the table are also deleted by
default. This can be an expensive operation, especially when many
truncate table commands run concurrently: as concurrency increases,
the Hive metastore responds more slowly to the RPC calls that delete
table and column statistics.

When a truncate operation is used to remove existing data before
reloading new data, users are likely to compute stats again as soon
as the new data is loaded. That overwrites the existing statistics,
so the time the truncate operation spends deleting column and table
statistics is wasted.

To address this, this change introduces a new query option,
DELETE_STATS_IN_TRUNCATE. Its default value is 1 (true), which means
stats are deleted as part of the truncate operation.

As the name suggests, when this query option is set to false (0),
a truncate operation does not delete the table and column statistics
for the table.
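The option's effect can be illustrated with a toy model. This is a minimal sketch, not Impala's catalogd implementation: the `Table` class and the `parse_bool_option` and `truncate_table` helpers are hypothetical names, and "rows" stand in for the data files a real truncate removes. It only shows the decision described above: data is always removed, while table and column statistics are dropped only when DELETE_STATS_IN_TRUNCATE is true, which is the default.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory stand-in for a non-transactional table.
# Not part of Impala; names are illustrative only.
@dataclass
class Table:
    rows: list = field(default_factory=list)
    table_stats: dict = field(default_factory=dict)
    column_stats: dict = field(default_factory=dict)


def parse_bool_option(value: str) -> bool:
    """Interpret a boolean query-option value written as 1/0 or true/false."""
    normalized = value.strip().lower()
    if normalized in ("1", "true"):
        return True
    if normalized in ("0", "false"):
        return False
    raise ValueError(f"invalid boolean option value: {value!r}")


def truncate_table(table: Table, options: dict) -> None:
    """Remove all data; drop stats only if DELETE_STATS_IN_TRUNCATE is true."""
    # The option defaults to true, so stats are deleted unless it is disabled.
    delete_stats = parse_bool_option(options.get("DELETE_STATS_IN_TRUNCATE", "true"))
    table.rows.clear()
    if delete_stats:
        # Stands in for the metastore RPCs that delete table and column stats.
        table.table_stats.clear()
        table.column_stats.clear()


if __name__ == "__main__":
    t = Table(rows=[1, 2, 3], table_stats={"numRows": 3}, column_stats={"c1": {"ndv": 3}})
    truncate_table(t, {"DELETE_STATS_IN_TRUNCATE": "0"})
    print(t.rows, t.table_stats)  # [] {'numRows': 3}
```

With the option disabled, the stale statistics survive the truncate and are expected to be overwritten by the next COMPUTE STATS after the reload. In an Impala session the option would be set like any other query option, e.g. `SET DELETE_STATS_IN_TRUNCATE=false;` before issuing `TRUNCATE TABLE`.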

This change also improves the truncate operation on replicated
tables. Previously, if the table was being replicated, its statistics
were not deleted after truncate. Now the statistics are deleted after
truncate.

Testing:
Modified truncate-table.test to include variations of this query
option and to verify that the statistics are either deleted or
retained after a truncate operation, as expected.

Change-Id: I9400c3586b4bdf46d9b4056ea1023aabae8cc519
Reviewed-on: http://gerrit.cloudera.org:8080/17521
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Introduce an option to skip deleting column statistics on truncate
> ------------------------------------------------------------------
>
>                 Key: IMPALA-10700
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10700
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>
> Currently, when a user issues a {{truncate table}} command on a non-transactional table, catalogd also deletes the table and column statistics. However, this can hurt the performance of the truncate operation, especially at high concurrency. Based on preliminary research, other databases (e.g. Oracle, Hive) do not appear to delete statistics after a truncate operation. It would be good to introduce a query option which can be set by the user to skip deleting the column statistics during truncate table execution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
