Posted to issues@hive.apache.org by "Jason Dere (JIRA)" <ji...@apache.org> on 2018/05/10 17:43:00 UTC

[jira] [Commented] (HIVE-19489) Disable stats autogather for external tables

    [ https://issues.apache.org/jira/browse/HIVE-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470827#comment-16470827 ] 

Jason Dere commented on HIVE-19489:
-----------------------------------

ALTER TABLE (which runs after every load/insert) appears to be responsible for computing numFiles/totalSize, and it does so by checking the underlying files on the FS. Those stats should therefore be accurate, so we can keep that portion of the stats. Only the autogather of row/column stats (generated by the autogather stats tasks) will be disabled for external tables.
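
To illustrate why file-level stats can stay accurate regardless of who wrote the data, here is a minimal sketch of deriving numFiles/totalSize by listing files. It is not Hive's implementation: Hive goes through the Hadoop FileSystem API, while this sketch uses the local filesystem as a stand-in. The function name basic_file_stats and the rule for skipping files whose names start with "_" or "." are assumptions made for the example.

```python
import os


def basic_file_stats(table_dir):
    """Return file-level stats for a table directory (illustrative only)."""
    num_files = 0
    total_size = 0
    for root, _dirs, files in os.walk(table_dir):
        for name in files:
            # Assumption: treat names starting with "_" or "." as hidden or
            # temporary files and leave them out of the counts.
            if name.startswith(("_", ".")):
                continue
            num_files += 1
            total_size += os.path.getsize(os.path.join(root, name))
    return {"numFiles": num_files, "totalSize": total_size}
```

Because these numbers come straight from the files present at the time of the check, they reflect data written by external apps as well. Row counts and column stats, by contrast, cannot be derived from a listing alone.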

> Disable stats autogather for external tables
> --------------------------------------------
>
>                 Key: HIVE-19489
>                 URL: https://issues.apache.org/jira/browse/HIVE-19489
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Statistics
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>            Priority: Major
>
> Hive auto-gather of table statistics can generate incorrect stats (and mark them as accurate) for external tables whose data is written by external apps.
> To avoid this issue, stats autogather will be disabled on external tables when loading/inserting into a table with existing data, if HIVE_DISABLE_UNSAFE_EXTERNALTABLE_OPERATIONS is enabled. In this situation, users should explicitly run ANALYZE TABLE on their external tables to keep the stats up to date.
> Autogather of stats will still be allowed to occur on external tables in the case of INSERT OVERWRITE or LOAD DATA OVERWRITE, since the existing data is being removed and so the stats calculated on the inserted/loaded data should be accurate.
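
The rule in the description above can be sketched as a small decision function. This is a hypothetical illustration, not code from Hive: the function name and parameters are invented here, and it assumes any non-overwrite load/insert into an external table may be writing next to existing data. It models only the external-table rule and assumes autogather is otherwise enabled.

```python
def should_autogather_stats(is_external, is_overwrite, disable_unsafe_external_ops):
    """Decide whether row/column stats autogather should run for a write.

    disable_unsafe_external_ops stands in for the
    HIVE_DISABLE_UNSAFE_EXTERNALTABLE_OPERATIONS setting named in the issue.
    """
    if not is_external:
        # Managed tables are not affected by this change.
        return True
    if not disable_unsafe_external_ops:
        # The restriction applies only when the setting is enabled.
        return True
    # INSERT OVERWRITE / LOAD DATA OVERWRITE replaces the existing data, so
    # stats computed on the newly written data should be accurate.
    return is_overwrite
```

With the setting enabled, a plain INSERT INTO on an external table gives should_autogather_stats(True, False, True) == False, so stats there would have to be refreshed with an explicit ANALYZE TABLE.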


