Posted to issues@hive.apache.org by "Aaron Tokhy (JIRA)" <ji...@apache.org> on 2015/08/11 02:55:46 UTC

[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

    [ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681038#comment-14681038 ] 

Aaron Tokhy commented on HIVE-10631:
------------------------------------

After reading more about hive.stats.reliable, it does not appear appropriate to use it in this case. Instead, it would be better to defer stats calculation for partitioned tables until partitions are added to the table (MSCK/ALTER TABLE), rather than performing it on table creation (CREATE [EXTERNAL] TABLE).
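
A minimal sketch of that idea follows, as a toy model rather than the actual metastore code. FastStatsSketch, FakeWarehouse, listFileSizes, updateStatsAsReported, updateStatsOnCreate and the s3://example-bucket/tbl path are all hypothetical names introduced for illustration. The first method mirrors the behaviour described in the issue (the directory is listed and the result is then ignored when newDir is true); the second adds the proposed guard so that table creation never lists the directory for a partitioned table.

```java
// Toy model only: these types are hypothetical stand-ins, not Hive's metastore classes.
final class FastStatsSketch {

    /** Stand-in for the warehouse; counts how often a directory listing is requested. */
    static final class FakeWarehouse {
        int listCalls = 0;

        long[] listFileSizes(String tableLocation) {
            listCalls++;          // on S3 this is the expensive, possibly recursive, step
            return new long[0];   // a freshly created table has no files
        }
    }

    /** Behaviour described in the issue: list first, then ignore the result if newDir is true. */
    static boolean updateStatsAsReported(FakeWarehouse wh, String location, boolean newDir) {
        long[] sizes = wh.listFileSizes(location);
        if (newDir) {
            return false;         // nothing is recorded, but the listing has already happened
        }
        return sizes != null;     // pretend the stats were recorded
    }

    /** Proposed guard: gather fast stats at creation time only for unpartitioned tables. */
    static boolean updateStatsOnCreate(FakeWarehouse wh, String location,
                                       int partitionKeyCount, boolean madeDir) {
        if (partitionKeyCount > 0) {
            return false;         // defer: stats are gathered when partitions are added
        }
        return updateStatsAsReported(wh, location, madeDir);
    }

    public static void main(String[] args) {
        FakeWarehouse unguarded = new FakeWarehouse();
        updateStatsAsReported(unguarded, "s3://example-bucket/tbl", true);
        System.out.println("listings without guard: " + unguarded.listCalls);

        FakeWarehouse guarded = new FakeWarehouse();
        updateStatsOnCreate(guarded, "s3://example-bucket/tbl", 2, true);
        System.out.println("listings with guard: " + guarded.listCalls);
    }
}
```

With the guard in place, the only listing left at creation time is the one for unpartitioned tables, where the result can actually be used.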

> create_table_core method has invalid update for Fast Stats
> ----------------------------------------------------------
>
>                 Key: HIVE-10631
>                 URL: https://issues.apache.org/jira/browse/HIVE-10631
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 0.13.0, 1.0.0
>            Reporter: Dongwook Kwon
>            Priority: Minor
>
> The HiveMetaStore.create_table_core method calls MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather is on. However, for a partitioned table, this updateUnpartitionedTableStatsFast call scans the warehouse dir and doesn't seem to use the result.
> "Fast Stats" was implemented by HIVE-3959
> https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
> From create_table_core method
> {code}
>         if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
>             !MetaStoreUtils.isView(tbl)) {
>           if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
>             MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
>           } else { // Partitioned table with no partitions.
>             MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
>           }
>         }
> {code}
> Particularly Line 1363: // Partitioned table with no partitions.
> {code}
> MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
> {code}
> This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and then does nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast method, because the newDir flag is always true.
> The impact of this bug is minor with an HDFS warehouse location (hive.metastore.warehouse.dir), but it could be big with an S3 warehouse location, especially for large existing partitions.
> The impact is also heightened with HIVE-6727 when the warehouse location is S3: it could scan the wrong S3 directory recursively and do nothing with the result. I will add more detail on these cases in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)